FPGA Accelerated Seam Carving for Video
A Design Overview
B2: Kimberly Lim, Eshani Mishra, Shruti Narayan
Application Area
Content-aware re-scaling intelligently targets parts of the frame to remove.
- Reduced video size (users often run out of space)
- Carve out unwanted pixels and save what’s important
- Draw attention to important aspects of video
- Highlight important aspects by removing unwanted seams
- Video processing is often slow
- FPGA for acceleration
Application Area
Left: naive restitching of seam-carved frames (spatial information only). Right: static seam carving, which uses both temporal and spatial information and therefore produces less distortion.
Computational complexity is the bottleneck of the algorithm, so a hardware-oriented seam carving implementation on an FPGA is proposed to improve performance.
Overview of MVP
- Camera input of 360x240 resolution video at 30 fps
- FPGA user input: 5 seams at a time
- Monitor
Block Diagram: Data Transfer through Hardware
[Figure: block diagram. Recoverable labels:]
- Camera: D8M-GPIO (video input)
- DE10-Standard FPGA board: ARM processor, Linux, SDRAM, key
- Software: Preprocessing and Video Array Formatting Script, Seam Removal Script
- Hardware: Cyclone V LE's, SDRAM, M10K embedded memory blocks, Seam Carving Algorithm
- Data passed between blocks: video array (frame-by-frame load), seam to remove
- Monitor (video output)
- Legend: Newly Designed, Borrowed, Bought
Algorithm Overview
[Figure: algorithm diagram with panels Stage 1, Stage 2, and Stage 3]
Memory Allocation in FPGA (557 M10K blocks)
Stage 1
- Spatial Energy Map (75 blocks)
- Temporal Energy Map (150 blocks)
- Loading Frame (120 blocks)
- Processing Frame (120 blocks)
- 92 blocks left
Stage 2
- Energy Map (75 blocks)
- Accumulation Paths (75 blocks)
- 320 Accumulation Cell Copies (1 block each, 320 blocks)
- 87 blocks left
Stage 3
- 5 Accumulation Path Copies (75 blocks each, 375 total)
- 182 blocks left
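The per-stage block counts can be checked with a short script. This is only a bookkeeping sketch: the names `TOTAL_M10K`, `STAGES`, and `blocks_left` are illustrative, and it assumes each stage is budgeted against the full 557 M10K blocks (which is what the "blocks left" figures imply).

```python
# Bookkeeping sketch of the M10K budget from the slide.
# Assumption: every stage is budgeted against all 557 blocks,
# i.e. the stages reuse the same memory rather than coexisting.
TOTAL_M10K = 557

STAGES = {
    "Stage 1": {
        "Spatial Energy Map": 75,
        "Temporal Energy Map": 150,
        "Loading Frame": 120,
        "Processing Frame": 120,
    },
    "Stage 2": {
        "Energy Map": 75,
        "Accumulation Paths": 75,
        "Accumulation Cell Copies": 320 * 1,  # 320 copies, 1 block each
    },
    "Stage 3": {
        "Accumulation Path Copies": 5 * 75,  # 5 copies, 75 blocks each
    },
}


def blocks_left(stage):
    """Return the number of M10K blocks unused in the given stage."""
    return TOTAL_M10K - sum(STAGES[stage].values())


if __name__ == "__main__":
    for name in STAGES:
        print(name, "uses", sum(STAGES[name].values()),
              "blocks,", blocks_left(name), "left")
```

The remainders computed this way match the 92, 87, and 182 blocks left stated for the three stages.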
Algorithm Implementation
[Figure: implementation diagrams for Stage 1, Stage 2, and Stage 3]
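The stage diagrams did not survive extraction, but the memory allocation slide suggests a split along the lines of energy maps, accumulation, and path selection. The following is a minimal software sketch of textbook vertical seam carving under that assumption, not the team's hardware design. All function names are hypothetical, and the energy definition (absolute gradients plus absolute frame difference) is one simple choice among many.

```python
# Software sketch of vertical seam carving on grayscale frames stored as
# lists of rows. Assumed mapping to the slides: stage 1 = energy maps,
# stage 2 = accumulation, stage 3 = path tracing.

def spatial_energy(frame):
    """Sum of absolute horizontal and vertical differences (edges clamped)."""
    h, w = len(frame), len(frame[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = frame[y][max(x - 1, 0)], frame[y][min(x + 1, w - 1)]
            up, down = frame[max(y - 1, 0)][x], frame[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def temporal_energy(prev_frame, frame):
    """Absolute per-pixel difference between consecutive frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, frame)]


def accumulate(energy):
    """Dynamic programming: minimum cumulative energy from the top row."""
    h, w = len(energy), len(energy[0])
    acc = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            acc[y][x] += min(acc[y - 1][lo:hi + 1])
    return acc


def trace_seam(acc):
    """Backtrack from the cheapest bottom cell; seam[y] is the column in row y."""
    h, w = len(acc), len(acc[0])
    x = min(range(w), key=lambda i: acc[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: acc[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def find_seam(prev_frame, frame):
    """Combine spatial and temporal energy, then pick the cheapest seam."""
    spatial = spatial_energy(frame)
    temporal = temporal_energy(prev_frame, frame)
    energy = [[s + t for s, t in zip(srow, trow)]
              for srow, trow in zip(spatial, temporal)]
    return trace_seam(accumulate(energy))


def remove_seam(frame, seam):
    """Delete one pixel per row, shrinking the frame width by one."""
    return [row[:x] + row[x + 1:] for row, x in zip(frame, seam)]
```

Removing 5 seams at a time, as in the MVP, would repeat `find_seam` and `remove_seam` five times per frame in this sketch; a hardware version would presumably parallelize the accumulation step instead.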
Metrics and Validation
01 Timing
● Compare against benchmark C++ implementation of seam carving
● Goal: 5x speedup
02 Video Quality
● PSNR, Spatio-temporal SSIM
● User testing
● Goal: <10% error compared to results from C++ implementation
03 Risk Factors/Unknowns
● Remove fewer seams at a time (<5)
● Change blocking to optimize parallelization
● Memory consumption + timing analysis for test matrices using Quartus
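As an illustration of the PSNR metric named above, here is a minimal sketch using the standard definition, PSNR = 10 log10(MAX^2 / MSE). It assumes 8-bit grayscale frames stored as lists of rows; the function name `psnr` and the `max_value` parameter are illustrative. Spatio-temporal SSIM is more involved and is not shown.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Assumes 8-bit grayscale frames given as lists of rows.
    Returns infinity when the frames are identical (zero error).
    """
    total = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

One way to use this would be to compute PSNR between the FPGA output and the output of the C++ reference for the same input frames, frame by frame.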
Benchmark Analysis (360x240 at 30 fps with 1.4 GHz Intel Core i5)
● Seams: 30 vertical seams
● Cycles: 132360837 cycles
● Time: 10.646925 seconds
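The 5x speedup goal can be translated into a rough time budget from the benchmark figure. The sketch below is plain arithmetic on the numbers above; the variable names are illustrative, and the slide does not state what workload the measured time covers, so the results are only meaningful relative to that same workload.

```python
# Arithmetic sketch: time budget implied by the 5x speedup goal.
# Assumption: the FPGA design is timed on the same workload as the benchmark.
BENCHMARK_SECONDS = 10.646925   # measured time for 30 vertical seams
BENCHMARK_SEAMS = 30
TARGET_SPEEDUP = 5
SEAMS_PER_PASS = 5              # MVP removes 5 seams at a time

target_seconds = BENCHMARK_SECONDS / TARGET_SPEEDUP
baseline_seconds_per_seam = BENCHMARK_SECONDS / BENCHMARK_SEAMS
passes_needed = BENCHMARK_SEAMS // SEAMS_PER_PASS

if __name__ == "__main__":
    print(f"target time for 30 seams: {target_seconds:.6f} s")
    print(f"baseline time per seam:   {baseline_seconds_per_seam:.6f} s")
    print(f"passes of 5 seams needed: {passes_needed}")
```

Under these assumptions the FPGA design would need to finish the same 30 seams in about 2.13 seconds, spread over 6 passes of 5 seams each.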
Work Distribution
[Figure: work distribution among Eshani, Kimberly, and Shruti. Recoverable labels: Quality Metrics, HPS to FPGA communication; High-level Algorithm Design; Hardware; Software; Benchmarking; Implementation; Design]
Schedule
Ongoing and future tasks are planned up until spring break. The period after spring break is reserved for slack time and for adding planned extension steps.