An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform
Youn-Long Lin
Department of Computer Science, National Tsing Hua University
Hsin-Chu, TAIWAN 300
ylin@cs.nthu.edu.tw
2006/08/16, MPSOC, Colorado, USA

Main Points
• Hardwired design has excellent area, performance, and power advantages
• If it is to be used by 1B people every day, every bit and every cycle counts
• It is not difficult
  – 15 CS student-years; the students had no background in video or HDL-based design, and neither did the professor
  – RTL design is the easy part
  – Understanding the algorithm and designing the architecture are most critical
YLLIN NTHU-CS 2

Video Coding Standards
| Standard          | MPEG-1         | MPEG-2        | MPEG-4            | H.264                                  |
|-------------------|----------------|---------------|-------------------|----------------------------------------|
| MB size           | 16*16          | 16*16 (frame) | 16*16             | 16*16                                  |
| Block size        | 8*8            | 8*8           | 16*16, 8*8        | 16*16, 16*8, 8*16, 8*8, 8*4, 4*8, 4*4  |
| Transform         | DCT            | DCT           | DCT/Wavelet       | 4*4 int transform                      |
| Entropy coding    | VLC            | VLC           | VLC               | VLC, CAVLC and CABAC                   |
| ME, MC            | Yes            | Yes           | Yes               | 41 MVs per MB                          |
| Pixel accuracy    | ½ pel          | ½ pel         | ¼ pel             | ¼ pel                                  |
| Reference frames  | One frame      | One frame     | One frame         | Multiple (5) frames                    |
| Picture type      | I, P, B        | I, P, B       | I, P, B           | I, P, B                                |
| Transmission rate | Up to 1.5 Mbps | 2-15 Mbps     | 64 kbps to 2 Mbps | 64 kbps to 150 Mbps                    |

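The "41 MVs per MB" entry can be checked by summing one motion vector per partition over every H.264 partition shape in the Block size row. A minimal sketch of that arithmetic, assuming this is how the figure is counted (the names below are illustrative and do not come from the slides):

```python
MB = 16  # macroblock width and height in luma samples

# Partition shapes (width, height) listed for H.264 in the Block size row.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def partitions_per_mb(width, height):
    """Number of width x height partitions that tile one 16x16 macroblock."""
    return (MB // width) * (MB // height)


# One motion vector per partition, for each shape.
counts = {f"{w}x{h}": partitions_per_mb(w, h) for w, h in PARTITIONS}
total_mvs = sum(counts.values())

print(counts)
print(total_mvs)  # 41
```

The per-shape counts are 1, 2, 2, 4, 8, 8, and 16, which sum to 41.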
Get More for Less
[Figure: H.264 compared with MPEG-2]

H.264/AVC Profiles
[Figure: profile diagram covering the Baseline, Main, Extended, and FRExt (High) profiles. Coding tools labeled in the diagram: I slice, P slice, B slice, SP and SI slices, interlace, weighted prediction, CABAC, CAVLC, data partition, slice group, ASO, redundant slice, 8x8 transform, quantization matrix, color sampling, 8/10/12-bit sampling]

NTHU H.264/AVC Main Profile Video Decoder Prototype
Multimedia SOC Platform
• FPGA @ 10 MHz: Main Profile, CIF (352x288) @ 30 fps
• FPGA @ 24 MHz: Main Profile, D1 (720x480) @ 30 fps

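The two operating points imply a tight cycle budget per macroblock. A rough sketch of that calculation, assuming 16x16 macroblocks and that every clock cycle is available to the decoder (the function name is illustrative):

```python
def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average clock cycles available per 16x16 macroblock at real time."""
    mbs_per_frame = (width // 16) * (height // 16)
    return clock_hz / (mbs_per_frame * fps)


cif_budget = cycles_per_macroblock(10_000_000, 352, 288, 30)  # 396 MBs/frame
d1_budget = cycles_per_macroblock(24_000_000, 720, 480, 30)   # 1350 MBs/frame

print(round(cif_budget, 1))  # 841.8
print(round(d1_budget, 1))   # 592.6
```

Under these assumptions the decoder has on the order of 600 to 850 cycles per macroblock, including any time spent waiting on memory.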
A Multimedia SOC Platform
[Block diagram. Labels in the figure: CPU; Accelerator on a Daughter Board (FPGA); ROM/Flash Memory; SDRAM; SRAM; static memory; SDRAM Controller (4-CH); VIC; USB 2.0 with USB (PHY); High-Speed Bus; JPEG Codec; Display Controller; Capture Controller; DMA; APB Bridge; Peripheral Bus; PWM; WDT; TIMER; DAI; SSI; SD; SM; UART; GPIO; I2C; Audio Codec (I2S with SSI); Flash memory; Flash Card; Video-In (CCIR601); Button; LED; TV/LCD]

H.264/AVC Decoder System Diagram
[Block diagram. Labels in the figure: ARM926EJS, UART, TV, Timer, SD Card, SDC, SDRAM, LM, and H264, attached as masters and slaves to two buses, AHB1 and AHB2]

H.264/AVC Decoder Architecture
[Block diagram. Labels outside the decoder: SD Card (storage device), SDRAM (Input/Ref./Display frames), CPU, Display, AHB. Labels inside the decoder: Parser, CABAD, CAVLD, MC, Intra pred, IQ/IDCT, Pic Rec, DF, and SRAM buffers marked MV, Ref idx, Para, MBinfo, Coeff, Pred, Residual, reconstruct, and unfilter]

Hierarchical FSM in Main Controller
[Block diagram. Labels in the figure: Frame Level and MB Level; Main controller with CABAC FSM, MC FSM, IPRED FSM, IQ/IDCT FSM, PICREC FSM, and DF FSM; Main decoder with CABAC, MC, IPRED, IQ/IDCT, PICREC, and DF; signals Type, rden, rd_addr, rd_data]

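One way to read the two-level structure is a frame-level loop that steps each macroblock through per-module stages. The sketch below is a software model only, under assumptions the slide does not state: the stage order, the choice between IPRED and MC by macroblock type, and every function and enum name are hypothetical. The real controller is hardware and may overlap stages.

```python
from enum import Enum, auto


class Stage(Enum):
    CABAC = auto()    # entropy decoding
    IPRED = auto()    # intra prediction
    MC = auto()       # motion compensation
    IQ_IDCT = auto()  # inverse quantization and inverse transform
    PICREC = auto()   # picture reconstruction
    DF = auto()       # deblocking filter


def mb_stage_sequence(mb_type):
    """MB-level FSM: ordered stages for one macroblock (assumed order)."""
    predictor = Stage.IPRED if mb_type == "intra" else Stage.MC
    return [Stage.CABAC, predictor, Stage.IQ_IDCT, Stage.PICREC, Stage.DF]


def decode_frame(mb_types):
    """Frame-level FSM: visit every macroblock and run its stage sequence."""
    trace = []
    for mb_index, mb_type in enumerate(mb_types):
        for stage in mb_stage_sequence(mb_type):
            trace.append((mb_index, stage.name))
    return trace


trace = decode_frame(["intra", "inter"])
print(trace)
```

Each tuple in `trace` records which macroblock was in which stage, which is roughly the information the Type signal would select between in this reading.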
AMBA interface
[Block diagram. Labels in the figure: AHB A and AHB B; LM; slave wrapper; control register; H.264 Decoder; MFU; VLC & TV OUT; DF & MC; arbiter 1 and arbiter 2; master wrapper 1 and master wrapper 2; SDC]

Our Design Flow
[Flowchart. Labels in the figure: User Spec. (software spec. in C and acceleration spec.); Platform spec. (C models, drivers, API); SW lib.; HW lib. (parameterized HDL IPs); System configuration (System.h); Embedded Software Compilation (system image); HW IP generation and Synthesizer (Accelerator.v, System.v); HW/SW co-simulation with an ISS; Area, Timing & Power evaluation; Platform Integration; Performance constraint check (Yes/No); Pin assignment & Hardware compilation (hardware image); FPGA prototyping; FPGA Verify]

Memory Traffic Consideration
• One SDRAM for All External Storage
  – Encoded Bitstream
  – Reference Frames
  – Currently Reconstructed Frame
  – Display Buffer
• SDRAM Burst Mode
• Internal Storage for Compact Access & Data Reuse

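To get a feel for the load on the single SDRAM, the sketch below estimates a lower bound on traffic for D1 at 30 fps. It assumes 8-bit 4:2:0 sampling (1.5 bytes per pixel, the format Main Profile uses) and counts only one write of each reconstructed frame plus one read for display; reference-frame reads for motion compensation and the encoded bitstream are deliberately left out, so real traffic is higher. The function names are illustrative.

```python
def frame_bytes(width, height, bytes_per_pixel=1.5):
    """Size of one frame; 1.5 bytes/pixel assumes 8-bit 4:2:0 sampling."""
    return int(width * height * bytes_per_pixel)


def min_sdram_traffic(width, height, fps):
    """Lower bound in bytes/s: write each reconstructed frame once and
    read it once for display. Reference reads and bitstream are excluded."""
    return 2 * frame_bytes(width, height) * fps


d1_frame = frame_bytes(720, 480)
d1_traffic = min_sdram_traffic(720, 480, 30)

print(d1_frame)    # 518400
print(d1_traffic)  # 31104000
```

Even this floor is about 31 MB/s, which suggests why burst-mode access and on-chip data reuse matter at a 24 MHz clock.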
Buffer Size vs Bus Traffic
[Figure: frames per second (0 to 60) versus buffer size (1 to 21) for four configurations: 16MHz/TV, 16MHz/LCD, 24MHz/TV, 24MHz/LCD]

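Performance Comparison
|                  | DSP Core       | HW Accelerated | HW Accelerated |
|------------------|----------------|----------------|----------------|
| Gate Count       | 230K           | 180K           | 180K           |
| Clock (MHz)      | 200            | 10             | 24             |
| Profile          | Baseline       | Main           | Main           |
| Resolution       | QCIF (176x144) | CIF (352x288)  | D1 (720x480)   |
| Frame Rate (fps) | 15             | 30             | 30             |
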
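The comparison can be normalized to pixels decoded per second per MHz of clock. This is a rough figure of merit chosen here for illustration, not a metric from the slides, and it ignores that the DSP row decodes Baseline while the accelerator rows decode Main Profile.

```python
def pixels_per_second_per_mhz(width, height, fps, clock_mhz):
    """Decoded pixel rate divided by clock frequency in MHz."""
    return width * height * fps / clock_mhz


# (width, height, frame rate, clock in MHz) taken from the comparison.
rows = {
    "DSP core, Baseline QCIF": (176, 144, 15, 200),
    "HW accelerated, Main CIF": (352, 288, 30, 10),
    "HW accelerated, Main D1": (720, 480, 30, 24),
}

efficiency = {name: pixels_per_second_per_mhz(*cfg) for name, cfg in rows.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} pixels/s per MHz")
```

By this measure the accelerator rows come out roughly 160 to 230 times higher than the DSP row, with a smaller gate count.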
Summary
• An H.264/AVC Main Profile decoder on an ad hoc multimedia SOC platform
• The hardware-accelerated approach is high-performance and energy-efficient
• Memory traffic has a major impact on performance
• It is not as difficult as you may think; algorithm and architecture are critical; writing Verilog is no different from writing C
• Do not try to parallelize the Reference Software; it is just a proof of concept, not an implementation

Demo Video