
Hardware Acceleration of Graphics and Imaging Algorithms Using FPGAs



  1. Hardware Acceleration of Graphics and Imaging Algorithms Using FPGAs. Pavel Zemčík, Department of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Czech Republic. E-mail: zemcik@fit.vutbr.cz

  2. Overview • History and current state of the art • Possible future development • FPGA capabilities and configuration, DSP • Design examples • Algorithm examples • Conclusion

  3. History and current state of the art

  4. Historical development: 1) Special hardware - “the only way” (diagram labels: Computer, Graphics device)

  5. Historical development: 2) Integration - “the cheaper way” (diagram labels: Computer, Graphics, RAM)

  6. Historical development: 3) On board memory - “the faster way” (diagram labels: Computer, Graphics, Dedicated RAM)

  7. Historical development: 4) Acceleration - “the modern way” (diagram labels: Computer, Graphics, Graphics pipeline, Dedicated RAM)

  8. Historical development: Bandwidth • ISA 8/16 bit - 5 MB/s • PCI 32 bit - 132 MB/s • AGP 64 bit - 512 MB/s
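The PCI figure above follows from bus width times clock rate. A minimal sketch of that arithmetic, assuming the nominal 33 MHz PCI clock and one transfer per clock (the function name and parameters are illustrative, not from the slides):

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s, with 1 MB taken as 10**6 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# 32-bit PCI at an assumed 33 MHz clock: 4 bytes x 33 million transfers per second.
print(peak_bandwidth_mb_s(32, 33))  # 132.0
```

This is a peak value only; sustained throughput on a real bus is lower because of arbitration and protocol overhead.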

  9. Historical development: Resolution • CGA - 320x200x4 colours • VGA - 640x480x256 colours • XGA, … 1600x1200 and more, 32-bit RGB
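The growth in resolution and colour depth translates directly into frame buffer size. A small sketch of that calculation, assuming a packed frame buffer with no padding between pixels or rows (the helper names are illustrative):

```python
def bits_per_pixel(colours):
    """Smallest number of bits that can index the given number of colours."""
    return max(1, (colours - 1).bit_length())


def framebuffer_bytes(width, height, bpp):
    """Size of one packed frame in bytes, rounded up to a whole byte."""
    return (width * height * bpp + 7) // 8


# 640x480 in 256 colours needs 8 bits per pixel.
print(framebuffer_bytes(640, 480, bits_per_pixel(256)))  # 307200
# 1600x1200 in 32-bit RGB.
print(framebuffer_bytes(1600, 1200, 32))  # 7680000
```

The second mode needs 25 times the memory of the first, and every frame redrawn has to move that much more data.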

  10. Current state of the art: Typical configuration • AGP interface • 64 MB RAM - 256 bit, >3 GB/s • Graphics pipeline (3D transformation, clipping, shading, 2D/3D textures, partially programmable, etc.) • >100 000 polygons/s, >1G pixels/s

  11. Current state of the art: Pixel oriented architectures (Pixel Planes) • Each pixel has its own “CPU” • Extremely high pixel rates • Limited features in texturing etc.

  12. Current state of the art: Volume rendering architecture (VolumePro) • Very high voxel rate • Shadows and perspective projection (so far) not included

  13. Possible future development

  14. Possible future development: Frequency cannot increase much • Frequency - light travels only 30 cm in 1 ns, and electrical signals propagate much more slowly inside the chips
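The 30 cm figure above is the distance light covers in one nanosecond, that is, in one clock period at 1 GHz. A minimal sketch of the calculation; the `velocity_factor` parameter and the value 0.5 used for an on-chip signal are illustrative assumptions, not figures from the slides:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_cm(clock_hz, velocity_factor=1.0):
    """Distance a signal travels in one clock period, in centimetres.

    velocity_factor is the signal speed as a fraction of the speed of light
    (1.0 means light in vacuum).
    """
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * period_s * 100.0


# Light in vacuum during one 1 GHz clock period (1 ns): roughly 30 cm.
print(f"{distance_per_cycle_cm(1e9):.1f} cm")
# A hypothetical signal at half the speed of light, clocked at 3 GHz.
print(f"{distance_per_cycle_cm(3e9, velocity_factor=0.5):.1f} cm")
```

As the clock rises, the distance a signal can cover within one cycle shrinks in proportion, which is the physical limit the slide points to.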

  15. Possible future development: Parallelism and configurable devices • Parallelism - semantic problems • Device configuration - difficult to handle for “programmers”

  16. Possible future development: Questions • What is the frequency limit? • What is the limit in the bus width? • Why would programmable logic be implemented only in graphics accelerators and not in CPUs?

  17. FPGA capabilities and configuration, DSP

  18. FPGA features: General purpose versus application specific • Current processors (and DSPs) are suitable for any algorithm but are not fast enough (e.g. for 2D or 3D data) • Processing speed can be achieved by hard-wired digital circuits, but these are considered application-specific

  19. Processor • Sequential processing – poor performance – easy to program • Fixed architecture • Cheap (diagram: steps 1 and 2, each a MUL followed by an ADD)
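The MUL/ADD sequence on this slide corresponds to a multiply-accumulate loop on a single processor. A minimal sketch, assuming one MUL and one ADD per step (the function name and the step counter are illustrative):

```python
def dot_product_sequential(a, b):
    """Multiply-accumulate on one processor: one MUL and one ADD per step."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    steps = 0
    for x, y in zip(a, b):
        product = x * y  # MUL
        acc += product   # ADD
        steps += 1
    return acc, steps


print(dot_product_sequential([1, 2, 3, 4], [5, 6, 7, 8]))  # (70, 4)
```

The number of steps grows linearly with the vector length, which is the "poor performance" side of the trade-off; the code itself stays trivial to write.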

  20. More Processors • Multiprocessing is difficult to handle algorithmically • Memory throughput or communication speed is the limiting factor • Price and power efficiency poor

  21. More Processors • Parallel processing • Fixed architecture • Programmable (diagram: processors 1, 2, ... N, each with its own MUL and ADD)

  22. Hard-wired • Each circuit is usable only for one task or a very limited set of tasks • Maximum size (complexity) is the limiting factor • Expensive to design

  23. Hard-wired • Parallel processing • Fixed architecture • Few functions • Not programmable (diagram: multipliers 1, 2, 3, ... N feeding one ADD)

  24. FPGAs • Parallel processing • Flexible architecture • Configurable for different tasks (diagram: two groups of multipliers 1, 2, 3, ... N, each group feeding its own ADD)
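The parallel arrangement of multipliers feeding an adder can be modelled in software to count how many stages the data passes through. The sketch below assumes all N multiplications happen in one stage and that the sum is formed by a balanced tree of two-input adders; the adder tree is an assumption of this example, since the slide shows only a single ADD block.

```python
def dot_product_parallel(a, b):
    """Model N parallel multipliers followed by a tree of two-input adders.

    Returns the result and the number of stages (1 multiply stage plus one
    stage per adder-tree level).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    level = [x * y for x, y in zip(a, b)]  # all MULs operate side by side
    stages = 1
    while len(level) > 1:
        # Add neighbouring pairs; an odd element passes through unchanged.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
        stages += 1
    return (level[0] if level else 0), stages


print(dot_product_parallel([1, 2, 3, 4], [5, 6, 7, 8]))  # (70, 3)
```

For N inputs the stage count grows roughly with log2(N) instead of N, at the cost of N multipliers on the chip. On an FPGA the same fabric can later be configured for a different structure, which is what distinguishes it from the hard-wired case.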

  25. Benchmark Example • FPGA is 10 times faster than DSP (source Xilinx)

  26. FPGA DSP is Lower Cost • Price per Million MACs per Second (source Xilinx)

  27. FPGA structure

  28. FPGA configurable logic block

  29. CPU vs. FPGA Speedup Example - Experimental data - Average Speedup = 24

  30. Price versus Performance • Software solution = cheap, but slow • Hardware solution = fast, but expensive (plot: performance versus price for SW, HW+SW and HW solutions, with the minimum required performance and the maximum possible price marked)

  31. Solution? Coupled DSPs and FPGAs • Potentially the best solution – provides both programmability and performance • Possible trend – modify the concept of DSPs – include programmable circuits inside the DSP – developers have little influence over this

  32. Dynamic Reconfiguration • DSP – one function at a time • FPGA – more functions at a time • Re-configurable FPGA – all functions done in time – some functions run while others are loading • Algorithms example (3D Graphics) – Texture – Shadow – Reflections – Perspective – Edge
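The idea that some functions run while others are loading can be illustrated with a simple timing model. The sketch assumes a device with two regions used alternately: while one region runs the current function, the other loads the next. The load and run times are hypothetical values chosen for the example, not measurements.

```python
def total_time_serial(functions):
    """Load each function, then run it, with no overlap."""
    return sum(load + run for load, run in functions)


def total_time_overlapped(functions):
    """Load the next function into the idle region while the current one runs."""
    if not functions:
        return 0
    total = functions[0][0]  # the first load cannot be hidden
    for i in range(len(functions) - 1):
        run_now = functions[i][1]
        load_next = functions[i + 1][0]
        # The next function starts once the current run and its own load are done.
        total += max(run_now, load_next)
    total += functions[-1][1]  # the last run has nothing left to overlap with
    return total


# Hypothetical (load, run) times in milliseconds for the five functions.
pipeline = {
    "texture": (2, 5),
    "shadow": (2, 5),
    "reflections": (2, 5),
    "perspective": (2, 5),
    "edge": (2, 5),
}
times = list(pipeline.values())
print(total_time_serial(times))      # 35
print(total_time_overlapped(times))  # 27
```

With these numbers every load after the first is hidden behind a run, so only the initial configuration time remains visible. If loading took longer than running, the `max` term would expose the difference.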

  33. Reconfiguration Advantages • Lower cost by reusing silicon for multiple functions over time • Significant performance increase in FPGA hardware versus software DSP implementation • Possible partial reconfiguration - function swapping

  34. DSP development

  35. DSP example: TI 320C32

  36. DSP example II: DSP I/O

  37. DSP algorithm example

  38. Algorithm example • Image erosion using 3x3 square mask
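Erosion with a 3x3 square mask replaces each pixel by the minimum of its 3x3 neighbourhood. A minimal software sketch, assuming the image is a list of rows and that the neighbourhood is clipped at the image border (the slide does not say how borders are handled):

```python
def erode_3x3(image):
    """Erode a 2D image (list of rows) with a 3x3 square mask.

    Each output pixel is the minimum over its 3x3 neighbourhood.
    Neighbours outside the image are ignored (an assumption of this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = min(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
    return out


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in erode_3x3(image):
    print(row)  # only the centre pixel of the 3x3 block stays 1
```

Every output pixel depends only on nine input pixels and on no other output, so all pixels can be computed independently. That independence is what makes the operation a natural candidate for parallel hardware.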
