
FPGA-specific arithmetic pipeline design using FloPoCo

Bogdan Pasca, Arénaire. CARAMEL, 17/02/2011. Outline: FPGAs and floating-point; Datapath design using FloPoCo; Inside FloPoCo; Back-end for HLS; Conclusion.


  1. FPGA-specific arithmetic pipeline design using FloPoCo. Bogdan Pasca, Arénaire. CARAMEL, 17/02/2011

  2. Outline: FPGAs and floating-point; Datapath design using FloPoCo; Inside FloPoCo; Back-end for HLS; Conclusion

  3. FPGAs and floating-point

  4. What's an FPGA? Field Programmable Gate Array: an integrated circuit with a regular architecture (hence "array") whose logic elements can be programmed to perform various functions.

  11. Modern FPGA Architecture: a set of configurable logic elements; on-chip memory blocks; digital signal processing (DSP) blocks (including multipliers); connected by a configurable wire network; all connected to the outside world by I/O pins. [Figure: FPGA floorplan with blocks labeled LUT, RAM, and DSP; the DSP detail is annotated 18, 18, and shift 17.]

  13. A bit of history

      | Year                                  | 1995 [1] | 2011       | 2011      |
      |---------------------------------------|----------|------------|-----------|
      | FPGA                                  | XC4010   | XC6VHX565T | 5SGXAB    |
      | Capacity (K LE)                       | 1        | 500        | 1,000     |
      | DSPs                                  | -        | 1K         | 1.5K      |
      | Block RAM                             | -        | 2K (18Kb)  | 2K (20Kb) |
      | Frequency (MHz)                       | 10       | 600        |           |
      | FPAdder ($w_E = 6$, $w_F = 9$)        | 28%      | 0.05%      | 0.025%    |
      | FPMultiplier ($w_E = 6$, $w_F = 9$)   | 44%      | * [2]      | *         |
      | FPDivider ($w_E = 6$, $w_F = 9$)      | 46%      | 0.1%       | 0.05%     |

      FPGAs are now large enough to implement complex datapaths.

      [1] Shirazi et al., Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines (1995)
      [2] Multiplications are usually implemented using DSPs on modern FPGAs

  16. So, are FPGAs any good at floating-point in 2011? Today's basic operations (+, −, ×): highly optimized FPU in the processor; each operator 10x slower in an FPGA; ⋆ massive parallelism on an FPGA. → FPGA faster than PC, but no match for GPGPU, Cell, ... If you lose according to a metric, change the metric. Peak figures for the double-precision floating-point exponential [3]: Pentium core, 20 cycles / DPExp @ 3 GHz: 150 MDPExp/s; FPGA, 1 DPExp/cycle @ 400 MHz: 400 MDPExp/s. Chip vs chip: 8 Pentium cores vs 150 FPExp/FPGA. ⋆ Power consumption also better. (Intel MKL vector libm vs FPExp in FloPoCo version 2.0.0.) [3] de Dinechin, Pasca. Floating-point exponential functions for DSP-enabled FPGAs (2010)
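
The peak figures on this slide follow from simple clock arithmetic. The sketch below reproduces them in Python; the function name `peak_rate` is made up for illustration, and the chip-level totals are not on the slide: they are derived here under the optimistic assumption that every core or operator instance sustains its peak rate at the same time.

```python
# Peak-throughput arithmetic using only the numbers quoted on the slide.

def peak_rate(clock_hz: float, cycles_per_result: float) -> float:
    """Results per second for one unit producing a result every `cycles_per_result` cycles."""
    return clock_hz / cycles_per_result

pentium_core = peak_rate(3e9, 20)    # 20 cycles per DPExp at 3 GHz
fpga_operator = peak_rate(400e6, 1)  # 1 DPExp per cycle at 400 MHz (fully pipelined)

# Chip vs chip (derived, not from the slide): 8 cores vs 150 FPExp operators,
# assuming all of them run at peak simultaneously.
pentium_chip = 8 * pentium_core
fpga_chip = 150 * fpga_operator

print(f"Pentium core : {pentium_core / 1e6:.0f} MDPExp/s")
print(f"FPGA operator: {fpga_operator / 1e6:.0f} MDPExp/s")
print(f"Pentium chip : {pentium_chip / 1e6:.0f} MDPExp/s")
print(f"FPGA chip    : {fpga_chip / 1e6:.0f} MDPExp/s")
print(f"Chip-level ratio: {fpga_chip / pentium_chip:.0f}x")
```

These are peak rates only; memory bandwidth and I/O, which the slide does not discuss, would lower both sides in practice.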

  19. The FloPoCo project: Not your neighbour's FPU. Useful operators that would not be economical in a processor: ⋆ elementary functions (sine, exponential, logarithm, ...); ⋆ algebraic functions ($x / \sqrt{x^2 + y^2}$, polynomials, ...); ⋆ compound functions ($\log_2(1 \pm 2^x)$, $e^{-Kt^2}$, ...); ⋆ floating-point sums, dot products, sums of squares; ⋆ specialized operators: constant multipliers, squarers, ...; complex arithmetic; ⋆ LNS arithmetic; ⋆ decimal arithmetic; interval arithmetic; ... Oh yes, basic operations, too.

  23. VHDL Limitations. One instance (double-precision, Virtex4, 400 MHz) of FPExp: 52 pipeline stages, 37 subcomponents, 6000 lines of VHDL vs 600 lines of FloPoCo. Our questions for today: How to productively design an optimized architecture? How to be future-proof when you need a different precision, target a different FPGA family (different multiplier sizes), or need a higher frequency?

  24. Datapath design using FloPoCo

  25. A question of granularity. [Figure: abstraction levels from high to low, trading productivity against performance; recoverable labels: system builder, C-like, loop management, arithmetic datapath, FPGA primitives, and FloPoCo.]

  28. Sum of squares: performance approach. $x^2 + y^2 + z^2$ (not a toy example but a useful building block). A square is simpler than a multiplication: half the hardware required. $x^2$, $y^2$, and $z^2$ are positive: one half of your FP adder is useless.
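
The "half the hardware" remark can be checked on a bit-level model. The sketch below is an illustration only, not FloPoCo's squarer (which may well map onto DSP blocks): it counts the AND gates of a schoolbook partial-product array and uses the identity $x^2 = \sum_i x_i 2^{2i} + 2\sum_{i<j} x_i x_j 2^{i+j}$, so each off-diagonal product is computed once and shifted left by one. The function names and the 10-bit width are hypothetical choices.

```python
# Bit-level partial-product model: generic multiplier vs dedicated squarer.

def multiply_partial_products(x: int, y: int, n: int) -> tuple:
    """Schoolbook n-bit multiply; returns (product, number of AND gates)."""
    total, gates = 0, 0
    for i in range(n):
        for j in range(n):
            gates += 1  # one AND gate per (x_i, y_j) pair
            total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total, gates

def square_partial_products(x: int, n: int) -> tuple:
    """n-bit square using the symmetry of the partial-product array."""
    total, gates = 0, 0
    for i in range(n):
        xi = (x >> i) & 1
        total += xi << (2 * i)  # diagonal term: x_i AND x_i = x_i, no gate needed
        for j in range(i + 1, n):
            gates += 1  # x_i x_j and x_j x_i are equal: compute once, weight doubled
            total += (xi & ((x >> j) & 1)) << (i + j + 1)
    return total, gates

n = 10  # hypothetical operand width
# Exhaustive check that the reduced array still computes the exact square.
assert all(square_partial_products(x, n)[0] == x * x for x in range(1 << n))

_, mult_gates = multiply_partial_products(0, 0, n)
_, square_gates = square_partial_products(0, n)
print(f"multiplier AND gates: {mult_gates}, squarer AND gates: {square_gates}")
```

For width $n$ the multiplier needs $n^2$ AND gates and the squarer $n(n-1)/2$, a little under half, which matches the slide's estimate. The compression tree that sums the partial products shrinks accordingly, although this model does not count it.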
