
Pipelined Compressor Tree Optimization using Integer Linear Programming



  1. Pipelined Compressor Tree Optimization using Integer Linear Programming. International Conference on Field Programmable Logic, 03.09.2014. Martin Kumm, Peter Zipf, University of Kassel, Germany

  2. Contents: (1) Introduction to Compressor Trees, (2) Compressor Trees on FPGAs, (3) Optimal Compressor Tree Synthesis

  3. Compressor Trees: A compressor tree realizes the addition of many (more than two) bit-shifted numbers. The applications are versatile: multipliers (real, complex, squarer), evaluation of polynomials (e.g., for function approximation), linear transforms (e.g., FFT, DCT), digital filters, …

  4. Example 1: Multi-Input Addition
 Dot representation
 Formula: $S = \sum_i X_i$ (5-bit, 5-input addition of the input vectors $X_i$; column weights $2^4, 2^3, 2^2, 2^1, 2^0$)

  5. Example 1: Multi-Input Addition
 Dot representation
 Formula: $S = \sum_i X_i$ (5-bit, 5-input addition)
 Input vectors: 10101 (21) + 11011 (27) + 01101 (13) + 00111 (7) + 10110 (22) = 90
 Column-wise: $3 \cdot 2^4 + 2 \cdot 2^3 + 4 \cdot 2^2 + 3 \cdot 2^1 + 4 \cdot 2^0 = 90$
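As a quick cross-check of the column arithmetic above, the following Python sketch (function names are illustrative, not from the slides) counts the 1-bits per column of the five operands and weights each column count by its power of two:

```python
def column_heights(operands, width):
    """Count the set bits in each column (index 0 = LSB) over all operands."""
    return [sum((x >> c) & 1 for x in operands) for c in range(width)]


def weighted_sum(heights):
    """Sum of the column heights, each weighted by 2**column."""
    return sum(h << c for c, h in enumerate(heights))


operands = [21, 27, 13, 7, 22]          # the five 5-bit inputs of the example
heights = column_heights(operands, width=5)
print(heights[::-1])                     # [3, 2, 4, 3, 4], MSB column first
print(weighted_sum(heights), sum(operands))  # 90 90
```

Both values are 90: summing the dot columns gives the same result as adding the operands directly.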

  6. Example 2: Multiplier
 Dot representation
 Formula: 5×5 multiplication
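The slide gives only the dot diagram. Assuming a plain unsigned AND-array multiplier (no Booth recoding), a sketch like the following shows where the partial-product bits land: bit $x_i y_j$ goes to column $i + j$, so a 5×5 multiplication yields column heights rising from 1 to 5 and falling back to 1. The helper names are hypothetical.

```python
def partial_product_heights(n_bits):
    """Column heights of the AND-array bit heap for an n x n unsigned multiplication."""
    heights = [0] * (2 * n_bits - 1)
    for i in range(n_bits):
        for j in range(n_bits):
            heights[i + j] += 1          # partial product x_i AND y_j lands in column i + j
    return heights


def multiply_via_bit_heap(x, y, n_bits):
    """Add every partial-product bit with its column weight 2**(i + j)."""
    total = 0
    for i in range(n_bits):
        for j in range(n_bits):
            bit = ((x >> i) & 1) & ((y >> j) & 1)
            total += bit << (i + j)
    return total


print(partial_product_heights(5))        # [1, 2, 3, 4, 5, 4, 3, 2, 1]
print(multiply_via_bit_heap(27, 13, 5))  # 351
```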

  7. Example 3: Advanced Arithmetic. Sine/cosine computation: dot representation for $Z - Z^3/6$ [Dinechin HEART’13]

  8. Basic Compression: full adder = (3;2) counter; ripple-carry adder built from full adders (FA)
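A full adder is a (3;2) counter: it takes three bits of equal weight and returns their count as a two-bit number. The Python sketch below (names are illustrative) checks that identity exhaustively and chains full adders into a ripple-carry adder:

```python
from itertools import product


def full_adder(a, b, cin):
    """(3;2) counter: three bits of weight 1 -> sum bit (weight 1) and carry bit (weight 2)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(x, y, width):
    """Chain `width` full adders; returns the (width + 1)-bit sum."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


# A (3;2) counter preserves the weighted value: a + b + c == s + 2 * cout.
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert a + b + c == s + 2 * cout

print(ripple_carry_add(27, 13, 5))  # 40
```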

  9. Flow of Compression

  10. Tabular Representation: 5 5 5 5 5 bits in stage 0 (most significant column on the left). Five (3;2) counters, one per column; each removes 3 bits from its column (−3) and adds one sum bit to that column and one carry bit to the next higher column (+1 1). Result: 1 4 4 4 4 3 bits in stage 1.

  11. Tabular Representation: 1 4 4 4 4 3 bits in stage 1. Five (3;2) counters. Result: 1 3 3 3 3 1 bits in stage 2.

  12. Tabular Representation: 1 3 3 3 3 1 bits in stage 2. Four (3;2) counters. Result: 2 2 2 2 1 1 bits in stage 3.

  13. Tabular Representation: 2 2 2 2 1 1 bits in stage 3. A ripple-carry adder consumes the four two-bit columns (2 2 2 2) and produces five bits (1 1 1 1 1). Result: 1 1 1 1 1 1 1 bits in the final stage.
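The tabular notation can be simulated on column heights alone. The sketch below is illustrative and assumes, as in the tables, at most one (3;2) counter per column per stage. It reproduces the stage 0 to stage 1 step and also tracks the largest value the bit heap can represent, which a (3;2) counter never changes (3 bits of weight w become one bit of weight w and one of weight 2w).

```python
def fa_stage(heights):
    """Apply one (3;2) counter to every column holding at least 3 bits.

    heights[c] is the number of bits of weight 2**c (index 0 = LSB).
    Returns the column heights of the next stage.
    """
    nxt = [0] * (len(heights) + 1)
    for c, h in enumerate(heights):
        if h >= 3:
            nxt[c] += h - 3 + 1   # untouched bits plus the sum output
            nxt[c + 1] += 1       # carry output
        else:
            nxt[c] += h           # bits pass through unchanged
    while len(nxt) > 1 and nxt[-1] == 0:
        nxt.pop()                 # drop empty high columns
    return nxt


def max_value(heights):
    """Largest value a bit heap with these column heights can represent."""
    return sum(h << c for c, h in enumerate(heights))


stage0 = [5, 5, 5, 5, 5]                     # LSB first
stage1 = fa_stage(stage0)
print(stage1[::-1])                          # [1, 4, 4, 4, 4, 3], MSB first as in the table
print(max_value(stage0), max_value(stage1))  # 155 155
```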

  14. Application to FPGAs: Compression using full adders is unsuitable for FPGAs. Mapping a full adder onto FPGA LUTs is inefficient and slow (➯ large routing delays), and the fast carry chain is not exploited. Conventional solution: a ripple-carry adder tree. A delay reduction is possible by using generalized parallel counters (GPCs) [Parandeh-Afshar TRETS’11].

  15. (1,5;3) GPC on FPGA: dot transform and realization

  16. (1,5;3) GPC on FPGA: (1,5;3) GPC mapping [Parandeh-Afshar TRETS’11]: efficiency = bits reduced / #LUTs = (1 + 5 − 3)/3 = 1.0
 Mapping onto slice LUTs and carry logic: [Dinechin FPL’13]
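To make the efficiency figure concrete, here is a small Python model of a GPC (the helper names are hypothetical). Shapes are written least significant column first, so (1,5;3) becomes (5, 1); the number of outputs follows from the largest possible weighted input count, and the LUT count of 3 is taken from the slide.

```python
from itertools import product


def gpc_outputs(shape):
    """Output bits m of a GPC whose input counts per column are given LSB first."""
    max_value = sum(k << c for c, k in enumerate(shape))
    return max_value.bit_length()


def gpc_eval(shape, bits):
    """Weighted count of a flat input bit vector, consumed column by column (LSB first)."""
    value, pos = 0, 0
    for c, k in enumerate(shape):
        value += sum(bits[pos:pos + k]) << c
        pos += k
    return value


def efficiency(shape, n_luts):
    """Bits reduced per LUT: (#inputs - #outputs) / #LUTs."""
    return (sum(shape) - gpc_outputs(shape)) / n_luts


shape_153 = (5, 1)          # (1,5;3): five bits of weight 1, one bit of weight 2
m = gpc_outputs(shape_153)
# Exhaustive check: every one of the 2**6 input patterns fits into m output bits.
assert all(gpc_eval(shape_153, bits) < 2 ** m
           for bits in product((0, 1), repeat=sum(shape_153)))
print(m, efficiency(shape_153, n_luts=3))  # 3 1.0
```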

  17. Efficient GPCs on FPGAs: (1,4,1,5;5) GPC [Kumm MBMV’14]: efficiency = 1.5

  18. Efficient GPCs on FPGAs: (1,4,0,6;5) GPC [Kumm MBMV’14]: efficiency = 1.5

  19. Efficient GPCs on FPGAs: (1,3,2,5;5) GPC (proposed): efficiency = 1.5

  20. Efficient GPCs on FPGAs: (6,0,6;5) GPC (proposed): efficiency = 1.75
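The same arithmetic applies to the larger GPCs. The sketch below derives the output count of each shape (written MSB first, as on the slides) and computes the efficiency. The count of 4 LUTs per GPC is an assumption inferred from the quoted efficiencies (6/4 = 1.5 and 7/4 = 1.75), not a figure stated on the slides.

```python
def gpc_outputs(shape):
    """Output bits m of a GPC (k_{n-1}, ..., k_0; m); shape given MSB first."""
    max_value = sum(k << c for c, k in enumerate(reversed(shape)))
    return max_value.bit_length()


def efficiency(shape, n_luts):
    """Bits reduced per LUT: (#inputs - #outputs) / #LUTs."""
    return (sum(shape) - gpc_outputs(shape)) / n_luts


N_LUTS = 4  # assumption: four LUTs per GPC, inferred from the quoted efficiencies
SHAPES = [(1, 4, 1, 5), (1, 4, 0, 6), (1, 3, 2, 5), (6, 0, 6)]
for shape in SHAPES:
    print(shape, gpc_outputs(shape), efficiency(shape, N_LUTS))
# (1, 4, 1, 5) 5 1.5
# (1, 4, 0, 6) 5 1.5
# (1, 3, 2, 5) 5 1.5
# (6, 0, 6) 5 1.75
```

All four shapes need exactly five outputs; the (6,0,6;5) GPC is more efficient because it takes one more input bit for the same output count.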

  21. Compressor Tree Optimization. Problem 1: The presented GPCs have irregular input patterns. How should they be selected to use the fewest LUTs? Problem 2: Pipelining is important on FPGAs to obtain a high throughput. How should they be selected to use the fewest LUTs and FFs (i.e., the fewest pipeline-balancing FFs)?

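  22. Example for Problem 1: 5 5 5 5 5 bits in stage 0. A (1,4,1,5;5) GPC on the four least significant columns consumes 1 4 1 5 bits and produces 1 1 1 1 1; a second (1,4,1,5;5) GPC, one column higher, consumes 1 4 1 4 bits and produces 1 1 1 1 1. Result: 1 6 2 2 2 1 bits in stage 1. From stage 1, a (6;3) GPC consumes the 6 bits of the tallest column and produces 1 1 1. Result: 1 2 1 2 2 2 1 bits in stage 2.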
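The example can be replayed on column heights. In the sketch below (hypothetical helper; column 0 is the LSB), each GPC placement is given as the bits it takes per column (LSB first), its number of outputs, and its least significant column. The column positions are inferred from the tables: the first (1,4,1,5;5) GPC sits on columns 0 to 3, the second on columns 1 to 4, and the (6;3) GPC on column 4. Bits not consumed by a GPC pass through as wires.

```python
def apply_stage(heights, placements):
    """Next-stage column heights of a bit heap (index 0 = LSB).

    placements: list of (inputs, n_outputs, lsb_column), where inputs gives the
    number of bits the GPC takes from each of its columns, LSB first.
    Bits that no GPC consumes are passed on unchanged (plain wires).
    """
    width = len(heights) + max(n_out for _, n_out, _ in placements)
    remaining = list(heights) + [0] * (width - len(heights))
    nxt = [0] * width
    for inputs, n_out, col in placements:
        for i, k in enumerate(inputs):
            remaining[col + i] -= k
            if remaining[col + i] < 0:
                raise ValueError("GPC takes more bits than the column holds")
        for j in range(n_out):
            nxt[col + j] += 1
    out = [a + b for a, b in zip(nxt, remaining)]
    while out and out[-1] == 0:
        out.pop()  # drop empty high columns
    return out


stage0 = [5, 5, 5, 5, 5]
stage1 = apply_stage(stage0, [((5, 1, 4, 1), 5, 0),   # (1,4,1,5;5), fully used
                              ((4, 1, 4, 1), 5, 1)])  # (1,4,1,5;5), fed 1 4 1 4
stage2 = apply_stage(stage1, [((6,), 3, 4)])          # (6;3) on column 4
print(stage1[::-1])  # [1, 6, 2, 2, 2, 1]
print(stage2[::-1])  # [1, 2, 1, 2, 2, 2, 1]
```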

  23. Example for Problem 2: 5 5 5 5 5 bits in stage 0. A (2,0,4,5;5) GPC on the four least significant columns consumes 2 0 4 5 bits and produces 1 1 1 1 1; a (6,0,6;5) GPC, two columns higher, consumes 5 0 5 bits and produces 1 1 1 1 1; the 3 + 1 remaining bits need 4 FFs for pipeline balancing. Result: 1 1 2 5 2 2 1 bits in stage 1. From stage 1, a (1,3,2,5;5) GPC consumes 1 1 2 5 bits and produces 1 1 1 1 1; the 2 2 1 remaining bits need 5 FFs for pipeline balancing. Result: 1 1 1 1 1 2 2 1 bits in stage 2.
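For the pipelined case, every bit that no GPC consumes needs a flip-flop to stay aligned with the registered GPC outputs. The following sketch (hypothetical helper, column 0 = LSB; placements inferred from the tables, with partly filled GPCs given the bits they actually receive) reproduces both stages and the 4 + 5 balancing FFs:

```python
def pipelined_stage(heights, placements):
    """One pipeline stage of a GPC compressor tree (column 0 = LSB).

    placements: list of (inputs, n_outputs, lsb_column); inputs lists how many
    bits the GPC takes from each of its columns, LSB first. Every bit that no
    GPC consumes is delayed by one balancing flip-flop.
    Returns (next_heights, n_balancing_ffs).
    """
    width = len(heights) + max(n_out for _, n_out, _ in placements)
    remaining = list(heights) + [0] * (width - len(heights))
    nxt = [0] * width
    for inputs, n_out, col in placements:
        for i, k in enumerate(inputs):
            remaining[col + i] -= k
            if remaining[col + i] < 0:
                raise ValueError("GPC takes more bits than the column holds")
        for j in range(n_out):
            nxt[col + j] += 1
    n_ffs = sum(remaining)  # unconsumed bits -> balancing FFs
    out = [a + b for a, b in zip(nxt, remaining)]
    while out and out[-1] == 0:
        out.pop()
    return out, n_ffs


stage0 = [5, 5, 5, 5, 5]
stage1, ff1 = pipelined_stage(stage0, [((5, 4, 0, 2), 5, 0),   # (2,0,4,5;5)
                                       ((5, 0, 5), 5, 2)])     # (6,0,6;5), fed 5 0 5
stage2, ff2 = pipelined_stage(stage1, [((5, 2, 1, 1), 5, 3)])  # (1,3,2,5;5), fed 1 1 2 5
print(stage1[::-1], ff1)  # [1, 1, 2, 5, 2, 2, 1] 4
print(stage2[::-1], ff2)  # [1, 1, 1, 1, 1, 2, 2, 1] 5
```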

  24. Proposed Optimization: A generic ILP optimizer was used. The main idea of the ILP formulation is to count the GPCs for each column [Matsunaga’13] and to "cover" all bits in each stage by GPCs. For that, a "pseudo compressor" with one input and one output (no compression) is introduced. To optimize a combinational compressor tree (problem 1), its cost is set to zero (a wire). To optimize a pipelined compressor tree (problem 2), its cost is set to the flip-flop cost.
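A minimal sketch of how the pseudo compressor enters the cost: the same tree structure is priced once with the pseudo compressor as a free wire and once as a flip-flop. The structure follows the problem-2 example (three GPCs, 4 + 5 pass-through bits); the LUT and FF costs below are hypothetical placeholders, not values from the slides.

```python
def objective(gpc_counts, lut_cost, n_pseudo, pseudo_cost):
    """Sum of c_e * k_e over GPC types plus pseudo compressors at pseudo_cost each."""
    return sum(lut_cost[e] * k for e, k in gpc_counts.items()) + pseudo_cost * n_pseudo


# Structure of the problem-2 example: three GPCs and 4 + 5 pass-through bits.
gpc_counts = {"(2,0,4,5;5)": 1, "(6,0,6;5)": 1, "(1,3,2,5;5)": 1}
# HYPOTHETICAL LUT costs, for illustration only.
lut_cost = {"(2,0,4,5;5)": 4, "(6,0,6;5)": 4, "(1,3,2,5;5)": 4}
FF_COST = 1  # hypothetical cost of one flip-flop

combinational = objective(gpc_counts, lut_cost, n_pseudo=9, pseudo_cost=0)  # wires
pipelined = objective(gpc_counts, lut_cost, n_pseudo=9, pseudo_cost=FF_COST)
print(combinational, pipelined)  # 12 21
```

Whatever LUT costs are assumed, the two totals differ only by the nine pseudo compressors, which is exactly the pipeline-balancing overhead the pipelined objective penalizes.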

  25. ILP Formulation. ILP variables: number of bits in stage $s$ and column $c$: $N_{s,c}$; number of GPCs in stage $s$, of type $e$ and column $c$: $k_{s,e,c}$; number of inputs and outputs of GPC type $e$ in column $c$: $M_{e,c}$ and $K_{e,c}$, respectively; LUT cost of GPC $e$: $c_e$; binary variable to select the active stage: $D_s = 1$ if stage $s$ is used, otherwise $D_s = 0$.

  26. ILP Formulation
 minimize $\sum_{s=0}^{S-1} \sum_{c=0}^{C-1} \sum_{e=0}^{E-1} c_e\, k_{s,e,c}$
 subject to
 C1: $N_{s-1,c} \le \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} M_{e,c'}\, k_{s-1,e,c-c'}$ for $s = 1, \dots, S-1$, $c = 0, \dots, C-1$, if $D_s = 0$
 C2: $N_{s,c} = \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} K_{e,c'}\, k_{s-1,e,c-c'}$ for $s = 1, \dots, S-1$, $c = 0, \dots, C-1$
 C3: $N_{s,c} \le 2$ for a two-input VMA or $N_{s,c} \le 3$ for a ternary VMA, if $D_s = 1$
 C4: $\sum_{s=1}^{S-1} D_s = 1$
 Here $C_e$ is the number of columns of GPC type $e$, and VMA denotes the final vector merging adder.
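The constraints can be checked mechanically for a given solution. The sketch below assumes that $k_{s,e,c}$ counts GPCs of type $e$ whose least significant column is $c$, so a GPC contributes $M_{e,c'}$ inputs and $K_{e,c'}$ outputs to column $c + c'$ (hence the index $c - c'$ when summing into column $c$). It encodes the problem-2 example of slide 23, with the balancing FFs as pseudo compressors, and tests C1, C2 and C3. The stage-selection variables $D_s$ are not modelled; the last stage is simply taken as the one feeding the adder.

```python
# GPC library: (M_{e,c'} inputs per column, K_{e,c'} outputs per column), LSB first.
# "pseudo" is the one-input, one-output pseudo compressor.
GPC = {
    "(2,0,4,5;5)": ((5, 4, 0, 2), (1, 1, 1, 1, 1)),
    "(6,0,6;5)": ((6, 0, 6), (1, 1, 1, 1, 1)),
    "(1,3,2,5;5)": ((5, 2, 3, 1), (1, 1, 1, 1, 1)),
    "pseudo": ((1,), (1,)),
}
N_COLS = 8


def column_sums(k_stage, which):
    """For every column c: sum_e sum_c' X_{e,c'} * k_{e,c-c'}; X = M (which=0) or K (which=1).

    k_stage maps (gpc_type, lsb_column) -> number of GPCs placed there.
    """
    total = [0] * N_COLS
    for (e, col), count in k_stage.items():
        for c_rel, x in enumerate(GPC[e][which]):
            total[col + c_rel] += x * count
    return total


def padded(heights):
    return list(heights) + [0] * (N_COLS - len(heights))


def check_stage(n_prev, n_next, k_stage):
    """Return (C1 holds, C2 holds) for one stage transition s-1 -> s."""
    cover = column_sums(k_stage, 0)
    produced = column_sums(k_stage, 1)
    c1 = all(n <= m for n, m in zip(padded(n_prev), cover))
    c2 = padded(n_next) == produced
    return c1, c2


N = [[5, 5, 5, 5, 5], [1, 2, 2, 5, 2, 1, 1], [1, 2, 2, 1, 1, 1, 1, 1]]  # LSB first
k = [
    {("(2,0,4,5;5)", 0): 1, ("(6,0,6;5)", 2): 1, ("pseudo", 1): 1, ("pseudo", 3): 3},
    {("(1,3,2,5;5)", 3): 1, ("pseudo", 0): 1, ("pseudo", 1): 2, ("pseudo", 2): 2},
]
for s in (1, 2):
    print(s, check_stage(N[s - 1], N[s], k[s - 1]))  # (True, True) for both stages
print(max(N[2]) <= 2)  # C3 for a two-input VMA: True
```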

  27. ILP Formulation. C1 and C3 have to be linearized:
 C1’: $N_{s-1,c} \le \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} M_{e,c'}\, k_{s-1,e,c-c'} + I\, D_s$
 C3’: $N_{s,c} \le 2 + (1 - D_s)\, I$ for a two-input VMA, or $N_{s,c} \le 3 + (1 - D_s)\, I$ for a ternary VMA
 $I$ must be a sufficiently large integer.
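Why the big-M form works can be checked by brute force on small integers. The sketch below is illustrative (the bound of 20 bits per column is an assumption for the toy range): it compares the conditional constraints with their linearized versions for both values of $D_s$, and shows that an $I$ that is too small wrongly enforces C1.

```python
from itertools import product


def c1_conditional(n, cover, d):
    """Original C1: N <= cover has to hold only when D_s = 0."""
    return d == 1 or n <= cover


def c1_linearized(n, cover, d, big_i):
    """C1': N <= cover + I * D_s."""
    return n <= cover + big_i * d


def c3_conditional(n, d, limit=2):
    """Original C3 (two-input VMA): N <= 2 has to hold only when D_s = 1."""
    return d == 0 or n <= limit


def c3_linearized(n, d, big_i, limit=2):
    """C3': N <= 2 + (1 - D_s) * I."""
    return n <= limit + (1 - d) * big_i


MAX_BITS = 20      # assumed upper bound on any N_{s,c} in this toy check
BIG_I = MAX_BITS   # "sufficiently large": I >= max N_{s,c} suffices here
for n, cover, d in product(range(MAX_BITS + 1), range(MAX_BITS + 1), (0, 1)):
    assert c1_conditional(n, cover, d) == c1_linearized(n, cover, d, BIG_I)
    assert c3_conditional(n, d) == c3_linearized(n, d, BIG_I)
print("linearization matches for all", (MAX_BITS + 1) ** 2 * 2, "cases")  # 882

# With I too small, C1' wrongly stays active although D_s = 1.
assert c1_conditional(5, 0, 1) and not c1_linearized(5, 0, 1, big_i=1)
```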

  28. Results: #LUT versus number of compressed bits for the heuristic [8] and the proposed ILP on (a) a Virtex-4 FPGA and (b) a Virtex-6 FPGA. The required LUTs could be reduced by 23% (Virtex-4) and 30% (Virtex-6) compared to Dinechin (FPL’13) [8]. After synthesis, the slice reduction was 12.5% (Virtex-4) and 19.5% (Virtex-6).

  29. Example Compression Tree with 16 Inputs, 16 Bit Each: FloPoCo [Dinechin FPL’13] versus the proposed ILP

  30. Conclusion & Outlook: A novel ILP formulation for the optimization of pipelined compressor trees was presented. There is a notable gap between the former state-of-the-art heuristic and our optimal solution. Extensions are proposed for a minimal stage count and for variable-column counters such as 4:2 compressors. Good heuristics are still required for problem sizes above 100 bits due to the runtime of the ILP solver. So far, no heuristic considers pipelining.

  31. Thank you!

  32. Literature
 [Parandeh-Afshar TRETS’11]: H. Parandeh-Afshar, A. Neogy, P. Brisk, and P. Ienne, "Compressor Tree Synthesis on Commercial High-Performance FPGAs," ACM TRETS, 2011.
 [Dinechin HEART’13]: F. de Dinechin, M. Istoan, and G. Sergent, "Fixed-Point Trigonometric Functions on FPGAs," HEART 2013, Jun. 2013.
 [Dinechin FPL’13]: N. Brunie, F. de Dinechin, M. Istoan, G. Sergent, K. Illyes, and B. Popa, "Arithmetic Core Generation Using Bit Heaps," FPL 2013.
 [Matsunaga’13]: T. Matsunaga, S. Kimura, and Y. Matsunaga, "An Exact Approach for GPC-Based Compressor Tree Synthesis," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Dec. 2013.

