Optimal Sparseness in Binary Adders, ARITH 22, Lyon, France, 2015


  1. Optimal Sparseness in Binary Adders ARITH 22 Lyon, France 2015

  2. Outline
     • Parallel Adders
       – Structural features
       – Recurrence algorithms
         • Weinberger
         • Ling
       – Minimum depth structures
         • Kogge-Stone
         • Ladner-Fischer
     • Sparse Adders
       – Sparse adders in literature
     • Energy Optimal Sparseness
       – Limits on sparseness
       – Effect of increased sparseness on adder energy
     • Implementation results
     • Conclusion

  3. Parallel Adder Structure

  4. Structural Features of Parallel Adders
     • Logic Depth (LD): maximum number of stages from input to output.
     • Prefix (P): number of signals (or maximum fan-in) processed at each stage.
       – Prefix 2 means two signals are processed in a node.
       – The logic depth changes with the prefix: the minimum possible number of stages is $\lceil \log_R N \rceil$ for an $N$-bit adder with prefix $R$.
       – For $N = 64$: $LD_{min} = 6$ for prefix 2 and $LD_{min} = 3$ for prefix 4.
     • Fan-out (F): maximum number of logical branches in the prefix tree.
     • Wiring Complexity (WC): maximum number of wire tracks passing along a bit-pitch of the technology in any stage of the prefix tree.
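
The minimum logic depth quoted on this slide can be checked with a few lines of Python. This is only a sketch: the function name `min_logic_depth` is made up for illustration, and it uses integer arithmetic to find the smallest depth whose prefix span covers all bits.

```python
def min_logic_depth(n_bits: int, prefix: int) -> int:
    """Smallest number of stages d such that prefix**d >= n_bits."""
    if n_bits < 1 or prefix < 2:
        raise ValueError("need n_bits >= 1 and prefix >= 2")
    depth, span = 0, 1
    while span < n_bits:
        span *= prefix   # each extra stage multiplies the covered span by the prefix
        depth += 1
    return depth


# The slide's example: a 64-bit adder with prefix 2 and prefix 4.
for prefix in (2, 4):
    print(f"N=64, prefix {prefix}: LD_min = {min_logic_depth(64, prefix)}")
```

For widths that are not an exact power of the prefix (for example 32 bits with prefix 4), the loop rounds up, which matches the ceiling in the formula.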

  5. Recurrence Algorithms: Weinberger and Ling

  6. Minimum Depth Adders
     Kogge-Stone
     - Minimum depth ($\log_2 N$)
     - Minimum fanout (2)
     - Maximum wiring ($N/2$)
     P.M. Kogge and H.S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Trans. Computers, Vol. C-22, No. 8, Aug. 1973, pp. 786-793.
     Ladner-Fischer
     - Minimum depth ($\log_2 N$)
     - Maximum fanout ($N/2$)
     - Minimum wiring (1)
     R.E. Ladner and M.J. Fischer, "Parallel Prefix Computation", JACM, 27(4):831-838, Oct. 1980.
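
The depth and fanout figures above can be reproduced with a small prefix-graph model. The sketch below is not taken from the slides. It assumes prefix 2, a power-of-two width, and models the minimum-depth Ladner-Fischer structure as the Sklansky-style divide-and-conquer graph. Fanout is counted here as the number of prefix operators in one level that read a given signal, which is one possible convention. All function names are illustrative.

```python
from collections import Counter


def kogge_stone_levels(n):
    """Each level is a list of (hi, lo): node hi absorbs the group ending at lo."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


def sklansky_levels(n):
    """Divide-and-conquer graph, used here as a model of the LF structure."""
    levels, d = [], 1
    while d < n:
        # Bits in the upper half of each 2d-wide block read the top bit of the lower half.
        levels.append([(i, (i // d) * d - 1) for i in range(n) if (i // d) % 2 == 1])
        d *= 2
    return levels


def max_fanout(levels):
    """Largest number of operators in a single level that read the same signal."""
    worst = 0
    for level in levels:
        reads = Counter()
        for hi, lo in level:
            reads[hi] += 1
            reads[lo] += 1
        worst = max(worst, max(reads.values()))
    return worst


def prefix_add(a, b, n, levels):
    """Add two n-bit numbers (mod 2**n) by evaluating the given prefix graph."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    for level in levels:
        new_g, new_p = G[:], P[:]
        for hi, lo in level:
            new_g[hi] = G[hi] | (P[hi] & G[lo])
            new_p[hi] = P[hi] & P[lo]
        G, P = new_g, new_p
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i else 0
        total |= (p[i] ^ carry_in) << i
    return total


for name, build in (("Kogge-Stone", kogge_stone_levels), ("Sklansky/LF", sklansky_levels)):
    lv = build(64)
    print(name, "depth:", len(lv), "max fanout:", max_fanout(lv))
```

Both graphs compute the same sums; they differ only in how the work is wired, which is what the fanout and wiring entries on the slide describe.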

  7. SPARSE ADDER

  8. Sparse Adder Structure
     • Critical path in a prefix adder
       – Sum block: 1 gate
       – Carry block: $1 + \log_2 N$ gates
     • The critical path length cannot be reduced below $\log_2 N$; however, complexity can be moved to the less critical sum block.
     • Solution: sparse adder
       – Generate only every $M$th carry signal
       – Pre-compute sum signals for the missing carry signals
       – Select the true sum signal based on the computed carry signals
     • Dilutes the carry block, complicates the sum block
     • Saves area and power without changing the critical path length
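
The three steps of the sparse adder (every $M$th carry, conditional sums, selection) can be sketched behaviourally in Python. This is an illustration under stated assumptions, not the circuit from the slides: the name `sparse_add` is made up, the block carries are obtained with a sequential scan over block generate/propagate signals that merely stands in for the sparse carry tree, and the width is assumed to be a multiple of the sparseness.

```python
def sparse_add(a, b, n=32, m=4):
    """Behavioural model of a sparse-m adder; returns (a + b) mod 2**n."""
    if n % m:
        raise ValueError("n must be a multiple of m")
    mask = (1 << m) - 1
    blocks = [((a >> lo) & mask, (b >> lo) & mask) for lo in range(0, n, m)]

    # Step 1: carry into every m-th bit position only.
    # (Assumption: a sequential scan replaces the parallel sparse carry tree.)
    carries = [0]
    for a_blk, b_blk in blocks:
        generate = (a_blk + b_blk) >> m            # block produces a carry on its own
        propagate = int((a_blk ^ b_blk) == mask)   # block passes an incoming carry through
        carries.append(generate | (propagate & carries[-1]))

    result = 0
    for k, (a_blk, b_blk) in enumerate(blocks):
        # Step 2: pre-compute both conditional sums for the block.
        sum0 = (a_blk + b_blk) & mask        # assuming carry-in = 0
        sum1 = (a_blk + b_blk + 1) & mask    # assuming carry-in = 1
        # Step 3: select the true sum with the computed block carry.
        result |= (sum1 if carries[k] else sum0) << (k * m)
    return result


print(hex(sparse_add(0x0FFFFFFF, 0x00000001)))
```

In hardware the conditional sums are computed in parallel with the carry tree, which is why the carry path length is unchanged while the sum block grows.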

  9. Prefix Graphs for Sparse Adders

  10. SPARSE ADDERS IN LITERATURE

  11. Conditional Sum (COS) Adder
      32-bit prefix 2 COS adder prefix scheme.
      J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, vol. EC-9, no. 2, pp. 226-231, June 1960.

  12. Carry Select (CSL) Adder
      64-bit prefix 4 sparse 4 CSL adder prefix scheme.
      O. J. Bedrij, "Carry-Select Adder," IRE Transactions on Electronic Computers, vol. EC-11, no. 3, pp. 340-346, June 1962.

  13. Sparse Adder [Mathew, 2003]
      32-bit prefix 2 sparse 4 LF prefix scheme, Weinberger adder.
      S. Mathew, M. Anders, R.K. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 689-695, May 2003.

  14. ENERGY OPTIMAL SPARSENESS

  15. Carry Tree Sparseness
      • Sparse carry trees reduce energy in parallel adders.
      • The energy improvement comes from the reduced complexity of the carry path: less wiring and fewer gates.
      • A certain amount of complexity is moved to the sum path, which implies a limit on the sparseness of the carry tree.

  16. Carry Tree Sparseness (cont.)
      • Making the carry tree sparse does not change the critical path length of the carry block.
      • However, it increases the critical path length of the sum block.
      • The critical path length of the carry block for an $N$-bit Ling adder using prefix 2 computations is $\log_2 N$.
      • A sparse-$M$ adder uses $M$-bit parallel adders in the sum block to compute the conditional sum signals.
      • Hence, the critical path length of the sum block is $2 + \log_2 M$.

  17. Limit on Sparseness
      • Weinberger recurrence
        – Carry critical path: $1 + \log_2 N$
        – Sum critical path: $2 + \log_2 M$
        – $2 + \log_2 M \le 1 + \log_2 N \Rightarrow M \le N/2$
      • Ling recurrence
        – Carry critical path: $\log_2 N$
        – Sum critical path: $2 + \log_2 M$
        – $2 + \log_2 M \le \log_2 N \Rightarrow M \le N/4$
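
The two inequalities can be evaluated directly. The following sketch assumes power-of-two values for $N$ and $M$ and uses the path lengths exactly as stated on the slide; the helper names and the `CARRY_OFFSET` table are illustrative.

```python
def log2_exact(x):
    """Base-2 logarithm of a power of two, as an integer."""
    if x < 1 or x & (x - 1):
        raise ValueError("expected a power of two")
    return x.bit_length() - 1


# Extra gate on the carry path relative to log2(N), per recurrence (from the slide).
CARRY_OFFSET = {"weinberger": 1, "ling": 0}


def carry_depth(n, recurrence):
    return CARRY_OFFSET[recurrence] + log2_exact(n)


def sum_depth(m):
    return 2 + log2_exact(m)


def max_sparseness(n, recurrence):
    """Largest power-of-two M whose sum path is no longer than the carry path."""
    m = 1
    while m * 2 <= n and sum_depth(m * 2) <= carry_depth(n, recurrence):
        m *= 2
    return m


for rec in ("weinberger", "ling"):
    print(f"N=64, {rec}: M <= {max_sparseness(64, rec)}")
```

For $N = 64$ this gives $M \le 32$ for Weinberger and $M \le 16$ for Ling, i.e. $N/2$ and $N/4$.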

  18. SUM PATH DESIGN IN A SPARSE ADDER

  19. Sum Path: Weinberger and Ling. $c_i = t_{i-1} h_{i-1}$

  20. RCA vs PPA in Partial Sum Computation
      • RCA (Ripple Carry Adder): depth = 5
      • PPA (Parallel Prefix Adder): depth = 4

  21. RCA vs PPA: Critical Path Length

      Degree of sparseness ($M$) | Ripple carry ($1 + M$) | Parallel prefix ($2 + \log_2 M$)
      2                          | 3                      | 3
      4                          | 5                      | 4
      8                          | 9                      | 5
      16                         | 17                     | 6
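
The table follows from the two depth expressions in its header. A minimal sketch that regenerates the rows, assuming power-of-two sparseness values (function names are illustrative):

```python
def ripple_depth(m):
    """Critical path of an m-bit ripple-carry partial-sum block: 1 + M."""
    return 1 + m


def prefix_depth(m):
    """Critical path of an m-bit parallel-prefix partial-sum block: 2 + log2(M)."""
    return 2 + (m.bit_length() - 1)   # exact log2 for powers of two


rows = [(m, ripple_depth(m), prefix_depth(m)) for m in (2, 4, 8, 16)]
for m, rca, ppa in rows:
    print(f"M={m:2d}  RCA={rca:2d}  PPA={ppa}")
```

The two options tie at $M = 2$; from $M = 4$ onward the ripple-carry path grows linearly while the prefix path grows logarithmically.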

  22. 8-bit Partial Sum Computation using PPA Structure

  23. THEORETICAL RESULTS: EFFECT OF INCREASED SPARSENESS

  24. Total Gate Count
      - Gate counts are equal for KS and LF adders.

  25. Total Gate Complexity
      - The complexity of a gate is defined as its number of inputs (1 for an inverter, 2 for a two-input NAND, etc.).
      - For KS, sparse 4 gives the least complexity for 32- to 256-bit adders.
      - For LF, sparse 2 gives the least complexity for 32- and 64-bit adders, and sparse 4 for 128- and 256-bit adders.
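
The gate complexity measure is a weighted gate count. A small sketch, assuming the cell types listed later in the deck's standard-cell library; the gate counts in `example_netlist` are hypothetical and are not results from the presentation.

```python
# Number of inputs per cell type; this is the per-gate complexity weight.
GATE_INPUTS = {
    "INV": 1,
    "NAND2": 2,
    "NOR2": 2,
    "AOI21": 3,
    "OAI21": 3,
    "AOI22": 4,
    "OAI22": 4,
}


def gate_complexity(gate_counts):
    """Total complexity: sum over cell types of (instances x inputs per instance)."""
    return sum(GATE_INPUTS[cell] * count for cell, count in gate_counts.items())


# Hypothetical netlist summary, for illustration only.
example_netlist = {"INV": 10, "NAND2": 4, "AOI21": 6}
print(gate_complexity(example_netlist))
```

Under this measure, removing a buffer or inverter saves less than removing an AND-OR cell, because the latter carries a weight of 3 or 4.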

  26. Normalized Gate Complexity
      - Complexities are normalized to the complexity of the full carry tree (sparseness 1).
      - For KS, sparseness achieves a 30% reduction in complexity.
      - For LF, sparseness achieves a 20% reduction in complexity.

  27. Total Wire Complexity
      - Wire complexity is defined as the total wire length (e.g. a wire from bit 32 to bit 64 has a length of 32 units).
      - For KS, wire complexity decreases as sparseness increases.
      - For LF, the sparseness with the lowest wire complexity is 2 for 32- and 64-bit adders, and 4 for 128- and 256-bit adders.
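
The wire-length definition can be applied to the full (sparseness 1) prefix graphs as a rough illustration. This sketch makes several assumptions that are not in the slides: prefix 2, power-of-two widths, the Sklansky-style graph as a model of the LF structure, and only the wire that carries the lower operand into each prefix operator is counted. It does not reproduce the presentation's plotted numbers.

```python
def kogge_stone_levels(n):
    """Each level is a list of (hi, lo): node hi reads the signal at bit lo."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


def sklansky_levels(n):
    """Divide-and-conquer graph, used here as a model of the LF structure."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, (i // d) * d - 1) for i in range(n) if (i // d) % 2 == 1])
        d *= 2
    return levels


def total_wire_length(levels):
    """Sum of horizontal distances, in bit-pitch units, over all operator inputs."""
    return sum(hi - lo for level in levels for hi, lo in level)


# The slide's example: a wire from bit 32 to bit 64 has length 32.
print(total_wire_length([[(64, 32)]]))

for n in (8, 64):
    print(n, total_wire_length(kogge_stone_levels(n)), total_wire_length(sklansky_levels(n)))
```

Even in this simplified count the Kogge-Stone graph needs considerably more wire than the divide-and-conquer graph, which is consistent with the maximum and minimum wiring entries given earlier for the two structures.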

  28. Normalized Wire Complexity
      - Complexities are normalized to the complexity of the full carry tree (sparseness 1).
      - For KS, sparseness achieves an 80% reduction in complexity.
      - For LF, sparseness achieves a 20% reduction in complexity.

  29. Theoretical Results
      • For 64-bit LF adders, sparse 2 yields both the minimum gate complexity and the minimum total wire length.
        – The reduction in gate complexity in the LF adder comes from removing buffers, whereas in the KS adder it comes from removing the more complex AND-OR gates.
        – Hence, the improvement in gate complexity is smaller for the LF adder than for the KS adder.
        – The increase in gate complexity beyond sparse 8 in the KS adder will offset the energy savings achieved through reduced wiring complexity.
      • The energy-optimal degree of sparseness is determined by the ratio of gate capacitance to wire capacitance.
        – In the low-performance design region, gate sizes are small, so wire capacitance dominates and KS sparse 8 is expected to outperform KS sparse 4 in energy at the same performance.
        – For the LF adder, on the other hand, it is not worth going beyond sparse 4 because complexity increases in both measures.
      • For 128- and 256-bit adders, sparse 4 yields the most savings for both KS and LF structures.

  30. RESULTS

  31. Technology
      Technology
      • 45nm TSMC
      • VDD = 1.1V
      • Temp = 25 °C
      • Typical process corner
      • Multi-Vth standard cell library (low, standard, high)

      STDCELL Library
      Gate  | Available Strength
      AOI21 | 1x, 2x, 4x, 6x, 8x
      AOI22 | 1x, 2x, 4x, 6x, 8x
      INV   | 1x, 2x, 4x, 6x, 8x, 12x, 16x, 32x
      NAND2 | 1x, 2x, 4x, 6x, 8x
      NOR2  | 1x, 2x, 4x, 6x, 8x
      OAI21 | 1x, 2x, 4x, 6x, 8x
      OAI22 | 1x, 2x, 4x, 6x, 8x

  32. Design Environment
      • Designed adders
        – KS adder w/ full, sparse 2, sparse 4, and sparse 8 carry trees
        – LF adder w/ full, sparse 2, sparse 4, and sparse 8 carry trees
      • Circuit sizing using Design Compiler
      • Placement and routing using Encounter
      • Post-layout simulations using PrimeTime
      • Input driver: 16x inverter
      • Output load: 16x inverter
      • 25% activity at inputs
      • Adders designed for minimum energy using delay targets between 300 ps and 400 ps

  33. Energy-Delay

  34. Leakage Power

  35. Wire Energy

  36. Conclusion
      • Energy savings of 50% and 22%, and leakage power savings of 70% and 40%, are achieved with increased sparseness of the carry trees for KS and LF adders, respectively.
      • For a 64-bit KS Ling adder, the energy-optimal sparseness is 4. For a 64-bit LF Ling adder, the energy-optimal sparseness is 2.
      • Both the optimal KS and the optimal LF adder reach the same minimum delay target of 300 ps.
      • Experimental results suggest that LF S2 is 7% more energy efficient than KS S4 at the minimum delay point.
      • Theoretical results suggest that a sparse 4 carry tree should be used for both KS and LF adders of 128 bits and above.

  37. Questions? THANK YOU …
