
Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques



  1. Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques
     Safeen Huda and Jason Anderson
     International Symposium on Physical Design, Santa Rosa, CA, April 6, 2016

  2. Motivation
     [Image: Google data centre]
     • FPGA power is increasingly critical because of new markets
       – Data centers
       – Mobile electronics

  3. Motivation
     • FPGAs typically have underutilized wires
     • We ask: Can we take advantage of unused wires?
     • This work: 3 techniques to reduce power with unused wires
       – Charge recycling (dynamic)
       – Effective capacitance reduction (dynamic)
       – Pulse-based signalling (static)

  4. Dynamic Power Reduction Techniques

  5. Motivation
     [Figure taken from [Tuan07]]
     • Routing power is the prime component of FPGA dynamic power

  6. Charge Recycling in FPGA Interconnect

  7. Dynamic Power in Conventional Circuits
     [Figure: output node with load $C_L$ charged from 0 to $V_{DD}$; $C_L V_{DD}$ coulombs of charge drawn from the supply]
     • Switching from “0” to “1” draws $C_L V_{DD}^2$ joules

  8. Dynamic Power in Conventional Circuits
     [Figure: output node with load $C_L$ discharged from $V_{DD}$ to 0; $C_L V_{DD}$ coulombs of charge dissipated]
     • All of the stored energy in $C_L$ is dissipated
     • Can we use the energy that is being dissipated?
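The energy bookkeeping behind the two conventional-circuit slides can be written out explicitly. This is the standard CMOS result for charging a capacitor through a resistive pull-up; only the $C_L V_{DD}^2$ figure comes from the slides, and the labels $E_{\text{supply}}$, $E_{\text{stored}}$ and $E_{\text{PUN}}$ are introduced here for illustration.

```latex
\begin{align*}
E_{\text{supply}} &= \int_0^{\infty} V_{DD}\, i(t)\, dt = V_{DD} \cdot Q = C_L V_{DD}^2 \\
E_{\text{stored}} &= \tfrac{1}{2} C_L V_{DD}^2 \\
E_{\text{PUN}}    &= E_{\text{supply}} - E_{\text{stored}} = \tfrac{1}{2} C_L V_{DD}^2
\end{align*}
```

On the following “1” to “0” transition the stored half is dissipated in the pull-down network, so one full cycle dissipates $C_L V_{DD}^2$ in total.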

  9. Charge Recycling (CR) Concept: Initial Phase
     [Figure: load $C_L$ at $V_{DD}$, reservoir $C_R$ at 0]
     • During “1” → “0” transition, output starts at $V_{DD}$
     • Pull-down network (PDN) disconnected, pull-up network (PUN) connected

  10. Charge Recycling (CR) Concept: Charge Recovery Phase
     [Figure: $C_L$ and $C_R$ both at $V_{DD}/2$; assumes $C_L = C_R$]
     • PUN, PDN disconnected, $C_L$ connected to $C_R$
     • $\tfrac{1}{2} C_L V_{DD}$ coulombs of charge transferred

  11. Charge Recycling (CR) Concept: Final Phase
     [Figure: $C_L$ at 0, $C_R$ at $V_{DD}/2$]
     • PDN connected, PUN disconnected
     • Output pulled to GND, $\tfrac{1}{2} C_L V_{DD}$ coulombs dissipated

  12. Charge Recycling (CR) Concept: Initial Phase
     [Figure: $C_L$ at 0, $C_R$ at $V_{DD}/2$]
     • Output initially at GND during a “0” → “1” transition
     • $\tfrac{1}{2} C_L V_{DD}$ coulombs stored in $C_R$

  13. Charge Recycling (CR) Concept: Charge Recycling Phase
     [Figure: $C_L$ and $C_R$ both at $V_{DD}/4$; assumes $C_L = C_R$]
     • PUN, PDN disconnected, $C_L$ connected to $C_R$
     • $\tfrac{1}{4} C_L V_{DD}$ coulombs of charge transferred to $C_L$

  14. Charge Recycling (CR) Concept: Final Phase
     [Figure: $C_L$ at $V_{DD}$, $C_R$ at $V_{DD}/4$]
     • $\tfrac{3}{4} C_L V_{DD}$ coulombs of charge drawn from supply
     • Implies 25% reduction in energy consumption
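The recovery and recycling phases above can be checked with a short charge-sharing simulation. This is a minimal sketch assuming ideal switches, an initially empty reservoir, and $C_L = C_R$ by default; the function name and parameters are hypothetical. It reproduces the 25% figure for the first cycle, and repeating the cycle with the charge left on the reservoir suggests the saving settles near one third. That is one possible reading of the 33% theoretical figure quoted later in the talk, not something the slides state.

```python
def simulate_charge_recycling(cycles, c_load=1.0, c_res=1.0, vdd=1.0):
    """Return the fractional supply-energy saving for each 1->0->1 cycle.

    Hypothetical helper: ideal charge sharing, reservoir initially at 0 V.
    """
    v_res = 0.0
    savings = []
    for _ in range(cycles):
        # Recovery phase (1 -> 0): load at vdd shares charge with the reservoir.
        v_res = (c_load * vdd + c_res * v_res) / (c_load + c_res)
        # Final phase of the falling edge: load is pulled to GND (charge lost).

        # Recycling phase (0 -> 1): reservoir shares charge with the empty load.
        v_shared = (c_res * v_res) / (c_load + c_res)
        v_res = v_shared
        v_load = v_shared

        # Final phase of the rising edge: supply lifts the load to vdd.
        e_supply = c_load * (vdd - v_load) * vdd
        e_conventional = c_load * vdd ** 2
        savings.append(1.0 - e_supply / e_conventional)
    return savings


if __name__ == "__main__":
    for i, s in enumerate(simulate_charge_recycling(6), start=1):
        print(f"cycle {i}: saving = {s:.4f}")
```

With equal capacitors the reservoir voltage after each cycle follows $v_{n+1} = (V_{DD} + v_n)/4$, whose fixed point is $V_{DD}/3$.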

  15. Observations
     • We can reduce power if reservoir capacitors are available
       – Use unused wires as reservoirs!
     • CR requires a complex set of steps: is there an area penalty to implement it?
     • FPGA routing circuits are big to begin with
       – Large routing multiplexers
       – Several SRAM cells
       – Large output buffers to drive long capacitive wires
     • Incremental area overhead of complex circuitry may not be too bad …

  16. CR in FPGAs
     • Designs on FPGAs typically have paths with lots of slack
       – Can trade off the delay of these paths for power savings using CR
     • Opportunity: [Anderson09] showed that 75% of switches in a design can be slowed down by 50%
     • Target CR in FPGA routing switches
     [Figure: routing switch with inputs, output buffer and routing wire; the output buffer is the target for charge recovery/recycling]

  17. Proposed FPGA Routing Arch.
     [Figure: CLBs around a switch block (SB), with CR buffers driving “friend” conductors (2-way sharing)]

  18. CR Routing Buffer
     [Figure: buffer schematic with gating circuitry, a delay line producing $V_{IN\_D}$ from $V_{IN}$, transistors M9 and M10, SRAM cells holding CR and TS, the driven wire ($C_{WIRE}$) and an unused routing conductor used as the reservoir, $C_R = C_{WIRE}$]
     • CR sets state of buffer
       – CR mode vs. Normal mode
     • TS sets one of two “friend” buffers in tristate mode

  19. Functional Simulation
     [Figure: node voltage [V] vs. time [ns] for the output and reservoir nodes, with the recovery and recycling phases marked]
     • Simulated in ST65 process
     • Approx. 26% power reduction
     • Theoretical reduction of 33%, less circuit overheads
     • Assuming 200 fF interconnect load

  20. CAD Tool Support
     • Power can be reduced for a routing switch if:
       – 1) “Friend” conductor is unoccupied
       – 2) Switch lies along path with sufficient slack
     • To optimize CR in FPGAs, we need CAD which:
       – Maximizes the availability of free reservoirs for nets with high activity and sufficient slack
       – Optimizes the mode selection of switches
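The two eligibility conditions above can be sketched as a simple filter. This is only an illustration of the conditions, not the mode-selection step used in the talk: `RoutingSwitch`, `select_cr_mode` and `cr_delay_penalty_ns` are hypothetical names, and slack is treated as a per-switch number, whereas a real flow has to account for slack being shared by every switch on a path.

```python
from dataclasses import dataclass


@dataclass
class RoutingSwitch:
    name: str              # hypothetical identifier
    friend_occupied: bool  # True if the paired "friend" conductor carries a net
    slack_ns: float        # slack available to this switch (assumed per-switch)
    activity: float        # switching activity of the net this switch drives


def select_cr_mode(switches, cr_delay_penalty_ns):
    """Return names of switches eligible for CR mode, highest activity first."""
    eligible = [
        sw for sw in switches
        if not sw.friend_occupied and sw.slack_ns >= cr_delay_penalty_ns
    ]
    # High-activity nets save the most dynamic power, so list them first.
    eligible.sort(key=lambda sw: sw.activity, reverse=True)
    return [sw.name for sw in eligible]


switches = [
    RoutingSwitch("sw0", friend_occupied=False, slack_ns=1.2, activity=0.30),
    RoutingSwitch("sw1", friend_occupied=True, slack_ns=2.0, activity=0.50),
    RoutingSwitch("sw2", friend_occupied=False, slack_ns=0.1, activity=0.40),
    RoutingSwitch("sw3", friend_occupied=False, slack_ns=0.9, activity=0.45),
]
print(select_cr_mode(switches, cr_delay_penalty_ns=0.5))
```

Here `sw1` is rejected because its friend conductor is in use and `sw2` because its slack is smaller than the assumed CR delay penalty.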

  21. CAD Flow
     [Figure: flow diagram. Circuit → Packing → Placement → CR-aware Router → Switch Mode Selection → .net, .place and .route files, plus $f_{max}$ and power estimate. Inputs shown alongside the flow: net activities, % CR-capable switches, timing constraint]
     • Packing, Placement: conventional VPR 6.0 packing and placement
     • CR-aware Router: modified VPR 6.0 router; optimizes availability of free reservoirs
     • Switch Mode Selection: post-routing phase to select operating mode of switches (CR vs. Normal)

  22. Results
     [Figure: results chart; recoverable labels are “Baseline Architecture”, “Architecture with CR Capability” and an annotation marking an increase in critical path (value not recoverable)]
     • Arch. with 100% CR capable switches
     • Best case 1.3% degradation in critical path (CP) delay
       – Due to increased delay of CR capable switches
     • Extra ~3% power reduction as delay constraints relaxed

  23. Effective Interconnect Capacitance Reduction

  24. VLSI Wire Capacitance
     [Figure: cross-section of metal layers M3, M4 and M5, with $C_C$ marked between neighbouring wires and $C_P$ marked between layers]
     • Wire capacitance consists of:
       – Coupling capacitance ($C_C$): between adjacent wires on same layer
       – Plate capacitance ($C_P$): between adjacent wires on different layers
     • Due to aspect ratio of wires, $C_C$ is dominant

  25. Wire Capacitance Optimization in ASICs (1)
     [Figure: nets i, j, k with widths $w_1$, $w_2$, $w_3$ and spacings $s_1$ to $s_4$ inside a total channel width $W$]
     • In ASICs, have freedom to optimize wire width and spacing
     • Can optimize $w_i$ and $s_i$ to maximize speed, minimize power
     • Optimize $w_i$ and $s_i$ subject to $\sum_i w_i + \sum_i s_i = W$

  26. Wire Capacitance Optimization in ASICs (2)
     [Figure: nets i, j, k with widths $w_1$, $w_2$, $w_3$ and enlarged spacings $s_2$, $s_3$ around net j, inside a total channel width $W$]
     • If net j is timing/power critical:
       – Can increase $s_2$ and $s_3$ to reduce $C_C$
       – Reduces capacitance on net j, improves speed and reduces power
     • Can also optimize $w_1$, $w_2$, $w_3$ for speed and power

  27. In FPGAs?
     [Figure: Routing Option 1 and Routing Option 2, each showing used conductors (nets i, j, k) and unused conductors; in one option the nets are spaced further apart]
     • FPGA wiring prefabricated, width and spacing fixed
     • Can’t space used wires apart, unused wires in the way
     • Capacitance on wires in two routing options the same
       – Despite the fact that nets i, j, k are now spaced further apart

  28. Wire Cap. Optimization (1)
     [Figure: Routing Conductors 1, 2 and 3 with drivers IN 1 and IN 2. Conductor 1 couples to Conductor 2 through $C_{C1} = C_C$; Conductor 2 has $C_{C2} + C_P$ and its driver resistance $R_{EQ}$; $Z_{IN}(s)$ is marked looking from Conductor 1 towards Conductor 2]
     • What’s the total impedance seen by Routing Conductor 1, looking towards Routing Conductor 2?

  29. Wire Cap. Optimization (2)
     [Figure: $Z_{IN}(s)$ looking into $C_{C1} = C_C$, followed by $C_{C2} + C_P$ in parallel with $R_{EQ}$; second panel with the parallel branch shorted]
     • If $R_{EQ}$ is small, capacitor $C_{C2} + C_P$ is shorted out
     • Impedance looking towards Routing Conductor 2 is the capacitor $C_C$

  30. Wire Cap. Optimization (3)
     [Figure: $Z_{IN}(s)$ looking into $C_{C1} = C_C$, followed by $C_{C2} + C_P$ in parallel with $R_{EQ}$; second panel with $R_{EQ}$ removed]
     • If $R_{EQ}$ is large, we approximate it as an open circuit
     • $Z_{IN}$ equal to series combination of $C_C$ and $C_{C2} + C_P$

  31. Wire Cap. Optimization (4)
     • Series combinations of capacitors result in reduced capacitance:
       – If $C_1$ is in series with $C_2$, eq. capacitance $C_{eq} = C_1 C_2 / (C_1 + C_2) < C_1$
     • Therefore can reduce capacitance if $R_{EQ}$ is large enough
     • Making $R_{EQ}$ large is bad …
       – buffer delay ~ $R_{EQ} C_{wire}$, so an increase in $R_{EQ}$ increases delay
     • What if we made $R_{EQ}$ large only for unused conductors?
       – Would not result in increased delay of used conductors
       – Neighbouring used conductors would see benefit of reduced cap.
     • Need to be able to set $R_{EQ}$ large for unused conductors, but small for used conductors
       – Use tri-state buffers!
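The small-$R_{EQ}$ and large-$R_{EQ}$ limits can be checked numerically. The sketch below assumes the network described in the preceding slides: $C_C$ in series with the parallel combination of $R_{EQ}$ and $C_{C2} + C_P$. The function names, the 1 GHz evaluation frequency and the capacitor values are hypothetical choices for illustration, not numbers from the talk.

```python
import math


def z_in(freq_hz, c_c, c_gnd, r_eq):
    """Impedance of c_c in series with (r_eq in parallel with c_gnd)."""
    s = 1j * 2 * math.pi * freq_hz
    z_branch = r_eq / (1 + s * r_eq * c_gnd)  # r_eq || 1/(s*c_gnd)
    return 1 / (s * c_c) + z_branch


def effective_capacitance(freq_hz, c_c, c_gnd, r_eq):
    """Capacitance that would give the same reactance as z_in at freq_hz."""
    omega = 2 * math.pi * freq_hz
    return -1.0 / (omega * z_in(freq_hz, c_c, c_gnd, r_eq).imag)


if __name__ == "__main__":
    C_C = 100e-15    # assumed coupling capacitance, 100 fF
    C_GND = 150e-15  # assumed C_C2 + C_P, 150 fF
    for r_eq in (0.0, 1e3, 1e9):
        c_eff = effective_capacitance(1e9, C_C, C_GND, r_eq)
        print(f"R_EQ = {r_eq:g} ohm -> C_eff = {c_eff * 1e15:.1f} fF")
```

With these assumed values the driven case ($R_{EQ} = 0$) sees the full $C_C$, while a very large $R_{EQ}$ approaches the series value $C_C (C_{C2} + C_P) / (C_C + C_{C2} + C_P)$.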

  32. This Work
     [Figure: used conductors (nets i, j, k) separated by tristated unused conductors; annotation: nets i and j still see reduced coupling capacitance]
     • If intermediate wires are tristated, see reduced $C_C$!
     • In this work we tristate unused wires to reduce wire cap
       – Proposed a novel, lightweight TSB topology
       – Used similar CAD techniques to CR work (won’t cover in this talk)

  33. Proposed Tristate Buffer

  34. Traditional Tri-state Buffers
     [Figure: tri-state buffer schematic with transistors M1 to M6, inputs IN and TS, output OUT]
     • Header transistor M5 cuts off pull-up path to output
     • Unused buffer would have IN at $V_{DD}$
       – M1 pulls gate of M6 to GND
     • Large area cost: M2, M4 and M5 must be big due to stacking

  35. Optimized Headerless TSB
     [Figure: headerless tri-state buffer schematic with transistors M1 to M5 and M7 to M9, inputs IN and TS, output OUT]
     • No stacking in output stage
     • Leverages fact that unused buffers have their input pulled high (details in paper)
