Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques
Safeen Huda and Jason Anderson
International Symposium on Physical Design, Santa Rosa, CA, April 6, 2016
Motivation
[Image: Google data centre]
• FPGA power is increasingly critical because of new markets
  – Data centers
  – Mobile electronics
Motivation
• FPGAs typically have underutilized wires
• We ask: Can we take advantage of unused wires?
• This work: 3 techniques to reduce power using unused wires
  – Charge recycling (dynamic)
  – Effective capacitance reduction (dynamic)
  – Pulse-based signalling (static)
Dynamic Power Reduction Techniques
Motivation
[Figure taken from [Tuan07]]
• Routing power is the prime component of FPGA dynamic power
Charge Recycling in FPGA Interconnect
Dynamic Power in Conv. CCTs
[Figure: output node charging C_L from 0 to V_DD; C_L·V_DD coulombs of charge drawn from the supply]
• Switching from “0” to “1” draws C_L·V_DD² joules from the supply
Dynamic Power in Conv. CCTs
[Figure: output node discharging C_L from V_DD to 0; C_L·V_DD coulombs of charge dissipated]
• All of the stored energy in C_L is dissipated
• Can we use the energy that is being dissipated?
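The energy bookkeeping on these two slides can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name, the 200 fF load and the 1.0 V supply are assumptions chosen for the example, and the model is an ideal capacitor with no leakage or short-circuit current.

```python
def conventional_cycle_energy(c_load, v_dd):
    """Energy bookkeeping for one 0 -> 1 -> 0 cycle of a conventional CMOS output.

    Ideal model (assumption): no leakage, no short-circuit current.
    """
    charge = c_load * v_dd                 # coulombs drawn from the supply on 0 -> 1
    drawn = charge * v_dd                  # joules delivered by the supply: C_L * V_DD^2
    stored = 0.5 * c_load * v_dd ** 2      # joules held on C_L after the rising edge
    lost_rising = drawn - stored           # dissipated in the pull-up network
    lost_falling = stored                  # stored energy dumped to GND on 1 -> 0
    return {
        "charge": charge,
        "drawn": drawn,
        "stored": stored,
        "dissipated": lost_rising + lost_falling,
    }


if __name__ == "__main__":
    # Assumed example values: 200 fF load, 1.0 V supply.
    for key, value in conventional_cycle_energy(200e-15, 1.0).items():
        print(f"{key}: {value:.3e}")
```

Over a full cycle everything the supply delivers is dissipated, which is the energy that charge recycling tries to reuse.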
Charge Recycling (CR) Concept: Initial Phase
[Figure: load C_L at V_DD, reservoir C_R at 0]
• During a “1” → “0” transition, the output starts at V_DD
• Pull-down network (PDN) disconnected, pull-up network (PUN) connected
Charge Recycling (CR) Concept: Charge Recovery Phase
[Figure: C_L and C_R both at V_DD/2; assumes C_L = C_R]
• PUN and PDN disconnected; C_L connected to C_R
• ½·C_L·V_DD coulombs of charge transferred
Charge Recycling (CR) Concept: Final Phase
[Figure: C_L at 0, C_R at V_DD/2]
• PDN connected, PUN disconnected
• Output pulled to GND; ½·C_L·V_DD coulombs dissipated
Charge Recycling (CR) Concept: Initial Phase
[Figure: C_L at 0, C_R at V_DD/2]
• Output is initially at GND during a “0” → “1” transition
• ½·C_L·V_DD coulombs stored in C_R
Charge Recycling (CR) Concept: Charge Recycling Phase
[Figure: C_L and C_R both at V_DD/4; assumes C_L = C_R]
• PUN and PDN disconnected; C_L connected to C_R
• ¼·C_L·V_DD coulombs of charge transferred to C_L
Charge Recycling (CR) Concept: Final Phase
[Figure: C_L at V_DD, C_R at V_DD/4]
• ¾·C_L·V_DD coulombs of charge drawn from supply
• Implies 25% reduction in energy consumption
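The six phases above can be replayed as a small charge-sharing simulation. This is a minimal sketch under idealized assumptions that go beyond the slides: charge sharing is complete, there is no leakage or control overhead, and the function and parameter names are hypothetical. With C_L = C_R and an empty reservoir, the first cycle reproduces the 25% figure. Repeating the cycle leaves charge in the reservoir, and in this idealized model the saving approaches one third, in line with the 33% theoretical figure quoted later in the talk.

```python
def simulate_charge_recycling(cycles, c_load=1.0, c_res=1.0, v_dd=1.0):
    """Return the fraction of supply charge saved on each 0 -> 1 transition.

    Idealized model (assumption): complete charge sharing, no leakage.
    """
    v_res = 0.0  # reservoir starts empty, as on the slides
    savings = []
    for _ in range(cycles):
        # 1 -> 0, recovery phase: C_L (at V_DD) shares charge with C_R.
        v_res = (c_load * v_dd + c_res * v_res) / (c_load + c_res)
        # 1 -> 0, final phase: C_L is pulled to GND; C_R keeps its charge.

        # 0 -> 1, recycling phase: C_L (at 0) shares charge with C_R.
        v_shared = (c_res * v_res) / (c_load + c_res)
        v_res = v_shared
        v_load = v_shared
        # 0 -> 1, final phase: the supply tops C_L up from v_load to V_DD.
        drawn = c_load * (v_dd - v_load)
        savings.append(1.0 - drawn / (c_load * v_dd))
    return savings


if __name__ == "__main__":
    for i, s in enumerate(simulate_charge_recycling(6), start=1):
        print(f"cycle {i}: {100 * s:.2f}% less charge drawn from supply")
```

Because the supply delivers charge at a fixed V_DD, the fraction of charge saved equals the fraction of supply energy saved.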
Observations
• We can reduce power if reservoir capacitors are available
  – Use unused wires as reservoirs!
• CR requires a complex set of steps: is there an area penalty to implement it?
• FPGA routing circuits are big to begin with
  – Large routing multiplexers
  – Several SRAM cells
  – Large output buffers to drive long capacitive wires
• Incremental area overhead of complex circuitry may not be too bad …
CR in FPGAs
• Designs on FPGAs typically have paths with lots of slack
  – Can trade off the delay of these paths for power savings using CR
• Opportunity: [Anderson09] showed that 75% of switches in a design can be slowed down by 50%
• Target CR in FPGA routing switches
[Figure: routing switch with its inputs, output buffer (V_IN) and routing wire; the output buffer is the target for charge recovery/recycling]
Proposed FPGA Routing Arch.
[Figure: CLBs around a switch block (SB) with CR buffers and “friend” conductors (2-way sharing)]
CR Routing Buffer
[Figure: CR routing buffer schematic: gating circuitry, delay line (V_IN, V_IN_D), CR circuit, SRAM cells for CR and TS, transistors M9 and M10, and an unused routing conductor used as the reservoir, with C_R = C_WIRE]
• CR sets state of buffer
  – CR mode vs. Normal mode
• TS sets one of two “friend” buffers in tristate mode
Functional Simulation
[Figure: node voltage [V] vs. time [ns] for the output and reservoir nodes, showing the recovery and recycling phases]
• Simulated in ST65 process
• Approx. 26% power reduction
• Theoretical reduction of 33%, minus circuit overheads
• Assuming 200 fF interconnect load
CAD Tool Support
• Power can be reduced for a routing switch if:
  – 1) “Friend” conductor is unoccupied
  – 2) Switch lies along a path with sufficient slack
• To optimize CR in FPGAs, we need CAD which:
  – Maximizes the availability of free reservoirs for nets with high activity and sufficient slack
  – Optimizes the mode selection of switches
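The two eligibility conditions above can be written as a simple filter. This is only a sketch: the `Switch` record, its field names and the greedy ordering by activity are hypothetical, not the authors' algorithm. It also treats slack per switch in isolation, whereas switches on the same path share slack, so a real tool would need to update slack after each choice.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    friend_free: bool   # condition 1: "friend" conductor is unoccupied
    slack_ns: float     # timing slack of the path through this switch
    activity: float     # switching activity of the net it drives


def select_cr_switches(switches, cr_delay_penalty_ns):
    """Return names of switches eligible for CR mode, highest activity first."""
    chosen = []
    for sw in sorted(switches, key=lambda s: s.activity, reverse=True):
        has_slack = sw.slack_ns >= cr_delay_penalty_ns  # condition 2
        if sw.friend_free and has_slack:
            chosen.append(sw.name)
    return chosen


if __name__ == "__main__":
    demo = [
        Switch("sw0", friend_free=True, slack_ns=2.0, activity=0.10),
        Switch("sw1", friend_free=False, slack_ns=5.0, activity=0.40),
        Switch("sw2", friend_free=True, slack_ns=0.1, activity=0.30),
        Switch("sw3", friend_free=True, slack_ns=1.5, activity=0.25),
    ]
    print(select_cr_switches(demo, cr_delay_penalty_ns=0.5))
```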
CAD Flow
[Figure: CAD flow from circuit to results]
• Circuit → Packing → Placement: conventional VPR 6.0 packing and placement
• CR-aware Router: modified VPR 6.0 router; optimizes availability of free reservoirs
• Switch Mode Selection: post-routing phase to select operating mode of switches (CR vs. Normal)
• Other labels in the figure: Net Activities, %CR Capable Switches, Timing Constraint
• Files and results: .net, .place, .route files; f_max, power estimate
Results
[Figure: baseline architecture vs. architecture with CR capability; increase in critical path]
• Arch. with 100% CR-capable switches
• Best case: 1.3% degradation in critical-path (CP) delay
  – Due to increased delay of CR-capable switches
• Extra ~3% power reduction as delay constraints are relaxed
Effective Interconnect Capacitance Reduction
VLSI Wire Capacitance
[Figure: cross-section of metal layers M3, M4 and M5, with coupling capacitances C_C between adjacent M4 wires and plate capacitances C_P to M3 and M5]
• Wire capacitance consists of:
  – Coupling capacitance (C_C): between adjacent wires on the same layer
  – Plate capacitance (C_P): between adjacent wires on different layers
• Due to the aspect ratio of wires, C_C is dominant
Wire Capacitance Optimization in ASICs (1)
[Figure: nets i, j and k with wire widths w_1, w_2, w_3 and spacings s_1 to s_4 within total channel width W]
• In ASICs, we have freedom to optimize wire width and spacing
• Can optimize w_i and s_i to improve timing and minimize power
• Optimize w_i and s_i subject to Σ w_i + Σ s_i = W
Wire Capacitance Optimization in ASICs (2)
[Figure: nets i, j and k with widths w_1, w_2, w_3 and spacings s_2, s_3 around net j, within total channel width W]
• If net j is timing/power critical:
  – Can increase s_2 and s_3 to reduce C_C
  – Reduces capacitance on net j, improves speed and reduces power
• Can also optimize w_1, w_2, w_3 for speed and power
In FPGAs?
[Figure: Routing Option 1 and Routing Option 2 for nets i, j and k, showing used and unused conductors]
• FPGA wiring is prefabricated; width and spacing are fixed
• Can’t space used wires apart; unused wires are in the way
• Capacitance on wires is the same in the two routing options
  – Despite the fact that nets i, j, k are now spaced further apart
Wire Cap. Optimization (1)
[Figure: Routing Conductors 1, 2 and 3 with plate capacitances C_P and inputs IN_1, IN_2; Z_IN(s) is modelled as C_C1 = C_C in series with the parallel combination of R_EQ and C_C2 + C_P]
• What’s the total impedance seen by Routing Conductor 1, looking towards Routing Conductor 2?
Wire Cap. Optimization (2)
[Figure: Z_IN(s) model, C_C1 = C_C in series with R_EQ ∥ (C_C2 + C_P), reduced to C_C alone]
• If R_EQ is small, capacitor C_C2 + C_P is shorted out
• Impedance looking towards Routing Conductor 2 is the capacitor C_C
Wire Cap. Optimization (3)
[Figure: Z_IN(s) model, C_C1 = C_C in series with R_EQ ∥ (C_C2 + C_P), reduced to C_C in series with C_C2 + C_P]
• If R_EQ is large, we approximate it as an open circuit
• Z_IN is equal to the series combination of C_C and C_C2 + C_P
Wire Cap. Optimization (4)
• Series combinations of capacitors result in reduced capacitance:
  – If C_1 is in series with C_2, eq. capacitance C_eq = C_1·C_2/(C_1 + C_2) < C_1
• Therefore we can reduce capacitance if R_EQ is large enough
• Making R_EQ large is bad …
  – Buffer delay ~ R_EQ·C_wire, so an increase in R_EQ increases delay
• What if we made R_EQ large only for unused conductors?
  – Would not result in increased delay of used conductors
  – Neighbouring used conductors would see the benefit of reduced cap.
• Need to be able to set R_EQ large for unused conductors, but small for used conductors
  – Use tri-state buffers!
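The two limiting cases above can be checked numerically by evaluating Z_IN at a single frequency and reading off the capacitance it looks like. The sketch below rests on assumptions: the function names are hypothetical, C_C2 + C_P is lumped into one value `c_gnd`, and the 100 fF capacitances and 1 GHz evaluation frequency are example numbers, not values from the talk.

```python
import math


def series_capacitance(c1, c2):
    """Equivalent capacitance of two capacitors in series."""
    return c1 * c2 / (c1 + c2)


def effective_capacitance(c_c, c_gnd, r_eq, freq_hz):
    """Capacitance seen looking into C_C in series with (R_EQ || c_gnd).

    c_gnd stands for the lumped C_C2 + C_P of the neighbouring conductor.
    """
    omega = 2 * math.pi * freq_hz
    s = 1j * omega
    z_par = r_eq / (1 + s * r_eq * c_gnd)   # R_EQ in parallel with c_gnd
    z_in = 1 / (s * c_c) + z_par
    # A capacitive impedance has Im(Z) = -1 / (omega * C).
    return -1 / (omega * z_in.imag)


if __name__ == "__main__":
    c_c, c_gnd, f = 100e-15, 100e-15, 1e9   # assumed example values
    for r_eq in (1.0, 1e3, 1e6, 1e9):
        c_eff = effective_capacitance(c_c, c_gnd, r_eq, f)
        print(f"R_EQ = {r_eq:8.0e} ohm -> C_eff = {c_eff * 1e15:6.1f} fF")
    print(f"series limit: {series_capacitance(c_c, c_gnd) * 1e15:.1f} fF")
```

A small R_EQ (a driven neighbour) leaves the full C_C visible, while a large R_EQ (a tristated neighbour) approaches the series value.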
This Work
[Figure: nets i, j and k on used conductors, separated by tristated unused conductors; nets i and j still see reduced coupling capacitance]
• If intermediate wires are tristated, we see reduced C_C!
• In this work we tristate unused wires to reduce wire cap
  – Proposed a novel, lightweight TSB topology
  – Used similar CAD techniques to the CR work (won’t cover in this talk)
Proposed Tristate Buffer
Traditional Tri-state Buffers
[Figure: traditional tri-state buffer schematic with transistors M1 to M6, header M5 gated by TS, input IN and output OUT]
• Header transistor M5 cuts off the pull-up path to the output
• Unused buffer would have IN at V_DD
  – M1 pulls gate of M6 to GND
• Large area cost: M2, M4 and M5 must be big due to stacking
Optimized Headerless TSB
[Figure: headerless tri-state buffer schematic with transistors M1 to M5 and M7 to M9, control TS, input IN and output OUT]
• No stacking in output stage
• Leverages the fact that unused buffers have their input pulled high (details in paper)