
Near-Threshold Computing: Reclaiming Moore's Law (Dr. Ronald G. Dreslinski)


  1. Near-Threshold Computing: Reclaiming Moore's Law. Dr. Ronald G. Dreslinski, Research Fellow, University of Michigan, Ann Arbor. EnA-HPC, September 7, 2011.

  2. Motivation. [Chart, 1985 to 2020, log scale from 0.001 to 1,000,000: Transistors (100,000's), Power (W), Performance (GOPS), Efficiency (GOPS/W).] Annotations: limits on heat extraction; stagnates performance growth; limits on energy-efficiency of operations.

  3. Motivation. [Chart as on slide 2, split at c. 2000 into an Era of High Performance Computing and an Era of Energy-Efficient Computing.] Goal: to increase energy-efficiency of operations. Result: continue scaling trends that fueled the computing revolution, with the help of some better thermal management…

  4. Outline
     - Define a new region of operation, Near-Threshold Computing
     - Explore new architectures enabled by key insights of computing in the NTC region
     - Present an initial design of a 3D stacked NTC system, Centip3De

  5. Power Density Limitations
     - Circuit supply voltages are no longer scaling…
     - Power does not decrease at the same rate that transistor count increases
     - Concerns: environmental; form factor vs. battery life
     - Figure labels: "Stagnant", "Shrinking"; A = gate area, scaling 1/s²; C = capacitance, scaling < 1/s; dynamic dominates
     - Dark Silicon, the emerging dilemma: more and more gates can fit on a die, but not all can be turned on at the same time
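The dark-silicon argument on this slide can be made concrete with a small calculation. The sketch below is an illustration and is not taken from the talk. It assumes a fixed die area and power budget, constant supply voltage and clock frequency, and capacitance per gate shrinking exactly as 1/s (the slide gives the capacitance scaling only as "< 1/s"). The function names and the shrink factor of 1.4 per generation are likewise assumptions.

```python
def relative_chip_power(s: float) -> float:
    """Chip dynamic power after a linear shrink by s, relative to before.

    Assumptions (illustrative): fixed die area, so gate count grows as s**2
    because gate area scales as 1/s**2; capacitance per gate scales as 1/s;
    supply voltage and frequency stay constant.
    Dynamic power per gate is proportional to C * V**2 * f.
    """
    gate_count = s ** 2
    capacitance, voltage, frequency = 1.0 / s, 1.0, 1.0
    power_per_gate = capacitance * voltage ** 2 * frequency
    return gate_count * power_per_gate


def usable_fraction(s: float) -> float:
    """Fraction of gates that can switch at once under a fixed power budget."""
    return min(1.0, 1.0 / relative_chip_power(s))


if __name__ == "__main__":
    shrink_per_generation = 1.4  # assumed value, for illustration only
    for generation in range(4):
        s = shrink_per_generation ** generation
        print(f"generation {generation}: chip power x{relative_chip_power(s):.2f}, "
              f"usable gates {usable_fraction(s):.0%}")
```

Under these assumptions chip power grows linearly with s, so the share of gates that can be active within the same power budget falls as 1/s.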

  6. Today: Super-V_th, High Performance, Power Constrained. [Figure: normalized power, energy, & performance vs. supply voltage (0, V_th, V_nom), with energy/operation and log(delay) curves. Labels: Super-V_th; Core i7; 3+ GHz; 0.5 mW/MHz.] Energy per operation is the key metric for efficiency. Goal: same performance, low energy per operation.

  7. Subthreshold Design. [Figure: energy/operation and log(delay) vs. supply voltage (0, V_th, V_nom), with sub-V_th and super-V_th regions marked. Annotations: energy/operation 12-16X; delay 500-1000X.] Operating in the sub-threshold region gives us huge power gains at the expense of performance: OK for sensors!

  8. Evolution of Subthreshold Designs
     - Subliminal 1 Design (2006): 0.13 µm CMOS; used to investigate existence of Vmin; 2.60 µW/MHz
     - Subliminal 2 Design (2007): 0.13 µm CMOS; used to investigate process variation; 3.5 µW/MHz
     - Phoenix 1 Design (2008): 0.18 µm CMOS; used to investigate sleep current; 2.8 µW/MHz, 30 pW sleep power
     - Phoenix 2 Design (2010): 0.18 µm CMOS; commercial ARM M3 core; used to investigate energy harvesting and power management; 37.4 µW/MHz

  9. Near-Threshold Computing (NTC). [Figure: energy/operation and log(delay) vs. supply voltage (0, V_th, V_nom), with sub-V_th and super-V_th regions marked. Annotations: energy/operation ~6-8X and ~2X; delay ~10X and ~50-100X.]
     - >60X power reduction
     - 6-8X energy reduction
     - Invest a portion of the extra transistors from scaling to overcome barriers
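The power and energy figures on this slide are linked by simple arithmetic: power is energy per operation times operations per second, so an energy reduction and a slowdown multiply. The sketch below is a consistency check, not part of the talk. It assumes one reading of the figure annotations: ~6-8X energy and ~10X delay between super-V_th and NTC, then a further ~2X energy and ~50-100X delay between NTC and sub-V_th. All names are illustrative.

```python
def power_reduction(energy_reduction: float, slowdown: float) -> float:
    """Power = (energy per op) * (ops per second), so the two factors multiply."""
    return energy_reduction * slowdown


# Figures quoted on the slides; the assignment to regions is an assumed reading.
NTC_ENERGY_REDUCTION = (6, 8)     # super-V_th -> NTC, energy per operation
NTC_SLOWDOWN = 10                 # super-V_th -> NTC, delay
SUB_EXTRA_ENERGY = 2              # NTC -> sub-V_th, further energy reduction
SUB_EXTRA_SLOWDOWN = (50, 100)    # NTC -> sub-V_th, further delay

ntc_power = tuple(power_reduction(e, NTC_SLOWDOWN) for e in NTC_ENERGY_REDUCTION)
sub_energy = tuple(e * SUB_EXTRA_ENERGY for e in NTC_ENERGY_REDUCTION)
sub_slowdown = tuple(NTC_SLOWDOWN * d for d in SUB_EXTRA_SLOWDOWN)

print("NTC power reduction:", ntc_power)          # (60, 80)
print("sub-V_th energy reduction:", sub_energy)   # (12, 16)
print("sub-V_th slowdown:", sub_slowdown)         # (500, 1000)
```

With that reading, the products match the ">60X power reduction" quoted here and the 12-16X energy and 500-1000X delay figures on the subthreshold slide.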

  10. Silicon Verification of Trends. Phoenix 2 processor [Seok'11], 180nm design, 1.8V -> 700mV: ~10x NTC performance loss, ~7x NTC energy reduction. (Seok, ISSCC 2011)
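As a first-order sanity check on the ~7x figure (not from the talk): switching energy is proportional to C·V², so voltage scaling alone predicts the ratio below. The estimate ignores leakage energy and any change in capacitance, and the function name is illustrative.

```python
def dynamic_energy_ratio(v_high: float, v_low: float) -> float:
    """Ratio of switching energy (proportional to C * V**2) at two supply voltages."""
    return (v_high / v_low) ** 2


ratio = dynamic_energy_ratio(1.8, 0.7)
print(f"{ratio:.1f}x")  # 6.6x
```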

  11. NTC: Opportunities and Challenges
     - Opportunities:
       - New architectures
       - Optimized processes
       - 3D integration: fewer thermal restrictions
     - Challenges:
       - Low-voltage memory: new SRAM designs; robustness analysis at near-threshold
       - Variation: Razor [Ernst'03] and other in-situ delay monitoring; adaptive body biasing
       - Performance loss: many-core designs to improve parallelism; core boosting to improve single-thread performance

  12. Outline
     - Define a new region of operation, Near-Threshold Computing
     - Explore new architectures enabled by key insights of computing in the NTC region
     - Present an initial design of a 3D stacked NTC system, Centip3De

  13. Minimum Energy SRAM. [Figure legend: Total, Dynamic, Leakage.]
     - SRAM has a lower activity rate than logic
     - V_DD for minimum energy operation (V_MIN) is higher
     - Running logic at the SRAM's V_MIN has a small energy penalty, with increased performance
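Why a lower activity rate pushes V_MIN upward can be seen in a toy energy model. The sketch below is not the model used in the talk, and every constant in it (threshold voltage, slope factor, leakage weight, activity factors) is an assumed value chosen only for illustration. Dynamic energy falls as V² while leakage energy per operation rises at low voltage because each cycle takes longer, and the minimum sits where the two balance.

```python
import math

V_TH = 0.4          # threshold voltage in volts (assumed)
N_VT = 0.039        # subthreshold slope factor times thermal voltage (assumed)
LEAK_WEIGHT = 1000  # idle leaking devices per switching device (assumed)


def on_current(vdd: float) -> float:
    """Smooth EKV-style drive current in arbitrary units.

    Exponential in vdd below threshold, roughly quadratic above it.
    """
    return math.log1p(math.exp((vdd - V_TH) / (2 * N_VT))) ** 2


def energy_per_op(vdd: float, activity: float) -> float:
    """Dynamic plus leakage energy per operation, arbitrary units."""
    dynamic = activity * vdd ** 2
    # Leakage energy = I_off * vdd * cycle time, with cycle time ~ vdd / I_on.
    leakage = LEAK_WEIGHT * vdd ** 2 * on_current(0.0) / on_current(vdd)
    return dynamic + leakage


def v_min(activity: float) -> float:
    """Supply voltage on a 10 mV grid that minimizes energy per operation."""
    grid = [v / 100 for v in range(15, 121)]
    return min(grid, key=lambda v: energy_per_op(v, activity))


v_min_logic = v_min(activity=0.2)   # assumed logic activity factor
v_min_sram = v_min(activity=0.02)   # assumed, lower, SRAM activity factor
print(f"V_MIN logic: {v_min_logic:.2f} V, V_MIN SRAM: {v_min_sram:.2f} V")
```

In this model the lower-activity block has less dynamic energy to save by lowering the voltage, so leakage dominates sooner and its energy minimum lands at a higher supply voltage.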

  14. New NTC Architectures. [Figure: two block diagrams with the labels Next Level Memory, BUS / Switched Network, L1, Core, and Cluster.]
     - Key insight: SRAM is run at a higher V_DD than cores with little energy penalty, allowing caches to operate faster than the core
     - Design levers: operating voltage; L1 size; number of cores per cluster; number of clusters

  15. L1 Cache Size Tradeoff. [Figure: two Core / L1 / L2 hierarchies.] A larger L1 gives a decreased miss rate but higher energy per access.

  16. Results: Energy Optimal L1 Size (Single Core)
     - Energy dependency on L1 size
     - Trade-off between L1 and L2 access
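The trade-off between L1 and L2 access energy can be sketched with a toy model. Nothing below comes from the talk's results: the energy values, the base miss rate, and both scaling laws (L1 access energy growing with the square root of capacity, miss rate falling with the square root of capacity) are assumptions for illustration only.

```python
import math

L2_ENERGY = 20.0       # energy per L2 access, arbitrary units (assumed)
BASE_L1_ENERGY = 1.0   # energy per access of an 8 kB L1 (assumed)
BASE_MISS_RATE = 0.2   # miss rate of an 8 kB L1 (assumed)


def energy_per_access(l1_kb: int) -> float:
    """Average energy per memory access for a given L1 size in kB."""
    scale = math.sqrt(l1_kb / 8)
    l1_energy = BASE_L1_ENERGY * scale   # larger arrays cost more per access
    miss_rate = BASE_MISS_RATE / scale   # assumed square-root rule of thumb
    return l1_energy + miss_rate * L2_ENERGY


sizes_kb = [8, 16, 32, 64, 128, 256]
best_kb = min(sizes_kb, key=energy_per_access)
for size in sizes_kb:
    print(f"{size:4d} kB: {energy_per_access(size):.2f}")
print("energy-optimal L1 size:", best_kb, "kB")
```

With these made-up numbers the optimum is an intermediate size: below it, L2 accesses on misses dominate, and above it, the cost of each L1 access dominates.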

  17. Clustering Tradeoffs. [Figure: CPUs sharing L1 caches above L2, with O / X marks.]
     - (+) Clustered sharing
     - (-) Cluster conflict
     - (-) New bus
     - (-) L1 speed

  18. Energy Optimal Cluster-based CMP (Fixed Die Size)

  19. Full Space Analysis

  20. Various Scaling Methods
     - Baseline: single CPU @ 233MHz
     - Simple CMP: one core per L1; Vdd scaling
     - Proposed cluster-based CMP: multiple cores per L1; Vdd scaling
     - [Chart: normalized energy/operation (0 to 1), broken down into L2, L1, and Core, for Uniprocessor, CMP w/ DVFS, and NTC. Annotations: 38%, 71%, 53%; "4 Cores, 4 L1's"; "2 Cores/Cluster, 3 Clusters".]

  21. Energy Optima for SPLASH2
     - Cluster-based architecture with Vdd and Vth scaling
     - Optimal cluster size is 2 for most of the apps
     - rad chooses a non-clustered CMP
     - Average energy savings: 74% over baseline, 55% over simple CMP

     | App | n_c | k | L1 size (kB) | Energy savings over baseline | Energy savings over simple CMP |
     |-----|-----|---|--------------|------------------------------|--------------------------------|
     | Cho | 3   | 2 | 64           | 70.8%                        | 52.8%                          |
     | Fft | 2   | 2 | 32           | 72.6%                        | 68.5%                          |
     | fmm | 8   | 2 | 128          | 79.7%                        | 41.6%                          |
     | luc | 3   | 2 | 32           | 77.8%                        | 64.4%                          |
     | lun | 2   | 2 | 64           | 69.2%                        | 58.0%                          |
     | rad | 16  | 1 | 128          | 84.2%                        | 35.1%                          |
     | ray | 3   | 2 | 128          | 65.1%                        | 54.9%                          |

  22. Energy Optima w/ Performance Requirements. [Chart annotations: 53%, 20%, 32%.]
     - Cluster-based approach provides best savings
     - Traditional approach only saves energy at high end

  23. Outline
     - Define a new region of operation, Near-Threshold Computing
     - Explore new architectures enabled by key insights of computing in the NTC region
     - Present an initial design of a 3D stacked NTC system, Centip3De

  24. A Closer Look at Wafer-Level Stacking. [Figure legend: Oxide; Silicon; Dielectric (SiO2/SiN); "Super-Contact"; Gate Poly; STI (Shallow Trench Isolation); W (tungsten contact & via); Al (M1 to M5); Cu (M6, top metal).] Illustration from Bob Patti, Tezzaron.

  25. Next, Stack a Second Wafer & Thin

  26. Then, Stack a Third Wafer. [Figure labels: 3rd wafer; 2nd wafer; 1st wafer: controller.]

  27. Centip3De: 3D NTC Prototype. [Figure labels, in extracted order: Logic - A; Logic - B; F2F Bond; Logic - B; Logic - A; DRAM Sense/Logic, Bond Routing; DRAM; F2F Bond; DRAM.]
     - 130nm, 7-layer 3D-stacked chip
     - 128 ARM M3 cores
     - 150 mm²
