
Fast On-Die Statistical Thermal Hotspot Analysis: Considering Local Statistical Variations - PowerPoint PPT Presentation



  1. Fast On-Die Statistical Thermal Hotspot Analysis: Considering Local Statistical Variations
     Palkesh Jain, Manoj Mehrotra (Qualcomm Technologies Inc., India)
     Email: palkesh@qti.qualcomm.com

  2. Statistical Variations: Premise
     Core-to-core leakage variations are on the rise.
     [Figure: core-to-core leakage variations for a multi-core chip. Source: S. Dighe et al., IEEE JSSC, 2011]
     Such variations are expected to increase with an increasing number of cores (4 → 8 → 10 and so on) and with reducing feature size.
     Calls for:
     − Re-assessment of the leakage power computation methodology
     − Impact assessment on SoC thermal hotspots → varies based on the statistical leakage distribution of the chip (unique per chip)
     5/17/2016

  3. Types of Leakage Variations
     Die-to-die:
     − Modeled well through global variations
     Within-die (the primary cause of concern in this work):
     − Systematic: across-shot, litho-induced variations
       − Exhibit a high degree of correlation (cells adjacent to each other tend to move similarly)
     − Random: primarily through random dopant fluctuations
       − Independent random variations; uncorrelated
       − Also called 'local variations'; the focus of this work

  4. Random Variations: Sensitivity and Large Distributions
     Impact of random local variations:
     − ON/drive current: linear; swings -20% to 30%
     − OFF/leakage current: exponential; swings -100% to 500%
     Good news: leakage spread reduces with an increasing number of uncorrelated distributions (Central Limit Theorem):
     $\mu_{\text{sum}} = \sum_{i=1}^{n} \mu_i$ ; $\sigma_{\text{sum}}^{2} = \sum_{i=1}^{n} \sigma_i^{2}$
     For SoC leakage → billions of uncorrelated transistor-leakage distributions: in practice we can ignore the σ of the individual transistor distributions and consider only µ.
     − µ works as a multiplier for incorporating on-die variations into leakage calculations
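The averaging argument on this slide can be checked numerically. The following is a minimal sketch, assuming every transistor's leakage is an independent, identically distributed lognormal variable; the lognormal shape, the default `sigma_log` value, and the helper names are illustrative assumptions, not taken from the slides. Because means add and variances add for independent variables, the relative spread σ_sum/µ_sum should fall as 1/√n.

```python
import math

def lognormal_moments(mu_log, sigma_log):
    """Mean and standard deviation of exp(N(mu_log, sigma_log^2))."""
    mean = math.exp(mu_log + 0.5 * sigma_log ** 2)
    var = (math.exp(sigma_log ** 2) - 1.0) * math.exp(2.0 * mu_log + sigma_log ** 2)
    return mean, math.sqrt(var)

def relative_spread(n, mu_log=0.0, sigma_log=1.0):
    """sigma_sum / mu_sum for the sum of n independent, identical leakage sources."""
    mean, std = lognormal_moments(mu_log, sigma_log)
    mu_sum = n * mean                 # means add
    sigma_sum = math.sqrt(n) * std    # variances add, so sigma grows as sqrt(n)
    return sigma_sum / mu_sum

# Hypothetical source counts, from a single transistor up to a large block.
spreads = {n: relative_spread(n) for n in (1, 100, 10_000, 1_000_000)}
for n, s in spreads.items():
    print(f"n = {n:>9,d}   sigma/mu = {s:.4f}")
```

Under these assumptions, every 100x increase in the number of sources shrinks the relative spread by 10x, which is why only µ matters at full-chip scale.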

  5. However... (bad news)
     Thermal runaway can get triggered even by the leakage-temperature dependency of very small chip areas (1 grid ~ 10um2 x 10um2).
     − Local variations do not completely cancel out for small areas (fewer distributions)
     [Figure: frequency (PDF) vs. normalized log-leakage for a single inverter and for a collection of 100 inverters]
     Grid local variations are also a function of the grid composition.
     − Smaller sized/drive cells could potentially see a higher statistical spread of leakage
     [Figure: frequency (PDF) vs. normalized log-leakage for a high-drive cell and a 1x-drive cell]
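The leakage-temperature feedback behind thermal runaway can be illustrated with a fixed-point iteration. This is a minimal sketch, not the authors' thermal solver: it assumes a single grid with a lumped thermal resistance `r_th` and leakage that grows exponentially with temperature, and every parameter value and name below is hypothetical.

```python
import math

def steady_state_temp(p_leak_ref, p_dyn=0.0, t_amb=25.0, r_th=50.0,
                      k=0.03, t_ref=25.0, t_limit=200.0, max_iter=1000):
    """Iterate T -> t_amb + r_th * (p_dyn + leakage(T)) until it settles.

    Returns the converged temperature in deg C, or None if the iteration
    exceeds t_limit (treated here as thermal runaway) or fails to converge.
    """
    temp = t_amb
    for _ in range(max_iter):
        # Assumed model: leakage rises exponentially with temperature.
        leak = p_leak_ref * math.exp(k * (temp - t_ref))
        new_temp = t_amb + r_th * (p_dyn + leak)
        if new_temp > t_limit:
            return None
        if abs(new_temp - temp) < 1e-6:
            return new_temp
        temp = new_temp
    return None

nominal = steady_state_temp(p_leak_ref=0.1)   # grid at its nominal leakage
leaky = steady_state_temp(p_leak_ref=0.3)     # same grid with a 3x leakage sample
print("nominal grid:", nominal)
print("leaky grid:  ", leaky)
```

With these illustrative numbers the nominal grid settles a few degrees above ambient, while the same grid with a 3x leakage sample has no stable operating point. A 3x excursion is within the -100% to 500% leakage swing quoted in the slides, and a small grid has too few cells to average it away.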

  6. Assessing Grid Composition Impact
     Localized composition (types and count of the cells) within a small grid alters the thermal sensitivity!
     Overall, the chip leakage distribution may still remain narrow → 3 design methodologies were chosen to study the impact of design style (grid composition) on thermals and power.
     [Figure: grid regions labeled a and b (> a)]
     Metrics monitored: as the leakage variations are statistical, for every configuration:
     − 100 Monte Carlo runs are performed.
     − For every run, we monitor the block's total leakage and the individual grid temperatures.
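The monitoring loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' setup: cell leakages are drawn as independent lognormal samples, each grid's temperature comes from a hypothetical linear model with no coupling between grids, and all names and parameter values are invented for the example.

```python
import random

def run_monte_carlo(n_grids=16, cells_per_grid=20, sigma_log=0.8,
                    n_runs=100, t_amb=25.0, r_th=2.0, seed=0):
    """Return (block_totals, hottest_grid_temps), one entry per Monte Carlo run."""
    rng = random.Random(seed)
    block_totals, hottest = [], []
    for _ in range(n_runs):
        # A grid's leakage is the sum of its cells' independent lognormal leakages.
        grid_leak = [
            sum(rng.lognormvariate(0.0, sigma_log) for _ in range(cells_per_grid))
            for _ in range(n_grids)
        ]
        # Hypothetical thermal model: each grid heats in proportion to its own leakage.
        grid_temp = [t_amb + r_th * leak for leak in grid_leak]
        block_totals.append(sum(grid_leak))   # metric 1: block total leakage
        hottest.append(max(grid_temp))        # metric 2: hottest grid in this run
    return block_totals, hottest

totals, hottest = run_monte_carlo()
mean_total = sum(totals) / len(totals)
print(f"block leakage: max/mean = {max(totals) / mean_total:.3f}")
print(f"hottest grid:  {min(hottest):.1f} to {max(hottest):.1f} deg C across runs")
```

Fixing the seed keeps the 100 runs reproducible, so different grid compositions can be compared on the same random draws.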

  7. Assessing Grid Composition Impact … (2)
     Single grid structure: defines the sensitivity of the grid to leakage variations → thermal.
     Standard:
     − No alterations to the original grid compositions.
     − All grids in the design have similar sensitivity to leakage variations.
     Modified sensitivity with tolerant inner core:
     − Altered grid composition
     − Inner core has grids tolerant to variations
     − Inner σ = a
     − Outer σ = b (> a)
     Modified sensitivity with tolerant outer core:
     − Inner σ = b
     − Outer σ = a (< b)

  8. Statistical Impact of Grid Composition: Total Leakage
     100 Monte Carlo runs @ TT:

     Design Style      Max. Power   Average
     Standard          1.17         1.086
     Tolerant Inner    1.169        1.078
     Tolerant Outer    1.184        1.062

     For the full block:
     − For the 100 MC runs for each design configuration, the local random variations result in only about 8% variation from the average current.
     − The effect of local variations gets averaged out for block-level leakage.
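The table values can be turned into a relative figure per design style. The sketch below assumes that "variation from the average" means how far the maximum lies above the average; that reading, and the helper name, are assumptions rather than something the slides state.

```python
# Normalized values transcribed from the slide's table: (max. power, average).
results = {
    "Standard": (1.17, 1.086),
    "Tolerant Inner": (1.169, 1.078),
    "Tolerant Outer": (1.184, 1.062),
}

def max_over_average_pct(max_power, average):
    """Percentage by which the maximum exceeds the average."""
    return 100.0 * (max_power / average - 1.0)

excess = {style: max_over_average_pct(*vals) for style, vals in results.items()}
for style, pct in excess.items():
    print(f"{style:<15s} max exceeds average by {pct:.1f}%")
```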

  9. Statistical Impact of Grid Composition: Thermal Impact
     [Figure: thermal maps for the most probable hotspot case; panels (1) Design Type 1, (2) Design Type 2, (3) Design Type 3]
     From a thermal perspective, there could be as much as 10 °C variation in hotspot temperature.

  10. Design Flow/Methodology
     − Library characterization: using MM models → two characterized quantities [symbols lost in extraction]
     − For every cell in the design: [equation lost in extraction]
     − For every grid: compute the grid sigma → propose grid changes if sigma is high
     − Grid optimization required?
       − Yes: check for high-drive, low-finger cells in the grid; space them apart
       − No: compute the hotspot and the max. junction temperature
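The per-grid check in this flow can be sketched as follows. The slide's own equations did not survive extraction, so this substitutes the standard rule for independent variables (means add, variances add) and flags a grid when its relative sigma exceeds a threshold. The cell names, their statistics, the threshold, and the choice of a relative rather than absolute criterion are all hypothetical.

```python
import math

# Hypothetical per-cell leakage statistics (mean, sigma), in arbitrary units.
CELL_LIBRARY = {
    "INV_X1": (1.0, 0.9),
    "INV_X8": (8.0, 2.5),
    "NAND2_X1": (1.2, 1.0),
}

def grid_stats(cells):
    """Combine independent cell leakages: means add, variances add."""
    mu = sum(CELL_LIBRARY[c][0] for c in cells)
    sigma = math.sqrt(sum(CELL_LIBRARY[c][1] ** 2 for c in cells))
    return mu, sigma

def grids_needing_optimization(grids, max_rel_sigma=0.3):
    """Names of grids whose sigma/mu exceeds the (assumed) threshold."""
    flagged = []
    for name, cells in grids.items():
        mu, sigma = grid_stats(cells)
        if sigma / mu > max_rel_sigma:
            flagged.append(name)
    return flagged

# Two illustrative grids: a sparse grid of small cells and a denser mixed grid.
grids = {
    "g0": ["INV_X1"] * 4,
    "g1": ["INV_X8"] * 4 + ["NAND2_X1"] * 10,
}
print(grids_needing_optimization(grids))
```

Grids that pass the check would go straight to the hotspot and junction-temperature computation; flagged grids would first have their composition adjusted.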

  11. Conclusions
     Accurate leakage calculation is integral to accurate thermal and hotspot predictions.
     Random/statistical variations significantly alter leakage:
     − Low chip-wide impact due to averaging out (Central Limit Theorem applicability; ~5-6% variation from the estimated global worst)
     − Very high impact at small scale (>40% leakage variations)
     − Small-scale (~ 10um2 x 10um2) variations in leakage alter temperature evolution (due to the strong leakage-temperature loop)
     In this work, we shared:
     − A fast methodology to incorporate local variations into thermal estimations
     − A physical design methodology to reduce the SoC's final variability and thermal impact
