Low Power SOC Design and Automation
Matt Severson, Qualcomm CDMA Technologies
July 27, 2009
Outline
  Introduction
  Overview of Serra (Qualcomm’s first 45nm tapeout)
    o Features / Technology / Low Power Techniques
  Tradeoffs of Automated vs. Custom Design for Low Power
    o Memory IP
    o Standard Cell: Mixed-Vt Design
    o Clock Power: Clock Gating, Clock Tree Synthesis
    o Multiple Power Domains: Voltage Scaling, Voltage Islands with Power Gating
  Conclusions & Future Directions
Introduction
  Power consumption is a key differentiator in wireless communications products.
  Power consumption limits:
    o Battery Life
    o Performance
    o Feature set
    o Form Factor
Introduction – Form Factor
  [Figure: Phone Surface Temperature Rise Above Ambient. x-axis: Surface Power Density [W/sq-in], 0.01 to 1. y-axis: Temperature Rise [C], 1 to 100.]
  Surface power densities less than 0.1 W/sq-in: this is the recommended design area.
  Surface power densities between 0.1 and 0.22 W/sq-in: phone is likely to have local hot spots.
  Surface power densities greater than 0.22 W/sq-in: phone should be redesigned.
  Power densities are increasing.
  Overheating limits form factors.
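The three regions called out on the chart can be expressed as a small classifier. This is a minimal sketch: the function name is hypothetical, and the treatment of the exact boundary values (0.1 and 0.22 W/sq-in) is an assumption, since the slide gives only the ranges.

```python
def classify_surface_power_density(density_w_per_sq_in: float) -> str:
    """Map a surface power density (W/sq-in) to the region named on the slide."""
    if density_w_per_sq_in < 0:
        raise ValueError("power density cannot be negative")
    if density_w_per_sq_in < 0.1:
        return "recommended design area"
    if density_w_per_sq_in <= 0.22:
        # Assumption: both boundary values (0.1 and 0.22) fall in the middle region.
        return "likely to have local hot spots"
    return "phone should be redesigned"


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the talk.
    for density in (0.05, 0.15, 0.30):
        print(density, "->", classify_surface_power_density(density))
```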
Introduction - Battery Life
  [Figure: Battery Life Analysis. Stacked bars of % of Total Reviews by rating, from 1 Star (Bad) to 5 Stars (Good), for Device 1 through Device 10 on Verizon™ and AT&T™.]
  Battery Capacity: 860 mAh, 800 mAh, 800 mAh, 1400 mAh, 930 mAh, 1300 mAh, 1130 mAh, 1500 mAh, 910 mAh, 880 mAh
  Screen Pixels: 240 x 320, 240 x 320, 240 x 320, 360 x 480, 240 x 320, 240 x 400, 240 x 320, 240 x 320, 240 x 320, 176 x 220
  Expressed Dissatisfaction (bar annotations): 48%, 44%, 43%, 35%, 28%, 26%, 21%, 19%, 14%, 12%
Qualcomm’s First 45nm Tapeout: SERRA
Serra Feature Set
  Modem
    o CDMA 1xEV-DO revA & B
    o UMTS (includes HSDPA, HSUPA)
    o GSM (includes GPRS and EDGE)
    o Unified GPS engine for both CDMA and UMTS modes
  Processors
    o QDSP4u8 based MDSP core
    o ARM11 core with 32KB I/D cache
    o ARM926 core with 32KB I/D cache
    o QDSP5u4 based ADSP core w/ 256 KB L2 cache
  Multimedia
    o 24 bit WVGA w/ LCDC (active refresh)
    o ATI LT graphics core (Open GL 2.0)
    o 22M Triangles/Sec
    o 8 Mpixel Camera support
  Peripherals
    o 2 HS USB interfaces
    o MDDI gen 1.5
Serra Physical Characteristics
  Die size: 8200.08 x 6500.34 um (53.3 mm^2)
  Signal I/Os: 419
  Process: tsmc45lp
  Metal Layers: 6 (5 thin, 1 thick (4x)) and 1 AP RDL layer
  Total # Transistors: 170 Million
  Total # RAM bits: 13.7 Mbits
  Total # ROM bits: 1.1 Mbits
  Static IR Drop: < 10mV (@ Worst case 800 mA)
  Leakage: ~450 uA (TT, 25C, 1.125V)
  Package: 671 pin 13x13 NSP Package, 0.5mm ball pitch
    o Includes Serra (Digital die) + Analog + Memory
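As a quick cross-check of the figures on this slide, the die area, transistor density and leakage power can be derived from the listed values. This is only a sketch; the helper names are hypothetical, and the derived density and leakage power are arithmetic consequences of the slide's numbers rather than values stated in the talk.

```python
def die_area_mm2(width_um: float, height_um: float) -> float:
    """Die area in mm^2 from dimensions in micrometers (1 mm^2 = 1e6 um^2)."""
    return width_um * height_um / 1e6


def leakage_power_w(leakage_current_a: float, vdd_v: float) -> float:
    """Static leakage power as leakage current times supply voltage."""
    return leakage_current_a * vdd_v


if __name__ == "__main__":
    area = die_area_mm2(8200.08, 6500.34)        # die size from the slide
    density = 170e6 / area                       # 170 million transistors
    leak_w = leakage_power_w(450e-6, 1.125)      # ~450 uA at 1.125 V (TT, 25C)
    print(f"area: {area:.1f} mm^2")
    print(f"density: {density / 1e6:.2f} M transistors/mm^2")
    print(f"leakage power: {leak_w * 1e3:.3f} mW")
```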
Serra Low Power Design Goals
  Background
    o Leakage power is increasing due to process
        45nm sub-threshold leakage is worse than 65nm (pA/um)
        Gate leakage is increasing.
        Junction diode leakage is increasing.
        45nm process has no HVt transistor.
    o Simple scaling of dynamic power is not enough.
        + Dynamic power will scale down with process geometries (-C)(-V)
            However, increased wire cap will temper the reduction
        - Increased performance demands and more applications (+f)(+C)
        - Aggressive product requirements for battery life
  Conclusion: More aggressive leakage and active power management techniques are required in 45nm
  Low Power Priorities / Goals for Serra:
    1 Decrease Dynamic Power
    2 Maintain the total static leakage power.
    3 Keep Active Leakage a “small” percentage of Dynamic Power (< ~15%)
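The (-C)(-V) and (+f)(+C) annotations can be read against the standard first-order CMOS switching-power relation, P = a * C * V^2 * f. The sketch below uses that textbook relation together with the slide's ~15% active-leakage goal. All function names are hypothetical, and the numbers in the example are made-up inputs, not Serra data.

```python
def dynamic_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order switching power: activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def dynamic_power_scale(cap_scale: float, vdd_scale: float, freq_scale: float = 1.0) -> float:
    """Relative change in dynamic power when C, Vdd and f are each scaled."""
    return cap_scale * vdd_scale ** 2 * freq_scale


def meets_active_leakage_goal(leakage_w: float, dynamic_w: float, limit: float = 0.15) -> bool:
    """True if active leakage stays below the given fraction of dynamic power."""
    return leakage_w < limit * dynamic_w


if __name__ == "__main__":
    # Hypothetical example: capacitance and voltage shrink with the process,
    # but frequency and added logic push power back up.
    shrink_only = dynamic_power_scale(cap_scale=0.7, vdd_scale=0.9)
    with_more_work = dynamic_power_scale(cap_scale=0.7 * 1.3, vdd_scale=0.9, freq_scale=1.5)
    print(f"shrink only: {shrink_only:.3f}x")
    print(f"with more logic and higher f: {with_more_work:.3f}x")
    print(meets_active_leakage_goal(leakage_w=0.01, dynamic_w=0.1))
```

The second case shows why the slide concludes that simple scaling is not enough: once the frequency and capacitance increases are included, the relative dynamic power can exceed 1x.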
Serra Low Power Features
  Low Power Multi-Threshold Qualcomm Standard Cell Library
    o 2 Vt and 2 Channel Lengths
  Low Power Memory
    o Power Collapsing of RAM/ROM periphery and core
    o Independent Bank Collapsing for Large High Density Memories
  Advanced Low Power Clocking
    o ~105 Master Clock Domains (I/O or Independent Frequency)
    o ~230 Total Clock Domains (Synchronous, Iso-Synchronous, Asynchronous)
    o Automatically inserted fine-grained clock gating
    o Manually inserted architectural clock gates
    o Static SW control and dynamic HW control of clock gating
    o Custom Raw Clock Tree Routing
    o Low Power CTS with Qualcomm Custom Clock Tree cells
  24 Analog and Pad power domains
  2 Digital Power domains
    o Independent Voltage Scaling (Active and Sleep modes)
    o Power Collapsing
  8 Digital Power Islands with Power Gating
  All Low Power Features fully Verified
    o Power Aware simulation
    o Power Structural Checks
Serra Floorplan
Serra Static IR Drop Map
Serra Dynamic IR Drop Map
DESIGN AUTOMATION
Design Automation
  Design Automation is Mandatory
    o Design complexity
    o Time to Market is Critical
    o Fewer design resources required
    o Quality through standardized flows and tools
  Automated design tools and flows have several limitations that affect low power
    o Many automated tools don’t consider power
    o Others don’t make the correct tradeoffs between power and area/timing.
  This presentation focuses on the tradeoffs involved with several low power techniques used on Serra and the limitations of automated design for low power
Customized Design for Low Power
  Custom design flows and circuits can produce better results
    o Lower Power, Higher Speeds, Less Area
  Custom design requires more design effort and time
  Use customized design and signoff ONLY in critical areas
  Pick areas of customization to get the greatest benefit
    o Clock Trees
        Raw Clock Trees
        Raw Clock Dividers
    o Memory IP
    o Standard Cell
  Move the customization into IP
  Use automation to insert the IP, check the IP and optimize with IP.
Customization of Raw Clock Network
  Raw clock networks are high speed, high power nets from PLLs to dividers
  Raw clock dividers are stacked and custom routed.
    o Width and spacing are chosen for optimal clock isolation while maintaining fast transition times.
  Use minimal clock buffers to distribute clocks within the network but maintain desired transition delay.
    o Reduces insertion delay and power
  Custom layout of raw dividers
    o Reduces critical path delay, voltage noise and optimizes rise/fall times.
  ~4x reduction in raw clock power (compared to previous chip)
  [Figure: Raw Clock Network. Labels: PLL (x4), 10-input tri-state mux, Selected Clock Path (green), Non-Selected Active Clocks (red), Traditional Wide Mux Structure.]
LOW POWER IP
Low Power Memory
  Node   Core array: sleep      Core array: sleep         Peripheral     Bit-cell array
         with data retention    without data retention    with footer    with header
  90nm   Yes                    No                        Yes            No
  65nm   Yes                    No                        Yes            No
  45nm   Yes                    Yes                       Yes            Yes
  • Bit-cell leakage is up 6X in 45nm.
  • No hVt devices.
  • All memories need to have leakage control
  • Circuit + System solutions
      - Peripheral footer
      - Bit cell header
      - Vdd scaling
  Maintain only the useful data and reduce Vdd during sleep mode to manage the leakage.
  [Figure: Leakage of periphery and array in Function, Sleep w/ retention and Sleep w/o retention modes for 90nm, 65nm and 45nm, with and without array header and Vdd scaling.]
Memory Partial Bank Collapse
  Power gating portions of the bit-cell array that are not needed
    o Standby/Active leakage reduction
    o Some active power reduction since clock/data is gated to banks that are not accessed.
  Requires proper memory management in SW and FW.
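The SW/FW memory-management requirement can be illustrated with a small sketch that decides which banks hold no live data and are therefore candidates for collapse. Everything here is hypothetical: the function names, the equal-sized bank layout and the representation of live data as half-open address ranges are assumptions for illustration, not the Serra firmware interface.

```python
def banks_in_use(bank_size: int, live_ranges: list[tuple[int, int]]) -> set[int]:
    """Return indices of banks touched by any live [start, end) address range."""
    used: set[int] = set()
    for start, end in live_ranges:
        if start < 0:
            raise ValueError("addresses must be non-negative")
        if end <= start:
            continue  # empty range holds no data
        first_bank = start // bank_size
        last_bank = (end - 1) // bank_size
        used.update(range(first_bank, last_bank + 1))
    return used


def collapsible_banks(num_banks: int, bank_size: int,
                      live_ranges: list[tuple[int, int]]) -> list[int]:
    """Return banks holding no live data, i.e. candidates for power gating."""
    used = banks_in_use(bank_size, live_ranges)
    if any(bank >= num_banks for bank in used):
        raise ValueError("live range extends past the end of the memory")
    return [bank for bank in range(num_banks) if bank not in used]


if __name__ == "__main__":
    # Hypothetical 4-bank memory with 1024 addresses per bank.
    live = [(0, 100), (1000, 1100)]
    print(collapsible_banks(num_banks=4, bank_size=1024, live_ranges=live))
```

Packing live data into as few banks as possible is what lets more banks be collapsed, which is why the slide ties the technique to memory management in software and firmware.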
STANDARD CELL POWER REDUCTION
Standard Cell Leakage
  45nm Standard Cell Challenges
    o Ioff increase and no HVT device compared to 65nm
    o Performance provided by NVT is not required everywhere
    o Power Gating not possible in all blocks
  Length (L)   Spacing (S)   Pitch   Increase
  40n          40n           180n
  50n          45n           200n    11%
  60n          45n           210n    17%
  Pitch = L/2 + S + 60n + S + L/2
  TT, 25C Leakage Scaling Factor, 65nm to 45nm
  QCT45   H2N     H2NL   N2N    L2L     L2N
  N       24.71   9.45   1.61   10.48   0.56
  P       27.58   5.68   3.00   26.80   1.51
  Ave     26.14   7.57   2.31   18.64   1.03
  45nm OPTIONS
    o Use longer channel length NVT device
        Min channel length is 40nm in 45nm tech
    o Use stacked NVT devices
        Replace every device with a stack of 2 devices
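The pitch formula on this slide reproduces the pitch and increase columns of the length/spacing table. The sketch below evaluates it for the three rows. The function names are hypothetical, and the 60n term is taken from the slide's formula as a fixed constant without further interpretation.

```python
FIXED_TERM_NM = 60  # the 60n term in the slide's formula


def pitch_nm(length_nm: float, spacing_nm: float, fixed_nm: float = FIXED_TERM_NM) -> float:
    """Pitch = L/2 + S + 60n + S + L/2, all values in nanometers."""
    return length_nm / 2 + spacing_nm + fixed_nm + spacing_nm + length_nm / 2


def pitch_increase_pct(length_nm: float, spacing_nm: float,
                       base: tuple[float, float] = (40, 40)) -> float:
    """Percentage pitch increase relative to the minimum-length (40n, 40n) row."""
    base_pitch = pitch_nm(*base)
    return 100 * (pitch_nm(length_nm, spacing_nm) / base_pitch - 1)


if __name__ == "__main__":
    # (L, S) pairs from the slide's table.
    for length, spacing in ((40, 40), (50, 45), (60, 45)):
        print(length, spacing, pitch_nm(length, spacing),
              round(pitch_increase_pct(length, spacing)))
```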
Simulation Results
  TSMC TT, 1.1V, 25C
  [Figure: four bar charts comparing the candidates NVT 40n, NVT 50n, NVT 60n, NVT 70n and Stacked NVT. Panels: Leakage Savings (factor), Delay Increase (factor), Area Increase (%, Area: Comb.), Switching Cap Increase (%). One bar is marked Not Available.]