GreenDroid: A Mobile Application Processor for a Future of Dark Silicon


  1. GreenDroid: A Mobile Application Processor for a Future of Dark Silicon
     Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Saturnino Garcia, Joe Auricchio, Jonathan Babb+, Michael B. Taylor, and Steven Swanson
     Department of Computer Science and Engineering, University of California, San Diego
     +CSAIL, Massachusetts Institute of Technology
     Hot Chips 22, Aug. 23, 2010

  2. We've Hit The Utilization Wall
     Utilization Wall: With each successive process generation, the percentage of a chip that can actively switch drops exponentially due to power constraints.

  3. We've Hit The Utilization Wall
     • Scaling theory
       – Transistor and power budgets are no longer balanced
       – Exponentially increasing problem!
     • Experimental results
       – Replicated a small datapath
       – More "dark silicon" than active
     • Observations in the wild
       – Flat frequency curve
       – "Turbo Mode"
       – Increasing cache/processor ratio

  4. We've Hit The Utilization Wall
     • Scaling theory
       Quantity               Classical scaling   Leakage-limited scaling
       Device count           S^2                 S^2
       Device frequency       S                   S
       Device power (cap)     1/S                 1/S
       Device power (V_dd)    1/S^2               ~1
       Utilization            1                   1/S^2
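
The utilization row in each scaling table follows from multiplying the other rows together. The sketch below is one minimal way to check that arithmetic, assuming a fixed chip area and a fixed power budget; the class and function names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Per-generation behaviour of the V_dd power term (hypothetical helper)."""
    name: str
    vdd_power_exponent: int  # device power (V_dd) scales as S**exponent


# Classical scaling: the V_dd power term shrinks as 1/S^2.
CLASSICAL = ScalingRule("classical", -2)
# Leakage-limited scaling: the V_dd power term stays roughly constant (~1).
LEAKAGE_LIMITED = ScalingRule("leakage-limited", 0)


def full_chip_power(s: float, rule: ScalingRule) -> float:
    """Relative power if every device on a fixed-area chip switched."""
    device_count = s ** 2
    device_frequency = s
    power_cap = 1 / s
    power_vdd = s ** rule.vdd_power_exponent
    return device_count * device_frequency * power_cap * power_vdd


def utilization(s: float, rule: ScalingRule, generations: int = 1) -> float:
    """Fraction of the chip that can switch under a fixed power budget."""
    per_generation = min(1.0, 1.0 / full_chip_power(s, rule))
    return per_generation ** generations


if __name__ == "__main__":
    for rule in (CLASSICAL, LEAKAGE_LIMITED):
        print(rule.name, utilization(2, rule))
```

With S = 2, the classical rule keeps utilization at 1, while the leakage-limited rule gives 1/S^2 = 0.25, matching the last row of each table.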

  5. We've Hit The Utilization Wall
     • Scaling theory
       [Chart: Expected utilization for fixed area and power budget at 90 nm, 65 nm, 45 nm, and 32 nm, on a 0.0 to 1.0 axis; utilization drops by 2x with each generation.]

  6. We've Hit The Utilization Wall
     • Experimental results
       [Chart: Utilization @ 40 mm^2, 3 W: 5.0% at 90 nm (TSMC), 1.8% at 45 nm (TSMC), 0.9% at 32 nm (ITRS); the drops between successive points are 2.8x and 2x.]

  8. We've Hit The Utilization Wall
     The utilization wall will change the way everyone builds processors.

  9. Utilization Wall: Dark Implications for Multicore
     • Spectrum of tradeoffs between # of cores and frequency
     • Example: 65 nm → 32 nm (S = 2)
       – 65 nm: 4 cores @ 1.8 GHz
       – 32 nm: 4 cores @ 2x1.8 GHz (12 cores dark)
       – 32 nm: 2x4 cores @ 1.8 GHz (8 cores dark, 8 dim) (Industry's Choice)
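
The 65 nm to 32 nm example can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: leakage-limited scaling (S^2 times as many cores fit in the same area, per-core switching power at a given frequency drops by 1/S, and the power budget stays fixed). The function name and its parameters are hypothetical.

```python
def multicore_options(base_cores: int, base_ghz: float, s: int):
    """Enumerate (active cores, GHz, dark cores) choices after scaling by S.

    Assumes leakage-limited scaling: S**2 times as many cores fit in the
    same area, but the fixed power budget only pays for S times the
    original switching activity (active cores x frequency multiplier).
    """
    cores_that_fit = base_cores * s ** 2
    activity_budget = base_cores * s  # in units of "cores at base frequency"
    options = []
    for freq_mult in range(1, s + 1):
        if activity_budget % freq_mult:
            continue  # skip splits that do not give a whole number of cores
        active = activity_budget // freq_mult
        if active > cores_that_fit:
            continue
        options.append((active, base_ghz * freq_mult, cores_that_fit - active))
    return options


if __name__ == "__main__":
    for active, ghz, dark in multicore_options(base_cores=4, base_ghz=1.8, s=2):
        print(f"{active} cores @ {ghz:.1f} GHz ({dark} cores dark)")
```

For 4 cores at 1.8 GHz and S = 2, the two options are 8 cores at 1.8 GHz with 8 cores dark, and 4 cores at 3.6 GHz with 12 cores dark, which are the two ends of the spectrum shown on the slide.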

  10. What do we do with dark silicon?
      • Goal: Leverage dark silicon to scale the utilization wall
      • Insights:
        – Power is now more expensive than area
        – Specialized logic can improve energy efficiency (10–1000x)
      • Our approach:
        – Fill dark silicon with specialized cores to save energy on common applications
        – Provide focused reconfigurability to handle evolving workloads

  11. Conservation Cores
      "Conservation Cores: Reducing the Energy of Mature Computations," Venkatesh et al., ASPLOS '10
      • Specialized circuits for reducing energy
        – Automatically generated from hot regions of program source code
        – Patching support future-proofs the hardware
      • Fully-automated toolchain
        – Drop-in replacements for code
        – Hot code implemented by c-cores, cold code runs on host CPU
        – HW generation/SW integration
      • Energy-efficient
        – Up to 18x for targeted hot code
      [Diagram labels: Hot code, C-core, D-cache, Host CPU (general-purpose processor), I-cache, Cold code]
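
Because only hot code runs on c-cores while cold code stays on the host CPU, the whole-program saving depends on how much of the baseline energy the hot code accounts for. The sketch below is a simple Amdahl-style model, not the authors' evaluation method; the function name and the hot-code fractions are assumptions chosen for illustration, and only the 18x figure comes from the slide.

```python
def relative_energy(hot_fraction: float, ccore_gain: float) -> float:
    """Energy relative to running everything on the host CPU.

    hot_fraction: share of baseline energy spent in code covered by c-cores
                  (illustrative input, not a measured value).
    ccore_gain:   energy-efficiency factor of c-cores on that code.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if ccore_gain <= 0:
        raise ValueError("ccore_gain must be positive")
    cold = 1.0 - hot_fraction
    return cold + hot_fraction / ccore_gain


if __name__ == "__main__":
    # Hypothetical hot-code fractions; 18x is the slide's "up to" figure.
    for hot in (0.5, 0.8, 0.95):
        print(f"hot fraction {hot:.2f}: relative energy {relative_energy(hot, 18):.3f}")
```

In this model the cold code quickly dominates the remaining energy, which is one way to see why high coverage of the workload matters as much as the per-core gain.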

  12. The C-core Life Cycle

  13. Outline
      • Utilization wall and dark silicon
      • GreenDroid
      • Conservation cores
      • GreenDroid energy savings
      • Conclusions

  14. Emerging Trends
      • The utilization wall is exponentially worsening the dark silicon problem.
      • Specialized architectures are receiving more and more attention because of energy efficiency.
      • Mobile application processors are becoming a dominant computing platform for end users.
      [Chart: 1Q shipments in thousands (0 to 20,000), 1Q'07 through 1Q'11, for Android, iPhone, and Dell. Historical data: Gartner.]

  15. Mobile Application Processors Face the Utilization Wall
      • The evolution of mobile application processors mirrors that of microprocessors
      • Application processors face the utilization wall
        – Growing performance demands
        – Extreme power constraints
      [Timeline, 1985 to 2015, Intel / ARM: 486 / StrongARM (pipelining); 586 / Cortex-A8 (superscalar); 686 / Cortex-A9 (out-of-order); Core Duo / Cortex-A9 MPCore (multicore).]

  16. Android™
      • Google's OS + app. environment for mobile devices
      • Java applications run on the Dalvik virtual machine
      • Apps share a set of libraries (libc, OpenGL, SQLite, etc.)
      [Diagram: software stack, top to bottom: Applications; Libraries and Dalvik; Linux Kernel; Hardware.]

  17. Applying C-cores to Android
      • Android is well-suited for c-cores
        – Core set of commonly used applications
        – Libraries are hot code
        – Dalvik virtual machine is hot code
        – Libraries, Dalvik, and kernel & application hotspots → c-cores
        – Relatively short hardware replacement cycle
      [Diagram: the Android stack (Applications; Libraries and Dalvik; Linux Kernel; Hardware) with C-cores shown at the hardware layer.]

  18. Android Workload Profile
      • Profiled common Android apps to find the hot spots, including:
        – Google: Browser, Gallery, Mail, Maps, Music, Video
        – Pandora
        – Photoshop Mobile
        – Robo Defense game
      • Broad-based c-cores
        – 72% code sharing
      • Targeted c-cores
        – 95% coverage with just 43,000 static instructions (approx. 7 mm^2)
      [Chart labels: Targeted, Broad-based]
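
The idea of reaching high coverage with a small amount of static code can be illustrated with a greedy selection over a profile. The slide does not describe the actual selection procedure, so this is a hypothetical sketch: the `Region` type, the greedy rule, and every entry in the sample profile are made up for illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    dynamic_share: float      # fraction of execution spent in this region
    static_instructions: int  # size of the region's code


def select_hot_regions(profile, target_coverage):
    """Pick the hottest regions first until the coverage target is met."""
    chosen, coverage, instructions = [], 0.0, 0
    for region in sorted(profile, key=lambda r: r.dynamic_share, reverse=True):
        if coverage >= target_coverage:
            break
        chosen.append(region)
        coverage += region.dynamic_share
        instructions += region.static_instructions
    return chosen, coverage, instructions


# Made-up profile, for illustration only (not data from the talk).
SAMPLE_PROFILE = [
    Region("libc_memcpy", 0.40, 300),
    Region("dalvik_interp", 0.35, 1200),
    Region("sqlite_step", 0.15, 900),
    Region("ui_misc", 0.10, 5000),
]

if __name__ == "__main__":
    chosen, coverage, instructions = select_hot_regions(SAMPLE_PROFILE, 0.85)
    print([r.name for r in chosen], round(coverage, 2), instructions)
```

As a separate piece of arithmetic on the slide's own figures, 7 mm^2 spread over 43,000 static instructions works out to roughly 163 µm^2 per static instruction.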

  19. GreenDroid: Applying Massive Specialization to Mobile Application Processors
      [Diagram labels: Android workload; automatic c-core generator; conservation cores (c-cores); low-power tiled multicore lattice (4x4 grid of tiles, each with a CPU and L1).]

  20. GreenDroid Tiled Architecture
      • Tiled lattice of 16 cores
      • Each tile contains
        – 6-10 Android c-cores (~125 total)
        – 32 KB D-cache (shared with CPU)
        – MIPS processor
          • 32-bit, in-order, 7-stage pipeline
          • 16 KB I-cache
          • Single-precision FPU
        – On-chip network router
      [Diagram: 4x4 grid of tiles, each with a CPU and L1.]

  21. GreenDroid Tile Floorplan
      • 1.0 mm^2 per tile (1 mm x 1 mm)
      • 50% C-cores
      • 25% D-cache
      • 25% MIPS core, I-cache, and on-chip network
      [Floorplan labels: OCN, C (c-cores), I$, D$, CPU]
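
The per-tile figures on this slide and the tile counts on slide 20 can be combined into a chip-level budget. The sketch below only multiplies the stated numbers; the constant and function names are hypothetical, and it assumes all 16 tiles share the same 1.0 mm^2 area and the same area split.

```python
TILES = 16                 # slide 20: tiled lattice of 16 cores
TILE_AREA_MM2 = 1.0        # slide 21: 1.0 mm^2 per tile
AREA_SHARE = {             # slide 21: floorplan split
    "c-cores": 0.50,
    "D-cache": 0.25,
    "MIPS core, I-cache, OCN": 0.25,
}
CCORES_PER_TILE = (6, 10)  # slide 20: 6-10 c-cores per tile


def chip_area_breakdown():
    """Total area per component across all tiles, in mm^2."""
    return {name: share * TILE_AREA_MM2 * TILES for name, share in AREA_SHARE.items()}


def ccore_count_range():
    """Smallest and largest chip-wide c-core count implied by 6-10 per tile."""
    low, high = CCORES_PER_TILE
    return low * TILES, high * TILES


if __name__ == "__main__":
    print(chip_area_breakdown())
    print(ccore_count_range())
```

Under these assumptions the c-cores occupy 8 mm^2 of a 16 mm^2 tile array, and the chip-wide c-core count falls between 96 and 160, which brackets the "~125 total" quoted on slide 20.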

  22. GreenDroid Tile Skeleton
      • 45 nm process
      • 1.5 GHz
      • ~30k instances
      • Blank space is filled with a collection of c-cores
      • Each tile contains different c-cores
      [Layout labels: OCN, C-cores, I$, D$, CPU]

  23. Outline
      • Utilization wall and dark silicon
      • GreenDroid
      • Conservation cores
      • GreenDroid energy savings
      • Conclusions
