

  1. Hardware–Software Co-Design: Not Just a Cliché. Adrian Sampson, James Bornholt, Luis Ceze. University of Washington. SNAPL 2015

  2. Timeline (not to scale): time immemorial, 2005, 2015

  3. Timeline (not to scale): free lunch (time immemorial to 2005): exponential single-threaded performance scaling!

  4. [Figure: clock frequency (MHz), axis ticks from 10 to 10,000, plotted against year of introduction, 1985 to 2020]

  5. Timeline: free lunch (time immemorial to 2005), multicore era (2005 to 2015): we’ll scale the number of cores instead

  6. The multicore transition was a stopgap, not a panacea.

  7. Timeline: free lunch (time immemorial to 2005), multicore era (2005 to 2015), who knows? (2015 onward)

  8. Abstraction stack: Application / Language / Architecture / Circuits

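  9. Abstraction stack: Application / Language / [hardware–software abstraction boundary] / Architecture / Circuits. At the architecture level, below the boundary: parallelism, data movement, guard bands, energy costs

  10. Abstraction stack: Application / Language / [hardware–software abstraction boundary] / Architecture / Circuits. Now straddling the boundary: parallelism, data movement, guard bands, energy costs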

  11. lessons learned from Approximate Computing New Opportunities for hardware–software co-design

  12. lessons learned from Approximate Computing New Opportunities for hardware–software co-design

  13. Abstraction stack: Application / Language / [new abstractions for incorrectness] / Architecture / Circuits

  14. Abstraction stack with new abstractions for incorrectness in the middle. Application and Language side: probabilistic guarantees, type systems, debuggers, auto-tuning. Architecture and Circuits side: flaky functional units, lossy cache compression, neural acceleration, drowsy SRAMs

  15. The von Neumann curse: useful work vs. other crud we don’t care about and can’t fix

  16. Hardware design costs sanity & well-being. Thierry Moreau, FPGA design champion [Moreau et al.; HPCA 2015]

  17. Trust your compiler: approximate cache [Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

  18. Trust your compiler: approximate cache. Precise accesses: st r1 x; ld x r3. Approximate accesses: st.a r2 y; ld.a y r4 [Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

  19. Trust your compiler: approximate cache. Precise accesses: st r1 x; ld x r3. Approximate accesses: st.a r2 y; ld.a y r4. Cache lines drawn with a column of bits (0, 1, 0, 1, 1): line state bits? [Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

  20. Trust your compiler: approximate cache. Precise accesses: st r1 x; ld x r3. Approximate accesses: st.a r2 y; ld.a y r4. Cache lines drawn without the bits: line state bits? [Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

  21. lessons learned from Approximate Computing New Opportunities for hardware–software co-design

  22. More hardware flexibility that humans can actually program

  23. More hardware flexibility that humans can actually program: FPGA

  24. More hardware flexibility that humans can actually program. FPGA: explicit data movement, explicit memory blocks, explicit physical routing, explicit clock frequency, explicit ILP, explicit numeric bit width

  25. More hardware flexibility that humans can actually program. [Paper title page shown on the slide] A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. Authors (in the order extracted from the page layout): Derek Chiou, Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Scott Hauck, Stephen Heil, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Amir Hormati, Joo-Young Kim, James Larus, Eric Peterson, Sitaram Lanka, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger. Microsoft. Abstract (cut off on the slide): Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the ... Introduction excerpt (cut off on the slide): ... desirable to reduce management issues and to provide a consistent platform that applications can rely on. Second, datacenter services evolve extremely rapidly, making non-programmable hardware features impractical. Thus, datacenter providers are faced with a conundrum: they need continued improvements in performance and efficiency, but cannot obtain those improvements from general-purpose systems. Reconfigurable chips, such as Field Programmable Gate Arrays (FPGAs), offer the potential for flexible acceleration of many workloads. However, as of this writing, FPGAs have not been widely deployed as compute accelerators in either datacenter infrastructure or in client devices. One challenge traditionally associated with FPGAs is the need to fit the accelerated function into the available reconfigurable area. One could virtualize the FPGA by reconfiguring it at run-time to support more functions than could fit into a single device. However, current reconfiguration times for standard FPGAs ...

  26. More hardware flexibility that humans can actually program. [Same paper title page as slide 25, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, with the author list annotated] 23 authors!

  27. Trust, but formally verify: useful work

  28. Trust, but formally verify: useful work vs. checking that software doesn’t do anything crazy

  29. Trust, but formally verify. Abstraction stack: Application / Language / [verified properties] / Architecture / Circuits, e.g., [Hunt and Larus; OSR April 2007]

  30. Hardware beyond core computation: CPU, GPU, FPGA, accelerators, software-defined networking, power supply & battery, new memory technologies, mobile display & backlight

  31. Timeline: free lunch (time immemorial to 2005), multicore era (2005 to 2015), the era of language co-design? (2015 onward)
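
Slides 17 to 20 contrast precise accesses (st, ld) with approximate ones (st.a, ld.a) and ask whether the cache needs per-line state bits. As a rough illustration only, the Python sketch below models a memory in which precision is chosen by the instruction that touches a location, so no per-location precision flag is stored. The class name ToyMemory, the flip_prob parameter, and the single low-bit error model are all assumptions made up for this sketch; they are not taken from the talk or from the ASPLOS 2012 paper.

```python
import random


class ToyMemory:
    """Toy memory where precision is picked per instruction, not per line.

    Hypothetical model for illustration: an approximate access may flip
    the lowest bit of the value; a precise access never changes it.
    """

    def __init__(self, flip_prob=0.1, seed=0):
        self.cells = {}                 # address -> stored integer, no state bits
        self.flip_prob = flip_prob      # assumed chance of a low-bit error
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def st(self, addr, value):
        """Precise store: the value is kept exactly."""
        self.cells[addr] = value

    def ld(self, addr):
        """Precise load: returns exactly what is stored."""
        return self.cells[addr]

    def st_a(self, addr, value):
        """Approximate store: the low bit may be corrupted on the way in."""
        self.cells[addr] = self._maybe_flip(value)

    def ld_a(self, addr):
        """Approximate load: the low bit may be corrupted on the way out."""
        return self._maybe_flip(self.cells[addr])

    def _maybe_flip(self, value):
        # Flip the lowest bit with probability flip_prob.
        if self.rng.random() < self.flip_prob:
            return value ^ 1
        return value


mem = ToyMemory()
mem.st("x", 1000)     # st   r1 x
r3 = mem.ld("x")      # ld   x r3
mem.st_a("y", 2000)   # st.a r2 y
r4 = mem.ld_a("y")    # ld.a y r4

print(r3)                   # always the exact stored value
print(abs(r4 - 2000) <= 1)  # approximate value is off by at most one
```

In this toy model the guarantee that x is only ever touched by st and ld, and y only by st.a and ld.a, has to come from whatever generates the instruction stream, which is one way to read the "trust your compiler" heading on those slides.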
