
ASIC Clouds: Specializing the Datacenter Ikuo Magaki, Moein - PowerPoint PPT Presentation



  1. ASIC Clouds: Specializing the Datacenter Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor UC San Diego and Toshiba Presented By: Vandit Agarwal

  2. Motivation • GPU- and FPGA-based clouds are already successful • ASIC clouds have also been used successfully (for example, for Bitcoin mining) • Extend this idea to ASIC-based clouds for other applications • Purpose-built datacenters • Large arrays of ASIC accelerators • Optimize Total Cost of Ownership (TCO) • For increasingly common, high-volume chronic computations • Downsides: • High non-recurring engineering (NRE) cost • Inflexibility

  3. Introduction • Two visible trends: • Heavy computation has moved to the cloud, while interactive work has moved to the client • Rise of dark silicon, which motivates specialization and near-threshold computation • The combination of these two trends has proved viable • At the single-machine level, ASICs can offer at least an order-of-magnitude improvement, so the paper explores and proposes the ASIC Cloud • Identify key issues by studying Bitcoin ASIC Clouds

  4. Objective in a Nutshell • Two key metrics drive the development: • Hardware cost per performance = $ per op/s • Energy per operation = W per op/s • Work with joint knowledge of, and control over, both datacenter and hardware design • Select the single TCO-optimal point among many Pareto-optimal points
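The selection step on this slide can be sketched in Python. This is a minimal illustration, not the authors' model: the design points, the `Design` type, the function names, and the weighted-sum TCO formula (hardware cost plus a dollars-per-watt factor times energy) are all assumptions made for the example.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    cost_per_ops: float    # $ per op/s (hardware cost per performance)
    watts_per_ops: float   # W per op/s (energy per operation)

def pareto_front(designs):
    """Keep the designs that no other design beats on both metrics."""
    front = []
    for d in designs:
        dominated = any(
            o.cost_per_ops <= d.cost_per_ops
            and o.watts_per_ops <= d.watts_per_ops
            and (o.cost_per_ops < d.cost_per_ops
                 or o.watts_per_ops < d.watts_per_ops)
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front

def tco(d, dollars_per_watt):
    """Hypothetical TCO: hardware cost plus lifetime energy cost."""
    return d.cost_per_ops + dollars_per_watt * d.watts_per_ops

def tco_optimal(designs, dollars_per_watt):
    """Pick the Pareto-optimal design with the lowest TCO."""
    return min(pareto_front(designs), key=lambda d: tco(d, dollars_per_watt))

# Made-up design points, for illustration only.
designs = [
    Design("A", 1.0, 4.0),   # cheap hardware, energy hungry
    Design("B", 2.0, 2.0),   # balanced
    Design("C", 4.0, 1.0),   # expensive hardware, energy efficient
    Design("D", 3.0, 3.0),   # dominated by B on both metrics
]

print([d.name for d in pareto_front(designs)])
print(tco_optimal(designs, dollars_per_watt=1.0).name)
```

With these made-up numbers, the chosen point shifts from A to B to C as the assumed cost of energy rises, which is why a single TCO-optimal point only exists once the datacenter-level costs are known.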

  5. Specialization Hierarchy [Figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network] • ASIC Design: achieves reductions in silicon area and energy consumption • ASIC Server: organization of ASICs, heat sinks, selective components, custom voltages • ASIC Datacenter: optimize rack- and datacenter-level thermal distribution and costs such as provisioning cost, availability, taxes, etc. **To meet the requirements at the datacenter level, modifications trickle down the hierarchy

  6. ASIC Cloud Architecture [Figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network] • Aims to create a generic skeleton for an ASIC Cloud • Heart of the ASIC Cloud: the Replicated Compute Accelerator (RCA), replicated recursively • Customization: e.g., if the RCA requires DRAM, the ASIC contains shared DRAM controllers connected to ASIC-local DRAMs

  7. ASIC Server Overview • Focused on 1U 19-inch rackmount servers • Forced-air cooling system • Air intake from the front, exhaust from the back • Inlet air at 30 °C

  8. ASIC Server Evaluation Flow • Given an implementation and architecture for the target RCA: • VLSI tools map it to the target process • Analysis tools report: • Area • Performance • Power density • Tune the following to find the lowest TCO: • Number of RCAs per chip • Number of chips per PCB • Organization of chips on the PCB • Power delivery mechanism • Cooling mechanism • Choice of voltage

  9. Thermally-Aware ASIC Server Design • ASICs and DC/DC converters are the major sources of heat • Heat sinks: • Heat spreader glued to the heat source (die) using Thermal Interface Material (TIM) • Spreader has fins, and air is blown through them • Increasing spreader size improves cooling • Increasing die size improves cooling because it helps overcome TIM resistance • Developed a model: • Input: fan curve, ASIC count per row • Output: optimal heat sink parameters

  10. Arranging ASICs on PCB

  11. More Chips vs Fewer Chips • How large (in mm²) should each chip be? • Determines how many RCAs will be on each chip • Many small ASICs are easier to cool than a few large ASICs • Increasing silicon area -> heat dissipation capacity increases (TIM) • A large total die area in a row is effective • Increasing the number of chips increases the packaging cost, but not by much

  12. Power Density and Server Cost • For a given RCA, increasing power (W) increases performance • Moving right (high power density): very little total silicon per lane (due to temperature constraints), and it must be divided into many smaller chips • Cooling and packaging cost • Moving left (low power density): more silicon per lane and fewer chips • Silicon area cost

  13. Bitcoin • Transfers money semi-anonymously and securely • Blockchain: a globally replicated public ledger of transactions • A distributed consensus algorithm that provides Byzantine fault tolerance determines whose transactions are added to the blockchain • Mining: • Machines request work from a pool server • Hash: a brute-force attempt at partial inversion of a cryptographically hard hash function • Hashrate: the rate of hashing, typically in gigahashes per second (GH/s) • On success, other machines verify the block, accept it, and append it to the chain

  14. What Led to Bitcoin ASIC Cloud? • People are incentivized to mine: • More machines = a more secure system • Block reward (25 BTC = ~USD 11k in 2016) • 144 blocks daily x 25 BTC per block = ~USD 1.5M daily • Rising TCO justifies the increased investment in NRE and other development costs • Leads to more specialization
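The reward arithmetic on this slide can be checked directly. The three input constants come from the slide; the variable names are illustrative, and the per-BTC price is simply derived from the slide's rounded USD 11k figure rather than taken from market data.

```python
BLOCKS_PER_DAY = 144     # from the slide
REWARD_BTC = 25          # block reward in BTC (2016), from the slide
REWARD_USD = 11_000      # slide's approximate USD value of one block reward

usd_per_btc = REWARD_USD / REWARD_BTC           # implied price per BTC
btc_per_day = BLOCKS_PER_DAY * REWARD_BTC       # total BTC issued per day
usd_per_day = BLOCKS_PER_DAY * REWARD_USD       # total USD value per day
minutes_per_block = 24 * 60 / BLOCKS_PER_DAY    # average block interval

print(usd_per_btc, btc_per_day, usd_per_day, minutes_per_block)
```

This gives 3600 BTC per day, worth about USD 1.58M at the implied USD 440 per BTC, consistent with the slide's rounded ~USD 1.5M, and 144 blocks per day corresponds to one block every 10 minutes.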

  15. Bitcoin ASIC Trend Difficulty

  16. Implementation • 0.66 mm² of silicon in a UMC 28-nm process • Power density: 2 W/mm² • Extremely high power density

  17. Results • More silicon -> optimal voltage decreases -> server efficiency increases • Initially, costs fall (moving right to left), but then silicon costs start building up

  18. Voltage Stacking • DC/DC power is significant • Chips are chained in series so that their supplies sum to 12 V • Leads to significant savings in the TCO-optimal case
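The series-chaining arithmetic can be sketched as follows. Only the 12 V rail comes from the slide; the per-chip voltage and power are hypothetical, the function names are illustrative, and the sketch assumes an ideal chain in which every chip draws the same current.

```python
def stack_size(rail_volts: float, chip_volts: float) -> int:
    """Number of chips chained in series so their supplies sum to the rail."""
    n = round(rail_volts / chip_volts)
    if n < 1 or abs(n * chip_volts - rail_volts) > 1e-9:
        raise ValueError("chip voltage must divide the rail voltage evenly")
    return n

def chain_current(chip_watts: float, chip_volts: float) -> float:
    """Current through every chip in the series chain (I = P / V)."""
    return chip_watts / chip_volts

RAIL_VOLTS = 12.0   # from the slide
CHIP_VOLTS = 0.5    # hypothetical per-chip supply voltage
CHIP_WATTS = 2.0    # hypothetical per-chip power

n = stack_size(RAIL_VOLTS, CHIP_VOLTS)
amps = chain_current(CHIP_WATTS, CHIP_VOLTS)

# Power drawn from the rail equals the sum of the per-chip powers.
print(n, amps, RAIL_VOLTS * amps)
```

Under these assumed numbers, 24 chips share one 4 A chain and draw 48 W straight from the 12 V rail, with no per-chip step-down conversion in between.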

  19. Litecoin ASIC Cloud

  20. Video Transcoding ASIC Cloud **Pareto points are glitchy because of variations in constants and polynomial order for server components as they vary with voltages

  21. CNN ASIC Cloud

  22. When is ASIC Cloud Feasible

  23. Discussion • This is one of the earlier attempts to create a general framework/skeleton for an ASIC cloud. How feasible do you think this technology is, and how widely and how soon could it be adopted for a large variety of applications? • The authors suggest that if cloud providers and silicon foundries open-sourced various tools, TCO could be lowered. Is this a good solution? Why or why not? • Which do you think is preferable: investing heavily (high NRE) in more advanced nodes (e.g., 16 nm), or using/modifying older nodes (e.g., 65 nm) for an ASIC?

  24. Bitcoin ASIC Cloud Design • Repeatedly executes a Bitcoin hash operation • Input: 512-bit block • Mutate the block and perform SHA-256 on it • The result is fed into a second SHA-256 pass • The leading zeros of the output are counted and compared with the target • 64 rounds in each SHA-256
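The hash loop described on this slide can be sketched in software with Python's standard `hashlib`. This is a simplified illustration, not the RCA design: the `mine` helper, the example prefix, and the 4-byte nonce appended to an arbitrary byte string are assumptions. Real Bitcoin mining hashes an 80-byte block header and compares the result against a numeric target rather than a plain zero-bit count.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as in the Bitcoin hash operation."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mine(prefix: bytes, zero_bits: int, max_nonce: int = 1 << 20):
    """Mutate the input by trying nonces until the hash meets the target.

    Returns the first nonce that works, or None if none is found.
    """
    for nonce in range(max_nonce):
        candidate = prefix + nonce.to_bytes(4, "little")
        if leading_zero_bits(double_sha256(candidate)) >= zero_bits:
            return nonce
    return None

# Toy difficulty: 8 leading zero bits takes about 256 attempts on average.
nonce = mine(b"example header", zero_bits=8)
print(nonce)
```

Each extra zero bit required doubles the expected number of attempts, which is why the workload is a pure brute-force search that replicates well across many identical hash units.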
