ASIC Clouds: Specializing the Datacenter Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor UC San Diego and Toshiba Presented By: Vandit Agarwal
Motivation • GPU- and FPGA-based clouds are already successful • ASIC clouds have also been used successfully (e.g., for Bitcoin mining) • Extend this idea to ASIC-based clouds for other applications • Purpose-built datacenter • Large arrays of ASIC accelerators • Optimize Total Cost of Ownership (TCO) • For increasingly common high-volume chronic computations • Downsides: • High Non-Recurring Engineering (NRE) cost • Inflexibility
Introduction • Two visible trends: • Heavy work done in the cloud; interactive work moved to the client • Rise of dark silicon - specialization and near-threshold computation • Conjunction of these two trends has proved viable • At the single-machine level, ASICs can offer at least an order-of-magnitude improvement - explore and propose the ASIC cloud • Identify key issues by studying the Bitcoin ASIC Cloud
Objective In a Nutshell • Two key metrics drive the development: • H/w cost per performance = $ per op/s • Energy per operation = W per op/s • Working with joint knowledge of and control over datacenter and h/w design • Select a single TCO-optimal point amongst many Pareto-optimal points
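A minimal sketch of how a single TCO-optimal point might be picked from the Pareto-optimal points of the two metrics above. The candidate designs and the `dollars_per_watt_lifetime` factor (lifetime cost of supplying one watt) are made-up assumptions for illustration; the TCO model here is a simple weighted sum and omits other datacenter-level terms.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    dollars_per_ops: float   # hardware cost per performance, $ per op/s
    watts_per_ops: float     # energy per operation, W per op/s

def pareto_front(designs):
    """Keep the designs that no other design beats on both metrics."""
    front = []
    for d in designs:
        dominated = any(
            o.dollars_per_ops <= d.dollars_per_ops
            and o.watts_per_ops <= d.watts_per_ops
            and (o.dollars_per_ops < d.dollars_per_ops
                 or o.watts_per_ops < d.watts_per_ops)
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front

def tco_per_ops(design, dollars_per_watt_lifetime):
    """Simplified TCO per op/s: hardware cost plus lifetime energy cost."""
    return design.dollars_per_ops + design.watts_per_ops * dollars_per_watt_lifetime

def tco_optimal(designs, dollars_per_watt_lifetime):
    """Pick the Pareto-optimal design with the lowest simplified TCO."""
    return min(pareto_front(designs),
               key=lambda d: tco_per_ops(d, dollars_per_watt_lifetime))

# Hypothetical design points, not taken from the paper.
CANDIDATES = [
    Design("cost-optimal", 1.0, 4.0),
    Design("balanced", 2.0, 2.0),
    Design("energy-optimal", 4.0, 1.0),
    Design("dominated", 3.0, 3.0),
]

if __name__ == "__main__":
    for factor in (0.1, 1.0, 10.0):
        print(factor, tco_optimal(CANDIDATES, factor).name)
```

Under these assumptions the chosen point shifts from the cost-optimal end of the frontier toward the energy-optimal end as the lifetime cost of power grows.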
Specialization Hierarchy (figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network) • ASIC Design: achieves reduction in silicon area and energy consumption • ASIC Server: organization of ASICs, heat sinks, selective components, custom voltages • ASIC Datacenter: optimize rack- and datacenter-level thermal distribution and costs such as provisioning cost, availability, taxes, etc. ** To meet the requirements at the datacenter level, modifications trickle down the hierarchy
ASIC Cloud Architecture (figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network) • Trying to create a generic skeleton for an ASIC Cloud • Heart of the ASIC cloud - Replicated Compute Accelerator (RCA) - multiplied recursively • Customization: e.g., if the RCA requires DRAM, then the ASIC contains shared DRAM controllers connected to ASIC-local DRAMs
ASIC Server Overview • Focused on 1U 19-inch rackmount servers • Forced-air cooling system • Air intake from front, removal from back • Air at 30 °C
ASIC Server Evaluation Flow • Given an implementation and architecture for target RCA: • VLSI tools used to map it to target process • Analysis tools provide info on: • Area • Performance • Power density • Tune the following to find lowest TCO: • No. of RCAs/Chip • No. of chips/PCB • Organization of chips on PCB • Power delivery mechanism • Cooling mechanism • Choice of voltage
Thermally-Aware ASIC Server Design • ASICs and DC/DC converters - major sources of heat • Heat Sinks: • Heat spreader glued to the heat source (die) using Thermal Interface Material (TIM) • Spreader has fins - air is blown through them • Increasing spreader size improves cooling • Increasing the die size improves cooling - overcomes TIM resistance • Developed a model: • Input: fan curve, ASIC count per row • Output: optimal heat sink parameters
Arranging ASICs on PCB
More Chips vs Fewer Chips • How large (in mm²) should each chip be? • Determines how many RCAs will be on each chip • Many small ASICs are easier to cool than a few large ASICs • Increasing silicon area -> heat dissipation capacity increases (TIM) • Large total die area in a row is effective • Increasing the number of chips increases the packaging cost, but not by much
Power Density and Server Cost • For a given RCA, increasing power (W) increases performance • Moving right (high power density): very little total silicon per lane (due to temperature constraints), and it must be divided into many smaller chips • Dominated by cooling and packaging cost • Moving left (low power density): more silicon per lane and fewer chips • Dominated by silicon area cost
Bitcoin • Semi-anonymously and securely transfer money • Blockchain - globally replicated public ledger of transactions • A distributed consensus algorithm called Byzantine Fault Tolerance determines whose transactions are added to the blockchain • Mining: • Machines request work from a pool server • Hash - brute-force attempt at partial inversion of a cryptographically hard hash function • Hashrate - rate of hashing - typically in gigahashes per second (GH/s) • On success, other machines verify, accept, and append the block
What Led to Bitcoin ASIC Cloud? • People are incentivized to mine: • More machines = more secure system • Block reward (25 BTC = ~USD 11k in 2016) • 144 blocks daily × 25 BTC per block = 3,600 BTC = ~USD 1.5M daily • Rising TCO justifies the increased investment in NRE and other development costs • Leads to more specialization
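The daily-reward figure on this slide can be checked with a few lines of arithmetic. This is only a sketch using the slide's own approximate numbers; the constant and function names are illustrative.

```python
BLOCKS_PER_DAY = 144        # from the slide (one block roughly every 10 minutes)
REWARD_BTC = 25             # block reward quoted on the slide
USD_PER_REWARD = 11_000     # slide's approximate 2016 value of one 25 BTC reward

def daily_reward():
    """Return (BTC per day, approximate USD per day) across the whole network."""
    btc = BLOCKS_PER_DAY * REWARD_BTC
    usd = BLOCKS_PER_DAY * USD_PER_REWARD
    return btc, usd

if __name__ == "__main__":
    btc, usd = daily_reward()
    print(f"{btc} BTC/day, about USD {usd / 1e6:.2f}M/day")
```

With the slide's rounded inputs this comes to 3,600 BTC and roughly USD 1.58M per day, consistent with the ~USD 1.5M quoted.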
Bitcoin ASIC Trend: Difficulty
Implementation • 0.66 mm² of silicon in the UMC 28-nm process • Power density: 2 W/mm² • Extremely high power density
Results • More silicon -> optimal voltage decreases -> server efficiency increases • Costs initially fall (moving right to left), but then silicon costs start building up
Voltage Stacking • DC/DC converter power is significant • Chips are chained serially so that their supply voltages sum to 12 V • Leads to significant savings in the TCO-optimal case
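A minimal sketch of the stacking arithmetic: if chips are chained in series across a 12 V rail, the stack depth is the rail voltage divided by the per-chip core voltage, and the same current flows through every chip. The 0.5 V core voltage and 10 A chip current below are illustrative assumptions, not figures from the paper, and the sketch ignores current balancing between chips.

```python
def stack_depth(supply_v, core_v):
    """Number of chips in series so that their core voltages sum to supply_v."""
    depth = round(supply_v / core_v)
    if depth < 1 or abs(depth * core_v - supply_v) > 1e-9:
        raise ValueError("core voltage must divide the supply voltage evenly")
    return depth

def stack_power_w(supply_v, chip_current_a):
    """Power drawn by one stack: the same current flows through every chip."""
    return supply_v * chip_current_a

if __name__ == "__main__":
    depth = stack_depth(12.0, 0.5)      # assumed 0.5 V core voltage
    power = stack_power_w(12.0, 10.0)   # assumed 10 A per chip
    print(f"{depth} chips per stack, {power:.0f} W per stack")
```

The point of the arrangement is that the rail feeds the stack directly at 12 V, so no DC/DC stage is needed to step down to the core voltage.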
Litecoin ASIC Cloud
Video Transcoding ASIC Cloud **Pareto points are glitchy because of variations in constants and polynomial order for server components as they vary with voltages
CNN ASIC Cloud
When is ASIC Cloud Feasible?
Discussion • This is one of the earlier attempts to create a general framework/skeleton for an ASIC cloud. How feasible do you think this technology is, and how widely and how soon could it be adopted for a large variety of applications? • The authors recommend that cloud providers and silicon foundries open-source various tools, which would potentially lead to lower TCO. Is this a good solution? Why or why not? • Which do you think is the better choice for an ASIC: investing heavily (high NRE) in more advanced nodes (e.g., 16 nm), or using/modifying older nodes (e.g., 65 nm)?
Bitcoin ASIC Cloud Design • Repeatedly execute a Bitcoin hash operation • Input: 512-bit block • Mutate the block and perform SHA256 on it • Result is fed into another round of SHA256 • Leading zeros of the result are counted and compared with the target • 64 rounds in each SHA256
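The hash operation described on this slide can be sketched functionally with Python's standard `hashlib`. This is a software illustration under stated assumptions, not a model of the ASIC datapath: the block layout (a 60-byte prefix plus a 4-byte nonce), the `mine` helper, and the use of a leading-zero-bit count as the target are all simplifications introduced here.

```python
import hashlib

def double_sha256(block: bytes) -> bytes:
    """SHA256 of the block, fed into a second SHA256."""
    return hashlib.sha256(hashlib.sha256(block).digest()).digest()

def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits, treating the digest as a big-endian integer."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mine(prefix: bytes, target_zero_bits: int, max_nonce: int = 1 << 20):
    """Mutate a 4-byte nonce appended to `prefix` until the double hash
    has at least `target_zero_bits` leading zero bits.

    Returns (nonce, digest), or None if no nonce below max_nonce works.
    """
    for nonce in range(max_nonce):
        block = prefix + nonce.to_bytes(4, "little")  # hypothetical layout
        digest = double_sha256(block)
        if leading_zero_bits(digest) >= target_zero_bits:
            return nonce, digest
    return None

if __name__ == "__main__":
    # 60-byte prefix + 4-byte nonce = 64 bytes = 512 bits
    result = mine(bytes(60), target_zero_bits=12)
    if result is not None:
        nonce, digest = result
        print(nonce, digest.hex())
```

A low target such as 12 bits keeps the brute-force search short enough for a quick software run; real mining targets are far harder, which is what motivates dedicated hardware.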