Think outside the rack
2015-04-21, WRSC
john wilkes (johnwilkes@google.com), Parthasarathy Ranganathan, Steven Hand
Google Inc.

Datacenter loads are not SPEC benchmarks
[Figure: a single query across multiple racks of multiple servers. The graph shows one query and its associated RPCs for work distribution (only two levels shown); other queries are going on, but are not shown. Graphic from Dick Sites.]

Good news! lots of new technologies
Silicon/hardware is getting ever more inventive
○ forced to move to parallelism to track Moore's "law"
Main memory: lots of volatile RAM, new non-volatile h/w
Computation: oodles of cores, specialized accelerators
Storage: flash/SSD, [magnetic disks still kicking]
Networking: high bandwidth + low latency + lossless(?)

Good news! resource disaggregation
Conceptually it's wonderful:
Build a "rack computer" from a kit of parts (*)
○ a single big, disaggregated machine
○ all the benefits of a unified OS
Build a "datacenter in a rack" (*)
○ a single, scaled-down distributed system, like the big guys use
○ all the benefits of shared-nothing distributed systems
(*) OK - build at least two, for reliability

Good news! a RackScale foo is both of these
Upsides:
○ meet all the needs of all but the largest organizations
○ buy just what you need (save money)
○ build just what you want (go fast)
○ tune for peak performance (go fast; save money)
○ conceptually similar to existing programming models
What could possibly go wrong?

An RSfoo breaks everything
An RSfoo is not the same as a computer
○ multiple internal failure domains
○ non-uniform resource access costs
An RSfoo is not the same as a datacenter
○ shared nothing => disaggregated resources
○ existing programming models don't work

Datacenter experiences are relevant
DRAM errors (1% AFR)
Disk failures (2-10% AFR)
Machine crashes (~2/year)
OS upgrades (2-6/year)
A 2000-machine service will have >10 machine crashes per day
This is not a problem because of the shared-nothing model
Images by Connie Zhou
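The ">10 machine crashes per day" figure follows directly from the per-machine rates on this slide. A minimal sketch of the arithmetic, assuming events are spread evenly over a 365-day year (the helper name `events_per_day` is illustrative, not from the talk):

```python
DAYS_PER_YEAR = 365

def events_per_day(machines: int, events_per_machine_per_year: float) -> float:
    """Expected fleet-wide events per day, assuming a uniform spread over the year."""
    return machines * events_per_machine_per_year / DAYS_PER_YEAR

MACHINES = 2000  # service size quoted on the slide

# Machine crashes: ~2 per machine per year (slide figure).
crashes_per_day = events_per_day(MACHINES, 2)

# OS upgrades: 2-6 per machine per year (slide figure), shown as a range.
upgrades_low = events_per_day(MACHINES, 2)
upgrades_high = events_per_day(MACHINES, 6)

print(f"machine crashes per day: {crashes_per_day:.1f}")
print(f"OS upgrades per day: {upgrades_low:.1f} to {upgrades_high:.1f}")
```

Planned OS upgrades add a comparable or larger number of machine restarts per day on top of the crashes, which is why a model that tolerates the loss of whole machines matters at this scale.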

RSfoo failures
If disaggregation is used
○ each component failure ⇒ partial system failure ⇒ visible at the app level
○ fault propagation at the speed of light
Apps aren't designed to handle this today
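For a rough feel of how often a disaggregated rack would see some component fail, the disk AFR (annualized failure rate) range from the datacenter slide can be combined under an independence assumption. This is only a sketch: the component count of 40 disks is a hypothetical value, not a number from the talk, and real failures are often correlated.

```python
def prob_any_failure(afr: float, components: int) -> float:
    """Probability that at least one of `components` fails within a year,
    assuming independent failures with an identical annualized failure rate."""
    return 1.0 - (1.0 - afr) ** components

HYPOTHETICAL_DISKS = 40  # assumption for illustration only

# Disk AFR range quoted on the datacenter slide: 2-10%.
p_low = prob_any_failure(0.02, HYPOTHETICAL_DISKS)
p_high = prob_any_failure(0.10, HYPOTHETICAL_DISKS)

print(f"P(at least one disk failure per year): {p_low:.2f} to {p_high:.2f}")
```

Under these assumptions a partial failure somewhere in the rack is more likely than not within a year, so an application that sees every component failure cannot treat it as a rare event.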

RSfoo provisioning
How much of what to buy?
○ workload lifetime << hardware depreciation cycle
○ multiple esoteric resources
○ requires (dynamic) hardware evolution
Apps + planning tools aren't designed to handle this today

RSfoo placement/scheduling
Avoid resource stranding
○ disaggregation helps …
○ but RSfoo has more resource types
Avoid bad placement
○ NUMA writ large
○ dynamic interference
Existing placement / scheduling algorithms aren't good at this today
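Resource stranding can be illustrated with a toy placement problem. The sketch below is not from the talk: the machine sizes, task sizes, and the first-fit policy are all assumptions chosen to show how free capacity on fixed-shape machines can be unusable, while the same capacity pooled (disaggregated) would fit the task. It deliberately ignores the non-uniform access costs that the "NUMA writ large" bullet warns about.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    cpu: int  # free cores
    ram: int  # free GiB

def place_first_fit(tasks, machines):
    """Place (cpu, ram) tasks on the first machine with room; return tasks that did not fit."""
    unplaced = []
    for cpu, ram in tasks:
        for m in machines:
            if m.cpu >= cpu and m.ram >= ram:
                m.cpu -= cpu
                m.ram -= ram
                break
        else:
            # No single machine has enough of both resources.
            unplaced.append((cpu, ram))
    return unplaced

# Hypothetical fleet: two machines with 8 cores and 32 GiB each.
machines = [Machine(8, 32), Machine(8, 32)]
tasks = [(6, 8), (6, 8), (4, 16)]

unplaced = place_first_fit(tasks, machines)

# Capacity left over, summed across machines.
free_cpu = sum(m.cpu for m in machines)
free_ram = sum(m.ram for m in machines)

# Would the leftover capacity fit the unplaced tasks if it were one shared pool?
pooled_fits = all(free_cpu >= c and free_ram >= r for c, r in unplaced)

print("unplaced on fixed machines:", unplaced)
print("free cores:", free_cpu, "free GiB:", free_ram)
print("fits in a disaggregated pool:", pooled_fits)
```

The last task needs 4 cores, and 4 cores are free in total, but they are split across two machines and therefore stranded. Each additional resource type adds another dimension along which this can happen.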

RSfoo inter-application interference
RSfoos are small-scale datacenters, so will run multiple apps
Disaggregated resources make ...
○ performance isolation much harder
○ security isolation much harder
○ failure isolation much harder
Apps + systems aren't designed to handle this today

RSfoo groups
You still need multiple RSfoos ...
○ control-plane failure
○ datacenter / network / environment failure
○ "big" workloads
○ end-user latency
Existing inter-datacenter solutions (e.g., full replication) probably aren't ideal

Good news! We'll have job security ;-)

Good news!
The solutions are in sight.
The problems are just beginning.
○ Failures
○ Provisioning and configuration hassles
○ Interference
○ Multi-RSfoo support

One possible approach
For each feature/property/behavior, start by asking:
○ "is this a big computer, or a small datacenter?"
○ (distributed systems techniques go a long way)
Thinking about timescales may help:
○ seconds and up - datacenter control model
○ below that: application-level
Introduce a feature after addressing issues identified here
○ don't forget the programming model!


