Do we need Rack-Scale Coordination?

Alysson Bessani
April 21st, 2015
Rack-Scale Computers (RSC) (or Datacenter-in-a-Box systems)

• Tightly integrated rack (in a single box)
• Very fast node interconnection
• Special-purpose components (FPGA, GPU, NIC)
• "Uncommon" network topologies

[Figure: a rack of nodes, each node combining CPU, FPGA, GPU, and NIC components]
Rack-Scale Computers (RSC) (or Datacenter-in-a-Box systems)

[Figure: "Traditional" model vs. "Torus" model node interconnection topologies]
Do they need coordination?

• Leader election
• Locks
• Barriers
• Atomic counters
• Augmented queues
• …
• Configuration management
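To make the list above concrete, here is a minimal Python sketch of how these primitives might be exposed to applications. The interface and method names are hypothetical and not taken from any existing service.

```python
# Hypothetical interface sketching the coordination primitives listed above;
# names are illustrative only, not tied to any particular system.
from abc import ABC, abstractmethod

class CoordinationService(ABC):
    @abstractmethod
    def elect_leader(self, group: str, candidate_id: str) -> bool:
        """Return True if candidate_id becomes the leader of the group."""

    @abstractmethod
    def lock(self, name: str) -> None:
        """Block until the named distributed lock is acquired."""

    @abstractmethod
    def unlock(self, name: str) -> None:
        """Release a previously acquired lock."""

    @abstractmethod
    def barrier(self, name: str, parties: int) -> None:
        """Block until `parties` processes have reached the barrier."""

    @abstractmethod
    def increment(self, counter: str) -> int:
        """Atomically increment a counter and return the new value."""
```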
Out-of-the-Box Alternatives

• Shared-memory algorithms
• Multi-kernel coordination
• Datacenter coordination
Single-machine Coordination

• Shared-memory algorithms
  – Classical shared-memory locking algorithms have existed since the 1970s (Lamport's Bakery, etc.)
  – These algorithms require some consistency guarantees from the shared memory
    • Total Store Ordering (TSO), which is weaker than sequential consistency
    • The best-known result requires a constant number of remote memory references and memory barriers [PODC'13]
• Multi-kernel solution
  – A service (deployed on a core) that provides all the coordination primitives applications need
    • E.g., Barrelfish supports a ZooKeeper-like service [APSys'12]
• Neither solution tolerates faults
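As a concrete illustration of the shared-memory option, below is a minimal sketch of Lamport's Bakery lock in Python. It assumes that reads and writes of the shared arrays are atomic and sequentially consistent (which CPython's GIL happens to provide); on TSO hardware, explicit memory barriers would be needed, which is what the constant-RMR result cited above addresses.

```python
# Sketch of Lamport's Bakery lock for N threads sharing memory.
# Assumes sequentially consistent reads/writes of list entries; on real
# hardware with TSO, memory barriers would have to be added.
N = 4
choosing = [False] * N   # True while thread i is picking a ticket
number   = [0] * N       # ticket of thread i (0 = not competing)

def lock(i):
    choosing[i] = True
    number[i] = 1 + max(number)          # take a ticket larger than any seen
    choosing[i] = False
    for j in range(N):
        while choosing[j]:               # wait until j has finished choosing
            pass
        # wait while j holds a smaller ticket (ties broken by thread id)
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0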
Datacenter Coordination

• Coordination services:

  System          Data Model               Sync. Primitive         Wait-free
  Boxwood [44]    Key-value store          Locks                   No
  Chubby [17]     (Small) file system      Locks                   No
  Sinfonia [6]    Key-value store          Microtransactions       Yes
  DepSpace [14]   Tuple space              cas/replace ops         Yes
  ZooKeeper [31]  Hierarchy of data nodes  Sequencers              Yes
  etcd [3]        Hierarchy of data nodes  Sequencers/atomic ops   Yes
  LogCabin [5]    Hierarchy of data nodes  Conditions              Yes

  – dependable (limited) storage
  – synchronization power
  – client failure detection
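As a small illustration of such services, here is a sketch of obtaining a distributed lock and running leader election against ZooKeeper through the kazoo Python client; the host address, znode paths, and identifiers are placeholders.

```python
# Sketch: coordination primitives obtained from a coordination service
# (ZooKeeper, accessed via the kazoo client). Host and paths are placeholders.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Distributed lock: the znode path names the lock.
lock = zk.Lock("/rsc/locks/config", identifier="node-7")
with lock:
    pass  # critical section: at most one client at a time

# Leader election built on the same underlying primitive.
election = zk.Election("/rsc/election", identifier="node-7")
election.run(lambda: print("node-7 is now the leader"))

zk.stop()
```

Both recipes are built on ZooKeeper's sequencer primitive (sequential ephemeral znodes), which is the synchronization primitive listed for it in the table above.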
So …

• An RSC has multiple fault domains, so fault tolerance is needed
  – Coordination services are our best bet
• Durability may or may not be needed
  – It is strictly required for configuration management
• Extensibility is needed for improved performance
  – See the "Extensible Distributed Coordination" paper/talk at EuroSys'15
Traditional Network

• The coordination service is implemented as usual, i.e., "just deploy ZooKeeper on your RSC"
  – A set of replicas makes the service fault-tolerant
  – Durability techniques ensure full crash recovery
• Possible improvements:
  – More efficient replication algorithms
    • DARE [HPDC'15] proposes Raft-like, RDMA-based state machine replication with 12 µs latency for a 1 kB write, 35x faster than ZooKeeper on the same network
  – Faster durability mechanisms (e.g., NVRAM)
Torus Network

• Coordination scope:
  – L0: local CPU
  – L1: CPU + other local computing devices
  – L2: all nodes reachable in one hop
  – L3: all nodes reachable in two hops
  – …
  – LN: all nodes reachable in N-1 hops
• This may lead to the development of new quorum systems and fault-tolerant algorithms

[Figure: a torus of nodes with the one-, two-, and three-hop neighbourhoods of a node highlighted]
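Below is a small sketch of how such a scope could be computed, e.g., to pick quorum members from a node's neighbourhood. The 2-D torus dimensions and the hop-count metric are assumptions made for illustration.

```python
# Sketch: computing the L_k coordination scope of a node on an R x C 2-D
# torus, where L_k (for k >= 2) contains all nodes reachable in at most
# k-1 hops. Dimensions and metric are illustrative assumptions.
def torus_distance(a, b, rows, cols):
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)

def scope(node, level, rows, cols):
    """All nodes within level-1 hops of `node` (level >= 2)."""
    return {(r, c)
            for r in range(rows) for c in range(cols)
            if torus_distance(node, (r, c), rows, cols) <= level - 1}

# Example: on a 6x6 torus, L2 of node (0, 0) has 5 members
# (the node itself plus its 4 direct neighbours).
print(sorted(scope((0, 0), 2, 6, 6)))
```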
Questions … questions …

• The RSC software stack requires general coordination support. The question is:
  – Do we need anything specific, or is it just a matter of deploying what we already have?
• Other questions:
  – Can specialized hardware (e.g., FPGAs) help?
  – Can we assume/implement reliable failure detection?
  – Should we optimize for efficiency or for predictability?
  – What about data-centric coordination?
More Questions?