The Time-less Datacenter
Paul Borrill and Alan H. Karp
Earth Computing: The Datacenter Resilience Company
Stanford EE Computer Systems Colloquium, Wednesday, November 16, 2016
http://ee380.stanford.edu
Cloud Computing: The Three Taxes
1. Complexity
2. Fragility
3. Vulnerability
Twitter Today
Systems can fail in catastrophic ways, leading to death or tremendous financial loss. Although there are many potential causes, including physical failure, human error, and environmental factors, design errors are increasingly becoming the most serious culprit.*
*NASA Formal Methods Program: https://shemesh.larc.nasa.gov/fm/fm-why-new.html
Earth Computing | The Datacenter Resilience Company
Key Computer Science Problems
Reliable Consensus
• Two Generals Problem (no fixed-length protocol exists that guarantees a reliable solution in an environment where messages can get lost)
• Slow node vs. link failure indistinguishability, i.e. what can one side of a failed link assume about a partner or cohort on the other side?
FLP Result
• Impossibility of Distributed Consensus with One Faulty Process
Key Idea:
• Don’t depend on processes to provide liveness; use a new kind of link
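The slow-node vs. link-failure point can be made concrete with the simplest failure detector, a fixed heartbeat timeout. The sketch below is a toy model, not anything from the talk: `Peer`, `heartbeat_arrival` and `suspects` are hypothetical names, and a peer's "delay" stands in for either a slow process or a slow link. The detector only sees whether a heartbeat arrived in time, so a crashed peer and a slow one produce the same verdict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    """Hypothetical remote process, observed only through its heartbeats."""
    name: str
    alive: bool    # ground truth; the detector cannot see this field
    delay: float   # heartbeat latency in seconds (slow node and slow link look alike)


def heartbeat_arrival(peer: Peer, sent_at: float) -> Optional[float]:
    """Arrival time of a heartbeat sent at `sent_at`, or None if it never arrives."""
    if not peer.alive:
        return None
    return sent_at + peer.delay


def suspects(peer: Peer, sent_at: float, timeout: float) -> bool:
    """Fixed-timeout detector: suspect the peer if no heartbeat arrives within `timeout`."""
    arrival = heartbeat_arrival(peer, sent_at)
    return arrival is None or arrival - sent_at > timeout


crashed = Peer("crashed", alive=False, delay=0.0)
slow = Peer("slow", alive=True, delay=2.5)

# With a 1 s timeout both peers yield the same verdict, so the detector
# cannot tell a dead process from a slow one (or from a slow link).
print(suspects(crashed, sent_at=0.0, timeout=1.0))  # True
print(suspects(slow, sent_at=0.0, timeout=1.0))     # True
```

Raising the timeout to 3 s clears the slow peer in this example, but only moves the problem: any peer whose delay exceeds the new timeout is again indistinguishable from a crashed one.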
Problem: Event Ordering is Hard
• In a distributed system over a general network, we can’t tell whether an event at one process happened before an event at another process unless the first caused the second in some way
• Causal Trees provide this guarantee when they are stable
• Dynamic Causal Trees provide guarantees through failure & healing, iff you have AIT on each link
• Needs Atomic Information Transfer (AIT) in the Link
[Figure: two space-time diagrams of processes P, Q and R, each event labelled with a vector timestamp (e.g. P:1 Q:2 R:1), showing an event’s Causal History and Future Effect bounded by lines of slope ≤ c]
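The per-event timestamps in the figure (P:1 Q:2 R:1 and so on) read like vector clocks, the standard way to test Lamport’s happened-before relation. The sketch below is a minimal vector-clock implementation under that assumption; it is not the authors’ Causal Tree mechanism, and `tick`, `receive`, `happened_before` and `concurrent` are illustrative names. It shows the slide’s point: an order between two events can be inferred only when a causal path connects them.

```python
from typing import Dict

VectorClock = Dict[str, int]


def tick(clock: VectorClock, process: str) -> VectorClock:
    """Local event or send: increment this process's own entry."""
    updated = dict(clock)
    updated[process] = updated.get(process, 0) + 1
    return updated


def receive(local: VectorClock, remote: VectorClock, process: str) -> VectorClock:
    """Receive: take the element-wise max with the sender's clock, then tick."""
    merged = {p: max(local.get(p, 0), remote.get(p, 0))
              for p in local.keys() | remote.keys()}
    return tick(merged, process)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff a <= b in every entry and a < b in at least one."""
    keys = a.keys() | b.keys()
    all_le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    return all_le and any(a.get(k, 0) < b.get(k, 0) for k in keys)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event could have caused the other."""
    return not happened_before(a, b) and not happened_before(b, a)


# P sends a message to R; Q acts independently.
p1 = tick({}, "P")          # {'P': 1}
r1 = receive({}, p1, "R")   # {'P': 1, 'R': 1}
q1 = tick({}, "Q")          # {'Q': 1}

print(happened_before(p1, r1))  # True: P's send caused R's receive
print(concurrent(q1, r1))       # True: no causal path, so no order can be inferred
```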
Problem: Consensus is Hard
• Failure detectors have failed to solve the problem
• Causal Trees make roles robust, easier to understand & verify
• Paxos (Fail-Recover): robust algorithm, but hard to understand & get right
• 2PC (Fail-Stop): vulnerable to coordinator failure (no safety proof)
• 3PC: vulnerable to network partitions (no liveness proof)
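The 2PC weakness listed above comes from a window in the textbook protocol: once a participant has voted YES, it may no longer abort on its own, yet it cannot commit until the coordinator tells it to. A minimal sketch of those two rules follows; `coordinator_decide` and `participant_outcome` are hypothetical names, and the model ignores logging, recovery and cooperative termination.

```python
from enum import Enum
from typing import Dict, Optional


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def coordinator_decide(votes: Dict[str, Vote]) -> Decision:
    """Phase 2 rule: commit only if every participant voted YES."""
    if all(v is Vote.YES for v in votes.values()):
        return Decision.COMMIT
    return Decision.ABORT


def participant_outcome(my_vote: Vote, decision_msg: Optional[Decision]) -> str:
    """What a participant can safely do after phase 1."""
    if my_vote is Vote.NO:
        return "abort"    # a NO voter may abort unilaterally
    if decision_msg is None:
        return "blocked"  # voted YES, coordinator silent: must wait
    return decision_msg.value


votes = {"A": Vote.YES, "B": Vote.YES}
print(coordinator_decide(votes).value)                   # commit
# Coordinator crashes after collecting votes but before sending the decision:
print(participant_outcome(Vote.YES, decision_msg=None))  # blocked
```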
Why? Because The Network is Flaky!
• App developers believe the network is the problem: networks drop, delay, duplicate & reorder packets
• Networking people believe the apps are the problem: by the end-to-end principle, apps should retry to distinguish between delays & drops … but … retries* ruin TCP’s ordering guarantees
• Both are incorrect. The solution requires a simple but fresh perspective
Peter Bailis, Kyle Kingsbury. The network is reliable
* Application retries (i.e. opening a new socket)
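Why an application retry breaks ordering: TCP delivers bytes in order only within one connection, and nothing orders arrivals across two connections. The toy model below gives each connection a fixed latency (so each is FIFO on its own) and lets the receiver see messages in arrival order. `delivered_order` and the latencies are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# (send_time, connection_latency, payload) for each transmission the app makes.
Transmission = Tuple[float, float, str]


def delivered_order(transmissions: List[Transmission]) -> List[str]:
    """Receiver's view: each message surfaces when it arrives, on whatever socket.

    A fixed latency per connection keeps each connection FIFO, but arrivals
    on different connections interleave purely by arrival time.
    """
    arrivals = sorted((sent + latency, payload) for sent, latency, payload in transmissions)
    return [payload for _, payload in arrivals]


SLOW, FAST = 5.0, 1.0  # hypothetical per-connection latencies in seconds
sends = [
    (0.0, SLOW, "m1"),  # original send on socket A, which happens to be slow
    (1.0, FAST, "m1"),  # app times out after 1 s and retries m1 on a new socket B
    (2.0, FAST, "m2"),  # the next message also goes out on socket B
]
print(delivered_order(sends))  # ['m1', 'm2', 'm1']
```

The receiver sees a duplicate of `m1`, and that copy lands after `m2`: the retry meant to recover from a "drop" that was really a delay has produced both duplication and reordering.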
Datacenter Failures Cascade
Interdependent failures, reconstruction storms, timeout storms, gossip storms, cascade failures
Switches are DReDDful: they Drop, Reorder, Delay and Duplicate packets
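The four DReDD behaviours can be modelled in a few lines. The sketch below is a deterministic toy (drops, duplicates and extra delays are passed in explicitly rather than drawn at random) so that its effect on a packet stream is easy to follow; `dredd_channel` and its parameters are hypothetical. Note that reordering needs no separate mechanism here: it falls out of unequal delays.

```python
from typing import Dict, List, Set, Tuple


def dredd_channel(packets: List[str], drop: Set[str], duplicate: Set[str],
                  extra_delay: Dict[str, float]) -> List[str]:
    """Toy switch model that Drops, Reorders, Delays and Duplicates packets.

    Packet i is nominally delivered at time i; `extra_delay` pushes some
    packets later, which is how reordering arises.
    """
    deliveries: List[Tuple[float, str]] = []
    for i, pkt in enumerate(packets):
        if pkt in drop:
            continue                              # Drop
        t = i + extra_delay.get(pkt, 0.0)         # Delay
        deliveries.append((t, pkt))
        if pkt in duplicate:
            deliveries.append((t + 0.5, pkt))     # Duplicate
    deliveries.sort()                             # Reorder: unequal delays reshuffle
    return [pkt for _, pkt in deliveries]


sent = ["p0", "p1", "p2", "p3"]
received = dredd_channel(sent, drop={"p2"}, duplicate={"p3"}, extra_delay={"p0": 2.5})
print(received)  # ['p1', 'p0', 'p3', 'p3']
```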
It’s Time to Simplify
Delta, Amazon, Google, Apple, Netflix, Paypal …
The Big Idea
Tim Berners-Lee: World Wide Web
• Key Idea: ONE WAY LINKS
• 2 Simple Sets of Rules: Document Language (html) and Connection Protocol (http)
• Mere mortals can now get their computers to talk to each other (Cloud Computing)
Earth Computing
• Key Idea: TWO WAY LINKS
• 2 Simple Sets of Rules: Graph Language (gvml) and Connection Protocol (eclp)
• Mere mortals can now manage their infrastructures (Earth Computing)
Distributed Systems Primitives
Shared Memory (Key Idea: CAS)
• Atomic Instruction: Atomic RMW
• Lock-Free data structures; New Concurrency Libraries
• Concurrent Safety; Non-Blocking
• {While (CAS(oldvalue,newvalue, ) != new value}
Atomic Information Transfer (Key Idea: AIT)
• Reversible Atomic Message; Reversible Token; Atomic Tokens
• Deterministic, In-Order; Deterministic Recoverability; Recoverable
• Durable; Indivisible Property
• {Transfer (AIT(tokenID,Notify=NO, ) != Continue}
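The CAS pseudocode on this slide is the classic compare-and-swap retry loop: read the current value, compute a new one, and publish it only if nobody changed the word in the meantime, otherwise retry. Python exposes no CAS instruction, so the sketch below emulates an atomic word with a lock (`AtomicInt` and `atomic_increment` are hypothetical names); real lock-free code would use the CPU’s CAS instruction instead. The retry loop itself is the part that carries over.

```python
import threading


class AtomicInt:
    """Stand-in for a hardware atomic word.

    The lock only makes compare_and_swap indivisible; callers never hold it
    across their own computation, mirroring how a CAS instruction behaves.
    """

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Atomically: if the value equals `expected`, store `new` and return True."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_increment(counter: AtomicInt) -> int:
    """CAS retry loop: read, compute, try to publish; retry if another thread won."""
    while True:
        old = counter.load()
        new = old + 1
        if counter.compare_and_swap(old, new):
            return new


def worker(counter: AtomicInt, n: int) -> None:
    for _ in range(n):
        atomic_increment(counter)


counter = AtomicInt()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 8000
```

A stale `load()` is harmless here: the following `compare_and_swap` simply fails and the loop tries again, so no increment is lost.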
Simpler Wiring: N2N, Switchless
Today’s Networking: Servers & Switches, with any-to-any (IP) addressing
EARTH Computing: Cells & Links, in a C2C lattice of cells & links
[Figure: today’s switched hierarchy (DC Gateway GW, Spine SN, Leaf LN and ToR switches above the servers) beside EARTH’s switchless lattice of cells]
Fundamentally Simpler
Today: Internal Segregation Firewalls (The Datacenter Today)
EARTH: Dynamic Confinement Domains (The Datacenter Simplified)
Earth Computing Network Fabric
Split infrastructure into:
• Cloudplane: the cloud datacenter, accessed from the outside world by untrusted legacy protocols
• EarthCore: dynamic, resilient, programmable topologies where data is immutable, secure, protected, & resilient to perturbations (failures, disasters, attacks)
The Big Idea: EarthCore
Confidential | Earth Computing Inc.
Logical Foundation for Resilience
[Figure: a Fabric of Cells, each running an Agent and carrying several NICs, linked NIC-to-NIC to neighbouring cells]
New Distributed Systems Foundation
EARTH Computing Link Protocol (ECLP)
• Events replace heartbeats and timeouts
• Addresses the Common Knowledge* Problem
[Figure: two Cells, each with an Agent and NICs, connected NIC-to-NIC by a Cable]
*Knowledge and Common Knowledge in a Distributed Environment, Joseph Y. Halpern & Yoram Moses, 1990 (initial version 1984).
Composable Presence Management
[Figure: a chain of Cells and Routers, each with several NICs, joined NIC-to-NIC by Cables, with an Agent at each end]
Demo
Two Generals Problem
Example Use Cases
• Two Phase Commit
• Paxos
• Link Reversal
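Link reversal here most likely refers to the classic family of routing algorithms; the slide does not say how Earth Computing applies it. The sketch below shows the well-known full-reversal algorithm of Gafni and Bertsekas: links carry a direction toward a destination, and any non-destination node left as a sink (no outgoing links) reverses all of its links. Starting from an acyclic orientation of a connected graph, this ends with every node having a directed path to the destination. `full_link_reversal`, `reaches` and the example graph are illustrative.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (u, v): the link is currently directed u -> v


def full_link_reversal(nodes: Set[str], edges: Set[Edge], dest: str) -> Set[Edge]:
    """Full reversal: any non-destination sink flips all of its links."""
    edges = set(edges)
    while True:
        sinks = [n for n in sorted(nodes)
                 if n != dest and not any(u == n for u, _ in edges)]
        if not sinks:
            return edges
        n = sinks[0]
        # Reverse every link pointing into the sink so that it now points away.
        incoming = {(u, v) for u, v in edges if v == n}
        edges = (edges - incoming) | {(v, u) for u, v in incoming}


def reaches(edges: Set[Edge], src: str, dest: str) -> bool:
    """Depth-first search along directed links."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dest:
            return True
        if n in seen:
            continue
        seen.add(n)
        stack.extend(v for u, v in edges if u == n)
    return False


# A's link toward D has failed, leaving A a sink with no route to D.
nodes = {"A", "B", "C", "D"}
edges = {("B", "A"), ("C", "B"), ("C", "D")}
routed = full_link_reversal(nodes, edges, dest="D")
print(all(reaches(routed, n, "D") for n in nodes))  # True
```

In this run A reverses, then B, then A again, ending with the route A -> B -> C -> D; each step is a purely local decision by one node about its own links.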