WormSpace: A Modular Foundation for Simple, Verifiable Distributed Systems


  1. WormSpace: A Modular Foundation for Simple, Verifiable Distributed Systems
  ACM Symposium on Cloud Computing, Nov 22, 2019
  Ji-Yong Shin¹, Jieung Kim¹, Wolf Honore¹, Hernan Vanzetto¹, Srihari Radhakrishnan², Mahesh Balakrishnan³, Zhong Shao¹
  ¹Yale University  ²Duke University  ³Facebook

  2. Cloud and Distributed Application Environment
  • Numerous distributed services are readily available
  • New applications are built by combining existing building blocks
  • New services are continuously developed and deployed
  (figure: logos of cloud services, e.g. Amazon EBS, Azure SQL DB, Google Dataflow, Google ML Engine, Azure Virtual Network)

  3. Cloud and Distributed Application Environment
  • Distributed services use and re-implement similar features → redundant efforts
  • Distributed systems are complex and difficult to build correctly → subtle bugs
  ⇒ Exploration for a common, bug-free foundation

  4. Design Goals
  1. Supports common needs for most systems
  2. Simple and easy-to-understand APIs
  3. Flexible support for optimizations
  4. Guaranteed correctness with extensibility
  (two themes: system design and formal verification)

  5. Write-Once Register (WOR)
  • Logically equivalent to consensus (Paxos, chain replication, PBFT, etc.)
  • Lowest common denominator
  • Distributed register
    – Replicated by construction (fault tolerance, availability, durability)
  • Write-once-read-many abstraction
    – Atomically writes data (consistency)
    – Only one of concurrent writes succeeds (concurrency control, immutability)
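The write-once-read-many semantics above can be sketched as a toy in-memory register. A real WOR is replicated across servers by a consensus protocol; this sketch (all names illustrative, not the paper's code) captures only the client-visible contract: at most one write ever succeeds, and the value is immutable afterwards.

```python
import threading

class WOR:
    """Toy write-once register: first write wins, all later writes fail."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._written = False

    def write(self, data):
        """Atomically write `data`; only one of concurrent writes succeeds."""
        with self._lock:
            if self._written:
                return False  # register is immutable once written
            self._value = data
            self._written = True
            return True

    def read(self):
        """Return the committed data, or None if the register is empty."""
        with self._lock:
            return self._value if self._written else None
```

For example, after `r.write("a")` succeeds, `r.write("b")` returns `False` and `r.read()` keeps returning `"a"`.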

  6. WORs in Existing Systems
  • State machine replication (SMR) and multi-Paxos – append / sequential read to WORs
  • Shared log (Corfu, Tango) – append / random read to WORs
  • Transaction coordinator (two-phase commit) – random write / random read to WORs
  • Coordination service (Chubby, ZooKeeper) – file APIs over SMR on WORs
  • Group communication (pub/sub) – append / sequential read to WORs

  7. WOR APIs
  • Capture
    – Preemptible lock concept; coordination before write
    – Returns a capture token
    – Paxos: phase 1 prepare; PBFT: pre-prepare + prepare; chain replication: no-op
  • Write
    – Writes data to the WOR; the capture token must be valid
    – Paxos: phase 2 accept; PBFT: commit; chain replication: write to the chain
  • Read
    – Reads the register; returns data or "empty"
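The three-call API can be sketched as follows. This is a single-process illustration of the token discipline only (class and method names are assumptions, not the paper's API): a later capture preempts earlier ones, mirroring how a higher Paxos phase-1 ballot invalidates a lower one, so a write succeeds only if its token is still the latest and the register is still empty.

```python
import itertools

class CapturableWOR:
    """Sketch of the capture / write / read WOR API (illustrative names)."""

    def __init__(self):
        self._tokens = itertools.count(1)
        self._latest = 0          # highest capture token handed out
        self._value = None
        self._written = False

    def capture(self):
        """Coordinate before writing; returns a preemptible capture token."""
        self._latest = next(self._tokens)
        return self._latest

    def write(self, token, data):
        """Write iff `token` was not preempted and the WOR is still empty."""
        if self._written or token != self._latest:
            return False
        self._value, self._written = data, True
        return True

    def read(self):
        """Return the committed data, or "empty" if nothing was written."""
        return self._value if self._written else "empty"
```

For example, if client A captures and then client B captures, A's write fails (its token was preempted) while B's succeeds; once written, every later capture-and-write fails.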

  8. WormSpace (Write-Once-Read-Many Address Space)
  • An address space of WORs, backed by distributed servers and accessed through a client library
  • Write-once segment (WOS) for management
    – Unit of allocation (alloc) and garbage collection (trim)
    – Consists of special WORs (metadata, trim) and data WORs
    – Support for batch-capture and batch-write to all WORs in a segment
  1. The complexity of Paxos and distributed servers is hidden
  2. Alternative WOR implementations can be used
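The address-space layout above can be sketched in a few lines. This toy (hypothetical layout and names, for illustration only) shows how global WOR addresses map into fixed-size segments, and how alloc, write-once, batch-write, and trim interact:

```python
class WOSegment:
    """Toy write-once segment: a run of data WORs plus metadata/trim flags."""
    def __init__(self, size):
        self.data = [None] * size     # data WORs (None = empty)
        self.meta_written = False     # metadata WOR, set on alloc
        self.trimmed = False          # trim WOR, set on garbage collection

class WormSpace:
    """Toy address space of WORs, managed in segments (alloc / trim)."""
    def __init__(self, seg_size=4):
        self.seg_size = seg_size
        self.segments = []

    def alloc(self):
        """Allocate a fresh segment; returns its segment index."""
        seg = WOSegment(self.seg_size)
        seg.meta_written = True
        self.segments.append(seg)
        return len(self.segments) - 1

    def write(self, addr, data):
        """Write-once into the WOR at global address `addr`."""
        seg = self.segments[addr // self.seg_size]
        off = addr % self.seg_size
        if seg.trimmed or seg.data[off] is not None:
            return False              # trimmed or already written
        seg.data[off] = data
        return True

    def batch_write(self, seg_idx, values):
        """Batch-write the WORs of one segment; False if any was taken."""
        return all(self.write(seg_idx * self.seg_size + i, v)
                   for i, v in enumerate(values))

    def trim(self, seg_idx):
        """Garbage-collect a whole segment at once."""
        self.segments[seg_idx].trimmed = True
```

After `trim`, every address in the segment rejects further writes, matching the segment-granularity garbage collection described above.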

  9. WormSpace Applications
  • WormPaxos – multi-Paxos / state machine replication
  • WormLog – Corfu / shared log
  • WormTX – 2PC variant / non-blocking atomic commit
  Please refer to the paper for interesting latency optimizations.
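To make the WormTX idea concrete, here is a 2PC-style commit sketched over WORs. This is only the spirit of the approach, not the paper's actual protocol (all names are illustrative): each participant writes its vote into a per-transaction WOR (a random write), and the decision is then written once into a decision WOR, so the outcome is immutable and any party can read it, which is what makes non-blocking recovery possible.

```python
class WOR:
    """Minimal write-once register (see the earlier WOR sketch)."""
    def __init__(self):
        self.v = None
    def write(self, d):
        if self.v is not None:
            return False          # first write wins; later writes fail
        self.v = d
        return True
    def read(self):
        return self.v

def atomic_commit(vote_wors, decision_wor, votes):
    """Toy 2PC-style commit over WORs (illustrative, not WormTX itself)."""
    for wor, vote in zip(vote_wors, votes):
        wor.write(vote)                                   # random writes
    committed = all(w.read() == "yes" for w in vote_wors) # random reads
    decision_wor.write("commit" if committed else "abort")
    return decision_wor.read()
```

Because the decision WOR is write-once, even a second coordinator racing to write a conflicting decision cannot change the outcome.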

  10. WormPaxos: Flexible Design Choices
  • Multi-Paxos variant for state machine replication
  • Design decisions can be easily configured
    – Various single-degree consensus protocols (Paxos, chain replication, etc.)
    – Leader election: who allocates a WOS and batch-captures it?
      • Mencius-like rotating leaders are easy to implement
      • Raft-like leader election can be implemented orthogonally with a timer
    – When to call trim determines durability
  WormSpace APIs are enough; no need to understand Paxos.
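The SMR pattern above can be sketched over plain WORs. In this toy (function names are assumptions; in the real system consensus is hidden inside the WOR write, and a leader would batch-capture a WOS first), a proposer appends a command into the first empty slot, and any replica deterministically rebuilds the same state by replaying the committed prefix in order:

```python
class WOR:
    """Minimal write-once register (consensus hidden inside write)."""
    def __init__(self):
        self.v = None
    def write(self, d):
        if self.v is not None:
            return False
        self.v = d
        return True
    def read(self):
        return self.v

def propose(log, cmd):
    """Append `cmd` at the first empty WOR; one concurrent proposer
    wins each slot thanks to write-once semantics."""
    for wor in log:
        if wor.write(cmd):
            return True
    return False  # segment exhausted; the real system allocs a new WOS

def replay(log):
    """Replica side: read the committed prefix sequentially and apply
    each command (here: key/value assignments) to the state machine."""
    state = {}
    for wor in log:
        cmd = wor.read()
        if cmd is None:
            break                 # end of committed prefix
        key, val = cmd
        state[key] = val
    return state
```

Every replica that replays the same log reaches the same state, which is the whole point of funneling commands through the write-once address space.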

  11. Formal Verification
  • WOR is primitive, but encapsulates key distributed properties
    – Consistency, durability, and availability
  Can we verify the WOR once and reuse it multiple times?
  • Certified Concurrent Abstraction Layers (CCAL) [Gu et al., PLDI 18]
    – Divides software into layers
    – Verifies each layer
    – Verifies that layers interact correctly
    – Lower-layer properties hold in higher layers

  12. Certified Concurrent Abstraction Layers (CCAL)
  • Layer stack: L_Sched (uses L_Queue) → contextual refinement proof → L_Queue (uses L_List) → contextual refinement proof → L_List
  • Each layer consists of a specification, a C implementation, and a refinement proof
  • Informally: when we run ANY program (context) on the Queue, the state reached by the Queue has a matching state in the List running the Queue's implementation together with the program
  • Sched does not need to know about List at all!
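The informal statement above is a contextual refinement. Following the CCAL papers (notation here is a reconstruction, not quoted from the slide), it can be written as:

```latex
\forall P.\;\;
\llbracket P \oplus M_{\mathrm{Queue}} \rrbracket_{L_{\mathrm{List}}}
\;\sqsubseteq\;
\llbracket P \rrbracket_{L_{\mathrm{Queue}}}
```

That is: for every client program (context) $P$, running $P$ linked with the Queue implementation $M_{\mathrm{Queue}}$ on top of the List layer refines running $P$ directly against the Queue specification, which is why Sched can be verified against $L_{\mathrm{Queue}}$ without ever mentioning List.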

  13. Verification Details
  Layer stack (bottom to top):
  • CertiKOS (fully verified OS) [Gu et al., OSDI 16]
  • Trusted computing base (TCB)
  • Server layers: Worm client / Worm server, Paxos proposer / Paxos acceptor – distributed protocol verification (Paxos immutability)
  • Global layer (distributed system model) – preserves distributed protocol correctness
  • Client library layers: write-once register (WOR), write-once segment (WOS), WormSpace – oblivious of the distributed nature
  • Applications: WormPaxos, WormLog, WormTX
  Proof techniques:
  1. Rely-guarantee style concurrency reasoning
  2. Use of a logical network log
  3. Proof by induction on the log
  The first end-to-end verification of a distributed system from the OS up (x86 assembly to distributed applications).

  14. Experience
  108K lines of Coq proof
  • Applications: WormPaxos (359 CLoC), WormLog (362 CLoC), WormTX (547 CLoC) – each < 1 month
  • Client library layers: WormSpace (4.5K CLoC, 6 months), WOS, WOR (1.5 months)
  • Simple API and no need to understand distributed protocols
  • Distributed verification is hidden, but the verified properties hold

  15. Evaluation
  • WormPaxos vs. Egalitarian Paxos (EPaxos) and its classical multi-Paxos implementation (CPaxos)
    – Amazon EC2: 3 servers and 16 client nodes
    – Write-only benchmark
    – C vs. Go and different internals
  (figure: latency (ms) vs. throughput (KOps/s) for WormPaxos, EPaxos, and CPaxos)
  Verified systems are not slow!

  16. Evaluation
  • WormSpace over CertiKOS
    – Local cloud with the same configuration as Amazon EC2
  (figure: latency (ms) vs. throughput (KOps/s) for Ubuntu+WormPaxos and CertiKOS+WormPaxos)
    – Over 10X lower throughput and over 1.5X higher latency
    – Mainly due to inefficiencies in CertiKOS's lwIP network stack

  17. Conclusion
  • Write-once registers for programming
    – Lowest common denominator for most systems
    – Source of consistency, availability, and durability
  • Write-once registers for verification
    – Primitive module that encapsulates key distributed system properties
    – Can be verified once and reused to simplify application verification
  • WormSpace for simple, verifiable distributed systems
    – Address space of WORs with extra management APIs
    – Allows simple and flexible distributed application designs
    – Facilitates verification of distributed applications

  18. Thank you. Questions? jiyong.shin@yale.edu
