
Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services



  1. Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services. Jacob R. Lorch, Andrew Baumann, Lisa Glendenning, Dutch T. Meyer, Andrew Warfield

  2. Our goal: Turn existing binaries into fault-tolerant services. Jay Lorch, Microsoft Research Tardigrade 2

  3. Example: FDS Metadata Service [Nightingale et al., OSDI 2012]. Diagram labels: FDS Metadata server, FDS Cluster.

  4. Example: FDS Metadata Service [Nightingale et al., OSDI 2012]. Diagram labels: FDS Metadata server, Paxos leader election, FDS Cluster.

  5. Techniques for making code fault-tolerant
     • Use a state machine replication library
     • Explicitly persist state to a reliable back-end
     These have limitations:
     • Requires development resources
     • Potential for oversight: non-determinism, failing to persist state, exposing non-persisted data, bugs in crash recovery
     Better: Transparently make the binary fault-tolerant.

  6. Outline • Motivation • Background: Asynchronous VM replication • Our solution: Lightweight VM replication • Challenges and solutions • Evaluation

  8. Asynchronous virtual machine replication: Remus [Cully et al., NSDI 2008]. Diagram: the primary sends checkpoint deltas (Δ) to the backup. The primary can crash at any time; the backup is always a bit behind.

  9. Asynchronous virtual machine replication: Remus [Cully et al., NSDI 2008]. Diagram labels: primary, backup, output buffer.

  10. Asynchronous virtual machine replication: Remus [Cully et al., NSDI 2008]. Diagram labels: primary, backup, output buffer, Ack(Δ).
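The output-buffering step on slides 9 and 10 can be illustrated with a small sketch. It assumes the usual Remus-style rule that output produced during a checkpoint epoch is held until the backup acknowledges that epoch's delta. The class and method names (`OutputBuffer`, `send`, `checkpoint`, `on_ack`) are hypothetical and do not come from the talk.

```python
from collections import deque


class OutputBuffer:
    """Hypothetical helper: holds outgoing packets until the backup has
    acknowledged the checkpoint covering the state that produced them."""

    def __init__(self):
        self.epoch = 0          # checkpoint epoch currently being built
        self.pending = deque()  # (epoch, packet) pairs awaiting release

    def send(self, packet):
        # Output from the current epoch is not yet safe to expose to clients.
        self.pending.append((self.epoch, packet))

    def checkpoint(self):
        # Close the current epoch; its delta would be shipped to the backup here.
        closed = self.epoch
        self.epoch += 1
        return closed

    def on_ack(self, acked_epoch):
        # Release every packet produced in an epoch the backup now holds.
        released = []
        while self.pending and self.pending[0][0] <= acked_epoch:
            released.append(self.pending.popleft()[1])
        return released
```

A packet sent after the checkpoint was taken stays buffered when the earlier epoch is acknowledged, which is why client-visible latency depends on how quickly each delta can be shipped and acknowledged.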

  11. High VM activity can delay packets. [Figure: bar chart of ping latency (ms, log scale from 1 to 10000) at the 50th, 95th, 99th, and 99.9th quantiles for five workloads: Baseline, Safety Scan, Search Indexer, Update, Deduplication. Bar labels range from 43 ms to 9697 ms.] Processes unrelated to the service can balloon client-perceived latency.

  12. Outline • Motivation • Background: Asynchronous VM replication • Our solution: Lightweight VM replication • Challenges and solutions • Evaluation

  13. Our solution: Use lightweight VMs instead
     Lightweight VM system examples:
     • Xax [Douceur et al., OSDI 2008]
     • Native Client [Sehr et al., IEEE S&P 2009]
     • Drawbridge [Porter et al., ASPLOS 2011]
     • Embassies [Howell et al., NSDI 2013]
     • Bascule [Baumann et al., Eurosys 2013]
     Narrow API (e.g., ~45 calls in Bascule). Diagram labels: Service process, Other processes, LVM host, Host OS.

  14. Lightweight VMs can support unmodified binaries via a library OS. Diagram labels: Service process, LVM API, LVM host.

  15. Lightweight VMs can support unmodified binaries via a library OS. Bascule has a Windows LibOS and a Linux LibOS. Stack inside the service process (top to bottom): Service binary, OS API, Library OS, LVM API; beneath it, the LVM host.

  16. A lightweight VM is encapsulated by virtue of having a narrow interface. Stack inside the service process (top to bottom): Service binary, OS API, Library OS, LVM API; beneath it, the LVM host.

  17. Our approach: Checkpoint by interposing on existing LVM API. Interposition using existing API means LVM and LibOS don’t have to change. Stack inside the service process (top to bottom): Service binary, OS API, Library OS, LVM API, Checkpointer, LVM API; beneath it, the LVM host. The Checkpointer produces the checkpoint.

  18. Asynchronous Virtual Machine Replication [Cully et al., NSDI 2008] and Lightweight Virtual Machine Replication. Diagram: on both primary and backup, the stack is Service, Library OS, Checkpointing, Host.

  19. Asynchronous Virtual Machine Replication [Cully et al., NSDI 2008] and Lightweight Virtual Machine Replication. Our implementation of LVMR is called Tardigrade. Diagram: on both primary and backup, the stack is Guest (service+OS), Checkpointing, Host.

  20. Outline • Motivation • Background: Asynchronous VM replication • Our solution: Lightweight VM replication • Challenges and solutions • Evaluation

  21. Practical LVMR poses challenges
     • Challenge: Maintaining consistency across reconfigurations. Solution: Vertical Paxos.
     • Challenge: Achieving performance. Solutions: Incremental checkpointing, checkpoint capping, parallelism, scaling send buffer size.
     • Challenge: Checkpointing via an existing LVM API. Solutions: Quiescing, pre-checkpointing, enforcing determinism, terminating connections.
     Callouts: See paper for details. Lessons for potential LVM API designers.

  22. Checkpointing uses certain LVM API features
     • Feature: Ability to track changed memory pages. Purpose: Efficiently compute checkpoint deltas.
     • Feature: Ability to suspend and inspect other threads. Purpose: Capture consistent snapshot.
     • Feature: Determinism when API calls are replayed. Purpose: Prevent divergence on failover.
     • Feature: Host state either replayable or regeneratable. Purpose: Recreate host state on backup.
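The first feature on slide 22, using changed-page tracking to compute checkpoint deltas, can be sketched as follows. This is an illustration under stated assumptions rather than Tardigrade's code: memory is modeled as a `bytearray`, the set of dirty page numbers is assumed to come from the LVM API's page tracking, and `PAGE_SIZE`, `compute_delta`, and `apply_delta` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def compute_delta(memory, dirty_pages):
    """Return {page_number: page_bytes} for pages changed since the last checkpoint.

    `dirty_pages` is assumed to be supplied by the API's changed-page tracking.
    """
    delta = {}
    for page in sorted(dirty_pages):
        start = page * PAGE_SIZE
        delta[page] = bytes(memory[start:start + PAGE_SIZE])
    return delta


def apply_delta(memory, delta):
    """Bring a backup copy of memory up to date with a received delta."""
    for page, data in delta.items():
        start = page * PAGE_SIZE
        memory[start:start + len(data)] = data
```

Only the dirty pages cross the network, so the cost of a checkpoint scales with how much memory the guest touched since the previous one rather than with total memory size.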

  23. Features may not always be in LVM APIs
     • Ability to track changed memory pages: no workaround listed.
     • Missing ability to suspend and inspect other threads. Workaround: Use exceptions, pre-checkpointing.
     • Non-determinism when API calls are replayed. Workaround: Hide non-determinism.
     • Host state not replayable or regeneratable. Workaround: Expose divergence as error condition.

  24. To capture a checkpoint, we must quiesce and capture all threads’ state. What if the API doesn’t let a thread suspend and inspect another thread? Diagram (primary): Guest (service + library OS) with its memory, Checkpointing layer, Host.

  25. We can use exceptions to quiesce guest threads. Diagram (primary): Guest (service + library OS), Checkpointing layer, Host; label: Checkpoint.

  26. Exception handler quiesces and captures each guest thread’s state. Diagram (primary): Guest (service + library OS), Checkpointing layer, Host; labels: Checkpoint, Memory, ExceptionHandler( , ).

  27. Synchronous system calls complicate quiescence. Diagram (primary): Guest (service + library OS), Checkpointing layer, Host.

  28. The wait system call is easy to deal with. Diagram (primary): the guest (service + library OS) calls select() with file descriptor list 0x1AC, 0x3BB, 0x907; below the checkpointing layer, the list passed to the host is 0x1AC, 0x3BB, 0x907, time-to-checkpoint.
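One reading of slide 28 is that the checkpointing layer appends its own wake-up descriptor to the guest's wait list, so a blocked thread returns when it is time to checkpoint. The sketch below illustrates that idea with Python's standard `select` and `socket` modules. It is an assumption about the mechanism, and `CheckpointAwareWaiter`, `request_checkpoint`, and `wait` are hypothetical names.

```python
import select
import socket


class CheckpointAwareWaiter:
    """Hypothetical wrapper around a wait call that adds a checkpoint wake-up."""

    def __init__(self):
        # Wake-up channel owned by the checkpointing layer (assumed design).
        self._wake_recv, self._wake_send = socket.socketpair()

    def request_checkpoint(self):
        # Make the wake-up descriptor readable so any blocked wait returns.
        self._wake_send.send(b"c")

    def wait(self, guest_socks, timeout=None):
        """Wait on the guest's sockets plus the wake-up descriptor.

        Returns (ready_guest_socks, checkpoint_requested).
        """
        watched = list(guest_socks) + [self._wake_recv]
        readable, _, _ = select.select(watched, [], [], timeout)
        checkpoint_requested = any(s is self._wake_recv for s in readable)
        if checkpoint_requested:
            self._wake_recv.recv(1)  # consume the wake-up byte
        ready = [s for s in readable if s is not self._wake_recv]
        return ready, checkpoint_requested
```

The guest only ever sees its own descriptors in the result; the extra entry is filtered out before returning, which keeps the interposition invisible to the library OS.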

  29. General synchronous system calls require pre-checkpointing. Diagram (primary): Guest (service + library OS), Checkpointing layer, Host.

  30. API non-determinism undermines replay. Example: CreateSemaphore() returns descriptor 0xAAA on the primary but descriptor 0xBBB on the backup.

  31. An indirection table can hide non-determinism. On each machine the checkpointing layer sits between the Guest (service + library OS) and the Host and maps guest descriptors to host descriptors.
     • Primary: guest descriptor 0x001 maps to host descriptor 0xAAA; 0x002 maps to 0x932.
     • Backup: guest descriptor 0x001 maps to host descriptor 0xBBB; 0x002 maps to 0x909.
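The indirection table on slide 31 can be sketched as a small mapping kept by the checkpointing layer. The guest sees stable descriptors that the layer assigns itself, while the host descriptors behind them may differ between primary and backup. `DescriptorTable` and its methods are hypothetical names, and the sequential assignment policy is an assumption.

```python
class DescriptorTable:
    """Hypothetical guest-to-host descriptor map kept by the checkpointing layer."""

    def __init__(self):
        self._guest_to_host = {}
        self._next_guest = 1  # guest descriptors are assigned deterministically

    def register(self, host_descriptor):
        # Hand the guest a stable descriptor in place of the host's value.
        guest = self._next_guest
        self._next_guest += 1
        self._guest_to_host[guest] = host_descriptor
        return guest

    def to_host(self, guest_descriptor):
        # Translate on every API call the guest makes.
        return self._guest_to_host[guest_descriptor]

    def rebind(self, guest_descriptor, new_host_descriptor):
        # On replay, point the same guest descriptor at the backup's host value.
        self._guest_to_host[guest_descriptor] = new_host_descriptor
```

With this in place, a replayed CreateSemaphore() call that yields 0xBBB instead of 0xAAA still hands the guest descriptor 0x001, so guest state captured in a checkpoint remains valid on the backup.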

  32. State external to guest needs to be replayable or regeneratable. The API provides sockets, not packets, so the checkpointer can’t capture TCP session state! Diagram: Guest (service + library OS), LVM API, Checkpointing layer, LVM API, Host; the TCP session state lives in the host on the primary.

  33. System-specific modifications may be necessary. TCP connections get dropped on a failover. Fixing this requires a major API change to make it use packets rather than sockets. Diagram: Guest (service + library OS), Checkpointing layer, and Host on both primary and backup; TCP session state lives in the host.

  34. Outline • Motivation • Background: Asynchronous VM replication • Our solution: Lightweight VM replication • Challenges and solutions • Evaluation

  35. Effect of external processes: Remus. [Figure: bar chart of ping latency (ms, log scale from 1 to 10000) at the 50th, 95th, 99th, and 99.9th quantiles for five workloads: Baseline, Safety Scan, Search Indexer, Update, Deduplication. Bar labels range from 43 ms to 9697 ms.]
