Co-design and Verification of an Available File System
Mahsa Najafzadeh, Marc Shapiro, and Patrick Eugster

File System Replication
– Low latency
– High availability
– Fault tolerance
[Figure: replicated directories labelled Pictures and Tool]

POSIX File Systems vs. Distribution
POSIX:
• Assumes operations occur in a total order
• Requires a synchronous, strong consistency model
• Synchronisation is costly and not available under partition
• In practice, concurrency conflicts are rare
Distribution:
• No synchronisation: processes an update locally, propagates effects to other replicas later.
• Weakens consistency and causes conflicts

Conflict Example= removing a directory while adding a file into the directory
[Figure: labels Remove Pictures, Add Photo, Update/Remove Conflict; directories Pictures and Tools, file IMG_1234.jpg]
Safety
• Convergent: do replicas that delivered the same updates have the same state?
• Is the invariant preserved?
  • Sequential: single operation in isolation maintains the invariant
  • Concurrent execution maintains the invariant

Tree Invariant
• Has a fixed root node
• Root is an ancestor of every node in the tree (reachability)
• Every node that has a name has exactly one parent, except the root
• No cycle in the directory structure
• Unique names within a directory

Example= sequential move operation fails
[Figure: tree with nodes root, C, A, B; applying mvDir(C,A) unconditionally violates the invariant I (✘)]

Example= do not move directory under self
C is NOT ancestor of A: ¬ (C ↓* A)
C ↓* A : A is reachable from C
[Figure: tree with nodes root, C, A, B; mvDir(C,A) is guarded by u PRE before u eff is applied, so I is preserved]
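The tree invariant and the mvDir precondition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: the parent-map encoding, the function names, and the choice to treat ↓* as reflexive (so a directory cannot be moved under itself) are all assumptions made for the example. Names are taken to be globally unique, so the "unique names within a directory" clause is not checked.

```python
from typing import Dict, Optional

ROOT = "root"

# Hypothetical encoding: each directory maps to its parent; the root maps to None.
Tree = Dict[str, Optional[str]]


def is_ancestor(tree: Tree, a: str, b: str) -> bool:
    """True if a ↓* b, i.e. b is reachable from a (assumed reflexive)."""
    seen = set()
    node: Optional[str] = b
    while node is not None and node not in seen:
        if node == a:
            return True
        seen.add(node)          # guards against looping forever on a cycle
        node = tree.get(node)
    return False


def tree_invariant(tree: Tree) -> bool:
    """Fixed root, and the root reaches every node (which also rules out cycles)."""
    if tree.get(ROOT, "missing") is not None:
        return False
    return all(is_ancestor(tree, ROOT, n) for n in tree)


def mv_dir(tree: Tree, src: str, dst: str) -> Tree:
    """Effect of mvDir(src, dst): re-parent src under dst, with no check."""
    return {**tree, src: dst}


def mv_dir_pre(tree: Tree, src: str, dst: str) -> bool:
    """Precondition from the slides: src is NOT an ancestor of dst."""
    return src != ROOT and not is_ancestor(tree, src, dst)


if __name__ == "__main__":
    tree = {"root": None, "C": "root", "A": "C", "B": "C"}
    # mvDir(C, A) would put C under its own descendant A.
    print(mv_dir_pre(tree, "C", "A"), tree_invariant(mv_dir(tree, "C", "A")))
```

With the precondition as a guard, a sequential mvDir is only applied in states where the effect keeps the tree rooted and acyclic.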
Example= concurrent moves fails
B is NOT ancestor of A
mvDir PRE : ¬ (B ↓* A)
B ↓* A : A is reachable from B
[Figure: A and B are both children of root. Replica r1 runs mvDir(B,A) and replica r2 concurrently runs mvDir(A,B). u PRE holds at each origin replica, but the invariant I is violated (✘) once both effects have been applied.]

Concurrency Control
Tokens ≈ concurrency control abstractions
Tokens = { τ , …}
Conflict relation ⋈ ⊆ Tokens × Tokens
Example - mutual exclusion tokens: Tokens = { τ }; τ ⋈ τ
An operation’s generator may acquire a set of tokens
Operations associated with conflicting tokens cannot be concurrent
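The token abstraction above can be made concrete with a small sketch. The representation (a set of unordered token pairs for ⋈) and the function names are assumptions for illustration only; the slides define the relation abstractly.

```python
from typing import FrozenSet, Iterable, Set

Token = str
# Symmetric conflict relation ⋈, stored as unordered pairs of tokens.
Conflicts = Set[FrozenSet[Token]]


def conflict(rel: Conflicts, t1: Token, t2: Token) -> bool:
    """True if t1 ⋈ t2."""
    return frozenset((t1, t2)) in rel


def may_run_concurrently(rel: Conflicts,
                         held_u: Iterable[Token],
                         held_v: Iterable[Token]) -> bool:
    """Two operations may overlap only if none of their tokens conflict."""
    held_v = list(held_v)
    return not any(conflict(rel, a, b) for a in held_u for b in held_v)


if __name__ == "__main__":
    # Mutual exclusion: a single token that conflicts with itself (τ ⋈ τ).
    mutex: Conflicts = {frozenset(("τ",))}
    print(may_run_concurrently(mutex, ["τ"], ["τ"]))  # both hold τ
    print(may_run_concurrently(mutex, ["τ"], []))     # second holds nothing
```

An operation that acquires no tokens never conflicts with anything, which is how the asynchronous operations in this design stay available.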
Example= moving a directory while updating its content is safe
[Figure: A and B are both children of root. Replica r1 runs mvDir(B, A) while replica r2 concurrently runs addFile(f,B). After both effects are delivered, each replica has B under A with f inside B.]

When is Synchronization Necessary?
• CAP theorem: Either (Strong) Consistency or Availability, not both, when Partitions occur
• This is a design trade-off
Our approach:
• Synchronize (CP) only operations where strictly necessary for safety
• Other operations are asynchronous (AP)
Safety = convergent + invariants
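The claim that a move and a concurrent file addition are safe together can be illustrated by checking that their effects commute. The sketch below assumes a simplified state (one parent map for directories, one map from file name to containing directory) and hypothetical helper names; it is not the deck's formal model.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical state: (parent of each directory, directory of each file).
State = Tuple[Dict[str, Optional[str]], Dict[str, str]]
Effect = Callable[[State], State]


def mv_dir_eff(src: str, dst: str) -> Effect:
    """Effect of mvDir(src, dst): re-parent the directory src under dst."""
    def apply(state: State) -> State:
        dirs, files = state
        return ({**dirs, src: dst}, dict(files))
    return apply


def add_file_eff(name: str, directory: str) -> Effect:
    """Effect of addFile(name, directory): record the file in that directory."""
    def apply(state: State) -> State:
        dirs, files = state
        return (dict(dirs), {**files, name: directory})
    return apply


def commute(u: Effect, v: Effect, state: State) -> bool:
    """Do the two effects yield the same state in either delivery order?"""
    return u(v(state)) == v(u(state))


if __name__ == "__main__":
    start: State = ({"root": None, "A": "root", "B": "root"}, {})
    move, add = mv_dir_eff("B", "A"), add_file_eff("f", "B")
    print(commute(move, add, start))
```

Because the file is attached to the directory B rather than to a path, it follows B wherever the move puts it, so both delivery orders end in the same tree.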
Model
[Figure: a client submits operation u to the origin replica r1, which checks the precondition u PRE, returns u val to the client, and applies u eff. The effect u eff is then delivered to the other replicas r2 and r3, along with the effect v eff of another operation v. Safety is considered at every replica.]
Generator (@origin) reads state from one copy and maps operation u to:
Return value: u val ∈ State ➞ Value
Effects: u eff ∈ State ➞ (State ➞ State)
Deliver (@all replicas): causally dependent messages delivered in order

A Mostly-Available, Convergent and Correct File System Design
• Allows common file system operations to run without synchronization, except for moves
• Maintains the tree invariant
• Guarantees convergence using replicated data types [Shapiro + 2011]
• Name conflicts:
  • Merge directories
  • Rename files
• Update/Remove conflicts: add-wins directory

Add-wins directory= removing a directory while adding a file into the directory
[Figure: labels Remove Pictures, Add Photo, Update/Remove Conflict; directories Pictures and Tools, file IMG_1234.jpg]
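The generator/effect split in the model above can be sketched as follows. The `Replica` class, the flat file-to-directory state, and the `add_file` operation are hypothetical names chosen for the example, and causal delivery is not modelled: the caller is assumed to deliver effects in a causally consistent order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

State = Dict[str, str]  # hypothetical state: file name -> directory
Effect = Callable[[State], State]
Generator = Callable[[State], Tuple[object, Effect]]


@dataclass
class Replica:
    state: State = field(default_factory=dict)

    def generate(self, op: Generator) -> Tuple[object, Effect]:
        """Origin replica: run the generator on the local copy, apply u_eff."""
        value, effect = op(self.state)
        self.state = effect(self.state)
        return value, effect

    def deliver(self, effect: Effect) -> None:
        """Any other replica: apply an effect received from the origin."""
        self.state = effect(self.state)


def add_file(name: str, directory: str) -> Generator:
    """Operation u: its generator yields (u_val, u_eff) from the origin state."""
    def generator(state: State) -> Tuple[object, Effect]:
        already_there = state.get(name) == directory  # u_val, read at origin
        return already_there, lambda s: {**s, name: directory}  # u_eff
    return generator


if __name__ == "__main__":
    r1, r2 = Replica(), Replica()
    value, effect = r1.generate(add_file("f", "B"))
    r2.deliver(effect)
    print(value, r1.state == r2.state)
```

The return value is computed once, against the origin's state, while the effect is a pure function that every replica applies to its own copy.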
CISE Analysis: Proves Application is Correct
• Rely-Guarantee reasoning for a causally-consistent system with only polynomial complexity
• Consists of three analysis rules:
  Effector Safety: Every effect executed in isolation maintains the invariant I (sequential safety)
  Commutativity: Concurrent operations commute (convergence)
  Stability: Preconditions are stable under concurrency (concurrent safety)
If satisfied: the invariant I is guaranteed in every possible execution
[Gotsman et al. POPL 2016 ’Cause I’m Strong Enough: Reasoning about Consistency Choices in Distributed Systems]

Effector Safety: Example= move requires precondition
• do not move directory under self
[Figure: tree with nodes root, C, A, B; mvdir(C,A) is guarded by u PRE, so the invariant holds before and after u eff]
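The stability rule listed above asks whether one operation's precondition survives the effect of a concurrent operation. A minimal sketch for two moves follows; the parent-map encoding, the reflexive reading of ↓*, and the function names are assumptions for illustration, and this is a check on one concrete state rather than the CISE proof rule itself.

```python
from typing import Dict, Optional, Tuple

Tree = Dict[str, Optional[str]]  # hypothetical: directory -> parent, root -> None
Move = Tuple[str, str]           # (source, destination) of a mvDir


def is_ancestor(tree: Tree, a: str, b: str) -> bool:
    """True if b is reachable from a (a ↓* b), treated as reflexive."""
    seen = set()
    node: Optional[str] = b
    while node is not None and node not in seen:
        if node == a:
            return True
        seen.add(node)
        node = tree.get(node)
    return False


def mv_dir_pre(tree: Tree, src: str, dst: str) -> bool:
    return not is_ancestor(tree, src, dst)


def mv_dir_eff(tree: Tree, src: str, dst: str) -> Tree:
    return {**tree, src: dst}


def stable(tree: Tree, u: Move, v: Move) -> bool:
    """Does u's precondition still hold after v's effect has been applied?"""
    if not (mv_dir_pre(tree, *u) and mv_dir_pre(tree, *v)):
        return True  # vacuous: one of the moves would not be issued here
    return mv_dir_pre(mv_dir_eff(tree, *v), *u)


if __name__ == "__main__":
    tree = {"root": None, "A": "root", "B": "root", "C": "root"}
    print(stable(tree, ("B", "A"), ("A", "B")))  # the conflicting pair
    print(stable(tree, ("B", "A"), ("C", "B")))  # an unrelated move
```

When the check fails for a pair of operations, that pair is a candidate for conflicting tokens; when it holds, the pair can stay asynchronous.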
Stability Rule: precondition is stable under concurrent effect
1. Effector Safety: u eff preserves I when executed in any state satisfying u PRE
2. Precondition Stability: u PRE will hold when u eff is applied at any replica
Is u PRE preserved after executing v?
[Figure: at replica r1 in state σ, the precondition of u holds and u eff preserves I. At replica r2, v eff is applied to σ before u eff arrives, so the question is whether u PRE, and hence I, still holds there.]

Example: avoid conflicting moves
• Add tokens, avoid mvDir || mvDir
( τ (A) ⋈ τ (A) ), ( τ (B) ⋈ τ (B) )
[Figure: A and B are both children of root. Replica r1 runs mvDir(B,A) holding { τ (B), τ (A) } (✔). The concurrent mvDir(A,B) at replica r2 requires { τ (A), τ (B) } and is not allowed (✘).]

Necessary and Sufficient Concurrency Controls for Move
• A mutually exclusive token for each directory d ∈ Dir : ( τ (d) ⋈ τ (d) )
[Figure: tree in which the directories whose tokens a move acquires are marked T, up to LCA(A,B)]
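One way to read the per-directory tokens and the LCA figure above is sketched below. The exact token set is an assumption: here a move acquires τ(d) for the source, the destination, and every ancestor of either that lies strictly below their least common ancestor. The slides do not state whether the LCA itself is included, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

Tree = Dict[str, Optional[str]]  # hypothetical: directory -> parent, root -> None


def path_to_root(tree: Tree, node: str) -> List[str]:
    """node, its parent, ..., root (assumes the tree invariant holds)."""
    path = [node]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path


def lca(tree: Tree, a: str, b: str) -> str:
    """Least common ancestor of a and b."""
    ancestors_of_a = set(path_to_root(tree, a))
    return next(n for n in path_to_root(tree, b) if n in ancestors_of_a)


def move_tokens(tree: Tree, src: str, dst: str) -> Set[str]:
    """Directories d whose token τ(d) a mvDir(src, dst) acquires (assumed set)."""
    top = lca(tree, src, dst)
    tokens: Set[str] = {src, dst}
    for start in (src, dst):
        for node in path_to_root(tree, start):
            if node == top:
                break            # stop below the LCA (assumption)
            tokens.add(node)
    return tokens


def conflicting(tokens_u: Set[str], tokens_v: Set[str]) -> bool:
    """τ(d) ⋈ τ(d): two moves conflict when they share a directory token."""
    return bool(tokens_u & tokens_v)


if __name__ == "__main__":
    tree = {"root": None, "A": "root", "B": "root"}
    print(conflicting(move_tokens(tree, "B", "A"), move_tokens(tree, "A", "B")))
```

Under this reading, mvDir(B,A) and mvDir(A,B) both need τ(A) and τ(B), so they cannot run concurrently, while moves in disjoint subtrees share no token and stay independent.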
Verification Results
Applications         #OP  #Tokens  #Invariants  Anomaly           Average Time(ms)
Sequential           7    7        1            NO                278
Concurrent           7    0        1            safety violation  1297
Fully-Asynchronous   7    0        1            duplication       2350
Mostly-Asynchronous  7    2        1            NO                1570

Conclusion
• A rigorous approach for modeling file system behavior for both centralized/synchronous and replicated asynchronous semantics
• Common operations, except move, run without concurrency controls
• A hierarchical least-common ancestor concurrency control mechanism is necessary and sufficient for move operations

Future Work
• Translate the move concurrency controls into an efficient implementation
• Integrate hard links, devices, and mounts into model
• Reason about the file system behavior in the presence of failures

Q/A

Backup Slides
Removing Token Over Source Directory
[Figure: tree with nodes root, D, F, A, C, B, H. Replica r1 runs mvDir(A,B) holding { τ (B), τ (C) }, with no token on the source directory, while replica r2 concurrently runs mvDir(A,F) holding { τ (F) }. The resulting trees at r1 and r2 are shown.]
Slide date: 22/04/16
Removing Token Over Destination Directory
[Figure: tree with nodes root, D, F, A, C, B, H. Replica r1 runs mvDir(A,B) holding { τ (A), τ (C) }, with no token on the destination directory, while replica r2 concurrently runs mvDir(B,H) holding { τ (B), τ (A) }. The resulting trees at r1 and r2 are shown.]

Removing Token Over Ancestors up to LCA
[Figure: tree with nodes root, D, F, A, C, B, H. Replica r1 runs mvDir(A,B) holding { τ (A), τ (B) }, with no tokens on the ancestors up to the LCA, while replica r2 concurrently runs mvDir(C,H) holding { τ (C), τ (H) }. The resulting trees at r1 and r2 are shown.]