A Lattice-Theoretic Approach to Monitoring Distributed Computations
Vijay K. Garg, Parallel and Distributed Systems Lab, Department of Electrical and Computer Engineering, The University of Texas at Austin
Neeraj Mittal, Advanced Networking and Dependable Systems Laboratory, Computer Science Department, The University of Texas at Dallas
RV’14 Tutorial (Garg and Mittal) Monitoring Distributed Computations
Motivation
Debugging and testing distributed programs:
- Global breakpoints: stop the program when $x_1 + x_2 > x_3$.
- Traces need to be analyzed to locate bugs.
Software fault tolerance:
- Distributed programs are prone to errors: concurrency, nondeterminism, process and channel failures.
- Software faults are a dominant cause of system outages.
- Corrective action is needed when the current computation violates a safety invariant.
Software quality assurance:
- Can I trust the results of the computation? Does it satisfy all required properties?
What is a Distributed Computation?
- Distributed program: a computer program that runs on a distributed system.
- Distributed computation: a single execution of a distributed program.
- Assumptions: no shared memory, no shared clock, asynchronous communication.
Modeling a Distributed Computation
A computation is $(E, \to)$, where $E$ is the set of events and $\to$ (happened-before) is the smallest relation that satisfies:
- if $e$ occurred before $f$ in the same process, then $e \to f$;
- if $e$ is a send event and $f$ is the corresponding receive event, then $e \to f$;
- if there exists $g$ such that $e \to g$ and $g \to f$, then $e \to f$.
[Figure: time diagram with processes $P_1$ (events $e_1, \ldots, e_5$), $P_2$ ($f_1, \ldots, f_5$) and $P_3$ ($g_1, \ldots, g_4$)] [Lamport 78]
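The three rules translate directly into a small program. The sketch below uses a hypothetical helper, `happened_before`, that takes each process's events in local order plus a list of (send, receive) pairs and returns $\to$ as a set of pairs. It computes the transitive closure with a Warshall-style triple loop, which is only sensible for small examples.

```python
def happened_before(processes, messages):
    """Return the happened-before relation as a set of (e, f) pairs.

    processes: list of event lists, one per process, in local execution order.
    messages:  list of (send_event, receive_event) pairs.
    """
    hb = set()
    # Rule 1: events of the same process are ordered. Adding only consecutive
    # pairs is enough, since Rule 3 (transitivity) supplies the rest.
    for events in processes:
        hb.update(zip(events, events[1:]))
    # Rule 2: a send event precedes the corresponding receive event.
    hb.update(messages)
    # Rule 3: transitive closure (Warshall's algorithm, cubic in |E|).
    all_events = [e for events in processes for e in events]
    for g in all_events:
        for e in all_events:
            if (e, g) in hb:
                for f in all_events:
                    if (g, f) in hb:
                        hb.add((e, f))
    return hb


# Hypothetical two-process computation: e1 sends to f2, f3 sends to e3.
hb = happened_before([["e1", "e2", "e3"], ["f1", "f2", "f3"]],
                     [("e1", "f2"), ("f3", "e3")])
print(("e1", "f3") in hb)  # True: e1 -> f2 -> f3
print(("e2", "f3") in hb or ("f3", "e2") in hb)  # False: e2 and f3 are concurrent
```

Because the relation is built only from forward edges of an acyclic trace, no pair $(e, e)$ ever appears, matching the irreflexive poset view on the next slide.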
Modeling a Computation as a Poset
- $(E, \to)$ is an irreflexive poset: $\to$ is an irreflexive and transitive binary relation on $E$.
- Can we exploit the theory of ordered sets? Join/meet of elements, width of a poset, dimension of a poset, order ideals.
- Example: an order ideal of the poset corresponds to a consistent global state, and the set of all order ideals forms a distributive lattice under the set containment relation.
- Can we exploit the theory of distributive lattices for analyzing consistent global states? Representing sublattices, lattice congruences.
Talk Outline
1. Motivation
2. Background: Posets and Lattices
3. Global Predicate Detection Problem
   - Cooper and Marzullo’s Algorithm
   - Alagar and Venkatesan’s Algorithm
   - Lexical Enumeration of Consistent Global States
4. Predicate Detection for Special Classes
   - Linear Predicates
   - Relational Predicates
5. Slicing
6. Basis Temporal Logic
   - Syntax and Semantics
   - Semiregular Predicates
   - Algorithm to detect BTL
Background: Posets
A poset (partially ordered set) is a pair $(X, \le)$, where $X$ is any set and $\le$ is a binary relation on $X$ that is reflexive, antisymmetric and transitive. $(X, <)$ is an irreflexive poset when $<$ is irreflexive and transitive.
Examples:
1. $(\mathbb{N}, <)$: the set of natural numbers under the usual less-than relation.
2. $(\mathbb{N}^k, <)$: the set of $k$-dimensional vectors under component-wise comparison, where $u < v$ iff $u[i] \le v[i]$ for all $i$ and $u \ne v$. For example, $(2, 3, 0) < (3, 3, 1)$ but $(2, 3, 0) \not< (1, 4, 2)$.
3. $(E, \to)$: the set of events of a distributed computation under the happened-before relation.
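The component-wise order of example 2 can be checked mechanically. A minimal sketch; the function name `vec_lt` is ours, not the tutorial's:

```python
def vec_lt(u, v):
    """Component-wise order on k-dimensional vectors:
    u < v iff u[i] <= v[i] for every i and u != v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return all(a <= b for a, b in zip(u, v)) and tuple(u) != tuple(v)


print(vec_lt((2, 3, 0), (3, 3, 1)))  # True
print(vec_lt((2, 3, 0), (1, 4, 2)))  # False
print(vec_lt((1, 4, 2), (2, 3, 0)))  # False, so the two vectors are incomparable
print(vec_lt((2, 3, 0), (2, 3, 0)))  # False: the relation is irreflexive
```

The second components of $(2, 3, 0)$ and $(3, 3, 1)$ are equal, which is why the definition compares each component with $\le$ and separately requires $u \ne v$.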
Background: Poset Terminology
[Figure: a poset on elements $a_1, a_2, a_3, b_1, b_2, b_3$]
- $x \parallel y$ ($x$ is incomparable with $y$): $\neg(x < y) \wedge \neg(y < x)$.
- Chain: $Y \subseteq X$ is a chain if every distinct pair of elements from $Y$ is comparable.
- Antichain: $Y \subseteq X$ is an antichain if every distinct pair of elements from $Y$ is incomparable.
- Height of a poset: the size of the longest chain in the poset.
- Width of a poset: the size of the largest antichain in the poset.
- Width antichain: an antichain whose size equals the width.
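For tiny posets, height and width can be computed straight from the definitions by trying every subset. The sketch below does exactly that; the poset in the example is hypothetical (two chains with one cross edge), not the one in the figure, and the exponential enumeration is only for illustration.

```python
from itertools import combinations


def is_chain(subset, less):
    """Every distinct pair is comparable."""
    return all(less(x, y) or less(y, x) for x, y in combinations(subset, 2))


def is_antichain(subset, less):
    """Every distinct pair is incomparable."""
    return all(not less(x, y) and not less(y, x) for x, y in combinations(subset, 2))


def height_and_width(elements, less):
    """Height and width by brute force over all subsets (exponential; tiny posets only)."""
    height = width = 0
    for r in range(1, len(elements) + 1):
        for subset in combinations(elements, r):
            if is_chain(subset, less):
                height = max(height, r)
            if is_antichain(subset, less):
                width = max(width, r)
    return height, width


# Hypothetical poset: chains p1 < p2 < p3 and q1 < q2 < q3, plus p1 < q2 (hence p1 < q3).
PAIRS = {("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
         ("q1", "q2"), ("q2", "q3"), ("q1", "q3"),
         ("p1", "q2"), ("p1", "q3")}


def less(x, y):
    return (x, y) in PAIRS


print(height_and_width(["p1", "p2", "p3", "q1", "q2", "q3"], less))  # (3, 2)
```

Any three elements here include two from the same chain, so no antichain has more than two elements, while $\{p_2, q_1\}$ is a width antichain.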
Background: Lattices
For any $z \in X$, $z$ is the join of $x$ and $y$, written $z = x \sqcup y$, iff
$x \le z \wedge y \le z$ and $\forall z' \in X: (x \le z' \wedge y \le z') \Rightarrow z \le z'$.
The meet $z = x \sqcap y$ is defined dually.
A poset $(X, \le)$ is a lattice iff every pair of elements has a join and a meet in $X$:
$\forall x, y \in X: x \sqcup y \in X \wedge x \sqcap y \in X$.
[Figure: Hasse diagrams of example posets on elements $a$ to $f$]
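The join definition can be applied literally in a finite poset: collect the upper bounds, then keep the one below all others. A brute-force sketch with hypothetical helper names; the examples (divisors of 12, and a four-element non-lattice) are ours:

```python
def join(x, y, elements, leq):
    """Least upper bound of x and y in a finite poset, or None if it does not exist."""
    ubs = [z for z in elements if leq(x, z) and leq(y, z)]
    least = [z for z in ubs if all(leq(z, w) for w in ubs)]
    return least[0] if least else None


def meet(x, y, elements, leq):
    """Greatest lower bound of x and y (the dual of join), or None."""
    lbs = [z for z in elements if leq(z, x) and leq(z, y)]
    greatest = [z for z in lbs if all(leq(w, z) for w in lbs)]
    return greatest[0] if greatest else None


def is_lattice(elements, leq):
    """A finite poset is a lattice iff every pair has a join and a meet."""
    return all(join(x, y, elements, leq) is not None and meet(x, y, elements, leq) is not None
               for x in elements for y in elements)


# Divisors of 12 ordered by divisibility form a lattice (join = lcm, meet = gcd).
divisors = [1, 2, 3, 4, 6, 12]


def divides(x, y):
    return y % x == 0


print(join(4, 6, divisors, divides), meet(4, 6, divisors, divides))  # 12 2
print(is_lattice(divisors, divides))  # True

# Hypothetical poset a < c, a < d, b < c, b < d: a and b have two minimal upper bounds.
elems = ["a", "b", "c", "d"]
ORDER = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}


def leq_abcd(x, y):
    return x == y or (x, y) in ORDER


print(join("a", "b", elems, leq_abcd), is_lattice(elems, leq_abcd))  # None False
```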
Background: Sublattices
A nonempty subset $S$ of a lattice $L$ is a sublattice if it is closed under the join and meet of $L$. Which subsets form sublattices?
[Figure: four Hasse diagrams (i) to (iv) on elements $a, b, c, d$]
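Closure under the parent lattice's operations is easy to test. A sketch, assuming a hypothetical parent lattice (divisors of 12, with join = lcm and meet = gcd) rather than the diagrams in the figure:

```python
from itertools import combinations
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def is_sublattice(subset, join, meet):
    """A nonempty subset is a sublattice iff it is closed under the parent lattice's join and meet."""
    s = set(subset)
    return bool(s) and all(join(x, y) in s and meet(x, y) in s
                           for x, y in combinations(s, 2))


# Parent lattice (hypothetical example): divisors of 12 under divisibility.
print(is_sublattice({1, 2, 3, 6}, lcm, gcd))   # True
print(is_sublattice({1, 4, 6, 12}, lcm, gcd))  # False: gcd(4, 6) = 2 is missing
```

The second subset is a lattice in its own induced order (there the meet of 4 and 6 is 1), yet it is not a sublattice, because the parent lattice's meet of 4 and 6 is 2. This is the distinction the question on the slide turns on.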
Background: Distributive Lattices
A lattice $(L, \le)$ is a distributive lattice iff $\forall x, y, z \in L: x \sqcup (y \sqcap z) = (x \sqcup y) \sqcap (x \sqcup z)$.
Fact: a lattice is distributive iff it does not have a pentagon or a diamond as a sublattice.
[Figure: Examples of nondistributive lattices: the pentagon and the diamond, each with bottom $0$, top $1$ and middle elements $p, q, r$]
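The distributive law can be checked exhaustively on small lattices. In the sketch below, join and meet are derived from the order by brute force, and the labelling of the pentagon (which middle elements are ordered) is our own assumption, not read from the figure:

```python
from itertools import product


def lattice_ops(elements, leq):
    """Brute-force join and meet for a finite lattice given its order relation."""
    def join(x, y):
        ubs = [z for z in elements if leq(x, z) and leq(y, z)]
        return next(z for z in ubs if all(leq(z, w) for w in ubs))

    def meet(x, y):
        lbs = [z for z in elements if leq(z, x) and leq(z, y)]
        return next(z for z in lbs if all(leq(w, z) for w in lbs))

    return join, meet


def is_distributive(elements, leq):
    """Check x ⊔ (y ⊓ z) = (x ⊔ y) ⊓ (x ⊔ z) for all x, y, z."""
    join, meet = lattice_ops(elements, leq)
    return all(join(x, meet(y, z)) == meet(join(x, y), join(x, z))
               for x, y, z in product(elements, repeat=3))


ELEMS = ["0", "p", "q", "r", "1"]


def diamond_leq(x, y):
    # Diamond: 0 < p, q, r < 1, with p, q, r pairwise incomparable.
    return x == y or x == "0" or y == "1"


def pentagon_leq(x, y):
    # Pentagon (assumed labelling): 0 < p < q < 1 and 0 < r < 1.
    return x == y or x == "0" or y == "1" or (x, y) == ("p", "q")


print(is_distributive(ELEMS, diamond_leq))   # False: p ⊔ (q ⊓ r) = p, (p ⊔ q) ⊓ (p ⊔ r) = 1
print(is_distributive(ELEMS, pentagon_leq))  # False: p ⊔ (r ⊓ q) = p, (p ⊔ r) ⊓ (p ⊔ q) = q
print(is_distributive([1, 2, 3, 4, 6, 12], lambda x, y: y % x == 0))  # True
```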
Background: Order Ideals of a Poset
[Figure: a poset on elements $a_1, a_2, a_3, b_1, b_2, b_3$]
Let $(X, <)$ be any poset. A subset $Y \subseteq X$ is an order ideal (or a down-set) if $z \in Y \wedge y < z \Rightarrow y \in Y$.
Are these order ideals? $Y_1 = \{a_1, b_1\}$, $Y_2 = \{a_1, a_3, b_1\}$, $Y_3 = \{\}$, $Y_4 = X$.
Dually, $Y \subseteq X$ is an order filter if $z \in Y \wedge z < y \Rightarrow y \in Y$.
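Both definitions are a single closure condition each. A sketch on a hypothetical four-element poset (deliberately not the one in the figure, whose order relation is not recoverable here):

```python
def is_order_ideal(subset, elements, less):
    """Y is an order ideal iff z in Y and y < z imply y in Y."""
    s = set(subset)
    return all(y in s for z in s for y in elements if less(y, z))


def is_order_filter(subset, elements, less):
    """Y is an order filter iff z in Y and z < y imply y in Y."""
    s = set(subset)
    return all(y in s for z in s for y in elements if less(z, y))


# Hypothetical poset: a < c, b < c, b < d.
ELEMENTS = ["a", "b", "c", "d"]
ORDER = {("a", "c"), ("b", "c"), ("b", "d")}


def less(x, y):
    return (x, y) in ORDER


print(is_order_ideal({"b", "d"}, ELEMENTS, less))  # True
print(is_order_ideal({"a", "c"}, ELEMENTS, less))  # False: b < c but b is missing
print(is_order_ideal(set(), ELEMENTS, less), is_order_ideal(ELEMENTS, ELEMENTS, less))  # True True
print(is_order_filter({"c", "d"}, ELEMENTS, less))  # True
```

The empty set and the whole poset pass the ideal test in any poset, since there is nothing to check for the former and nothing missing from the latter.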
Example: Order Ideals as Consistent Global States (CGS) of a Distributed System
[Figure: processes $P_1, P_2, P_3$ exchanging messages $m_1, m_2, m_3$, with two cuts $G_1$ and $G_2$]
A global state is the subset of events executed so far. A subset $G$ of $E$ is a consistent global state (also called a consistent cut) if
$\forall e, f \in E: (f \in G) \wedge (e \to f) \Rightarrow (e \in G)$,
that is, if $G$ is an order ideal of $(E, \to)$.
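Because the events of each process are totally ordered, a global state can be stored as the number of events each process has executed. With that encoding, which is our assumption rather than the slide's notation, only messages need checking: the cut is consistent iff no message is received inside the cut but sent outside it. Processes are indexed from 0 in the sketch.

```python
def is_consistent(cut, messages):
    """cut[i] = number of events process i has executed (a prefix of its events).

    messages: list of ((i, k), (j, l)) meaning the k-th event of process i sends
    a message that is received by the l-th event of process j (1-based positions).
    Process order is respected automatically by using prefixes, so only messages
    need checking: a receive inside the cut requires its send inside the cut.
    """
    return all(k <= cut[i] for (i, k), (j, l) in messages if l <= cut[j])


# Hypothetical three-process computation (processes 0, 1, 2):
# event 1 of process 0 sends to event 2 of process 1;
# event 3 of process 1 sends to event 1 of process 2.
msgs = [((0, 1), (1, 2)), ((1, 3), (2, 1))]
print(is_consistent((1, 2, 0), msgs))  # True
print(is_consistent((0, 2, 0), msgs))  # False: receive in the cut, send is not
print(is_consistent((1, 2, 1), msgs))  # False: process 2 received a message not yet sent
```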
Background: Lattice of Order Ideals
Theorem: the set of all order ideals of any poset forms a distributive lattice under the set containment relation.
- The set of ideals forms a lattice: if $X$ and $Y$ are ideals, then so are $X \cap Y$ and $X \cup Y$.
- Meet corresponds to intersection; join corresponds to union.
[Figure: a poset on elements $a_1, a_2, a_3, b_1, b_2, b_3$]
Example: $Y_1 = \{a_1, a_3, b_1\}$, $Y_2 = \{a_1, a_2, b_2\}$, $Y_1 \cup Y_2 = \{a_1, a_2, a_3, b_1, b_2\}$, $Y_1 \cap Y_2 = \{a_1\}$.
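The closure part of the theorem can be observed on a small case by enumerating every ideal. A brute-force sketch on a hypothetical poset ($a < c$, $b < c$, $b < d$), not the one in the figure:

```python
from itertools import combinations


def order_ideals(elements, less):
    """All order ideals (down-sets) of a finite poset, by brute force over subsets."""
    ideals = []
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            s = frozenset(subset)
            if all(y in s for z in s for y in elements if less(y, z)):
                ideals.append(s)
    return ideals


# Hypothetical poset: a < c, b < c, b < d.
ELEMENTS = ["a", "b", "c", "d"]
ORDER = {("a", "c"), ("b", "c"), ("b", "d")}
ideals = order_ideals(ELEMENTS, lambda x, y: (x, y) in ORDER)
print(len(ideals))  # 8

# Closure: the union and the intersection of two ideals are again ideals.
ideal_set = set(ideals)
print(all((x | y) in ideal_set and (x & y) in ideal_set
          for x in ideals for y in ideals))  # True
```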
Ideal Lattice
- The lattice of ideals is distributive because union distributes over intersection.
- Which of the following graphs are possible CGS lattices? [Figure: candidate lattice graphs]
- Corollary: the set of all CGS of a computation forms a distributive lattice.
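If a CGS is encoded as a vector of per-process event counts (an encoding we assume here), intersection and union of cuts become component-wise min and max. The sketch enumerates every CGS of a small hypothetical computation and checks that the set is closed under both operations, as the corollary predicts.

```python
from itertools import product


def is_consistent(cut, messages):
    # messages: ((i, k), (j, l)) = k-th event of process i sends to l-th event of process j.
    return all(k <= cut[i] for (i, k), (j, l) in messages if l <= cut[j])


def all_cgs(num_events, messages):
    """Every consistent cut, as a tuple of per-process event counts."""
    return [cut for cut in product(*(range(n + 1) for n in num_events))
            if is_consistent(cut, messages)]


def meet(u, v):
    return tuple(map(min, u, v))  # intersection of two cuts


def join(u, v):
    return tuple(map(max, u, v))  # union of two cuts


# Hypothetical computation: two processes with three events each and one message
# from the 2nd event of process 0 to the 2nd event of process 1.
cgs = all_cgs([3, 3], [((0, 2), (1, 2))])
cgs_set = set(cgs)
print(len(cgs))  # 12 of the 16 cuts are consistent
print(all(meet(u, v) in cgs_set and join(u, v) in cgs_set for u in cgs for v in cgs))  # True
```

The four excluded cuts are exactly those in which process 1 has received the message (count at least 2) while process 0 has not yet sent it (count below 2).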
Modeling Using States vs. Events
One can model a computation using states rather than events.
[Figure: $P_1$ executes x:=x+2, send(x), x:=x−1; $P_2$ executes y:=y+3, receive(y), y:=2*y. In the equivalent state-based model, the (pc, x) states of $P_1$ are (0,1), (1,3), (2,3), (3,2) and the (pc, y) states of $P_2$ are (0,1), (1,4), (2,3), (3,6).]
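The state-based model can be reproduced by recording the pair (pc, value) before the first statement and after each statement. A sketch, assuming (as the figure's states suggest) that x and y both start at 1 and that $P_1$ runs to completion before $P_2$ receives, which is one valid interleaving; the channel and helper names are hypothetical.

```python
from collections import deque

channel = deque()  # FIFO channel from P1 to P2 (a modeling assumption)


def send(x):
    channel.append(x)  # send(x): the message carries x; x itself is unchanged
    return x


def receive(_y):
    return channel.popleft()  # receive(y): y is overwritten by the received value


def run(initial, statements):
    """Return the (pc, value) state before the first statement and after each one."""
    value, states = initial, [(0, initial)]
    for pc, stmt in enumerate(statements, start=1):
        value = stmt(value)
        states.append((pc, value))
    return states


p1 = run(1, [lambda x: x + 2, send, lambda x: x - 1])
p2 = run(1, [lambda y: y + 3, receive, lambda y: 2 * y])
print(p1)  # [(0, 1), (1, 3), (2, 3), (3, 2)]
print(p2)  # [(0, 1), (1, 4), (2, 3), (3, 6)]
```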
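Consistent Global States in the State-Based Model
[Figure: (a) event-based model: one process executes events $a, b, c$ and the other executes $e, f, g$. (b) State-based model: the first process passes through states $a_0, a', b', c'$ and the second through $e_0, e', f', g'$.]
(c) CGS in the event-based model: $\{\}$, $\{a\}$, $\{e\}$, $\{a, b\}$, $\{a, e\}$, $\{a, b, c\}$, $\{a, b, e\}$, $\{a, b, c, e\}$, $\{a, b, e, f\}$, $\{a, b, c, e, f\}$, $\{a, b, e, f, g\}$, $\{a, b, c, e, f, g\}$.
(d) The corresponding CGS in the state-based model, in the same order: $\{a_0, e_0\}$, $\{a', e_0\}$, $\{a_0, e'\}$, $\{b', e_0\}$, $\{a', e'\}$, $\{c', e_0\}$, $\{b', e'\}$, $\{c', e'\}$, $\{b', f'\}$, $\{c', f'\}$, $\{b', g'\}$, $\{c', g'\}$.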
Global Predicate Detection
- Predicate: a global condition expressed using variables on processes, e.g., more than one process is in the critical section, or there is no token in the system.
- Problem: find a consistent cut that satisfies the given predicate.
[Figure: critical sections on processes $p_1$ and $p_2$, with markers X and Y]
- The global predicate may express a software fault or a global breakpoint.
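The detection problem can be stated as a search over consistent cuts. The sketch below is plain exhaustive enumeration, not any of the algorithms named in the outline; the trace, the per-process count encoding of cuts, and the predicate are hypothetical.

```python
from itertools import product


def is_consistent(cut, messages):
    # messages: ((i, k), (j, l)) = k-th event of process i sends to l-th event of process j.
    return all(k <= cut[i] for (i, k), (j, l) in messages if l <= cut[j])


def find_cut(local_states, messages, predicate):
    """Return the first consistent cut whose global state satisfies predicate, else None.

    local_states[i][c] is the state of process i after it has executed c events.
    Exhaustive enumeration: exponential in the number of processes.
    """
    ranges = [range(len(states)) for states in local_states]
    for cut in product(*ranges):
        if is_consistent(cut, messages):
            state = tuple(local_states[i][c] for i, c in enumerate(cut))
            if predicate(state):
                return cut
    return None


# Hypothetical trace: process 0 enters its critical section at event 1 and leaves at
# event 2; process 1 enters at event 2 and leaves at event 3.
states = [["out", "in", "out", "out"], ["out", "out", "in", "out"]]


def both_in_cs(s):
    return s.count("in") > 1  # more than one process is in the critical section


print(find_cut(states, [], both_in_cs))                  # (1, 2): a violation is possible
print(find_cut(states, [((0, 2), (1, 2))], both_in_cs))  # None: the message orders the sections
```

With the message from process 0's exit event to process 1's entry event, every cut in which process 1 is inside its critical section also contains process 0's exit, so the mutual-exclusion violation is impossible.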
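Two Interpretations of Predicates
[Figure: the lattice of global states drawn on a grid, with the local states $s_0, \ldots, s_3$ of one process on one axis and $t_0, \ldots, t_3$ of the other on the other axis, starting from (0,0)]
A path is a sequence of consistent global states in which each state follows from the previous one by executing a single event.
- Possibly:$\Phi$: there exists a path from the initial state to the final state along which $\Phi$ is true in some state.
- Definitely:$\Phi$: for all paths from the initial state to the final state, $\Phi$ is true in some state.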
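Both modalities can be decided on small examples by traversing the lattice of consistent cuts. This is a brute-force sketch, not one of the algorithms named in the outline, and it assumes cuts are encoded as per-process event counts. Possibly holds iff some CGS satisfies $\Phi$ (every CGS lies on some path); Definitely fails iff the final cut is reachable by stepping only through CGS where $\Phi$ is false.

```python
from collections import deque
from itertools import product


def is_consistent(cut, messages):
    # messages: ((i, k), (j, l)) = k-th event of process i sends to l-th event of process j.
    return all(k <= cut[i] for (i, k), (j, l) in messages if l <= cut[j])


def holds(phi, local_states, cut):
    return phi(tuple(local_states[i][c] for i, c in enumerate(cut)))


def possibly(local_states, messages, phi):
    """Some consistent cut satisfies phi."""
    ranges = [range(len(s)) for s in local_states]
    return any(is_consistent(c, messages) and holds(phi, local_states, c)
               for c in product(*ranges))


def definitely(local_states, messages, phi):
    """Every path from the initial to the final cut passes through a phi-state."""
    final = tuple(len(s) - 1 for s in local_states)
    start = tuple(0 for _ in local_states)
    if holds(phi, local_states, start):
        return True
    # BFS restricted to consistent cuts where phi is false, one event per step.
    seen, queue = {start}, deque([start])
    while queue:
        cut = queue.popleft()
        if cut == final:
            return False  # found a path that avoids phi everywhere
        for i in range(len(cut)):
            if cut[i] < final[i]:
                nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                if (nxt not in seen and is_consistent(nxt, messages)
                        and not holds(phi, local_states, nxt)):
                    seen.add(nxt)
                    queue.append(nxt)
    return True


# Hypothetical: each process sets a flag from 0 to 1 in its only event.
states = [[0, 1], [0, 1]]


def phi(s):
    return s == (1, 0)  # process 0 has set its flag, process 1 has not


print(possibly(states, [], phi), definitely(states, [], phi))  # True False
print(definitely(states, [((0, 1), (1, 1))], phi))  # True: the message forces (1, 0)
```

Without the message, the path through (0, 1) avoids $\Phi$, so only Possibly:$\Phi$ holds; adding a message from process 0's event to process 1's event makes (0, 1) inconsistent, and every remaining path passes through (1, 0).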