
Computing in a Distributed System in the Presence of Benign Failures - PowerPoint PPT Presentation

Computing in a Distributed System in the Presence of Benign Failures. Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL).


  1. Computing in a Distributed System in the Presence of Benign Failures • Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL)

  2. Distributed System • [Figure labels: medium of communication, computational unit] • No universal computational model for distributed systems

  3. Two Basic Principles • The model must specify why faults occur. Causes of two different natures: • Degree of synchronism • Failure model

  4. Two Basic Principles • The model must specify why faults occur. Causes of two different natures: • Degree of synchronism • Failure model

  5. Two Basic Principles • The model must specify why faults occur • The model must specify by whom (culprit) faults occur

  6. Two Basic Principles • The model must specify why faults occur • The model must specify by whom faults occur • The notion of faulty component is necessary and useful for the analysis of distributed computations

  7. First Principle • bounded delays (synchronous) • finite delays (asynchronous) • arbitrary delays (failure) • ... breaks the natural continuum from bounded to infinite delays!

  8. A classical type of systems Synchronous system + crash failures

  9. A classical type of systems Synchronous system + crash failures • transmission delays bounded • process speeds bounded or infinite

  10. First Principle • breaks the natural continuum from bounded to infinite delays • synchronism degree and failure model are not independent

  11. Second Principle • may lead to undesirable conclusions • Example: with only one transmission fault from each node, the send omission model considers each process faulty (no algorithm when the entire system is faulty)

  12. Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • Specification: “Every correct process eventually decides” • Case: one transmission failure for a message sent by p to q • Send omission model: p is allowed to make no decision • Link failure model: p and q must make a decision • Receive omission model: q is allowed to make no decision

  13. Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • real causes of transmission failures are often unknown

  14. Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • real causes of transmission failures are often unknown • no evidence that the notion of faulty component is helpful

  15. The Heard-Of Model • We just specify transmission faults: we no longer consider by whom or why faults occur

  16. HO: a Round-Based Model • [Figure: round r of process p, made of a sending phase (to all), a receive phase, and a local computation] • At each round, every process sends messages to all • This allows us to distinguish semantic and operational features of computations

  17. HO: a Round-Based Model • [Figure: round r of process p, made of a sending phase (to all), a receive phase, and a local computation] • If m is received at round r, then m has been sent at round r • Rounds are communication-closed layers

  18. First Principle • bounded delays (synchronous) • arbitrary delays (failure) • late messages are discarded [Dwork, Lynch & Stockmeyer, 1988] and [Gafni, 1998]
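
A minimal sketch of the idea that late messages are discarded, so that rounds behave as communication-closed layers. The `Message` type, the `sent_round` tag, and the `receive_phase` helper are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Message:
    sender: str      # hypothetical process identifier
    sent_round: int  # round in which the message was sent
    payload: Any


def receive_phase(current_round: int, inbox: List[Message]) -> Dict[str, Any]:
    """Keep only messages sent in the current round; anything else is dropped."""
    return {m.sender: m.payload for m in inbox if m.sent_round == current_round}


# Process p is in round 3; the message from p2 was sent in round 2 and arrives late.
inbox = [Message("p1", 3, "a"), Message("p2", 2, "late"), Message("p3", 3, "c")]
received = receive_phase(3, inbox)
heard_of = set(received)  # the senders p actually hears of in round 3
```

With this filtering, a late message is indistinguishable from a lost one: in both cases the sender is simply missing from the set of processes heard of in that round.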

  19. HO Process • $States_p$, $Init_p \subseteq States_p$ • $S_p : (s, q) \to m$ • $T_p : (s, \vec{\mu}) \to s'$ • [Figure: in round r, process p moves from state s to state s'] • At round r, process p receives messages from $HO(p, r)$: $\mathrm{supp}(\vec{\mu}) = HO(p, r)$

  20. Second Principle • Faults are specified but not the culprits [Santoro & Widmayer 1989]

  21. HO Algorithm • Distributed algorithm on $\Pi$: $A = (States_p, Init_p, S_p, T_p)_{p \in \Pi}$ • Run of algorithm A: $(s_p^0)_{p \in \Pi}$ with $s_p^0 \in Init_p$, together with $(HO(p, r))_{p \in \Pi,\, r > 0}$
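
The definition of a run can be sketched as follows: given the initial states and the collection of heard-of sets, every later state is determined by applying the sending and transition functions round by round. This is an illustrative sketch only; the `Process` class, the `run` function, and the max-propagation example are assumptions, and only a finite prefix of rounds is executed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set


@dataclass
class Process:
    init: Set[Any]                                    # Init_p
    send: Callable[[Any, Any], Any]                   # S_p(s, q) -> m
    transition: Callable[[Any, Dict[Any, Any]], Any]  # T_p(s, mu) -> s'


def run(algorithm: Dict[Any, Process],
        initial: Dict[Any, Any],
        ho: List[Dict[Any, Set[Any]]]) -> List[Dict[Any, Any]]:
    """Return the list of global states, one per round boundary."""
    for p, s in initial.items():
        if s not in algorithm[p].init:
            raise ValueError(f"{s!r} is not an initial state of {p!r}")
    states = dict(initial)
    trace = [states]
    for ho_round in ho:
        # mu is the partial vector received by p; its support is HO(p, r).
        states = {
            p: algorithm[p].transition(
                states[p],
                {q: algorithm[q].send(states[q], p) for q in ho_round[p]},
            )
            for p in algorithm
        }
        trace.append(states)
    return trace


# Hypothetical example algorithm: every process keeps the largest value it has seen.
proc = Process(init={0, 1},
               send=lambda s, q: s,
               transition=lambda s, mu: max([s, *mu.values()]))
algorithm = {"a": proc, "b": proc, "c": proc}
initial = {"a": 0, "b": 1, "c": 0}
ho = [
    {"a": {"a"}, "b": {"b"}, "c": {"b", "c"}},            # round 1
    {"a": {"a", "c"}, "b": {"a", "b", "c"}, "c": {"c"}},  # round 2
]
trace = run(algorithm, initial, ho)
```

Note that no process is ever labelled faulty here: the heard-of sets alone describe which transmissions succeeded.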

  22. • Kernel of round r: $K(r) = \bigcap_{p \in \Pi} HO(p, r)$ • coKernel of round r: $coK(r) = \Pi \setminus K(r)$ • Global kernel (of a run): $K = \bigcap_{p \in \Pi,\, r > 0} HO(p, r) = \bigcap_{r > 0} K(r)$ • Global coKernel (of a run): $coK = \Pi \setminus K$
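
A small sketch of the kernel definitions, computed over a finite prefix of a run (an assumption, since a run has infinitely many rounds). The representation of heard-of sets as one dictionary per round and the function names are illustrative.

```python
from typing import Any, Dict, Iterable, List, Set


def kernel(ho_round: Dict[Any, Set[Any]], processes: Iterable[Any]) -> Set[Any]:
    """K(r): the processes heard by every process in round r."""
    procs = list(processes)
    k = set(procs)
    for p in procs:
        k &= ho_round[p]
    return k


def global_kernel(ho: List[Dict[Any, Set[Any]]], processes: Iterable[Any]) -> Set[Any]:
    """K restricted to the given rounds: the intersection of all K(r)."""
    procs = list(processes)
    k = set(procs)
    for ho_round in ho:
        k &= kernel(ho_round, procs)
    return k


processes = [1, 2, 3]
ho = [
    {1: {1, 2, 3}, 2: {1, 2}, 3: {1, 2, 3}},  # round 1
    {1: {1, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}},  # round 2
]
k1 = kernel(ho[0], processes)
k2 = kernel(ho[1], processes)
k = global_kernel(ho, processes)
co_k = set(processes) - k
```

In this example process 1 is the only one heard by everybody in both rounds, so it alone forms the global kernel.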

  23. Communication Predicate • Predicate over collections of heard-of sets • $P_{nosplit} :: \forall p, q, \forall r : HO(p, r) \cap HO(q, r) \neq \emptyset$ • $P_{sp\_unif} :: \forall p, q, \forall r : HO(p, r) = HO(q, r)$
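
The two predicates above can be checked mechanically on a finite collection of heard-of sets. The sketch below assumes one dictionary per round mapping each process to its heard-of set; the function names and sample collections are illustrative.

```python
from typing import Any, Dict, List, Set

HOCollection = List[Dict[Any, Set[Any]]]


def p_nosplit(ho: HOCollection, processes: List[Any]) -> bool:
    """Any two heard-of sets of the same round intersect."""
    return all(
        len(rnd[p] & rnd[q]) > 0
        for rnd in ho for p in processes for q in processes
    )


def p_sp_unif(ho: HOCollection, processes: List[Any]) -> bool:
    """In each round, all processes have the same heard-of set."""
    return all(
        rnd[p] == rnd[q]
        for rnd in ho for p in processes for q in processes
    )


processes = [1, 2, 3]
split_free = [{1: {1, 2}, 2: {2, 3}, 3: {1, 2, 3}}]
uniform = [{1: {1, 2}, 2: {1, 2}, 3: {1, 2}}]
partitioned = [{1: {1}, 2: {2, 3}, 3: {2, 3}}]

results = {
    "split_free": (p_nosplit(split_free, processes), p_sp_unif(split_free, processes)),
    "uniform": (p_nosplit(uniform, processes), p_sp_unif(uniform, processes)),
    "partitioned": (p_nosplit(partitioned, processes), p_sp_unif(partitioned, processes)),
}
```

Because the quantifier ranges over all pairs, including p = q, the no-split check also rejects any empty heard-of set.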

  24. Communication Predicate • Predicate over collections of heard-of sets • endogenous definition of the system properties (≠ Failure Detector model)

  25. • $P^f_K :: |K| \geq n - f$ • $P^f_{HO} :: \forall p, \forall r : |HO(p, r)| \geq n - f$ • $P_{reg} :: \forall p, q, \forall r : HO(p, r + 1) \subseteq HO(q, r)$ • $P_{unif} :: \exists \Pi_0, \forall p, \forall r : HO(p, r) = \Pi_0$ • $P_{\Diamond unif} :: \exists \Pi_0, \exists r_0, \forall p, \forall r > r_0 : HO(p, r) = \Pi_0$
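
As a rough illustration, the first four predicates can be evaluated on a finite prefix of a run. The function names and the sample collection are assumptions. $P_{\Diamond unif}$ is left out because an "eventually" property cannot be decided from a finite prefix.

```python
from typing import Any, Dict, List, Set

HOCollection = List[Dict[Any, Set[Any]]]


def p_k(ho: HOCollection, processes: List[Any], f: int) -> bool:
    """|K| >= n - f, with K computed over the given rounds only."""
    k = set(processes)
    for rnd in ho:
        for p in processes:
            k &= rnd[p]
    return len(k) >= len(processes) - f


def p_ho(ho: HOCollection, processes: List[Any], f: int) -> bool:
    """Every heard-of set has at least n - f elements."""
    n = len(processes)
    return all(len(rnd[p]) >= n - f for rnd in ho for p in processes)


def p_reg(ho: HOCollection, processes: List[Any]) -> bool:
    """HO(p, r + 1) is a subset of HO(q, r) for all p, q and consecutive rounds."""
    return all(
        ho[i + 1][p] <= ho[i][q]
        for i in range(len(ho) - 1) for p in processes for q in processes
    )


def p_unif(ho: HOCollection, processes: List[Any]) -> bool:
    """All heard-of sets, over all rounds, equal one fixed set Pi_0."""
    if not ho:
        return True
    pi0 = ho[0][processes[0]]
    return all(rnd[p] == pi0 for rnd in ho for p in processes)


processes = [1, 2, 3]
ho = [
    {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}},  # round 1
    {1: {1, 2}, 2: {1, 2}, 3: {1, 2, 3}},        # round 2
]
checks = {
    "p_k_f1": p_k(ho, processes, 1),
    "p_k_f0": p_k(ho, processes, 0),
    "p_ho_f1": p_ho(ho, processes, 1),
    "p_ho_f0": p_ho(ho, processes, 0),
    "p_reg": p_reg(ho, processes),
    "p_unif": p_unif(ho, processes),
}
```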

  26. System type and corresponding communication predicate
      • Synchronous, reliable links, at most f faulty senders: $P^f_K$
      • Synchronous, reliable links, at most f crash failures: $P^f_K \wedge P_{reg}$
      • Asynchronous, reliable links, at most f crash failures: $P^f_{HO}$
      • Asynchronous, reliable links, at most f initial crash failures: $P^f_{HO} \wedge P_{\Diamond unif}$
      • Idem with n > 2f: $P^f_K \wedge P_{unif}$
      • Asynchronous, reliable links, and failure detector S: $P^1_K$
      • ♦synchronous, reliable links, at most f crash failures: $P^f_{HO} \wedge P_{\Diamond unif}$

  27. Our Results • Shorter and simpler proofs of important computability results • Communication predicates for which Consensus is solvable: what is necessary and sufficient to solve Consensus? • Interrelationships between communication predicates (or, how not to get lost in translation ...) • Agreement problems: new algorithms for new systems, giving realistic solutions to cope with transient and dynamic failures
