Computing in a Distributed System in the Presence of Benign Failures. Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL)
Distributed System [Figure: computational units connected by a medium of communication] No universal computational model for distributed systems
Two Basic Principles • The model must specify why faults occur Causes of two different natures: • Degree of synchronism • Failure model
Two Basic Principles • The model must specify why faults occur • The model must specify by whom (culprit) faults occur
Two Basic Principles • The model must specify why faults occur • The model must specify by whom faults occur The notion of faulty component is necessary and useful for the analysis of distributed computations
First Principle • bounded delays (synchronous) • finite delays (asynchronous) • arbitrary delays (failure) ... breaks the natural continuum from bounded to infinite delays!
A classical type of systems Synchronous system + crash failures
A classical type of systems Synchronous system + crash failures • transmission delays bounded • process speeds bounded or infinite
First Principle • breaks the natural continuum from bounded to infinite delays • synchronism degree and failure model are not independent
Second Principle • may lead to undesirable conclusions Only one transmission fault from each node: in the send omission model, each process is considered faulty (no algorithm when the entire system is faulty)
Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors “Every correct process eventually decides” One transmission failure for a message sent by p to q: • Send omission model: p is allowed to make no decision • Link failure model: p and q must make a decision • Receive omission model: q is allowed to make no decision
Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • real causes of transmission failures are often unknown
Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • real causes of transmission failures are often unknown • no evidence that the notion of faulty component is helpful
The Heard-Of Model We just specify transmission faults: we no longer consider by whom or why faults occur
HO: a Round-Based Model [Figure: round r of process p: sending phase (to all), receive phase, local computation] At each round, every process sends messages to all. This allows us to distinguish semantic and operational features of computations
HO: a Round-Based Model [Figure: round r of process p: sending phase (to all), receive phase, local computation] If m is received at round r then m has been sent at round r. Rounds are communication-closed layers
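The communication-closed property can be illustrated with a minimal Python sketch. The `Message` type and the `deliver_in_round` helper are hypothetical names introduced here for illustration; the slides do not prescribe any implementation. The sketch assumes every message is tagged with the round in which it was sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str       # process that sent the message
    round_sent: int   # round in which the message was sent
    payload: object   # application data


def deliver_in_round(inbox, current_round):
    """Keep only the messages sent in the current round.

    Messages tagged with any other round are dropped, so a message
    received at round r was necessarily sent at round r.
    """
    return [m for m in inbox if m.round_sent == current_round]


inbox = [
    Message("p", 3, "a"),
    Message("q", 2, "late"),   # sent in round 2, arrives during round 3
    Message("s", 3, "b"),
]
delivered = deliver_in_round(inbox, 3)
```

In this example the message from q is discarded because it belongs to an earlier round; only the messages from p and s are delivered in round 3.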
First Principle • bounded delays (synchronous) • arbitrary delays (failure) Late messages are discarded [Dwork, Lynch & Stockmeyer, 1988] and [Gafni, 1998]
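HO Process • $\mathit{States}_p$, with $\mathit{Init}_p \subseteq \mathit{States}_p$ • $S_p : (s, q) \to m$ • $T_p : (s, \vec{\mu}) \to s'$ [Figure: process p moves from state s to state s' during round r] At round r, process p receives messages from $HO(p, r)$: $\mathrm{supp}(\vec{\mu}) = HO(p, r)$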
Second Principle Faults are specified but not the culprits [Santoro & Widmayer 1989]
HO Algorithm • Distributed algorithm on $\Pi$: $A = (\mathit{States}_p, \mathit{Init}_p, S_p, T_p)_{p \in \Pi}$ • Run of algorithm A: $(s^0_p)_{p \in \Pi}$ with $s^0_p \in \mathit{Init}_p$, and $(HO(p, r))_{p \in \Pi,\, r > 0}$
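A run, as defined above, is fixed by the initial states and the collection of heard-of sets. The following Python sketch shows one way this could be executed over a finite number of rounds. The function `run_ho`, the example algorithm (each process adopts the minimum value it hears), and the table `HO_TABLE` are all assumptions made for illustration, not taken from the slides.

```python
def run_ho(init, send, transition, ho, rounds):
    """Execute `rounds` rounds of an HO algorithm (illustrative sketch).

    init:       dict p -> initial state s0_p
    send:       function (p, s, q) -> message, playing the role of S_p
    transition: function (p, s, mu) -> new state, playing the role of T_p,
                where mu is a dict sender -> message with keys HO(p, r)
    ho:         function (p, r) -> set of processes heard by p at round r
    """
    state = dict(init)
    for r in range(1, rounds + 1):
        # Sending phase: every process computes a message for every q.
        sent = {p: {q: send(p, state[p], q) for q in state} for p in state}
        # Receive phase: p gets exactly the messages from HO(p, r).
        received = {p: {q: sent[q][p] for q in ho(p, r)} for p in state}
        # Local computation: apply the transition function.
        state = {p: transition(p, state[p], received[p]) for p in state}
    return state


# Hypothetical example algorithm: send the current value, adopt the minimum.
def send_value(p, s, q):
    return s


def adopt_min(p, s, mu):
    return min([s, *mu.values()])


# Hypothetical heard-of sets: round 1 is lossy, later rounds are complete.
HO_TABLE = {("a", 1): {"a", "c"}, ("b", 1): {"b"}, ("c", 1): {"a", "b", "c"}}


def ho(p, r):
    return HO_TABLE.get((p, r), {"a", "b", "c"})


init = {"a": 3, "b": 1, "c": 2}
after_one = run_ho(init, send_value, adopt_min, ho, rounds=1)
after_two = run_ho(init, send_value, adopt_min, ho, rounds=2)
```

Note that the sketch never asks why a message was lost or which component lost it; the heard-of sets alone determine the run.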
• Kernel of round r: $K(r) = \bigcap_{p \in \Pi} HO(p, r)$
• coKernel of round r: $coK(r) = \Pi \setminus K(r)$
• Global kernel (of a run): $K = \bigcap_{p \in \Pi,\, r > 0} HO(p, r) = \bigcap_{r > 0} K(r)$
• Global coKernel (of a run): $coK = \Pi \setminus K$
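The kernel definitions translate directly into set intersections. Below is a small Python sketch, assuming a finite prefix of a run is represented as a list of dicts mapping each process to its heard-of set for that round. This representation and the function names are illustrative choices, not part of the slides.

```python
def kernel(ho_round):
    """K(r): intersection of HO(p, r) over all processes p."""
    return set.intersection(*ho_round.values())


def global_kernel(ho_rounds):
    """K: intersection of K(r) over all recorded rounds."""
    return set.intersection(*(kernel(h) for h in ho_rounds))


PI = {"a", "b", "c"}
ho_rounds = [
    {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"b", "c"}},       # round 1
    {"a": {"a", "b", "c"}, "b": {"a", "b"}, "c": {"a", "b", "c"}},  # round 2
]

k1 = kernel(ho_rounds[0])       # processes heard by everyone in round 1
k2 = kernel(ho_rounds[1])       # processes heard by everyone in round 2
k = global_kernel(ho_rounds)    # processes heard by everyone in both rounds
co_k = PI - k                   # global coKernel for this prefix
```

Here only b is heard by every process in both rounds, so b alone forms the global kernel of this two-round prefix.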
Communication Predicate Predicate over collections of heard-of sets
$P_{nosplit} :: \forall p, q, \forall r : HO(p, r) \cap HO(q, r) \neq \emptyset$
$P_{sp\,unif} :: \forall p, q, \forall r : HO(p, r) = HO(q, r)$
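As a sketch, the two predicates above can be checked on a finite prefix of a run. The representation (a list of dicts, process to heard-of set, one dict per round) and the function names are assumptions for illustration. The real predicates quantify over all rounds r > 0, so a finite check can refute them but can only confirm them for the recorded rounds.

```python
def p_nosplit(ho_rounds):
    """Any two heard-of sets of the same round intersect."""
    return all(
        bool(h[p] & h[q]) for h in ho_rounds for p in h for q in h
    )


def p_sp_unif(ho_rounds):
    """All heard-of sets of the same round are equal (space uniformity)."""
    return all(
        h[p] == h[q] for h in ho_rounds for p in h for q in h
    )


# Each process hears a different majority: no split, but not uniform.
majorities = [{"a": {"a", "b"}, "b": {"b", "c"}, "c": {"a", "c"}}]
# Process a hears only itself while b and c hear only each other: a split.
split = [{"a": {"a"}, "b": {"b", "c"}, "c": {"b", "c"}}]
# Every process hears exactly {a, b}: uniform within the round.
uniform = [{"a": {"a", "b"}, "b": {"a", "b"}, "c": {"a", "b"}}]
```

Because the quantification includes p = q, `p_nosplit` also rejects a round in which some heard-of set is empty.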
Communication Predicate Predicate over collections of heard-of sets: endogenous definition of the system properties ($\neq$ Failure Detector model)
$P^f_K :: |K| \geq n - f$
$P^f_{HO} :: \forall p, \forall r : |HO(p, r)| \geq n - f$
$P_{reg} :: \forall p, q, \forall r : HO(p, r + 1) \subseteq HO(q, r)$
$P_{unif} :: \exists \Pi_0, \forall p, \forall r : HO(p, r) = \Pi_0$
$P_{\Diamond unif} :: \exists \Pi_0, \exists r_0, \forall p, \forall r > r_0 : HO(p, r) = \Pi_0$
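The first four predicates can be sketched as checks over a finite prefix of a run, again assuming a list of dicts (process to heard-of set, one dict per round). Function names are hypothetical. $P_{\Diamond unif}$ is left out on purpose: it is an eventual property, and a finite prefix cannot decide it.

```python
def p_f_kernel(ho_rounds, n, f):
    """P^f_K: the global kernel of the prefix has at least n - f members."""
    k = set.intersection(*(s for h in ho_rounds for s in h.values()))
    return len(k) >= n - f


def p_f_ho(ho_rounds, n, f):
    """P^f_HO: every heard-of set has at least n - f members."""
    return all(len(s) >= n - f for h in ho_rounds for s in h.values())


def p_reg(ho_rounds):
    """P_reg: HO(p, r + 1) is a subset of HO(q, r) for all p, q."""
    return all(
        nxt[p] <= cur[q]
        for cur, nxt in zip(ho_rounds, ho_rounds[1:])
        for p in nxt
        for q in cur
    )


def p_unif(ho_rounds):
    """P_unif: every heard-of set of every round equals one fixed set."""
    first = next(iter(ho_rounds[0].values()))
    return all(s == first for h in ho_rounds for s in h.values())


# Hypothetical prefix with n = 3: everyone is heard in round 1,
# and c is no longer heard by anyone in round 2.
everyone = {"a", "b", "c"}
prefix = [
    {p: set(everyone) for p in everyone},
    {p: {"a", "b"} for p in everyone},
]
```

On this prefix the heard-of sets shrink from one round to the next, which satisfies `p_reg`, while `p_unif` fails because the two rounds use different sets.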
system type : communication predicate
Synchronous, reliable links, at most f faulty senders : $P^f_K$
Synchronous, reliable links, at most f crash failures : $P^f_K \wedge P_{reg}$
Asynchronous, reliable links, at most f crash failures : $P^f_{HO}$
Asynchronous, reliable links, at most f initial crash failures : $P^f_{HO} \wedge P_{\Diamond unif}$
Idem with n > 2f : $P^f_K \wedge P_{unif}$
Asynchronous, reliable links, and failure detector S : $P^1_K$
$\Diamond$ synchronous, reliable links, at most f crash failures : $P^f_{HO} \wedge P_{\Diamond unif}$
Our Results • Shorter and simpler proofs of important computability results • Communication predicates for which Consensus is solvable: what is necessary and sufficient to solve Consensus? • Interrelationships between communication predicates (or, how not to get lost in translation ...) • Agreement problems: new algorithms for new systems; realistic solutions to cope with transient and dynamic failures