Computing in a Distributed System in the Presence of Benign Failures

Nov 25, 2023

SLIDE 1

Computing in a Distributed System in the Presence of Benign Failures

Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL)

SLIDE 2

Distributed System

computational unit

medium of communication

No universal computational model for distributed systems

SLIDE 3

Two Basic Principles

The model must specify why faults occur

Causes of two different natures:

Degree of synchronism
Failure model

SLIDE 5

Two Basic Principles

The model must specify why faults occur
The model must specify by whom (culprit) faults occur

SLIDE 6

Two Basic Principles

The model must specify why faults occur
The model must specify by whom faults occur

The notion of faulty component is necessary and useful for the analysis of distributed computations

SLIDE 7

First Principle

[Diagram: delay axis with labels "bounded delays", "arbitrary delays", "finite delays", "(synchronous)", "(asynchronous)", "(failure)"]

... breaks the natural continuum from bounded to infinite delays!

SLIDE 8

A classical type of system

Synchronous system + crash failures

SLIDE 9

A classical type of system

Synchronous system + crash failures

transmission delays bounded
process speeds bounded or infinite

SLIDE 10

First Principle

breaks the natural continuum from bounded to infinite delays
synchronism degree and failure model are not independent

SLIDE 11

Second Principle

may lead to undesirable conclusions

Only one transmission fault from each node

Send omission model

each process is considered faulty

(no algorithm when the entire system is faulty)

SLIDE 12

Second Principle

may lead to undesirable conclusions
faulty processes are allowed to have deviant behaviors

“Every correct process eventually decides”

One transmission failure for a message sent by p to q

Send omission model:

p is allowed to make no decision

Link failure model:

p and q must make a decision

Receive omission model:

q is allowed to make no decision
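
The comparison above can be sketched in a few lines. This is only an illustration, not code from the talk: the function name `faulty_processes` and the model labels are hypothetical. It shows how the same lost message is blamed on different components depending on the failure model, which in turn changes who is still required to decide.

```python
def faulty_processes(lost, model):
    """Return the set of processes blamed for undelivered messages.

    lost  -- iterable of (sender, receiver) pairs whose message was lost
    model -- "send_omission", "receive_omission", or "link" (hypothetical labels)
    """
    if model == "send_omission":
        return {p for p, _ in lost}      # the sender is declared faulty
    if model == "receive_omission":
        return {q for _, q in lost}      # the receiver is declared faulty
    if model == "link":
        return set()                     # the link is blamed, no process is faulty
    raise ValueError(f"unknown model: {model}")


# One transmission failure for a message sent by p to q.
lost = [("p", "q")]
for model in ("send_omission", "link", "receive_omission"):
    print(model, "-> faulty:", faulty_processes(lost, model))

# One transmission fault from each node: under send omission,
# every process ends up classified as faulty.
ring = [("a", "b"), ("b", "c"), ("c", "a")]
print(faulty_processes(ring, "send_omission"))
```

Since only correct processes are obliged to decide, the set returned here is exactly the set of processes that the specification lets off the hook.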

SLIDE 13

Second Principle

may lead to undesirable conclusions
faulty processes are allowed to have deviant behaviors
real causes of transmission failures are often unknown

SLIDE 14

Second Principle

may lead to undesirable conclusions
faulty processes are allowed to have deviant behaviors
real causes of transmission failures are often unknown
no evidence that the notion of faulty component is helpful

SLIDE 15

The Heard-Of Model

We just specify transmission faults: we no longer consider by whom or why faults occur

SLIDE 16

HO: a Round-Based Model

[Diagram: round r at process p, with a sending phase (to all), a receive phase, and a local computation]

At each round, every process sends messages to all

allows us to distinguish semantic and operational features of computations

SLIDE 17

HO: a Round-Based Model

[Diagram: round r at process p, with a sending phase (to all), a receive phase, and a local computation]

If m is received at round r then m has been sent at round r

Rounds are communication-closed layers

SLIDE 18

First Principle

[Diagram: delay axis with labels "bounded delays", "(synchronous)", "arbitrary delays", "(failure)"]

late messages are discarded

[Dwork, Lynch & Stockmeyer, 1988] and [Gafni, 1998]
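
A minimal sketch of what discarding late messages means in a communication-closed round. It assumes each message carries the number of the round in which it was sent; the function name `receive_phase` and the tuple layout are hypothetical.

```python
def receive_phase(current_round, inbox):
    """Keep only messages sent in the current round; anything else is dropped.

    inbox -- list of (sender, round_sent, payload) tuples (assumed layout)
    Returns (mu, heard_of): mu maps sender -> payload, heard_of is its support.
    """
    mu = {
        sender: payload
        for sender, round_sent, payload in inbox
        if round_sent == current_round   # a late message is treated as lost
    }
    return mu, set(mu)


inbox = [("p", 3, "a"), ("q", 2, "stale"), ("r", 3, "b")]
mu, heard_of = receive_phase(3, inbox)
print(mu, heard_of)
```

From the receiver's point of view, a message that arrives too late and a message that was lost are indistinguishable: both simply leave the sender out of the heard-of set.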

SLIDE 19

HO Process

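$\mathit{States}_p$, $\mathit{Init}_p \subseteq \mathit{States}_p$
$S_p : (s, q) \mapsto m_q$
$T_p : (s, \mu) \mapsto s'$

[Diagram: at round r, process p moves from state s to state s']

At round r, process p receives messages from HO(p, r):
$\mathrm{supp}(\mu) = HO(p, r)$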
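
As an illustration of the two functions, here is a hedged sketch of one process whose state is an integer and which adopts the smallest value it hears. The min rule is an assumption chosen for the example, not an algorithm from the talk. `send` plays the role of the sending function S_p and `transition` the role of T_p; the partial vector µ is modelled as a dict whose keys form HO(p, r).

```python
from typing import Dict, Hashable

Process = Hashable


def send(state: int, q: Process) -> int:
    """S_p: the message a process in `state` sends to q (here the same for all q)."""
    return state


def transition(state: int, mu: Dict[Process, int]) -> int:
    """T_p: next state from the current state and the received partial vector mu."""
    return min([state, *mu.values()])


# Process p is in state 5 and hears from q and r only, so HO(p, r) = {q, r}.
mu = {"q": 2, "r": 7}
new_state = transition(5, mu)
heard_of = set(mu)            # supp(mu)
print(new_state, heard_of)
```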

SLIDE 20

Second Principle

Faults are specified but not the culprits

[Santoro & Widmayer 1989]

SLIDE 21

HO Algorithm

Distributed algorithm on Π

$A = (\mathit{States}_p, \mathit{Init}_p, S_p, T_p)_{p \in \Pi}$

Run of algorithm A

$(s_p^0)_{p \in \Pi}$ with $s_p^0 \in \mathit{Init}_p$

$(HO(p, r))_{p \in \Pi,\, r > 0}$
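
The slide presents a run as given by the initial states together with the collection of heard-of sets. The sketch below replays such a run for a finite number of rounds. It is an illustration under assumptions: the helper `run` and the min-based `send` and `transition` functions are hypothetical, not the authors' code.

```python
def send(state, q):
    """Sending function: broadcast the current state."""
    return state


def transition(state, mu):
    """Transition function: adopt the smallest value heard (assumed rule)."""
    return min([state, *mu.values()])


def run(init, ho):
    """Replay a finite prefix of a run.

    init -- dict process -> initial state
    ho   -- list of dicts; ho[r - 1][p] is HO(p, r) for r = 1, 2, ...
    Returns the list of global states, starting with the initial one.
    """
    states = dict(init)
    history = [dict(states)]
    for ho_r in ho:
        # Sending phase: every process prepares a message for every process.
        sent = {(p, q): send(states[p], q) for p in states for q in states}
        # Receive phase and local computation: p only gets messages from HO(p, r).
        states = {
            p: transition(states[p], {q: sent[(q, p)] for q in ho_r[p]})
            for p in states
        }
        history.append(dict(states))
    return history


init = {"a": 3, "b": 1, "c": 2}
ho = [
    {"a": {"a", "c"}, "b": {"b"}, "c": {"a", "b", "c"}},
    {"a": {"a", "b", "c"}, "b": {"a", "b", "c"}, "c": {"a", "b", "c"}},
]
history = run(init, ho)
print(history)
```

With deterministic transition functions, nothing else is needed to fix the trajectory: the same `init` and `ho` always give the same `history`.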

SLIDE 22

Kernel of round r:

$K(r) = \bigcap_{p \in \Pi} HO(p, r)$

coKernel of round r:

$coK(r) = \Pi \setminus K(r)$

Global kernel (of a run):

$K = \bigcap_{p \in \Pi,\, r > 0} HO(p, r) = \bigcap_{r > 0} K(r)$

Global coKernel (of a run):

$coK = \Pi \setminus K$
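
A small sketch of these definitions, assuming the kernel of a round is the intersection of that round's heard-of sets (the processes heard by everyone). The function names are hypothetical, and the global kernel is computed over a finite prefix of rounds only, since a real run is infinite.

```python
def kernel(ho_r):
    """K(r): intersection of HO(p, r) over all processes p."""
    return set.intersection(*(set(s) for s in ho_r.values()))


def global_kernel(ho):
    """K over the given (non-empty, finite) list of rounds: intersection of all K(r)."""
    return set.intersection(*(kernel(ho_r) for ho_r in ho))


procs = {"a", "b", "c"}
ho = [
    {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"b", "c"}},        # round 1
    {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"a", "b", "c"}},   # round 2
]
k1, k2 = kernel(ho[0]), kernel(ho[1])
k = global_kernel(ho)
cok = procs - k               # global coKernel
print(k1, k2, k, cok)
```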

SLIDE 23

Communication Predicate

Predicate over collections of heard-of sets

$P_{nosplit} :: \forall p, q,\ \forall r : HO(p, r) \cap HO(q, r) \neq \emptyset$

$P_{sp\_unif} :: \forall p, q,\ \forall r : HO(p, r) = HO(q, r)$
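
The two predicates can be checked mechanically on a finite prefix of a run. The sketch below reads no-split as requiring that any two heard-of sets of the same round intersect, and space uniformity as requiring that all heard-of sets of a round are equal. The function names and the list-of-dicts encoding are assumptions made for the example.

```python
def p_nosplit(ho):
    """True if, in every round, any two heard-of sets have a non-empty intersection."""
    return all(
        bool(ho_r[p] & ho_r[q])
        for ho_r in ho
        for p in ho_r
        for q in ho_r
    )


def p_sp_unif(ho):
    """True if, in every round, all processes have the same heard-of set."""
    return all(len({frozenset(s) for s in ho_r.values()}) == 1 for ho_r in ho)


split = [{"a": {"a"}, "b": {"b"}}]
majorities = [{"a": {"a", "b"}, "b": {"b", "c"}, "c": {"a", "c"}}]
uniform = [
    {"a": {"a", "b"}, "b": {"a", "b"}},
    {"a": {"b"}, "b": {"b"}},
]
for name, ho in (("split", split), ("majorities", majorities), ("uniform", uniform)):
    print(name, p_nosplit(ho), p_sp_unif(ho))
```

The `majorities` case shows that the two predicates are distinct: pairwise-intersecting heard-of sets need not be equal.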

SLIDE 24

Communication Predicate

Predicate over collections of heard-of sets

endogenous definition of the system properties

(≠ Failure Detector model)

SLIDE 25

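$P^f_K :: |K| \geq n - f$

$P^f_{HO} :: \forall p,\ \forall r : |HO(p, r)| \geq n - f$

$P_{reg} :: \forall p, q,\ \forall r : HO(p, r + 1) \subseteq HO(q, r)$

$P_{unif} :: \exists \Pi_0,\ \forall p,\ \forall r : HO(p, r) = \Pi_0$

$P_{\Diamond unif} :: \exists \Pi_0,\ \exists r_0,\ \forall p,\ \forall r > r_0 : HO(p, r) = \Pi_0$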
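
These predicates can be evaluated on a finite prefix of a run, as in the following sketch. All function names are hypothetical. Because the eventual predicate quantifies over an infinite run, the checker takes the round r0 as an explicit argument and only inspects the recorded rounds after it.

```python
def global_kernel(ho):
    """Processes heard by every process in every recorded round (ho non-empty)."""
    return set.intersection(*(set(s) for ho_r in ho for s in ho_r.values()))


def p_k(ho, n, f):
    """|K| >= n - f."""
    return len(global_kernel(ho)) >= n - f


def p_ho(ho, n, f):
    """Every heard-of set has at least n - f members."""
    return all(len(s) >= n - f for ho_r in ho for s in ho_r.values())


def p_reg(ho):
    """HO(p, r + 1) is a subset of HO(q, r) for all p, q and consecutive rounds."""
    return all(
        nxt[p] <= cur[q]
        for cur, nxt in zip(ho, ho[1:])
        for p in nxt
        for q in cur
    )


def p_unif(ho):
    """All heard-of sets, over all processes and recorded rounds, are one set."""
    return len({frozenset(s) for ho_r in ho for s in ho_r.values()}) <= 1


def p_eventually_unif(ho, r0):
    """Uniformity restricted to rounds r > r0 of the recorded prefix."""
    return p_unif(ho[r0:])


# Three processes; from round 2 on, nobody hears from c any more.
full, ab = {"a", "b", "c"}, {"a", "b"}
ho = [{p: full for p in full}, {p: ab for p in full}, {p: ab for p in full}]
n = 3
print(p_k(ho, n, 1), p_ho(ho, n, 1), p_reg(ho), p_unif(ho), p_eventually_unif(ho, 1))
```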

SLIDE 26

system type | communication predicate
Synchronous, reliable links, at most f faulty senders | $P^f$
Synchronous, reliable links, at most f crash failures | $P^f_K \wedge P_{reg}$
Asynchronous, reliable links, at most f initial crash failures | $P^f_{HO} \wedge P_{\Diamond unif}$
Idem with n > 2f | $P^f_K \wedge P_{unif}$
Asynchronous, reliable links, and failure detector S | $P^1$
♦ synchronous, reliable links, at most f crash failures | $P^f_{HO} \wedge P_{\Diamond unif}$

SLIDE 27

Our Results

Shorter and simpler proofs of important computability results
Communication predicates for which Consensus is solvable

What is necessary and sufficient to solve Consensus?

Interrelationships between communication predicates

(or, how not to get lost in translation ...)

Agreement problems: new algorithms for new systems
Realistic solutions to cope with transient and