  1. Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015 Lecture 12 Page 1 CS 188, Winter 2015

  2. Introduction • We frequently want to get a set of nodes in a distributed system to agree • Commitment protocols and mutual exclusion are particular cases • The approaches we discussed for those work in limited situations • In general, when can we reach agreement in a distributed system?

  3. Basics of Agreement Protocols • What is agreement? • What are the necessary conditions for agreement?

  4. What Do We Mean By Agreement? • In the simplest case, can n processors agree that a variable takes on value 0 or 1? – Only non-faulty processors need agree • More complex agreements can be built from this simple agreement

  5. Conditions for Agreement Protocols • Consistency – All participants agree on the same value and decisions are final • Validity – Participants agree on a value at least one of them wanted • Termination/Progress – All participants choose a value in a finite number of steps

  6. Challenges to Agreement • Delays – In message delivery – In nodes responding to messages • Failures – And recovery from failures • Lies by participants – Or innocent errors that have similar effects

  7. Failures and Agreement • Failures make agreement difficult – Failed nodes don’t participate – Failed nodes sometimes recover at inconvenient times – At worst, failed nodes participate in harmful ways • Real failures are worse than fail-stop

  8. Types of Failures • Fail-stop – A nice, clean failure – The processor stops executing anything • Realistic failures – Partitions – Arbitrary delays • Adversarial failures – Arbitrary bad things happen

  9. Election Algorithms • If you get everyone to agree that a particular node is in charge, future consensus is easy, since that node makes the decisions • How do you determine who’s in charge? – Statically – Dynamically

  10. Static Leader Selection Methods • Predefine one process/node as the leader • Simple – Everyone always knows who’s the leader • Not very resilient – If the leader fails, then what?

  11. Dynamic Leader Selection Methods • Choose a new leader dynamically whenever necessary • More complicated • But failure of a leader is easy to handle – Just elect a new one • Election doesn’t imply voting – Not necessarily majority-based

  12. Election Algorithms vs. Mutual Exclusion Algorithms • Most mutual exclusion algorithms don’t care much about failures • Election algorithms are designed to handle failures • Also, mutual exclusion algorithms only need a winner • Election algorithms need everyone to know who won

  13. A Typical Use of Election Algorithms • A group of processes wants to periodically take a distributed snapshot • They don’t want multiple simultaneous snapshots • So they want one leader to order them to take the snapshot

  14. Problems in Election Algorithms • Some of the nodes may have failed before the algorithm starts • Some of the nodes may fail during the algorithm • Some nodes may recover from failure – Possibly at inconvenient times • What about partitions?

  15. Election Algorithms and the Real Work • The election algorithm is usually overhead • There’s a real computation you want to perform • The election algorithm chooses someone to lead it • Having two leaders while the real computation is going on is bad

  16. The Bully Algorithm • The biggest kid on the block gets to be the leader • But what if the biggest kid on the block is taking his piano lesson? • The next biggest kid gets to be leader – Until the piano lesson is over . . .

  17. Electing a Bully • (Cartoon illustration: the kids come out to play, but Spike’s piano lesson hasn’t let him out yet, so the next biggest kid, Butch, declares himself the leader: “I’m the leader, and we’re playing baseball!”; when Mom ends the lesson, Spike comes out and takes over: “Hey, I’m here, I’m the leader, let’s play tag!”)

  18. Assumptions of the Bully Algorithm • A static set of possible participants – With an agreed-upon order • All messages are delivered within Tm seconds • All responses are sent within Tp seconds of delivery • These last two imply synchronous behavior
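Under these assumptions, a waiting node can bound how long any reply can take: at most Tm for the request to arrive, Tp for the peer to respond, and Tm for the answer to come back. A minimal sketch (the numeric values of Tm and Tp below are illustrative, not from the lecture):

```python
# Worst-case round-trip under the bully algorithm's synchrony assumptions:
# request delivery (T_M) + processing at the peer (T_P) + reply delivery (T_M).
T_M = 0.5  # max message delivery time, seconds (illustrative value)
T_P = 0.2  # max processing/response time, seconds (illustrative value)

def reply_timeout(t_m: float, t_p: float) -> float:
    """A node that hears nothing after this long may treat the peer as failed."""
    return 2 * t_m + t_p

print(reply_timeout(T_M, T_P))  # 1.2
```

This bound is what makes the timeouts on the following slides meaningful: silence past `reply_timeout` can only mean failure, never slowness.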

  19. The Basic Idea Behind the Bully Algorithm • Possible leaders try to take over • If they detect a better leader, they agree to its leadership • Keep track of state information about whether you are electing a leader • Only do real work when you agree on a leader

  20. The Bully Algorithm and Timeouts • Call out the biggest kid’s name – If he doesn’t answer soon enough, call out the next biggest kid’s name – Until you hear an answer – Or the caller is the biggest kid – Then take over, by telling everyone else you’re the leader
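This name-calling procedure can be sketched in a single process (an illustration under simplified assumptions, not the lecture’s exact protocol: the `alive` set stands in for which nodes would answer before their timeout):

```python
# Toy sketch of the bully election. Real implementations exchange messages
# and use the timeout bound from the assumptions slide; here, membership in
# `alive` models "answered before the timeout".

def bully_election(starter: int, nodes: list[int], alive: set[int]) -> int:
    """Return the node every live participant ends up accepting as leader."""
    # Call out every bigger node's name.
    bigger_and_alive = [n for n in nodes if n > starter and n in alive]
    if not bigger_and_alive:
        # Nobody bigger answered: the starter takes over and announces itself.
        return starter
    # Some bigger node answered; it runs the same procedure itself.
    return bully_election(min(bigger_and_alive), nodes, alive)

nodes = [1, 2, 3, 4, 5]
print(bully_election(1, nodes, alive={1, 2, 3, 5}))  # 5: biggest live node wins
print(bully_election(1, nodes, alive={1, 2, 3}))     # 3: 4 and 5 are at piano lessons
```

Whoever starts the election, the outcome is the same: the largest node that answers in time becomes leader.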

  21. The Bully Algorithm At Work • One node is currently the coordinator • It expects a certain set of nodes to be up and participating • The coordinator periodically asks all other nodes whether they’re up • If an expected node doesn’t answer, start an election – Also if it answers in the negative • If an unexpected node answers, start an election
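The triggering rules above can be written as a pure check (a sketch with hypothetical names; the `answers` dict stands in for whatever replies arrived before the timeout):

```python
# Sketch of the coordinator's decision after one polling round.
# answers maps node id -> its reply (True = "I'm up", False = negative answer);
# a node absent from `answers` did not reply before the timeout.

def needs_election(expected: set[int], answers: dict[int, bool]) -> bool:
    """True if the coordinator should start an election."""
    for node in expected:
        if answers.get(node) is not True:  # silent, or answered in the negative
            return True
    # A reply from a node we did NOT expect also triggers an election.
    return any(node not in expected for node in answers)

expected = {2, 3, 4}
print(needs_election(expected, {2: True, 3: True, 4: True}))           # False
print(needs_election(expected, {2: True, 3: True}))                    # True: 4 is silent
print(needs_election(expected, {2: True, 3: True, 4: True, 5: True}))  # True: 5 is unexpected
```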

  22. The Practicality of the Bully Algorithm • The bully algorithm works reasonably well if the timeouts are effective – A timeout occurring really means the site in question is down • And there are no partitions at all – If there are, what happens?

  23. The Invitation Algorithm • More practical than the bully algorithm – Doesn’t depend on timeouts • But its results are not as definitive • An asynchronous algorithm

  24. The Basic Idea Behind the Invitation Algorithm • A current coordinator tries to get all other nodes to agree to his leadership • If more than one coordinator is around, get together and merge groups • Use timeouts only to allow progress, not to make definitive decisions • No set priorities for who will be coordinator

  25. The Invitation Algorithm and Group Numbers • The invitation algorithm recruits a group of nodes to work together – More than one group can exist simultaneously • Group numbers identify the group • Why not identify with coordinator ID? – Because one node can serially coordinate many groups
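One way to see why a bare coordinator ID isn’t enough: the same node may lead several groups over its lifetime, and each needs a distinct name. A sketch (the `(node_id, counter)` encoding is an illustrative choice, not necessarily the algorithm’s exact group-number format):

```python
import itertools

# Sketch: a node that can coordinate many groups in sequence. Pairing its ID
# with a local counter gives each group a unique number even though the
# coordinator is the same.
class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self._seq = itertools.count()  # 0, 1, 2, ... for successive groups

    def new_group_number(self) -> tuple[int, int]:
        """Each reorganization this node leads gets a fresh group number."""
        return (self.node_id, next(self._seq))

n = Node(7)
g1 = n.new_group_number()
g2 = n.new_group_number()
print(g1, g2)    # (7, 0) (7, 1)
print(g1 == g2)  # False: same coordinator, different groups
```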

  26. The Basic Operation of the Invitation Algorithm • Coordinators in a normal state periodically check all other nodes • If any other node is a coordinator, try to merge the groups • If timeouts occur, don’t worry about it – Also don’t worry whether a response to a check came from this request or an earlier one

  27. Merging in the Invitation Algorithm • Merging always requires forming a new group – It may have the same coordinator, but a different group number • The coordinator who initiates the merge asks all other known coordinators to merge – They ask their group members – The original group members are also asked
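A toy, message-free sketch of this merge step (the names and data structures are hypothetical; a real implementation sends Invite messages and tolerates non-responders):

```python
# Sketch: the initiating coordinator folds the other coordinators, their
# members, and its own original members into one brand-new group. The key
# point from the slide: the merged group ALWAYS gets a new group number,
# even if the coordinator stays the same.

def merge_groups(initiator: str, groups: dict[str, set[str]],
                 new_group_number: tuple[str, int]) -> dict:
    """Form one new group from every known group's coordinator and members."""
    members: set[str] = set()
    for coordinator, group_members in groups.items():
        members |= {coordinator} | group_members  # each coordinator brings its members
    return {"coordinator": initiator,
            "group_number": new_group_number,     # always a fresh number
            "members": members}

groups = {"A": {"x", "y"}, "B": {"z"}}  # two groups, coordinated by A and B
merged = merge_groups("A", groups, ("A", 1))
print(sorted(merged["members"]))  # ['A', 'B', 'x', 'y', 'z']
```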

  28. A Simplified Example • UP = {1, 2, 3, 4} • (Diagram: node 1 sends AreYouCoordinator? to the other nodes; node 3 answers Yes, node 4 answers No on behalf of its coordinator) • Node 1 checks for another coordinator, and finds one • Node 1 asks the other coordinator and its old group members to join its group (Invite), and they Accept • Node 1 forms a new group (Ready) • If all members of UP respond, we’re fine

  29. The Reorganization State • Nodes enter the reorganization state after getting their answer • What’s the point of this state? – Why not just start up the group? – After all, we all know who’s going to be a member • Or do we?

  30. Why We Need Another Round of Messages • (Diagram: node 1 sends Invitations to nodes 2, 3, and 4) • Who does 1 think will join the group at this point? 2 and 3 • Assuming no timeouts, 4 will also join • And what if someone crashes, presumably not accepting the invitation? • 2 needs to know that

  31. Timeouts in the Merge • Don’t worry too much about them • Some nodes respond before the timeout – Some don’t • If you don’t catch them this time, you might the next

  32. Straggler Messages • This algorithm is asynchronous – So messages may come in late • What do we do when messages arrive late? • Mostly, reject them • How do we tell? – Messages contain the group number
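Rejecting stragglers by group number can be sketched in a few lines (the message format here is hypothetical):

```python
# Sketch: every message is stamped with the group number it was sent under,
# and a node simply drops anything stamped with a group it no longer belongs to.

def handle(message: dict, current_group: tuple[int, int]) -> bool:
    """Return True if the message is accepted, False if it is a straggler."""
    return message["group"] == current_group

current = (7, 3)  # this node's current group number: (coordinator id, sequence)
print(handle({"group": (7, 3), "body": "ready"}, current))   # True: current group
print(handle({"group": (7, 2), "body": "invite"}, current))  # False: an old group
```

Because group numbers are never reused, a late message from an earlier reorganization can never be mistaken for one from the current group.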
