How choosing the Raft consensus algorithm saved us 3 months of development time
What do I do with unused space on my servers?
Let’s build an S3 cluster! Requirements: • Fully S3 compatible • Easy to maintain • Fault tolerant
I found a great candidate: SX + LibreS3. Bonuses: • Block level deduplication • Highly scalable • Multiplatform … but something was missing!
What about automatic failover? Almost there! • Fully distributed • Data replication • Cluster membership management ... but no support for detecting and kicking out dead nodes
How to deal with failures? • Some node has to make the decision • The deciding node must not be faulty itself • All the alive nodes should follow the decision ⇒ There is a need for a consensus algorithm.
Choosing the algorithm
Paxos: • Proven to work • Very complicated • Many variants and interpretations (ZooKeeper, …)
Raft: • Easy • Straightforward implementation • Accurate and comprehensive specs
And the winner is… Raft!
Raft How does it work?
Leader election
Raft Node failure
Dead node detection
How I implemented Raft in SX
Implementation details • Heartbeats are sent via internal SX communication • Membership changes are performed automatically • Node failure detection relies on configurable timeouts • Almost no impact on SX performance
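Below is a minimal sketch of how such timeout-based failure detection can work, loosely modelled on the behaviour described above; the names (NodeStatus, faulty_nodes, HB_DEADTIME) are illustrative and not the actual SX internals:

import time

HB_DEADTIME = 120  # seconds without a heartbeat before a node is suspected

class NodeStatus:
    def __init__(self, uuid):
        self.uuid = uuid
        self.last_heartbeat = time.monotonic()

    def record_heartbeat(self):
        # Called whenever an internal heartbeat message arrives from this node.
        self.last_heartbeat = time.monotonic()

    def is_alive(self, deadtime=HB_DEADTIME):
        # A node is considered dead once it has stayed silent
        # for longer than the configured timeout.
        return time.monotonic() - self.last_heartbeat < deadtime

def faulty_nodes(cluster):
    # Run periodically by the leader to find nodes that should be marked faulty.
    return [n.uuid for n in cluster if not n.is_alive()]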
How to enable Raft in SX?
Enable Raft node failure timeout:
$ sxadm cluster --set-param hb_deadtime=120 sx://admin@sx.foo.com
Kill one of the nodes and check its status:
$ sxadm cluster -I sx://admin@sx.foo.com
* node 10…da: … status: follower, online: ** NO **
* node bd…ad: … status: follower, online: yes
* node c2…b7: … status: leader, online: yes
Wait for the node to be marked as faulty:
$ sxadm cluster -I sx://admin@sx.foo.com
* node 10…da: … status: follower, online: ** FAULTY **
* node bd…ad: … status: follower, online: yes
* node c2…b7: … status: leader, online: yes
www.skylable.com Robert Wojciechowski follow @skylable
Stay tuned …
Coming up next: SXFS FUSE-based filesystem mapping for SX: • Client-side encrypted • Fully deniable • Deduplication • Fault tolerant
The election basics • There is only one legitimate leader • Each node chooses a timeout • When the timeout is reached, a new election is started • A candidate node votes for itself • The candidate requests votes from the other nodes • If the candidate receives a majority of votes, it becomes the new leader (see the sketch below)
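A rough sketch of one election round under the rules above; the helper names (run_election, request_vote) and the timeout values are assumptions for illustration, not SX or Raft APIs:

import random

def election_timeout(low=0.15, high=0.30):
    # Each node picks its own randomized election timeout (in seconds).
    return random.uniform(low, high)

def run_election(candidate_id, term, peers, request_vote):
    # request_vote(peer, term, candidate_id) -> True if the peer grants its vote.
    votes = 1                      # the candidate always votes for itself
    for peer in peers:
        if request_vote(peer, term, candidate_id):
            votes += 1
    majority = (len(peers) + 1) // 2 + 1
    # With a majority the candidate becomes leader for this term;
    # otherwise it waits for another timeout and retries with term + 1.
    return votes >= majority

For example, in a 3-node cluster a candidate wins with 2 votes: its own plus one granted by a peer. The randomized timeouts are also what keeps repeated split votes unlikely, since after a tie the nodes rarely time out at the same moment again.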
Corner cases Leader failure
Leader node failure
Corner cases Race condition
Election race condition
Corner cases Split votes
Split votes