


  1. BYZANTINE FAULT TOLERANCE
     Ellis Michael

  2. A HIERARCHY OF FAULT MODELS
     • No faults
     • Crash faults
     • Byzantine faults
     • People who use tabs instead of spaces

  3. BYZANTINE FAULTS
     • Also called "general" or "arbitrary" faults.
     • Faulty nodes can take any actions. They can send any messages, collude with each other, etc. in an attempt to "trick" the non-faulty nodes and subvert the protocol.
     • Why this model?

  4. STRANGE THINGS HAPPEN AT SCALE
     • Hardware failures are real and can cause both crashes and aberrant behavior.
     • Cosmic rays from outer space (!) can and will randomly flip bits in memory.
       (We'll come back to these at the end of the lecture.)
     • Software bugs are all too common.
     • Security vulnerabilities can let attackers into distributed systems.

  5. WHAT ABOUT PAXOS?
     • Paxos tolerates a minority of processes failing by crashing.
     • What could a malicious replica do to a Paxos deployment?
       - Stop processing requests.
       - A leader could report incorrect results to a client.
       - A follower could acknowledge a proposal and then discard it.
       - A follower could respond to prepare messages without all previously acknowledged commands.
       - A server could continually start new leader elections.
       - ...

  6. BYZANTINE QUORUMS
     Obviously, if all servers are Byzantine, we can't guarantee anything. How many servers do we need to tolerate 𝑔 faults?
     • In order to make progress, we can only wait for 𝑜 − 𝑔 servers, since the 𝑔 faulty servers might never respond.
     • What if two different servers contact 𝑜 − 𝑔 quorums? Two such quorums out of 𝑜 servers can overlap in as few as 𝑜 − 2𝑔 servers. If they intersect at 𝑔 or fewer servers, that's not good: every server in the intersection could be faulty. So we need 𝑜 − 2𝑔 > 𝑔.
     • Therefore, we need at least 3𝑔 + 1 servers (a provable lower bound). Any two quorums of 2𝑔 + 1 = 𝑜 − 𝑔 will intersect at at least one non-faulty server.
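
The quorum arithmetic above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names are hypothetical, and `f` and `n` stand for the fault bound and server count that the slides write as 𝑔 and 𝑜.

```python
def min_servers(f: int) -> int:
    """Smallest number of servers that tolerates f Byzantine faults."""
    return 3 * f + 1


def quorum_size(n: int, f: int) -> int:
    """We can only wait for n - f servers, since f may never answer."""
    return n - f


def min_quorum_overlap(n: int, f: int) -> int:
    """Worst-case number of servers shared by two quorums of size n - f."""
    return max(0, 2 * quorum_size(n, f) - n)


def overlap_has_honest_server(n: int, f: int) -> bool:
    """True if any two quorums must share at least one non-faulty server."""
    return min_quorum_overlap(n, f) > f


if __name__ == "__main__":
    for f in range(1, 4):
        n = min_servers(f)
        print(f, n, quorum_size(n, f), overlap_has_honest_server(n, f))
```

With one server fewer than 3𝑔 + 1, the worst-case overlap drops to exactly 𝑔 servers, all of which could be faulty, so the check fails.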

  7. SETUP
     • 𝑜 = 3𝑔 + 1 servers, 𝑔 of which can be faulty. Unlimited clients.
     • We assume public-key infrastructure. Servers and clients can sign messages and verify signatures. Signatures aren't forgeable.
       - We denote message 𝑛 with ⟨𝑛⟩, and message 𝑛 signed by 𝑞 as ⟨𝑛⟩_𝑞.
     • Servers also have access to a digest function (cryptographic hash) on messages, 𝐸(𝑛), which we assume is collision-resistant.
     • The attacker controls 𝑔 faulty servers and knows the protocol the other servers are running. The attacker also has control over the network and can delay and reorder messages to all nodes.

  8. GOAL
     The goal, as in Paxos, is state-machine replication. We want to guarantee safety when there are 𝑔 or fewer failures (or an unlimited number of crash failures) and liveness during periods of synchrony. Easy, right?

  9. PBFT: THE BASIC IDEA
     Practical Byzantine Fault Tolerance (PBFT) is leader-based, just like Paxos. But it more closely resembles Viewstamped Replication [Oki and Liskov '88].
     • The system progresses through a series of numbered views. There is a single leader associated with each view.
     • The clients will send their commands to the leader.
     • The leader assigns the command a sequence number (slot number) and forwards it to the followers.
     • The protocol ensures that this decision is permanently fixed; then the servers respond to the client.
     [Figure: servers 𝑞1, 𝑞2, 𝑞3, 𝑞4, 𝑞5, ..., 𝑞𝑜 arranged in a ring, labeled as the leaders of views 1, 2, 3, 4, 5, ..., 𝑜, and 𝑜 + 1.]
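
One way to read the ring figure is that leadership rotates round-robin through the servers. The sketch below assumes exactly that; the slides do not give a formula, so the rule, the 1-based numbering, and the name `leader_of` are all assumptions made for illustration.

```python
def leader_of(view: int, n: int) -> int:
    """1-based index of the server assumed to lead `view`.

    Assumes views start at 1 and leadership wraps around after n views.
    """
    if view < 1 or n < 1:
        raise ValueError("view and n must be positive")
    return (view - 1) % n + 1


if __name__ == "__main__":
    n = 4
    for view in range(1, 7):
        print(f"view {view}: server {leader_of(view, n)}")
```

Under this assumption every server eventually gets a turn, so a faulty leader can only stall the system for the views it owns.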

  10. WHAT'S THE WORST THAT COULD HAPPEN?
      • The leader could be faulty.
        - It could assign different commands to the same sequence number.
        - It could try to send the wrong result to the client.
        - It could ignore the clients altogether.
      • The followers could also be faulty and lie about the commands they received.
      Clients wait for 𝑔 + 1 matching replies. Followers can replace a misbehaving leader with a view change.

  11. WHAT ABOUT FAULTY CLIENTS?
      • We assume that there is some existing way for clients to authenticate themselves with the system.
      • Access controls can be used to restrict what each client is allowed to do.
      • System administrators (or the system itself) can revoke access for faulty clients.

  12. PAPERS, PLEASE
      • Servers don't take each other's word for anything. They require proof.
      • In order to verify that a client's command is legitimate, they need the signed message from the client (or proof thereof).
      • All other steps in the system are taken only after receiving signed messages from a quorum of 2𝑔 + 1 servers. Servers can also collect these messages into certificates they can use to prove to each other the legitimacy of certain steps.

  13. PROTOCOL OVERVIEW
      Three sub-protocols:
      1. Normal operations
         Phase 1: Pre-prepare
         Phase 2: Prepare
         Phase 3: Commit
      2. View change
      3. Garbage collection
      Server state:
      • Current view
      • State machine checkpoint
      • Current state machine state
      • Log of all messages not yet garbage collected

  14. NORMAL OPERATIONS (I)
      [Figure: message diagram. Client 𝑑 sends 𝑛 = ⟨REQUEST⟩_𝑑 to the leader 𝑚; the leader sends ⟨⟨PRE-PREPARE, 𝑤, 𝑜, 𝐸(𝑛)⟩_𝑚, 𝑛⟩ to the followers.]

  15. ACCEPTING PRE-PREPARES
      The leader sends ⟨⟨PRE-PREPARE, 𝑤, 𝑜, 𝐸(𝑛)⟩_𝑚, 𝑛⟩ to the followers.
      • 𝑤 is the view number.
      • 𝑜 is the sequence number assigned by the leader.
      • 𝐸(𝑛) is a digest of the message (to reduce the amount of public-key crypto).
      A follower accepts the PRE-PREPARE if:
      • The client request is valid.
      • The follower is in view 𝑤.
      • The follower hasn't accepted a different PRE-PREPARE for the same sequence number in the same view.
      • The sequence number isn't too far ahead (to prevent sequence numbers from getting unnecessarily large).
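
The four acceptance conditions above can be sketched as a single check. This is an illustrative sketch only: the `Follower` class, its field names, and the `window` bound of 100 are hypothetical, and signature verification is reduced to a boolean `request_valid` flag that the caller is assumed to have computed.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    view: int                      # the view this follower is currently in
    last_executed: int = 0         # highest sequence number executed so far
    window: int = 100              # hypothetical bound on how far ahead is allowed
    accepted: dict = field(default_factory=dict)  # (view, seq) -> digest

    def accept_pre_prepare(self, view: int, seq: int, digest: str,
                           request_valid: bool) -> bool:
        # 1. The client request must be valid (signature check elided here).
        if not request_valid:
            return False
        # 2. The follower must be in the same view as the message.
        if view != self.view:
            return False
        # 3. No different PRE-PREPARE for this slot in this view.
        prior = self.accepted.get((view, seq))
        if prior is not None and prior != digest:
            return False
        # 4. The sequence number must not be too far ahead.
        if seq > self.last_executed + self.window:
            return False
        self.accepted[(view, seq)] = digest
        return True
```

Re-delivery of the same PRE-PREPARE is accepted again, which keeps the check idempotent; only a conflicting digest for the same view and slot is rejected.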

  16. NORMAL OPERATIONS (II)
      [Figure: message diagram with client 𝑑, leader, and followers, showing the message ⟨PREPARE, 𝑤, 𝑜, 𝐸(𝑛)⟩_𝑞.]

  17. PREPARE CERTIFICATES
      • Once followers accept the PRE-PREPARE, they broadcast (signed) PREPARE messages.
      • Once a server has received 2𝑔 matching PREPAREs and the associated PRE-PREPARE, it has a Prepare Certificate.
      • Because quorums intersect at at least one honest server, and honest servers don't prepare different commands in the same slot, no two prepare certificates ever exist for the same view and same sequence number and different commands.
      • However, a single server having a prepare certificate is not enough. What about view changes? The new leader might not get the Prepare Certificate, and so might not have enough information to pick the correct command in the new view.
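
The Prepare Certificate condition can be sketched as a counting check. This is a minimal illustration, assuming messages are already signature-checked and represented as plain tuples; the function name and tuple layout are hypothetical, and `f` stands for the fault bound written 𝑔 in the slides.

```python
def has_prepare_certificate(pre_prepare, prepares, f):
    """Check for a PRE-PREPARE plus 2f matching PREPAREs.

    pre_prepare: (view, seq, digest) accepted for the slot, or None.
    prepares: iterable of (sender, view, seq, digest) tuples.
    """
    if pre_prepare is None:
        return False
    # Count distinct senders only, so a faulty server cannot vote twice.
    senders = {
        sender
        for sender, view, seq, digest in prepares
        if (view, seq, digest) == pre_prepare
    }
    return len(senders) >= 2 * f
```

Counting distinct senders rather than messages matters here: a single Byzantine server that repeats its PREPARE should still count once.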

  18. NORMAL OPERATIONS (III)
      [Figure: message diagram with client 𝑑, leader, and followers, showing the message ⟨COMMIT, 𝑤, 𝑜, 𝐸(𝑛)⟩_𝑞.]

  19. COMMIT CERTIFICATES
      • Once a server has a Prepare Certificate, it broadcasts a COMMIT message.
      • Once a server has 2𝑔 + 1 matching COMMITs (and the associated client message), it has a Commit Certificate.
      • A commit certificate proves that every quorum of 2𝑔 + 1 servers has at least one non-faulty node with a Prepare Certificate. This command is now stable and will be fixed in the same slot across future view changes.
      • The server can then execute the command (provided it executed all previous commands) and reply to the client.
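
The commit rule and the in-order execution requirement can be sketched together. As before, this is an illustration under assumptions: messages are plain tuples with signatures already verified, the function names are hypothetical, and `f` stands for the fault bound written 𝑔 in the slides.

```python
def has_commit_certificate(key, commits, f):
    """Check for 2f + 1 matching COMMITs from distinct senders.

    key: (view, seq, digest) identifying the command in its slot.
    commits: iterable of (sender, view, seq, digest) tuples.
    """
    senders = {
        sender
        for sender, view, seq, digest in commits
        if (view, seq, digest) == key
    }
    return len(senders) >= 2 * f + 1


def executable_slots(committed, last_executed):
    """Slots that can run now: committed and contiguous after last_executed."""
    ready = []
    nxt = last_executed + 1
    while nxt in committed:
        ready.append(nxt)
        nxt += 1
    return ready
```

A committed slot with a gap before it stays pending: `executable_slots({1, 2, 4}, 0)` yields slots 1 and 2 only, and slot 4 waits until slot 3 commits.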

  20. NORMAL OPERATIONS (IV)
      Client waits for 𝑔 + 1 matching replies, implying at least one correct server has a Commit Certificate.
      [Figure: message diagram with client 𝑑, leader, and followers, showing the phases PRE-PREPARE, PREPARE, COMMIT, REPLY and the message ⟨REPLY, 𝑤, 𝑜, 𝐸(𝑛)⟩_𝑞.]
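
The client-side rule can be sketched as a vote count over replies. This is a minimal sketch, assuming replies have already been signature-checked and reduced to (sender, result) pairs; the function name is hypothetical, and `f` stands for the fault bound written 𝑔 in the slides.

```python
from collections import defaultdict


def accepted_result(replies, f):
    """Return the first result backed by f + 1 distinct servers, else None.

    replies: iterable of (sender, result) pairs with hashable results.
    """
    voters = defaultdict(set)
    for sender, result in replies:
        voters[result].add(sender)
        # f + 1 matching replies guarantee at least one is from a correct server.
        if len(voters[result]) >= f + 1:
            return result
    return None
```

With at most 𝑔 faulty servers, any set of 𝑔 + 1 matching replies contains at least one from a correct server, which is why the client does not need a full quorum.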

  21. VIEW CHANGE
      • Followers monitor the leader. If the leader stops responding to pings or does anything shady, they start a view change.
      • First, the follower sends ⟨VIEW-CHANGE, 𝑤 + 1, 𝒬⟩_𝑞 to the leader of view 𝑤 + 1 and ⟨VIEW-CHANGE, 𝑤 + 1⟩_𝑞 to the other followers. The follower stops accepting messages for the old view.
        - 𝒬 is the set of all Prepare Certificates (or Commit Certificates) the follower has received.
      • Other followers join in the view change when they receive 𝑔 + 1 VIEW-CHANGE messages.
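
The rule for joining a view change can be sketched as a threshold check. This is an illustrative sketch: the function name and the (sender, target_view) representation are hypothetical, only messages for the next view are counted (as on the slide), and `f` stands for the fault bound written 𝑔 in the slides.

```python
def should_join_view_change(current_view, view_changes, f):
    """True once f + 1 distinct servers ask to move to current_view + 1.

    view_changes: iterable of (sender, target_view) pairs.
    """
    senders = {
        sender
        for sender, target_view in view_changes
        if target_view == current_view + 1
    }
    # f + 1 senders means at least one non-faulty server wants the change,
    # so f faulty servers alone cannot force a view change.
    return len(senders) >= f + 1
```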

  22. STARTING A NEW VIEW
      Once the new leader receives 2𝑔 VIEW-CHANGE messages from the other servers, it broadcasts ⟨NEW-VIEW, 𝑤 + 1, 𝒲, 𝒫⟩_𝑞
      • 𝒲 is the set of VIEW-CHANGE messages it received.
      • 𝒫 is a set of PRE-PREPAREs in the new view, one for every sequence number less than or equal to the largest sequence number seen in a Prepare Certificate in a VIEW-CHANGE message. If there is a Prepare Certificate for that sequence number, the PRE-PREPARE is for that command. Otherwise, the leader pre-prepares a no-op.
      Followers can independently verify that the view was started correctly from the set 𝒲. If everything checks out, they start the new view and process the PRE-PREPAREs in 𝒫 as normal.
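
The construction of the PRE-PREPARE set can be sketched as follows. Several details are assumptions not stated on the slide: each Prepare Certificate is reduced to a (view, seq, command) tuple, slots are numbered from 1 (checkpoints and garbage collection are ignored), the no-op is the placeholder string `NOOP`, and when two certificates name the same slot the one from the higher view wins.

```python
NOOP = "no-op"  # hypothetical placeholder for the no-op command


def new_view_pre_prepares(view_change_certs):
    """Map each slot up to the largest prepared one to a command or NOOP.

    view_change_certs: one list per VIEW-CHANGE message, each holding
    (view, seq, command) tuples that stand in for Prepare Certificates.
    """
    best = {}  # seq -> (view, command), keeping the highest view per slot
    for certs in view_change_certs:
        for view, seq, command in certs:
            if seq not in best or view > best[seq][0]:
                best[seq] = (view, command)
    if not best:
        return {}
    return {
        seq: best[seq][1] if seq in best else NOOP
        for seq in range(1, max(best) + 1)
    }
```

Because followers receive the same VIEW-CHANGE messages, they can rerun this computation themselves and compare the result against what the new leader sent.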

  23. [Figure: two 10-slot logs. Top row, "Status in previous view": commands 𝑑1 through 𝑑6, each marked as committed or prepared. Bottom row, "Possible new leader's log": 𝑑1, 𝑑2, 𝑑4, 𝑑5, with no-ops (⊥) in four other slots. Legend: committed, prepared, ⊥ = no-op.]

  24. GARBAGE COLLECTION
      • In the normal case, servers save their log of commands and all of the messages they receive.
      • In the non-Byzantine case, servers can periodically compact their logs. They can bring out-of-date servers back up-to-date with a state transfer.
      • In the Byzantine case, a server can't just accept a state transfer from another node. It needs proof.
