two-phase commit / network FSes

last time: remote procedure calls (RPC) imitate the function/method call interface; extra setup: finding where the server is, plus an interface description language to specify the interface; extra concerns: portability (language + machine), …


  1. preparing: agreeing to commit is a promise: “I will accept this transaction”; agreeing to abort is a promise: “I will not accept this transaction”. Either promise is recorded in the machine's log (before replying) in case it crashes. Never, ever take back an agreement! To keep the promise, the worker can't allow interfering operations, even though the student might end up not being added because of other machines. E.g. agree to add a student to a class → reserve a seat in the class.
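
A minimal sketch of the preparing step, assuming a hypothetical single-class `Worker` whose `log` is an in-memory list standing in for a durable (fsync'd) log. The promise is appended to the log before the vote is returned, and agreeing reserves a seat so that later requests cannot break the promise.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Worker:
    """Hypothetical worker holding one class roster (illustration only)."""
    capacity: int
    enrolled: Set[str] = field(default_factory=set)
    reserved: Dict[int, str] = field(default_factory=dict)    # txn_id -> student
    log: List[Tuple[int, str]] = field(default_factory=list)  # stand-in for a durable log

    def prepare(self, txn_id: int, student: str) -> str:
        # Seats in use: enrolled students plus seats promised to undecided transactions.
        if len(self.enrolled) + len(self.reserved) < self.capacity:
            self.reserved[txn_id] = student                # reserve the seat to keep the promise
            self.log.append((txn_id, "AGREED-TO-COMMIT"))  # record the promise first...
            return "AGREE-TO-COMMIT"                       # ...then reply
        self.log.append((txn_id, "ABORTED"))
        return "AGREE-TO-ABORT"
```

With a capacity of 1, a second request votes to abort even though the first transaction might still abort later; that is the price of never taking back a promise.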

  2. coordinator decision: the coordinator can't take back its global decision, so it must record the decision in a persistent log to ensure it is not forgotten. Coordinator fails without a logged decision? Collect the votes again.
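
A sketch of the coordinator side under the same assumptions: a list stands in for the persistent log, and `send` is a hypothetical transport function returning the worker's reply. The decision is logged before anyone hears it; on a rerun after a crash, a logged decision is reused, and only a missing one triggers a fresh round of votes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transport: send(message, txn_id) -> the worker's reply.
Send = Callable[[str, int], str]


class Coordinator:
    def __init__(self, workers: Dict[str, Send]) -> None:
        self.workers = workers
        self.log: List[Tuple[int, str]] = []  # stand-in for a persistent log

    def logged_decision(self, txn_id: int) -> Optional[str]:
        for tid, entry in reversed(self.log):
            if tid == txn_id and entry in ("COMMIT", "ABORT"):
                return entry
        return None

    def run(self, txn_id: int) -> str:
        decision = self.logged_decision(txn_id)
        if decision is None:
            # First run, or crashed before logging a decision: collect votes (again).
            votes = [send("PREPARE", txn_id) for send in self.workers.values()]
            decision = "COMMIT" if all(v == "AGREE-TO-COMMIT" for v in votes) else "ABORT"
            self.log.append((txn_id, decision))  # record before anyone hears it
        for send in self.workers.values():
            send(decision, txn_id)  # a logged decision is only ever resent, never changed
        return decision
```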

  4. finishing: coordinator says commit → commit the transaction: the worker applies it (e.g. records that the student is in the class). Coordinator (or anyone) says abort → abort the transaction: the worker never, ever applies it; still want to do the operation? Make a new transaction. Unsure which? Option 1: ask the coordinator (e.g. worker policy: keep asking while there is no outcome). Option 2: make sure the coordinator resends the outcome (e.g. the coordinator keeps sending the outcome until it gets a “yes, I got it” reply).
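
Option 1 (the worker keeps asking) can be sketched as a polling loop. `ask_coordinator`, `apply`, and `discard` are hypothetical callbacks, `None` stands for "no outcome yet", and the polling interval is an arbitrary assumption.

```python
import time
from typing import Callable, Optional


def finish(txn_id: int,
           ask_coordinator: Callable[[int], Optional[str]],
           apply: Callable[[int], None],
           discard: Callable[[int], None],
           poll_interval: float = 0.5) -> str:
    """Option 1 sketch: a worker unsure of the outcome keeps asking.

    ask_coordinator is a hypothetical RPC returning "COMMIT", "ABORT",
    or None while no outcome is available (including when the call fails).
    """
    while True:
        outcome = ask_coordinator(txn_id)
        if outcome == "COMMIT":
            apply(txn_id)    # e.g. record that the student is in the class
            return outcome
        if outcome == "ABORT":
            discard(txn_id)  # never applied; retrying needs a NEW transaction
            return outcome
        time.sleep(poll_interval)  # no outcome yet: ask again later
```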

  6. two-phase commit: roles: a typical two-phase commit implementation has several workers and one coordinator, which might be the same machine as a worker.

  7. two-phase commit messages: coordinator → worker: PREPARE (“will you agree to do this action?”; on failure, the coordinator can ask multiple times!). worker → coordinator: AGREE-TO-COMMIT or AGREE-TO-ABORT (the worker records its decision in its log before sending). coordinator → worker: COMMIT or ABORT (“I counted the votes and the result is commit/abort”; only commit if all votes were commit).
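
The message types and the "only commit if all votes were commit" rule can be sketched as follows; `Msg` and `decide` are illustrative names, and `votes` is assumed to map each worker to the vote received from it so far.

```python
from enum import Enum
from typing import Dict, Optional


class Msg(Enum):
    PREPARE = "PREPARE"                  # coordinator -> worker
    AGREE_TO_COMMIT = "AGREE-TO-COMMIT"  # worker -> coordinator
    AGREE_TO_ABORT = "AGREE-TO-ABORT"    # worker -> coordinator
    COMMIT = "COMMIT"                    # coordinator -> worker
    ABORT = "ABORT"                      # coordinator -> worker


def decide(votes: Dict[str, Msg], num_workers: int) -> Optional[Msg]:
    """Global decision from the votes so far, or None while still waiting."""
    if Msg.AGREE_TO_ABORT in votes.values():
        return Msg.ABORT   # any vote to abort decides ABORT
    if len(votes) == num_workers and all(v is Msg.AGREE_TO_COMMIT for v in votes.values()):
        return Msg.COMMIT  # only commit if all votes were commit
    return None
```

A single abort vote decides the outcome immediately, while a commit decision has to wait for every worker.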

  8. TPC: normal operation: the coordinator logs state=WAIT and sends PREPARE to worker 1 and worker 2; each worker logs state=AGREED-TO-COMMIT and replies AGREE-TO-COMMIT; the coordinator logs state=COMMIT and sends COMMIT to both workers.

  10. TPC: normal operation, conflict: the coordinator logs state=WAIT and sends PREPARE to worker 1 and worker 2. One worker finds the class is full! It logs state=ABORT and replies AGREE-TO-ABORT; the other logs state=AGREED-TO-COMMIT and replies AGREE-TO-COMMIT. The coordinator logs state=ABORT and sends ABORT.

  12. exercise (1): under what circumstances may a worker send a vote to abort? [A] in response to a duplicate PREPARE message, after replying to the first with a vote to commit [B] after rebooting after a crash, if its log indicates it previously decided to vote to abort, but it did not receive any decision from the coordinator [C] after rebooting after a crash, if its log indicates it previously decided to vote to commit, but it did not receive any decision from the coordinator [D] after sending a vote to commit, but detecting that the coordinator crashed and has been down for a very long time

  13. exercise (2): under what circumstances may a coordinator send a decision to abort? [A] when rebooting after a crash, after having last sent a request to vote to all but one worker and received votes to commit from all workers contacted [B] when rebooting after a crash, when the log indicates that the last thing the coordinator did was decide to commit, but the log doesn't indicate that any workers were contacted [C] after successfully sending a request for a vote to a worker, but not receiving the reply due to a network problem

  14. two-phase commit: blocking: after agreeing to commit “add student to class”, the worker can't allow conflicting actions (adding the student to a conflicting class? removing the student from the class? not leaving a seat in the class?) until it knows whether the transaction globally committed or aborted.
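
A sketch of the blocking behavior using a hypothetical `ClassRoster`: a seat promised to an undecided transaction still counts as taken, and a conflicting removal is refused until the outcome arrives. Refusing with an exception is just one choice here; a real system might instead make the caller wait.

```python
from typing import Dict, Set


class ClassRoster:
    """Hypothetical roster for one class, showing 2PC blocking."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.enrolled: Set[str] = set()
        self.pending: Dict[int, str] = {}  # txn_id -> student, agreed but undecided

    def _undecided(self, student: str) -> bool:
        return student in self.pending.values()

    def prepare_add(self, txn_id: int, student: str) -> str:
        full = len(self.enrolled) + len(self.pending) >= self.capacity
        if full or self._undecided(student):
            return "AGREE-TO-ABORT"
        self.pending[txn_id] = student  # seat stays held until the outcome arrives
        return "AGREE-TO-COMMIT"

    def remove(self, student: str) -> None:
        # Conflicting action: not allowed while an add for this student is undecided.
        if self._undecided(student):
            raise RuntimeError(f"blocked: add of {student} not yet committed/aborted")
        self.enrolled.discard(student)

    def outcome(self, txn_id: int, decision: str) -> None:
        student = self.pending.pop(txn_id)  # the held seat is either used or released
        if decision == "COMMIT":
            self.enrolled.add(student)
```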

  16. waiting forever? If a machine goes away at the wrong time, the others might never decide what happens. Solution in practice: manual intervention.

  17. reasoning about protocols: state machines: it is very hard to reason about distributed protocol correctness; the typical tool is a state machine. Each machine is in some state, and we know what every message does in that state. This avoids a common problem: not knowing what a message does.

  19. coordinator state machine (simplified?): states INIT, WAITING, COMMITTED, ABORTED. INIT → WAITING: send PREPARE to all. In WAITING: accumulate votes; after a timeout/failure (no reply from a worker), resend PREPARE. WAITING → COMMITTED: on receiving AGREE-TO-COMMIT from all, send COMMIT. WAITING → ABORTED: on receiving any AGREE-TO-ABORT, send ABORT. In COMMITTED: resend COMMIT if needed. In ABORTED: resend ABORT if needed.

  23. coordinator failure recovery: coordinator crashes? Its log indicates the last state → resend the last message, or, if allowed, maybe send ABORT (worst case: log written, but message not sent). Worker doesn't get COMMIT/ABORT? Normal strategy: wait for a timeout, then resend; other option: the worker asks again after a timeout. In the assignment: the worker sends an acknowledgment, and you arrange a retry if there is no ack (using gRPC, so there is a return value from the “COMMIT” RPC); errors are detected only at the coordinator; you throw an exception and we'll restart (easier testing); haven't sent COMMIT yet? Can abort instead (simpler?). Duplicate messages are okay, thanks to a unique transaction ID! Workers need to handle duplicate messages, and coordinators need to handle duplicate replies.
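
A sketch of the two duplicate-tolerant pieces above, assuming the coordinator's log is a list of `(txn_id, state)` pairs with hypothetical state names: recovery resends the message that matches each transaction's last logged state, and votes are keyed by transaction ID and worker so that a duplicate reply is counted only once.

```python
from typing import Dict, List, Tuple

# Message to resend for each last logged state (hypothetical state names).
RESEND = {"WAITING": "PREPARE", "COMMITTED": "COMMIT", "ABORTED": "ABORT"}


def recover(log: List[Tuple[int, str]]) -> Dict[int, str]:
    """Coordinator reboot: find each transaction's last logged state and resend.

    The log may hold a state whose message was never sent (worst case), so
    the message is always resent; workers must tolerate the duplicate.
    """
    last: Dict[int, str] = {}
    for txn_id, state in log:
        last[txn_id] = state  # later entries override earlier ones
    # WAITING could instead lead to ABORT here, if that is allowed.
    return {t: RESEND[s] for t, s in last.items() if s in RESEND}


class VoteTally:
    """Coordinator side: a duplicate reply must not be counted twice."""

    def __init__(self) -> None:
        self.votes: Dict[Tuple[int, str], str] = {}  # (txn_id, worker) -> vote

    def record(self, txn_id: int, worker: str, vote: str) -> None:
        self.votes.setdefault((txn_id, worker), vote)  # a worker never changes its vote

    def count(self, txn_id: int) -> int:
        return sum(1 for (t, _) in self.votes if t == txn_id)
```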

  29. coordinator state machine (less simplified?): INIT → WAITING: send PREPARE to all. In WAITING: on a vote, store + tally the vote; on failure/timeout, resend PREPARE (or send ABORT and go to ABORTED). WAITING → COMMITTED: on receiving AGREE-TO-COMMIT from all, send COMMIT. WAITING → ABORTED: on receiving any AGREE-TO-ABORT, send ABORT. In COMMITTED: on a vote/failure/timeout, resend COMMIT. In ABORTED: on a vote/failure/timeout, resend ABORT.
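
The less simplified coordinator diagram can be written down as a transition table. The event names paraphrase the diagram labels, and `step` is a hypothetical helper that treats any missing entry as a specification gap.

```python
from typing import Dict, Tuple

# (state, event) -> (next state, action): a sketch of the "less simplified" diagram.
COORDINATOR: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("INIT", "start"): ("WAITING", "send PREPARE to all"),
    ("WAITING", "vote"): ("WAITING", "store + tally vote"),
    ("WAITING", "failure/timeout"): ("WAITING", "resend PREPARE"),  # or: ABORTED, send ABORT
    ("WAITING", "AGREE-TO-COMMIT from all"): ("COMMITTED", "send COMMIT"),
    ("WAITING", "any AGREE-TO-ABORT"): ("ABORTED", "send ABORT"),
    ("COMMITTED", "vote"): ("COMMITTED", "resend COMMIT"),
    ("COMMITTED", "failure/timeout"): ("COMMITTED", "resend COMMIT"),
    ("ABORTED", "vote"): ("ABORTED", "resend ABORT"),
    ("ABORTED", "failure/timeout"): ("ABORTED", "resend ABORT"),
}


def step(state: str, event: str) -> Tuple[str, str]:
    """Look up a transition; anything not in the table is unspecified."""
    try:
        return COORDINATOR[(state, event)]
    except KeyError:
        raise ValueError(f"unspecified: {event!r} in state {state!r}") from None
```

Nothing leaves COMMITTED or ABORTED: once the decision is logged, it is only ever resent, never changed.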

  31. worker state machine (simplified): INIT → AGREED-TO-COMMIT: recv PREPARE, send AGREE-TO-COMMIT. INIT → ABORTED: recv PREPARE, send AGREE-TO-ABORT. AGREED-TO-COMMIT → COMMITTED: recv COMMIT. AGREED-TO-COMMIT → ABORTED: recv ABORT.

  32. worker state machine (less simplified?): INIT → AGREED-TO-COMMIT: recv PREPARE, send AGREE-TO-COMMIT. INIT → ABORTED: recv PREPARE, send AGREE-TO-ABORT. AGREED-TO-COMMIT → COMMITTED: recv COMMIT. AGREED-TO-COMMIT → ABORTED: recv ABORT. Added: in AGREED-TO-COMMIT, recv PREPARE → resend AGREE-TO-COMMIT; in ABORTED, recv PREPARE → (re)send AGREE-TO-ABORT.
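
The less simplified worker machine as a pure transition function, a sketch in which `can_commit` is a hypothetical stand-in for the local check (e.g. whether a seat is free). A duplicate PREPARE repeats the recorded vote instead of re-running the check, which is what keeps the promise.

```python
from typing import Optional, Tuple


def worker_step(state: str, msg: str, can_commit: bool = True) -> Tuple[str, Optional[str]]:
    """Return (next_state, reply) for the worker state machine."""
    if state == "INIT" and msg == "PREPARE":
        # First PREPARE: vote (a real worker logs the new state before replying).
        if can_commit:
            return "AGREED-TO-COMMIT", "AGREE-TO-COMMIT"
        return "ABORTED", "AGREE-TO-ABORT"
    if state == "AGREED-TO-COMMIT" and msg == "PREPARE":
        return state, "AGREE-TO-COMMIT"  # duplicate PREPARE: resend the same vote
    if state == "ABORTED" and msg == "PREPARE":
        return state, "AGREE-TO-ABORT"   # duplicate PREPARE: (re)send the abort vote
    if state == "AGREED-TO-COMMIT" and msg == "COMMIT":
        return "COMMITTED", None
    if state == "AGREED-TO-COMMIT" and msg == "ABORT":
        return "ABORTED", None
    raise ValueError(f"unspecified: {msg} in state {state}")
```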

  34. worker failure recovery: worker crashes? Its log indicates the last state (the log is written before acting on that state). If INIT: wait for a (resent) PREPARE? If AGREED-TO-COMMIT or ABORTED: resend AGREE-TO-COMMIT/AGREE-TO-ABORT. If COMMITTED: redo the operation (just like redo logging).
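
Recovery from the log can be sketched for a single-value worker like the assignment's, assuming a hypothetical log of `(txn_id, state, value)` entries and transaction IDs that increase in commit order.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (txn_id, state, value proposed by that transaction).
Entry = Tuple[int, str, Optional[int]]


def recover_worker(log: List[Entry]) -> Tuple[Optional[int], List[Tuple[int, str]]]:
    """Rebuild a single-value worker from its log after a crash.

    Returns (current value, votes to resend). Assumes transaction IDs
    increase in commit order, so redoing committed transactions in ID
    order leaves the newest value in place.
    """
    last: Dict[int, Tuple[str, Optional[int]]] = {}
    for txn_id, state, value in log:
        last[txn_id] = (state, value)  # the last entry for each transaction wins
    current: Optional[int] = None
    resend: List[Tuple[int, str]] = []
    for txn_id in sorted(last):
        state, value = last[txn_id]
        if state == "COMMITTED":
            current = value  # redo the operation (like redo logging)
        elif state == "AGREED-TO-COMMIT":
            resend.append((txn_id, "AGREE-TO-COMMIT"))
        elif state == "ABORTED":
            resend.append((txn_id, "AGREE-TO-ABORT"))
        # INIT: nothing promised yet; wait for the coordinator to resend PREPARE.
    return current, resend
```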

  35. state machine missing details: we really want to specify the result of/action for every message in every state: worker recv ABORT in ABORTED: do nothing; worker recv ABORT in INIT: go to ABORTED; worker recv PREPARE in COMMITTED: ignore? … With everything specified: machine checkable? We also want to discard finished transactions eventually.
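
Writing the table out explicitly is what makes it machine checkable. This sketch lists only the worker cases named in these slides (the `INIT`/`PREPARE` entry covers both votes) and computes which `(state, message)` pairs are still unspecified.

```python
from itertools import product
from typing import Dict, List, Tuple

STATES = ("INIT", "AGREED-TO-COMMIT", "COMMITTED", "ABORTED")
MESSAGES = ("PREPARE", "COMMIT", "ABORT")

# (state, message) -> (next state, action), only the cases named on the slides.
WORKER_TABLE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("INIT", "PREPARE"): ("AGREED-TO-COMMIT or ABORTED", "send vote"),
    ("AGREED-TO-COMMIT", "PREPARE"): ("AGREED-TO-COMMIT", "resend AGREE-TO-COMMIT"),
    ("ABORTED", "PREPARE"): ("ABORTED", "resend AGREE-TO-ABORT"),
    ("AGREED-TO-COMMIT", "COMMIT"): ("COMMITTED", "apply"),
    ("AGREED-TO-COMMIT", "ABORT"): ("ABORTED", "discard"),
    ("ABORTED", "ABORT"): ("ABORTED", "do nothing"),
    ("INIT", "ABORT"): ("ABORTED", "go to ABORTED"),
    ("COMMITTED", "PREPARE"): ("COMMITTED", "ignore?"),
}


def unspecified() -> List[Tuple[str, str]]:
    """Every (state, message) pair the table says nothing about yet."""
    return [pair for pair in product(STATES, MESSAGES) if pair not in WORKER_TABLE]
```

Running `unspecified()` reports four gaps. Two of them, (COMMITTED, ABORT) and (ABORTED, COMMIT), should never happen if the protocol is correct, while (COMMITTED, COMMIT) is presumably a duplicate that only needs to be acknowledged again.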

  36. worker failure during prepare: worker fails after PREPARE without sending its vote? Option 1: the coordinator retries PREPARE. Option 2: the coordinator gives up and sends ABORT. Option 3: the worker resends its vote proactively.
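
Options 1 and 2 combine naturally: retry PREPARE a few times, then give up and treat the missing vote as an abort. `send_prepare` is a hypothetical RPC that raises `ConnectionError` on a lost message, crash, or timeout, and the default retry count is an arbitrary assumption.

```python
from typing import Callable


def collect_vote(send_prepare: Callable[[int], str], txn_id: int, retries: int = 3) -> str:
    """Options 1 and 2: retry PREPARE a few times, then give up.

    send_prepare is a hypothetical RPC returning the worker's vote and
    raising ConnectionError on a lost message, crashed worker, or timeout.
    """
    for _ in range(retries):
        try:
            return send_prepare(txn_id)  # option 1: retrying may deliver duplicates
        except ConnectionError:
            continue
    return "AGREE-TO-ABORT"  # option 2: treat the missing vote as a vote to abort
```

Giving up is safe only because no decision has been made yet; a worker that already voted to commit simply receives ABORT.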

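  38. TPC: worker fails after prepare (1a): the coordinator sends PREPARE to worker 1 and worker 2; one worker replies AGREE-TO-COMMIT, but the other fails before recording anything. On a coordinator timeout, guess: the message was lost or the worker broke (assignment: the coordinator crashes, and the testing code reboots it). After the timeout, the coordinator resends PREPARE. On reboot, the worker didn't record the transaction, so it handles the resent PREPARE as if the first had never been received and replies AGREE-TO-COMMIT; the coordinator then sends COMMIT.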

  42. TPC: worker fails after prepare (1b): the coordinator times out (guess: the message was lost or the worker broke; assignment: the coordinator crashes, and the testing code reboots it). After the timeout, the coordinator resends PREPARE, not sure whether the worker's decision was received. On reboot, the worker reads its log: agree-to-commit is recorded there, so it replies AGREE-TO-COMMIT again, and the coordinator then sends COMMIT.

  47. TPC: worker fails after prepare (2): the coordinator sends PREPARE; worker 1 replies AGREE-TO-COMMIT; worker 2 fails, perhaps before it had time to log a response. The coordinator gives up and decides to abort, sending ABORT; it doesn't care about worker 2's vote anymore.

  51. TPC: worker fails after prepare (3): the coordinator sends PREPARE; one worker replies AGREE-TO-COMMIT; the other records agree-to-commit but fails before its vote arrives. On reboot, it can proactively resend its vote (AGREE-TO-COMMIT), and the coordinator then sends COMMIT.

  53. network failure during voting? Network failure during voting ≈ node failure, so the same options apply: the coordinator resends PREPARE; the coordinator gives up; the worker resends its vote.

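  54. TPC: network failure (1): the coordinator sends PREPARE; AGREE-TO-COMMIT comes back from one worker, but a message to or from the other is lost in the network; the coordinator sends ABORT.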

  55. worker failure during commit: option 1: the coordinator resends the outcome somehow? This requires acknowledgements from the worker (required for the assignment). Option 2: the worker resends its vote (and the coordinator resends the outcome). NB: the coordinator cannot give up.
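
Once the decision is logged the coordinator cannot give up, so option 1 becomes an unbounded resend loop. `send` is a hypothetical RPC whose `True` return value serves as the worker's ACK, and the wait between attempts is an arbitrary assumption.

```python
import time
from typing import Callable


def deliver_outcome(send: Callable[[int, str], bool], txn_id: int,
                    decision: str, wait: float = 0.5) -> int:
    """Resend a logged decision until the worker ACKs it; return the attempt count.

    There is deliberately no retry limit: the decision cannot be changed,
    so giving up is not an option. send is a hypothetical RPC whose True
    return value is the ACK; it raises ConnectionError on failure.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if send(txn_id, decision):
                return attempts
        except ConnectionError:
            pass  # lost message or crashed worker: try again after waiting
        time.sleep(wait)
```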

  56. aside: worker ACKs: the coordinator sends PREPARE, the workers reply AGREE-TO-COMMIT, the coordinator sends COMMIT, and each worker replies ack-commit. Assignment: the worker sends a response from COMMIT (no extra work: Commit is an RPC call with a return value); if it is not received, the coordinator knows something is wrong.

  59. coordinator resend automatically: after PREPARE, AGREE-TO-COMMIT, and COMMIT, the coordinator could detect a missing ACK and resend COMMIT, but how many times should it retry? How long should it wait? It would complicate testing.

  61. TPC: worker revoting: after PREPARE and the workers' AGREE-TO-COMMIT votes, one worker fails and misses the COMMIT. Because it recorded agree-to-commit, on reboot it resends its vote, and the coordinator resends the decision (COMMIT).

  63. two-phase commit assignment: store a single value across workers; a single coordinator sends messages to/from the workers to change the value; each worker's current value can be queried directly. Goal: several replicas that all have the same value or are unavailable, …even if there are failures.

  64. assignment: RPC: the coordinator talks to a worker by making RPC calls; workers only talk to the coordinator by replying to RPCs. Example: the coordinator makes a “prepare” call, and the worker's “agree-to-X” is the return value. The RPC system detects a worker being down, network errors, etc., which become Python exceptions in the coordinator. The coordinator verifies that Commit/Abort was received, instead of the worker asking again; this is automatic, since the Commit/Abort message is an RPC call, and the RPC call fails if there is a problem.
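
A sketch of this coordinator structure, where each vote is an RPC return value and RPC failures surface as exceptions. `WorkerDown` and the stub methods `prepare`, `commit`, and `abort` are hypothetical placeholders, not the assignment's actual gRPC interface.

```python
from typing import Any, Sequence


class WorkerDown(Exception):
    """Placeholder for whatever error the RPC library raises (hypothetical)."""


def run_transaction(stubs: Sequence[Any], txn_id: int, value: Any) -> str:
    """Coordinator side of one transaction, in the assignment's RPC style.

    Each stub is a hypothetical client with prepare(txn_id, value) -> vote,
    commit(txn_id), and abort(txn_id); a failed RPC raises WorkerDown.
    """
    try:
        # The worker's "agree-to-X" is simply the return value of the call.
        votes = [stub.prepare(txn_id, value) for stub in stubs]
        decision = "COMMIT" if all(v == "AGREE-TO-COMMIT" for v in votes) else "ABORT"
    except WorkerDown:
        decision = "ABORT"  # nothing decided yet, so aborting is allowed
    # A real coordinator would write `decision` to its persistent log here.
    for stub in stubs:
        # A failure here propagates: the decision can't change, so the
        # exception is raised and a restart is relied on to resend it.
        if decision == "COMMIT":
            stub.commit(txn_id)
        else:
            stub.abort(txn_id)
    return decision
```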

  65. assignment: failure recovery: to simplify the assignment, always return an error if you detect a failure, and assume the testing code/user will restart the coordinator + workers. On reboot, the coordinator sends messages to the workers to recover: resend PREPARE, or COMMIT, ABORT, etc.

  66. assignment: failure types: (1) you send an RPC and it gets lost; (2) it gets sent, but the acknowledgment/reply is lost; (3) it gets sent, but is delayed until after another RPC.
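
To test against these three failure types, one option is a fake channel that injects them; `FakeChannel` and its mode strings are hypothetical test scaffolding. It shows the key difference: a lost request and a lost reply look the same to the caller, but only in the second case did the worker act.

```python
from typing import Any, Callable, List, Optional


class FakeChannel:
    """Hypothetical test double that injects the three failure types.

    mode: None (works), "lose-request", "lose-reply", or "delay".
    """

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self.handler = handler        # the worker-side function being called
        self.delayed: List[Any] = []  # requests still in flight, delivered later

    def call(self, msg: Any, mode: Optional[str] = None) -> Any:
        if mode == "lose-request":
            raise ConnectionError("request lost")  # worker never sees it
        if mode == "delay":
            self.delayed.append(msg)
            raise TimeoutError("no reply yet")     # caller gives up, msg still in flight
        reply = self.handler(msg)                  # the worker DOES act on it...
        if mode == "lose-reply":
            raise ConnectionError("reply lost")    # ...but the caller can't tell
        return reply

    def deliver_delayed(self) -> List[Any]:
        """Deliver the delayed requests now, e.g. after a newer RPC."""
        replies = [self.handler(m) for m in self.delayed]
        self.delayed.clear()
        return replies
```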

  68. TPC: reordering: the first PREPARE (id=0) didn't get to worker 2; solution: resend it later (on a timeout or during coordinator recovery). Both workers reply AGREE-TO-COMMIT, the coordinator sends COMMIT id=0, and then it starts the next transaction with PREPARE id=1. But maybe the first PREPARE wasn't really lost… if it arrives late, the problem is that the worker needs to know this is an old message. One solution: unique/increasing ID numbers.

  71. message reordering and the assignment: in the assignment, you need to worry about reordering. Connections prevent reordering, but the RPC system doesn't prevent it: it can use multiple connections. Problem: an old request seems to fail, but is actually just slow; you repeat the old request again; later on, the slow old request reaches the machine → it must be ignored! Solution: sequence numbers, or transaction IDs and/or timestamps: some way to tell “this is old”.
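
A sketch of the sequence-number fix for a hypothetical single-value replica, assuming the coordinator uses increasing transaction IDs and, for brevity, that the worker always votes to commit. Stale PREPAREs are ignored, and a stale duplicate COMMIT is acknowledged without rolling the value back.

```python
from typing import Dict, Optional


class Replica:
    """Hypothetical single-value worker that uses increasing transaction IDs
    to recognize stale (reordered or duplicated) messages."""

    def __init__(self) -> None:
        self.newest_prepare = -1             # highest txn ID seen in a PREPARE
        self.committed_id = -1               # txn ID that produced self.value
        self.proposed: Dict[int, str] = {}   # txn_id -> value voted on
        self.value: Optional[str] = None

    def prepare(self, txn_id: int, value: str) -> str:
        if txn_id < self.newest_prepare:
            return "IGNORED"                 # slow old PREPARE arrived late: ignore it
        self.newest_prepare = txn_id
        self.proposed.setdefault(txn_id, value)  # duplicate PREPARE: same vote again
        return "AGREE-TO-COMMIT"             # this sketch always votes to commit

    def commit(self, txn_id: int) -> str:
        if txn_id not in self.proposed:
            return "IGNORED"                 # never voted on this transaction
        if txn_id > self.committed_id:       # a late duplicate must not roll the value back
            self.value = self.proposed[txn_id]
            self.committed_id = txn_id
        return "ACK"                         # ack duplicates too, so resending can stop
```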

  72. extending voting: two-phase commit requires a unanimous vote to commit, under the assumption that data is split across nodes and every node must cooperate. Other model: every node has a copy of the data. Goal: keep working (including updates!) despite a few failing nodes, by requiring just “enough” nodes to be working. For now, assume fail-stop: nodes don't respond, or tell you if they are broken.
