
Distributed Termination, Global Snapshots and Parallel Scientific Computing. Dr Vladimir Z. Tosic, Term 2 2020.


  1. Distributed Termination, Global Snapshots and Parallel Scientific Computing. Dr Vladimir Z. Tosic, Term 2 2020

  2. Complete your myExperience and shape the future of education at UNSW. Click the link in Moodle or log in to myExperience.unsw.edu.au (use z1234567@ad.unsw.edu.au to log in). The survey is confidential; your identity will never be released. Survey results are not released to teaching staff until after your results are published.

  3. MAIN TOPICS IN THE LAST LECTURE (BEN-ARI TEXTBOOK CHAPTER 12)
     • Fault tolerance and inconsistent information in distributed systems – the problem of consensus
     • Byzantine Generals algorithm explanation
     • Byzantine Generals algorithm examples and demo in DAJ
     • King algorithm explanation and examples

  4. MAIN TOPICS IN THIS LECTURE (BEN-ARI TEXTBOOK CHAPTER 11)
     • Global properties in a distributed system – the problem of consistency
     • Distributed termination using the Dijkstra-Scholten and credit recovery algorithms
     • Global snapshots and the Chandy-Lamport algorithm
     • (Briefly; not in our textbook) Parallel programming in scientific computing and the Gravitational N-Body Problem

  5. WEEK 8 HW CLARIFICATIONS (ATOMICITY IN RICART-AGRAWALA) From Chapter 10 in Ben-Ari’s Textbook

  6. RICART-AGRAWALA ALGORITHM – COMPLETE (1/3)

  7. RICART-AGRAWALA ALGORITHM – COMPLETE (2/3)

  8. RICART-AGRAWALA ALGORITHM – COMPLETE (3/3)

  9. RICART-AGRAWALA ALGORITHM – PROMELA FOR Main

  10. RICART-AGRAWALA ALGORITHM – PROMELA FOR Receive

  11. DISTRIBUTED TERMINATION – INTRODUCTION From Chapter 11 in Ben-Ari’s Textbook and materials by G.R. Andrews

  12. GLOBAL PROPERTIES IN A DISTRIBUTED SYSTEM (DS)
     • DS conundrum 1: determining time and synchronising clocks
     • DS conundrum 2: information in a node changes while “state” information is collected among multiple nodes
     • Therefore: we do not study simultaneity in a DS, but consistency – unambiguous accounting of the state of the system
       1. Distributed termination – determine whether computations in all nodes have terminated
       2. (Consistent) snapshot – unambiguously account each message to a particular node/channel

  13. TERMINATION – BROADER PERSPECTIVE
     • Termination is an important liveness property of programs that are intended to terminate
     • Sequential programs do not terminate if they diverge (i.e. do not converge) and run forever
     • Concurrent programs can also deadlock (incl. livelock)
     • Thus: termination = convergence + deadlock-freedom

  14. THE NEED FOR TERMINATION DETECTION ALGORITHMS
     • Termination is a property of the union of the states of all individual processes and all message channels (“global state”)
     • As the “global state” of a distributed system is not visible to a single node, it is not easy to know when all processes have terminated
     • Even when all nodes are idle, there might be messages in transit (sent but not yet received) that will unblock receiving nodes
     • Several approaches are possible; we will study the Dijkstra-Scholten algorithm and mention some others
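The point about messages in transit can be made concrete with a small sketch. It assumes a hypothetical omniscient observer (`GlobalState`, `all_idle` and `has_terminated` are illustrative names, not from the lecture) that can see every node and every channel at once, which is exactly the view no real node has.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GlobalState:
    """Hypothetical omniscient view of the system; no single node can see this."""
    idle: dict                                    # node name -> True if the node is idle
    channels: dict = field(default_factory=dict)  # (src, dst) -> deque of messages in transit

    def all_idle(self) -> bool:
        return all(self.idle.values())

    def has_terminated(self) -> bool:
        # Termination needs idle nodes AND empty channels.
        return self.all_idle() and all(len(q) == 0 for q in self.channels.values())


state = GlobalState(
    idle={"node1": True, "node2": True},
    channels={("node1", "node2"): deque(["work"])},
)
print(state.all_idle())        # True
print(state.has_terminated())  # False: one message is still in transit

# Delivering the message re-activates node2.
state.channels[("node1", "node2")].popleft()
state.idle["node2"] = False
print(state.has_terminated())  # False
```

Checking only `all_idle()` would report termination one step too early, which is why a detection algorithm has to account for channels as well as nodes.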

  15. DISTRIBUTED TERMINATION – DIJKSTRA-SCHOLTEN ALGORITHM From Chapter 11 in Ben-Ari’s Textbook

  16. DIJKSTRA-SCHOLTEN ALGORITHM – ASSUMPTIONS
     • Change to previous DS assumptions: not every 2 nodes have to be connected directly; nodes only have to form a directed graph
     • The termination algorithm consists of additional statements (on top of the regular computation) executed when sending/receiving messages
     • Assume a special environment node: it has no incoming edges, all other nodes can be reached from it, it initiates the DS by sending messages (all other nodes are inactive), and it is responsible for reporting termination
     • A node begins computation after receiving its 1st message (on any edge) and eventually terminates, but it can restart on receiving a new message

  17. DISTRIBUTED SYSTEM WITH ENVIRONMENT NODE AND BACK EDGES
     • Assume: for every regular edge from i to j there is a back edge from j to i carrying a special type of message called a signal
     • Assume: each node is at all times able to receive, process and send signals

  18. DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. DATA STRUCTURES
     • Requirement: for every received message, signal back to the source
     • inDeficit_i[E]: difference between the number of messages received on incoming edge E of node i and the number of signals sent back
     • inDeficit_i: sum of inDeficit_i[E] over ALL incoming edges E of node i
     • outDeficit_i: difference between the number of messages sent on ALL outgoing edges of node i and the number of signals received back
     • When a node terminates it no longer sends messages, but it can continue sending signals as long as inDeficit_i[E] ≠ 0 for any edge E
     • DS termination: when, for the environment node, outDeficit_env = 0

  19. DIJKSTRA-SCHOLTEN ALGORITHM – PRELIMINARY V. (1/3): SEND/RECEIVE
     • Additions to regular sending and receiving of ALL messages
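The slide's pseudocode is not reproduced in this transcript. As a rough sketch only, the deficit bookkeeping described above could look like this in Python. The names `Node`, `send_message` and `send_signal` are hypothetical, delivery is modelled as instantaneous (an assumption made here to keep the example short), and the guard that decides when a signal may be sent is deliberately left out.

```python
from collections import defaultdict


class Node:
    """Deficit counters of one node (preliminary version, no parent yet)."""

    def __init__(self, name):
        self.name = name
        self.in_deficit_edge = defaultdict(int)  # inDeficit_i[E], keyed by sender name
        self.out_deficit = 0                     # outDeficit_i

    @property
    def in_deficit(self):
        # inDeficit_i is the sum over all incoming edges.
        return sum(self.in_deficit_edge.values())


def send_message(src, dst):
    """Regular message from src to dst, plus the extra bookkeeping."""
    src.out_deficit += 1                 # src is now owed one more signal
    dst.in_deficit_edge[src.name] += 1   # dst owes one more signal on this edge


def send_signal(src, dst):
    """src acknowledges one message that it received from dst."""
    assert src.in_deficit_edge[dst.name] > 0, "nothing to acknowledge on this edge"
    src.in_deficit_edge[dst.name] -= 1
    dst.out_deficit -= 1


env, node1, node2 = Node("env"), Node("node1"), Node("node2")
nodes = [env, node1, node2]
send_message(env, node1)
send_message(node1, node2)
send_message(node1, node2)

# Total inDeficit equals total outDeficit after every step.
print(sum(n.in_deficit for n in nodes) == sum(n.out_deficit for n in nodes))  # True

send_signal(node2, node1)
send_signal(node2, node1)
send_signal(node1, env)
print(env.out_deficit)  # 0
```

Every message is matched by exactly one signal, so the environment node's `out_deficit` returns to 0 once everything has been acknowledged.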

  20. DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. (2/3): SIGNALS
     • Additional new processes (blocked except when their conditions are true)
     • Note: send signal does not send the final signal while the node is active!

  21. DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. (3/3): ENVIRONMENT

  22. DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. V. CORRECTNESS / LIVENESS
     • For simplicity of proofs only, assume communication is synchronous
     • Whether communication is synchronous or asynchronous does not impact correctness, as we assumed that all asynchronous messages are received eventually
     • Lemma 11.1: Invariants: $\mathit{inDeficit}_i \ge 0$ and $\mathit{outDeficit}_i \ge 0$ at each node $i$; $\sum_{i \in \mathit{nodes}} \mathit{inDeficit}_i = \sum_{i \in \mathit{nodes}} \mathit{outDeficit}_i$
     • Theorem 11.2: If the system terminates, the environment node eventually announces termination
     • Task for you: try doing this proof yourself, then read the solution in the textbook (page 242)

  23. DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. IS NOT SAFE
     • node1 sends to node2 and node3, which then send to each other
     • inDeficit_2 = 2, inDeficit_2[e2] = 1, inDeficit_3 = 2, inDeficit_3[e3] = 1
     • By p5 and p6, both node2 and node3 signal node1, so it will have outDeficit_1 = 0 and will announce termination before it occurs!
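The scenario can be replayed in a few lines. The sketch below assumes that the preliminary signalling guard is "inDeficit > 1, or inDeficit = 1 with the node terminated and outDeficit = 0"; this is a reading of the textbook's preliminary version and is not shown in this transcript. All class and function names are hypothetical, and delivery is modelled as instantaneous.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.in_deficit = {}     # inDeficit_i[E], keyed by sender name
        self.out_deficit = 0     # outDeficit_i
        self.terminated = False  # has the local computation finished?

    def may_signal(self):
        # Assumed guard of the preliminary version (see the note above).
        total = sum(self.in_deficit.values())
        return total > 1 or (total == 1 and self.terminated and self.out_deficit == 0)


def send_message(src, dst):
    src.out_deficit += 1
    dst.in_deficit[src.name] = dst.in_deficit.get(src.name, 0) + 1


def send_signal(src, dst):
    """src acknowledges one message received from dst."""
    assert src.may_signal() and src.in_deficit.get(dst.name, 0) > 0
    src.in_deficit[dst.name] -= 1
    dst.out_deficit -= 1


node1, node2, node3 = Node("node1"), Node("node2"), Node("node3")
send_message(node1, node2)
send_message(node1, node3)
send_message(node2, node3)
send_message(node3, node2)

# Both workers are still active, yet the guard lets each pick the edge from node1.
send_signal(node2, node1)
send_signal(node3, node1)

premature = node1.out_deficit == 0 and not (node2.terminated and node3.terminated)
print(node1.out_deficit, premature)  # 0 True
```

Nothing in the guard stops a node from choosing the edge on which it was first activated, which is the gap that the parent variable of the final version closes.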

  24. DIJKSTRA-SCHOLTEN ALGORITHM – VIRTUAL SPANNING TREE
     • The source of the 1st message to arrive at a node is this node’s parent
     • Node i waits for: signals from all its children, $\mathit{outDeficit}_i = 0$, and its own termination; then it sends its last signal to its parent
     • Variable parent stores the parent edge (or -1 if it is still unknown)

  25. DIJKSTRA-SCHOLTEN ALGORITHM – FINAL VERSION (1/2)
     • Note: no sending of messages before the 1st message is received

  26. DIJKSTRA-SCHOLTEN ALGORITHM – FINAL VERSION (2/2)
     • Code annotations on the slide:
       // note this!
       // last signal always to parent!
       // reset parent; new parent possible if re-activated
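The slide's pseudocode is not part of this transcript, so the following is only a sketch of the rules described above: the first message fixes the parent, ordinary signals avoid using up the parent edge, and the last signal goes to the parent once the node has terminated and outDeficit is 0. The names `Node`, `send_message` and `signal_step` are hypothetical, `None` stands in for -1, and delivery is modelled as instantaneous.

```python
class Node:
    def __init__(self, name, is_env=False):
        self.name = name
        self.is_env = is_env
        self.parent = None       # None plays the role of -1 (parent unknown)
        self.in_deficit = {}     # inDeficit_i[E], keyed by sender name
        self.out_deficit = 0
        self.terminated = False

    def total_in(self):
        return sum(self.in_deficit.values())


def send_message(src, dst):
    assert src.is_env or src.parent is not None, "only active nodes may send"
    src.out_deficit += 1
    if dst.parent is None:
        dst.parent = src         # first message fixes the parent
    dst.terminated = False       # a new message (re)activates the receiver
    dst.in_deficit[src.name] = dst.in_deficit.get(src.name, 0) + 1


def signal_step(node, nodes):
    """Send at most one signal from `node`; return True if one was sent."""
    total = node.total_in()
    if total > 1:
        for sender, count in node.in_deficit.items():
            # Never use up the last unit of deficit on the parent edge here.
            if count > 1 or (count == 1 and nodes[sender] is not node.parent):
                node.in_deficit[sender] -= 1
                nodes[sender].out_deficit -= 1
                return True
    if total == 1 and node.terminated and node.out_deficit == 0:
        # Last signal always goes to the parent; then the parent is reset.
        node.in_deficit[node.parent.name] = 0
        node.parent.out_deficit -= 1
        node.parent = None
        return True
    return False


nodes = {n.name: n for n in (Node("env", is_env=True), Node("node2"), Node("node3"))}
env, node2, node3 = nodes["env"], nodes["node2"], nodes["node3"]
workers = (node2, node3)

send_message(env, node2)
send_message(env, node3)
send_message(node2, node3)
send_message(node3, node2)

while any(signal_step(n, nodes) for n in workers):
    pass
print(env.out_deficit)  # 2: workers are still active, so env does not announce

node2.terminated = node3.terminated = True
while any(signal_step(n, nodes) for n in workers):
    pass
print(env.out_deficit)  # 0: env may now announce termination
```

In the same message pattern that broke the preliminary version, the two workers can only acknowledge each other while they are active, so the environment node's deficit stays at 2 until both have really terminated.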

  27. DIJKSTRA-SCHOLTEN ALGORITHM – PARTIAL SCENARIO

  28. DIJKSTRA-SCHOLTEN ALGORITHM – DATA STRUCTURES AFTER PARTIAL SCENARIO
     • 1 ⇒ 2 in the table means: node1 sends a message to node2
     • (parent, inDeficit[E], outDeficit) at each node (Es in order of nodes)
     • In the figure: outDeficit is shown within each node in (), inDeficit on the edges
     • Task for you: add signals and decisions to terminate (DTTs)

  29. DIJKSTRA-SCHOLTEN ALGORITHM – PARTIAL SCENARIO SOLUTION

  30. DIJKSTRA-SCHOLTEN ALGORITHM – CORRECTNESS / SAFETY
     • For a non-environment node: $\mathit{parent} \neq -1 \iff$ the node is active
     • Lemma 11.3: $\mathit{inDeficit}_i = 0 \Rightarrow \mathit{outDeficit}_i = 0$ is invariant at each non-environment node $i$
     • Lemma 11.4: the parent variables define a spanning tree of the active nodes with the environment node at the root; $\mathit{inDeficit}_i \neq 0$ for each active node
     • Theorem 11.5: If the environment node announces termination, the system has terminated
     • Task for you: try doing these proofs yourself, then read the solutions in the textbook (page 246)

  31. DIJKSTRA-SCHOLTEN ALGORITHM – PERFORMANCE
     • Problem: the number of additional signals = the number of messages
     • This can be a HUGE overhead when a big distributed system shuts down
     • Improvement: sending 1 signal instead of N signals on the same edge
     • Improvement: initialising all parent variables to point to the environment node
     • Task for you: examine the textbook (page 247) pseudocode for these improvements
     • Another problem: the deficit count can exceed the maximum integer
     • Solution: credit recovery algorithms
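The slide only names credit recovery as a solution. As a very rough sketch of the general idea, and not of the textbook's algorithm (whose details are not in this transcript): the environment node starts with a total credit of 1, every message carries half of the sender's credit, and a terminating node returns whatever it holds to the environment. The environment can announce termination once it holds the full credit again. `Node`, `send_message` and `terminate` are hypothetical names, delivery is instantaneous, and exact fractions are used so that no credit is lost to rounding.

```python
from fractions import Fraction


class Node:
    def __init__(self, name):
        self.name = name
        self.credit = Fraction(0)  # in this sketch a node is active iff it holds credit


def send_message(src, dst):
    assert src.credit > 0, "only nodes holding credit may send"
    share = src.credit / 2
    src.credit -= share
    dst.credit += share            # the message carries the share to the receiver


def terminate(node, env):
    """The node finishes and hands all of its credit back to the environment."""
    env.credit += node.credit
    node.credit = Fraction(0)


env, a, b = Node("env"), Node("a"), Node("b")
env.credit = Fraction(1)

send_message(env, a)    # env holds 1/2, a holds 1/2
send_message(a, b)      # a holds 1/4, b holds 1/4
terminate(a, env)       # env holds 3/4
print(env.credit == 1)  # False: b still holds credit
terminate(b, env)
print(env.credit == 1)  # True: all credit recovered
```

No per-edge counters grow with the number of messages here; the cost moves to representing ever smaller fractions, which is the trade-off a real credit recovery algorithm has to manage.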
