
Approche Algorithmique des Systèmes Distribués (AASR) - 04b: Communication (2/2)



  1. Approche Algorithmique des Systèmes Distribués (AASR)
     Guillaume Pierre, guillaume.pierre@irisa.fr
     Based on a set of slides by Maarten van Steen, VU Amsterdam, Dept. Computer Science
     04b: Communication (2/2)

  2. Contents
     Chapter:
     - 01: Introduction
     - 02: Architectures
     - 03: Processes
     - 04: Communication
       - 04: Communication (1/2)
       - 04: Communication (2/2)
     - 05: Naming
     - 06: Synchronization
     - 07: Consistency & Replication
     - 08: Fault Tolerance
     - 09: Security
     2 / 39

  3. Multicast communication
     - Application-level multicasting
     - Gossip-based data dissemination
     3 / 39

  4. Application-level multicasting
     Essence: organize the nodes of a distributed system into an overlay network and use that network to disseminate data.
     Chord-based tree building:
     1. The initiator generates a multicast identifier mid.
     2. Lookup succ(mid), the node responsible for mid.
     3. The request is routed to succ(mid), which will become the root.
     4. If P wants to join, it sends a join request to the root.
     5. When the request arrives at Q:
        - Q has not seen a join request before ⇒ it becomes a forwarder and P becomes a child of Q; the join request continues to be forwarded.
        - Q already knows about the tree ⇒ P becomes a child of Q; no need to forward the join request anymore.
     4 / 39
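
     Below is a minimal sketch of the join-handling logic described on this slide. The DHT interface (route_to_successor) and the class and message names are illustrative assumptions, not part of the original slides.

        # Sketch of Chord-based multicast-tree building.
        # Assumes a DHT object that can route a message towards succ(mid);
        # that interface and all names here are hypothetical.

        class GroupState:
            def __init__(self):
                self.children = set()      # direct children in the multicast tree
                self.is_forwarder = False  # True if this node only forwards traffic

        class Node:
            def __init__(self, node_id, dht):
                self.node_id = node_id
                self.dht = dht             # hypothetical: dht.route_to_successor(mid, msg)
                self.groups = {}           # mid -> GroupState

            def join(self, mid):
                # P sends a join request, routed towards the root succ(mid).
                self.dht.route_to_successor(mid, ("JOIN", mid, self.node_id))

            def on_join(self, mid, joiner_id, forward):
                # Called on every node Q that the join request reaches.
                state = self.groups.get(mid)
                if state is None:
                    # Q has never seen this group: become a forwarder, adopt the
                    # joiner as a child, and forward the request (now on Q's own
                    # behalf) so that Q itself gets attached to the tree.
                    state = GroupState()
                    state.is_forwarder = True
                    self.groups[mid] = state
                    state.children.add(joiner_id)
                    forward(("JOIN", mid, self.node_id))
                else:
                    # Q already belongs to the tree: the joiner simply becomes a
                    # child of Q and the request stops here.
                    state.children.add(joiner_id)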

  5. ALM: Some costs
     [Figure: an overlay network of end hosts A, B, C, D built on top of routers Ra-Re, with per-link delays annotated.]
     Link stress: how often does an ALM message cross the same physical link? Example: a message from A to D needs to cross ⟨Ra, Rb⟩ twice.
     Stretch: ratio in delay between the ALM-level path and the network-level path. Example: messages from B to C follow a path of length 71 at the ALM level, but 47 at the network level ⇒ stretch = 71/47.
     5 / 39
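
     As an illustration of the two metrics, here is a small sketch that computes them from a mapping of overlay hops onto physical links; the data structures and names are assumptions made for the example, not taken from the figure.

        from collections import Counter

        def link_stress(overlay_path, physical_route):
            """Count how often a single ALM message crosses each physical link.
            overlay_path: list of overlay hops, e.g. [("A", "B"), ("B", "D")]
            physical_route: overlay hop -> list of physical links it traverses."""
            crossings = Counter()
            for hop in overlay_path:
                for link in physical_route[hop]:
                    crossings[link] += 1
            return crossings      # a count > 1 means that link is crossed repeatedly

        def stretch(overlay_path, physical_route, delays, direct_route):
            """Delay of the ALM-level path divided by the delay of the direct path."""
            alm = sum(delays[l] for hop in overlay_path for l in physical_route[hop])
            net = sum(delays[l] for l in direct_route)
            return alm / net      # e.g. 71 / 47 for the B-to-C example on the slide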

  6. Epidemic Algorithms General background Update models Removing objects 6 / 39

  7. Principles
     Basic idea: assume there are no write-write conflicts:
     - Update operations are performed at a single server
     - A replica passes updated state to only a few neighbors
     - Update propagation is lazy, i.e., not immediate
     - Eventually, each update should reach every replica
     Two forms of epidemics:
     - Anti-entropy: each replica regularly chooses another replica at random and exchanges state differences, leading to identical states at both afterwards
     - Gossiping: a replica that has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well)
     7 / 39

  8. Anti-entropy
     Principal operations: a node P selects another node Q from the system at random.
     - Push: P only sends its updates to Q
     - Pull: P only retrieves updates from Q
     - Push-pull: P and Q exchange mutual updates (after which they hold the same information)
     Observation: for push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (a round is when every node has taken the initiative to start an exchange).
     8 / 39
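
     A minimal sketch of the three exchange styles. It assumes each node's state is a dictionary of key -> (value, version) entries where higher versions win; that data model is an illustrative assumption, not part of the slides.

        import random

        class Node:
            def __init__(self):
                self.state = {}            # key -> (value, version)

        def merge(into, updates):
            # Keep an update only if it is newer than what the node already holds.
            for key, (value, version) in updates.items():
                if key not in into or into[key][1] < version:
                    into[key] = (value, version)

        def anti_entropy_round(nodes, mode="push-pull"):
            # One round: every node takes the initiative once and contacts a random peer.
            for p in nodes:
                q = random.choice([n for n in nodes if n is not p])
                if mode in ("push", "push-pull"):
                    merge(q.state, p.state)    # push: P sends its updates to Q
                if mode in ("pull", "push-pull"):
                    merge(p.state, q.state)    # pull: P retrieves updates from Q

        # With push-pull, a single update reaches all N nodes in O(log N) rounds.
        nodes = [Node() for _ in range(1024)]
        nodes[0].state["x"] = ("update", 1)
        rounds = 0
        while any("x" not in n.state for n in nodes):
            anti_entropy_round(nodes)
            rounds += 1
        print("rounds:", rounds)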

  9. Anti-entropy: analysis (extra)
     Basics: consider a single source propagating its update, and let p_i be the probability that a node has not received the update after the i-th cycle.
     Analysis: staying ignorant
     - With pull, p_{i+1} = (p_i)^2: the node was not updated during the i-th cycle and must contact another ignorant node during the next cycle.
     - With push, p_{i+1} = p_i (1 - 1/N)^{N(1 - p_i)} ≈ p_i e^{-1} (for small p_i and large N): the node was ignorant during the i-th cycle and no updated node chose to contact it during the next cycle.
     - With push-pull: (p_i)^2 · (p_i e^{-1})
     9 / 39
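
     To get a feel for these recurrences, the short sketch below simply iterates them numerically, starting from a single updated source (p_0 = 1 - 1/N). The push-pull line uses the approximate expression from the slide, which is only meaningful once p_i is already small.

        import math

        N = 10_000
        p_pull = p_push = p_pushpull = 1 - 1 / N   # probability of still being ignorant

        for cycle in range(1, 16):
            p_pull = p_pull ** 2                                          # pull
            p_push = p_push * (1 - 1 / N) ** (N * (1 - p_push))           # push
            p_pushpull = (p_pushpull ** 2) * (p_pushpull * math.exp(-1))  # push-pull (approx.)
            print(f"cycle {cycle:2d}: pull={p_pull:.3e} push={p_push:.3e} push-pull={p_pushpull:.3e}")

     Note how pull alone barely makes progress while almost every node is still ignorant, whereas push ramps up exponentially once a few nodes are informed.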

  10. Push vs. push-pull
      Let's add 500 nodes, all with the same initial central neighbor.
      Question: why did we omit the pull protocols?
      10 / 39

  11. Anti-entropy in large-scale distributed systems
      How can each node in the system randomly select one of its neighbors?
      - A centralized list of nodes ⇒ not scalable (because of the query traffic)
      - A fully-replicated list of nodes ⇒ not scalable (because of the update traffic)
      Conclusion: let's build a distributed system for that... :-)
      11 / 39

  12. Epidemic overlay management
      Traditional:
      - Each node has a complete view of the network
      - Nodes periodically exchange data with a randomly-selected node
      Decentralized:
      - Each node has a partial view of the network (small, fixed size)
      - Nodes periodically exchange links with a random node (from their partial view)
      - Randomly ⇒ random network; methodically ⇒ structure
      12 / 39

  13. Randomized overlays
      - Each node's view contains a set of (truly) random nodes from the network
      - Periodically refreshed
      - Important property: a random node from a view == a random node from the network
      - So we can apply "traditional" gossip
      13 / 39

  14. CYCLON: one possible way to build a random overlay
      - Each node keeps a fixed-size list of neighbors (e.g., 20)
      - Each neighbor is tagged with the time at which we last saw it alive
      - Periodically, each node selects one node out of its neighbors:
        - Pick the oldest peer out of its view
        - Exchange some references with this peer
        - And add a reference to itself
      14 / 39
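
      The sketch below shows one possible shuffle step in that spirit. It is a simplified rendering: the view size, shuffle length, the network dictionary used for delivery, and the eviction rule are illustrative choices rather than the exact published CYCLON parameters.

         import random

         VIEW_SIZE = 20     # fixed-size neighbor list
         SHUFFLE_LEN = 5    # references exchanged per shuffle (illustrative)

         class CyclonNode:
             def __init__(self, node_id):
                 self.node_id = node_id
                 self.view = {}                  # neighbor_id -> age in cycles

             def shuffle(self, network):
                 """Contact the oldest neighbor and swap a few references with it."""
                 if not self.view:
                     return
                 for n in self.view:             # every neighbor ages by one cycle
                     self.view[n] += 1
                 oldest = max(self.view, key=self.view.get)
                 del self.view[oldest]           # the contacted peer leaves our view

                 sample = random.sample(list(self.view), min(SHUFFLE_LEN - 1, len(self.view)))
                 sent = {n: self.view[n] for n in sample}
                 sent[self.node_id] = 0          # ...and we add a fresh reference to ourselves

                 reply = network[oldest].handle_shuffle(sent)
                 self.merge(reply, sent)

             def handle_shuffle(self, received):
                 sample = random.sample(list(self.view), min(SHUFFLE_LEN, len(self.view)))
                 reply = {n: self.view[n] for n in sample}
                 self.merge(received, reply)
                 return reply

             def merge(self, received, sent):
                 """Add received references, evicting entries we just sent if the view is full."""
                 for n, age in received.items():
                     if n == self.node_id or n in self.view:
                         continue
                     if len(self.view) >= VIEW_SIZE:
                         victims = [v for v in sent if v in self.view] or list(self.view)
                         del self.view[random.choice(victims)]
                     self.view[n] = age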

  15. Average path length Path length = a good measure of the time and cost to flood the network 15 / 39

  16. Clustering coefficient
      The coefficient is the probability that two neighbors of the same node are also neighbors of each other.
      A large clustering coefficient is:
      - Bad for flooding (many redundant messages)
      - Bad for self-healing (each strongly connected cluster has only a few links to other clusters)
      16 / 39
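
      For concreteness, here is a small sketch of how the coefficient can be measured on an overlay given as node -> set of neighbors; treating the links as undirected is a simplifying assumption, since gossip views are directed.

         from itertools import combinations

         def clustering_coefficient(adj, node):
             """Fraction of pairs of `node`'s neighbors that are themselves connected."""
             neighbors = adj[node]
             if len(neighbors) < 2:
                 return 0.0
             linked = sum(1 for a, b in combinations(neighbors, 2)
                          if b in adj[a] or a in adj[b])
             pairs = len(neighbors) * (len(neighbors) - 1) / 2
             return linked / pairs

         def average_clustering(adj):
             return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

         # Tiny example: a triangle plus one pendant node.
         adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
         print(average_clustering(adj))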

  17. Clustering coefficient 17 / 39

  18. In-degree distribution
      The in-degree distribution affects:
      - Robustness (because of weakly-connected nodes)
      - Load balancing
      - The way epidemics spread
      18 / 39

  19. Self-healing 19 / 39

  20. Self-healing 20 / 39

  21. Creating structure with unstructured overlays
      "Unstructured" P2P overlays are very good at creating structure:
      - Define a global proximity function on nodes
      - Each node will link with the nodes that minimize cost(self, node)
      Each time two nodes gossip, they:
      - Merge their entire views
      - Keep only the "best" nodes according to the proximity function
      21 / 39
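
      A minimal sketch of this selection rule, assuming cost() is the global proximity function and views are plain sets of node identifiers (a deliberate simplification of the real protocol):

         def gossip_exchange(view_a, view_b, node_a, node_b, cost, view_size):
             """Two nodes merge their views and each keeps the peers closest to itself."""
             merged = view_a | view_b | {node_a, node_b}
             new_a = sorted(merged - {node_a}, key=lambda n: cost(node_a, n))[:view_size]
             new_b = sorted(merged - {node_b}, key=lambda n: cost(node_b, n))[:view_size]
             return set(new_a), set(new_b)

         # Example with a 1-D proximity function (nodes are numbers on a line).
         cost = lambda x, y: abs(x - y)
         a_view, b_view = gossip_exchange({3, 40, 75}, {8, 52, 90}, 10, 50, cost, 3)
         print(a_view, b_view)   # each node keeps its 3 nearest known peers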

  22. Maintaining connectivity Imagine an overlay where each node connects only to similar nodes E.g., similar music taste This may break connectivity How can new nodes find their place in the overlay? How can the structure evolve if I change my tastes? 22 / 39

  23. Dual-layer gossiping
      - VICINITY aggressively searches for the best nodes; it also picks nodes from the underlying CYCLON layer
      - CYCLON keeps the overlay connected; it learns uniformly random nodes
      23 / 39

  24. Example: build a torus 24 / 39

  25. Example: build a torus 25 / 39

  26. Example: build a torus 26 / 39

  27. Example: build a torus 27 / 39

  28. Let's use Vicinity to search for content
      - eDonkey2000 traces: 12,000 nodes and their lists of files
      - In total: 970,000 unique files
      - Each node initially knows 5 random others
      - Goal: connect each node with its 10 closest neighbors
      - Similarity metric: the number of files shared by both A and B
      28 / 39
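
      That similarity metric could be plugged into the selection sketch shown earlier roughly as follows; the file lists here are made up for illustration.

         def shared_files(files_a, files_b):
             """Similarity = number of files present in both nodes' file lists."""
             return len(set(files_a) & set(files_b))

         def cost(files_by_node, a, b):
             """Vicinity-style selection minimizes cost, so use the negated similarity."""
             return -shared_files(files_by_node[a], files_by_node[b])

         files_by_node = {
             "n1": {"song.mp3", "movie.avi", "paper.pdf"},
             "n2": {"song.mp3", "paper.pdf"},
             "n3": {"clip.mpg"},
         }
         print(cost(files_by_node, "n1", "n2"))   # -2: n1 and n2 share two files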

  29. Vicinity performance Using Vicinity but not Cyclon: 29 / 39

  30. Vicinity performance Let’s have Vicinity periodically exchange a few Vicinity links with a random neighbor (taken from Cyclon) 30 / 39

  31. Vicinity performance Let’s have Vicinity periodically exchange the best Vicinity links with a random neighbor 31 / 39

  32. Vicinity performance Let’s have Vicinity periodically exchange the best links (also from Cyclon) with a random neighbor 32 / 39

  33. Aggregation
      Aggregation is the collective name for a set of functions that provide statistical information about a system.
      Useful in large-scale distributed systems:
      - The average load of the nodes in a cloud
      - The sum of the free space in a distributed storage system
      - The total number of nodes in a P2P system
      Solutions should be:
      - Decentralized
      - Robust to churn
      33 / 39

  34. Churn
      All large-scale systems have churn: nodes join and leave all the time.
      34 / 39

  35. Gossip-based aggregation 35 / 39

  36. Example: average estimation
      Each node holds a state: a number representing the value to be averaged.
      - selectPeer(): random selection among the current neighbors
      - update(s_a, s_b) = (s_a + s_b) / 2
      Observations:
      1. After each exchange the system-wide average does not change
      2. The variance is reduced
      3. Therefore: if the system is connected, each node will converge toward the global average
      36 / 39
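
      A compact sketch of this protocol on a fixed topology; the ring and the parameter values are illustrative choices, not part of the slides.

         import random

         def gossip_average(values, neighbors, cycles=30):
             """Every node repeatedly averages its estimate with a randomly chosen neighbor."""
             est = dict(values)                                          # node -> current estimate
             for _ in range(cycles):
                 for node in est:
                     peer = random.choice(neighbors[node])               # selectPeer()
                     est[node] = est[peer] = (est[node] + est[peer]) / 2 # update(s_a, s_b)
             return est

         # Example: 100 nodes on a ring, each starting with a random load value.
         n = 100
         neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
         values = {i: random.uniform(0, 10) for i in range(n)}
         result = gossip_average(values, neighbors)
         print("true average:   ", sum(values.values()) / n)
         print("estimate spread:", max(result.values()) - min(result.values()))

      Swapping the ring for a different neighbor structure (e.g., a random graph) gives a feel for the exercises on slide 38.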

  37. A run of the protocol 37 / 39

  38. Exercises
      1. How does the topology influence the convergence speed?
      2. Which topology is optimal for convergence speed?
      3. What are the effects of link failures on the protocol?
      4. What are the effects of node failures on the protocol?
      5. Devise a protocol that measures the (approximate) number of nodes in the system
      38 / 39

  39. Counting
      The counting protocol is based on average calculation:
      - Initialization: one node starts with 1, all others with 0
      - The average value will converge towards 1/N
      - Problem: how to select that "one node"?
      Concurrent instances of the counting protocol:
      - Each instance is led by a different node
      - Messages are tagged with a unique identifier
      - Nodes participate in all instances for a duration T
      - Each node self-elects itself as a leader with probability P = c / N_est, where c is the interval (in cycles) at which we want to estimate N
      39 / 39
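
      Finally, a sketch of a single counting instance built on the same averaging exchange; leader self-election, instance identifiers, and the participation duration T are left out, and the topology is made up for the example.

         import random

         def count_nodes(neighbors, cycles=50):
             """One counting instance: the leader starts at 1, everyone else at 0.
             Averaging drives every estimate towards 1/N, so each node reports 1/estimate."""
             nodes = list(neighbors)
             leader = random.choice(nodes)
             est = {n: (1.0 if n == leader else 0.0) for n in nodes}
             for _ in range(cycles):
                 for node in nodes:
                     peer = random.choice(neighbors[node])
                     est[node] = est[peer] = (est[node] + est[peer]) / 2
             return {n: (1 / v if v > 0 else float("inf")) for n, v in est.items()}

         # Example: 200 nodes, each knowing 5 random other nodes.
         n = 200
         ids = list(range(n))
         neighbors = {i: random.sample([j for j in ids if j != i], 5) for i in ids}
         estimates = count_nodes(neighbors)
         print("median estimate of N:", sorted(estimates.values())[n // 2])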
