Ken Birman, Cornell University. CS5410 Fall 2008. Gossip and Network Overlays (PowerPoint PPT Presentation)




  • Ken Birman, Cornell University. CS5410 Fall 2008.

  • Gossip and Network Overlays. A topic that has received a lot of recent attention. Today we'll look at three representative approaches: Scribe, a topic-based pub-sub system that runs on the Pastry DHT (slides by Anne-Marie Kermarrec); Siena, a content-subscription overlay system (slides by Antonio Carzaniga); and T-Man, a general-purpose system for building complex network overlays (slides by Ozalp Babaoglu).

  • Scribe. Research done by the Pastry team, at the MSR lab in Cambridge, England. The basic idea is simple: topic-based publish/subscribe. Use the topic as a key into a DHT; the subscriber registers with the "key owner", and the publisher routes messages through that DHT owner. Optimization to share load: if a subscriber is asked to forward a subscription, it doesn't do so and instead makes note of the subscription. Later, it will forward copies to its children.
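The topic-as-key idea on this slide can be sketched in a few lines. This is a simplification, not Scribe's actual code: it hashes the topic name into the DHT identifier space and picks the node whose ID is numerically closest as the "key owner", whereas Pastry really uses prefix-based routing; the node IDs and topic name are made up for illustration.

```python
import hashlib

def topic_key(topic: str) -> int:
    """Hash a topic name into a numeric DHT identifier (SHA-1 based)."""
    return int(hashlib.sha1(topic.encode()).hexdigest(), 16)

def key_owner(key: int, node_ids: list) -> int:
    """Pick the live node whose ID is numerically closest to the key
    (a stand-in for Pastry's prefix routing)."""
    return min(node_ids, key=lambda n: abs(n - key))

# Toy 16-bit ID space with four nodes (illustrative values).
nodes = [0x1000, 0x5000, 0x9000, 0xD000]
k = topic_key("stock-alerts") % 0x10000
root = key_owner(k, nodes)  # this node becomes the topic's rendezvous root
```

Subscribers then register with `root`, and publishers route messages to it, exactly as the slide describes.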

  • Architecture: a layered stack. The SCRIBE service provides scalable communication, subscription management, and event notification; beneath it, the PASTRY layer provides P2P location and DHT routing; at the bottom, the Internet (TCP/IP). [20/12/2002]

  • Design. Construction of a multicast tree based on the Pastry network, using reverse path forwarding. The tree is used to disseminate events. Pastry routing is used to create and join groups.

  • SCRIBE: Tree Management. Create: route a create(groupId) message to the group's root (the key owner). Join: route a join(groupId) message toward groupId; each node along the route adds the sender as a child. Tree: the union of the Pastry routes from the members to the root. Multicast: send multicast(groupId) to the root; the message flows from the root down to the leaves, with each interior node forwarding a copy to each of its children. Result: low link stress, low delay.
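The join procedure above, together with the load-sharing optimization from the earlier slide (a node already in the tree absorbs the join instead of forwarding it), can be sketched as follows. The node names and the flat `route` list are illustrative; in the real system the route is produced hop by hop by Pastry.

```python
def join(route, children, group):
    """Propagate a JOIN along a Pastry route toward the group's root.

    `route` is the node sequence from the new subscriber to the root.
    Each hop records its predecessor as a child; propagation stops at
    the first node that is already part of the tree, which will later
    forward message copies to its children on its own.
    """
    in_tree = set(children.get(group, {}))   # nodes already forwarding
    prev = route[0]
    for hop in route[1:]:
        children.setdefault(group, {}).setdefault(hop, set()).add(prev)
        if hop in in_tree:                   # already in tree: stop here
            break
        prev = hop

children = {}
join(["S1", "A", "B", "root"], children, "g")
join(["S2", "A", "B", "root"], children, "g")  # absorbed at A
```

After the second join, node A has both subscribers as children, and the join never travelled past A; that is exactly why link stress stays low.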

  • SCRIBE: Tree Management (figure: the multicast tree rooted at d467c4 shown in both the proximity space and the name space, with member nodes 26b20d, d471f1, d13da3, and 65a1fc).

  • Concerns? Pastry tries to exploit locality, but could these links send a message from Ithaca… to Kenya… to Japan…? What if a relay node fails? The subscribers it serves will be cut off. They refresh subscriptions, but it is unclear how often this has to happen to ensure that the quality will be good. (Treat subscriptions as "leases" so that they evaporate if not refreshed… no need to unsubscribe…)
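The "lease" idea in the parenthetical can be sketched as soft-state subscription entries that expire unless refreshed. This is an illustration only; the class name, the lease duration, and the explicit `now` parameter are all assumptions, and the slides deliberately leave the refresh interval open.

```python
import time

class SubscriptionTable:
    """Subscriptions as soft-state leases: an entry evaporates unless
    refreshed before its deadline, so no explicit unsubscribe is needed."""

    def __init__(self, lease_secs: float):
        self.lease_secs = lease_secs
        self.expiry = {}                 # subscriber -> expiry deadline

    def refresh(self, sub, now=None):
        """Extend (or create) a subscriber's lease."""
        now = time.monotonic() if now is None else now
        self.expiry[sub] = now + self.lease_secs

    def live(self, now=None):
        """Return the subscribers whose leases have not yet expired."""
        now = time.monotonic() if now is None else now
        return {s for s, t in self.expiry.items() if t > now}

tbl = SubscriptionTable(lease_secs=30)
tbl.refresh("nodeA", now=0)
tbl.refresh("nodeB", now=20)
# at now=25 both leases are live; by now=35 nodeA's lease has expired
```

The open question on the slide is how `lease_secs` trades off repair latency against refresh traffic.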

  • SCRIBE: Failure Management. Reactive fault tolerance: tolerates failures of the root and of interior nodes. Tree repair has only local impact. Fault detection is via heartbeat messages, followed by local repair.
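The heartbeat-based detection on this slide can be sketched as each child timing its parent's heartbeats and rejoining the group when the parent goes silent. The interval, the miss threshold, and the node names are assumptions for illustration; the slides don't specify these parameters.

```python
class HeartbeatMonitor:
    """A child suspects its tree parent after `max_missed` heartbeat
    intervals without a message; repair is then a fresh JOIN routed
    toward the groupId, which grafts the subtree back on locally."""

    def __init__(self, interval: float, max_missed: int = 3):
        self.timeout = interval * max_missed
        self.last_seen = {}              # parent -> time of last heartbeat

    def heartbeat(self, parent, now):
        """Record a heartbeat received from a parent."""
        self.last_seen[parent] = now

    def suspected(self, now):
        """Parents whose silence has exceeded the timeout."""
        return [p for p, t in self.last_seen.items()
                if now - t > self.timeout]

mon = HeartbeatMonitor(interval=1.0)
mon.heartbeat("parentX", now=0.0)
mon.heartbeat("parentY", now=2.5)
# by now=3.5, parentX has been silent longer than 3 intervals
```

Because the orphaned child re-routes its join through Pastry, only the nodes on that new route are touched, which is what "local impact" means here.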

  • Scribe: performance. Setup: 1,500 groups, 100,000 nodes, 1 msg/group. Low delay penalty. Good partitioning and load balancing: groups hosted per node is 2.4 (mean), 2 (median). Reasonable link stress: mean msgs/link is 2.4 (vs. 0.7 for IP); maximum link stress is 4× that of IP.

  • Topic distribution (figure: group size vs. topic rank; the largest groups correspond to Windows Update, stock alerts, and instant messaging).

  • Concern about this data set: it is synthetic and may not be terribly realistic. In fact we know that subscription patterns are usually power-law distributed, so that much is reasonable. But it is unlikely that real workloads follow a clean Zipf-like distribution of this nature (indeed, it is totally implausible). Unfortunately, this sort of issue is common when evaluating very big systems using simulations. The alternative is to deploy and evaluate them in use… but that is only feasible if you own Google-scale resources!
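To make the "clean Zipf-like distribution" concrete: under Zipf with exponent alpha, the group at popularity rank r has roughly max_size / r^alpha members. The sketch below generates such a synthetic workload; the parameter values mirror the experiment's scale (1,500 groups, 100,000 nodes) but the generator itself is an assumption about how the data set was produced, which is exactly what the slide is questioning.

```python
def zipf_group_sizes(num_groups: int, max_size: int, alpha: float = 1.0):
    """Synthetic Zipf-like popularity: the group at rank r gets
    max_size / r**alpha members (floored at 1)."""
    return [max(1, int(max_size / (r ** alpha)))
            for r in range(1, num_groups + 1)]

sizes = zipf_group_sizes(num_groups=1500, max_size=100_000)
# the rank-1 group ("Windows Update"-like) dwarfs the long tail
```

Real subscription data is heavy-tailed too, but rarely this smooth, which is the slide's objection.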

  • Delay penalty (figure: cumulative number of topics vs. delay penalty relative to IP, over the range 0–4.5; mean = 1.66, median = 1.56).

  • Node stress, 1,500 topics (figure: number of nodes vs. total number of children-table entries; mean = 6.2, median = 2).

  • Link stress (figure: number of links vs. link stress, Scribe vs. IP multicast, link stress on a log scale from 1 to 10,000; Scribe mean = 1.4, median = 0; the maximum-stress point is marked).

  • Anycast. Supports highly dynamic groups; suitable for decentralized resource discovery (a predicate can be applied during the DFS). Results (100K nodes, .5M network): join costs 4.1 msgs for an empty group, 3.5 msgs on average with 2,500 members; 1,000 anycasts cost 4.1 msgs for an empty group, 2.3 msgs on average with 2,500 members. Locality: for >90% of anycasts, <7% of members were closer than the receiver.
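The "predicate during DFS" idea can be sketched as a depth-first walk over the group's Scribe tree that stops at the first member satisfying the predicate. The tree shape, node names, and the load-based predicate are invented for illustration; the real system also biases the walk toward nearby subtrees, which this sketch omits.

```python
def anycast(tree, node, predicate):
    """Depth-first search of a multicast tree, returning the first
    member that satisfies the predicate, or None if no member does."""
    if predicate(node):
        return node
    for child in tree.get(node, []):
        found = anycast(tree, child, predicate)
        if found is not None:
            return found
    return None

# Hypothetical group tree and per-node load values.
tree = {"root": ["A", "B"], "A": ["C"], "B": []}
load = {"root": 9, "A": 8, "C": 2, "B": 1}
winner = anycast(tree, "root", lambda n: load[n] < 5)  # resource discovery
```

Because the walk descends the tree from wherever the anycast enters it, it tends to find a nearby satisfying member, which matches the locality numbers on the slide.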

  • Fireflies (see Fireflies.ppt).

  • T-Man.