Adaptive Data Propagation in Peer-to-Peer Systems Thomas Repantis trep@cs.ucr.edu CS253-Distributed Systems, Winter 2004 – p.1/18
Overview
1. Problem
2. Solution
3. Simulation Results
4. Conclusion
Problem Definition
How can we efficiently locate an object in an unstructured peer-to-peer system when a reference to that object is given?
The traditional approach is flooding, which propagates the query hop by hop and has many disadvantages:
- Messages travel a large number of hops.
- Processing power is wasted at many nodes.
- Large amounts of network traffic are produced.
- The answer is delayed.
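To make the cost of flooding concrete, here is a minimal sketch of TTL-limited flooding over an adjacency-list graph. The function name `flood`, the graph representation, and the choice to count every forward (including duplicates sent to nodes that have already seen the query) are illustrative assumptions, not details taken from the slides.

```python
from collections import deque


def flood(graph, source, holders, ttl):
    """Flood a query from `source` for at most `ttl` hops.

    graph   -- dict mapping each node to a list of its neighbors (assumed layout)
    holders -- set of nodes that store the requested object
    Returns (messages_sent, nodes_reached, nodes_with_a_hit).
    """
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    hits = set()
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders:
            hits.add(node)
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward further
        for neighbor in graph[node]:
            # Simplification: a node forwards to every neighbor, even the one
            # it received the query from, and each forward costs one message.
            messages += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return messages, len(seen), hits
```

Even on a four-node graph the message count exceeds the node count, because duplicate forwards are paid for although they reach no new node.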
Suggested Solutions
- Organize nodes according to their interests.
- Use Bloom filters to summarize the data stored in nodes.
- Each node examines the content synopses of its neighbors to decide where to propagate a query.
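The synopsis-based routing idea can be sketched as follows. This is a plain (non-counting) Bloom filter under assumed parameters; the class name `BloomSynopsis`, the helper `select_neighbors`, the filter size, and the use of salted SHA-256 as the hash family are all hypothetical choices made for illustration.

```python
import hashlib


class BloomSynopsis:
    """Bloom filter summarizing the keywords stored at one node (sketch)."""

    def __init__(self, num_bits=64, num_hashes=4):
        self.num_bits = num_bits      # assumed size, chosen for illustration
        self.num_hashes = num_hashes
        self.bits = 0                 # a Python int used as a bit array

    def _positions(self, keyword):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, keyword):
        for pos in self._positions(keyword):
            self.bits |= 1 << pos

    def might_contain(self, keyword):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(keyword))


def select_neighbors(neighbor_synopses, keyword):
    """Return the neighbors whose synopsis suggests they may hold `keyword`."""
    return [peer for peer, synopsis in neighbor_synopses.items()
            if synopsis.might_contain(keyword)]
```

A node would forward a query only to the peers returned by `select_neighbors`, rather than to all of its neighbors as in flooding.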
We suggest going even further
Let us propagate the content synopses adaptively, according to parameters such as:
- The number of queries we have received from a peer.
- The number of local hits the queries of a peer have produced.
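One way such an adaptive decision could look is sketched below. The slides name the two parameters but not how they are combined, so the thresholds, the requirement that both be met, and the class name `PropagationPolicy` are assumptions made purely for illustration.

```python
class PropagationPolicy:
    """Decide which peers receive our content synopsis (illustrative sketch)."""

    def __init__(self, min_queries=3, min_hits=1):
        # Both thresholds are hypothetical values, not taken from the slides.
        self.min_queries = min_queries
        self.min_hits = min_hits
        self.queries_from = {}  # peer -> number of queries received from it
        self.hits_for = {}      # peer -> number of those queries we answered locally

    def record_query(self, peer, local_hit):
        self.queries_from[peer] = self.queries_from.get(peer, 0) + 1
        if local_hit:
            self.hits_for[peer] = self.hits_for.get(peer, 0) + 1

    def should_propagate(self, peer):
        # Assumed rule: the peer must query us often AND find results here.
        return (self.queries_from.get(peer, 0) >= self.min_queries
                and self.hits_for.get(peer, 0) >= self.min_hits)

    def targets(self, peers):
        """Return the subset of `peers` that should receive the synopsis."""
        return [peer for peer in peers if self.should_propagate(peer)]
```

A peer that queries often but never finds anything locally would not receive the synopsis under this rule; a different combination of the two counters is equally plausible.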
System operation example
- According to its criteria, C propagates S only to B.
- Based on S, B routes Q only to C.
- QH is routed back to A.
[Figure: overlay network of nodes A, B, C, D, E, F, and G, with links labeled S, Q, and QH]
Simulation Parameters
We implemented our protocols on top of the Neurogrid simulator.
- Counting Bloom Filters: 4-bit counters, 10 bits length, 4 hash functions
- 300 possible Documents
- 400 possible Keywords
- 30 Documents per Node
- 1 Keyword per Document
- 50 Maximum Connections per Node
- TTL 7
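For readers unfamiliar with counting Bloom filters, the following is a minimal sketch with 4-bit counters and 4 hash functions, as listed above. The class name `CountingBloomFilter`, the filter size of 128 counters, and the salted SHA-256 hashing are arbitrary illustrative assumptions and do not reflect the simulator's implementation.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose cells are small counters, so items can be removed."""

    COUNTER_MAX = 15  # largest value a 4-bit counter can hold

    def __init__(self, num_counters=128, num_hashes=4):
        self.num_counters = num_counters  # assumed size, for illustration only
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Salt SHA-256 with the hash index to get num_hashes positions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            if self.counters[pos] < self.COUNTER_MAX:
                self.counters[pos] += 1  # saturate instead of overflowing

    def remove(self, item):
        if item not in self:
            return  # never decrement for an item the filter does not report
        for pos in self._positions(item):
            # A saturated counter has lost its true count, so leave it alone.
            if 0 < self.counters[pos] < self.COUNTER_MAX:
                self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

The counters are what allow a node to withdraw a document from its synopsis without rebuilding the filter, at the cost of four bits per cell instead of one.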
Synopses Hits
Queries found in neighbors’ content synopses.
[Plot: Number of Synopses Hits (0 to 200000) versus Number of Nodes (0 to 7000); series "SynopsesHitsADP" and "SynopsesHitsBF"]
Synopses Misses
Queries not found in neighbors’ content synopses.
[Plot: Number of Synopses Misses (0 to 250000) versus Number of Nodes (0 to 7000); series "SynopsesMissesADP" and "SynopsesMissesBF"]
False Positives
Queries falsely propagated.
[Plot: Number of False Positives (0 to 700) versus Number of Nodes (0 to 7000); series "FalsePositivesADP" and "FalsePositivesBF"]
Average Number of Matches
Number of matching documents found for a search.
[Plot: Average Number of Matches (0 to 700) versus Number of Nodes (0 to 7000); series "MatchesAvgADP", "MatchesAvgBF", and "MatchesAvgGNT"]
Average Number of Message Transfers
Number of messages sent during a search.
[Plot: Average Number of Message Transfers (0 to 25000) versus Number of Nodes (0 to 7000); series "MsgTransfersAvgADP", "MsgTransfersAvgBF", and "MsgTransfersAvgGNT"]
Average Number of Nodes Reached
Number of nodes reached during a search.
[Plot: Average Number of Nodes Reached (0 to 7000) versus Number of Nodes (0 to 7000); series "NodesReachedAvgADP", "NodesReachedAvgBF", and "NodesReachedAvgGNT"]
Average TTL of First Match
The TTL of the first message that found the first match (i.e., how many hops before the first hit).
[Plot: Average TTL of First Match (0 to 6) versus Number of Nodes (0 to 7000); series "TTLavgADP", "TTLavgBF", and "TTLavgGNT"]
Average Recall
Proportion of all possible matches that was actually discovered.
[Plot: Average Recall (axis up to 1) versus Number of Nodes (0 to 7000); series "RecallAvgADP", "RecallAvgBF", and "RecallAvgGNT"]
Average Recall Efficiency
Average Recall divided by the number of messages transferred.
[Plot: Average Recall Efficiency (0 to 0.1) versus Number of Nodes (0 to 7000); series "RecallEffAvgADP", "RecallEffAvgBF", and "RecallEffAvgGNT"]
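The recall and recall efficiency metrics, as defined on these slides, reduce to two ratios. The sketch below spells them out; the function names and the convention of returning 0.0 when a denominator is zero are assumptions added for illustration.

```python
def recall(matches_found, matches_possible):
    """Proportion of all possible matches that was actually discovered."""
    if matches_possible == 0:
        return 0.0  # assumed convention when nothing could have matched
    return matches_found / matches_possible


def recall_efficiency(average_recall, messages_transferred):
    """Average recall divided by the number of messages transferred."""
    if messages_transferred == 0:
        return 0.0  # assumed convention when no message was sent
    return average_recall / messages_transferred
```

For instance, `recall(3, 4)` is 0.75, and a search with recall 0.5 that cost 4 messages has a recall efficiency of 0.125. A protocol can therefore score a higher efficiency than flooding while still finding fewer matches overall.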
Conclusions
By propagating content synopses to adaptively selected peers, we get:
- Faster search and retrieval.
- Less bandwidth wasted.
- Less processing power wasted.
However, the recall of flooding is not reached.
Room for future work:
- Other parameters
- Further propagation
- Pulling instead of pushing
Thank you! Questions/comments?