Information Replication Strategy in Unstructured Peer-to-Peer Networks Using Thematic Agents

  1. Information Replication Strategy in Unstructured Peer-to-Peer Networks Using Thematic Agents
     Nicolas Bonnel, Gildas Ménier, Pierre-François Marteau
     Laboratoire Valoria, Université de Bretagne Sud
     October 24, 2007

  2. Outline
     1 Introduction: Overview, P2P architecture
     2 System design
     3 Preliminary results
     4 Conclusion

  3. Overview
     Context
     - Indexing very large databases
     - The system constrains the location and replication of data
     Resource scavenging
     - Peer-to-peer architecture
     - Allows the use of more computers at low cost
     - Fault tolerance
     - Scalability
     - Example: SETI

  4. Structured P2P networks
     Characteristic
     - Data location is constrained (distributed hash table)
     Features
     - Rare items are easy to retrieve
     - Approximate and range queries are very costly
     - Load-balancing problems
     - Examples: Chord, CAN, Tapestry, ...
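The constraint on data location can be made concrete with a small sketch of Chord-style consistent hashing: node names and keys are hashed onto the same identifier ring with SHA-1, and a key is stored on the first node at or after its position. `ID_BITS`, the node names and the `HashRing` class are hypothetical, and Chord's finger tables (which locate that node in O(log n) hops) are left out.

```python
import bisect
import hashlib

ID_BITS = 16  # hypothetical size of the identifier space (2**16 ring positions)


def ring_id(name: str) -> int:
    """Map a node name or a key onto the identifier ring using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class HashRing:
    """Chord-style placement: each key is stored on its successor node."""

    def __init__(self, nodes):
        self._ring = sorted((ring_id(n), n) for n in nodes)
        self._ids = [node_id for node_id, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node whose id is >= the key's id, wrapping around past the top.
        idx = bisect.bisect_left(self._ids, ring_id(key))
        return self._ring[idx % len(self._ring)][1]


nodes = [f"node{i}" for i in range(8)]
ring = HashRing(nodes)
print(ring.owner("wikipedia"))  # the same key always maps to the same node
```

Because placement follows from the hash, an exact keyword always leads to one known node; an approximate or range query has no single key to hash, which is one way to see why such queries are costly here.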

  5. Unstructured P2P networks
     Characteristic
     - No constraint on data location
     Features
     - Highly replicated items can be retrieved at low cost
     - Data placement can be controlled
     - Rare items are very costly to retrieve
     - Examples: Gnutella [Clip2, 2002], ...
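Without a placement rule, a search has to explore the network. A minimal sketch of TTL-limited flooding, the blind search style associated with Gnutella (the adjacency-dict graph, `flood_search` and its parameters are illustrative assumptions, not the v0.4 protocol itself): every forwarded copy of the query costs one message, and an item held by few nodes needs a larger TTL, and therefore more messages, before the flood reaches it.

```python
from collections import deque


def flood_search(graph, start, has_item, ttl):
    """TTL-limited flooding; returns (found, messages_sent).

    graph: dict node -> list of neighbours
    has_item: set of nodes holding the searched item
    """
    visited = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    found = start in has_item
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: this copy is not forwarded further
        for nb in graph[node]:
            messages += 1  # every forwarded copy of the query costs one message
            if nb in visited:
                continue  # assumed duplicate suppression: a node forwards a query once
            visited.add(nb)
            if nb in has_item:
                found = True
            frontier.append((nb, hops_left - 1))
    return found, messages
```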

  6. Outline (section 2: System design)

  7. System design
     Architecture
     - Indexes documents in a distributed index database (each node hosts a part of it)
     - Unstructured peer-to-peer architecture
     - Each node keeps a summary of the keywords it hosts (a Bloom filter)
     - This summary speeds up query forwarding
     - Data is replicated on nodes with similar summaries
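The slide does not detail how the summaries speed up forwarding. One plausible rule, sketched here as an assumption: forward the query to a neighbour whose summary claims the keyword, and take a random step otherwise. The function and data layout are hypothetical, and an exact Python set stands in for each node's Bloom filter, so this sketch never sees a false positive.

```python
import random


def forward_query(keyword, node, neighbours, summaries, rng=random):
    """Pick the next hop for a query about `keyword` sitting at `node`.

    neighbours: dict node -> list of neighbouring nodes
    summaries:  dict node -> set of keywords the node claims to host
                (a stand-in for the node's Bloom filter)
    """
    candidates = [n for n in neighbours[node] if keyword in summaries[n]]
    if candidates:
        return rng.choice(candidates)  # a neighbour whose summary matches
    return rng.choice(neighbours[node])  # no match: fall back to a random step
```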

  8. Bloom filters [Bloom, 1970]
     Definition
     - $A$: an array of $m$ bits
     - $h_i$, $0 \le i < k$: $k$ hash functions
     - insert($x$): $\forall i,\ A[h_i(x)] \leftarrow 1$
     - query($x$): true if $\forall i,\ A[h_i(x)] = 1$
     False positives
     - False positives are possible, but false negatives are not
     - Probability of a false positive after inserting $n$ elements: $\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$
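The definition translates almost line for line into code. In this sketch the hash family $h_i(x)$ is an assumption (SHA-256 of the string `"i:x"` reduced modulo $m$); the slides do not say which hash functions were used. `false_positive_rate` evaluates the probability formula from this slide, and the class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """An m-bit array A with k hash functions h_0 .. h_{k-1}."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, x: str):
        # h_i(x): SHA-256 of "i:x" reduced modulo m (an assumed hash family).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.m

    def insert(self, x: str) -> None:
        # A[h_i(x)] <- 1 for every i
        for p in self._positions(x):
            self.bits[p // 8] |= 1 << (p % 8)

    def query(self, x: str) -> bool:
        # True if every A[h_i(x)] is set: may be a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(x))


def false_positive_rate(m: int, k: int, n: int) -> float:
    """(1 - (1 - 1/m)^(k n))^k after n insertions."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


bf = BloomFilter(m=8192, k=32)  # sizes from the experiment settings
for word in ["peer", "replica", "agent"]:
    bf.insert(word)
print(bf.query("agent"), false_positive_rate(8192, 32, 3))
```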

  9. Replication strategy: agent behavior
     - Agents control the number of replicas of each data item in the network
     - An agent carries a keyword $k$ (its theme) and the related indexed information
     - Agents move randomly through the network
     - An agent can create or delete replicas according to its local knowledge
     - At each step, an agent has a small probability of taking a new theme
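A sketch of one agent move, under stated assumptions: the network is an adjacency dict, each node's hosted keywords are a set, and `P_NEW_THEME` is a hypothetical value since the slides only say the probability is small. The agent records, for each of its last 100 visits, whether the visited node hosted its theme; that record is the raw material for $f(k)$. The `Agent` class is hypothetical, and the creation and deletion decisions are left out.

```python
import random
from collections import deque
from dataclasses import dataclass, field

HISTORY = 100       # "100 nodes recorded" in the experiment settings
P_NEW_THEME = 0.01  # hypothetical value: the slides only say "small probability"


@dataclass
class Agent:
    node: int    # current node
    theme: str   # keyword k carried by the agent
    origin: int  # N_l, the node where the current theme was taken
    history: deque = field(default_factory=lambda: deque(maxlen=HISTORY))

    def step(self, neighbours, keywords, rng=random):
        """One move: random neighbour, occasional theme change, record the visit."""
        self.node = rng.choice(neighbours[self.node])
        if keywords[self.node] and rng.random() < P_NEW_THEME:
            # Take a new theme from the keywords hosted on the current node.
            self.theme = rng.choice(sorted(keywords[self.node]))
            self.origin = self.node
            self.history.clear()  # assumption: visits recorded for the old theme are dropped
        # Remember whether this node hosts the current theme (input to f(k)).
        self.history.append(self.theme in keywords[self.node])
```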

  10. Replication strategy: agent behavior
     Each time it visits a node $N_c$, the agent computes the score
     $\varphi(k, N_c) = \frac{S(k, N_l)}{S(k, N_c)} \times \frac{f(k)}{\alpha}$
     - $N_l$: the node where the agent took its theme
     - $S(k, N)$: scoring function of node $N$ for keyword $k$; it measures a trade-off between the space available on $N$ and how well $k$ matches $N$'s Bloom filter
     - $f(k)$: frequency, among the last nodes visited, of nodes hosting $k$
     - $\alpha$: constant that tunes the amount of replication to achieve
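Once its inputs are known, the score is simple to compute. The slides do not give $S$ in closed form, so this sketch takes both $S$ values as plain numbers. It also assumes $f(k)$ is the number of hosting nodes among the recorded recent visits, kept as a list of booleans; the function names are hypothetical.

```python
def score_phi(s_origin: float, s_current: float, f_k: float, alpha: float) -> float:
    """phi(k, N_c) = S(k, N_l) / S(k, N_c) * f(k) / alpha."""
    if s_current <= 0:
        raise ValueError("S(k, N_c) must be positive")
    return (s_origin / s_current) * (f_k / alpha)


def recent_frequency(history) -> int:
    """f(k), assumed here to be how many recorded visits were to nodes hosting k."""
    return sum(1 for hosted in history if hosted)
```

With equal scores and $f(k) = \alpha$, the score is exactly 1, midway between the replicating and deleting bounds of the next slide.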

  11. Replication strategy: agent behavior
     - Replicating bound $\tau_{\mathrm{inf}}$ and deleting bound $\tau_{\mathrm{sup}}$, with $\frac{\tau_{\mathrm{inf}} + \tau_{\mathrm{sup}}}{2} = 1$
     - If $\varphi(k, N_c) \le \tau_{\mathrm{inf}}$, a replica for $k$ is created on the local node $N_c$
     - If $\varphi(k, N_c) \ge \tau_{\mathrm{sup}}$, all indexed information for $k$ is removed from the local node $N_c$
     - In a network of $m$ nodes, each data item has on average $\frac{m}{100} \times \alpha$ replicas
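The two bounds turn the score into one of three actions. A minimal sketch, with hypothetical function names and return strings; the default bounds are the experimental values $\tau_{\mathrm{inf}} = 0.8$ and $\tau_{\mathrm{sup}} = 1.2$, and the second function evaluates the average replica count $\frac{m}{100} \times \alpha$.

```python
TAU_INF, TAU_SUP = 0.8, 1.2  # bounds used in the experiments


def decide(phi: float, tau_inf: float = TAU_INF, tau_sup: float = TAU_SUP) -> str:
    """Action an agent takes on the local node N_c for its theme k."""
    if phi <= tau_inf:
        return "create_replica"  # k looks under-replicated: copy it to N_c
    if phi >= tau_sup:
        return "delete_replica"  # k looks over-replicated: drop its index entries on N_c
    return "keep"


def expected_replicas(m: int, alpha: float) -> float:
    """Average number of replicas per data item: (m / 100) * alpha."""
    return m / 100 * alpha
```

For the 400-node experiment with $\alpha = 2$, `expected_replicas(400, 2)` gives $400 / 100 \times 2 = 8$ replicas per item.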

  12. Outline (section 3: Preliminary results)

  13. Experiment settings
     General settings
     - 400 nodes, random-graph-like topology
     - Node degree between 2 and 8
     - 30,000 documents from Wikipedia
     - Bloom filter size: 8192 ($2^{13}$) bits
     - Number of hash functions: 32
     - 1000 queries generated at random
     Agent settings
     - 2000 agents
     - 100 nodes recorded
     - Replicating constant: $\alpha = 2$
     - Bounds: $\tau_{\mathrm{inf}} = 0.8$, $\tau_{\mathrm{sup}} = 1.2$
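The settings can be gathered into one configuration object. The slides only say the topology is random-graph-like with degrees between 2 and 8, so the generator below is an assumption, not the authors' method: a shuffled ring guarantees connectivity and a degree of 2, then random chords are added between nodes that are both still below a random target degree of at most 8. `Settings` and `random_topology` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    nodes: int = 400
    min_degree: int = 2
    max_degree: int = 8
    documents: int = 30_000
    bloom_bits: int = 2 ** 13
    hash_functions: int = 32
    queries: int = 1000
    agents: int = 2000
    recorded_nodes: int = 100
    alpha: float = 2.0
    tau_inf: float = 0.8
    tau_sup: float = 1.2


def random_topology(s: Settings, seed: int = 0) -> dict:
    """Random connected graph with every degree in [min_degree, max_degree]."""
    rng = random.Random(seed)
    order = list(range(s.nodes))
    rng.shuffle(order)
    adj = {v: set() for v in order}
    # A ring over the shuffled order: connected, every node has degree 2.
    for a, b in zip(order, order[1:] + order[:1]):
        adj[a].add(b)
        adj[b].add(a)
    target = {v: rng.randint(s.min_degree, s.max_degree) for v in order}
    for _ in range(20 * s.nodes):  # bounded number of attempts
        a, b = rng.sample(order, 2)
        # Add a chord only if both ends are still below their target degree.
        if b not in adj[a] and len(adj[a]) < target[a] and len(adj[b]) < target[b]:
            adj[a].add(b)
            adj[b].add(a)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}
```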

  14. Preliminary results
     Figure: evolution of the number of replicas and of filter occupation.
     - Number of replicas: normal distribution centered around 13
     - Filter occupation increases from 43.5% to 70.9%
     - Filter occupation is stable from 5 replicas onward
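Filter occupation (the fraction of bits set) can be related to the number of distinct keywords a filter holds. Under the usual idealization of independent, uniform hash positions, the expected occupation after $n$ insertions is $1 - (1 - 1/m)^{kn}$, which can be inverted to estimate $n$. The sketch applies this to the two occupations reported on this slide with the experimental $m = 8192$ and $k = 32$; the estimates are only as good as that idealization, and the function names are hypothetical.

```python
import math


def expected_occupation(m: int, k: int, n: int) -> float:
    """Expected fraction of set bits after n distinct insertions: 1 - (1 - 1/m)^(k n)."""
    return 1 - (1 - 1 / m) ** (k * n)


def keywords_for_occupation(m: int, k: int, occupation: float) -> float:
    """Invert expected_occupation: n = ln(1 - occupation) / (k * ln(1 - 1/m))."""
    return math.log1p(-occupation) / (k * math.log1p(-1 / m))


for occ in (0.435, 0.709):  # occupations reported on this slide
    print(f"{occ:.1%} -> about {keywords_for_occupation(8192, 32, occ):.0f} keywords")
```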

  15. Preliminary results
     Figure: random walk in unreplicated and replicated environments.
     - Without replication, half of the queries are answered within 1000 hops
     - With 13 replicas, half of the queries are answered within 50 hops
     - Results remain good even when half of the nodes fail
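"Half of the queries are answered within X hops" is a median over random-walk searches. A sketch of that measurement, with hypothetical names and parameters (`trials`, `max_hops`), and unanswered queries counted as infinitely long: each query starts at a random node and steps to a random neighbour until it reaches a node holding the item.

```python
import math
import random


def random_walk_hops(graph, start, holders, rng, max_hops=100_000):
    """Hops a random-walk query needs to reach a node in `holders` (None if it gives up)."""
    node = start
    for hops in range(max_hops + 1):
        if node in holders:
            return hops
        node = rng.choice(graph[node])  # step to a uniformly random neighbour
    return None


def median_hops(graph, holders, trials=200, seed=0):
    """Smallest hop count within which at least half of the queries are answered."""
    rng = random.Random(seed)
    nodes = list(graph)
    hops = sorted(
        math.inf if h is None else h
        for h in (random_walk_hops(graph, rng.choice(nodes), holders, rng) for _ in range(trials))
    )
    return hops[(trials - 1) // 2]
```

Passing 13 holders instead of 1 is how a replicated environment would be compared with an unreplicated one in this sketch.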

  16. Preliminary results
     Figure: ratio between the unreplicated and replicated environments.
     - On average, answering between 5% and 50% of the queries is 22 times faster
     - The distribution of replicas is homogeneous: wherever the query is forwarded at random, it still finds a replica of the requested information
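The slide does not spell out how the ratio is computed. One reading, sketched here as an assumption: for each fraction $p$ of queries, take the hop budget needed to answer $p$ of them without replication, divide it by the budget needed with replication, and average those ratios for $p$ from 5% to 50%. The function names are hypothetical.

```python
import math


def hops_to_answer(hops, fraction):
    """Smallest hop budget within which `fraction` of the queries are answered."""
    ordered = sorted(hops)
    index = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[index]


def speedup(unreplicated, replicated, fractions):
    """Ratio of hop budgets (unreplicated / replicated) for each answered fraction."""
    return [hops_to_answer(unreplicated, p) / hops_to_answer(replicated, p) for p in fractions]


fractions = [p / 100 for p in range(5, 51, 5)]  # 5 % to 50 % of the queries
```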

  17. Preliminary results
     Self-healing capabilities
     - Failure of half of the nodes (i.e. the memory of those nodes is reset)
     - The average number of replicas drops to 6.5
     - Information lost: 0.036%
     - The number of replicas then grows again as in the first figure
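A back-of-envelope check on information loss, under an assumption the slides do not make: an item's $r$ replicas sit on $r$ distinct nodes chosen uniformly at random, and the failed half of the nodes is also random. The item is lost only if every replica is on a failed node, which has the hypergeometric probability $\binom{m/2}{r} / \binom{m}{r}$, below $(1/2)^r$ for $r \ge 2$. Since replica counts are spread around their mean, items with fewer replicas than average are the most exposed. This is an illustration with a hypothetical function name, not the authors' analysis.

```python
from math import comb


def loss_probability(m: int, failed: int, r: int) -> float:
    """P(all r replicas sit on failed nodes) when `failed` of m nodes are reset.

    Assumes the r replicas occupy r distinct nodes chosen uniformly at random.
    """
    return comb(failed, r) / comb(m, r)


print(loss_probability(400, 200, 13))  # 13 replicas, half of 400 nodes reset
print(0.5 ** 13)                       # independent-coin approximation
```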

  18. Outline (section 4: Conclusion)

  19. Conclusion
     - Information replication with agents
     - Fully decentralized algorithm that scales very well
     - Self-healing properties
     - Resilient to hard failures
     Future work
     - Larger networks
     - More dynamic environments

  20. References
     - Clip2. The Gnutella protocol specification v0.4, 2002.
     - Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
