Information Replication Strategy in Unstructured Peer-to-Peer - PowerPoint PPT Presentation

Information Replication Strategy in Unstructured Peer-to-Peer Networks Using Thematic Agents
Nicolas Bonnel, Gildas Ménier, Pierre-François Marteau
Laboratoire Valoria - Université de Bretagne Sud
October 24, 2007

Outline
1. Introduction
   - Overview
   - P2P architecture
2. System design
3. Preliminary results
4. Conclusion

Overview
Context:
- Indexing very large databases
- The system constrains the location and replication of data
- Resource scavenging
Peer-to-peer architecture:
- Allows the use of more computers at low cost
- Fault tolerance
- Scalability
- Example: SETI

Structured P2P network
Characteristic:
- Constraint on data location (distributed hash function)
Features:
- Easy to retrieve rare items
- Approximate and range queries are very costly
- Load-balancing problems
- Examples: Chord, CAN, Tapestry, ...

Unstructured P2P network
Characteristic:
- No constraint on data location
Features:
- Highly replicated items can be retrieved at low cost
- Data placement can be controlled
- Very costly to retrieve rare items
- Examples: Gnutella [Clip2, 2002], ...

System design
Architecture:
- Documents are indexed in a distributed index database (each node hosts a part of it)
- Unstructured peer-to-peer architecture
- Each node keeps a summary of the keywords it hosts (a Bloom filter)
- This summary speeds up query forwarding
- Replication takes place on nodes with similar summaries

Bloom Filters [Bloom, 70]
Definition:
- $A$: an array of $m$ bits
- $h_i$, $0 \le i < k$: $k$ hash functions
- insert($x$): $\forall i : A[h_i(x)] = 1$
- query($x$): true if $\forall i : A[h_i(x)] = 1$
False positives:
- False positives are possible, but false negatives are not
- Probability of a false positive after $n$ insertions: $\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$
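The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slides do not say which hash functions are used, so the sketch assumes $k$ functions obtained by salting SHA-256 with the index $i$, and the class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit array A of m bits queried through k hash functions h_0 .. h_{k-1}."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, x: str):
        # Assumption: h_i(x) is SHA-256 of "i:x" reduced modulo m.
        # The slides do not specify the hash family.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        # insert(x): for all i, A[h_i(x)] = 1
        for pos in self._positions(x):
            self.bits[pos] = 1

    def query(self, x: str) -> bool:
        # query(x): true if A[h_i(x)] == 1 for all i
        return all(self.bits[pos] for pos in self._positions(x))


def false_positive_probability(m: int, k: int, n: int) -> float:
    """(1 - (1 - 1/m)^(k*n))^k, where n is the number of inserted items."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


bf = BloomFilter(m=8192, k=32)
for keyword in ["peer", "replica", "agent"]:
    bf.insert(keyword)
print(bf.query("replica"))  # True: an inserted item is always reported present
```

A query for a keyword that was never inserted usually returns false, but it can return true when all $k$ positions happen to be set by other keywords; that is the false-positive case quantified by the formula.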

Replication strategy
Agent behavior:
- Agents control the number of replicas of each data item in the network
- An agent carries a keyword $k$ (its theme) and the related indexed information
- Agents move randomly on the network
- An agent can create or delete replicas according to its local knowledge
- At each step, an agent has a small probability of taking a new theme

Replication strategy
Agent behavior:
Each time it visits a node $N_c$, the agent computes a score
$$\varphi(k, N_c) = \frac{S(k, N_l)}{S(k, N_c)} \times f(k)\,\alpha$$
- $N_l$ is the node where the agent took its theme
- $S(k, N)$: scoring function of a node $N$ for the keyword $k$; it measures a trade-off between the space available and the degree of matching of $k$ to the node's Bloom filter
- $f(k)$: frequency of nodes hosting $k$ among the last nodes visited
- $\alpha$: constant that tunes the amount of replication to achieve

Replication strategy
Agent behavior:
- Replicating bound $\tau_{inf}$ and deleting bound $\tau_{sup}$, with $\frac{\tau_{inf} + \tau_{sup}}{2} = 1$
- If $\varphi(k, N_c) \le \tau_{inf}$, a replica for $k$ is created on the local node $N_c$
- If $\varphi(k, N_c) \ge \tau_{sup}$, all indexed information for $k$ is removed from the local node $N_c$
- In a network with $m$ nodes: $\frac{m}{100 \times \alpha}$ replicas on average for each data item
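The threshold rule can be sketched as follows. This is an illustrative sketch only: the function and enum names are hypothetical, the default bounds are the ones quoted in the experiment settings, and the way $f(k)$ and $\alpha$ combine in the score is an assumption (a plain product), since the flattened slide formula does not make it unambiguous.

```python
from enum import Enum


class Action(Enum):
    REPLICATE = "replicate"  # create a replica for k on the current node
    DELETE = "delete"        # remove all indexed information for k locally
    NONE = "none"            # leave the current node unchanged


def score(s_origin: float, s_current: float, f_k: float, alpha: float) -> float:
    """Score phi(k, N_c) of the current node for the agent's theme k.

    s_origin  -- S(k, N_l), score of the node where the theme was taken
    s_current -- S(k, N_c), score of the node being visited (must be > 0)
    f_k       -- f(k), frequency of recently visited nodes hosting k
    alpha     -- replication constant

    Assumption: f(k) and alpha are multiplied; this is one reading of the
    slide formula, not a confirmed detail of the authors' system.
    """
    return (s_origin / s_current) * f_k * alpha


def decide(phi: float, tau_inf: float = 0.8, tau_sup: float = 1.2) -> Action:
    # phi <= tau_inf: the keyword looks under-replicated here, so replicate.
    if phi <= tau_inf:
        return Action.REPLICATE
    # phi >= tau_sup: the keyword looks over-replicated here, so delete.
    if phi >= tau_sup:
        return Action.DELETE
    return Action.NONE


print(decide(score(s_origin=1.0, s_current=2.0, f_k=0.5, alpha=2.0)))
```

In the usage line the score is 0.5, which is below the replicating bound, so the agent would create a replica. Values strictly between the two bounds leave the node untouched, which gives the system a dead band around 1 and avoids creating and deleting the same replica on successive visits.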

Experimental settings
General settings:
- 400 nodes, random-graph-like topology
- Node degree between 2 and 8
- 30,000 documents from Wikipedia
- Bloom filter size: 8192 ($2^{13}$)
- Number of hash functions: 32
- 1000 queries generated at random
Agent settings:
- 2000 agents
- 100 nodes recorded
- Replicating constant: $\alpha = 2$
- Bounds: $\tau_{inf} = 0.8$, $\tau_{sup} = 1.2$

Preliminary results
Figure: evolution of the number of replicas and of filter occupation.
- Number of replicas: normal distribution centered around 13
- Filter occupation increases from 43.5% to 70.9%
- Filter occupation is stable from 5 replicas onward
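To get a feel for what these occupation figures imply, the Bloom filter formula can be inverted. The sketch below assumes that "filter occupation" means the fraction of bits set to 1 (the slides do not define it), and uses the filter size of 8192 and the 32 hash functions from the experiment settings; the function names are hypothetical.

```python
import math

M_BITS = 8192   # Bloom filter size from the experiment settings
K_HASHES = 32   # number of hash functions from the experiment settings


def keywords_for_occupancy(occupancy: float, m: int = M_BITS, k: int = K_HASHES) -> float:
    """Invert occupancy = 1 - (1 - 1/m)^(k*n) to estimate n, the number of
    distinct keywords inserted into one filter."""
    return math.log(1.0 - occupancy) / (k * math.log(1.0 - 1.0 / m))


def false_positive_from_occupancy(occupancy: float, k: int = K_HASHES) -> float:
    """A query is a false positive when all k probed bits are already set,
    which happens with probability occupancy^k."""
    return occupancy ** k


for occupancy in (0.435, 0.709):
    n = keywords_for_occupancy(occupancy)
    p = false_positive_from_occupancy(occupancy)
    print(f"occupancy={occupancy:.3f}  keywords~{n:.0f}  false-positive~{p:.2e}")
```

Under these assumptions, the two occupation levels correspond to roughly 146 and 316 keywords per filter, with false-positive probabilities on the order of $10^{-12}$ and $10^{-5}$ respectively, so the filters would remain selective even after replication fills them.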

Preliminary results
Figure: random walk in unreplicated and replicated environments.
- Half of the queries are answered within 1000 hops without replication
- Half of the queries are answered within 50 hops with 13 replicas
- Results are still good even when half of the nodes fail
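The hop counts above come from random-walk search. A minimal sketch of such a walk is given below; it is not the authors' simulator. The topology is a hypothetical ring chosen for brevity (the experiments use a random-graph-like topology), the replica holders are drawn at random, and all names are illustrative.

```python
import random


def random_walk_hops(neighbors, start, holders, rng, max_hops=10_000):
    """Walk at random from `start` until a node in `holders` is reached.

    neighbors -- dict mapping each node to the list of its neighbors
    holders   -- set of nodes hosting a replica of the searched item
    Returns the number of hops taken, or None if no holder is reached
    within max_hops.
    """
    current = start
    for hops in range(max_hops + 1):
        if current in holders:
            return hops
        # Forward the query to a uniformly chosen neighbor.
        current = rng.choice(neighbors[current])
    return None


rng = random.Random(0)
node_count = 400
# Hypothetical topology: a simple ring, each node linked to its two neighbors.
ring = {i: [(i - 1) % node_count, (i + 1) % node_count] for i in range(node_count)}
# Hypothetical placement: 13 replicas spread uniformly at random.
holders = set(rng.sample(range(node_count), 13))
print(random_walk_hops(ring, start=0, holders=holders, rng=rng))
```

Running the walk for many start nodes and holder sets, with 1 replica versus 13, is one way to reproduce the kind of comparison the slide reports, although the absolute hop counts depend on the topology.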

Preliminary results
Figure: ratio between unreplicated and replicated environments.
- On average, answering between 5% and 50% of the queries is 22 times faster
- The distribution of replicas is homogeneous: wherever a query is forwarded at random, it still finds a replica of the searched information

Preliminary results
Self-healing capacities:
- Failure of half of the nodes (i.e., the memory of those nodes is reset)
- Average number of replicas drops to 6.5
- Information lost: 0.036%
- The number of replicas then grows as in the first figure

Conclusion
Conclusion:
- Information replication with agents
- Fully decentralized algorithm that scales very well
- Self-healing properties
- Resilient to hard failures
Future work:
- Larger network
- More dynamic environment

References
- Clip2. The Gnutella Protocol Specification v0.4, 2002.
- Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
