Complex Adaptive Systems
C.d.L. Informatica – Università di Bologna

Peer-to-peer systems and overlay networks
Fabio Picconi
Dipartimento di Scienze dell’Informazione

Outline
• Introduction to P2P systems
• Common topologies
• Data location
• Churn
• Newscast algorithm
• Security

Peer-to-peer vs. client-server
[Figure: a client-server network next to a peer-to-peer network]
Client-server
• Server well connected to the “center” of the Internet
• Server carries out critical tasks
• Clients only talk to the server
Peer-to-peer
• Only nodes located on the “periphery” of the Internet
• Tasks distributed across all nodes
• Clients talk to other clients

Example – Video sharing
Client-server: YouTube
[Figure: client-server network with one uploader and several downloaders]
Advantages
• Client can disconnect after upload
• Uploader needs little bandwidth
• Other users can find the file easily (just use the search on the server webpage)
Disadvantages
• Server may not accept the file, or may remove it later (according to content policy)
• Whole system depends on the server (what if it is shut down like Napster?)
• Server storage and bandwidth are expensive!

Example – Video sharing
Peer-to-peer: BitTorrent
[Figure: peer-to-peer network with one seeder and several downloaders]
Advantages
• Does not depend on a central server
• Bandwidth shared across nodes (downloaders also act as uploaders)
• High scalability, low cost
Disadvantages
• Seeder must remain on-line to guarantee file availability
• Content is more difficult to find (downloaders must find the .torrent file)
• Freeloaders cheat in order to download without uploading

Comparison: P2P vs. client-server
Client-server
• Asymmetric: clients and servers carry out different tasks
• Global knowledge: servers have a global view of the network
• Centralization: communications and management are centralized
• Single point of failure: a server failure brings down the system
• Limited scalability: servers are easily overloaded
• Expensive: server storage and bandwidth capacity is not cheap
Peer-to-peer
• Symmetric: each node carries out the same tasks
• Local knowledge: nodes only know a small set of other nodes
• Decentralization: nodes must self-organize in a decentralized way
• Robustness: several nodes may fail with little or no impact
• High scalability: high aggregate capacity, load distribution
• Low-cost: storage and bandwidth are contributed by users

Characterizing peer-to-peer systems
The main characteristics of P2P systems are:
• decentralization (i.e., no central server)
• self-organization (e.g., adding new nodes and removing disconnected ones)
• symmetric communications (e.g., peers act as clients and servers)
• scalability (thanks to high aggregate capacity and load distribution)
• shared ownership (i.e., storage and bandwidth are contributed by peers)
• overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network)
[Figure: a message from one peer to another is sent through the underlying IP network]

P2P environment
P2P systems are deployed in a challenging environment:
• High latency and low bandwidth between nodes
  - a high hop count will result in a high end-to-end latency
  - transferring large files may take a long time
• Churn
  - nodes may disconnect temporarily
  - new nodes are constantly joining the system, while others leave the overlay permanently
• Security
  - P2P clients run on machines under full control of their users
  - data sent to other nodes may be erased, corrupted, disclosed, etc.
  - malicious users may try to bring down the system (e.g., routing attack)
• Selfishness
  - users may run hacked P2P clients in order to avoid contributing resources

Problems
Some of the problems that a P2P system designer must face:
• Overlay construction and maintenance
  - maintain a given overlay topology (e.g., random, two-level, ring, etc.)
• Data location
  - locate a given data object among a large number of nodes
• Data dissemination
  - propagate data in an efficient and robust manner
• Per-node state
  - keep the amount of state per node small
• Tolerance to churn
  - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

Topology
Some common topologies:
• Flat unstructured: a node can connect to any other node
  - only constraint: maximum degree d_max
  - fast join procedure
  - usually very tolerant to churn
  - good for data dissemination, bad for location
• Two-level unstructured: nodes connect to a supernode
  - supernodes form a small overlay
  - used for indexing and forwarding
  - large state and high load on supernodes
• Flat structured: constraints based on node ids
  - allows for efficient data location
  - constraints require long join and leave procedures
  - less robust in high-churn environments

Data location - Flooding
Problem: find the set of nodes S that store a copy of object O
Solutions:
(1) Flooding: send a search message to all nodes [first Gnutella protocol]
• A search message contains either keywords or an object id
Advantages:
  - simplicity
  - no topology constraints
Disadvantages:
  - high network overhead (huge traffic generated by each search request)
  - flooding is stopped by the TTL (which produces a search horizon)
  - only applicable to a small number of nodes

Data location - Flooding
(1) Flooding (cont.)
Flooding in a flat unstructured network:
[Figure: a search spreading from the source node, with the search horizon for TTL = 2 and an object obj]
Objects that lie outside of the horizon are not found
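
The search horizon can be illustrated with a small breadth-first sketch. It is not the Gnutella protocol itself: the overlay is assumed to be a plain adjacency dictionary, duplicate suppression is modeled with a global visited set, and the names flood_search, graph and holders are hypothetical.

```python
from collections import deque


def flood_search(graph, holders, source, ttl):
    """Return the holders of the object that lie within `ttl` hops of `source`.

    graph   -- dict mapping each node to the list of its overlay neighbors
    holders -- set of nodes that store a copy of the object
    Assumption: each node forwards a given search at most once.
    """
    visited = {source}
    frontier = deque([(source, 0)])
    found = set()
    while frontier:
        node, hops = frontier.popleft()
        if node in holders:
            found.add(node)
        if hops == ttl:
            continue  # TTL exhausted: this node lies on the search horizon
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return found


# A chain a - b - c - d in which only d stores the object.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(flood_search(chain, {"d"}, "a", ttl=2))  # d is 3 hops away: not found
print(flood_search(chain, {"d"}, "a", ttl=3))  # d is now inside the horizon
```

Raising the TTL widens the horizon, but every extra hop multiplies the number of messages, which is the overhead listed among the disadvantages.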

Data location - Superpeers
(2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent]
• Each node connects to a superpeer and advertises the list of objects it stores
• Search requests are sent to the superpeer, which forwards them to other superpeers
Advantages:
  - highly scalable
Disadvantages:
  - superpeers must be reliable, powerful and well connected to the Internet (expensive)
  - superpeers must maintain large state
  - the system relies on a small number of superpeers

Data location - Superpeers
(2) Two-level overlay (cont.)
[Figure: a request travels through the superpeers to a node storing obj, and a response comes back]
• A two-level overlay is a partially centralized system
• In some systems superpeers do not connect to each other (e.g., BitTorrent)

Data location - KBR
(3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless]
Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows:
• each node is given a unique node identifier, or nodeid
• given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k
• given a routing key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log N)
• the location of an object with id objectid is tracked by the root of k = objectid
• thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object

Data location - KBR (cont.)
Key-Based Routing [Pastry]
route(k=8955,msg)
Source node id: 04F2
k = object id: 8955

Hop #   Hop id   Shared prefix length
0       04F2     0
1       85E0     1
2       8909     2
3       8957     3
4       8954     3   (root of k)

[Figure: overlay address space [0000,FFFF] with nodes 04F2, E25A, C52A, 3A79, AC78, 5230, 620F, 8954, 8909, 85E0, 8821 and 8957; obj 8955 is stored on nodes 620F and C52A]
Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A
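
The root definition and the hop table above can be checked with a few lines of Python. This is only a sketch: nodeids are treated as 4-digit hexadecimal strings, the node list is read off the figure (so the exact set is an assumption), and root_of and shared_prefix_len are hypothetical helper names, not part of Pastry.

```python
def root_of(key, node_ids):
    """Return the nodeid that is numerically closest to `key` (hex strings)."""
    k = int(key, 16)
    return min(node_ids, key=lambda n: abs(int(n, 16) - k))


def shared_prefix_len(a, b):
    """Number of leading digits that the two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Node ids as they appear in the figure (assumed to be the whole overlay).
NODES = ["04F2", "E25A", "C52A", "3A79", "AC78", "5230",
         "620F", "8954", "8909", "85E0", "8821", "8957"]
# Hops taken by route(k=8955, msg) according to the table.
ROUTE = ["04F2", "85E0", "8909", "8957", "8954"]

print(root_of("8955", NODES))
print([shared_prefix_len(hop, "8955") for hop in ROUTE])
```

The shared prefix length never decreases along the route. The last hop keeps the same prefix length as the previous one but moves to the numerically closest nodeid.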

Data location - KBR (cont.)
Routing table for node 4F28 [Pastry]
Node id: 4F28

Routing table (used to find a next hop with a longer shared prefix):
Row 0:  02A3  19BA  2F34  …  E129  F0A4
Row 1:  409A  413C  4288  …  4E01  N/A
Row 2:  4F04  4F1B  N/A   …  4FF5
Row 3:  4F21  ...

Leaf set (used to find the nodeid closest to a key that is close to the local nodeid):
4F04  4F1B  4F21  -  4F30  4F55  4FF5

• In this example the routing table size is 4 x 15 = 60 entries (4 rows, one per nodeid digit, each with 15 usable columns), for a maximum network size of N = 16^4 = 65536 nodes.
• The average route length in this case is 4 hops.

Data location - KBR (cont.)
(3) Structured networks (cont.)
Advantages:
  - completely decentralized (no need for superpeers)
  - routing algorithm achieves low hop count for large network sizes
Disadvantages:
  - each object must be tracked by a different node
  - objects are tracked by unreliable nodes (i.e., nodes which may disconnect)
  - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid)
  - the overlay must be structured according to a given topology in order to achieve a low hop count
  - routing tables must be updated every time a node joins or leaves the overlay

Data location - Loosely structured overlays
(4) Loosely structured networks: use hints on the location of objects [Freenet]
• Nodes locate objects by sending search requests containing the object id
• Requests are propagated using a technique similar to flooding
• Objects with similar identifiers are grouped on the same nodes
[Figure: overlay of nodes A to F storing objects AE5J, 5B20 and AF02; a request for AE5J and its response]

Data location - Loosely structured overlays
(4) Loosely structured networks (cont.)
• A search response leaves routing hints on the path back to the source
• Hints are used when propagating future requests for similar object ids
[Figure: the same overlay with hint tables left by the response (AE5J: B; AE5J: D, 5B20: E; AE5J: F) and a new request for AF02]
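
How hints might steer a later request can be sketched as follows. The slides do not define what "similar" means, so this sketch assumes that similarity is the length of the shared identifier prefix. The hint table contents are illustrative, and record_hint and next_hop are hypothetical names rather than Freenet functions.

```python
def shared_prefix_len(a, b):
    """Number of leading characters that the two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def record_hint(hints, object_id, neighbor):
    """Called when a response for `object_id` comes back through `neighbor`."""
    hints[object_id] = neighbor


def next_hop(hints, object_id):
    """Return the neighbor hinted for the most similar object id.

    Returns None when no hint shares any prefix with `object_id`; the caller
    would then fall back to flooding-like propagation.
    """
    best = max(hints, key=lambda h: shared_prefix_len(h, object_id), default=None)
    if best is None or shared_prefix_len(best, object_id) == 0:
        return None
    return hints[best]


hints = {}
record_hint(hints, "AE5J", "D")   # a response for AE5J came back through D
record_hint(hints, "5B20", "E")   # a response for 5B20 came back through E
print(next_hop(hints, "AF02"))    # AF02 is closest to AE5J under this measure
```

Because objects with similar identifiers are grouped on the same nodes, forwarding a request for AF02 towards the neighbor that answered for AE5J is likely to reach a node that also stores AF02.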