Introduction Peer-to-Peer Computing • Peer-to-Peer (P2P) employ distributed resources to perform function in a decentralized manner – Resource can be: computing, storage, bandwidth… – Function can be: computing, data sharing, D. Milojicic, V. Kalogeraki, R. Lukose, K. collaboration … • The goal of this paper is to describe what is P2P Nagaraja, J. Pruyne, B. Richard, S. Rollins and what is not P2P and Z. Xu • P2P gained visibility during Napster – But was here before ( Doom , Internet telephony ) Technical Report HPL-2002-57 – But has moved beyond ( KaZaa , Gnutella ) HP Laboratories, Palo Alto – And includes more ( Seti@home ) • Simple definition is it include sharing – giving and March 2002 obtaining from peer community Taxonomy of Computer Systems What’s New and What’s Not Simplified Architecture Centralized Peer-to-Peer Client-Server Degree of Centralization Taxonomy of P2P Systems “Hybrid” Initial communication is centralized (Tough to get around. For example, how to find peers?) Pure: Gnutella, Freenet Hybrid: Napster Intermediate: KaZaa (super peers) 1
Outline Decentralization and Taxonomy • Introduction (done) • Components and Algorithms (next) • Systems • Case Studies • Summary P2P Components P2P Algorithms – Centralized Index (Specific applications here) (Different data types) (Robust when peers autonomous) • Search central index, download content from peer (Find and move data – Popular with Napster among) • Need representation for “best” peer (Overcome dynamic nature – Cheapest, closest, most available of peers) P2P Algorithms – Flooded Requests P2P Algorithms – Document Routing • Each request flooded (broadcast) to directly connected peers • When document published, generate hash – Repeat until answered or too many hops (5-9) • Uses lots of network capacity based on name and content • Move document node with ID closest to hash • Revise with • Requests also migrate to such node – “Super-Peer” to concentrate most requests – Note, requires knowing document name ahead of – Caching of recent requests time, so harder to do search 2
Outline P2P Systems • Introduction • Historical (done) • Components and Algorithms • Distributed Computing (done) • Systems • File Sharing (next) • Case Studies • Collaboration • Summary Historical (1 of 2) Historical (2 of 2) • Prior to continuously connected computers • Most early distributed systems were P2P (Internet) had UUNet and Fidonet – Examples: • Email (on top of SMTP peers) – Would periodically dial-up and exchange • Usenet News (on top of NNTP peers) information (email and bboard) – Message routing – Local servers communicated with peers • Similar to Gnutella • File Transfer (via FTP) centralized • In “modern” area, first widely used P2P was – But since many ran own server, similar to instant messaging today’s file sharing • P2P interest shift came because of legal – Indexing system named “Archie” to query ramifications (Napster) across FTP servers • Exactly like Napster – (MLC: plus traffic! See next paper.) Distributed Computing P2P Systems • Clusters • Historical – Inexpensive PCs plus open source software • Distributed Computing � super computer • NASA’s Beowulf project, MOSIX, … • File Sharing • Collaboration – Issues include delegation and migration • Grid computing – Connect distributed computers so can use idle cycles – Transparent way to add jobs, have work executed, results returned 3
How it Works Distributed Computing • Parallelizable job • Historical – Split into subtasks • PCs agree to – January 1999, 10k computers broke RSA participate challenge in less than 24 hours • Centralized • Users realized the power of Internet PCs dispatcher • Recent • When PCs idle (screensaver), – seti@home and genome@home subtasks work • Send results to – Realize a teraflop centralized DB • P2P? Application Area Examples P2P Systems • Financial – Complex market simulations (pricing, • Historical portfolios, credit, …) • Distributed Computing – Run-during night, but real-time important – Plus, larger so only big institutions • File Sharing – Use P2P – speedup 15 hours to 30 minutes, • Collaboration and available to smaller companies • Biotechnology – Colossal amounts of data (3 billion sequences in human genome dbase) – Only high-perf clusters and approximation – But using P2P can do exact and used by smaller companies File Sharing File Sharing Examples • One of the most successful • Napster • Features – Centralized index, single peer download – Since centralized does not scale well, performance – Large, when otherwise could not store • Multimedia content inherently large files may suffer • Morpheus – Available, from multiple sources – Simultaneous downloads from multiple peers – Anonymity to protect publisher and reader – Encryption for privacy – Manageability for better performance • KaZaa (download from close hosts) • Issues: bandwidth consumption, search, and – Distribute centralized among SuperNodes – Use “intelligent” selection for peers security – MD5 checksums to verify content 4
P2P Systems Collaboration • Historical • Instant messaging to chat to online games • Distributed Computing • Finding location of peers still a challenge • File Sharing • Use centralized server for peer location • Collaboration – NetMeeting, GameSpy, … • Use out-of-band system to identify peers – Ie- call on telephone and give IP Outline Case Studies • Introduction • Avaki (done) (distributed computing) • Components and Algorithms (done) (distributed computing) • seti@home • Systems • Groove (done) (collaboration) • Case Studies • Magi (next) (collaboration) • Summary • FreeNet (file sharing) • Gnutella (file sharing) • JXTA (platforms) • .Net (platforms) Seti@home Magi (1 of 2) • Search for Extraterrestrial Intelligence • Background • P2P infrastructure for building secure, – Search through massive amounts of radio telescope data to look for signals collaborative applications – Build huge virtual computer by using idle cycles on – Started as research project from UC Internet computer Berkeley 1998, commercial release 2001 • Runs computation as part of screen saver • Uses standard technology: HTTP, XML, – Old enough project so robust tools • Features WebDAV – Fault resilience – since clients can stop at anytime, – "Web-based Distributed Authoring and use checkpointing every 10 minutes Versioning“ - extensions to HTTP to allow – Scalability – horizontal, but vertical (to db) could collaborative edits at remote web servers still be a bottleneck (still, many users) • Was largest non-Sun Java project • Lessons – Can apply this technology to real problems – Expected 100k participants, but have 3 million 5
Magi (2 of 2) FreeNet • File sharing with primary design is to make system anonymous – Read, Publish, Store • Completely decentralized – File location based on hash (and on path in-between) – Hash generated automatically – Users find hash names by out-of-band source (ie- posted on Web page) • Nodes cache until full, then LRU • Core is micro-Apache server • Nodes do “search” to announce presence to others • Users could build modules over Magi services • Scales to O(log n) • Uses DNS to find Magi servers • Available as open source • No fault resilience • Lessons: issues of anonymity (good for discourse, • JVM and Server means maybe tough for PDA bad for intellectual property rights) • Existing standards makes highly interoperable .NET Summary • More than P2P (c#, tools, Web servers), but “My Services” has a lot of P2P stuff • As P2P matures, infrastructure will improve • Microsoft introduced in 2000 • Goals is to enable Web servers to variety – Increased interoperability – More robust software of devices. Focus on user data. • Will remain an important technology because: – Scalability a concern, especially with global “Passport” login connections gives puid . That used for services. – Ad-hoc, disconnected networks lend themselves to P2P Cons: - only Windows? – Some applications inherently P2p Future Work • Algorithms – Scalable, anonymity, connectivity • Applications – Beyond music and movie sharing • Platforms – Tools to build better, newer P2P systems 6
Recommend
More recommend