Modeling Peer-Peer File Sharing Systems
Zihui Ge, Daniel R. Figueiredo, Sharad Jaiswal, Jim Kurose, Don Towsley
INFOCOM 2003
Outline
❏ P2P file sharing architectures
  ■ Napster (centralized)
  ■ Gnutella (flooding)
  ■ Chord (routing)
❏ General model framework for P2P file sharing
  ■ closed queueing network
  ■ model parameters
  ■ model solution
❏ Apply model to study performance: scalability, freeloaders, etc.
❏ Summary
User behavior in P2P file sharing
❏ Search phase: user generates query, then receives query response
❏ File transfer phase: file request, then file transfer
Centralized Indexing Architecture (CIA)
❏ Central server (cluster) stores global index
❏ Napster
[Figure: peers N1 to N5 on the Internet and a central index server (DB). Labels: file owner, user, Lookup(“LetItBe”). Phase bar: search phase (user generates query, query response), file transfer phase (file request, file transfer).]
Distributed Indexing with Flooded queries Architecture (DIFA)
❏ Limited-scope flooding to locate files
❏ Gnutella
[Figure: peers N1 to N5 on the Internet. Labels: file owner, user, Lookup(“LetItBe”). Phase bar: search phase (user generates query, query response), file transfer phase (file request, file transfer).]
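As a rough illustration of limited-scope flooding, the sketch below forwards a query hop by hop over an overlay graph and stops at a hop limit. The function name `flood_lookup`, the `ttl` parameter, and the five-peer chain topology are hypothetical choices for this example; the slides do not specify Gnutella's message format or scope value.

```python
from collections import deque

def flood_lookup(neighbors, holdings, start, filename, ttl):
    """Return the peers within `ttl` hops of `start` that hold `filename`."""
    visited = {start}
    frontier = deque([(start, 0)])
    hits = set()
    while frontier:
        peer, hops = frontier.popleft()
        if filename in holdings.get(peer, set()):
            hits.add(peer)
        if hops == ttl:
            continue  # scope limit reached: this peer does not forward the query
        for nxt in neighbors.get(peer, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits

# Hypothetical overlay (a simple chain): N5 - N1 - N2 - N3 - N4
neighbors = {
    "N5": ["N1"],
    "N1": ["N5", "N2"],
    "N2": ["N1", "N3"],
    "N3": ["N2", "N4"],
    "N4": ["N3"],
}
holdings = {"N2": {"LetItBe"}, "N4": {"LetItBe"}}

near = flood_lookup(neighbors, holdings, "N5", "LetItBe", ttl=2)
far = flood_lookup(neighbors, holdings, "N5", "LetItBe", ttl=4)
```

With a small scope the query finds only the nearby replica; a larger scope reaches more replicas at the cost of more forwarded messages.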
Distributed Indexing with Hash-directed queries Architecture (DIHA)
❏ Hash-directed query
❏ Tapestry, Chord, CAN
❏ Only handles exact query
[Figure: peers N1 to N5 on the Internet, each holding a routing table (RT). Labels: file owner, user, Lookup(“LetItBe”), Hash(“LetItBe”)=N4. Phase bar: search phase (query, response), file transfer phase (file request).]
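A minimal sketch of how a hash can direct a query to one node, assuming a Chord-style identifier ring in which a key belongs to the first node whose identifier is at or after the key's. The helper names `key_id` and `responsible_node`, the 16-bit identifier space, and the use of SHA-1 are illustrative assumptions. Hop-by-hop forwarding through the routing tables (RT) is omitted.

```python
import hashlib
from bisect import bisect_left

def key_id(name, bits=16):
    """Hash a file name onto a ring of 2**bits identifiers (assumed size)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def responsible_node(node_ids, key):
    """Return the first node id >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]

# Hypothetical node identifiers on the ring.
nodes = [10, 200, 4000, 30000, 61000]
owner = responsible_node(nodes, key_id("LetItBe"))
```

Because the node is chosen from a hash of the full name, a query for a different spelling or a partial name hashes elsewhere, which is one way to see why this architecture only handles exact queries.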
File transfers
❏ File transfers directly between file owner and receiver
[Figure: peers N1 to N5 on the Internet. Labels: file owner, user, download(“LetItBe”). Phase bar: search phase (user generates query, query response), file transfer phase (file request, file transfer).]
Modeling P2P file sharing systems: challenges
❏ Unique workload/service model: peers generate workload (queries, downloads) but also add service capacity (file sharing, query processing)
❏ Complex peer behavior:
  ■ transient: off-line, on-line (inactive), on-line (active: query, download)
  ■ different classes of peers: freeloaders, service capacity
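A general model
[Figure: state diagram. On-Line states: Thinking, Query Processing, File download; plus an Off-Line state. Transition labels include q and p_off; the remaining labels are not recoverable.]
❏ Closed loop, fixed population of peers
❏ No structural dependency on architecture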
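A general model w/ multiple classes of peers
[Figure: state diagram. On-Line states: Thinking, Query Processing, File download; plus an Off-Line state. Transition labels include q and p_off; the remaining labels are not recoverable.]
❏ Different classes of peers have different behaviors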
Modeling query processing
❏ Modeled by a single-server queue
❏ Service rate of queue is a function of # peers on-line: $\mu_q(N_a)$
❏ Query failure probability ($q$) associated with each file request
[Figure: the general model state diagram (Thinking, Query Processing, File download, Off-Line).]
Modeling file downloading
❏ Associate each unique file in the system with a “service capacity” modeled by a single-server queue
❏ Requests chosen w/ probability $p_j$: rank $j$ (by request popularity)
❏ Service capacity $\mu_f(N_a, i)$ is a function of:
  ■ # replicas (file availability: rank $i$, by # of replicas)
  ■ # peers on-line
[Figure: the general model state diagram (Thinking, Query Processing, File download, Off-Line).]
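To make "requests chosen with probability p_j" concrete, the sketch below builds a request distribution over popularity ranks and draws one request from it. The slide does not state the form of p_j; the Zipf-like shape with exponent `beta` is purely an assumption for illustration, as are the function names.

```python
import random

def request_probabilities(num_files, beta=1.0):
    """Assumed Zipf-like p_j proportional to 1 / j**beta, normalized to sum to 1."""
    weights = [1.0 / (j ** beta) for j in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_request_rank(probs, rng):
    """Draw one popularity rank j (1-based) according to p_j."""
    ranks = range(1, len(probs) + 1)
    return rng.choices(ranks, weights=probs, k=1)[0]

probs = request_probabilities(1000)
rank = sample_request_rank(probs, random.Random(42))
```

Each sampled rank identifies which per-file queue receives the next download request in the model.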
Model parameters
❏ Capacity for processing queries in CIA: $C_1$
❏ Capacity for processing queries in DIHA: $C_3 N_a / \log N_a$
❏ Capacity for downloading file w/ availability rank $i$: $C_0 N_a / i^{\alpha}$
❏ With classes of peers $c$: $C_0 \sum_c W^{(c)} N_a^{(c)} / i^{\alpha}$
[Figure: the general model state diagram (Thinking, Query Processing, File download, Off-Line).]
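The capacity expressions on this slide can be written as small functions. This is a sketch under one reading of the slide: a constant C1 for CIA query processing, C3·N_a/log N_a for DIHA, and C0·N_a/i^α for downloading the file with availability rank i (the exponent α is taken from the slide's symbols; its placement is an assumption). The function names and the dictionary layout for peer classes are hypothetical.

```python
import math

def query_rate_cia(c1):
    """CIA: query capacity is a constant, independent of on-line peers."""
    return c1

def query_rate_diha(c3, n_active):
    """DIHA: query capacity C3 * N_a / log(N_a); requires n_active > 1."""
    return c3 * n_active / math.log(n_active)

def file_capacity(c0, n_active, rank, alpha):
    """Download capacity C0 * N_a / i**alpha for availability rank i."""
    return c0 * n_active / rank ** alpha

def file_capacity_multiclass(c0, weights, n_active_by_class, rank, alpha):
    """Multi-class form: C0 * sum_c W(c) * N_a(c) / i**alpha."""
    weighted = sum(weights[c] * n_active_by_class[c] for c in weights)
    return c0 * weighted / rank ** alpha

# A class with weight 0 (for example, peers that share nothing) adds no capacity.
single = file_capacity(2.0, 100, 4, 0.5)
multi = file_capacity_multiclass(
    2.0, {"sharing": 1.0, "freeloader": 0.0},
    {"sharing": 100, "freeloader": 50}, 4, 0.5,
)
```

The contrast to note is that the CIA query capacity stays fixed while the DIHA and download capacities grow with the number of on-line peers.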
Model solutions
❏ Performance metric: expected system throughput (# files downloaded per unit of time)
❏ Approximate numerical solution
  ■ bottleneck analysis with multiple classes of peers
  ■ set of non-linear equations, solved via a fixed-point method
  ■ mostly independent of service rate functions: flexibility to use other functions
❏ Simulation in more general cases
  ■ approximations validated
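For intuition about bottleneck analysis, the sketch below computes the textbook asymptotic throughput bound for a single-class closed queueing network with a think-time stage and fixed-rate queues. It is not the multi-class fixed-point procedure the slide refers to, and it assumes service demands that do not change with the number of on-line peers. The name `throughput_bound` and the sample numbers are hypothetical.

```python
def throughput_bound(n_peers, think_time, demands):
    """Upper bound on throughput of a closed network.

    n_peers    -- fixed population circulating in the closed loop
    think_time -- mean time a peer spends thinking between requests
    demands    -- mean service demand per request at each queue
    """
    total_demand = sum(demands)
    # Light load: no queueing, each peer completes one cycle per (Z + sum of D).
    light_load = n_peers / (think_time + total_demand)
    # Heavy load: the slowest queue (the bottleneck) caps throughput at 1 / D_max.
    saturation = 1.0 / max(demands)
    return min(light_load, saturation)

small = throughput_bound(10, 90.0, [5.0, 5.0])
large = throughput_bound(1000, 90.0, [5.0, 5.0])
```

With few peers the bound grows linearly with the population; once the bottleneck queue saturates, adding peers no longer raises throughput unless capacity also grows with the population.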
Scalability with Population
❏ Workload:
  ■ 1000 files
  ■ 12-hour off-line period
  ■ 30-minute on-line idle period
  ■ average of 5 downloads per active period
[Figure: System throughput T (log scale) vs. total population N (1.E+04 to 1.E+09), with curves for CIA, DIFA, and DIHA.]
❏ System throughput scales with population size in distributed indexing architectures
Impact of Freeloaders
❏ 100,000 non-freeloaders
❏ Freeloaders:
  ■ do not share files
  ■ support query processing
  ■ more aggressive: shorter think times, double file downloads
[Figure: System throughput vs. number of freeloaders (in thousands, 0 to 4000), with curves for CIA, DIFA, and DIHA.]
❏ P2P can support a large number of freeloaders
Impact of Freeloaders (Cont’d)
❏ 100,000 non-freeloaders
❏ Freeloaders impact non-freeloaders
  ■ marginal effect when system not saturated
[Figure: Throughput of a non-freeloader vs. number of freeloaders (in thousands, 0 to 4000), with curves for CIA, DIFA, and DIHA.]
❏ P2P can support a large number of freeloaders
Mismatch between file availability and request popularity
❏ What if service capacity doesn’t match popularity?
❏ Each file ranked by
  ■ request popularity ($j$)
  ■ # replicas available ($i$)
❏ Randomly match ranks within a window: $i - w < j < i + w$
❏ 500,000 peers
[Figure: System throughput vs. rank permutation window w (log scale, up to 10000), with curves for CIA, DIFA, and DIHA.]
❏ Small mismatches have little effect; large mismatches do
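One simple way to generate a rank mismatch that respects the window constraint i - w < j < i + w is to shuffle ranks inside consecutive blocks of w files, since any two ranks in the same block differ by at most w - 1. The slide does not say how the random matching was produced, so this block-shuffle scheme and the name `permute_within_window` are assumptions for illustration.

```python
import random

def permute_within_window(num_files, window, rng):
    """Map each availability rank i to a popularity rank j with |i - j| < window."""
    ranks = list(range(1, num_files + 1))
    mapping = {}
    for start in range(0, num_files, window):
        block = ranks[start:start + window]
        shuffled = block[:]
        rng.shuffle(shuffled)  # mismatch is confined to this block of ranks
        mapping.update(zip(block, shuffled))
    return mapping

mapping = permute_within_window(100, 10, random.Random(0))
identity = permute_within_window(100, 1, random.Random(0))
```

A window of 1 leaves availability and popularity ranks perfectly matched; larger windows let popular files end up with proportionally fewer replicas.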
Supernodes (Kazaa)
❏ Kazaa: 2-level hierarchy
  ■ top-level: well-provisioned supernodes (higher capacity), gnutella-like
  ■ bottom-level: connect to a single supernode
[Figure: System throughput vs. total population N (0.E+00 to 4.E+08), one curve per ratio of # supernodes : # total nodes (1:1, 1:6, 1:10, 1:11, 1:12, 1:20).]
❏ Hierarchical design improves system throughput
Summary
❏ Simple models: insights into fundamental performance questions of P2P file sharing systems
  ■ compare different architectures
  ■ scalability with peer population
  ■ impact of freeloaders
  ■ impact of imbalance between file availability and request popularity
❏ Model extensions:
  ■ hierarchical peer structure
  ■ off-line to on-line transition phase