Power-law revisited: A large scale measurement study of P2P content popularity

  1. Power-law revisited: A large scale measurement study of P2P content popularity
     György Dán (School of Electrical Engineering, KTH, Royal Institute of Technology, Stockholm, Sweden)
     Niklas Carlsson (Department of Computer Science, University of Calgary, Calgary, Canada)
     27 April 2010, IPTPS, San Jose, CA

  2. P2P Content Popularity
     • Instantaneous popularity
         • Concurrent number of peers
         • Effectiveness of locality awareness
         • Little data available
         • Power-law?
     • Download popularity
         • Number of peers that downloaded content
         • Effectiveness of caching
         • Several measurements
         • Power-law, but with a flattened head (Mandelbrot-Zipf)
     • Measurements limited in time and coverage
         • How accurate are they?
         • How accurate can they be?
     References:
         L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang, "Measurement, Analysis, and Modeling of BitTorrent-like Systems," in Proc. ACM IMC, Oct. 2005.
         M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Trans. on Networking, vol. 16, no. 6, pp. 1447-1460, 2008.

  3. Measuring P2P Content Popularity
     • Overlay structure and corresponding measurement methodology
         • Tracker-based (BitTorrent): peer harvesting; tracker query (scrape); deep packet inspection
         • Unstructured (Gnutella, Ares, FastTrack): monitoring queries and replies; deep packet inspection
     • Measurement = sample of population-wide popularity
         • Probability sampling: difficult
         • Opportunity sampling
             • Inference can be misleading

  4. Measurement Methodology
     • Screen scrape of Mininova.org, the largest torrent search engine
         • 31 Aug. 2008, 15 Oct. 2008, 31 Aug. 2009
         • Scrape URLs of 1690 BitTorrent trackers
     • Scrapes of 721 BitTorrent trackers: seeds, leechers, downloads (S, L, D)
         • 15 Sept. 2008 to 17 Aug. 2009
         • Weekly and daily, at 8pm GMT
         • Almost instantaneous (< 30 mins)
     [Figure: timeline of Mininova scrapes and tracker scrapes]
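A tracker scrape reports, per torrent, the three counters listed on slide 4. The sketch below decodes such a response, assuming the common bencoded scrape format (a `files` dictionary keyed by 20-byte info-hash, with `complete`, `incomplete` and `downloaded` integers), and maps it to (S, L, D) triples. It is a minimal stdlib illustration, not the authors' crawler: `bdecode`, `scrape_to_sld` and the example response are hypothetical, and no network request is made.

```python
def bdecode(data: bytes):
    """Decode one bencoded value (int, byte string, list or dict) from data."""

    def parse(i):
        c = data[i:i + 1]
        if c == b"i":  # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":  # list: l<values>e
            i, items = i + 1, []
            while data[i:i + 1] != b"e":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        if c == b"d":  # dictionary: d<key><value>...e (keys are byte strings)
            i, result = i + 1, {}
            while data[i:i + 1] != b"e":
                key, i = parse(i)
                value, i = parse(i)
                result[key] = value
            return result, i + 1
        if c.isdigit():  # byte string: <length>:<bytes>
            colon = data.index(b":", i)
            start = colon + 1
            end = start + int(data[i:colon])
            return data[start:end], end
        raise ValueError(f"invalid bencode at offset {i}")

    value, end = parse(0)
    if end != len(data):
        raise ValueError("trailing bytes after bencoded value")
    return value


def scrape_to_sld(response: bytes) -> dict:
    """Map a scrape response to {info_hash_hex: (seeds, leechers, downloads)}."""
    files = bdecode(response)[b"files"]
    return {
        info_hash.hex(): (stats[b"complete"], stats[b"incomplete"], stats[b"downloaded"])
        for info_hash, stats in files.items()
    }


# Hypothetical response for a single torrent with a made-up 20-byte info-hash.
example_hash = bytes(range(20))
example_response = (
    b"d5:filesd20:" + example_hash
    + b"d8:completei12e10:downloadedi345e10:incompletei7eeee"
)
print(scrape_to_sld(example_response))
```

Scraping hundreds of trackers then amounts to collecting these triples per torrent at each measurement time.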

  5. Zipf’s Law and Beyond
     • Zipf’s law: $f_{\mathrm{Zipf}}(r; f_1, \alpha) = \dfrac{f_1}{r^{\alpha}}$
         • Heavy tail: $\lim_{r \to \infty} e^{a r} f(r) = \infty$ for all $a > 0$
     • Mandelbrot-Zipf law: $f_{\mathrm{MZipf}}(r; f_1, q, \alpha) = \dfrac{f_1}{(r + q)^{\alpha}}$
         • Flattened head
     • Generalized Zipf law: $f_{\mathrm{GZipf}}(r; f_1, \beta, \gamma, \alpha) = \dfrac{f_1 \, e^{-\gamma r}}{(1 + \beta r)^{\alpha}}$
         • Light tail: $\lim_{r \to \infty} e^{a r} f(r) = 0$ for some $a > 0$
         • Flattened head
         • Power-law trunk
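The three rank-popularity laws can be written as plain functions of the rank r. The Zipf and Mandelbrot-Zipf forms follow slide 5; the generalized Zipf expression used by `gzipf`, f_1 e^{-γr}/(1+βr)^α, is reconstructed from a garbled slide and from the example parameters, so treat it as an assumption rather than the paper's exact definition. The function names are illustrative.

```python
import math


def zipf(r, f1, alpha):
    """Zipf's law: popularity of the rank-r item, f1 / r**alpha."""
    return f1 / r ** alpha


def mzipf(r, f1, q, alpha):
    """Mandelbrot-Zipf law: the shift q flattens the head, f1 / (r + q)**alpha."""
    return f1 / (r + q) ** alpha


def gzipf(r, f1, beta, gamma, alpha):
    """Generalized Zipf (assumed form): flattened head, power-law trunk, and an
    exponential cutoff exp(-gamma * r) that makes the tail light."""
    return f1 * math.exp(-gamma * r) / (1 + beta * r) ** alpha


# Evaluate the three example curves of slide 6 at a few ranks.
for r in (1, 10**2, 10**4, 10**6):
    print(r, zipf(r, 1e7, 1), mzipf(r, 1e7, 50, 1), gzipf(r, 2e5, 0.02, 1e-5, 1))
```

With β = 1/q and f_1 scaled by q^{-α}, `gzipf` with γ = 0 reduces exactly to `mzipf`, which is why the two example curves share the same head and trunk.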

  6. Zipf’s Law and Beyond: Example
     [Figure: popularity vs. rank (log-log), with head, trunk and tail regions marked; curves Zipf(1e+07, 1), MZipf(1e+07, 50, 1), GZipf(2e+05, 0.02, 1e-05, 1)]
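Assuming the generalized Zipf form f_1 e^{-γr}/(1+βr)^α (a reconstruction, not confirmed by the source), a short calculation shows how the example GZipf(2e+05, 0.02, 1e-05, 1) lines up with the other two curves in each region:

```latex
\begin{aligned}
\text{trunk } (1/\beta \ll r \ll 1/\gamma):\quad
  f_{\mathrm{GZipf}}(r) &\approx \frac{f_1}{(\beta r)^{\alpha}}
  = \frac{2\times10^{5}}{0.02\, r} = \frac{10^{7}}{r} = f_{\mathrm{Zipf}}(r;\,10^{7},1)\\
\text{head } (r = 1):\quad
  f_{\mathrm{GZipf}}(1) &\approx \frac{2\times10^{5}}{1.02} = \frac{10^{7}}{51} = f_{\mathrm{MZipf}}(1;\,10^{7},50,1)\\
\text{tail}:\quad
  e^{-\gamma r} &= e^{-1} \text{ at } r = 1/\gamma = 10^{5}
\end{aligned}
```

So the head is set by 1/β = 50, the trunk exponent by α, and the cutoff becomes visible around rank 10^5.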

  7. What we measured (I)
     • Instantaneous popularity
     [Figure: number of peers vs. torrent rank r (log-log) for snapshots on 15.09.2008, 22.09.2008, 23.09.2008 and 17.08.2009; annotations: max peers, total peers: 42 million, active swarms, "Power-law?"]

  8. What we measured (II)
     • Download popularity
     [Figure: number of downloads since 15 Sept. 2008 vs. torrent rank r (log-log) for windows ending 22 Sept. 2008 (1 week), 13 Oct. 2008 (4 weeks), 16 Mar. 2009 (26 weeks) and 17 Aug. 2009 (48 weeks); annotations: max downloads: 50 million, total downloads: 8.3 billion, active swarms, "Power-law?"]

  9. Instantaneous Popularity
     • Instantaneous popularity, 15 Sept. 2008, 8pm GMT
     • Max: 1.6×10^5 peers, Total: 4.23×10^7 peers, Active: 2.93×10^6 torrents
     [Figure: number of peers (leechers, seeds) vs. torrent rank r (log-log); curves Peers, Leechers, Seeds]
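The max, total and active figures, and the rank curve itself, can be derived from one snapshot of per-torrent scrape counters. The sketch below assumes each torrent's counts are already merged across trackers, takes instantaneous popularity to be seeds plus leechers, and treats a torrent as active when it has at least one peer; these definitions and the name `snapshot_summary` are illustrative assumptions.

```python
def snapshot_summary(sld_by_torrent):
    """Summarize one instantaneous-popularity snapshot.

    sld_by_torrent maps a torrent id to (seeds, leechers, downloads), assumed
    already merged across trackers. Popularity is seeds + leechers, and a
    torrent counts as active if it has at least one peer."""
    peers = sorted((s + l for s, l, _ in sld_by_torrent.values()), reverse=True)
    active = [p for p in peers if p > 0]
    return {
        "max": active[0] if active else 0,  # peers in the most popular torrent
        "total": sum(active),               # total concurrent peers
        "active": len(active),              # number of active torrents
        "rank_curve": list(enumerate(active, start=1)),  # (rank r, peers)
    }


# Hypothetical snapshot of four torrents.
example_snapshot = {"a": (10, 5, 100), "b": (0, 0, 40), "c": (2, 1, 7), "d": (0, 3, 0)}
summary = snapshot_summary(example_snapshot)
print(summary["max"], summary["total"], summary["active"])
```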

  10. Power-law or Double-power-law?
     • Instantaneous popularity, 15 Sept. 2008, 8pm GMT
     • Max: 1.6×10^5 peers, Total: 4.23×10^7 peers, Active: 2.93×10^6 torrents
     • Power-law trunk hypothesis: Max: 10^6, Total: 6.1×10^7, Active: 9.5×10^6
     [Figure: number of peers (leechers, seeds) vs. torrent rank r (log-log); fits Zipf(1.6e+05, 0.60), Zipf(1e+06, 0.86), GZipf(1.5e+05, 0.08, 1e-06, 0.86); annotation: "Sampling artifact?"]
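One way to read the power-law trunk hypothesis is to extend the trunk fit Zipf(1e+06, 0.86) down to the rank where it predicts a single peer, then count the ranks and sum the popularity. The sketch below does that under this assumption (it is our reading, not necessarily the authors' calculation); `extrapolate_zipf` is a hypothetical helper, and the sum uses the integral plus an end-point correction rather than a loop over millions of ranks.

```python
import math


def extrapolate_zipf(f1, alpha):
    """Extend a Zipf trunk f(r) = f1 / r**alpha down to popularity 1.

    Returns (active, total): the last rank with f(r) >= 1, and the sum of f(r)
    over ranks 1..active, approximated by the integral plus the end-point
    correction (first Euler-Maclaurin terms) instead of a loop over every rank."""
    n = math.floor(f1 ** (1 / alpha))
    if alpha == 1:
        integral = f1 * math.log(n)
    else:
        integral = f1 * (n ** (1 - alpha) - 1) / (1 - alpha)
    total = integral + (f1 + f1 / n ** alpha) / 2
    return n, total


# Trunk fit of the 15 Sept. 2008 peer counts.
active, total = extrapolate_zipf(1e6, 0.86)
print(f"max: {1e6:.0e}, total: {total:.2e}, active: {active:.2e}")
```

This gives roughly 9.5×10^6 active torrents and 6.1×10^7 peers, close to the hypothesis values. Applied to the download trunks of slide 13, Zipf(5e+08, 0.95) and Zipf(5e+08, 1.20), the same function gives about 1.43×10^9 and 1.77×10^7 active torrents.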

  11. Sampling and Exponential Cutoff
     • Instantaneous popularity, 15 Sept. 2008, 8pm GMT
     • 2.93×10^6 samples from the Double-Zipf fit, drawn in two ways:
         • PropTor: discover torrents with probability proportional to their popularity
         • UnifTor: discover torrents uniformly at random
     • PropTor sampling introduces an exponential cutoff
     [Figure: number of peers vs. torrent rank r (log-log); curves Double-Zipf fit, Measured, PropTor, UnifTor; annotations: Total: 4.23×10^7, Total: 4.02×10^7, PMCC = 0.99]
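The two discovery models can be simulated on a scaled-down population. The sketch below is a rough illustration under several assumptions: the hypothetical `double_zipf` joins a head and a trunk Zipf by taking the smaller of the two at each rank, PropTor-like discovery is modelled as weighted sampling of k distinct torrents without replacement (Efraimidis-Spirakis keys, our choice of technique), and UnifTor-like discovery picks k distinct torrents uniformly. Sizes are far smaller than the 2.93×10^6 samples on the slide.

```python
import heapq
import math
import random


def double_zipf(n, f1_head, a_head, f1_trunk, a_trunk):
    """Hypothetical double power-law population: rank r gets the smaller of a
    head Zipf and a trunk Zipf, so the head is flatter than the trunk."""
    return [min(f1_head / r ** a_head, f1_trunk / r ** a_trunk) for r in range(1, n + 1)]


def sample_torrents(popularity, k, proportional, rng):
    """Discover k distinct torrents and return their popularities, sorted by rank.

    proportional=True  -> PropTor-like: weighted sampling without replacement,
                          weight = popularity (Efraimidis-Spirakis keys).
    proportional=False -> UnifTor-like: every torrent equally likely."""
    if proportional:
        # key = log(u) / w with u in (0, 1]; the k largest keys form the sample.
        keys = ((math.log(1.0 - rng.random()) / p, p) for p in popularity)
        chosen = [p for _, p in heapq.nlargest(k, keys)]
    else:
        chosen = rng.sample(popularity, k)
    return sorted(chosen, reverse=True)


rng = random.Random(1)
# Scaled-down population built from the head and trunk fits of the peer counts.
population = double_zipf(20_000, 1.6e5, 0.60, 1e6, 0.86)
k = 6_000
prop = sample_torrents(population, k, True, rng)
unif = sample_torrents(population, k, False, rng)
print(f"population total: {sum(population):.3g}")
print(f"PropTor total: {sum(prop):.3g}, UnifTor total: {sum(unif):.3g}")
```

Plotting `prop` and `unif` against rank on log-log axes is how one would look for the faster-than-power-law drop at the end of the PropTor curve; the printed totals only show that PropTor keeps far more of the population's total popularity than UnifTor.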

  12. Download Popularity
     • Download popularity over 4 and 48 weeks
     • Active: 2.29×10^6 and 7.17×10^6 torrents, respectively
     [Figure: number of downloads since 15 Sept. 2008 vs. torrent rank r (log-log); curves Measured 13 Oct. 2008, Measured 17 Aug. 2009]
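Download popularity over a window can be obtained from two scrapes of the cumulative `downloaded` counter. The sketch below is a hypothetical helper, not the authors' pipeline; it assumes that a torrent missing from the start scrape is counted from zero and that negative differences (for example a counter reset) are clamped to zero.

```python
def downloads_in_window(start_scrape, end_scrape):
    """Rank curve of per-torrent downloads between two scrapes.

    Both arguments map a torrent id to its cumulative 'downloaded' counter.
    Assumptions: a torrent missing from the start scrape is counted from zero,
    and a negative difference (e.g. a counter reset) is clamped to zero."""
    window = [
        max(end_count - start_scrape.get(torrent, 0), 0)
        for torrent, end_count in end_scrape.items()
    ]
    return sorted(window, reverse=True)


# Hypothetical cumulative counters on 15 Sept. 2008 and 13 Oct. 2008.
start = {"a": 100, "b": 40}
end = {"a": 160, "b": 40, "c": 25}
curve = downloads_in_window(start, end)
active = sum(1 for d in curve if d > 0)  # torrents downloaded at least once
print(curve, active)
```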

  13. Power-law vs. Exponential Cutoff
     • Download popularity over 4 and 48 weeks
     • Active: 2.29×10^6 and 7.17×10^6 torrents, respectively
     • 48 weeks (measured 17 Aug. 2009): fits Zipf(5e+07, 0.50), Zipf(5e+08, 0.95), GZipf(3.4e+07, 0.06, 1.1e-06, 0.95); double power-law hypothesis: Active: 1.43×10^9
     • 4 weeks (measured 13 Oct. 2008): fits Zipf(1.3e+07, 0.35), Zipf(5e+08, 1.20), GZipf(8.4e+06, 0.033, 1.5e-06, 1.20); double power-law hypothesis: Active: 1.77×10^7
     [Figure: number of downloads since 15 Sept. 2008 vs. torrent rank r (log-log), measured curves with the fits above]

  14. Sampling and Exponential Cutoff
     • Download popularity over 4 weeks (15 Sept. 2008 to 13 Oct. 2008)
     • 2.29×10^6 samples from the Double-Zipf fit, drawn in two ways:
         • PropTor: discover torrents with probability proportional to their popularity
         • UnifTor: discover torrents uniformly at random
     • PropTor sampling introduces an exponential cutoff
     [Figure: number of downloads vs. torrent rank r (log-log); curves Double-Zipf fit, Measured, PropTor, UnifTor; annotations: Total: 1.31×10^9, Total: 1.21×10^9, PMCC = 0.99]
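The PMCC on slides 11 and 14 is the Pearson product-moment correlation coefficient. The slides do not say which quantities were correlated, so the hypothetical `rank_curve_pmcc` below assumes the log-popularity of two rank curves compared rank by rank over their common length; only `pmcc` itself is the standard formula.

```python
import math


def pmcc(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_curve_pmcc(measured, sampled):
    """Correlate two rank-popularity curves on a log scale over their common
    ranks (assumed choice of quantities; all values must be positive)."""
    m = min(len(measured), len(sampled))
    xs = [math.log10(v) for v in sorted(measured, reverse=True)[:m]]
    ys = [math.log10(v) for v in sorted(sampled, reverse=True)[:m]]
    return pmcc(xs, ys)


# Hypothetical curves: the sampled one is the measured one scaled by 2.
measured = [1000, 100, 10, 1]
sampled = [2000, 200, 20]
print(rank_curve_pmcc(measured, sampled))
```

Note that a PMCC close to 1 on log-scaled curves says the shapes agree, not that the absolute values do.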

  15. Impact of Sampling
     • Instantaneous popularity, 15 Sept. 2008, 8pm GMT
     • 2.93×10^6 active torrents, 4.23×10^7 total peers
     • Sampled in 5 ways:
         • PirateBay, PropTor, UnifTor: 6.55×10^5 torrents
         • PropPeer: 4.23×10^5 peers (1% of total)
         • Mininova: 9.7×10^5 torrents
     • Large torrents overrepresented
     [Figure: number of peers vs. torrent rank r (log-log); curves Original, PirateBay, PropTor, UnifTor, PropPeer, Mininova; annotation: "Heavy-tailed"]
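PropPeer samples peers rather than torrents, which is one reason large torrents end up overrepresented. The sketch below is a hypothetical implementation under that reading: it draws a fixed number of distinct peers uniformly from all peers and counts how many land in each torrent, on a small made-up population. Expanding every peer into a list is fine at this scale but would be wasteful for 4.23×10^7 peers.

```python
import random
from collections import Counter


def prop_peer_sample(peers_per_torrent, n_peers, rng):
    """PropPeer-like sampling: draw n_peers distinct peers uniformly at random
    from all peers and count how many belong to each torrent.

    Returns the rank curve (sampled peers per discovered torrent). A torrent
    with c peers out of P is discovered with probability of roughly
    1 - (1 - n_peers / P) ** c, so large torrents are much more likely to appear."""
    owners = [t for t, count in enumerate(peers_per_torrent) for _ in range(count)]
    sampled = rng.sample(owners, n_peers)  # uniform, without replacement
    return sorted(Counter(sampled).values(), reverse=True)


rng = random.Random(7)
# Hypothetical population: 5000 torrents with a Zipf-like peer count.
population = [max(1, int(1e4 / r ** 0.86)) for r in range(1, 5001)]
n_peers = sum(population) // 100  # a 1% peer sample
curve = prop_peer_sample(population, n_peers, rng)
print(f"{len(curve)} of {len(population)} torrents discovered")
```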

  16. Impact of Sampling
     • Download popularity over 4 weeks
     • 2.29×10^6 active torrents, 1.31×10^9 total downloads
     • Sampled in 5 ways:
         • PirateBay, PropTor, UnifTor: 1.69×10^6 torrents
         • Mininova: 4.95×10^5 active torrents
         • PropPeer: 1.31×10^6 peers (0.1% of total)
     • Large torrents overrepresented
     [Figure: number of downloads vs. torrent rank r (log-log); curves Original, PirateBay, PropTor, UnifTor, PropPeer, Mininova; annotation: "Heavy-tailed"]

  17. Summary
     • Large measurement study of P2P content popularity
         • Instantaneous popularity
         • Download popularity
     • Instantaneous popularity
         • Power-law head?, power-law trunk
         • Tail may be power-law
     • Download popularity
         • Flat head, power-law trunk
         • Tail may be power-law for short periods
         • Not power-law for long periods
     • Sampling and measured characteristics
         • Infer with care
