
Analysis and modeling of the KAD P2P network: Bachelor thesis



  1. Analysis and modeling of the KAD P2P network
     Bachelor thesis summary presentation
     Maximilian Sievert
     Lehrstuhl für Netzarchitekturen und Netzdienste, Institut für Informatik, Technische Universität München
     May 29, 2013
     M. Sievert (TU München), Analysis and modeling of the KAD P2P network

  2. Outline
     - Introduction and context
     - Crawling framework and conducted crawls
     - Evaluation
     - Conclusion and future research

  3. Part I: Introduction and context

  4. Motivation and goals
     - P2P network simulators: used to analyze the interaction between the P2P overlay and the IP underlay
       - Behavior of P2P nodes
       - Geographic distribution of nodes, AS
     - Analysis of KAD (aMule/eMule) to determine metrics
       - PlanetLab vantage points
     - Reasons for KAD
       - One of the largest active P2P networks
       - Simple, open source protocol

  5. Kademlia / Kad
     - Kademlia: P2P distributed hash table (DHT)
     - Kademlia: structured P2P network, XOR distance metric
     - Routing table: unbalanced binary tree of k-buckets
     - Protocol changes in Kad (eMule, aMule):
       - 128-bit MD4 key/node IDs instead of 160-bit
       - 2 protocol versions: packet compression above a certain size
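The XOR metric on 128-bit IDs can be illustrated with a minimal Python sketch. The function names are illustrative and not taken from eMule or aMule. The two sample values are the notable IDs listed later in the slides, used here only as convenient 128-bit inputs.

```python
ID_BITS = 128  # Kad ID width (the slides contrast this with 160 bits)

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: bitwise XOR read as an integer."""
    return a ^ b

def common_prefix_len(a: int, b: int, bits: int = ID_BITS) -> int:
    """Number of leading bits that two IDs share."""
    return bits - xor_distance(a, b).bit_length()

a = 0x09262ce48db41838ce94c80cdaab3fab
b = 0x025e747cea687ccab41c95fa62a27a5d

print(hex(xor_distance(a, b)))
print(common_prefix_len(a, b))  # 4: the IDs first differ in the fifth bit
```

A longer common prefix means a smaller XOR distance, which is why a prefix tree of k-buckets is a natural routing structure for this metric.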

  6. Related work
     - Steiner 2008: Blizzard crawler
       - [IP, TCP, UDP, ID] mapping snapshot
       - Daily full crawls for a year, zone crawls every 5 minutes for 6 months
     - Jie Yu et al. 2009, 'ID Repetition in Kad': similar crawler
       - ID reuse
       - Port aliasing
       - Non-persistent IDs
       - Silent peers

  7. Part II: Crawling framework

  8. Crawling process
     - Adaptation of Steiner's crawler Blizzard. Own additions:
       - Protocol version 2 decompression
       - Throttle delay parameter
     - Data structures:
       - Queue U of discovered but not yet contacted nodes
       - Hash set D of all discovered nodes (stores IP, UDP port)
     - Process:
       - Initialize U with an initial set of starting peers
       - Sender-thread loop, receiver-thread loop
     - Abort conditions: U empty for a while, timeout, network issue
     - Output: binary dump of sent requests / received responses, text log
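The interplay of the queue U and the set D can be sketched as a single-threaded breadth-first traversal. This is a simplified illustration, not the crawler's code: the actual crawler uses separate sender and receiver threads over UDP, while `query_contacts` here is a hypothetical callable standing in for one request/response exchange, and the toy network and its addresses are invented for the example.

```python
from collections import deque

def crawl(seeds, query_contacts, max_queries=None):
    """Breadth-first discovery with a queue U and a set D.

    seeds:          iterable of (ip, udp_port) starting peers
    query_contacts: hypothetical callable returning the contacts a node
                    reports, or None if the node does not respond
    """
    U = deque()         # discovered but not yet contacted
    D = set()           # every node discovered so far
    responsive = set()  # nodes that answered a query
    for seed in seeds:
        if seed not in D:
            D.add(seed)
            U.append(seed)
    queried = 0
    while U and (max_queries is None or queried < max_queries):
        node = U.popleft()
        queried += 1
        contacts = query_contacts(node)
        if contacts is None:
            continue  # silent node: stays in D, contributes no contacts
        responsive.add(node)
        for contact in contacts:
            if contact not in D:
                D.add(contact)
                U.append(contact)
    return D, responsive, queried

# Toy network (invented addresses): node -> reported contacts, None = silent.
toy = {
    ("10.0.0.1", 4000): [("10.0.0.2", 4000), ("10.0.0.3", 4000)],
    ("10.0.0.2", 4000): [("10.0.0.1", 4000), ("10.0.0.4", 4000)],
    ("10.0.0.3", 4000): None,
    ("10.0.0.4", 4000): [],
}
discovered, responsive, queried = crawl([("10.0.0.1", 4000)], toy.get)
print(len(discovered), len(responsive), queried)  # 4 3 4
```

Checking membership in D before enqueueing ensures each node is queried at most once, so the loop ends when U runs empty or the query budget is spent.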

  9. Crawling parameters
     - Limitations:
       - Bandwidth
       - Firewall/rate restrictions
     - Parameters:
       - Request type: number of contacts, 1-31
       - Request burst size: under 10 to avoid remote spam block
       - Request throttling: limit on nodes queried per second
       - Zone filter: restrict 'valid' nodes to a specific ID prefix
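Two of these parameters lend themselves to a short illustration. The sketch below assumes that the zone filter compares the leading bits of a 128-bit ID against a fixed prefix, and that throttling is a fixed delay between consecutive requests. Both are assumptions about the mechanism, and the function names are hypothetical.

```python
ID_BITS = 128  # Kad ID width

def in_zone(node_id: int, prefix: int, prefix_bits: int) -> bool:
    """Return True if the top `prefix_bits` bits of `node_id` equal `prefix`."""
    return (node_id >> (ID_BITS - prefix_bits)) == prefix

def max_requests_per_second(delay_ms: float) -> float:
    """Upper bound on the request rate for a fixed delay between requests."""
    return 1000.0 / delay_ms

sample_id = 0x09262ce48db41838ce94c80cdaab3fab
print(in_zone(sample_id, 0x09, 8))   # True: the first byte is 0x09
print(max_requests_per_second(4))    # 250.0
```

With an 8-bit prefix, a zone covers 1/256 of the ID space, which is what makes frequent zone crawls much cheaper than full crawls.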

  10. Conducted crawls: TU Munich (net.in.tum.de), early crawls

       #   Start           Duration  Discovered  Queried  Responsive
       1   22/03/11 15:24  00:59:50  2.685.010    63%     26%
       2   22/03/11 21:54  00:59:56  1.950.599    65%     29%
       3   22/03/11 22:55  00:51:20  1.920.401    64%     29%
       4   05/04/11 16:27  00:59:53  3.070.211    64%     24%
       5   06/04/11 13:24  00:46:46  2.816.465    55%     29%
       6   13/04/11 01:38  00:56:56  1.960.204   100%     21%
       7   14/04/11 20:31  01:14:24  2.334.305    69%     32%
       8   18/04/11 18:11  01:59:57  2.735.059    81%     19%
       9   19/04/11 04:47  01:53:29  2.229.108   100%     16%
      10   19/04/11 23:42  01:35:16  1.853.972   100%     20%
      11   20/04/11 09:05  01:59:58  2.421.390    90%     15%
      12   20/04/11 18:07  01:59:59  2.596.452    81%     18%
      13   20/04/11 20:26  01:53:58  2.153.941   100%     20%

      Adaptation and addition of features and parameters to the crawler.

  11. Conducted crawls: TU Munich (net.in.tum.de)

       #   Start           Duration  P          Discovered  Queried  Responsive
      14   09/05/11 12:49  01:59:56  3 ms b8    2.733.412    77%     19%
      15   10/05/11 07:34  01:39:59  3 ms       2.388.565    76%     19%
      16   10/05/11 14:52  01:59:59  3 ms       2.939.943    68%     21%
      17   10/05/11 19:25  02:23:21  3 ms       2.517.320   100%     20%
      18   10/05/11 22:15  01:50:54  3 ms       2.033.205   100%     23%
      19   18/05/11 09:50  01:59:57  2 ms       2.523.813    99%     16%
      20   18/05/11 12:24  01:59:57  2 ms       2.637.730    90%     16%
      21   19/05/11 10:23  02:08:34  2 ms       2.615.946   100%     16%
      22   19/05/11 13:46  02:30:18  2 ms       3.002.986   100%     16%
      23   19/05/11 17:24  07:18:46  2 ms       3.589.031   100%     17%
      24   23/05/11 01:28  02:38:23  5 ms -n    2.229.040   100%     20%
      25   26/05/11 13:11  03:54:46  10 ms -m   2.957.286    70%     20%
      26   30/05/11 15:12  03:35:54  4 ms -m    2.771.051    63%     21%
      27   30/05/11 20:09  03:23:33  4 ms -m    2.214.414   100%     22%
      28   20/06/11 07:28  01:09:37  4 ms r7    2.359.295    37%     43%
      29   20/06/11 08:52  07:22:31  4 ms r7    5.787.889   100%     18%
      30   22/06/11 12:00  03:09:35  4 ms r7    3.821.829   100%     17%

  12. Conducted crawls: PlanetLab nodes as global vantage points

       #   Start           Duration  P          Discovered  Queried  Responsive
      China (planetlab-1.sjtu.edu.cn)
      31   19/05/11 13:33  07:31:44  10 ms      3.682.385   100%     15%
      32   22/05/11 13:08  08:06:00  10 ms      3.875.197   100%     15%
      33   20/06/11 04:27  03:09:08  10 ms r7   2.795.254    49%     28%
      Brazil (plab2.larc.usp.br)
      34   19/05/11 13:38  08:08:17  10 ms      4.059.290   100%     17%
      35   22/05/11 23:56  02:29:54  10 ms      2.259.123   100%     21%
      US: Denver (linux2.cs.du.edu)
      36   20/06/11 00:50  05:12:27  10 ms      3.895.919    58%     25%
      37   20/06/11 06:12  04:22:08  10 ms      3.818.514    54%     29%
      38   22/06/11 04:03  09:59:58  20 ms      4.034.685    73%     19%
      US: California (planet4.cs.ucsb.edu)
      39   19/05/11 14:11  00:26:25  10 ms      1.934.332    12%     61%
      US: Mass. (planetlab2.cs.umass.edu)
      40   20/05/11 19:15  05:16:11  5 ms       2.300.916   100%     16%
      Italy (planetlab2.di.unito.it)
      41   19/05/11 16:42  05:24:48  10 ms      2.779.760    90%     19%
      42   20/05/11 19:32  03:03:37  5 ms       2.034.142   100%     18%

  13. Part III: Evaluation

  14. Topology: network size

  15. Topology: ID distribution
      - IDs are commonly (aMule, eMule) initialized randomly once and persist afterwards.
      - Figure: 8-bit ID prefix histogram for nodes in crawl 20110622 M
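A prefix histogram of this kind can be computed by counting IDs per value of their leading 8 bits. The sketch below is an illustration with a hypothetical function name and a three-element sample list; it is not the evaluation code behind the figure.

```python
from collections import Counter

def prefix_histogram(ids, prefix_bits=8, id_bits=128):
    """Count IDs per value of their leading `prefix_bits` bits.

    Returns a list with 2**prefix_bits entries, indexed by prefix value.
    """
    counts = Counter(node_id >> (id_bits - prefix_bits) for node_id in ids)
    return [counts.get(prefix, 0) for prefix in range(1 << prefix_bits)]

sample_ids = [
    0x09262ce48db41838ce94c80cdaab3fab,
    0x025e747cea687ccab41c95fa62a27a5d,
    0x0,
]
hist = prefix_histogram(sample_ids)
print(len(hist), hist[0x00], hist[0x02], hist[0x09])  # 256 1 1 1
```

If n IDs were drawn uniformly at random, each of the 256 bins would be expected to hold about n/256 entries, so bins that rise well above that level point to IDs that were not chosen randomly.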

  16. Topology: ID distribution (filtered)
      - Notable IDs:
        - 0x0000...
        - 0x09262ce48db41838ce94c80cdaab3fab
        - 0x025e747cea687ccab41c95fa62a27a5d
        - 0x1000000 4 byte prefix
      - Figure: Filtered 8-bit ID prefix histogram for nodes in crawl 20110622 M
      - Except for a number of client classes, the assumption of randomly initialized IDs is valid.

  17. Topology: node in-degree
      - Measured as: number of unique remote nodes that have a node as a contact in their routing table
      - Figure: Histogram of observed in-degrees of all/responsive nodes in crawl 20110510 M2
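Given the contacts returned by each responding node, this in-degree measure can be computed by collecting the set of distinct referrers per node. The sketch assumes a hypothetical input format (a dict from responding node to its returned contacts) and ignores self-references, which is a choice made for this example.

```python
from collections import defaultdict

def in_degrees(responses):
    """responses: dict mapping a responding node to the contacts it returned.

    Returns a dict mapping each referenced node to the number of distinct
    nodes that listed it as a contact.
    """
    referrers = defaultdict(set)
    for source, contacts in responses.items():
        for contact in contacts:
            if contact != source:  # ignore self-references (assumption)
                referrers[contact].add(source)
    return {node: len(sources) for node, sources in referrers.items()}

responses = {
    "A": ["B", "C", "B"],  # the duplicate mention of B counts once
    "B": ["C"],
    "C": ["A"],
}
print(in_degrees(responses))  # {'B': 1, 'C': 2, 'A': 1}
```

Only contacts that actually appear in received responses are counted, so values obtained this way reflect what the crawl observed rather than the complete routing tables.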

  18. Geographic distribution
      - Determined by IP mapping.
      - Figure: Overview of geographic location of nodes in crawl 20110622 M

  19. Geographic distribution: continents
      - Figure: Distribution of observed key metrics by continent for crawls 20110510 M2 (left) and 20110622 M (right)

  20. Geographic distribution: widely known peers
      - Over all crawls: stable nodes (found in at least 10 crawls) with persistently large in-degree (average over 1000)
      - Figure: Geographic location of widely known and stable peers
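The selection criterion can be written as a small filter over per-crawl in-degree tables. The sketch assumes that the average is taken over the crawls in which a node was found. The input format and the function name are hypothetical.

```python
def widely_known(crawls, min_crawls=10, min_avg_in_degree=1000):
    """crawls: list of dicts, one per crawl, mapping node -> observed in-degree.

    Returns the nodes found in at least `min_crawls` crawls whose mean
    in-degree over those crawls exceeds `min_avg_in_degree`.
    """
    seen = {}
    for crawl in crawls:
        for node, degree in crawl.items():
            seen.setdefault(node, []).append(degree)
    return {
        node
        for node, degrees in seen.items()
        if len(degrees) >= min_crawls
        and sum(degrees) / len(degrees) > min_avg_in_degree
    }

# Invented sample data: "hub" is stable and well known, "leaf" is stable
# but barely known, "rare" is well known but seen only once.
sample_crawls = [{"hub": 1500, "leaf": 3} for _ in range(10)] + [{"rare": 5000}]
print(widely_known(sample_crawls))  # {'hub'}
```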

  21. Autonomous systems
      - Mapping data: 38143 distinct autonomous systems
      - Participating number of AS: 6658 for 20110510 M2, 7626 for 20110622 M

      [IP, TCP] clients    AS in 20110510 M2   AS in 20110622 M
      1                    2504                2601
      2 - 10               2583                3092
      11 - 100             1113                1384
      101 - 1000            330                 392
      1001 - 10000           96                 120
      10001 - 100000         29                  33
      100001 - 1000000        3                   4

      ASN / Name                                                        Share of crawl  Responsive  ID uniqueness
      AS4134 Chinanet                                                   27.2547%        15.61%      41.08%
      AS4837 CNCGROUP China169 Backbone                                 19.9828%        20.80%      41.48%
      AS3269 Telecom Italia S.p.a.                                       4.2992%        34.02%      82.11%
      AS4808 CNCGROUP IP network China169 Beijing Province Network       3.0233%        20.37%      67.42%
      AS4812 China Telecom (Group)                                       2.6414%        13.86%      74.38%
      AS3462 Data Communication Business Group                           2.1754%        40.30%      83.83%
      AS3352 Internet Access Network of TDE                              2.1413%        27.50%      82.02%
      AS3215 France Telecom - Orange                                     1.7685%        30.37%      88.64%
      AS9394 CHINA RAILWAY Internet(CRNET)                               1.7389%        27.12%      79.33%
      AS1267 Infostrada S.p.A                                            1.5942%        35.96%      77.10%

  22. Autonomous systems: characteristics
      - Characterization of an AS by the following features:
        - Responsiveness: percentage of nodes from the AS that responded
        - ID uniqueness: number of unique IDs in the AS divided by the number of nodes in the AS
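Both features are simple ratios per AS. The following sketch computes them from a list of observed nodes. The tuple layout `(asn, node_id, responded)` and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

def as_characteristics(nodes):
    """nodes: iterable of (asn, node_id, responded) tuples, one per observed node.

    Returns a dict mapping each ASN to (responsiveness, id_uniqueness),
    both expressed as fractions between 0 and 1.
    """
    per_as = defaultdict(lambda: {"total": 0, "responded": 0, "ids": set()})
    for asn, node_id, responded in nodes:
        entry = per_as[asn]
        entry["total"] += 1
        entry["responded"] += 1 if responded else 0
        entry["ids"].add(node_id)
    return {
        asn: (e["responded"] / e["total"], len(e["ids"]) / e["total"])
        for asn, e in per_as.items()
    }

# Invented sample: four nodes in one AS sharing two IDs, one node in another.
sample_nodes = [
    (4134, 0x01, True),
    (4134, 0x01, False),
    (4134, 0x02, False),
    (4134, 0x02, False),
    (3269, 0x07, True),
]
print(as_characteristics(sample_nodes))
# {4134: (0.25, 0.5), 3269: (1.0, 1.0)}
```

An ID uniqueness well below 1 means that many nodes in the AS were observed with the same ID.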
