Understanding Block-level Address Usage in the Visible Internet Xue Cai and John Heidemann USC/Information Sciences Institute Aug. 31, 2010, SIGCOMM’10 xuecai@isi.edu 1
The Discovery of Halley's Comet
• 2 historical records (years 1531, 1607) + 1 observation (year 1682) = 3 simple observations of the comet
• 1 simple characteristic inferred by an astronomer, Edmond Halley: “It’s the same object, which returns to earth every 76 years.”
SIMPLE observations and an inferred SIMPLE conclusion can have TREMENDOUS value.
Our Q: what can simple observations about the Internet say?
• Observations: pings sent to the Internet and the responses received
• Address Utilization? Dynamic Addressing? ……
Key Contributions
• Methodology: active probing, pattern analysis, clustering, classification
• Application: network management, resource allocation, Internet trend study
• Validation: USC’s network, the general Internet, consistency across time
Key Contributions (Methodology / Application / Validation)
• Spatial Correlation?
  – Methodology: group addresses into blocks by usage
  – Application: More frequent probing? Block sizes? Block-level usage?
  – Validation: USC’s network, general Internet, consistency
• Address Utilization?
  – Methodology: find blocks that are responsive less than 10% of the time
  – Application: Resource reallocation? Efficient management?
  – Validation: USC’s network, general Internet, consistency
• Dynamic Addressing?
  – Methodology: find blocks switching state (up/down) frequently
  – Application: Botnet detection? Spam filtering? Click fraud?
  – Validation: USC’s network, general Internet, consistency
• Low-bitrate Identification?
  – Methodology: utilize the standard deviation of RTTs
  – Application: Auto content serving? Network management?
  – Validation: USC’s network, general Internet
Key Contributions (Methodology / Application / Validation)
• Spatial Correlation?
  – Methodology: group addresses into blocks by usage
  – Application: More frequent probing? Block sizes? Block-level usage?
  – Validation: USC’s network, general Internet, consistency
• Address Utilization?
  – Methodology, Application, Validation: see paper
• Dynamic Addressing?
  – Methodology, Application, Validation: see paper
• Low-bitrate Identification?
  – Methodology: utilize the standard deviation of RTTs
  – Application: Auto content serving? Network management?
  – Validation: USC’s network, general Internet
Related Work
• J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and Survey of the Visible Internet. In Proceedings of the ACM Internet Measurement Conference (IMC), p. 169-182. Vouliagmeni, Greece, October 2008.
• What’s the same?
  – Collection methodology (and datasets)
  – Error bounds on ping census accuracy: undercounts by about 40%
  – Preliminary metrics
• What’s new? Deeper understanding; new interpretation
  – new metrics: block-level analysis, not just addresses; RTT, not just responsiveness
  – new algorithms: block identification; low-bitrate identification
  – new conclusions: evaluation of block utilization; trends of address utilization; trends of dynamic addressing
Background: What space?
• IPv4 address space
• address block p/n: the addresses with a common n-bit prefix p
• a.b.c.d and a.b.c.(d+1) are adjacent addresses
• Layout: a Hilbert curve keeps adjacent addresses physically near each other
[Figure: a /24 block (p/24) with 256 addresses, laid out along a Hilbert curve]
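A minimal sketch of the layout idea, assuming the standard index-to-coordinate Hilbert mapping and a 16x16 grid for one /24 block. The function name `hilbert_d2xy` and the prefix 192.0.2.0/24 (a documentation range) are illustrative choices, not taken from the talk.

```python
import ipaddress

def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indices stay adjacent.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# A /24 block holds 256 addresses, which fill a 16x16 grid (order 4).
block = ipaddress.ip_network("192.0.2.0/24")  # illustrative prefix (assumption)
layout = {str(block[i]): hilbert_d2xy(4, i) for i in range(block.num_addresses)}
```

With this mapping, a.b.c.d and a.b.c.(d+1) always land in neighbouring grid cells, which is what makes contiguous usage blocks show up as contiguous regions in a 2D plot.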
Hypothesis: Spatial Correlation
• What is Spatial Correlation?
  – adjacent addresses are likely to be used in the same way
  – spatial correlation of address usage forms usage blocks
• Usage blocks
  – are NOT allocated blocks, but are correlated with them
    • Internet addresses are allocated in blocks (ICANN to regional registries to ISPs to you)
    • addresses in one block are usually assigned to similar users
  – are what we want to observe, if they exist
    • observable blocks approximate usage blocks
Spatial Correlation: Application
• Why care?
  – To efficiently select representative addresses for more detailed study
    • Addresses in one block are used in the same way
    • So only a few representatives need to be probed in the future
Spatial Correlation: Methodology
• Input: data for individual addresses
• Steps: Data Collection, then Representation, then Block Identification
• Output: addresses sharing similar usage, grouped into observable blocks
Spatial Correlation: Data Collection
• How? Ping each address in random /24 blocks every 11 minutes for a week and collect the probe responses. 1% of the allocated IPv4 address space is probed.
• Why? Systematic pings reveal more information.
• Validity of ping: the IMC’08 paper established error bounds: not perfect, but often pretty good; ~40% undercount
[Figure: probe responses (positive, negative, non-response) per address over time]
Spatial Correlation: Data Collection
[Figure: responses (positive, negative, non-response) over time, shown for 1 address, for 1 /24 block (256 consecutive addresses), and for 24,000 random /24s]
Spatial Correlation: Representation
• Why? One survey of 24,000 random /24s yields > 5 billion ping responses; a more meaningful representation of address usage is needed.
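A back-of-the-envelope check of the survey size, using the figures quoted on the slides (an 11-minute probing interval, a one-week survey, 24,000 random /24 blocks). The constant names are illustrative, and the count is of probes sent, which is only an upper bound on recorded responses.

```python
# Figures taken from the slides; names are illustrative.
PROBE_INTERVAL_MIN = 11
SURVEY_DAYS = 7
BLOCKS = 24_000
ADDRS_PER_BLOCK = 256  # one /24

# Whole probing rounds that fit into one week.
probes_per_address = SURVEY_DAYS * 24 * 60 // PROBE_INTERVAL_MIN
addresses = BLOCKS * ADDRS_PER_BLOCK
total_probes = probes_per_address * addresses

print(probes_per_address, addresses, total_probes)
```

The result is consistent with the "> 5 billion" order of magnitude stated for one survey.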
Spatial Correlation: Representation
• Given: a series of ping responses over time (positive, negative, non-response); each response represents the period up to the next probe
• Result: a series of up durations
Spatial Correlation: Representation
How? 3 metrics to capture address usage.
Example: a probing duration of length 10 containing three up durations of lengths 2, 2, and 1.
• Availability (A) := normalized sum of up durations
  – Example: A = (2+2+1) / 10 = 0.5
  – Intuition: utilization efficiency
• Volatility (V) := normalized number of up durations
  – Example: V = 3 / (10/2) = 0.6
  – Intuition: high V implies dynamics
• Median-Up (U) := median up duration
  – Example: U = median(2, 2, 1) = 2
  – Intuition: typical duration
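The three metrics can be sketched as follows. This is an illustration under stated assumptions: each probe is reduced to up (positive response) or down (anything else), and the normalizations follow the slide's worked example (divide by the probing duration for A, by half the probing duration for V). The function names are hypothetical.

```python
from statistics import median

def up_durations(responses):
    """Lengths of maximal runs of up (truthy) probes."""
    runs, current = [], 0
    for up in responses:
        if up:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def usage_metrics(responses):
    """Return (availability, volatility, median_up) for one address."""
    n = len(responses)
    runs = up_durations(responses)
    availability = sum(runs) / n       # normalized sum of up durations
    volatility = len(runs) / (n / 2)   # normalized number of up durations (assumed normalization)
    median_up = median(runs) if runs else 0
    return availability, volatility, median_up

# Ten probes with up durations of lengths 2, 2 and 1, as in the slide example.
example = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(usage_metrics(example))
```

Treating negative responses and non-responses alike as "down" is a simplification made here; the original analysis may distinguish them.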
Spatial Correlation: Block Identification
[Figure: the 1D view (addresses vs. time; positive, negative, and non-response) is mapped to a 2D Hilbert-curve view, with each address colored by Availability (A) and Volatility (V) on a low-to-high scale; white: non-response]
Spatial Correlation: Block Identification
• How? Idea: examine each block size; if the block is homogeneous, stop; else split and recurse
  – not homogeneous => split
  – homogeneous => stop
[Figure: intra-block variance of candidate blocks at each split]
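The split-and-recurse idea can be sketched as below. The homogeneity test here is a plain variance threshold on one per-address metric (for example availability); that test, the `threshold` value, and the name `identify_blocks` are assumptions for illustration, and the homogeneity criterion in the paper may differ.

```python
from statistics import pvariance

def identify_blocks(values, start=0, min_size=1, threshold=0.01):
    """Recursively split a power-of-two run of per-address values.

    Returns a list of (first_offset, size) blocks judged homogeneous.
    """
    n = len(values)
    # Stop when the block cannot be split further or its variance is low.
    if n <= min_size or pvariance(values) <= threshold:
        return [(start, n)]
    half = n // 2
    return (identify_blocks(values[:half], start, min_size, threshold)
            + identify_blocks(values[half:], start + half, min_size, threshold))

# Hypothetical /24: a /25 of busy hosts, then a /26 that is mostly idle,
# then a /26 that is moderately busy.
availability = [0.9] * 128 + [0.1] * 64 + [0.8] * 64
print(identify_blocks(availability))
```

A block of size `n` inside a /24 corresponds to a prefix length of 24 + log2(256 / n), so the three blocks in this example are a /25 and two /26s.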
Spatial Correlation: Validation
• Validation is hard
  – Where to find ground truth?
    • decentralized management
    • usage block ground truth?
• Use three complementary ways:
  – Compare to USC’s network (operator-provided truth)
  – Compare to the general Internet (hostname-inferred truth)
  – Evaluate different samples and dates
    • is 1% of the Internet enough? yes!
    • trends change somewhat over time
    • details: paper section 5.3
Spatial Correlation: USC’s Network
• Why?
  – quite solid truth (operator provided)
  – knowledge of both allocated blocks and usage blocks
• How?
  – compare observable blocks (result to validate) with usage blocks (ground truth)
Spatial Correlation: USC’s Network
[Figure: observable blocks compared with ground-truth usage blocks. Labels: false-neg. (blocks we missed to identify), false-pos. (blocks we wrongly identified), non-use; incomplete (23%), error (20%)]
• what is found is correct
• very accurate when it reaches a conclusion
Spatial Correlation: General Internet
• Why?
  – unbiased truth (randomly selected)
• How?
  – infer usage blocks from hostnames
    • dhcp-host-xxx.example.net
  – compare observable blocks (result to validate) with usage blocks (ground truth)
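One way hostname-inferred ground truth could work, as a rough sketch: replace the digits in each reverse-DNS name with a placeholder and group consecutive addresses whose patterns match. The grouping rule, the function names, and the sample hostnames are all assumptions for illustration; the talk does not spell out the actual inference procedure.

```python
import re
from itertools import groupby

def hostname_pattern(name):
    """Collapse digit runs so dhcp-host-12 and dhcp-host-13 share a pattern."""
    if name is None:
        return None
    return re.sub(r"\d+", "N", name.lower())

def infer_usage_blocks(hostnames):
    """hostnames: reverse names in address order (None where no name exists).

    Returns (first_offset, size, pattern) for each run of matching patterns.
    """
    blocks, start = [], 0
    for pattern, group in groupby(hostnames, key=hostname_pattern):
        size = len(list(group))
        blocks.append((start, size, pattern))
        start += size
    return blocks

# Hypothetical names for eight consecutive addresses.
names = ["dhcp-host-%d.example.net" % i for i in range(4)]
names += ["www.example.net", None, None, None]
print(infer_usage_blocks(names))
```

The inferred runs can then be compared with the observable blocks in the same way as the operator-provided blocks.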
Spatial Correlation: General Internet
• mostly correct (and more so than for USC)
• ground truth is hard to infer
• methodology is more complete when evaluated with an unbiased sample