Active Server Sibling Resolution Robert Beverly, Arthur Berger ∗ Naval Postgraduate School ∗ MIT/Akamai rbeverly@nps.edu, awberger@mit.edu January 7, 2013 NPS IPv6 Measurement Meeting 2013 Beverly & Berger (NPS) NPS-SIX 2013 1 / 24
Outline
  1. Sibling Resolution Intro
  2. Methodology
  3. Results
Sibling Resolution
  New problem we term "sibling resolution": given a candidate (IPv4, IPv6) address pair, determine whether these addresses are assigned to the same cluster, device, or interface.
  Sibling resolution may be either active or passive.
  Lots of prior work on passive sibling associations (e.g. web bugs, JavaScript).
  Prior work focuses on clients (adoption, performance).
  This work:
    Targeted, active test: on-demand for any given pair
    Infrastructure: finding server siblings
Motivation
  Why? IPv4 and IPv6 are expected to co-exist (for a long while?) → dual-stacked devices
  Track adoption (and dis-adoption)
  Track IPv6 evolution
  Security: inter-dependence of IPv6 on IPv4 (and vice versa), e.g. an attack on an IPv6 resource affecting IPv4 service
  Performance:
    Measurements of IPv4 vs. IPv6 performance
    Desire to isolate path vs. host performance
  Correlating geolocation, reputation, etc. with the IPv4 host counterpart
Targeted, Active Technique
  Intuition: IPv4 and IPv6 share a common transport-layer (TCP) stack
  Leverage prior work on physical device fingerprinting using TCP timestamp clock skew [Kohno 2005]
  TCP timestamp option: "TCP Extensions for High Performance" [RFC1323, May 1992]
  Universal support for TCP timestamps (modulo middleboxes, proxies); enabled by default
TCP Timestamp Clock Skew
  TS value: 4 bytes containing current clock
  Note: the RFC does not specify the value of TS (assume millisec for now)
  Note: TS clock ≠ system clock
  Note: TS clock frequently unaffected by system clock adjustments (e.g. NTP)
  Basic idea: probe over time. Fingerprint is clock skew (and remote clock resolution).
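The "probe over time" idea can be sketched as follows. This is a minimal illustration in the style of the clock-skew fingerprinting work cited earlier, not the authors' code: the function name `timestamp_offsets`, the `(receive_time, tsval)` input layout, and the 1000 Hz default (the "assume millisec" note above) are all assumptions.

```python
def timestamp_offsets(samples, hz=1000.0):
    """Turn (receive_time_sec, tsval) samples from one host into offset points.

    Returns a list of (x, y) where x is the elapsed measurement time in
    seconds and y is the observed offset in milliseconds, i.e. how far the
    remote TS clock has drifted from the local clock since the first sample.
    hz is the assumed tick rate of the remote TS clock (hypothetical default).
    """
    t0, ts0 = samples[0]
    points = []
    for t, ts in samples:
        x = t - t0                      # local elapsed time (sec)
        ticks = (ts - ts0) % 2**32      # TS value is 4 bytes, so allow one wrap
        w = ticks / hz                  # remote elapsed time (sec)
        points.append((x, (w - x) * 1000.0))
    return points


# Hypothetical samples: the remote clock gains 1 tick every 10 seconds.
samples = [(0.0, 1000), (10.0, 11001), (20.0, 21002)]
offsets = timestamp_offsets(samples)
```

The modulo only makes sense when the series is monotonic apart from a single wrap; randomized or load-balanced timestamps need to be screened out before this step.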
Some Details
  Must be able to connect to a remote TCP service on each host
  Periodically connect to the TCP service.
  Given a sequence of timestamp offsets, use linear programming to obtain a line that minimizes distance to the points, constrained to be under the data points.
  Obtain: y_4 = \alpha_4 x + \beta_4 and y_6 = \alpha_6 x + \beta_6
  Angle between lines then:
    \theta(\alpha_4, \alpha_6) = \tan^{-1} \left| \frac{\alpha_4 - \alpha_6}{1 + \alpha_4 \alpha_6} \right|
  Siblings if: \theta < \tau
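A minimal sketch of the fit and the angle test, under stated assumptions. The slide names linear programming; for a single line in two dimensions the same optimum can be read off the lower convex hull (the edge that spans the mean x value), which is what this sketch does instead of calling an LP solver. The function names are hypothetical, and reporting the angle in degrees is an assumption: the units and any axis scaling behind the θ values quoted in this deck are not stated, so the output is not expected to reproduce them.

```python
import math


def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def fit_lower_line(points):
    """Fit y = alpha*x + beta lying on or below every point while
    minimizing the summed vertical distance to the points."""
    lowest = {}
    for x, y in points:                 # only the lowest y per x can constrain
        lowest[x] = min(y, lowest.get(x, y))
    pts = sorted(lowest.items())
    if len(pts) < 2:
        raise ValueError("need points at two or more distinct x values")
    hull = []                           # lower convex hull, left to right
    for p in pts:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    # Minimizing summed distance == maximizing the line's height at mean x.
    mean_x = sum(x for x, _ in points) / len(points)
    mean_x = min(max(mean_x, pts[0][0]), pts[-1][0])
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= mean_x <= x2:
            alpha = (y2 - y1) / (x2 - x1)
            return alpha, y1 - alpha * x1


def slope_angle_deg(alpha4, alpha6):
    """Angle between two lines with slopes alpha4 and alpha6 (degrees, assumed)."""
    return math.degrees(
        math.atan2(abs(alpha4 - alpha6), abs(1.0 + alpha4 * alpha6)))


def are_siblings(alpha4, alpha6, tau):
    """Sibling decision: theta < tau."""
    return slope_angle_deg(alpha4, alpha6) < tau


# Hypothetical offset points, not measurement data.
alpha, beta = fit_lower_line([(0, 0), (1, 2), (2, 1), (3, 3), (4, 2)])
```

Using `atan2` on the absolute numerator and denominator gives the same value as the formula above while avoiding a division by zero when the two lines are perpendicular.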
Example
  Gather 4 timestamp series:
    www.caida.org (v4 and v6)
    www.ripe.net (v4 and v6)
Example
  Observe different skew slopes (one negative)
  Different timestamp granularity
  y = 0.029938x equates to a skew of ≈ 1.8 ms/minute, or ≈ 15 minutes per year.
  False siblings!
  [Figure: CAIDA IPv6 vs. RIPE IPv4. Observed offset (msec) vs. measurement time (sec). Legend: Host A (IPv6), Host B (IPv4). Fitted lines: α = 0.029938, β = -3.519 and α = -0.058276, β = -1.139.]
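The unit conversion behind the "1.8 ms/minute" and "15 minutes per year" figures can be checked with a few lines of arithmetic. The slope is taken from the slide (msec of offset per second of measurement time); the function name and the 365-day year are assumptions.

```python
def skew_summary(slope_ms_per_sec):
    """Convert a fitted slope (msec of offset per sec) to coarser units."""
    ms_per_minute = slope_ms_per_sec * 60
    seconds_per_year = 365 * 24 * 3600          # assumes a 365-day year
    minutes_per_year = slope_ms_per_sec * seconds_per_year / 1000 / 60
    return ms_per_minute, minutes_per_year


ms_per_minute, minutes_per_year = skew_summary(0.029938)
```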
Example
  CAIDA IPv6 vs. RIPE IPv4: different slopes (θ = 31.947)
  CAIDA IPv4 vs. CAIDA IPv6: identical slopes (θ = 0.0098)
  [Figure, left panel "False Siblings": observed offset (msec) vs. measurement time (sec). Legend: Host A (IPv6), Host B (IPv4). Fitted lines: α = 0.029938, β = -3.519 and α = -0.058276, β = -1.139.]
  [Figure, right panel "True Siblings": observed offset (msec) vs. measurement time (sec). Legend: Host A (IPv6), Host A (IPv4). Fitted lines: α = -0.058253, β = -1.178 and α = -0.058276, β = -1.139.]
Complications
  Not always so distinct of a difference!
  Slope angle difference: θ = 2.046
  [Figure: www.marca.com (#6 on alexa ipv6). Observed offset (msec) vs. measurement time (sec). Series: 193.110.128.199 and 2001:67c:2294:1000::f199.]
Complications
  Raw TCP timestamps
  Deterministically random and monotonic for a single connection
  Random across connections. Looks like noise to us.
  [Figure: www.apache.com. TCP timestamp vs. TCP packet sample. Series: apache.org V4 and apache.org V6.]
Complications
  What's going on here?
  [Figure: observed offset (msec) vs. measurement time (sec). Series: 203.5.76.12 and 2001:388:1:5062::cb05:4c0c.]
Complications
  Also detects load balancing among servers
  But how to deal with it?
  [Figure: observed offset (msec) vs. measurement time (sec). Series: 209.85.225.160 and 2001:4860:b007::a0.]
Machine Sibling Inference
  Methodology:
    Analyze Alexa top 100,000 websites
    Pull A and AAAA records
    1398 (≈ 1.4%) have IPv6 DNS
    Repeatedly fetch root HTML page via IPv4 and IPv6 via deterministic IP address
    Record all packets
Alexa 100K Targeted Machine-Sibling Inference
  Case                                           Count
  v4 and v6 non-monotonic (possible siblings)    109 (7.8%)
  v4 or v6 non-monotonic (non-siblings)          140 (10.0%)
  v4 and v6 no timestamps (possible siblings)    94 (6.7%)
  v4 or v6 no timestamps (non-siblings)          101 (7.2%)
  Our technique fails when timestamps are not monotonic across TCP flows (e.g. load balancer or BSD OS)
  Or when timestamps are not supported (e.g. middlebox)
  Note: can still disambiguate non-siblings
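The four cases in the table above can be expressed as a small pre-filter that runs before any skew fit. This is a sketch under assumptions: the function names, the labels, the input layout (one TS value per successive TCP flow, or an empty list when the host returns no timestamps), and the order in which the two checks are applied are all hypothetical, and 32-bit wraparound is ignored.

```python
def is_monotonic(tsvals):
    """True if TS values never decrease from one TCP flow to the next."""
    return all(a <= b for a, b in zip(tsvals, tsvals[1:]))


def classify_pair(v4_tsvals, v6_tsvals):
    """Label a candidate (IPv4, IPv6) pair before attempting skew inference."""
    v4_missing, v6_missing = not v4_tsvals, not v6_tsvals
    if v4_missing and v6_missing:
        return "no timestamps: possible siblings"
    if v4_missing or v6_missing:
        return "no timestamps: non-siblings"
    v4_mono, v6_mono = is_monotonic(v4_tsvals), is_monotonic(v6_tsvals)
    if not v4_mono and not v6_mono:
        return "non-monotonic: possible siblings"
    if v4_mono != v6_mono:
        return "non-monotonic: non-siblings"
    return "skew-based inference"


# Hypothetical TS values: the v6 side looks random across flows.
label = classify_pair([100, 200, 300], [900, 50, 700])
```

The asymmetric cases are what allow non-siblings to be ruled out even though the skew test itself cannot run: two addresses on one TCP stack would be expected to behave the same way.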
Alexa 100K Targeted Machine-Sibling Inference
  Case                                           Count
  v4 and v6 non-monotonic (possible siblings)    109 (7.8%)
  v4 or v6 non-monotonic (non-siblings)          140 (10.0%)
  v4 and v6 no timestamps (possible siblings)    94 (6.7%)
  v4 or v6 no timestamps (non-siblings)          101 (7.2%)
  Skew-based siblings                            839 (60.0%)
  Skew-based non-siblings                        115 (8.3%)
  Total                                          1398 (100%)
  25.5% (356) non-siblings
  43% of skew-based non-siblings are in different ASes
DNS Machine Siblings
  With respect to collecting DNS siblings, we would like to differentiate between machine and equipment siblings.
  Tie passive and active DNS collection with skew-based inference.
  For addresses with a DNS equivalence class:
    If θ < 1.0, add the IP to the matching machine sibling group
    Else (θ ≥ 1.0), create a new sibling group containing the single IP
    Repeat until all IPs of the equipment equivalence class are clustered
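The grouping rule above can be sketched as a greedy pass over the addresses of one equipment equivalence class. Several details are assumptions because the slide does not give them: each new address is compared against the first member of each existing group, the angle is computed in degrees, and the function names, addresses (documentation ranges), and slopes are hypothetical.

```python
import math


def angle_deg(a, b):
    """Angle between two lines with slopes a and b (degrees, assumed)."""
    return math.degrees(math.atan2(abs(a - b), abs(1.0 + a * b)))


def machine_sibling_groups(slopes, tau=1.0):
    """Greedily split one equipment equivalence class into machine groups.

    slopes maps each IP address (v4 or v6) to its fitted skew slope alpha.
    """
    groups = []
    for ip, alpha in slopes.items():
        for group in groups:
            representative = slopes[group[0]]   # assumption: first member
            if angle_deg(alpha, representative) < tau:
                group.append(ip)
                break
        else:
            groups.append([ip])                 # theta >= tau for every group
    return groups


# Hypothetical slopes for three addresses in one equivalence class.
slopes = {"192.0.2.1": -0.0583, "2001:db8::1": -0.0582, "192.0.2.2": 0.0299}
groups = machine_sibling_groups(slopes)
```

A greedy pass depends on input order and on the choice of representative; comparing against a group mean would be an equally plausible reading of the slide.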
DNS Machine Siblings
  [Figure: relationship between equipment siblings and machine siblings. Fraction of equipment equivalence classes vs. number of machine equivalence classes (1 to 6).]
Evaluating Inference Accuracy
  Seek to understand the accuracy of timestamp-based sibling inference
  Use ground-truth dual-stacked Akamai machines
  No load balancers or middleboxes
  Experiment: 100 known siblings, 100 known non-siblings (random v4/v6 pairs drawn from the Akamai population)
  Hardest scenario: single organization, similar boxes, same operating system, etc.
Evaluating Inference Accuracy
                        Predicted sibling    Predicted non-sibling
  Actual sibling        84 (TP)              13 (FN)
  Actual non-sibling    43 (FP)              54 (TN)
  Threshold τ = 0.002 gives best results!
  71% accuracy, 66% precision, 87% recall (F-score: 0.75)
Evaluating Inference Accuracy
                        Predicted sibling    Predicted non-sibling
  Actual sibling        97 (TP)              0 (FN)
  Actual non-sibling    94 (FP)              3 (TN)
  No false negatives with τ = 0.05 (but more FPs)
  52% accuracy, 51% precision, 100% recall (F-score: 0.67)
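The summary figures for both thresholds follow directly from the confusion-matrix counts, using the standard definitions of accuracy, precision, recall, and F-score. The function name is hypothetical; the counts are the ones reported on the two slides above.

```python
def summarize(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score


strict = summarize(tp=84, fn=13, fp=43, tn=54)   # tau = 0.002
loose = summarize(tp=97, fn=0, fp=94, tn=3)      # tau = 0.05
```

Rounded to two places, `strict` gives 0.71, 0.66, 0.87, 0.75 and `loose` gives 0.52, 0.51, 1.0, 0.67, matching the percentages on the slides.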
Current Work
  Quantify whether vantage point imparts any difference on results
  Refine inference algorithm to deal with load balancers
  Refine algorithm to produce better accuracy and eliminate false positives