Confidence Weighted Marginal Utility Analyses of Internet Mapping Techniques
Craig Prince and Danny Wyatt
December 6, 2004, CSE561 Networks

Internet Mapping: What and Why
- What is it?
  - Figure out what the internet looks like
  - Find routers and their interconnections
  - Discern a topology
- What is it good for?
  - Research
  - Simulations
  - Problem diagnosis
  - Routing in overlay networks
  - Spying on competing ISPs

Internet Mapping: How
- How do you map the internet?
  - Cannot directly observe it
  - Have to send traceroutes through it
- What you see depends on
  - Source
  - Target
  - Routing policies
  - ... and the topology itself!
- Can only control source and target...
- Errors can occur that do not reflect true topology...
- Things change over time...
- ...so just add as many as you can
- Is more really better?

Internet Mapping: Aims
- How do different mapping tools compare in their efficient use of data?
- Are some kinds of measurements more valuable than others?
- If we are uncertain of our observations, how would different methods address that uncertainty?

Mapping Tools: The Data: 3 Mapping Tools
- Skitter
  - 24 distributed sources
  - Each uses 1 or more of 4 lists of preselected targets
  - Continually loops through the lists
  - We use 3 days: 12/18-20, 2002
- Scriptroute
  - 70 distributed PlanetLab nodes
  - Each used the same list of 125,000 address prefixes
  - Attempted all traces once a day for three days (same as above)
- Rocketfuel
  - 837 distributed public traceroute servers
  - ≈ 60,000 targets
  - Heuristic pruning of source-target pairs to maximize coverage
  - Data collected over January, 2002

Methodology: Some Definitions
- A map is a directed graph $G = (V, E)$
- There is some impossible, true map $\hat{G} = (\hat{V}, \hat{E})$ with 100% perfect coverage
- A map is made by aggregating many measurements
  - Sources
  - Targets
- Coverage is how well one map approximates another
- Marginal coverage is how much each measurement contributes to its map
- We evaluate the marginal coverage of each of the three tools
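
As an illustration of marginal coverage, here is a minimal sketch that counts how many previously unseen edges each measurement adds to the aggregated map. It assumes each measurement is simply a collection of directed edges and that coverage is measured by edge count; the function name `marginal_edge_coverage` is hypothetical and not taken from the slides.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (from_router, to_router)


def marginal_edge_coverage(measurements: Iterable[Iterable[Edge]]) -> List[int]:
    """Return, for each measurement in order, the number of new edges it adds.

    Assumption: coverage is approximated by the count of distinct edges
    in the aggregated map; the slides do not fix a specific formula here.
    """
    seen: Set[Edge] = set()
    gains: List[int] = []
    for edges in measurements:
        new_edges = set(edges) - seen  # edges not yet in the map
        gains.append(len(new_edges))
        seen |= new_edges              # aggregate into the map
    return gains


if __name__ == "__main__":
    traces = [
        [("a", "b"), ("b", "c")],
        [("b", "c"), ("c", "d")],
        [("a", "b")],
    ]
    print(marginal_edge_coverage(traces))
```

The order of the measurements matters: the same trace contributes less if the edges it contains were already observed earlier.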

Methodology: More Definitions: Confidence Weighting
- Traceroutes are noisy sensors with probability of error $d$
- $n(e)$ is the number of observations of edge $e \in E$
- Probability that $e$ exists is $P(e) = 1 - d^{n(e)}$
- Edge coverage of $G$ is the mean probability of all edges: $\frac{\sum_{e \in E} P(e)}{|E|}$
- Node coverage is defined similarly
- For each analysis, also consider how it compares according to different values of $d$
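
The confidence weighting above can be sketched in a few lines. This is an illustrative sketch only: it assumes a trace is a list of directed edges and that every appearance of an edge counts as one observation; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def edge_probabilities(traces: Iterable[Iterable[Edge]], d: float) -> Dict[Edge, float]:
    """P(e) = 1 - d**n(e), where n(e) is the number of observations of e."""
    counts = Counter(edge for trace in traces for edge in trace)
    return {edge: 1.0 - d ** n for edge, n in counts.items()}


def edge_coverage(traces: Iterable[Iterable[Edge]], d: float) -> float:
    """Mean probability over all observed edges (0.0 for an empty map)."""
    probs = edge_probabilities(traces, d)
    if not probs:
        return 0.0
    return sum(probs.values()) / len(probs)


if __name__ == "__main__":
    traces = [[("a", "b"), ("b", "c")], [("a", "b")]]
    for d in (0.0, 0.3, 0.5, 0.9):
        print(d, edge_coverage(traces, d))
```

With $d = 0$ every observed edge has probability 1, so the weighting reduces to plain coverage; as $d$ grows, edges seen only once contribute less and redundancy matters more.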

[Figure: Node Coverage per Source. Three panels (Skitter, Scriptroute, Rocketfuel); x-axis: Source; y-axis: Node Coverage; one curve per error probability (0.0, 0.3, 0.5, 0.9) plus Total Probed.]

Analyses: Entropy
- $H(A) = -\sum_{a \in A} P(a) \log P(a)$
- Average number of bits needed to encode each event $a$
- We take the entropy of the mean node and edge distributions
- Should always be changing
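
A minimal sketch of the entropy formula follows. It assumes the input weights (for example, per-edge probabilities) are first normalized to sum to 1 so that they form a distribution; that normalization step and the function name `entropy` are assumptions for illustration, not details given in the slides.

```python
import math
from typing import Iterable


def entropy(weights: Iterable[float], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p * log(p)) of the normalized weights.

    Zero weights contribute nothing (the limit of p*log(p) as p -> 0 is 0).
    """
    values = [w for w in weights if w > 0]
    total = sum(values)
    if total <= 0:
        return 0.0
    h = 0.0
    for w in values:
        p = w / total  # normalize so the weights sum to 1 (assumption)
        h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    print(entropy([0.5, 0.5]))      # two equally likely events
    print(entropy([1.0, 0.0]))      # a certain event
    print(entropy([1, 1, 1, 1]))    # four equally likely events
```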

[Figure: Edge Entropy per Source. Three panels (Skitter, Scriptroute, Rocketfuel); x-axis: Source; y-axis: Edge Entropy; one curve per error probability (0.0, 0.3, 0.5, 0.9) plus Opt.]

Analyses: K-L Divergence
- Kullback-Leibler Divergence: $KL(A \| B) = \sum_{a} p_A(a) \log \frac{p_A(a)}{p_B(a)}$
- Also known as relative entropy
- Average extra bits per event for encoding according to the wrong distribution
- We measure divergence between coverage up to a measurement and final coverage
- Marginal utility is the decrease in K-L divergence between measurements
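
The marginal-utility definition above might be computed roughly as follows. This sketch makes several assumptions that the slides do not state: edge probabilities $1 - d^{n(e)}$ are normalized into a distribution, the divergence is taken as KL(partial || final) (which stays finite because every edge in the partial map also appears in the final one), and $0 \le d < 1$. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def edge_distribution(traces: Sequence[Sequence[Edge]], d: float) -> Dict[Edge, float]:
    """Normalize confidence weights 1 - d**n(e) into a distribution."""
    counts = Counter(edge for trace in traces for edge in trace)
    weights = {edge: 1.0 - d ** n for edge, n in counts.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("need at least one observed edge and 0 <= d < 1")
    return {edge: w / total for edge, w in weights.items()}


def kl_divergence(p: Dict[Edge, float], q: Dict[Edge, float]) -> float:
    """KL(p || q) in bits; assumes q[a] > 0 wherever p[a] > 0."""
    return sum(pa * math.log2(pa / q[a]) for a, pa in p.items() if pa > 0)


def marginal_utilities(traces: Sequence[Sequence[Edge]], d: float) -> List[float]:
    """Decrease in divergence from the final map after each added trace."""
    final = edge_distribution(traces, d)
    divergences = [
        kl_divergence(edge_distribution(traces[:i], d), final)
        for i in range(1, len(traces) + 1)
    ]
    return [prev - cur for prev, cur in zip(divergences, divergences[1:])]


if __name__ == "__main__":
    traces = [[("a", "b")], [("b", "c")], [("a", "b"), ("c", "d")]]
    print(marginal_utilities(traces, 0.3))
```

The returned list has one entry per measurement after the first, since each value is the drop in divergence relative to the previous prefix.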

[Figure: K-L Divergence per Source. Three panels (Skitter, Scriptroute, Rocketfuel); x-axis: Source; y-axis: Edge KL Divergence; one curve per error probability (0.0, 0.3, 0.5, 0.9) plus Opt.]

Analyses: K-L Divergence

[Figure: K-L Divergence per Target. Two panels (Scriptroute, Rocketfuel); x-axis: Target; y-axis: Edge KL Divergence; one curve per error probability (0.0, 0.3, 0.5, 0.9) plus Opt.]

Conclusions
- Adding targets is more useful than adding sources
  - Half of all coverage comes from the first few sources
- Rocketfuel does increase its per-measurement return
  - More targets always yield more information
  - More sources have diminishing returns, but higher than other tools
- There is a pronounced trade-off in confidence
  - Rocketfuel has more divergence between different error probabilities
  - More redundant tools are less affected

Conclusions
- These metrics can be used as heuristics for quicker mapping
- Reordering the second two days of Skitter data according to the first day:

[Figure: Skitter Node Coverage per Source. Two panels (Day 1; Day 2 and 3, reordered); x-axis: Source; y-axis: Node Coverage; one curve per error probability (0.0, 0.3, 0.5, 0.9) plus Total Probed.]
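
One way such a reordering heuristic could look is sketched below. The slides do not say how the ordering was derived, so this assumes a greedy choice: on day 1, repeatedly pick the source that adds the most unseen nodes, then replay later days in that order. The names `greedy_source_order` and `cumulative_coverage` are hypothetical, and the data is a toy example.

```python
from typing import Dict, List, Set


def greedy_source_order(nodes_by_source: Dict[str, Set[str]]) -> List[str]:
    """Order sources by greedy marginal node coverage (ties broken by name)."""
    remaining = {source: set(nodes) for source, nodes in nodes_by_source.items()}
    seen: Set[str] = set()
    order: List[str] = []
    while remaining:
        # Pick the source contributing the most nodes not yet seen.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - seen))
        order.append(best)
        seen |= remaining.pop(best)
    return order


def cumulative_coverage(nodes_by_source: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Distinct nodes covered after adding each source in the given order."""
    seen: Set[str] = set()
    totals: List[int] = []
    for source in order:
        seen |= set(nodes_by_source.get(source, ()))
        totals.append(len(seen))
    return totals


if __name__ == "__main__":
    day1 = {"s1": {"a", "b"}, "s2": {"a", "b", "c", "d"}, "s3": {"d", "e"}}
    later = {"s1": {"a"}, "s2": {"a", "b", "c"}, "s3": {"e", "f"}}
    order = greedy_source_order(day1)
    print(order, cumulative_coverage(later, order))
```

If the ordering learned on day 1 transfers, the cumulative coverage curve for later days should rise faster at the start than it would under an arbitrary source order.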

Questions?