CS 557 Routing Measurements End to End Routing Behavior in the Internet Vern Paxson, 1996 Internet Routing Instability Labovitz, Malan, Jahanian, 1997 Spring 2013
End to End Routing Behavior • Objective: – Understand the actual behavior of Internet routing • Approach: – Use traceroute to measure routes from multiple sites. • Contributions: – Analysis of how routing is really behaving in 1994-1996. – Example of how to conduct large-scale measurements – Importance of observing real data
Review and Expected Behavior (1/2) • How does traceroute work? – Start with TTL 1; get an ICMP reply from the router 1 hop away. – Next use TTL 2; get an ICMP message from the router 2 hops away. – Continue until the destination is reached.
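The TTL steps above can be sketched as a small simulation. This is only an illustration, assuming a fixed list of hops (the names `simulate_traceroute`, `path`, and the router labels are hypothetical); it sends no real packets, since real probes need raw sockets and usually elevated privileges.

```python
def simulate_traceroute(path, max_hops=30):
    """Simulate which hop answers each probe for TTL = 1, 2, ...

    `path` is the ordered list of hops toward the destination; the last
    entry is the destination itself. Returns a list of (ttl, responder).
    """
    replies = []
    if not path:
        return replies
    last_index = len(path) - 1
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        responder = None
        for index, hop in enumerate(path):
            remaining -= 1  # every hop decrements the TTL by one
            if remaining == 0 or index == last_index:
                # TTL expired here (ICMP reply), or the probe reached the destination
                responder = hop
                break
        replies.append((ttl, responder))
        if responder == path[last_index] and index == last_index:
            break  # destination answered, so stop probing
    return replies


if __name__ == "__main__":
    for ttl, hop in simulate_traceroute(["router1", "router2", "destination"]):
        print(ttl, hop)
```

Each probe is answered by the hop where its TTL runs out, so the list of responders reconstructs the forward path one hop at a time.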
Review and Expected Behavior (2/2)
traceroute to 129.82.100.64 (129.82.100.64), 30 hops max, 40 byte packets
1 FastEthernet6-0.civ-service1.Canberra.telstra.net (203.50.1.65) 0.236 ms 0.176 ms 0.243 ms
2 GigabitEthernet3-0.civ-core2.Canberra.telstra.net (203.50.10.129) 0.762 ms 0.814 ms 0.776 ms
3 GigabitEthernet2-2.dkn-core1.Canberra.telstra.net (203.50.6.126) 1.052 ms 1.008 ms 0.942 ms
4 Pos4-1.ken-core4.Sydney.telstra.net (203.50.6.69) 4.983 ms 4.953 ms 5.036 ms
5 10GigabitEthernet3-0.pad-core4.Sydney.telstra.net (203.50.6.86) 5.31 ms 5.281 ms 5.2 ms
6 GigabitEthernet2-2.syd-core02.Sydney.net.reach.com (203.50.13.42) 26.281 ms 5.318 ms 5.322 ms
7 i-4-0.syd-core01.net.reach.com (202.84.221.89) 5.475 ms 5.456 ms 5.528 ms
8 i-12-1.wil-core02.net.reach.com (202.84.144.65) 162.252 ms 162.236 ms 162.178 ms
9 i-6-2.wil04.net.reach.com (202.84.251.186) 162.542 ms 162.561 ms 162.509 ms
10 lax-brdr-01.inet.qwest.net (205.171.4.53) 162.866 ms 162.401 ms 162.305 ms
11 lax-core-02.inet.qwest.net (205.171.19.41) 162.745 ms 162.563 ms 162.469 ms
12 bur-core-01.inet.qwest.net (205.171.8.42) 168.971 ms 168.827 ms 169.185 ms
13 dia-core-03.inet.qwest.net (205.171.8.118) 204.15 ms 204.166 ms 203.956 ms
14 dvr-edge-09.inet.qwest.net (205.171.10.70) 204.313 ms 204.007 ms 204.078 ms
15 65.121.56.106 (65.121.56.106) 204.027 ms 203.851 ms 203.971 ms
16 peer01.ari-co.icg.net (170.147.161.87) 204.062 ms 204.299 ms 204.243 ms
17 165.236.232.190 (165.236.232.190) 205.499 ms 205.336 ms 205.43 ms
18 csu-frgp-gw.colostate.edu (129.82.10.5) 206.788 ms 206.451 ms 207.029 ms
19 129.82.2.10 (129.82.2.10) 207.259 ms 206.967 ms 207.849 ms
20 yuma.acns.colostate.edu (129.82.100.64) 206.985 ms 206.941 ms 207.193 ms
Path Vector Routing and Loops • Nodes A, B, C, D and destination X (figure not reproduced) • Initial state: – At D: Path(X)=D,X; Next(D,X)=X – At C: Path(X)=C,D,X; Next(C,X)=D – At B: Path(X)=B,C,D,X; Next(B,X)=C – At A: Path(X)=A,B,C,D,X; Next(A,X)=B • 1. Link(D,X) fails => Path(D,X)=none • 2. Update Path(A,X)=A,B,C,D,X arrives at D => Path(D,X)=? • Claim: D will ignore this path … why?
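One way to reason about the claim is the standard path-vector loop check: a node rejects any advertised path that already contains the node itself. The sketch below is a minimal illustration of that check; the function name `process_update` and its arguments are hypothetical and not taken from any router implementation.

```python
def process_update(node, current_path, advertised_path):
    """Apply the path-vector loop check to one incoming advertisement.

    Returns the path the node should hold after processing the update.
    """
    if node in advertised_path:
        # The node already appears in the path, so accepting it would form a loop.
        return current_path
    # Otherwise the node prepends itself and adopts the advertised path.
    return [node] + list(advertised_path)


if __name__ == "__main__":
    # After Link(D,X) fails, D holds no path (None) and hears A's path.
    print(process_update("D", None, ["A", "B", "C", "D", "X"]))
```

In the slide's scenario, D finds itself in the path A,B,C,D,X and keeps Path(D,X)=none rather than routing back through its own failed link.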
Internet Routing Loops
Prevalence and Persistence • Prevalence: how likely are you to encounter a given route? • Persistence: how long will the route last before it changes? • Very different metrics – A route can be prevalent, but not persistent – Why is persistence important? – Why is prevalence important?
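The difference between the two metrics can be illustrated with a simplified estimate over timestamped route observations. This is a rough sketch, not the methodology of the paper; the function names and the sample data are hypothetical, and observations are assumed to be sorted by time.

```python
from collections import Counter


def prevalence(observations):
    """Fraction of observations in which each route was seen."""
    counts = Counter(route for _, route in observations)
    total = len(observations)
    return {route: n / total for route, n in counts.items()}


def run_durations(observations):
    """Duration of each unbroken run of the same route.

    Each run is measured from its first to its last observation, so a route
    seen only once in a row has duration 0.
    """
    runs = []
    start_time, current = observations[0]
    last_time = start_time
    for time, route in observations[1:]:
        if route != current:
            runs.append((current, last_time - start_time))
            start_time, current = time, route
        last_time = time
    runs.append((current, last_time - start_time))
    return runs


if __name__ == "__main__":
    # R1 is seen in 3 of 5 samples, yet never survives two samples in a row.
    flapping = [(0, "R1"), (1, "R2"), (2, "R1"), (3, "R2"), (4, "R1")]
    print(prevalence(flapping))
    print(run_durations(flapping))
```

In the flapping example R1 is the most prevalent route, but every run has zero duration, which is the "prevalent, but not persistent" case.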
Internet Route Persistence
[Pax96] Conclusions • Important to measure the actual system behavior. • Some conclusions as of 1996: – Routing pathologies are emerging as a challenge for the growing Internet. – Internet paths are heavily dominated by a single prevalent route. – But there is wide variation in persistence. – About 2/3 of paths persisted for days or weeks. • Next we consider how well BGP responds to changes in policy and topology …
Internet Routing Instability • Objective: – Analyze BGP updates and identify BGP routing behaviors and pathologies • Approach: – Log BGP updates collected at a peering point. • Contributions: – Identification of routing pathologies. – Identification of routing convergence problems
Exchange Points • Public Exchange Points – Network and physical location for connecting BGP routers from different Autonomous Systems. – Not all routers peer with each other (policies) • [Figure: exchange point diagram with labels UUNet, Verio, Sprint, AT&T, Regional AS1 through Regional AS5, and Monitoring Point]
Multi-Homing and BGP (1/2) • AS1: 10.0.0.0/9 • AS2: 10.128.0.0/10 • AS3: 10.192.0.0/10 • Shown at AS4: 10.0.0.0/8, Path=AS4,{AS1,AS2,AS3} • Shown at AS5: 10.192.0.0/10, Path=AS4,AS5 • All traffic to 10.192.0.0/10 (AS3) will follow link AS5-AS3 (/10 is more specific than /8)
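The "/10 more specific than /8" rule is longest-prefix matching. A minimal sketch using Python's standard `ipaddress` module follows; the forwarding table and the "via AS4" / "via AS5" labels are hypothetical stand-ins for the two announcements on the slide.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return (prefix, next_hop) for the most specific matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        if address in network:
            # Keep the match with the longest prefix length seen so far.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    if best is None:
        return None
    return str(best[0]), best[1]


if __name__ == "__main__":
    table = {
        "10.0.0.0/8": "via AS4",     # aggregate announcement
        "10.192.0.0/10": "via AS5",  # more specific announcement
    }
    print(longest_prefix_match(table, "10.200.1.1"))
    print(longest_prefix_match(table, "10.1.1.1"))
```

An address inside 10.192.0.0/10 matches both entries, and the /10 wins; an address elsewhere in 10.0.0.0/8 falls back to the aggregate.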
Multi-Homing and BGP (2/2) • AS1: 10.0.0.0/9 • AS2: 10.128.0.0/10 • AS3: 10.192.0.0/10 • Shown at AS4: 10.0.0.0/8, Path=AS4,{AS1,AS2,AS3}; 10.192.0.0/10, Path=AS4,AS3 • Shown at AS5: 10.192.0.0/10, Path=AS4,AS5 • Traffic to 10.192.0.0/10 (AS3) is split between link AS5-AS3 and link AS4-AS3
Types of Routing Events • Forwarding Instability (change of path) – WADiff = withdraw, then announce a different route – AADiff = implicit withdrawal by replacing the route with a different one • Possible Pathologies – WADup = withdraw, then re-announce the same route – AADup = implicit withdrawal by replacing the route with a new one that has the same AS path and the same next hop • Pathologies – WWDup = withdraw an already withdrawn route
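The categories above can be sketched as a classifier over an ordered stream of updates from one peer. This is an illustrative model only: the event tuple format and the extra labels "New" and "Withdraw" are assumptions introduced here, and a withdrawal for a prefix that was never announced is counted as WWDup.

```python
def classify_updates(events):
    """Label each update in order.

    Each event is ("A", prefix, as_path, next_hop) for an announcement
    or ("W", prefix) for a withdrawal.
    """
    active = {}  # prefix -> (as_path, next_hop) currently announced
    last = {}    # prefix -> route that was in place before the latest withdrawal
    labels = []
    for event in events:
        kind, prefix = event[0], event[1]
        if kind == "W":
            if prefix in active:
                last[prefix] = active.pop(prefix)
                labels.append("Withdraw")  # first withdrawal of a live route
            else:
                labels.append("WWDup")     # nothing is announced, so this is a duplicate
        else:
            route = (tuple(event[2]), event[3])
            if prefix in active:
                # Implicit withdrawal: the new announcement replaces the old one.
                labels.append("AADup" if route == active[prefix] else "AADiff")
            elif prefix in last:
                # Explicit withdrawal followed by a new announcement.
                labels.append("WADup" if route == last[prefix] else "WADiff")
            else:
                labels.append("New")
            active[prefix] = route
    return labels


if __name__ == "__main__":
    p = "10.192.0.0/10"
    stream = [("A", p, [4, 3], "hop1"), ("A", p, [4, 3], "hop1"), ("W", p), ("W", p)]
    print(classify_updates(stream))
```

Comparing only the AS path and next hop mirrors the slide's definition of a duplicate; other attribute changes are ignored in this sketch.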
Gross Observations • Internet Stats in 1997 – 45,000 prefixes (BGP destinations) – 1300 Autonomous Systems – 1500 AS paths observed in updates • BGP Updates – Average of 125 updates per prefix per day – Over 30 million updates on one day – Surprising, since BGP should send an update only when the path changes – Dominated by WWDup, AADup, and WADup
MAE-East Gross Observations • [Figure not reproduced; duplicate withdrawals not shown]
Duplicate Withdrawals • Dominate the BGP update traffic – 500,000 to 6 million duplicate withdrawals per day at MAE-East • ISP I Example – 259 prefixes announced – 2.4 million withdrawals – Withdrawals for 14,112 prefixes – Withdrawals for nearly 14K prefixes never announced in the first place. • Partial Explanation: Stateless BGP – Don't keep track of what you advertised – Thus propagate any withdrawal to all neighbors • Even if you never announced the route in the first place
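The stateless explanation can be made concrete with a toy comparison. This is a hypothetical model of a single router and a single neighbor, not code from any BGP implementation: `advertised` is the set of prefixes the router previously announced to that neighbor.

```python
def forward_withdrawals(withdrawals, advertised, stateful):
    """Return the withdrawals a router sends to one neighbor.

    A stateful router remembers what it advertised and stays silent about
    anything else; a stateless router passes every withdrawal along.
    """
    remaining = set(advertised)  # copy so the caller's set is not modified
    sent = []
    for prefix in withdrawals:
        if not stateful:
            sent.append(prefix)        # no memory, so forward everything
        elif prefix in remaining:
            remaining.discard(prefix)  # withdraw once, then forget the prefix
            sent.append(prefix)
    return sent


if __name__ == "__main__":
    incoming = ["10.1.0.0/16", "10.2.0.0/16", "10.1.0.0/16"]
    print(forward_withdrawals(incoming, {"10.1.0.0/16"}, stateful=True))
    print(forward_withdrawals(incoming, {"10.1.0.0/16"}, stateful=False))
```

The stateless router forwards all three withdrawals, including one for a prefix it never announced and one repeat, which is how the withdrawal count can far exceed the number of announced prefixes.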
Some Duplicate Explanations • Stateless BGP – Don't keep track of what you advertised – Thus propagate any withdrawal to all neighbors • Even if you never announced the route in the first place • BGP MRAI Timer – MRAI Timer: advertise only once every 30 seconds – Time 0: update P1 sent – Time 10: P1 changes to P2, but the announcement is delayed due to MRAI – Time 20: P2 changes back to P1, so the delayed update is modified to list P1 – Time 30: update listing P1 sent • Self-Synchronization (recall earlier paper) – Need to add jitter to MRAI timer
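The MRAI timeline above can be replayed with a small simulation. This is a simplified sketch under stated assumptions: one neighbor, one prefix, a fixed 30-second timer with no jitter, and a sender that does not compare the pending path with the one it last sent (the function name `mrai_updates` is hypothetical).

```python
def mrai_updates(changes, mrai=30):
    """Return the (time, path) updates actually sent to a neighbor.

    `changes` is a time-ordered list of (time, best_path) events. At most one
    update is sent per `mrai` seconds; changes inside the window overwrite
    the single pending update.
    """
    sent = []
    pending = None       # path waiting for the timer to expire
    next_allowed = None  # earliest time the next update may be sent
    for time, path in changes:
        if pending is not None and next_allowed <= time:
            # The timer expired before this change, so flush the pending update.
            sent.append((next_allowed, pending))
            next_allowed += mrai
            pending = None
        if next_allowed is None or time >= next_allowed:
            sent.append((time, path))  # timer not running, send immediately
            next_allowed = time + mrai
        else:
            pending = path             # timer running, replace the delayed update
    if pending is not None:
        sent.append((next_allowed, pending))
    return sent


if __name__ == "__main__":
    print(mrai_updates([(0, "P1"), (10, "P2"), (20, "P1")]))
```

With the slide's timeline the neighbor receives P1 at time 0 and P1 again at time 30, a duplicate announcement even though the path changed twice in between.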
Routing Instability • [Figure not reproduced] • Major upgrade at the end of May. • More instability at 10am
No Dominant Problem AS • No single AS dominates instability. • No correlation between AS size and share of instability. • Instability is evenly distributed across routers
Yet The Internet Mostly Works
Conclusions • Very high percentage of pathological updates. • No dominant AS responsible for problems. • Lots more current work on BGP measurements – Need to understand the current system – Reminder that systems don't behave as expected – Fix current problems to keep network running – Draw lessons for future protocol designs