cs 557 routing measurements




  1. CS 557 Routing Measurements
     End to End Routing Behavior in the Internet (Vern Paxson, 1996)
     Internet Routing Instability (Labovitz, Malan, Jahanian, 1997)
     Spring 2013

  2. End to End Routing Behavior
     • Objective:
       – Understand the actual behavior of Internet routing
     • Approach:
       – Use traceroute to measure routes from multiple sites
     • Contributions:
       – Analysis of how routing actually behaved in 1994-1996
       – Example of how to conduct large-scale measurements
       – Importance of observing real data

  3. Review and Expected Behavior (1/2)
     • How does traceroute work?
       – Start with a TTL of 1, get an ICMP reply from the router 1 hop away.
       – Next use a TTL of 2, get an ICMP message from the router 2 hops away.
       – Continue until the destination is reached.
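
The TTL mechanism above can be sketched as a small simulation. This is a minimal model, not the real tool: it sends no packets, and `send_probe`, `traceroute`, and the router names `r1`, `r2`, `dst` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    kind: str    # "time-exceeded" or "destination-reached"
    source: str  # node that answered the probe


def send_probe(path, ttl):
    """Forward one probe along `path` (routers in order, destination last)."""
    if not path:
        raise ValueError("path must contain at least the destination")
    for hop, node in enumerate(path, start=1):
        if hop == len(path):
            return Reply("destination-reached", node)
        ttl -= 1  # every router on the way decrements the TTL
        if ttl == 0:
            # The router that drops the probe reports back to the sender.
            return Reply("time-exceeded", node)


def traceroute(path, max_hops=30):
    """Probe with TTL 1, 2, 3, ... and record who answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(path, ttl)
        hops.append(reply.source)
        if reply.kind == "destination-reached":
            break
    return hops
```

Calling `traceroute(["r1", "r2", "dst"])` discovers the nodes one hop at a time; with a small `max_hops` the trace stops before reaching the destination, which is why the real tool reports a hop limit.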

  4. Review and Expected Behavior (2/2)
     traceroute to 129.82.100.64 (129.82.100.64), 30 hops max, 40 byte packets
      1 FastEthernet6-0.civ-service1.Canberra.telstra.net (203.50.1.65) 0.236 ms 0.176 ms 0.243 ms
      2 GigabitEthernet3-0.civ-core2.Canberra.telstra.net (203.50.10.129) 0.762 ms 0.814 ms 0.776 ms
      3 GigabitEthernet2-2.dkn-core1.Canberra.telstra.net (203.50.6.126) 1.052 ms 1.008 ms 0.942 ms
      4 Pos4-1.ken-core4.Sydney.telstra.net (203.50.6.69) 4.983 ms 4.953 ms 5.036 ms
      5 10GigabitEthernet3-0.pad-core4.Sydney.telstra.net (203.50.6.86) 5.31 ms 5.281 ms 5.2 ms
      6 GigabitEthernet2-2.syd-core02.Sydney.net.reach.com (203.50.13.42) 26.281 ms 5.318 ms 5.322 ms
      7 i-4-0.syd-core01.net.reach.com (202.84.221.89) 5.475 ms 5.456 ms 5.528 ms
      8 i-12-1.wil-core02.net.reach.com (202.84.144.65) 162.252 ms 162.236 ms 162.178 ms
      9 i-6-2.wil04.net.reach.com (202.84.251.186) 162.542 ms 162.561 ms 162.509 ms
     10 lax-brdr-01.inet.qwest.net (205.171.4.53) 162.866 ms 162.401 ms 162.305 ms
     11 lax-core-02.inet.qwest.net (205.171.19.41) 162.745 ms 162.563 ms 162.469 ms
     12 bur-core-01.inet.qwest.net (205.171.8.42) 168.971 ms 168.827 ms 169.185 ms
     13 dia-core-03.inet.qwest.net (205.171.8.118) 204.15 ms 204.166 ms 203.956 ms
     14 dvr-edge-09.inet.qwest.net (205.171.10.70) 204.313 ms 204.007 ms 204.078 ms
     15 65.121.56.106 (65.121.56.106) 204.027 ms 203.851 ms 203.971 ms
     16 peer01.ari-co.icg.net (170.147.161.87) 204.062 ms 204.299 ms 204.243 ms
     17 165.236.232.190 (165.236.232.190) 205.499 ms 205.336 ms 205.43 ms
     18 csu-frgp-gw.colostate.edu (129.82.10.5) 206.788 ms 206.451 ms 207.029 ms
     19 129.82.2.10 (129.82.2.10) 207.259 ms 206.967 ms 207.849 ms
     20 yuma.acns.colostate.edu (129.82.100.64) 206.985 ms 206.941 ms 207.193 ms
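
One way to read output like this is to look for the hop where the round-trip time jumps. The sketch below parses hop lines and reports the largest increase in minimum RTT between consecutive hops. The helper names `parse_hop` and `largest_jump` are hypothetical, the regular expression assumes the line layout shown on this slide, and `SAMPLE` copies hops 6 to 9 from the trace above.

```python
import re

# hop number, host name, (address), then the timing fields
HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")

SAMPLE = [
    "6 GigabitEthernet2-2.syd-core02.Sydney.net.reach.com (203.50.13.42) 26.281 ms 5.318 ms 5.322 ms",
    "7 i-4-0.syd-core01.net.reach.com (202.84.221.89) 5.475 ms 5.456 ms 5.528 ms",
    "8 i-12-1.wil-core02.net.reach.com (202.84.144.65) 162.252 ms 162.236 ms 162.178 ms",
    "9 i-6-2.wil04.net.reach.com (202.84.251.186) 162.542 ms 162.561 ms 162.509 ms",
]


def parse_hop(line):
    """Return (hop, host, address, [rtt_ms, ...]) or None if the line does not match."""
    m = HOP.match(line)
    if m is None:
        return None
    rtts = [float(x) for x in re.findall(r"([\d.]+) ms", m.group(4))]
    return int(m.group(1)), m.group(2), m.group(3), rtts


def largest_jump(lines):
    """Return (hop, increase_ms) for the largest rise in minimum RTT, or None."""
    hops = [h for h in map(parse_hop, lines) if h and h[3]]
    best = None
    for prev, cur in zip(hops, hops[1:]):
        # The minimum of the three probes filters outliers such as 26.281 ms at hop 6.
        jump = min(cur[3]) - min(prev[3])
        if best is None or jump > best[1]:
            best = (cur[0], jump)
    return best
```

On `SAMPLE` the largest increase is at hop 8, where the minimum RTT goes from about 5 ms to about 162 ms.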

  5. Path Vector Routing and Loops
     [Figure: nodes A, B, C, D and destination X, labeled with each node's path and next hop to X]
     • D: Path(X)=D,X Next(D,X)=X
     • C: Path(X)=C,D,X Next(C,X)=D
     • B: Path(X)=B,C,D,X Next(B,X)=C
     • A: Path(X)=A,B,C,D,X Next(A,X)=B
     1. Link(D,X) fails => Path(D,X)=none
     2. Update Path(A,X)=A,B,C,D,X arrives at D => Path(D,X)=?
     Claim D will ignore this path … why??
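
The usual path-vector answer to the question on this slide is loop detection: a node discards any advertised path that already contains the node itself. A minimal sketch of that check follows; `accept_update` is a hypothetical helper name, and paths are plain lists of node names.

```python
def accept_update(receiver, advertised_path):
    """Return the path `receiver` would install, or None if the update is ignored.

    A path that already lists the receiver would route traffic back through
    the receiver, so accepting it would create a forwarding loop.
    """
    if receiver in advertised_path:
        return None
    return [receiver] + advertised_path
```

With the slide's example, `accept_update("D", ["A", "B", "C", "D", "X"])` returns `None`: the advertised path depends on D's own failed link to X, so D is left with no route to X rather than a looping one.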

  6. Internet Routing Loops

  7. Prevalence and Persistence
     • Prevalence: how likely is it you will encounter a route?
     • Persistence: how long will the route last?
     • Very different metrics
       – Can be prevalent, but not persistent
       – Why is persistence important?
       – Why is prevalence important?
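
To make the difference between the two metrics concrete, the sketch below computes both from a list of timestamped route observations. It is a simplification for illustration, not the methodology of the paper: `prevalence` and `persistence` are hypothetical function names, and a route seen only once gets a duration of 0, which is only a lower bound.

```python
from collections import Counter
from itertools import groupby


def prevalence(observations):
    """Return (dominant_route, fraction of observations that saw it)."""
    routes = [route for _, route in observations]
    route, count = Counter(routes).most_common(1)[0]
    return route, count / len(routes)


def persistence(observations):
    """Return (route, duration) for each run of consecutive identical routes."""
    runs = []
    for route, group in groupby(observations, key=lambda obs: obs[1]):
        times = [t for t, _ in group]
        runs.append((route, times[-1] - times[0]))
    return runs


# (time, route) pairs: r1 is seen most often but never survives two samples in a row.
flapping = [(0, "r1"), (10, "r2"), (20, "r1"), (30, "r2"), (40, "r1")]
stable = [(0, "r1"), (10, "r1"), (20, "r1")]
```

In `flapping`, route `r1` is prevalent (3 of 5 observations) but not persistent, since every run ends before the next sample. In `stable` the same route is both prevalent and persistent.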

  8. Internet Route Persistence

  9. [Pax96] Conclusions
     • Important to measure the actual system behavior.
     • Some conclusions as of 1996:
       – Routing pathologies are emerging as a challenge for the growing Internet.
       – Internet routes are heavily dominated by a prevalent route.
       – But there is wide variation in persistence.
       – About 2/3 of paths persisted for days or weeks.
     • Next we consider how well BGP responds to changes in policy and topology …

  10. Internet Routing Instability
      • Objective:
        – Analyze BGP updates and identify BGP routing behaviors and pathologies
      • Approach:
        – Log BGP updates collected at a peering point
      • Contributions:
        – Identification of routing pathologies
        – Identification of routing convergence problems

  11. Exchange Points
      • Public Exchange Points
        – Network and physical location for connecting BGP routers from different Autonomous Systems.
        – Not all routers peer with each other (policies)
      [Figure: UUNet, Verio, Sprint, AT&T, and Regional AS1 through AS5 connected at an exchange point, with a Monitoring Point]

  12. Multi-Homing and BGP (1/2)
      [Figure: AS1 (10.0.0.0/9), AS2 (10.128.0.0/10), AS3 (10.192.0.0/10), AS4, AS5]
      • 10.0.0.0/8 Path=AS4,{AS1,AS2,AS3}
      • 10.192.0.0/10 Path=AS4,AS5
      • All traffic to 10.192.0.0/10 will follow link AS5-AS3 (/10 more specific than /8)
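
The "/10 more specific than /8" rule is longest-prefix matching. A minimal sketch using Python's standard `ipaddress` module is shown here; the `TABLE` contents and the labels "via AS4" and "via AS5" are simplified assumptions based on this slide, and `longest_prefix_match` is a hypothetical helper name.

```python
import ipaddress

# (prefix, label) pairs as a remote router might see them on this slide
TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "via AS4"),
    (ipaddress.ip_network("10.192.0.0/10"), "via AS5"),
]


def longest_prefix_match(table, address):
    """Return the (prefix, label) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(net, label) for net, label in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)
```

An address such as 10.200.1.1 matches both entries, and the /10 wins, so it is forwarded via AS5. An address such as 10.1.1.1 only matches the /8 aggregate and goes via AS4.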

  13. Multi-Homing and BGP (2/2)
      [Figure: AS1 (10.0.0.0/9), AS2 (10.128.0.0/10), AS3 (10.192.0.0/10), AS4, AS5]
      • 10.0.0.0/8 Path=AS4,{AS1,AS2,AS3}
      • 10.192.0.0/10 Path AS4, AS3
      • 10.192.0.0/10 Path=AS4,AS5
      • Traffic to 10.192.0.0/10 split between link AS5-AS3 and AS4-AS3

  14. Types of Routing Events
      • Forwarding Instability (change of path)
        – WADiff = withdraw, then announce a different route
        – AADiff = implicit withdrawal by replacing with a different route
      • Possible Pathologies
        – WADup = withdraw, then reannounce the same route
        – AADup = implicit withdrawal by replacing with a new route that has the same AS path and the same next hop
      • Pathologies
        – WWDup = withdraw an already withdrawn route
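
The categories above can be read as a small state machine over the updates one peer sends for one prefix. The sketch below is an illustrative classifier under stated assumptions, not the tooling used in the paper: `classify` is a hypothetical name, a route is any comparable value such as an (AS path, next hop) tuple, and a withdrawal for a prefix with no live announcement is counted as WWDup.

```python
def classify(updates):
    """Label each update for one (peer, prefix) pair.

    `updates` is a list of ("A", route) announcements and ("W", None) withdrawals.
    Plain "A" and "W" labels mark a first announcement and an ordinary withdrawal.
    """
    labels = []
    last_route = None  # most recently announced route, remembered across withdrawals
    announced = False  # is the prefix currently announced?
    for kind, route in updates:
        if kind == "W":
            labels.append("W" if announced else "WWDup")
            announced = False
        else:
            if last_route is None:
                labels.append("A")
            else:
                prefix = "AA" if announced else "WA"
                suffix = "Dup" if route == last_route else "Diff"
                labels.append(prefix + suffix)
            last_route = route
            announced = True
    return labels


r1 = (("AS1", "AS2"), "192.0.2.1")  # hypothetical (AS path, next hop)
r2 = (("AS1", "AS3"), "192.0.2.1")
trace = [("A", r1), ("A", r1), ("W", None), ("W", None),
         ("A", r1), ("A", r2), ("W", None), ("A", r1)]
```

For `trace`, the labels come out as A, AADup, W, WWDup, WADup, AADiff, W, WADiff. Only the two Diff events change the forwarding path.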

  15. Gross Observations
      • Internet Stats in 1997
        – 45,000 prefixes (BGP destinations)
        – 1300 Autonomous Systems
        – 1500 AS paths observed in updates
      • BGP Updates
        – Average of 125 updates per prefix per day
        – Over 30 million updates on one day
        – Surprising, since BGP should only send an update if the path changes
        – Dominated by WWDup, AADup, and WADup

  16. MAE-East Gross Observations
      [Figure note: Duplicate Withdrawals not shown]

  17. Duplicate Withdrawals
      • Dominate the BGP Update traffic
        – 500,000 to 6 million duplicate withdrawals per day at MAE-East
      • ISP I Example
        – 259 prefixes announced
        – 2.4 million withdrawals
        – Withdrawals for 14,112 prefixes
        – Withdrawals for nearly 14K prefixes never announced in the first place.
      • Partial Explanation: Stateless BGP
        – Don't keep track of what you advertised
        – Thus propagate any withdraw to all neighbors
          • Even if you never announced the route in the first place
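
The stateless explanation can be illustrated with a toy router that either remembers what it advertised to each peer or does not. This is a sketch under simplifying assumptions, not a model of any real BGP implementation; the `Router` class and the peer names are hypothetical.

```python
class Router:
    def __init__(self, peers, stateful):
        self.peers = list(peers)
        self.stateful = stateful
        # Per-peer record of advertised prefixes; a stateless router never consults it.
        self.advertised = {peer: set() for peer in self.peers}

    def announce(self, prefix, to_peers):
        for peer in to_peers:
            self.advertised[peer].add(prefix)

    def withdraw(self, prefix):
        """Return the peers that are sent a withdrawal for `prefix`."""
        sent = []
        for peer in self.peers:
            if self.stateful and prefix not in self.advertised[peer]:
                continue  # nothing was advertised to this peer, so nothing to withdraw
            self.advertised[peer].discard(prefix)
            sent.append(peer)
        return sent


stateless = Router(["p1", "p2", "p3"], stateful=False)
stateful = Router(["p1", "p2", "p3"], stateful=True)
```

Withdrawing a prefix that was never announced makes `stateless` send a withdrawal to all three peers, while `stateful` sends none. Repeating the withdrawal repeats the messages in the stateless case, which matches the WWDup pattern described above.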

  18. Some Duplicate Explanations
      • Stateless BGP
        – Don't keep track of what you advertised
        – Thus propagate any withdraw to all neighbors
          • Even if you never announced the route in the first place
      • BGP MRAI Timer
        – MRAI Timer: Advertise only once every 30 seconds
        – Time 0: update P1 sent
        – Time 10: P1 changes to P2 but announcement delayed due to MRAI
        – Time 20: P2 changes back to P1 so delayed update modified to list P1
        – Time 30: update listing P1 sent, duplicating the update sent at Time 0
      • Self-Synchronization (recall earlier paper)
        – Need to add jitter to MRAI timer
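
The MRAI timeline above can be replayed in a few lines. This sketch simplifies the timer to fire at fixed multiples of the interval and uses one-second ticks; `mrai_updates` and its parameters are hypothetical names, and the `suppress_duplicates` flag models a sender that remembers its last advertisement.

```python
def mrai_updates(changes, mrai=30, horizon=60, suppress_duplicates=False):
    """Return the (time, path) updates actually sent.

    `changes` maps a time in seconds to the new best path chosen at that time.
    """
    sent = []
    current = None     # current best path
    pending = False    # has the best path changed since the last timer expiry?
    last_sent = None   # only consulted when suppress_duplicates is True
    for t in range(horizon + 1):
        if t in changes:
            current = changes[t]
            pending = True
        if pending and t % mrai == 0:
            if not (suppress_duplicates and current == last_sent):
                sent.append((t, current))
                last_sent = current
            pending = False
    return sent


timeline = {0: "P1", 10: "P2", 20: "P1"}
```

With the slide's timeline, the default sender emits P1 at time 0 and P1 again at time 30, which a receiver sees as a duplicate announcement. With `suppress_duplicates=True` only the first update is sent.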

  19. Routing Instability
      • Major upgrade at the end of May.
      • More instability at 10am.

  20. No Dominant Problem AS
      • No single AS dominates instability.
      • No correlation between AS size and share of instability.
      • Instability is evenly distributed across routers.

  21. Yet The Internet Mostly Works

  22. Conclusions
      • Very high percentage of pathological updates.
      • No dominant AS responsible for problems.
      • Lots more current work on BGP measurements
        – Need to understand the current system
        – Reminder that systems don't behave as expected
        – Fix current problems to keep network running
        – Draw lessons for future protocol designs
