
Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements

Romain Fontugne (1), Emile Aben (2), Cristel Pelsser (3), Randy Bush (1). November 1, 2017. (1) IIJ Research Lab, (2) RIPE NCC, (3) University of Strasbourg / CNRS


  1. Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements
     Romain Fontugne (1), Emile Aben (2), Cristel Pelsser (3), Randy Bush (1)
     November 1, 2017
     (1) IIJ Research Lab, (2) RIPE NCC, (3) University of Strasbourg / CNRS

  2. Understanding Internet health?

  5. Understanding Internet health? (Problems)
     Manual observations
     • Traceroute / Ping / Operators’ group mailing lists
     • Slow process
     • Limited visibility

  6. Understanding Internet health? (Problems)
     Manual observations
     • Traceroute / Ping / Operators’ group mailing lists
     • Slow process
     • Limited visibility
     → Our goal: systematically pinpoint network disruptions
     • Delay changes
     • Forwarding anomalies (not covered here, see the paper)

  7. Silly solution: frequent traceroutes to the whole Internet!
     → Doesn’t scale
     → Overloads the network

  8. Better solution: mine results from deployed platforms
     → Cooperative and distributed approach
     → Uses existing data, no added burden on the network

  9. RIPE Atlas
     Actively measures Internet connectivity
     • Multiple types of measurement: ping, traceroute, DNS, SSL, NTP and HTTP
     • 10 000 active probes!
     • Data for numerous measurements is made publicly available

  10. RIPE Atlas: traceroutes
      Two repetitive large-scale measurements
      • Builtin: traceroute every 30 minutes to all DNS root servers (≈ 500 server instances)
      • Anchoring: traceroute every 15 minutes to 189 collaborative servers
      Analyzed dataset
      • May to December 2015
      • 2.8 billion IPv4 traceroutes
      • 1.2 billion IPv6 traceroutes

  11. Monitor delays with traceroute?
      [Figure: traceroutes from CZ to BD; RTT (ms) against number of hops]
      Challenges:
      • Noisy data
      • Traffic asymmetry
      • Packet loss
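
To make the noise and packet-loss challenges concrete, here is a minimal sketch that pulls per-hop RTT samples out of a single traceroute result. The field names (`result`, `hop`, `from`, `rtt`, and `x` for a lost packet) are assumed to follow the RIPE Atlas traceroute result format; the `sample` record and the `hop_rtts` helper are invented for illustration and are not from the slides.

```python
# Invented sample shaped like one RIPE Atlas traceroute result (assumption).
sample = {
    "prb_id": 1001,
    "dst_addr": "192.0.2.10",
    "result": [
        {"hop": 1, "result": [{"from": "198.51.100.1", "rtt": 1.2},
                              {"from": "198.51.100.1", "rtt": 1.4},
                              {"from": "198.51.100.1", "rtt": 1.3}]},
        {"hop": 2, "result": [{"from": "203.0.113.5", "rtt": 9.8},
                              {"x": "*"},  # lost packet: no RTT reported
                              {"from": "203.0.113.5", "rtt": 10.2}]},
        {"hop": 3, "result": [{"x": "*"}, {"x": "*"}, {"x": "*"}]},
    ],
}


def hop_rtts(traceroute):
    """Map hop number -> {router IP: [RTT samples in ms]}, skipping lost packets."""
    hops = {}
    for hop in traceroute.get("result", []):
        per_ip = {}
        for reply in hop.get("result", []):
            # Replies without both an address and an RTT carry no delay information.
            if "rtt" in reply and "from" in reply:
                per_ip.setdefault(reply["from"], []).append(reply["rtt"])
        hops[hop["hop"]] = per_ip
    return hops


for hop, routers in hop_rtts(sample).items():
    print(hop, routers)
```

A hop where every packet was lost ends up with an empty mapping, so downstream code has to tolerate hops that contribute no RTT samples at all.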

  12. Monitor delays with traceroute?
      Traceroute to “www.target.com”
      Round Trip Time (RTT) between B and C?
      Report abnormal RTT between B and C?

  13. What is the RTT between B and C?
      Differential RTT: $\Delta_{CB} = RTT_C - RTT_B = RTT_{CB}$

  14. What is the RTT between B and C?
      $RTT_C - RTT_B = RTT_{CB}$ ?
      • No!
      • Traffic is asymmetric
      • $RTT_B$ and $RTT_C$ take different return paths!

  15. What is the RTT between B and C?
      $RTT_C - RTT_B = RTT_{CB}$ ?
      • No!
      • Traffic is asymmetric
      • $RTT_B$ and $RTT_C$ take different return paths!
      • Differential RTT: $\Delta_{CB} = RTT_C - RTT_B = d_{BC} + e_p$
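
One way to see where the last expression comes from, as a sketch rather than the authors' derivation: write $f_B$ for the forward delay from the probe to router B, and $r_B$, $r_C$ for the return delays from B and C back to the probe. These three symbols are notation introduced here, not taken from the slides, and the sketch assumes the forward path to C passes through B.

```latex
\begin{align}
RTT_B &= f_B + r_B \\
RTT_C &= f_B + d_{BC} + r_C \\
\Delta_{CB} &= RTT_C - RTT_B = d_{BC} + \underbrace{(r_C - r_B)}_{e_p}
\end{align}
```

Under this reading, $d_{BC}$ is the delay of link BC and $e_p$ is the difference between the two return paths seen by probe $p$. A single probe cannot separate the two terms, which is the problem the following slides address.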

  16. Problem with differential RTT
      Monitoring $\Delta_{CB}$ over time:
      [Figure: differential RTT against time]
      → Delay change on BC? CD? DA? BA???

  17. Proposed Approach: Use probes with different return paths
      Differential RTT: $\Delta_{CB} = x_0$

  18. Proposed Approach: Use probes with different return paths
      Differential RTT: $\Delta_{CB} = \{x_0, x_1\}$

  19. Proposed Approach: Use probes with different return paths
      Differential RTT: $\Delta_{CB} = \{x_0, x_1, x_2, x_3, x_4\}$

  20. Proposed Approach: Use probes with different return paths
      Differential RTT: $\Delta_{CB} = \{x_0, x_1, x_2, x_3, x_4\}$
      Median $\Delta_{CB}$:
      • Stable if the delay changes on only a few return paths
      • Fluctuates if the delay on BC changes
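
The behaviour of the median claimed above can be checked with a small sketch. All RTT values below are invented, and `median_differential_rtt` is a hypothetical helper name; the example only assumes one (RTT_B, RTT_C) pair per probe.

```python
from statistics import median


def median_differential_rtt(samples):
    """Median of RTT_C - RTT_B over probes; samples is a list of (rtt_b, rtt_c) in ms."""
    return median(rtt_c - rtt_b for rtt_b, rtt_c in samples)


# Invented (RTT_B, RTT_C) pairs from five probes with different return paths.
baseline = [(10.0, 15.1), (20.0, 24.9), (30.0, 35.0), (12.0, 17.2), (8.0, 12.8)]

# One probe's return path from C becomes 40 ms slower: only x_0 moves.
one_return_path = [(10.0, 55.1)] + baseline[1:]

# Link BC itself becomes 10 ms slower: RTT_C moves for every probe.
link_change = [(rtt_b, rtt_c + 10.0) for rtt_b, rtt_c in baseline]

print(median_differential_rtt(baseline))
print(median_differential_rtt(one_return_path))
print(median_differential_rtt(link_change))
```

The single disturbed return path leaves the median where it was, while the change on BC shifts every sample and therefore the median by the full 10 ms.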

  21. Median Diff. RTT: Example
      Tier-1 link, 2 weeks of data, 95 probes: 130.117.0.250 (Cogent, Zurich) - 154.54.38.50 (Cogent, Munich)
      [Figure, top panel: raw differential RTT values (ms). Bottom panel: median differential RTT and normal reference (ms), Jun 02 to Jun 14, 2015]
      • Stable despite noisy RTTs
      • Confidence interval: Wilson score
      • Normally distributed
      • Normal reference: exponential smoothing
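
The two statistical ingredients named on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Wilson score is evaluated at p = 0.5 and mapped to ranks in the sorted sample, and the choices z = 1.96, the floor/ceil rank rounding, and the smoothing factor `alpha` are all assumptions made here.

```python
import math
from statistics import median


def wilson_bounds(n, p=0.5, z=1.96):
    """Wilson score interval for a proportion p estimated from n samples."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom


def median_confidence_interval(values, z=1.96):
    """Return (lower, median, upper) using Wilson-score ranks around the median."""
    ordered = sorted(values)
    n = len(ordered)
    lo_frac, hi_frac = wilson_bounds(n, 0.5, z)
    lo_idx = max(0, math.floor(n * lo_frac))        # rounding choice is an assumption
    hi_idx = min(n - 1, math.ceil(n * hi_frac))
    return ordered[lo_idx], median(ordered), ordered[hi_idx]


def update_reference(reference, observed, alpha=0.1):
    """Simple exponential smoothing of the normal reference (alpha is an assumption)."""
    return alpha * observed + (1 - alpha) * reference


# 95 invented differential RTT samples, matching the probe count on the slide.
samples = [5.0 + 0.01 * i for i in range(95)]
print(median_confidence_interval(samples))
print(update_reference(5.0, 6.0))
```

Because the interval is built from order statistics, a handful of extreme raw values widens it very little, which matches the stability shown in the bottom panel.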

  22. Detecting Delay Changes
      72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE))
      [Figure: median differential RTT (ms), normal reference, and detected anomalies, Nov 26 to Dec 01, 2015]
      Significant RTT changes: confidence interval not overlapping with the normal reference
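
The detection rule stated above reduces to an interval-overlap test. The sketch below assumes that both the observed median and the normal reference are represented as (lower, upper) intervals in milliseconds; that representation and the name `is_anomalous` are assumptions made for illustration.

```python
def is_anomalous(observed_ci, reference_ci):
    """True when the observed confidence interval does not overlap the reference."""
    obs_lo, obs_hi = observed_ci
    ref_lo, ref_hi = reference_ci
    return obs_lo > ref_hi or obs_hi < ref_lo


reference = (4.9, 5.3)  # invented normal reference interval (ms)

print(is_anomalous((5.0, 5.4), reference))    # overlaps the reference
print(is_anomalous((12.1, 14.8), reference))  # entirely above the reference
print(is_anomalous((1.0, 2.5), reference))    # entirely below the reference
```

Testing for overlap rather than comparing point values means a noisy time bin with a wide interval is less likely to be flagged.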

  23. Results
      Analyzed dataset
      • Atlas builtin / anchoring measurements
      • From May to Dec. 2015
      • Observed 262k IPv4 and 42k IPv6 links
      We found a lot of delay changes! Let’s look at just two prominent examples

  24. Case study: DDoS on DNS root servers
      Two attacks:
      • Nov. 30th 2015
      • Dec. 1st 2015
      Almost all servers are anycast
      • Congestion at the 531 sites?
      • Found 129 instances altered by the attacks

  25. Observed delay changes
      [Figure: 193.0.14.129 (K-root) - 74.208.6.124 (1&1, Kansas City); median differential RTT (ms), normal reference, and detected anomalies, Nov 26 to Dec 01, 2015]
      [Figure: 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE)); median differential RTT (ms), normal reference, and detected anomalies, Nov 26 to Dec 01, 2015]
      [Figure: 188.93.16.77 (Selectel, St. Petersburg) - 95.213.189.0 (Selectel, Moscow); differential RTT (ms), Nov 26 to Dec 01, 2015]
      • Certain servers are affected by only one attack
      • Continuous attack in Russia

  26. Unaffected root servers
      [Figure: 193.0.14.129 (K-root) - 212.191.229.90 (Poznan, PL); median differential RTT (ms) and normal reference, Nov 26 to Dec 01, 2015]
      Very stable delay during the attacks
      • Thanks to anycast!
      • Far from the attackers

  27. Congested links for servers F, I, and K
      → Concentration of malicious traffic at IXPs

  28. Case study: Telekom Malaysia BGP leak

  32. Case study: Telekom Malaysia BGP leak
      Not only with Google... but about 170k prefixes!

  33. Congestion in Level3
      Rerouted traffic has congested Level3 (120 reported links)
      • Example: 229 ms increase between two routers in London!
      [Figure: 67.16.133.130 - 67.17.106.150; median differential RTT (ms), normal reference, and detected anomalies, Jun 08 to Jun 13, 2015]

  34. Congestion in Level3
      Reported links in London
      [Figure legend: delay increase; delay & packet loss]
      → Traffic staying within UK/Europe may also be altered

  35. Summary
      Detect and locate delay and forwarding anomalies in billions of traceroutes
      • Non-parametric and robust statistics
      • Diverse root causes: remote attacks, routing anomalies, etc.
      • Gives many new insights into reported events
      Online detection for network operators
      • http://ihr.iijlab.net/
