Monitoring the NZ Internet with AMP


  1. Monitoring the NZ Internet with AMP: WAND Update, NZNOG 2014

  2. The Active Measurement Project
     • Monitor machines situated throughout the NZ Internet
       • Participating NZ ISPs
       • Universities
       • Alongside name servers
     • Monitors continually perform a scheduled set of measurements (tests)
       • Target other AMP monitors and sites of interest
     • Test frequency
       • Low impact tests: at least once a minute
       • High impact tests: multiple minutes between tests
     • Results give a view of network performance between sites over time
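
The two test frequencies on this slide can be pictured as a simple priority queue of due times. The following is a minimal sketch, not AMP's scheduler: the function name `run_order`, the test names, and the 300-second interval for the high impact test are assumptions chosen for illustration (the slide only says "multiple minutes").

```python
import heapq

def run_order(intervals, horizon):
    """Return (time, test_name) pairs in the order tests would run.

    intervals: dict mapping a test name to its period in seconds.
    horizon:   stop scheduling at this many seconds (exclusive).
    """
    # Every test is due at time 0; the heap orders by (due time, name).
    queue = [(0, name) for name in sorted(intervals)]
    heapq.heapify(queue)
    runs = []
    while queue:
        due, name = heapq.heappop(queue)
        if due >= horizon:
            continue  # past the horizon, so drop this test from the queue
        runs.append((due, name))
        heapq.heappush(queue, (due + intervals[name], name))
    return runs

# Hypothetical schedule: a low impact test every minute,
# a high impact test every five minutes.
schedule = run_order({"icmp": 60, "http": 300}, horizon=180)
```

A real monitor would sleep until the next due time rather than compute the whole list up front; the list form just makes the interleaving easy to inspect.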

  3. The Active Measurement Project
     • We’ve been running AMP for 10 years
       • You may remember it from previous NZNOGs
     • Current MBIE-funded project
       • NZ Internet is critical infrastructure
       • Requires constant monitoring
     • Rewriting AMP software from the ground up to better serve this purpose
       • Apply lessons learned from the earlier deployment
       • Combine with our existing work in anomaly detection
       • Find network events, report on them so they can be resolved ASAP

  4. Measuring the Internet
     • The modern Internet is very dynamic
       • Some services move around in ways that most people can’t predict
       • Physical or logical location can affect how services treat you
     • All tests make extensive use of DNS
       • Important to test to the addresses that users will actually hit
       • Resolve addresses every time the test is run
       • Often there are many of these, test to them all!
     • We are *starting* to see IPv6 deployment
       • A large number of sites don’t do IPv6, but we are ready for them
       • The IPv6 path is often poor compared to the IPv4 one
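
The "resolve every time, test to them all" idea could look roughly like this using Python's standard `socket.getaddrinfo`. It is a sketch under assumptions: the function names `group_by_family` and `resolve_targets` are illustrative and are not taken from the AMP code.

```python
import socket

def group_by_family(addrinfo):
    """Split getaddrinfo() results into de-duplicated IPv4 and IPv6 lists."""
    labels = {socket.AF_INET: "ipv4", socket.AF_INET6: "ipv6"}
    grouped = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        label = labels.get(family)
        if label is None:
            continue  # ignore address families other than IPv4/IPv6
        address = sockaddr[0]
        if address not in grouped[label]:
            grouped[label].append(address)
    return grouped

def resolve_targets(hostname):
    """Resolve afresh on each call so every test run sees current addresses."""
    try:
        info = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return {"ipv4": [], "ipv6": []}  # name did not resolve this time
    return group_by_family(info)
```

A test would then iterate over both lists and measure each address separately, which is what allows IPv4 and IPv6 paths to the same name to be compared.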

  5. Current AMP Tests
     • ICMP Ping
       • Latency and loss from monitor to target
     • Traceroute
       • Route and path lengths from monitor to target
     • DNS
       • Response time for queries to target DNS server
     • HTTP
       • Performance when fetching all elements of a webpage
       • Pipelining, multiple/parallel connections, caching, etc.
       • “User experience”

  6. Upcoming AMP Tests
     • Throughput
       • How much data can we push between monitors
     • TCP Ping
       • Many sites firewall ICMP, so need an alternative
     • High-rate Ping and UDP Packet Streams
       • Network jitter and reordering
       • Loss characterisation
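
The jitter, reordering and loss figures that a packet stream test reports could be derived along these lines. This is only a sketch: the function name `stream_stats`, its input layout, and the definitions used (jitter as the mean absolute change in delay between consecutive arrivals, a packet counted as reordered if it arrives after a higher sequence number) are assumptions for illustration, not AMP's definitions.

```python
def stream_stats(sent_times, arrivals):
    """Summarise one packet stream.

    sent_times: list of send timestamps (ms); the index is the sequence number.
    arrivals:   list of (sequence number, receive timestamp in ms),
                in the order the packets arrived.
    """
    received = {seq for seq, _ in arrivals}
    lost = [seq for seq in range(len(sent_times)) if seq not in received]

    # A packet is reordered if a higher sequence number arrived before it.
    reordered = 0
    highest_seen = -1
    for seq, _ in arrivals:
        if seq < highest_seen:
            reordered += 1
        else:
            highest_seen = seq

    # Jitter: mean absolute change in delay between consecutive arrivals.
    delays = [recv - sent_times[seq] for seq, recv in arrivals]
    changes = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(changes) / len(changes) if changes else 0.0

    # Loss characterisation: length of the longest run of consecutive losses.
    longest_burst = run = 0
    for seq in range(len(sent_times)):
        run = 0 if seq in received else run + 1
        longest_burst = max(longest_burst, run)

    return {"lost": lost, "reordered": reordered,
            "jitter_ms": jitter, "longest_loss_burst": longest_burst}

# Made-up stream: five packets sent 100 ms apart, two lost, one reordered.
stats = stream_stats([0, 100, 200, 300, 400], [(0, 50), (2, 260), (1, 270)])
```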

  7. Data Collection
     • Developed system for persistent storage of network measurement data
       • Backed by a PostgreSQL database
     • Don’t aggregate or discard any measurements (unlike RRD)
       • Always get full detail, even if you go back a year
       • Disk space is cheap, so storage isn’t a major issue
       • Query speed is the big obstacle
     • Flexible design
       • Easily extended to store data collected by other measurement tools
         • Smokeping, Munin, Cacti, etc.
         • Other WAND measurement projects
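
To make the "never aggregate" point concrete, here is a small sketch of storing every raw measurement and reading back a time range at full detail. The slide says the real system is backed by PostgreSQL; this example swaps in Python's built-in `sqlite3` purely so it is self-contained, and the table name, column names and helper functions are all hypothetical.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create the (hypothetical) raw measurement table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS measurement (
               stream_id INTEGER NOT NULL,  -- one monitor/target/test combination
               ts        INTEGER NOT NULL,  -- Unix timestamp of the test
               rtt_us    INTEGER,           -- NULL records a lost probe
               PRIMARY KEY (stream_id, ts)
           )"""
    )
    return conn

def record(conn, stream_id, ts, rtt_us):
    """Store one measurement exactly as observed; nothing is averaged away."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO measurement VALUES (?, ?, ?)",
                     (stream_id, ts, rtt_us))

def fetch_range(conn, stream_id, start, end):
    """Return every (ts, rtt_us) row for a stream with start <= ts < end."""
    cur = conn.execute(
        "SELECT ts, rtt_us FROM measurement "
        "WHERE stream_id = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (stream_id, start, end),
    )
    return cur.fetchall()
```

The composite primary key doubles as an index on (stream, time), which is one simple way to keep range queries tolerable as the table grows.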

  8. Visualisation
     • Revamped AMP graphs
       • Interactive rather than static graphs
       • Easier on the eyes
     • Matrix
       • Condensed design to support large meshes
       • Show IPv4 and IPv6 results at the same time
     • Graph styles
       • Grouping measurements to create Smokeping-style graphs
       • Rainbow graphs to visualise traceroute paths

  9. Matrix

  10. Event Detection
      • Automate finding interesting changes in network behaviour
        • Increases in latency, taking the scenic route, traffic bursts / plunges
      • Needs to happen in close to real-time
        • Inform the network operator before the phones start ringing
      • Minimal false positive rate
        • Don’t want to be crying wolf too often
      • Grade events based on their significance and alert accordingly
        • Send text to operator if event is very urgent
        • Send email if less urgent
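
The "grade, then alert accordingly" step might be as simple as mapping a significance score to a channel. This is an illustrative sketch: the `Event` class, the 0 to 1 significance scale, and both threshold values are assumptions, not details from the talk.

```python
from dataclasses import dataclass

@dataclass
class Event:
    description: str
    significance: float  # assumed scale: 0.0 (trivial) to 1.0 (critical)

# Hypothetical cut-offs; a real deployment would make these configurable.
TEXT_THRESHOLD = 0.8
EMAIL_THRESHOLD = 0.4

def alert_channel(event):
    """Pick how to notify the operator, or None to stay quiet."""
    if event.significance >= TEXT_THRESHOLD:
        return "text"   # very urgent: send a text message
    if event.significance >= EMAIL_THRESHOLD:
        return "email"  # less urgent: email is enough
    return None         # below both thresholds: avoid crying wolf
```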

  11. Event Detection
      • Network measurements are essentially time series data
        • Plenty of techniques for finding anomalies in time series
        • Not many of these techniques work well in real-time
        • Trade-off between accuracy and timeliness
      • No one technique to rule them all
        • Detecting spikes vs detecting plunges
        • Noisy data vs consistent data
        • Tendency towards false positives

  12. Event Detection - Fusion
      • Our approach: data fusion
        • Implement any potentially useful detector
        • Combine results and infer likelihood of an event
      • If many detectors fire around the same time, it’s probably an event
        • Less reliable detectors tend to fire first -- early warning
        • More reliable detectors fire later -- confirmation
      • Exploring different data fusion techniques
        • Dempster-Shafer, Bayes, Fuzzy Logic and others
        • Current Masters project
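
As one example of the fusion techniques named on this slide, Dempster's rule of combination can be written out for a two-hypothesis frame (event vs normal). This is a generic textbook sketch, not the project's implementation; the function names and the mass values in the example are made up. Each detector reports a triple of masses (event, normal, unknown) that sums to 1, where "unknown" is belief the detector does not commit either way.

```python
from functools import reduce

def combine(m1, m2):
    """Dempster's rule for two mass assignments over {event, normal}."""
    e1, n1, u1 = m1
    e2, n2, u2 = m2
    # Conflict: one detector says "event" while the other says "normal".
    conflict = e1 * n2 + n1 * e2
    if conflict >= 1.0:
        raise ValueError("detectors are in total conflict")
    scale = 1.0 - conflict
    event = (e1 * e2 + e1 * u2 + u1 * e2) / scale
    normal = (n1 * n2 + n1 * u2 + u1 * n2) / scale
    unknown = (u1 * u2) / scale
    return (event, normal, unknown)

def fuse(masses):
    """Combine any number of detectors, starting from total ignorance."""
    return reduce(combine, masses, (0.0, 0.0, 1.0))

# Hypothetical detectors: an early, less reliable one commits 0.6 to
# "event"; a later, more reliable one commits 0.7.
early_then_late = fuse([(0.6, 0.0, 0.4), (0.7, 0.0, 0.3)])
```

With these made-up masses the combined belief in an event is higher than either detector gives alone, which matches the slide's intuition that several detectors firing together probably means a real event.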

  13. Event Detection - Techniques
      • Heuristic methods
        • Mode, Loss, Path Change, Plunge, Status Change
      • Threshold methods
        • Plateau, ARIMA-Shewhart
      • Variance methods
        • Jitter variance, T-Entropy
      • Probabilistic methods
        • Changepoint, Hidden Markov Model
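
To give a flavour of the threshold family, here is a plain Shewhart-style control check on a latency series: a sample raises an alarm when it is more than `k` standard deviations from the mean of the preceding window. This is a deliberately simplified sketch and not the ARIMA-Shewhart detector listed above (there is no ARIMA model here); the function name, window size, `k`, and the `min_sigma` floor are all assumptions.

```python
from statistics import mean, stdev

def shewhart_alarms(series, window=10, k=3.0, min_sigma=0.5):
    """Return indexes of samples that fall outside the control limits.

    series:    latency samples (ms), oldest first.
    window:    number of preceding samples used as the baseline (>= 2).
    k:         width of the control limits in standard deviations.
    min_sigma: floor on the deviation so a perfectly flat baseline
               does not flag tiny changes.
    """
    alarms = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        centre = mean(history)
        sigma = max(stdev(history), min_sigma)
        if abs(series[i] - centre) > k * sigma:
            alarms.append(i)
    return alarms

# Made-up latency series: steady around 30 ms with one spike to 150 ms.
latency = [30, 31, 30, 29, 30, 31, 30, 29, 30, 31, 150, 30]
alarms = shewhart_alarms(latency)
```

Note that the spike itself inflates the baseline for the samples that follow it, which is one reason a single threshold detector is not enough on its own.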

  14. Event Detection - Ongoing Work
      • Group events based on common properties
        • If a site goes down, don’t report separate events for each monitor
      • Ranking events based on severity
        • Combine magnitude and likelihood of being significant
      • Develop system for alerting operators when major events occur
        • User-configurable
        • Learn from operator feedback
      • Continue to extend and improve library of detection methods
        • Especially for metrics other than latency
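
Grouping could start from something as simple as merging events that share a target and occur close together in time, so that one outage seen by many monitors becomes one report. This is a sketch of that idea only: the function name, the event tuple layout and the 60-second window are assumptions.

```python
def group_events(events, window=60):
    """Merge events for the same target that occur within `window` seconds.

    events: iterable of (timestamp, monitor, target) tuples.
    Returns a list of groups in order of first occurrence; each group is a
    dict with the target, start and last timestamps, and the set of
    monitors that saw it.
    """
    groups = []
    latest = {}  # target -> most recent group for that target
    for ts, monitor, target in sorted(events):
        group = latest.get(target)
        if group is None or ts - group["last"] > window:
            # Nothing recent for this target: start a new group.
            group = {"target": target, "start": ts, "last": ts,
                     "monitors": set()}
            groups.append(group)
            latest[target] = group
        group["last"] = ts
        group["monitors"].add(monitor)
    return groups

# Made-up events: two monitors see site1 fail within seconds of each other.
grouped = group_events([
    (100, "monitor-a", "site1"),
    (110, "monitor-b", "site1"),
    (115, "monitor-a", "site2"),
    (500, "monitor-c", "site1"),
])
```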

  15. Interesting Observations
      • Unusual things happen all the time
        • Networking is hard
      • Event detection is really good for finding interesting behaviours
        • Very helpful in producing the following slides :)

  16. Traceroute graph showing alternative paths

  17. Latency graphs with “smoke” and loss colouring

  18. Google moves around a lot! 30ms to >150ms

  19. YouTube can be far away (especially on v6)

  20. F-root redeployment at APE went well

  21. Well, unless you’re these guys...

  22. Or these guys...

  23. Or even us at Waikato...

  24. Trade Me likes to swap datacentres pretty regularly

  25. Traceroute graphs can make pretty patterns

  26. Hosting an AMP monitor
      • http://wand.net.nz/amp/request/monitor

  27. Feedback
      • Public website is at http://amp.wand.net.nz
      • Send us any comments, suggestions, bug reports
        • amp@wand.net.nz
      • Content providers
        • Put yourself forward as a test target
        • http://wand.net.nz/amp/request/target
