  1. Best-Path vs. Multi-Path Overlay Routing
     David G. Andersen (MIT), Alex C. Snoeren (UCSD), Hari Balakrishnan (MIT)
     October 2003
     http://nms.lcs.mit.edu/ron/

  2. Overview
     Best-path vs. redundant overlay routing
     • What tactics work best to
       – Reduce loss?
       – Reduce latency?
       – Avoid outages?
     • In what circumstances do they perform best?
     • Implications for new strategies

  3. Context: Reliability via Path Diversity
     [Figure: network with backup links between sites]
     • Backup links provide alternatives
     ➔ Mechanisms for obtaining diversity (existing diversity)
     ➔ Mechanisms for using diversity (overlay techniques)

  4. Obtaining Diversity
     [Figure: two network diagrams contrasting the approaches]
     • Engineered diversity
     • Exploiting existing diversity

  5. Existing AS-level Redundancy
     • Traceroute between 12 hosts, showing Autonomous Systems (ASes)
     [Figure: AS-level graph of the traceroutes, spanning hosts such as MIT, CMU, Utah, NYU, Cornell, VU-NL, Aros, CCI, MA-Cable, CA-T1, UTREP, and Sightpath, the ASes between them (AS1, AS3, AS701, AS1239, AS3356, AS7018, Abilene, vBNS, NYSERNet, ...), and known private peerings]

  6. Exploiting Diversity via Overlays
     [Figure: overlay nodes forwarding traffic through one another]
     • Send packets through cooperating peers
     • End-hosts only, no network support

  7. Exploiting Diversity via Overlays
     Reactive routing (probes and routing updates):
     • Probe paths
     • Route via the best path
     • Examples: RON (SOSP’01), Detour

  8. Exploiting Diversity via Overlays
     Reactive routing (probes and routing updates):
     • Probe paths
     • Route via the best path
     Redundant routing:
     • Send along parallel paths
     • No probing
     • Mesh routing

  9. Reactive vs. Redundant Routing
     [Figure: % of capacity used by data vs. desired loss-rate improvement; probe/redundant traffic competes with data traffic up to the capacity limit]
     • Capacity limits probing and redundancy

  10. Reactive vs. Redundant Routing
      [Figure: same plot, annotated with the best-path limit and the expected-independence limit]
      • Reactive limit: best-path performance
      • Redundant limit: path independence

  11. Reactive vs. Redundant Routing
      [Figure: same plot, with the reactive and redundant curves added]
      • Reactive limit: best-path performance
      • Redundant limit: path independence

  12. Reactive vs. Redundant Routing
      [Figure: same plot, with the reactive and redundant curves]
      • Reactive limit: best-path performance
      • Redundant limit: path independence
      • Overhead scaling: throughput vs. nodes

  13. 8 Routing Methods
      Direct          Single packet, direct path
      Direct Direct   2 packets, direct, no spacing
      DD 10ms         2 packets, direct, 10ms spacing
      DD 20ms         2 packets, direct, 20ms spacing

  14. 8 Routing Methods
      Direct          Single packet, direct path
      Direct Direct   2 packets, direct, no spacing
      DD 10ms         2 packets, direct, 10ms spacing
      DD 20ms         2 packets, direct, 20ms spacing
      Lat             Reactive routing, min latency
      Loss            Reactive routing, min loss

  15. 8 Routing Methods
      Direct          Single packet, direct path
      Direct Direct   2 packets, direct, no spacing
      DD 10ms         2 packets, direct, 10ms spacing
      DD 20ms         2 packets, direct, 20ms spacing
      Lat             Reactive routing, min latency
      Loss            Reactive routing, min loss
      Direct Rand     2 packets, redundant routing, simplest

  16. 8 Routing Methods
      Direct          Single packet, direct path
      Direct Direct   2 packets, direct, no spacing
      DD 10ms         2 packets, direct, 10ms spacing
      DD 20ms         2 packets, direct, 20ms spacing
      Lat             Reactive routing, min latency
      Loss            Reactive routing, min loss
      Direct Rand     2 packets, redundant routing, simplest
      Lat Loss        2 packets, reactive + redundant (falls back to random)
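  A minimal sketch (in Python, not the experiment code) of how these eight methods map onto packets sent; send(), forward_via(), best_path_by(), and the peer names are hypothetical stubs:

      # Illustrative mapping from the eight routing methods to the packets they emit.
      import random
      import time

      PEERS = ["ron1", "ron2", "ron3"]            # hypothetical overlay peers

      def send(dest, pkt):                        # direct send (stub)
          print(f"sending {pkt!r} directly to {dest}")

      def forward_via(peer, dest, pkt):           # one overlay hop (stub)
          print(f"sending {pkt!r} to {dest} via {peer}")

      def best_path_by(metric, dest):             # reactive routing table (stub)
          return "ron1"                           # pretend probing picked ron1

      def route(method, dest, pkt):
          if method == "direct":                  # single packet, direct path
              send(dest, pkt)
          elif method == "direct direct":         # duplicate, no spacing
              send(dest, pkt)
              send(dest, pkt)
          elif method in ("dd 10ms", "dd 20ms"):  # duplicate, 10/20 ms apart
              send(dest, pkt)
              time.sleep(0.010 if method == "dd 10ms" else 0.020)
              send(dest, pkt)
          elif method in ("lat", "loss"):         # reactive: best probed path
              forward_via(best_path_by(method, dest), dest, pkt)
          elif method == "direct rand":           # redundant: direct + random peer
              send(dest, pkt)
              forward_via(random.choice(PEERS), dest, pkt)
          elif method == "lat loss":              # reactive + redundant hybrid
              lat, loss = best_path_by("lat", dest), best_path_by("loss", dest)
              if lat == loss:                     # falls back to a random second path
                  loss = random.choice(PEERS)
              forward_via(lat, dest, pkt)
              forward_via(loss, dest, pkt)

      route("direct rand", "cornell", b"probe-1")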

  17. Probing on Internet Testbed
      Each node repeats:
      1. Pick a random node j
      2. Pick one of the 8 routing types (direct, loss, lat, etc.) in round-robin order; send to j
      3. Delay for a random interval in [0.6s, 1.2s]
      Probes are one-way, recorded at sender and receiver.
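  A sketch of this measurement loop, assuming a hypothetical route() helper like the one above plus a log() callback; the node names are placeholders:

      # Each testbed node runs a loop like this; the receiver separately logs arrivals.
      import random
      import time

      METHODS = ["direct", "direct direct", "dd 10ms", "dd 20ms",
                 "lat", "loss", "direct rand", "lat loss"]
      NODES = ["mit", "cmu", "utah", "nyu", "cornell"]              # placeholder node list

      def measurement_loop(self_name, route, log):
          seq = 0
          while True:
              j = random.choice([n for n in NODES if n != self_name])  # 1. random target
              method = METHODS[seq % len(METHODS)]                     # 2. round-robin type
              log("sent", self_name, j, method, seq, time.time())      # recorded at sender
              route(method, j, f"probe-{seq}".encode())                # one-way probe
              time.sleep(random.uniform(0.6, 1.2))                     # 3. random delay
              seq += 1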

  18. Datasets From Internet Deployment
      Dataset      Nodes   Time      Measurements
      RON wide     17      5 days    4.7M
      RON narrow   17      3 days    2.8M
      RON 2003     30      14 days   32.6M
      ✔ Variety of network types and bandwidths: 5 int’l, 3 Cable/DSL, 7 universities, ...
      ✔ N² path scaling: ~900 paths

  19. One-way Loss Rates Are Low
      [Figure: CDF of the fraction of paths vs. average path-wide loss rate (%), 2002 and 2003 datasets; 90% of paths are under a 1% loss rate]
      • Overall loss is 0.42% in 2003
      • Includes quiescent periods
      • Outages still (painfully) apparent

  20. Duplication Reduces Overall Loss
      Type            Loss %
      Direct          0.42
      Direct Direct   0.30
      DD 10ms         0.27
      DD 20ms         0.27

  21. Duplication Reduces Overall Loss
      Type            Loss %
      Direct          0.42
      Direct Direct   0.30
      DD 10ms         0.27
      DD 20ms         0.27
      Lat             0.43
      Loss            0.33
      Direct Rand     0.26
      Lat Loss        0.23

  22. Loss Probabilities Sanity Check
      • 0.42% loss << [Paxson 94, 95] (2.8%, 5%)
      • Unloaded paths vs. paths loaded by a TCP transfer
      • Conditional loss probabilities are similar:
        Study                     P(lose P2 | lost P1)
        Paxson TCP                ~50%
        Bolot, 8ms spacing        60%
        RON 2003, no spacing      72%
        RON 2003, 20ms spacing    65%
        RON 2003, direct rand     62%
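  A quick back-of-the-envelope check (mine, not from the slides) that ties these conditional probabilities to the duplication results on slide 21:

      # With correlated losses, duplicating a packet loses both copies roughly
      # p * P(lose P2 | lost P1) of the time, far from the independence limit p^2.
      p = 0.0042        # overall one-way loss rate (slide 19)
      p_cond = 0.72     # P(lose P2 | lost P1), RON 2003, no spacing (slide 22)

      print(f"independent duplicates: {p * p:.6%}")       # ~0.0018%, the unreachable limit
      print(f"correlated duplicates:  {p * p_cond:.2%}")  # ~0.30%, matches "Direct Direct"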

  23. Latency Improvements
      [Figure: CDF of the fraction of paths vs. latency (ms); 5% of connections exhibit a large latency improvement]
      Mean latency:
        Lat Loss      46.8 ms
        Lat           48.0 ms
        Direct Rand   51.7 ms
        Direct        54.1 ms
      Unlike loss, most latency comes from specific bad paths

  24. # High-Loss Periods (1 hr, normalized)
      Type            > 0%
      Direct          1 (8817)
      Direct Direct   0.59
      DD 20ms         0.43
      Lat             1.2    ← worse than naive duplication for low-loss situations
      Loss            0.80
      Direct Rand     0.44
      Lat Loss        0.38

  25. # High-Loss Periods (1 hr, normalized)
      Type            > 0%       > 30%
      Direct          1 (8817)   1 (630)
      Direct Direct   0.59       0.93
      DD 20ms         0.43       0.91
      Lat             1.2        0.96   ← on par
      Loss            0.80       0.91
      Direct Rand     0.44       0.92
      Lat Loss        0.38       0.89

  26. # High-Loss Periods (1 hr, normalized)
      Type            > 0%       > 30%     > 60%
      Direct          1 (8817)   1 (630)   1 (255)
      Direct Direct   0.59       0.93      0.98
      DD 20ms         0.43       0.91      0.98
      Lat             1.2        0.96      0.91
      Loss            0.80       0.91      0.92 ★
      Direct Rand     0.44       0.92      0.84 ★
      Lat Loss        0.38       0.89      0.86 ★

  27. Measurement Summary
      ✔ Redundant beats reactive for low loss
        – “Meshing” beats controls during outages
      ✔ Reactive finds specific good paths
        – Latency improvements
        – Low-loss paths
      ✘ No overlay technique comes near independent paths
        – Hypothesis: access-link failures
        – More severe outages are harder to correct

  28. Why Not FEC?
      Redundancy assumption: fast recovery, low rate
      • 0.42% loss rate → need little redundancy
      [Figure: timeline in which the 1st packet is lost and FEC recovery arrives only ~100 packets later]
      • Failure losses are bursty (≥ 0.5 conditional loss)
      ✘ Would have to spread FEC over even more packets
      ➔ For latency-critical traffic: a 2-redundant mesh
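  Illustrative arithmetic behind this argument (my numbers, not the authors’); the block size k is an assumption chosen to match the “~100 packets” figure above:

      p = 0.0042       # one-way loss rate (slide 19)
      p_cond = 0.72    # conditional loss probability (slide 22)
      k = 100          # data packets per FEC block (assumed)

      print("FEC overhead with one parity per block:", 1 / k)        # 1% extra traffic
      print("recovery wait if the 1st packet is lost:", k, "packets")

      # Burstiness hurts: once one packet in a block is lost, the next is lost
      # with probability ~p_cond, so a single-parity block that sees any loss
      # often contains a second loss it cannot repair.
      p_block_sees_loss = 1 - (1 - p) ** k       # ~34%, assuming independent losses
      print("blocks with at least one loss:", round(p_block_sees_loss, 2))
      print("lower bound on a second loss in such a block:", p_cond)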

  29. Conclusions
      • Loss rate for low-rate traffic is low (0.42%)
      • Conditional loss probability is high (0.72), even for the random mesh (0.62)
      • 40-60% of loss is avoidable
      ✔ Redundant: reducing already-low loss rates
      ✔ Reactive: avoiding high loss and latency
      ➔ Low loss suggests a selective approach ...

  30. Future Work
      Strategies for avoiding losses and outages:
      • Selective redundancy: protecting SYNs, etc. (shameless plug: currently implementing)
      • Selective probing: activate on first loss
      Measurements:
      • Impact of engineered network redundancy? (testing now, looking for multihomed sites)
      http://nms.lcs.mit.edu/ron/

  31. Scaling
      • Reactive: overhead scales with the number of nodes
      • Redundant: overhead scales with traffic volume

  32. Best Path Scaling
      Routing and probing add packets: responsiveness vs. overhead vs. size
      [Figure: overhead (bits/second) vs. number of nodes; roughly 2.2 Kbps at 10 nodes, 13.3 Kbps at 30 nodes, and 33 Kbps at 50 nodes]
      • 50 nodes is near the limit, but enough for many apps
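  A rough per-node overhead estimate, a sketch that combines the probing parameters from slide 34 with assumed packet and routing-entry sizes (the sizes are my guesses, tuned to land near the plotted curve, not values from the talk):

      def overhead_kbps(n, probe_bytes=69, entry_bytes=20, interval_s=14.0):
          peers = n - 1
          probing = peers * 3 * probe_bytes * 8 / interval_s   # 3-packet probes: O(N) per node
          updates = peers * n * entry_bytes * 8 / interval_s   # full table to each peer: O(N^2)
          return (probing + updates) / 1000

      for n in (10, 30, 50):
          print(n, "nodes:", round(overhead_kbps(n), 1), "Kbps")
      # prints roughly 2.1, 13.4, 33.8 Kbps -- in the ballpark of the 2.2 / 13.3 / 33 above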

  33. Best Path Routing
      [Figure: overlay nodes exchanging probes and routing updates]
      • Frequently measure all inter-node paths
      • Exchange routing information
      • Route along the app-specific best path consistent with routing policy
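  A sketch of app-specific best-path selection over the direct path and one-hop overlay alternatives, given probe-derived per-link metrics; the metric tables and combination rules are illustrative assumptions, and policy filtering is omitted:

      # Latency adds along a path; loss combines as 1 - prod(1 - loss_link).
      LAT = {("A", "B"): 80, ("A", "C"): 20, ("C", "B"): 30}           # ms, from probes
      LOSS = {("A", "B"): 0.02, ("A", "C"): 0.001, ("C", "B"): 0.002}

      def one_hop_paths(src, dst, nodes=("A", "B", "C")):
          yield (src, dst)                                   # direct path
          for hop in nodes:
              if hop not in (src, dst):
                  yield (src, hop, dst)                      # single-intermediate path

      def path_metric(path, metric):
          links = list(zip(path, path[1:]))
          if metric == "lat":
              return sum(LAT[l] for l in links)
          delivered = 1.0
          for l in links:
              delivered *= 1 - LOSS[l]
          return 1 - delivered

      def best_path(src, dst, metric):                       # app-specific choice
          return min(one_hop_paths(src, dst), key=lambda p: path_metric(p, metric))

      print(best_path("A", "B", "lat"))                      # ('A', 'C', 'B'): 50 ms beats 80 ms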

  34. Probing and Outage Detection
      [Figure: three-packet probe exchange between Node A and Node B — Initial Ping 1 (ID 5, sent at time 10), Response 1 (ID 5, sent at time 33, received at time 15), Response 2 (ID 5, received at time 39); Node A records “success” with RTT 5, Node B records “success” with RTT 6]
      • Probe every random(14) seconds
      • 3 packets; both sides get RTT and reachability
      • If a probe is lost, send the next immediately; timeout based on RTT and RTT variance
      • If N probes are lost, notify an outage
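  A sketch of the per-peer probing and outage-detection logic described above; the EWMA constants, the N=3 threshold, and reading random(14) as “up to 14 seconds” are assumptions:

      import random

      class PathProber:
          OUTAGE_THRESHOLD = 3                        # N consecutive lost probes (assumed)

          def __init__(self, send_probe):
              self.send_probe = send_probe            # runs the 3-packet exchange, returns RTT or None
              self.srtt, self.rttvar = 0.1, 0.05      # seconds of EWMA state
              self.lost_in_a_row = 0

          def timeout(self):
              return self.srtt + 4 * self.rttvar      # timeout from RTT and RTT variance

          def on_success(self, rtt):
              self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
              self.srtt = 0.875 * self.srtt + 0.125 * rtt
              self.lost_in_a_row = 0

          def on_loss(self):
              self.lost_in_a_row += 1
              if self.lost_in_a_row >= self.OUTAGE_THRESHOLD:
                  print("outage: notify the routing layer")

          def probe_once(self):
              rtt = self.send_probe(self.timeout())   # None means the probe timed out
              if rtt is None:
                  self.on_loss()
                  return 0.0                          # lost probe: send the next immediately
              self.on_success(rtt)
              return random.uniform(0, 14)            # otherwise wait random(14) seconds

  The caller would loop over probe_once() for each peer, sleeping for the returned interval between probes.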

  35. Architecture: Probing
      [Figure: overlay nodes probing one another]
      ➔ Probe between nodes, determine path qualities
      – O(N²) probe traffic with active probes
      – Passive measurements
