Resilient Overlay Networks
CS294-4 Presentation
Nikita Borisov
Sep 15, 2003

Internet Routing Inefficient
• BGP is designed for scalability, sacrificing performance
• Link outages common, but routing tables take minutes to update
• Summarized data creates inefficient paths
• No response to congestion

Network Redundancies
• Multiple paths exist between most hosts
  – Many are not advertised due to private peering
• Link outages lead to non-transitive reachability
  – A and C can’t reach each other but B can reach them both
• Indirect paths often offer better performance
  – (though possibly violate AUPs)

RON goals
• Fast failure detection and recovery
  – Seconds, not minutes
• Integration with application
  – Optimize routes for latency, throughput, etc.
• Fine-grained policy specification
  – E.g. keep commercial traffic off Internet2

Overlay Network
• Small network - 3-50 nodes
• Continuous measurement of each pairwise link
• Connectivity/performance stats distributed globally
• Pick best path out of direct and indirect ones
  – Restrict search to one indirect hop

Failure Detection
• Active monitoring
  – Send probes on each virtual link
  – One probe every 14s
  – Fast timeout probes if one is lost
• Detect failure in under 20s
  – Faster than any TCP timeout
  – Good enough for even human scale

Performance Metrics
• Estimate latency based on RTT of probes
  – Moving weighted average
  – Assume latency is symmetric
• Estimate loss rate based on probes received
  – Average of last 100 samples
• Estimate TCP throughput
  – Model TCP performance based on latency and loss rate
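
The three estimators on the Performance Metrics slide can be sketched roughly as follows. This is an illustration under stated assumptions, not RON's actual code: the class and function names, the weight `alpha`, and the `min_loss` floor are all hypothetical, and the throughput formula is the common simplified TCP model (throughput proportional to 1/(RTT * sqrt(loss))), which the slides do not name explicitly.

```python
from collections import deque
from math import sqrt


class LinkEstimator:
    """Per-virtual-link statistics built from probe results (illustrative)."""

    def __init__(self, alpha=0.9, window=100):
        self.alpha = alpha  # assumed weight kept on the old latency estimate
        self.latency = None  # one-way latency estimate in seconds
        self.outcomes = deque(maxlen=window)  # True if the probe was answered

    def record(self, rtt=None):
        """Record one probe result; rtt is None when the probe was lost."""
        self.outcomes.append(rtt is not None)
        if rtt is None:
            return
        sample = rtt / 2.0  # latency assumed symmetric, so half the RTT
        if self.latency is None:
            self.latency = sample
        else:
            self.latency = self.alpha * self.latency + (1 - self.alpha) * sample

    def loss_rate(self):
        """Fraction of lost probes among the last `window` samples."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)


def tcp_score(rtt, loss, min_loss=1e-3):
    """Simplified TCP model: score proportional to 1 / (rtt * sqrt(loss))."""
    p = max(loss, min_loss)  # assumed floor so a loss-free link does not divide by zero
    return sqrt(1.5) / (rtt * sqrt(p))
```

The score is only meaningful for comparing candidate paths against each other, which fits the "avoid bad-throughput routes" use rather than predicting absolute throughput.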

Path Selection
• Always route around outages
• Application can optimize for latency, loss rate, throughput
  – Throughput hard to optimize
  – Avoid bad-throughput routes instead
  – Exhaustively search all one-hop paths
• Introduce hysteresis to prevent “route flapping”

Routing Policy
• Policies specify which virtual links to use
• Separate routing tables per policy
• Packets classified with policy tag and routed accordingly
• Sample policy: exclusive clique
  – Only members of clique can use links between each other
  – E.g. Internet2 hosts

Measurements
• Two studies (RON1 and RON2)
• RON recovers from 100% (RON1) or 60% (RON2) outages and high loss rates
• Routes around bad throughput failures
  – Doubles TCP throughput in 5% of all samples
• Reduces loss rate by 0.05 in 5% of samples

Performance Problems
• RON worse in some cases
  – Measurement inaccuracies
  – Information propagation delays
  – Hysteresis
• But …
  – RON win in most cases
  – RON loss never very large
  – RON win, though, can be dramatic
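
As a rough illustration of the Path Selection slide (exhaustive one-hop search plus hysteresis), here is a minimal sketch that optimizes latency. The function names, the `latency` dictionary layout, and the 5% switching `margin` are assumptions made for the example; the slides do not give RON's actual threshold or data structures.

```python
INF = float("inf")


def path_cost(path, latency):
    """Sum the latency of each virtual link on the path; a missing link counts as down."""
    return sum(latency.get((a, b), INF) for a, b in zip(path, path[1:]))


def best_path(src, dst, nodes, latency):
    """Compare the direct path with every one-hop detour and return (cost, path)."""
    best = (path_cost((src, dst), latency), (src, dst))
    for hop in nodes:
        if hop in (src, dst):
            continue
        candidate = (src, hop, dst)
        cost = path_cost(candidate, latency)
        if cost < best[0]:
            best = (cost, candidate)
    return best


def choose_path(current, src, dst, nodes, latency, margin=0.05):
    """Hysteresis: leave the current path only for a clearly better one.

    `margin` is an assumed fractional improvement required before switching.
    A current path with an outage has infinite cost, so any working path wins.
    """
    cost, path = best_path(src, dst, nodes, latency)
    if current is None:
        return path
    if cost < path_cost(current, latency) * (1 - margin):
        return path
    return current
```

With N nodes the search examines N - 2 detours per destination, which is one reason restricting to a single indirect hop stays cheap at the 3-50 node scale.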

Overhead
• Probing traffic - grows O(N)
• Routing state traffic - grows O(N^2)
• Total BW consumed
  – 2.2Kbps with 10 nodes
  – 33Kbps with 50 nodes
• A limiting factor for scaling

Question
• Is this overhead excessive?
  – Less than 10% of a broadband link
• What if RONs become more popular?
• Is using a RON “cheating”?

Applications
• Videoconferencing
• Cooperating ISPs
• Branch offices of companies
• Others?

Discussion