SMORE: Semi-Oblivious Traffic Engineering
Praveen Kumar*, Yang Yuan*, Chris Yu‡, Nate Foster*, Robert Kleinberg*, Petr Lapukhov#, Chiun Lin Lim#, Robert Soulé§
* Cornell, ‡ CMU, # Facebook, § USI Lugano
WAN Traffic Engineering
• Objectives: performance (Gbps, latency), robustness, operational simplicity
• Challenges: unstructured topology, heterogeneous capacity, misprediction & traffic bursts, unexpected failures, device limitations, update overheads
TE Approaches
• Traditional: distributed
• SDN-based: centralized
• Optimal TE? (MCF)
[Figure: example topology with links labeled 1 and 100]
Operational Cost of Optimality
• Solver time [Plot: time (seconds) per traffic matrix]
• Path churn [Plot: churn (# paths) per traffic matrix]
Towards a Practical Model
1. Path Selection (static): Topology (+ demands) → Paths. Computing and updating paths is typically expensive and slow.
2. Rate Adaptation (dynamic): Paths + Demands → Splitting Ratio. But updating splitting ratios is cheap and fast!
Path Selection Challenges
• Selecting a good set of paths is tricky!
• Route the demands (ideally, with competitive latency)
• React to changes in demands (diurnal changes, traffic bursts, etc.)
• Be robust under mis-prediction of demands
• Have sufficient extra capacity to route demands in presence of failures
• and more …
Approach
A static set of cleverly-constructed paths can provide near-optimal performance and robustness!
Desired path properties:
• Low stretch for minimizing latency
• High diversity for ensuring robustness
• Good load balancing for performance: capacity aware and globally optimized
Path Properties: Capacity Aware
• Traditional approaches to routing based on shortest paths (e.g., ECMP, KSP) are generally not capacity aware
[Figure: example topology with nodes A to G and a mix of 100 Gbps and 10 Gbps links]
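To make the capacity-awareness point concrete, here is a minimal sketch. The two-path topology, the 40 Gbps demand, and all names (`CAPACITY_GBPS`, `PATHS`, `max_utilization`, `bottleneck`) are illustrative assumptions, not the topology from the slide's figure. It compares an equal ECMP-style split with a split weighted by each path's bottleneck capacity.

```python
# Hypothetical topology: A-B-D uses 100 Gbps links, A-C-D uses 10 Gbps links.
CAPACITY_GBPS = {
    ("A", "B"): 100.0, ("B", "D"): 100.0,
    ("A", "C"): 10.0, ("C", "D"): 10.0,
}
# Both paths have the same hop count, so a shortest-path scheme treats them alike.
PATHS = [("A", "B", "D"), ("A", "C", "D")]


def links(path):
    """Return the directed links traversed by a path."""
    return list(zip(path, path[1:]))


def max_utilization(demand_gbps, split):
    """Max link utilization when the demand is divided across PATHS by split."""
    load = {link: 0.0 for link in CAPACITY_GBPS}
    for path, fraction in zip(PATHS, split):
        for link in links(path):
            load[link] += demand_gbps * fraction
    return max(load[link] / CAPACITY_GBPS[link] for link in load)


def bottleneck(path):
    """Smallest link capacity along a path."""
    return min(CAPACITY_GBPS[link] for link in links(path))


# Capacity-unaware: equal split across equal-cost paths.
ecmp_split = [1 / len(PATHS)] * len(PATHS)

# Capacity-aware (one simple heuristic): split in proportion to bottleneck capacity.
total = sum(bottleneck(p) for p in PATHS)
weighted_split = [bottleneck(p) / total for p in PATHS]
```

With a 40 Gbps demand, `max_utilization(40.0, ecmp_split)` is 2.0 (the 10 Gbps links carry 20 Gbps), while the bottleneck-weighted split keeps every link at roughly 0.36 utilization.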
Path Properties: Globally Optimal
• Other approaches based on greedy algorithms are capacity aware, but are still not globally optimal
[Figure: path assignment on nodes A to G under CSPF, next to the globally optimal assignment]
Path Selection
Properties compared: load balanced (capacity aware, globally optimized), diverse, low-stretch
• SPF / ECMP: ❌ ❌ ❌ ✔
• CSPF: ❌ ❌ ✔ ✔
• k-shortest paths: ❌ ❌ ? ✔
• Edge-disjoint KSP: ❌ ❌ ✔ ✔
• MCF: ❌ ❌ ✔ ✔
• VLB: ❌ ❌ ❌ ✔
• B4: ❌ ✔ ✔ ?
? = Difficult to generalize
Oblivious Routing
VLB
• Route through random intermediate node
• Works well for mesh topologies
• WANs are not mesh-like
• Good resilience
• Poor performance & latency
[Figure: mesh topology with nodes 1 to N, followed by a non-mesh topology]
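The VLB idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions: hop-count shortest paths via BFS, a uniformly random intermediate node, and hypothetical helper names (`shortest_path`, `vlb_path_via`, `vlb_path`). It is not taken from the talk.

```python
import random
from collections import deque


def shortest_path(adj, src, dst):
    """Hop-count shortest path from src to dst using breadth-first search."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        raise ValueError(f"no path from {src} to {dst}")
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def vlb_path_via(adj, src, mid, dst):
    """Two-phase route: src to the intermediate node, then on to dst."""
    first = shortest_path(adj, src, mid)
    second = shortest_path(adj, mid, dst)
    return first + second[1:]  # drop the duplicated intermediate node


def vlb_path(adj, src, dst, rng=random):
    """Pick the intermediate node uniformly at random, as VLB does."""
    mid = rng.choice(sorted(adj))
    return vlb_path_via(adj, src, mid, dst)


# In a full mesh every detour costs at most one extra hop.
mesh = {n: [m for m in range(1, 5) if m != n] for n in range(1, 5)}

# In a non-mesh (here, a line) the detour can be long.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
```

On the line topology, `vlb_path_via(line, "A", "E", "B")` takes seven hops to reach a neighbor that is one hop away, which is the latency problem the slide points to for non-mesh WANs.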
Oblivious [Räcke '08]
• Generalizes VLB to non-mesh
• Distribution over routing trees
• Approximation algorithm for low-stretch trees [FRT '04]
• Penalize links based on usage
• O(log n) competitive
[Figure: non-mesh topology and a set of low-stretch routing trees, each with a probability]
Path Selection
Properties compared: load balanced (capacity aware, globally optimized), diverse, low-stretch
• SPF / ECMP: ❌ ❌ ❌ ✔
• CSPF: ❌ ❌ ✔ ✔
• k-shortest paths: ❌ ❌ ? ✔
• Edge-disjoint KSP: ❌ ❌ ✔ ✔
• MCF: ❌ ❌ ✔ ✔
• VLB: ❌ ❌ ❌ ✔
• B4: ❌ ✔ ✔ ?
• SMORE / Oblivious: ✔ ✔ ✔ ✔
SMORE: Semi-Oblivious Routing
• Path Selection: Oblivious Routing computes a set of paths which are low-stretch, robust and have good load balancing properties
• Rate Adaptation: LP Optimizer balances load by dynamically adjusting splitting ratios used to map incoming traffic flows to paths
Semi-Oblivious Traffic Engineering: The Road Not Taken [NSDI '18]
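One common way to write a rate-adaptation LP over a fixed path set is to minimize the maximum link utilization. The slide only says that an LP optimizer adjusts splitting ratios, so the objective and all symbols below are assumptions for illustration: $P_{st}$ is the precomputed path set for the pair $(s,t)$, $d_{st}$ its demand, $c_e$ the capacity of link $e$, $x_p$ the splitting ratio of path $p$, and $Z$ the maximum link utilization.

```latex
\begin{aligned}
\min_{x,\,Z} \quad & Z \\
\text{s.t.} \quad
  & \sum_{p \in P_{st}} x_p = 1
    && \forall (s,t) \\
  & \sum_{(s,t)} \; \sum_{\substack{p \in P_{st} \\ e \in p}} d_{st}\, x_p \;\le\; Z\, c_e
    && \forall e \\
  & x_p \ge 0
    && \forall p
\end{aligned}
```

Because the paths $P_{st}$ are fixed inputs, only the ratios $x_p$ change when demands change, which matches the static path selection and dynamic rate adaptation split described earlier in the talk.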
Semi-Oblivious Routing in Practice?
• Previous work [Hajiaghayi et al.] established a worst-case competitive ratio that is not much better than oblivious routing: Ω(log n / log log n)
• But the real world does not typically exhibit worst-case scenarios
• Implicit correlation between demands and link capacities
Question: How well does semi-oblivious routing perform in practice?
Evaluation
Facebook's Backbone Network
YATES: Rapid Prototyping for Traffic Engineering Systems [SOSR '18]
Source: https://research.fb.com/robust-and-efficient-traffic-engineering-with-oblivious-routing/
Performance
[Plots: throughput, congestion drop, and max. link utilization over time]
Robustness
[Plots: throughput, congestion drop, max. link utilization, and failure drop over time]