SMORE: Semi-Oblivious Traffic Engineering


  1. SMORE: Semi-Oblivious Traffic Engineering. Praveen Kumar*, Yang Yuan*, Chris Yu‡, Nate Foster*,
 Robert Kleinberg*, Petr Lapukhov#, Chiun Lin Lim#, Robert Soulé§ (* Cornell, ‡ CMU, # Facebook, § USI Lugano)

  2. WAN Traffic Engineering

  3. WAN Traffic Engineering: Objectives and Challenges. Objectives: performance (Gbps), robustness, operational simplicity, latency.

  4. WAN Traffic Engineering: Objectives and Challenges. Objectives: performance (Gbps), robustness, operational simplicity, latency. Challenges: unstructured topology, heterogeneous capacity, misprediction and traffic bursts, unexpected failures, device limitations, update overheads.

  5. TE Approaches: Traditional (distributed) vs. SDN-based (centralized). [Figure: example topology with edge labels of 1 and 100]

  9. TE Approaches: Traditional (distributed) vs. SDN-based (centralized). Optimal TE? (MCF) [Figure: example topology with edge labels of 1 and 100]
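
The slide names MCF (multi-commodity flow) as the optimal TE baseline but does not spell out the program. A common edge-based form is sketched here, assuming the objective is to minimize the maximum link utilization $Z$. The notation is an assumption for illustration: graph $(V, E)$, link capacity $c_e$, demand $d_{st}$ from $s$ to $t$, flow $f^{st}_e$ of that demand on link $e$, and $\delta^{+}(v)$, $\delta^{-}(v)$ for the links leaving and entering node $v$.

```latex
\begin{align*}
\min_{f,\,Z}\quad & Z \\
\text{s.t.}\quad
& \sum_{e \in \delta^{+}(v)} f^{st}_{e} - \sum_{e \in \delta^{-}(v)} f^{st}_{e} =
  \begin{cases} d_{st} & v = s \\ -d_{st} & v = t \\ 0 & \text{otherwise} \end{cases}
  && \forall (s,t),\ \forall v \in V \\
& \sum_{(s,t)} f^{st}_{e} \le Z \, c_{e} && \forall e \in E \\
& f^{st}_{e} \ge 0 && \forall (s,t),\ \forall e \in E
\end{align*}
```

Because the flow variables are re-solved from scratch for every traffic matrix, both the solver time and the set of paths carrying flow can change between runs.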

  10. Operational Cost of Optimality: Solver Time. [Plot: solver time (seconds) per traffic matrix]

  11. Operational Cost of Optimality: Path Churn. [Plot: churn (# paths) per traffic matrix]

  12. Towards a Practical Model. (1) Path Selection: topology (+ demands) → paths. (2) Rate Adaptation: paths + demands → splitting ratios.

  13. Towards a Practical Model. (1) Path Selection: topology (+ demands) → paths. Computing and updating paths is typically expensive and slow. (2) Rate Adaptation: paths + demands → splitting ratios. But updating splitting ratios is cheap and fast!

  14. Towards a Practical Model. (1) Path Selection (static): topology (+ demands) → paths. Computing and updating paths is typically expensive and slow. (2) Rate Adaptation (dynamic): paths + demands → splitting ratios. But updating splitting ratios is cheap and fast!
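
The two-phase split can be sketched in a few lines of Python. Everything here is a hypothetical toy: the four-link topology, the fixed `PATHS` table, and the function names are assumptions for illustration, and the brute-force grid search in `adapt_rates` is only a stand-in for a real optimizer. The point is that the path set is chosen once, while only the splitting ratios are recomputed when demands change.

```python
from itertools import product

# Hypothetical toy network: directed links with capacities (arbitrary units).
CAPACITY = {("A", "B"): 10.0, ("B", "D"): 10.0, ("A", "C"): 10.0, ("C", "D"): 10.0}

# Phase 1 (static): paths are selected once and then left alone.
PATHS = {
    ("A", "D"): [["A", "B", "D"], ["A", "C", "D"]],
    ("B", "D"): [["B", "D"]],
}

def max_link_utilization(paths, splits, demands, capacity):
    """Route each demand over its paths by splitting ratio; return the worst link utilization."""
    load = {link: 0.0 for link in capacity}
    for pair, demand in demands.items():
        for path, ratio in zip(paths[pair], splits[pair]):
            for link in zip(path, path[1:]):
                load[link] += demand * ratio
    return max(load[link] / capacity[link] for link in capacity)

def candidate_splits(num_paths, steps):
    """Grid of splitting ratios; this sketch only handles one or two paths per pair."""
    if num_paths == 1:
        return [(1.0,)]
    if num_paths == 2:
        return [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    raise ValueError("sketch handles at most two paths per pair")

def adapt_rates(paths, demands, capacity, steps=20):
    """Phase 2 (dynamic): search splitting ratios for the current demands; paths stay fixed."""
    pairs = list(demands)
    best_splits, best_mlu = None, float("inf")
    grids = [candidate_splits(len(paths[pair]), steps) for pair in pairs]
    for combo in product(*grids):
        splits = dict(zip(pairs, combo))
        mlu = max_link_utilization(paths, splits, demands, capacity)
        if mlu < best_mlu:
            best_splits, best_mlu = splits, mlu
    return best_splits, best_mlu

if __name__ == "__main__":
    for demands in ({("A", "D"): 10.0, ("B", "D"): 0.0},
                    {("A", "D"): 10.0, ("B", "D"): 5.0}):
        print(demands, adapt_rates(PATHS, demands, CAPACITY))
```

When the B to D demand appears, the search shifts A to D traffic away from the shared link B to D without touching `PATHS`.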

  15. Path Selection Challenges • Selecting a good set of paths is tricky! • Route the demands (ideally, with competitive latency) • React to changes in demands (diurnal changes, traffic bursts, etc.) • Be robust under misprediction of demands • Have sufficient extra capacity to route demands in the presence of failures • and more …

  16. Approach: A static set of cleverly-constructed paths can provide near-optimal performance and robustness! Desired path properties: • Low stretch for minimizing latency • High diversity for ensuring robustness • Good load balancing for performance: capacity aware and globally optimized

  17. Path Properties: Capacity Aware • Traditional approaches to routing based on shortest paths (e.g., ECMP, KSP) are generally not capacity aware [Figure: topology with nodes A to G and a mix of 100 Gbps and 10 Gbps links]
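
A small arithmetic sketch shows why ignoring capacity hurts. The numbers are hypothetical (two equal-length paths whose bottleneck links are 100 Gbps and 10 Gbps, and a 55 Gbps demand), and the capacity-proportional split is just one simple capacity-aware choice used for contrast, not the method from the talk.

```python
def equal_split(num_paths):
    """ECMP-style split: every path gets the same share, regardless of capacity."""
    return [1.0 / num_paths] * num_paths

def capacity_proportional_split(bottlenecks_gbps):
    """A simple capacity-aware alternative: share proportional to bottleneck capacity."""
    total = sum(bottlenecks_gbps)
    return [cap / total for cap in bottlenecks_gbps]

def path_utilizations(demand_gbps, bottlenecks_gbps, ratios):
    """Utilization of each path's bottleneck link under the given splitting ratios."""
    return [demand_gbps * ratio / cap for ratio, cap in zip(ratios, bottlenecks_gbps)]

if __name__ == "__main__":
    bottlenecks = [100.0, 10.0]  # hypothetical bottleneck capacities in Gbps
    demand = 55.0                # hypothetical demand in Gbps
    print("equal split:       ", path_utilizations(demand, bottlenecks, equal_split(2)))
    print("proportional split:", path_utilizations(
        demand, bottlenecks, capacity_proportional_split(bottlenecks)))
```

With the equal split, 27.5 Gbps lands on the path limited to 10 Gbps, so that link is heavily oversubscribed while the 100 Gbps path sits mostly idle. The proportional split loads both bottlenecks to about half of their capacity.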

  20. Path Properties: Globally Optimal. Other approaches based on greedy algorithms are capacity aware, but are still not globally optimal. [Figure: path placement on the A to G topology, CSPF vs. globally optimal]

  25. Path Selection (? = difficult to generalize)
 Algorithm | Load balanced: capacity aware | Load balanced: globally optimized | Diverse | Low-stretch
 SPF / ECMP | ❌ | ❌ | ❌ | ✔
 CSPF | ✔ | ❌ | ❌ | ✔
 k-shortest paths | ❌ | ❌ | ? | ✔
 Edge-disjoint KSP | ❌ | ❌ | ✔ | ✔
 MCF | ✔ | ✔ | ❌ | ❌
 VLB | ❌ | ❌ | ✔ | ❌
 B4 | ✔ | ✔ | ❌ | ?

  31. Oblivious Routing

  32. VLB (mesh) • Route through random intermediate node • Works well for mesh topologies • WANs are not mesh-like • Good resilience • Poor performance & latency [Figure: full mesh of nodes 1, 2, 3, 4, …, N]

  34. VLB (not mesh) • Route through random intermediate node • Works well for mesh topologies • WANs are not mesh-like • Good resilience • Poor performance & latency [Figure: non-mesh topology]
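
The "route through a random intermediate node" idea can be sketched as follows. This is a minimal illustration, not code from the talk: the adjacency-list format, the function names, and the use of hop-count shortest paths for each leg are all assumptions.

```python
import random
from collections import deque

def shortest_path(adj, src, dst):
    """Hop-count shortest path by breadth-first search over an adjacency-list graph."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    raise ValueError("no path from %r to %r" % (src, dst))

def vlb_path(adj, src, dst, rng=random):
    """Pick a uniformly random intermediate node, then route src -> mid -> dst."""
    mid = rng.choice(sorted(adj))
    first_leg = shortest_path(adj, src, mid)
    second_leg = shortest_path(adj, mid, dst)
    return first_leg + second_leg[1:]  # drop the repeated intermediate node

if __name__ == "__main__":
    mesh = {n: [m for m in range(1, 5) if m != n] for n in range(1, 5)}
    line = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    rng = random.Random(0)
    print("mesh:", vlb_path(mesh, 1, 4, rng))
    print("line:", vlb_path(line, 1, 2, rng))
```

In the full mesh every detour costs at most one extra hop. On the line topology a path from node 1 to its neighbor 2 can bounce out to the far end and back, which is the stretch and wasted capacity the slide refers to for non-mesh WANs.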

  36. Oblivious [Räcke ’08] (not mesh) • Generalizes VLB to non-mesh • Distribution over routing trees • Approximation algorithm for low-stretch trees [FRT ’04] • Penalize links based on usage • O(log n) competitive [Figure: low-stretch routing trees, each with a probability]
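
For reference, "O(log n) competitive" can be read against the usual definition of the competitive ratio of an oblivious routing scheme. The notation below is an assumption for illustration: $R$ is a routing fixed in advance of the demands, $D$ is a demand matrix, $c_e$ is the capacity of link $e$, and $R'$ ranges over all routings, including ones tailored to $D$.

```latex
\[
\mathrm{cong}(R, D) = \max_{e \in E} \frac{\mathrm{load}_{e}(R, D)}{c_{e}},
\qquad
\mathrm{ratio}(R) = \max_{D \neq 0} \frac{\mathrm{cong}(R, D)}{\min_{R'} \mathrm{cong}(R', D)}
\]
```

Under this reading, the bullet says that a routing with $\mathrm{ratio}(R) = O(\log n)$ exists for any network with $n$ nodes.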

  39. Path Selection (? = difficult to generalize)
 Algorithm | Load balanced: capacity aware | Load balanced: globally optimized | Diverse | Low-stretch
 SPF / ECMP | ❌ | ❌ | ❌ | ✔
 CSPF | ✔ | ❌ | ❌ | ✔
 k-shortest paths | ❌ | ❌ | ? | ✔
 Edge-disjoint KSP | ❌ | ❌ | ✔ | ✔
 MCF | ✔ | ✔ | ❌ | ❌
 VLB | ❌ | ❌ | ✔ | ❌
 B4 | ✔ | ✔ | ❌ | ?
 SMORE / Oblivious | ✔ | ✔ | ✔ | ✔

  40. SMORE: Semi-Oblivious Routing. Path Selection: Oblivious Routing computes a set of paths which are low-stretch, robust and have good load balancing properties. Rate Adaptation: LP Optimizer balances load by dynamically adjusting splitting ratios used to map incoming traffic flows to paths. Semi-Oblivious Traffic Engineering: The Road Not Taken [NSDI ’18]
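
The slide does not give the LP itself. One plausible path-based form is sketched here, assuming the objective is to minimize the maximum link utilization $Z$. The symbols are assumptions for illustration: $P_{st}$ is the fixed set of paths for the pair $(s,t)$ produced by path selection, $x_p$ is the splitting ratio on path $p$, $d_{st}$ is the current demand, and $c_e$ is the capacity of link $e$.

```latex
\begin{align*}
\min_{x,\,Z}\quad & Z \\
\text{s.t.}\quad
& \sum_{p \in P_{st}} x_{p} = 1 && \forall (s,t) \\
& \sum_{(s,t)} \sum_{p \in P_{st}:\, e \in p} d_{st}\, x_{p} \le Z \, c_{e} && \forall e \in E \\
& x_{p} \ge 0 && \forall p
\end{align*}
```

Only the $x_p$ variables change when demands change. The path sets $P_{st}$ stay fixed, so an update touches splitting ratios rather than installed paths.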

  41. Semi-Oblivious Routing in Practice? • Previous work [Hajiaghayi et al.] established a worst-case competitive ratio that is not much better than oblivious routing: Ω(log(n)/log(log(n))) • But the real world does not typically exhibit worst-case scenarios • Implicit correlation between demands and link capacities. Question: How well does semi-oblivious routing perform in practice?

  42. Evaluation

  43. Facebook’s Backbone Network. YATES: Rapid Prototyping for Traffic Engineering Systems [SOSR ’18]. Source: https://research.fb.com/robust-and-efficient-traffic-engineering-with-oblivious-routing/

  44. Performance. [Plot: throughput, congestion drop, and max. link utilization (metric) over time]

  46. Robustness. [Plot: throughput, congestion drop, failure drop, and max. link utilization (metric) over time]
