The Art of Consistent SDN Updates
Stefan Schmid, Aalborg University

Smart students in Berlin & Wroclaw: Arne Ludwig, Jan Marcinkowski, Szymon Dudycz, Matthias Rost,


  1. Control Plane: Algorithms with a twist! ❏ Reduce latency and overhead: What can be computed locally? ❏ Routing vs heavy-hitter detection? ❏ Ctrl verification vs optimization. LOCAL model! Insights apply: ❏ Hard in LOCAL: symmetry breaking! ❏ SDN twist: pre-processing! ❏ But unlike ad-hoc networks: no need to discover network from scratch ❏ Topology events less frequent than flow-related events ❏ If links fail: subgraph! Find recomputed structures that are still useful in subgraph (e.g., proof labelings) ❏ Precomputation known to help for relevant problems: load-balancing / matching. How to make control plane robust? Software transactional memory problem: network configuration = shared memory, updates = transactions, but with a twist: flows are uncontrolled, real-time transactions: do not abort! (And not only read!) ❏ Careful: independent flow spaces do not imply that controllers can concurrently update without conflict, e.g., due to shared embedding! Atomic read-modify-write? In-Band Synchronization for Distributed SDN Control Planes. Liron Schiff, Petr Kuznetsov, and Stefan Schmid. ACM SIGCOMM Computer Communication Review (CCR), January 2016. HotSDN 2013

  2. Data Plane: Algorithms with a twist! ❏ Even in SDN: Keep some functionality in the data plane! ❏ E.g., for performance: OpenFlow local fast failover: 1st line of defense ❏ SDN twist: data plane algorithms operate under simple conditions ❏ Failover tables are statically (proactively) preconfigured, w/o knowledge of multiple failures ❏ At runtime: local view only and header space is scarce resource ❏ W/ tagging: graph exploration ❏ W/o tagging: combinatorial problem ❏ Later: consolidate this with controller! HotSDN 2014

  3. Data Plane: Algorithms with a twist! ❏ Even in SDN: Keep some functionality in the data plane! ❏ E.g., for performance: OpenFlow local fast failover: 1st line of defense ❏ SDN twist: data plane algorithms operate under simple conditions ❏ Failover tables are statically (proactively) preconfigured, w/o knowledge of multiple failures ❏ At runtime: local view only and header space is scarce resource ❏ W/ tagging: graph exploration ❏ W/o tagging: combinatorial problem ❏ Later: consolidate this with controller! With infinite header space ideal robustness possible. But what about bounded header space? And resulting route lengths? Without good algorithms, routing may disconnect way before physical network does! HotSDN 2014

  4. Data Plane: Algorithms with a twist! ❏ Even in SDN: Keep some functionality in the data plane! ❏ E.g., for performance: OpenFlow local fast failover: 1st line of defense ❏ SDN twist: data plane algorithms operate under simple conditions ❏ Failover tables are statically (proactively) preconfigured, w/o knowledge of multiple failures ❏ At runtime: local view only and header space is scarce resource ❏ W/ tagging: graph exploration ❏ W/o tagging: combinatorial problem ❏ Later: consolidate this with controller! With infinite header space ideal robustness possible. But what about bounded header space? And resulting route lengths? Without good algorithms, routing may disconnect way before physical network does! How (Not) to Shoot in Your Foot with SDN Local Fast Failover: A Load-Connectivity Tradeoff. Michael Borokhovich and Stefan Schmid. 17th International Conference on Principles of Distributed Systems (OPODIS), Nice, France, Springer LNCS, December 2013. Provable Data Plane Connectivity with Local Fast Failover: Introducing OpenFlow Graph Algorithms. Michael Borokhovich, Liron Schiff, and Stefan Schmid. ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN), Chicago, Illinois, USA, August 2014. HotSDN 2014

  5. Decoupling: Algorithms with a twist! ❏ Decoupling already challenging for a single switch! ❏ Network "Hello World" application: MAC learning ❏ MAC learning has SDN twist: MAC learning SDN controller is decoupled: may miss response and keep flooding! ❏ Need to configure rules s.t. controller stays informed when necessary!

  6. Decoupling: Algorithms with a twist! ❏ In-band control: cheap but algorithmically challenging! ❏ Distributed coordination algorithms to manage switches? ❏ Powerful fault-tolerance concept: self-stabilization ❏ SDN twist: switches are simple! ❏ Cannot actively participate in arbitrary self-stab spanning tree protocols ❏ Controller needs to install tree rules [Figure labels: Ctrl, Ctrl, unmanaged!]

  7. Decoupling: Algorithms with a twist! ❏ In-band control: cheap but algorithmically challenging! ❏ Distributed coordination algorithms to manage switches? ❏ Powerful fault-tolerance concept: self-stabilization ❏ SDN twist: switches are simple! ❏ Cannot actively participate in arbitrary self-stab spanning tree protocols ❏ Controller needs to install tree rules [Figure labels: Ctrl, Ctrl, unmanaged!] Ground Control to Major Faults: Towards a Fault Tolerant and Adaptive SDN Control Network. Liron Schiff, Stefan Schmid, and Marco Canini. IEEE/IFIP DSN Workshop on Dependability Issues on SDN and NFV (DISN), Toulouse, France, June 2016. DISN 2016

  8. Decoupling: Algorithms with a twist! ❏ Researchers proposed to exploit SDN rule definition flexibilities to solve growing FIB size problem ❏ OpenFlow-based IP router: caching and aggregation ❏ Zipf law: many infrequent prefixes at controller ❏ Extremely distributed control ❏ Online paging with SDN twist ❏ Forwarding semantic: largest common prefix forwarding, i.e., dependencies: only offload root-contiguous set in trie ❏ Can do bypassing. ICDCS 2014

  9. Decoupling: Algorithms with a twist! ❏ Researchers proposed to exploit SDN rule definition flexibilities to solve growing FIB size problem ❏ OpenFlow-based IP router: caching and aggregation ❏ Zipf law: many infrequent prefixes at controller ❏ Extremely distributed control ❏ Online paging with SDN twist ❏ Forwarding semantic: largest common prefix forwarding, i.e., dependencies: only offload root-contiguous set in trie ❏ Can do bypassing. Competitive FIB Aggregation without Update Churn. Marcin Bienkowski, Nadi Sarrar, Stefan Schmid, and Steve Uhlig. 34th International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, June 2014. Online Tree Caching. Marcin Bienkowski, Jan Marcinkowski, Maciej Pacut, Stefan Schmid, and Aleksandra Spyra. ArXiv Technical Report, February 2016. ICDCS 2014

  10. Interconnect: Algorithms with a twist! ❏ Another challenge: asynchronous communication channel [Figure: He et al., ACM SOSR 2015: without network latency]

  11. Interconnect: Algorithms with a twist! ❏ Another challenge: asynchronous communication channel. Not only because of network latency, but also data structures! [Figure: He et al., ACM SOSR 2015: without network latency]

  12. What can possibly go wrong? [Figure labels: Controller Platform, untrusted hosts, trusted hosts] Invariant: Traffic from untrusted hosts to trusted hosts via firewall!

  13. What can possibly go wrong? [Figure labels: Controller Platform, asynchronous, untrusted hosts, trusted hosts] Invariant: Traffic from untrusted hosts to trusted hosts via firewall!

  14. Example 1: Bypassed Waypoint [Figure labels: Controller Platform, insecure Internet, secure zone]

  15. Example 2: Transient Loop [Figure labels: Controller Platform, insecure Internet, secure zone]

  16. Tagging: A Universal Solution? ❏ Old route: red ❏ New route: blue ❏ 2-Phase Update: ❏ Install blue flow rules internally ❏ Flip tag at ingress ports [Figure: old route with red tags, new route with blue tags]

  17. Tagging: A Universal Solution? ❏ Old route: red ❏ New route: blue ❏ 2-Phase Update: ❏ Install blue flow rules internally ❏ Flip tag at ingress ports [Figure: old route with red tags, new route with blue tags] Where to tag? Header space? Overhead? Cost of extra rules? Time till new link becomes available?
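
The two phases listed on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an OpenFlow implementation: the node names (`s`, `a`, `w`, `d`), the tag names, and the helpers `install_path`, `trace`, and `two_phase_update` are all hypothetical. The sketch records the path a packet takes after every single rule change, to illustrate that a packet follows either the old route or the new route and never a mix.

```python
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, str], str]  # (switch, tag) -> next hop


def install_path(rules: Rules, path: List[str], tag: str) -> None:
    """Install one forwarding rule per hop of `path`, matching on `tag`."""
    for hop, nxt in zip(path, path[1:]):
        rules[(hop, tag)] = nxt


def trace(rules: Rules, ingress_tag: Dict[str, str], src: str, dst: str) -> List[str]:
    """Follow a packet from src to dst; the ingress stamps the tag once."""
    tag = ingress_tag[src]
    path = [src]
    while path[-1] != dst:
        path.append(rules[(path[-1], tag)])
    return path


def two_phase_update(rules: Rules, ingress_tag: Dict[str, str],
                     old_path: List[str], new_path: List[str],
                     new_tag: str = "blue") -> List[List[str]]:
    """Run the update and return the path observed after each rule change."""
    src, dst = old_path[0], old_path[-1]
    observed = []
    # Phase 1: install the new-tag rules hop by hop; no packet carries the
    # new tag yet, so traffic keeps using the old route.
    for hop, nxt in zip(new_path, new_path[1:]):
        rules[(hop, new_tag)] = nxt
        observed.append(trace(rules, ingress_tag, src, dst))
    # Phase 2: flip the tag at the ingress; traffic switches in one step.
    ingress_tag[src] = new_tag
    observed.append(trace(rules, ingress_tag, src, dst))
    return observed


# Hypothetical example: the waypoint w moves from third to second hop.
OLD = ["s", "a", "w", "d"]
NEW = ["s", "w", "a", "d"]
rules: Rules = {}
install_path(rules, OLD, "red")
ingress = {"s": "red"}
observed = two_phase_update(rules, ingress, OLD, NEW)
```

The rule table temporarily holds both the red and the blue rules, which is the extra rule cost and header-space overhead the slide asks about.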

  18. Alternative: Weaker Transient Consistency Idea: Packet may take a mix of old and new path, as long as weaker consistency properties are fulfilled transiently, e.g., Loop-Freedom (LF) and Waypoint Enforcement (WPE). Schedule safe subsets in multiple rounds: Round 1, Round 2, …
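
A minimal sketch of how one round of such a schedule could be checked, assuming the usual asynchronous model: while a round is in progress, every node in the round may forward along either its old or its new rule. The function names (`union_graph`, `round_is_safe`), the single-next-hop rule dictionaries, and the example topology are illustrative assumptions and not taken from the talk.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]


def union_graph(old: Dict[str, str], new: Dict[str, str],
                done: Set[str], batch: Set[str]) -> Graph:
    """Edges a packet may follow while the nodes in `batch` are updating."""
    graph: Graph = {}
    for v in set(old) | set(new):
        succ = set()
        if v not in done and v in old:             # old rule may still be active
            succ.add(old[v])
        if (v in done or v in batch) and v in new:  # new rule may be active
            succ.add(new[v])
        graph[v] = succ
    return graph


def has_cycle(graph: Graph) -> bool:
    """Depth-first search with three colours over all nodes."""
    colour: Dict[str, int] = {}  # missing = unvisited, 1 = on stack, 2 = finished

    def visit(v: str) -> bool:
        colour[v] = 1
        for w in graph.get(v, ()):
            if colour.get(w) == 1 or (w not in colour and visit(w)):
                return True
        colour[v] = 2
        return False

    return any(v not in colour and visit(v) for v in graph)


def reachable(graph: Graph, start: str) -> Set[str]:
    seen, stack = {start}, [start]
    while stack:
        for w in graph.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def round_is_safe(old, new, done, batch, src, dst, waypoint) -> Tuple[bool, bool]:
    """Return (loop_free, waypoint_enforced) for one round."""
    g = union_graph(old, new, done, batch)
    loop_free = not has_cycle(g)
    # WPE holds if dst cannot be reached from src once the waypoint is removed.
    g_wo = {v: succ - {waypoint} for v, succ in g.items() if v != waypoint}
    return loop_free, dst not in reachable(g_wo, src)


# Hypothetical instance: old path s-a-w-d, new path s-w-a-d, waypoint w.
OLD = {"s": "a", "a": "w", "w": "d"}
NEW = {"s": "w", "w": "a", "a": "d"}
```

On this toy instance, updating `a` first bypasses the waypoint and updating `w` first creates a loop, while the three-round schedule `{s}`, `{a}`, `{w}` passes both checks in every round.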

  19. The Spectrum of Consistency (from strong to weak) ❏ per-packet consistency: Reitblatt et al., SIGCOMM 2012 ❏ correct network virtualization: Ghorbani and Godfrey, HotSDN 2014 ❏ weak, transient consistency (loop-freedom, waypoint enforced): Mahajan and Wattenhofer, HotNets 2014; Ludwig et al., HotNets 2014

  20. Going Back to Our Examples: LF Update? [Figure: insecure Internet, secure zone]

  21. Going Back to Our Examples: LF Update! [Figure: initial state and rounds R1, R2; insecure Internet, secure zone]

  22. Going Back to Our Examples: LF Update! [Figure: initial state and rounds R1, R2; insecure Internet, secure zone] LF ok! But: WPE violated in Round 1!

  23. Going Back to Our Examples: WPE Update? [Figure: insecure Internet, secure zone]

  24. Going Back to Our Examples: WPE Update! [Figure: initial state and rounds R1, R2; insecure Internet, secure zone]

  25. Going Back to Our Examples: WPE Update! [Figure: initial state and rounds R1, R2; insecure Internet, secure zone] … ok but may violate LF in Round 1!

  26. Going Back to Our Examples: Both WPE+LF? [Figure: insecure Internet, secure zone]

  27. Going Back to Our Examples: WPE+LF! [Figure: rounds R1, R2, R3; insecure Internet, secure zone]

  28. Going Back to Our Examples: WPE+LF! [Figure: rounds R1, R2, R3; insecure Internet, secure zone] Is there always a WPE+LF schedule?

  29. What about this one?

  30. LF and WPE may conflict! ❏ Cannot update any forward edge in R1: WP ❏ Cannot update any backward edge in R1: LF No schedule exists!

  31. LF and WPE may conflict! ❏ Cannot update any forward edge in R1: WP ❏ Cannot update any backward edge in R1: LF. No schedule exists! Good Network Updates for Bad Packets: Waypoint Enforcement Beyond Destination-Based Routing Policies. Arne Ludwig, Matthias Rost, Damien Foucard, and Stefan Schmid. 13th ACM Workshop on Hot Topics in Networks (HotNets), Los Angeles, California, USA, October 2014.

  32. What about this one?

  33. What about this one? [Figure: update 1 marked] ❏ Forward edge after the waypoint: safe! ❏ No loop, no WPE violation

  34. What about this one? [Figure: updates 1 and 2 marked] ❏ Now this backward edge is safe too! ❏ No loop because exit through 1

  35. What about this one? [Figure: updates 1 to 3 marked] ❏ Now this is safe: ready back to WP! ❏ No waypoint violation

  36. What about this one? [Figure: updates 1 to 4 marked] ❏ Ok: loop-free and also not on the path (exit via 1)

  38. What about this one? [Figure: updates 1 to 5 marked]

  39. Back to the start: What if…

  40. Back to the start: What if… also this one?!

  41. Back to the start: What if… also this one?! ❏ Update any of the 2 backward edges? Violates LF.

  44. Back to the start: What if… also this one?! ❏ Update any of the 2 backward edges? Violates LF. ❏ Update any of the 2 other forward edges? Violates WPE. ❏ What about a combination? Nope …

  45. Back to the start: What if… also this one?!

  46. Back to the start: What if… also this one?! To update or not to update in the first round? That is the question … which leads to NP-hardness!

  47. Back to the start: What if… also this one?! To update or not to update in the first round? That is the question … which leads to NP-hardness! Transiently Secure Network Updates. Arne Ludwig, Szymon Dudycz, Matthias Rost, and Stefan Schmid. 42nd ACM SIGMETRICS, Antibes Juan-les-Pins, France, June 2016.

  48. Let us focus on loop-freedom only: always possible in n rounds! (How?) But how to minimize rounds?
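
One possible answer to the "(How?)" on this slide, sketched in Python: update one node per round, walking the new path backward from the destination. Each updated node then points into a chain of already-updated nodes that ends at the destination, so no loop can form. The sketch assumes that old and new path visit the same nodes; the function names and the five-node example are hypothetical.

```python
from typing import Dict, List


def next_hops(path: List[str]) -> Dict[str, str]:
    """Map every node on the path to its successor."""
    return dict(zip(path, path[1:]))


def reverse_order_schedule(new_path: List[str]) -> List[List[str]]:
    """One node per round, starting with the node right before the destination."""
    return [[v] for v in reversed(new_path[:-1])]


def is_loop_free(next_hop: Dict[str, str], dst: str) -> bool:
    """Every node must reach dst without revisiting a node."""
    for start in next_hop:
        seen, v = set(), start
        while v != dst:
            if v in seen:
                return False
            seen.add(v)
            v = next_hop[v]
    return True


def apply_schedule(old_path: List[str], new_path: List[str],
                   schedule: List[List[str]]) -> bool:
    """Apply the rounds in order and check the state after each round.

    This check is only sufficient for single-node rounds, where no
    intermediate state exists inside a round.
    """
    dst = old_path[-1]
    state, new = next_hops(old_path), next_hops(new_path)
    for rnd in schedule:
        for v in rnd:
            state[v] = new[v]
        if not is_loop_free(state, dst):
            return False
    return state == new


# Hypothetical instance in which the new path reverses the inner nodes.
OLD_PATH = ["s", "v2", "v3", "v4", "d"]
NEW_PATH = ["s", "v4", "v3", "v2", "d"]
```

Here `reverse_order_schedule(NEW_PATH)` yields the rounds `v2`, `v3`, `v4`, `s`, which keep every intermediate state loop-free, whereas updating in forward order (`s`, `v4`, …) creates a loop between `v3` and `v4`.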

  49. Example: Optimal 2-Round Update Schedules

  50. Example: Optimal 2-Round Update Schedules Clear: in Round 1 (R1), I can only update "forward" links! What about last round? Observe: Update schedule read backward (i.e., updating from new to old policy) must also be legal! I.e., in last round (R2), I can do all "forward" edges of old edges wrt new ones! Symmetry!

  51. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path?

  52. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? [Figure labels: F· F· F· B· B· B·]

  53. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? [Figure labels: F· F· F· B· B· B·] Old policy from left to right!

  54. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? [Figure labels: F· F· F· B· B· B·] Old policy from left to right! New policy from left to right!

  55. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? [Figure labels: F· F· F· B· B· B·] Old policy from left to right! New policy from left to right!

  56. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? [Figure labels: F· F· F· B· B· B· and ·B ·B ·F ·B ·F ·F] Old policy from left to right! New policy from left to right!

  57. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? [Figure labels: F· F· F· B· B· B· and ·B ·B ·F ·B ·F ·F] Insight 1: In the 1st round, I can safely update all forwarding (F·) edges! For sure loop-free.

  58. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? [Figure labels: F· F· F· B· B· B· and ·B ·B ·F ·B ·F ·F] Insight 1: In the 1st round, I can safely update all forwarding (F·) edges! For sure loop-free. Insight 2: Valid schedules are reversible! A valid schedule from old to new read backward is a valid schedule for new to old!

  59. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? [Figure labels: F· F· F· B· B· B· and ·B ·B ·F ·B ·F ·F] Insight 1: In the 1st round, I can safely update all forwarding (F·) edges! For sure loop-free. Insight 2: Valid schedules are reversible! A valid schedule from old to new read backward is a valid schedule for new to old! Insight 3: Hence in the last round, I can safely update all forwarding (·F) edges! For sure loop-free.

  60. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? Insight 1: In the 1st round, I can safely update all forwarding (F·) edges! For sure loop-free. Insight 2: Valid schedules are reversible! A valid schedule from old to new read backward is a valid schedule for new to old! Insight 3: Hence in the last round, I can safely update all forwarding (·F) edges! For sure loop-free. 2-Round Schedule: If and only if there are no BB edges! Then I can update F· edges in first round and ·F edges in second round!

  61. Optimal Algorithm for 2-Round Instances: Leveraging Symmetry! ❏ Classify nodes/edges with 2-letter code: ❏ F·, B·: Does (dashed) new edge point forward or backward wrt (solid) old path? ❏ ·F, ·B: Does the (solid) old edge point forward or backward wrt (dashed) new path? Insight 1: In the 1st round, I can safely update all forwarding (F·) edges! For sure loop-free. Insight 2: Valid schedules are reversible! A valid schedule from old to new read backward is a valid schedule for new to old! Insight 3: Hence in the last round, I can safely update all forwarding (·F) edges! For sure loop-free. 2-Round Schedule: If and only if there are no BB edges! Then I can update F· edges in first round and ·F edges in second round! That is, FB must be in first round, BF must be in second round, and FF are flexible!
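
The 2-letter classification and the resulting 2-round rule can be sketched as follows. This is an illustration under assumptions: old and new path are given as node lists over the same node set with common source and destination, and the names `classify` and `two_round_schedule` are hypothetical. Because FF nodes are flexible, the sketch simply places them in the first round.

```python
from typing import Dict, List, Optional, Set, Tuple


def classify(old_path: List[str], new_path: List[str]) -> Dict[str, str]:
    """Give every non-destination node its 2-letter code (FF, FB, BF or BB)."""
    old_pos = {v: i for i, v in enumerate(old_path)}
    new_pos = {v: i for i, v in enumerate(new_path)}
    codes = {}
    for v in old_path[:-1]:
        old_next = old_path[old_pos[v] + 1]
        new_next = new_path[new_pos[v] + 1]
        # First letter: new edge seen from the old path's left-to-right order.
        first = "F" if old_pos[new_next] > old_pos[v] else "B"
        # Second letter: old edge seen from the new path's left-to-right order.
        second = "F" if new_pos[old_next] > new_pos[v] else "B"
        codes[v] = first + second
    return codes


def two_round_schedule(old_path: List[str],
                       new_path: List[str]) -> Optional[Tuple[Set[str], Set[str]]]:
    """Return (round 1, round 2), or None if a BB node rules out 2 rounds."""
    codes = classify(old_path, new_path)
    if "BB" in codes.values():
        return None
    round1 = {v for v, c in codes.items() if c[0] == "F"}  # FF and FB
    round2 = {v for v, c in codes.items() if c[0] == "B"}  # BF
    return round1, round2


# Hypothetical instance: a and b swap places on the path.
OLD_PATH = ["s", "a", "b", "c", "d"]
NEW_PATH = ["s", "b", "a", "c", "d"]
```

For this instance the codes are `s: FF`, `a: FB`, `b: BF`, `c: FF`, giving round 1 = {s, a, c} and round 2 = {b}. Reversing the three inner nodes instead (`["s", "c", "b", "a", "d"]`) makes `b` a BB node, so the function returns `None`.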

  62. Intuition Why 3 Rounds Are Hard ❏ Structure of a 3-round schedule: Round 1: F· edges (FF, FB); Round 2: all edges (FF, FB, BF, BB); Round 3: ·F edges (FF, BF). W.l.o.g., can do FB in R1 and BF in R3. Boils down to: Round 1: FB; Round 2: BB; Round 3: BF; FF: ? ?

  63. Intuition Why 3 Rounds Are Hard ❏ Structure of a 3-round schedule: Round 1: F· edges (FF, FB); Round 2: all edges (FF, FB, BF, BB); Round 3: ·F edges (FF, BF). W.l.o.g., can do FB in R1 and BF in R3: moving forward edges does not introduce loops, nor does making the graph sparser. Boils down to: Round 1: FB; Round 2: BB; Round 3: BF; FF: ? ?

  64. Intuition Why 3 Rounds Are Hard A hard decision problem: when to update FF? ❏ We know: BB node v_6 can only be updated in R2 ❏ When to update FF nodes to enable the BB update in R2?

  65. Intuition Why 3 Rounds Are Hard A hard decision problem: when to update FF? [Figure: exit from loop] ❏ We know: BB node v_6 can only be updated in R2 ❏ When to update FF nodes to enable the BB update in R2? ❏ E.g., updating FF-node v_4 in R1 allows updating BB node v_6 in R2

  66. Intuition Why 3 Rounds Are Hard A hard decision problem: when to update FF? [Figure: no exit from loop!] ❏ We know: BB node v_6 can only be updated in R2 ❏ When to update FF nodes to enable the BB update in R2? ❏ E.g., updating FF-node v_4 in R1 allows updating BB node v_6 in R2 ❏ But only if FF-node v_3 is not updated as well in R1: potential loop

  68. Intuition Why 3 Rounds Are Hard A hard decision problem: when to update FF? ❏ We know: BB node v_6 can only be updated in R2 ❏ When to update FF nodes to enable the BB update in R2? ❏ E.g., updating FF-node v_4 in R1 allows updating BB node v_6 in R2 ❏ But only if FF-node v_3 is not updated as well in R1: potential loop ❏ Smells like a gadget: which FF nodes to update when is hard!

  69. Intuition Why 3 Rounds Are Hard A hard decision problem: when to update FF? Being greedy is bad! Don't update all FF! ❏ We know: BB node v_6 can only be updated in R2 ❏ When to update FF nodes to enable the BB update in R2? ❏ E.g., updating FF-node v_4 in R1 allows updating BB node v_6 in R2 ❏ But only if FF-node v_3 is not updated as well in R1: potential loop ❏ Smells like a gadget: which FF nodes to update when is hard!

  70. Intuition Why 3 Rounds Are Hard A hard decision problem: when to update FF? Being greedy is bad! Don't update all FF! Devil lies in details: original paths must also be valid! I.e., to prove that such a configuration can be reached. ❏ We know: BB node v_6 can only be updated in R2 ❏ When to update FF nodes to enable the BB update in R2? ❏ E.g., updating FF-node v_4 in R1 allows updating BB node v_6 in R2 ❏ But only if FF-node v_3 is not updated as well in R1: potential loop ❏ Smells like a gadget: which FF nodes to update when is hard!

  71. It's Good to Relax: How to update LF? [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d]

  72. LF Updates Can Take Many Rounds! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d] Invariant: need to update v_2 before v_3!

  73. LF Updates Can Take Many Rounds! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d] Invariant: need to update v_3 before v_4!

  74. LF Updates Can Take Many Rounds! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d; edges labelled with rounds 1, 1, 2, 3, …, n-3, n-2] Induction: need to update v_{i-1} before v_i (before v_{i+1} etc.)! Ω(n) rounds?! In principle, yes …: Need a path back out before updating backward edge!

  75. It is good to relax! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d; round-1 edges marked] But: If s has been updated, nodes not on (s,d)-path!

  76. It is good to relax! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d; round-1 and round-2 edges marked] But: If s has been updated, nodes not on (s,d)-path! Could be updated simultaneously!

  77. It is good to relax! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d; edges of rounds 1 to 3 marked] But: If s has been updated, nodes not on (s,d)-path! Could be updated simultaneously! Finally put back on path!

  78. It is good to relax! [Figure: nodes s, v_2, v_3, v_4, …, v_{n-2}, v_{n-1}, d; edges of rounds 1 to 3 marked] But: If s has been updated, nodes not on (s,d)-path! Could be updated simultaneously! Finally put back on path! 3 rounds only!
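
The difference between the strong and the relaxed notion can be made concrete with a small checker. This is a sketch under assumptions: a round is modelled by letting every node in it use either its old or its new rule; strong loop-freedom forbids any cycle, relaxed loop-freedom only forbids cycles reachable from the source. The function names and the five-node instance are hypothetical and not the figure from the slides.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]


def transient_graph(old: Dict[str, str], new: Dict[str, str],
                    done: Set[str], batch: Set[str]) -> Graph:
    """Edges that may be active while the nodes in `batch` are updating."""
    graph: Graph = {}
    for v in old:
        succ = set()
        if v not in done:
            succ.add(old[v])   # old rule not yet replaced for sure
        if v in done or v in batch:
            succ.add(new[v])   # new rule already or possibly installed
        graph[v] = succ
    return graph


def has_reachable_cycle(graph: Graph, roots: Iterable[str]) -> bool:
    """Depth-first search; only cycles reachable from `roots` are found."""
    colour: Dict[str, int] = {}  # 1 = on stack, 2 = finished

    def visit(v: str) -> bool:
        colour[v] = 1
        for w in graph.get(v, ()):
            if colour.get(w) == 1 or (w not in colour and visit(w)):
                return True
        colour[v] = 2
        return False

    return any(r not in colour and visit(r) for r in roots)


def schedule_is_loop_free(old: Dict[str, str], new: Dict[str, str],
                          rounds: List[Set[str]], src: str, relaxed: bool) -> bool:
    done: Set[str] = set()
    for batch in rounds:
        g = transient_graph(old, new, done, set(batch))
        roots = [src] if relaxed else list(g)
        if has_reachable_cycle(g, roots):
            return False
        done |= set(batch)
    return True


# Hypothetical instance: old path s-a-b-c-d, new path s-c-b-a-d.
OLD = {"s": "a", "a": "b", "b": "c", "c": "d"}
NEW = {"s": "c", "c": "b", "b": "a", "a": "d"}
ROUNDS = [{"s"}, {"a", "b"}, {"c"}]
```

With `ROUNDS`, the second round can transiently create a loop between `a` and `b`. Since `s` already forwards to `c`, no packet from the source can reach that loop, so the schedule passes the relaxed check and fails the strong one. The toy instance only illustrates the definitions; it is too small to show the gap in round counts.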

  79. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds]

  80. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds] Greedily choose far-reaching (independent) forward edges.

  81. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds] R1 generated many nodes in branches which can be updated simultaneously!

  82. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds] Line re-established! (all merged with a node on the s-d-path)

  83. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds] Peacock orders nodes wrt distance: edge of length x can block at most 2 edges of length x, so distance 2x.

  84. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds] At least 1/3 of nodes merged in each round pair (shorter s-d path): logarithmic runtime!
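
The step from "at least 1/3 of nodes merged in each round pair" to a logarithmic number of rounds can be spelled out as follows. The symbol $n_k$ (number of nodes left on the s-d line after $k$ round pairs) is introduced here for illustration, and the constants follow only from the 1/3 fraction stated on the slide.

```latex
\begin{align*}
  n_0 &= n, \qquad n_k \le \tfrac{2}{3}\, n_{k-1}
      && \text{(each shortcut/prune pair merges at least a third)} \\
  n_k &\le \left(\tfrac{2}{3}\right)^{k} n \\
  n_k &\le 1 \quad \text{once} \quad k \ge \log_{3/2} n \\
  \#\text{rounds} &\le 2 \left\lceil \log_{3/2} n \right\rceil = O(\log n)
\end{align*}
```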

  85. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds]

  86. A log(n)-time Algorithm: Peacock in Action [Figure: alternating Shortcut and Prune rounds] Scheduling Loop-free Network Updates: It's Good to Relax! Arne Ludwig, Jan Marcinkowski, and Stefan Schmid. ACM Symposium on Principles of Distributed Computing (PODC), Donostia-San Sebastian, Spain, July 2015.
