Polynomial-Time What-If Analysis for Prefix-Manipulating MPLS Networks … and Segment Routing!
Stefan Schmid (University of Vienna, Austria) and Jiri Srba (Aalborg University, Denmark)
Teaser: Can we verify reachability under k failures without trying exponentially many options? Yes. Much faster! An automata-theoretic approach.
Kudos to collaborators: Jesper Stenbjerg Jensen, Jonas Sand Madsen, Troels Beck Krøgh (Aalborg University, Denmark)
Configuring Networks is Hard
• Datacenter, enterprise, carrier networks: mission-critical infrastructures.
• But even tech-savvy companies struggle to provide reliable operations:
• “We discovered a misconfiguration on this pair of switches that caused what's called a ‘bridge loop’ in the network.”
• “A network change was […] executed incorrectly […] more ‘stuck’ volumes and added more requests to the re-mirroring storm.”
• “Service outage was due to a series of internal network events that corrupted router data tables.”
• “Experienced a network connectivity issue […] interrupted the airline's flight departures, airport processing and reservations systems.”
Credits: Nate Foster
… Especially Under Failures
Example: BGP in Internet
• Two datacenter clusters: one with services that should be globally reachable, one with services that should be accessible only internally.
• X and Y announce to Internet what is from G* (prefix).
• X and Y block what is from P*.
• What can go wrong? If link (G,X) fails and traffic from G is rerouted via Y and C to X: X announces (does not block) G and H as it comes from C. (Note: BGP.)
[Figure: routers X, Y, C, D, G, H, A, B, E, F and prefixes G1, G2, P1, P2 in two datacenter clusters]
Credits: Beckett et al. (SIGCOMM 2016): Bridging Network-wide Objectives and Device-level Configurations.
Network Administration Today
• Many forwarding tables with many rules, distributed across the network
• Sysadmin responsible for:
• Reachability: Can traffic from ingress port A reach egress port B?
• Loop-freedom: Are the routes implied by the forwarding rules loop-free?
• Non-reachability: Is it ensured that traffic originating from A never reaches B?
• Waypoint enforcement: Is it ensured that traffic from A to B is always routed via a node C (e.g., a firewall)?
• … even under (multiple) failures!
Policy ok? What if...?! k failures = $\binom{n}{k}$ possibilities
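To make the blow-up concrete, here is a minimal Python sketch of the brute-force approach this slide warns about: enumerate every set of k failed links and re-check reachability for each one. The toy topology, the helper names `reachable` and `brute_force_what_if`, and the reading of n as the number of links are assumptions for illustration, not taken from the talk.

```python
from itertools import combinations
from math import comb

# Hypothetical 4-node topology (not from the slides): undirected links.
LINKS = [("A", "C"), ("C", "B"), ("A", "D"), ("D", "B"), ("C", "D")]

def reachable(links, src, dst):
    """Graph search over the links that are still up."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, todo = {src}, [src]
    while todo:
        node = todo.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return dst in seen

def brute_force_what_if(links, src, dst, k):
    """Return every set of exactly k failed links that cuts src off from dst."""
    bad = []
    for failed in combinations(links, k):
        up = [link for link in links if link not in failed]
        if not reachable(up, src, dst):
            bad.append(failed)
    return bad

n = len(LINKS)
for k in range(1, 4):
    # Columns: k, number of failure scenarios checked, scenarios that break A -> B.
    print(k, comb(n, k), len(brute_force_what_if(LINKS, "A", "B", k)))
```

The number of scenarios checked is exactly `comb(n, k)`, so the loop becomes impractical as n and k grow; that is the cost a polynomial-time analysis avoids.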
The Good News
• Networks are becoming more programmable and logically centralized, have open interfaces, …
• … are based on formal foundations …
• … researchers develop high-level specification languages such as NetKAT.
Enables a more automated network operation and verification!
The Bad News
• For many traditional networks (still predominant!), such benefits are not available yet
• Many existing tools cannot deal with failures
• Super-polynomial runtime, verification PSPACE-hard
• Other limitations: e.g., fixed header size
Tractability of Verification
Reachability is undecidable in SDN: Can emulate a Turing machine.
• Idea: packet header stores Turing machine configuration (tape, head, state).
• Switch action: each time packet arrives, performs one Turing machine step and updates header.
• Self-loop: could be replaced by “dummy switch”.
• Only if accept or reject, forwarded to out. Is it ever reached? Undecidable!
[Figure: a switch with ports in, out, in’, out’]
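The reduction can be sketched in a few lines of Python, assuming a hypothetical programmable switch whose rule may read and rewrite an unbounded header. The `Header` type, the functions `switch_step` and `forward`, and the toy machine `DELTA` are illustrative names, not part of any SDN API; the point is only that one pass through the self-loop equals one Turing machine step.

```python
from dataclasses import dataclass

BLANK = "_"

@dataclass
class Header:
    """Packet header holding a full Turing machine configuration."""
    tape: dict   # position -> symbol (sparse, unbounded)
    head: int
    state: str

# Toy machine (an assumption for illustration): scan right over 1s,
# accept on the first blank.
# (state, symbol) -> (new state, written symbol, head move)
DELTA = {
    ("scan", "1"): ("scan", "1", +1),
    ("scan", BLANK): ("accept", BLANK, 0),
}
HALTING = {"accept", "reject"}

def switch_step(hdr):
    """One packet arrival: perform one TM step and return the output port."""
    symbol = hdr.tape.get(hdr.head, BLANK)
    # A missing transition leads to reject, a common TM convention.
    state, write, move = DELTA.get((hdr.state, symbol), ("reject", symbol, 0))
    hdr.tape[hdr.head] = write
    hdr.head += move
    hdr.state = state
    return "out" if state in HALTING else "out'"

def forward(hdr, max_hops):
    """Follow the self-loop out' -> in' for at most max_hops arrivals."""
    for hops in range(1, max_hops + 1):
        if switch_step(hdr) == "out":
            return hops
    return None  # undecided within the budget; in general it may never halt

pkt = Header(tape={0: "1", 1: "1", 2: "1"}, head=0, state="scan")
hops = forward(pkt, max_hops=100)
```

The `max_hops` budget is what a simulator has to add; the question of whether the packet ever reaches `out` without such a bound is the halting problem.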
Our Contribution
Polynomial-Time What-if Analysis for Prefix Rewriting Networks
• Independently of the number of failures! No need to try combinations.
• Reachability, loop-freedom, waypointing, etc.!
• E.g., MPLS networks or Segment Routing networks
• Support arbitrary header sizes!
MPLS Networks
• MPLS: forwarding based on top label of label stack
• Label operations shown along the paths: swap, push, pop
[Figure: default routing of two flows (ingress in1, in2; egress out1, out2; nodes v1 to v8; labels 10, 11, 12 and 20, 21, 22)]
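The three label operations can be illustrated with a short Python sketch that models the header as a stack with the top label first. The forwarding table below is hypothetical: it reuses labels 10, 11, 12 and node names from the figure, but the exact rule installed at each router is an assumption, not read off the slide.

```python
# Header = tuple of labels, top of stack first, e.g. (30, 11) is 30|11.

def swap(stack, new):
    """Replace the top label."""
    return (new,) + stack[1:]

def push(stack, new):
    """Put a new label on top; the old top stays underneath."""
    return (new,) + stack

def pop(stack):
    """Remove the top label."""
    return stack[1:]

# Hypothetical rules: (node, top label) -> (operation, next hop).
TABLE = {
    ("v1", 10): (lambda s: swap(s, 11), "v2"),
    ("v2", 11): (lambda s: swap(s, 12), "v3"),
    ("v3", 12): (pop, "v4"),
}

def route(node, stack):
    """Follow rules keyed on the top label until no rule applies."""
    trace = [(node, stack)]
    while stack and (node, stack[0]) in TABLE:
        op, node = TABLE[(node, stack[0])]
        stack = op(stack)
        trace.append((node, stack))
    return trace

trace = route("v1", (10,))
```

Because every rule inspects and rewrites only the top of the stack, the set of possible header changes is a prefix rewriting system, which is the property the analysis relies on.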
MPLS Networks: 1 Failure
• MPLS: forwarding based on top label of label stack
• For failover: push and pop label
• One failure: push 30 to route around (v2,v3). If (v2,v3) failed, push 30 and forward to v6.
• Operations labelled on the detour: normal swap, pop
• What about multiple link failures?
[Figure: default routing of two flows (top) and failover routing (bottom), with header stacks 30|11, 30|21 and 11, 21 on the detour]
MPLS Networks: 2 Failures
• Original routing
• One failure: push 30 to route around (v2,v3)
• Two failures: first push 30 to route around (v2,v3), then recursively push 40 to route around (v2,v6)
[Figure: three copies of the network (original routing, one failure, two failures) with header stacks 30|11, 30|21, 31|11, 31|21 and, for two failures, 40|30|11, 40|30|21]
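The recursive protection on this slide can be sketched as follows: each failed link encountered triggers one more push, so the header grows with the number of failures while the original label stays at the bottom. The mapping of links (v2,v3) and (v2,v6) to labels 30 and 40 follows the slide; the function names, and the simplification that only pushes are modelled (no swaps or pops on the detour), are assumptions.

```python
# Header = tuple of labels, top of stack first: (40, 30, 11) is 40|30|11.

# Protection label pushed when the given link is found down (from the slide).
PROTECTION = [
    (("v2", "v3"), 30),  # primary link of the flow
    (("v2", "v6"), 40),  # first detour link
]

def header_after_failover(label, failed_links):
    """Push one protection label per consecutive failed link.

    Stops at the first link that is up, since the packet is then
    forwarded on that link and no further detour is needed.
    """
    stack = (label,)
    for link, protect in PROTECTION:
        if link not in failed_links:
            break
        stack = (protect,) + stack
    return stack

def show(stack):
    """Render a stack in the slide's notation."""
    return "|".join(str(label) for label in stack)

for failed in [set(), {("v2", "v3")}, {("v2", "v3"), ("v2", "v6")}]:
    print(len(failed), show(header_after_failover(11, failed)))
```

The stack depth is not bounded in advance, which is why a verification tool limited to fixed header sizes cannot model this kind of failover.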