Fibbing computes routing messages to inject in ~1ms
[Figure: computation time (s, log scale) versus % of nodes changing next-hop, for the simple and merger variants (5th percentile, median, 95th percentile)]
Check out our webpage fibbing.net
Network programmability through synthesis: Fibbing (“the inputs”) and SyNET (“the functions”); current focus, under submission
Fibbing is limited by the configurations running on the routers. It works with a single protocol family (Dijkstra-based shortest-path routing). It can lead to loads of messages if the configuration is not adapted. It suffers from reliability issues: the lies need to be removed upon failures.
SyNET: inputs and outputs
[Figure: SyNET takes as inputs a network specification (N), a physical topology (φ_N), and high-level requirements (φ_R), and outputs router configurations. The example output is a Cisco IOS-style configuration for a router with loopback 120.1.7.7, covering multicast routing (PIM), OSPF process 1, and BGP AS 700, with redistribution between OSPF and BGP.]
SyNET can generate configurations for (small) networks
Synthesis time by number of routers and protocols:
protocols            4 routers   9 routers   16 routers
static               1.8s        18.2s       116.1s
static, OSPF         4.2s        37.0s       197.0s
static, OSPF, BGP    13.8s       189.4s      577.4s
Check out our webpage synet.ethz.ch
Network programmability through synthesis: Fibbing (“the inputs”) and SyNET (“the functions”)
Now that we have programmability, what can we do with it?
Adaptive Networked System
[Figure: a control loop with Monitor, Analyze, Plan, and Execute stages, annotated with control algorithms, visibility, and programmability]
SWIFT: Predictive Fast Reroute upon Remote BGP Disruptions. Laurent Vanbever, ETH Zürich (D-ITET). Munich Internet Research Retreat, November 25, 2016.
25.9 seconds: max. monthly downtime under a 99.999% SLA
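The 25.9-second figure follows from simple arithmetic. The sketch below reproduces it, assuming a 30-day month (the slide does not state the month length); the function name is illustrative.

```python
def max_downtime_seconds(availability: float, period_days: float = 30.0) -> float:
    """Maximum downtime, in seconds, allowed over the period at a given availability."""
    period_seconds = period_days * 24 * 60 * 60  # 30 days is an assumption
    return (1.0 - availability) * period_seconds


downtime = max_downtime_seconds(0.99999)
print(f"{downtime:.1f} s")  # prints "25.9 s"
```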
IP routers are slow to converge upon remote link and node failures
[Figure: R1 is connected to R2 via port 0 and to R3 via port 1]
R1 prefers to send traffic via R2 ($) when possible, as it is much cheaper than via R3 ($$$)
[Figure: the R1-R2 link (port 0) is marked as preferred]
[Figure: the topology extended with additional routers (labels R3, R4, R5) and annotated with route counts of 300k and 600k]
R1’s Forwarding Table
#      prefix          Next-Hop
1      1.0.0.0/24      0
2      1.0.1.0/16      0
…      …               …
300k   100.0.0.0/8     0
…      …               …
600k   200.99.0.0/24   0
[Figure: the annotated topology shown next to the table]
What if R3 fails?
R2 sends 300k routing messages withdrawing the routes from R3
R1 receives the messages one by one and updates its forwarding table entry by entry
[Animation: as the 300k WITHDRAWs arrive, R1 rewrites one entry at a time from next hop 0 to next hop 1: first entry 1 (1.0.0.0/24), then entry 2 (1.0.1.0/16), and so on up to entry 300k (100.0.0.0/8). Entry 600k (200.99.0.0/24) still points to next hop 0.]
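The entry-by-entry behaviour can be sketched as follows. This is a simplified illustration, not a BGP implementation: the forwarding table is a plain dictionary, each WITHDRAW names one prefix, and the backup next hop (1) is assumed to be known in advance. The function name is hypothetical.

```python
from typing import Dict, Iterable


def apply_withdrawals(fib: Dict[str, int],
                      withdrawn: Iterable[str],
                      backup_next_hop: int) -> int:
    """Process WITHDRAWs one by one; each triggers one forwarding-entry write.

    Returns the number of entries rewritten, which grows linearly with the
    number of withdrawn prefixes.
    """
    writes = 0
    for prefix in withdrawn:
        if prefix in fib and fib[prefix] != backup_next_hop:
            fib[prefix] = backup_next_hop  # one data-plane update per prefix
            writes += 1
    return writes


# Prefixes taken from the slide; all initially use next hop 0 (via R2).
fib = {"1.0.0.0/24": 0, "1.0.1.0/16": 0, "100.0.0.0/8": 0, "200.99.0.0/24": 0}
writes = apply_withdrawals(fib, ["1.0.0.0/24", "1.0.1.0/16", "100.0.0.0/8"],
                           backup_next_hop=1)
print(writes, fib["200.99.0.0/24"])  # prints "3 0"
```

With 300k withdrawn prefixes the loop performs 300k separate writes, and traffic for a prefix keeps following the failed path until its own entry has been rewritten.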
Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries. Both of which are terribly slow…
We measured how long it takes for large bursts of BGP updates to propagate in the Internet. Dataset: a month (July ’16) worth of Internet updates from ~200 routers scattered around the globe. Methodology: detect the beginning and end of a burst using a 10 sec sliding window.
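A minimal sketch of sliding-window burst detection is given below. Only the 10-second window comes from the slide. The start and stop thresholds, the function name, and the input format (sorted arrival timestamps in seconds) are assumptions for illustration, not the authors' actual parameters.

```python
from collections import deque
from typing import Iterable, List, Tuple


def detect_bursts(timestamps: Iterable[float],
                  window: float = 10.0,
                  start_threshold: int = 5,   # hypothetical value
                  stop_threshold: int = 2     # hypothetical value
                  ) -> List[Tuple[float, float]]:
    """Return (start, end) times of bursts in a sorted stream of update times.

    A burst starts once the last `window` seconds hold at least
    `start_threshold` updates, and ends when they hold fewer than
    `stop_threshold`.
    """
    recent = deque()  # updates seen within the last `window` seconds
    bursts = []
    burst_start = None
    last_in_burst = None
    for t in timestamps:
        recent.append(t)
        while recent[0] <= t - window:
            recent.popleft()
        if burst_start is None:
            if len(recent) >= start_threshold:
                burst_start, last_in_burst = recent[0], t
        elif len(recent) < stop_threshold:
            bursts.append((burst_start, last_in_burst))
            burst_start = None
        else:
            last_in_burst = t
    if burst_start is not None:
        bursts.append((burst_start, last_in_burst))
    return bursts


bursts = detect_bursts([0, 1, 2, 3, 4, 5, 100, 200, 201, 202, 203, 204])
print(bursts)  # prints "[(0, 5), (200, 204)]"
```

The isolated update at t=100 closes the first burst without opening a new one, because the window never fills up to the start threshold around it.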
We found a total of 2619 bursts over the month
[Figure: burst size (log scale, 10^3 to 10^6) and number of bursts per burst duration. Bursts per duration bin (0-2, 2-8, 8-15, 15-30, 30-60, 60-90, 90-120, 120-200, >200 sec): 1101, 809, 308, 247, 92, 21, 14, 18, 9]
~15% of the bursts take more than 15s to be learned
~10% of the bursts contained more than 100k prefixes
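The total and the ~15% share can be re-derived from the bar labels of the burst-duration histogram. The sketch assumes the nine counts map to the duration bins in left-to-right order, which the flattened figure does not guarantee. The burst-size claim (~10% above 100k prefixes) cannot be checked from these counts.

```python
# Bar labels from the histogram, assumed to be in bin order.
bins = ["0-2", "2-8", "8-15", "15-30", "30-60", "60-90", "90-120", "120-200", ">200"]
counts = [1101, 809, 308, 247, 92, 21, 14, 18, 9]

total = sum(counts)
longer_than_15s = sum(counts[bins.index("15-30"):])  # all bins from 15 s upward
share = longer_than_15s / total

print(total, longer_than_15s, f"{share:.1%}")  # prints "2619 401 15.3%"
```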
Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries.
We measured how long it takes recent routers to update a growing number of forwarding entries. Router under test: Cisco Nexus 7k (ETH's recent routers, 25 deployed).
[Figure: convergence time (s, log scale from 0.1 to 150) versus # of prefixes (1K to 500K), showing the worst case and the median case]
Traffic can be lost for several minutes (~2.5 min., i.e. ~150 s)
Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries. Prefix-based and hence slow.
SWIFT: Predictive Fast Rerouting Joint work with: Thomas Holterbach, Alberto Dainotti, Stefano Vissicchio
SWIFT: Predictive Fast Rerouting. Speed up learning about the failure. Solution: predict the extent of a failure from few messages. Challenge: speed and precision.
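The slides do not describe how the prediction is made. The sketch below shows one plausible approach, offered purely as an illustration and not as SWIFT's actual algorithm: given the path of every known prefix and the first few withdrawn prefixes, score each link by how well it explains the withdrawals, then predict that every prefix crossing the best-scoring link is affected. The scoring formula, function names, and example paths are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def links_of(path: List[str]) -> List[Link]:
    """Consecutive hop pairs along a path."""
    return list(zip(path, path[1:]))


def predict_affected(paths: Dict[str, List[str]],
                     withdrawn: Set[str]) -> Tuple[Link, Set[str]]:
    """Guess the failed link and the full set of prefixes it affects."""
    on_link = Counter()            # prefixes whose path crosses each link
    withdrawn_on_link = Counter()  # of those, how many were withdrawn so far
    for prefix, path in paths.items():
        for link in set(links_of(path)):
            on_link[link] += 1
            if prefix in withdrawn:
                withdrawn_on_link[link] += 1
    if not withdrawn_on_link:
        raise ValueError("no withdrawn prefix has a known path")

    def score(link: Link) -> float:
        # (share of withdrawals the link explains) x (share of the link withdrawn)
        explained = withdrawn_on_link[link] / len(withdrawn)
        covered = withdrawn_on_link[link] / on_link[link]
        return explained * covered

    suspect = max(withdrawn_on_link, key=score)
    affected = {p for p, path in paths.items() if suspect in links_of(path)}
    return suspect, affected


# Hypothetical paths: three prefixes cross the R2-R3 link, one does not.
paths = {
    "p1": ["R2", "R3", "A"],
    "p2": ["R2", "R3", "B"],
    "p3": ["R2", "R3", "C"],
    "p4": ["R2", "R4", "D"],
}
suspect, affected = predict_affected(paths, withdrawn={"p1", "p2"})
print(suspect, sorted(affected))  # prints "('R2', 'R3') ['p1', 'p2', 'p3']"
```

After only two withdrawals the sketch already flags p3 as affected, which illustrates the speed side of the trade-off; a wrong guess would reroute unaffected prefixes, which is the precision side.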
SWIFT: Predictive Fast Rerouting. Speed up updating the data plane. Solution: update groups of entries instead of individual ones. Challenge: failure model.
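Updating groups of entries is commonly achieved with a level of indirection. The sketch below assumes each prefix points to a group and each group points to a next hop, so rerouting a whole group is a single write. The class, the group names, and the grouping criterion are hypothetical, and the slides do not say whether SWIFT encodes groups this way.

```python
from typing import Dict


class GroupedFib:
    """Two-level forwarding table: prefix -> group -> next hop."""

    def __init__(self) -> None:
        self.group_of: Dict[str, str] = {}           # prefix -> group id
        self.next_hop_of_group: Dict[str, int] = {}  # group id -> next hop

    def add(self, prefix: str, group: str, next_hop: int) -> None:
        self.group_of[prefix] = group
        self.next_hop_of_group[group] = next_hop

    def lookup(self, prefix: str) -> int:
        return self.next_hop_of_group[self.group_of[prefix]]

    def reroute_group(self, group: str, next_hop: int) -> None:
        # One write, regardless of how many prefixes belong to the group.
        self.next_hop_of_group[group] = next_hop


fib = GroupedFib()
for prefix in ("1.0.0.0/24", "1.0.1.0/16", "100.0.0.0/8"):
    fib.add(prefix, group="via-R3", next_hop=0)   # affected if R3 fails
fib.add("200.99.0.0/24", group="other", next_hop=0)

fib.reroute_group("via-R3", next_hop=1)
print(fib.lookup("100.0.0.0/8"), fib.lookup("200.99.0.0/24"))  # prints "1 0"
```

The grouping only helps if groups are built in advance to match likely failures, which is presumably why the slide lists the failure model as the challenge.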
SWIFT: Predictive Fast Rerouting. Outline: (1) Predicting out of few messages; (2) Updating groups of entries; (3) Supercharging existing systems.