LIFEGUARD: Practical Repair of Persistent Route Failures
Ethan Katz-Bassett (USC), Colin Scott, David Choffnes, Italo Cunha, Valas Valancius, Nick Feamster, Harsha Madhyastha, Tom Anderson, Arvind Krishnamurthy
This work is generously funded


  1. How does LIFEGUARD locate a failure?
     During outage. Source: GMU. Target: Smartkom. ISPs shown: Level3, Telia, TransTelecom, ZSTTK, NTT, Rostelecom.
     Probe "NTT: Ping? Fr:GMU" is followed by "GMU: Ping! Fr:NTT".
     - Forward path works

  5. How does LIFEGUARD locate a failure?
     During outage (source GMU, target Smartkom). Probe "Rostele: Ping? Fr:GMU" (no reply shown).
     - Forward path works
     - Rostelecom is not forwarding traffic towards GMU

  10. How LIFEGUARD Locates Failures
      LIFEGUARD:
      1. Maintains a background historical atlas
      2. Isolates the direction of failure and measures the working direction
      3. Tests historical paths in the failing direction in order to prune candidate failure locations
      4. Locates the failure as being at the horizon of reachability
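
The last two steps can be sketched as a walk along a historical path. This is a minimal illustration, not LIFEGUARD's implementation: the function name `locate_failure`, the hop ordering, and the reachability set are all assumptions made up for the example, with ISP names borrowed from the outage slides.

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_failure(
    historical_path: Sequence[str],
    can_reach_source: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Walk a historical path in the failing direction.

    historical_path is ordered from the target toward the source.
    Returns (suspect, first_reachable): the last hop that cannot reach
    the source and the first hop that can, i.e. the two sides of the
    horizon of reachability.
    """
    suspect = None
    for hop in historical_path:
        if can_reach_source(hop):
            return suspect, hop
        suspect = hop
    return suspect, None


# Hypothetical historical reverse path (ordering assumed for illustration).
path = ["Smartkom", "ZSTTK", "Rostelecom", "NTT"]
# Assumed probe results: only NTT still has a working path back to GMU.
reachable = {"NTT"}

suspect, first_ok = locate_failure(path, lambda hop: hop in reachable)
print(suspect, first_ok)  # Rostelecom NTT
```

Here the suspect is the hop just beyond the horizon, which matches the slide conclusion that Rostelecom is not forwarding traffic towards GMU.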

  11. Our Approach and Outline
      LIFEGUARD: Locating Internet Failures Effectively and Generating Usable Alternate Routes Dynamically
      - Locate the ISP / link causing the problem
      - Suggest that other ISPs reroute around the problem
      - What would we like to add to BGP to enable this?
      - What can we deploy today, using only available protocols and router support?

  13. Our Goal for Failure Avoidance
      - Enable content / service providers to repair persistent routing problems affecting them, regardless of which ISP is causing them
      Setting
      - Assume we can locate the problem
      - Assume we are multi-homed / have multiple data centers
      - Assume we speak BGP
      - We use BGP-Mux to speak BGP to the real Internet: 5 US universities as providers

  14. Self-Repair of Forward Paths
      Straightforward: choose a path that avoids the problem.

  18. A Mechanism for Failure Avoidance
      Forward path: choose a route that avoids the ISP or ISP-ISP link
      Reverse path: want others to choose paths to my prefix P that avoid ISP or ISP-ISP link X
      - Want a BGP announcement AVOID(X,P):
        - Any ISP with a route to P that avoids X uses such a route
        - Any ISP not using X need only pass on the announcement
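
The intended effect of AVOID(X,P) on a single ISP's route choice can be sketched as follows. AVOID is a proposed primitive, not an existing BGP feature, so everything here is hypothetical: the names `uses` and `apply_avoid`, the candidate routes, and in particular the fallback to the most preferred route when no alternative exists, which is an assumption of this sketch.

```python
from typing import Optional, Sequence, Tuple, Union

Path = Tuple[str, ...]
Avoid = Union[str, Tuple[str, str]]  # an ISP, or an ISP-ISP link


def uses(path: Path, avoid: Avoid) -> bool:
    """True if the AS path traverses the ISP or ISP-ISP link to avoid."""
    if isinstance(avoid, tuple):
        a, b = avoid
        return any({u, v} == {a, b} for u, v in zip(path, path[1:]))
    return avoid in path


def apply_avoid(candidates: Sequence[Path], avoid: Avoid) -> Optional[Path]:
    """Pick a route to P given AVOID(X,P).

    candidates are this ISP's routes to P, most preferred first.
    Assumption: if no candidate avoids X, keep the most preferred one.
    """
    for path in candidates:
        if not uses(path, avoid):
            return path
    return candidates[0] if candidates else None


# Hypothetical candidate routes at UW toward WS, most preferred first.
uw_routes = [("UW", "L3", "ATT", "WS"), ("UW", "Sprint", "Qwest", "WS")]
print(apply_avoid(uw_routes, "L3"))           # avoid an ISP
print(apply_avoid(uw_routes, ("L3", "ATT")))  # avoid an ISP-ISP link
```

An ISP whose preferred route never touches X gets that same route back, which corresponds to the "need only pass on the announcement" case.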

  19. Ideal Self-Repair of Reverse Paths
      Diagram: AVOID(L3,WS) announcements spreading across the topology.

  24. Do paths exist that AVOID the problem?
      LIFEGUARD repairs outages by instructing others to avoid particular routes.
      Q: Do alternative routes exist?
      A: Alternate policy-compliant paths exist in 90% of simulated AVOID(X,P) announcements.
      - Simulated 10 million AVOIDs on actual measured routes.

  25. Practical Self-Repair of Reverse Paths
      Diagram: the announcement for WS's prefix as it propagates.
      - WS
      - ATT → WS; Qwest → WS
      - L3 → ATT → WS; Sprint → Qwest → WS; AISP → Qwest → WS
      - UW → L3 → ATT → WS

  32. Practical Self-Repair of Reverse Paths
      Goal: AVOID(L3,WS). WS announces the path WS → L3 → WS, which propagates:
      - WS → L3 → WS
      - ATT → WS → L3 → WS; Qwest → WS → L3 → WS
      - Sprint → Qwest → WS → L3 → WS; AISP → Qwest → WS → L3 → WS
      - L3's route L3 → ATT → WS is replaced by "?": L3 sees itself in the path and rejects it
      - UW switches from UW → L3 → ATT → WS to UW → Sprint → Qwest → WS → L3 → WS
      BGP loop prevention encourages switch to working path.
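
The effect of loop prevention on a poisoned announcement can be reproduced with a toy path-vector simulation. This is a sketch under strong assumptions: the link list is read off the slide labels, every AS simply prefers the shortest path (ties broken alphabetically), and routing policy is ignored. The function name `propagate` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

Path = Tuple[str, ...]


def propagate(links: Iterable[Tuple[str, str]], origin: str,
              announced: Path) -> Dict[str, Path]:
    """Return the AS path each AS selects toward the origin.

    An AS discards any path that already contains its own name
    (BGP loop prevention) and otherwise prefers the shortest path.
    """
    neighbors: Dict[str, set] = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    best: Dict[str, Path] = {}
    changed = True
    while changed:  # iterate until no AS changes its route
        changed = False
        for asn in sorted(neighbors):
            if asn == origin:
                continue
            candidates = []
            for nbr in sorted(neighbors[asn]):
                if nbr == origin:
                    path = tuple(announced)
                elif nbr in best:
                    path = (nbr,) + best[nbr]
                else:
                    continue
                if asn not in path:  # loop prevention
                    candidates.append(path)
            choice = min(candidates, key=lambda p: (len(p), p), default=None)
            if choice != best.get(asn):
                if choice is None:
                    best.pop(asn, None)
                else:
                    best[asn] = choice
                changed = True
    return best


# Assumed topology, inferred from the paths shown on the slides.
links = [("WS", "ATT"), ("WS", "Qwest"), ("ATT", "L3"), ("Qwest", "Sprint"),
         ("Qwest", "AISP"), ("L3", "UW"), ("Sprint", "UW")]

before = propagate(links, "WS", ("WS",))
after = propagate(links, "WS", ("WS", "L3", "WS"))
print(before["UW"])   # ('L3', 'ATT', 'WS')
print("L3" in after)  # False: L3 rejects every path containing itself
print(after["UW"])    # ('Sprint', 'Qwest', 'WS', 'L3', 'WS')
```

In this toy run L3 ends up with no route at all, so UW's only remaining option is the path through Sprint and Qwest.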

  41. Stuff I Don't Have Time to Talk About
      Results from real poisonings
      - Poisoning in the wild / poisoning anomalies
      - Case study of restoring connectivity
      Making poisoning flexible
      - Monitoring broken path while it is disabled
      - Allowing ISPs w/o alternatives to use disabled route
      LIFEGUARD's scalability
      - Overhead and speed of failure location
      - Router update load if many ISPs deploy our approach
      Alternatives to poisoning
      - Compatibility with secure routing (BGPSEC, etc.)
      - Comparing to other route control mechanisms

  42. Can poisoning approximate AVOID effects?
      LIFEGUARD's poisoning repairs outages by disabling routes to induce route exploration.
      Q: Does poisoning disrupt working routes?
      A: No. As I will describe:
      (a) Under certain circumstances, we can disable a link without disabling the full ISP.
      (b) We can speed BGP convergence by carefully crafting announcements.

  43. What if some routes in an ISP still work?
      Diagram nodes: O, D1, D2, C1, C2, C3, C4, B1, B2, A. Legend: network link, transitive link, original (pre-poisoning) path, new (post-poisoning) path.
      - We only want C3 to change its route, to avoid A-B2
      - Forward direction is easy: choose a different route
      - Poisoning seems blunt, disabling an entire ISP (diagram labels: O-O-O before, O-A-O after)
      - Selective advertising via just D1 is also blunt
      - If D1 and D2 (transitively) connect to different PoPs of A, selectively poison via D2 and not D1 (diagram labels: O-O-O and O-A-O)
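
Selective poisoning amounts to sending different AS paths to different providers. The sketch below only builds that per-provider map; the function name `build_announcements` and its parameters are hypothetical, and pushing the paths to real BGP sessions is out of scope.

```python
from typing import Dict, Iterable, Tuple

Path = Tuple[str, ...]


def build_announcements(origin: str, providers: Iterable[str],
                        poison: str, poison_via: Iterable[str]) -> Dict[str, Path]:
    """Map each provider to the AS path the origin announces through it.

    Providers in poison_via get the poisoned path origin-poison-origin;
    all others keep the baseline origin-origin-origin, which has the
    same length.
    """
    poison_via = set(poison_via)
    baseline = (origin, origin, origin)
    poisoned = (origin, poison, origin)
    return {p: (poisoned if p in poison_via else baseline) for p in providers}


# Poison A via D2 only, as in the final build of the slide.
print(build_announcements("O", ["D1", "D2"], poison="A", poison_via=["D2"]))
# {'D1': ('O', 'O', 'O'), 'D2': ('O', 'A', 'O')}
```

Passing both providers in `poison_via` gives the blunt variant from the earlier builds, where the whole of A is disabled.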

  59. Can poisoning approximate AVOID effects?
      LIFEGUARD's poisoning repairs outages by disabling routes to induce route exploration.
      Q: Does poisoning disrupt working routes?
      A: No. As I will describe:
      (a) "Selective poisoning" can avoid 73% of links without disabling the entire AS.
          - Real-world results from the 5-provider BGP-Mux testbed
      (b) We can speed BGP convergence by carefully crafting announcements.

  60. Naive Poisoning Causes Transient Loss
      - Some ISPs may have working paths that avoid problem ISP X
      - Naively, poisoning causes path exploration even for these ISPs
      - Path exploration causes transient loss
      Diagram: O wants AVOID(X,P); ISPs A, B, C, D, E, F.
      - Before: paths A-O, B-A-O, D-A-O, E-D-A-O, F-B-A-O
      - O announces O-X-O; A-O-X-O, B-A-O-X-O and D-A-O-X-O follow
      - Meanwhile the shorter paths E-D-A-O and F-B-A-O still circulate
      - Final state: A-O-X-O, B-A-O-X-O, D-A-O-X-O, E-D-A-O-X-O, F-B-A-O-X-O

  68. Prepend to Reduce Path Exploration
      - Most routing decisions are based on: (1) next hop ISP, (2) path length
      - Keep these fixed to speed convergence
      - Prepending prepares ISPs for later poison
      Diagram: O wants AVOID(X,P) and announces O-O-O; paths A-O-O-O, B-A-O-O-O, D-A-O-O-O, E-D-A-O-O-O, F-B-A-O-O-O.
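
A small check makes the point concrete: if the baseline is already prepended to O-O-O, swapping in the poison O-X-O changes neither the next hop nor the path length seen by an unaffected ISP. The helper name `as_seen_by` is hypothetical, and the upstream hops F-B-A are taken from the diagram labels.

```python
from typing import Tuple

Path = Tuple[str, ...]


def as_seen_by(upstream: Path, announcement: Path) -> Path:
    """AS path held by the first AS in upstream, given the origin's announcement."""
    return tuple(upstream) + tuple(announcement)


prepended = ("O", "O", "O")  # baseline announced before any outage
poison = ("O", "X", "O")     # announced once X must be avoided

before = as_seen_by(("F", "B", "A"), prepended)
after = as_seen_by(("F", "B", "A"), poison)

print(len(before) == len(after))  # True: path length is unchanged
print(before[1] == after[1])      # True: F's next hop is still B
```

Because both attributes named on the slide stay fixed, an ISP like F has no reason to explore other routes when the poison arrives.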
