Tolerating Application Failures with LegoSDN


  1. Tolerating Application Failures with LegoSDN
     Balakrishnan Chandrasekaran, Theophilus Benson
     Duke University

  2. Quality of Code
     “In C, I never learned to use the debugger, so I used to never make mistakes …”
     “I went millions and millions of hours with no problems, probably tens of millions of hours with no problems.”
     Arthur Whitney, creator of A, K and Q. ACM Queue, Feb 2009.
     October 28, 2014 | HotNets 2014 | LegoSDN

  3. Bugs are endemic in software!
     § Bugs can be deterministic or non-deterministic
     § [STS] Pox Premature PacketIn
       – l2_multi routing module failed unexpectedly with a KeyError.

  4-6. Cascading Crashes
     [Diagram over three slides: App1, App2, … and the Controller, with "in" and "out" message paths.]

  7. LegoSDN
     § Availability is of utmost importance
       – Second only to security

  8. Fate-sharing
     § Fate-sharing relationships between
       – the SDN controller and the SDN application(s) (also between SDN applications)
       – the SDN application and the network
     § Failure in any one SDN application brings down the other applications and the SDN controller.
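
The fate-sharing between applications and the controller can be illustrated with a minimal sketch. It assumes a hypothetical single-process controller that calls each application's handler in turn; `Controller`, `l2_app`, and `stats_app` are illustrative names, not code from LegoSDN, FloodLight, or Pox.

```python
class Controller:
    """Hypothetical monolithic controller: apps run inside its dispatch loop."""

    def __init__(self):
        self.handlers = []
        self.delivered = []  # (app name, message) pairs that were fully handled

    def register(self, name, handler):
        self.handlers.append((name, handler))

    def dispatch(self, message):
        # No isolation: an exception in any handler propagates out of the loop,
        # so later apps never see the message.
        for name, handler in self.handlers:
            handler(message)
            self.delivered.append((name, message))


def l2_app(message):
    # Illustrative bug: looks up a host that was never learned (KeyError).
    known_hosts = {"h1": 1}
    return known_hosts[message["dst"]]


def stats_app(message):
    return len(message)


controller = Controller()
controller.register("l2", l2_app)
controller.register("stats", stats_app)

try:
    controller.dispatch({"dst": "h2"})
    controller_crashed = False
except KeyError:
    # Stands in for the controller process dying together with the app.
    controller_crashed = True
```

Here the healthy `stats_app` never receives the message, because the failure in `l2_app` takes the shared dispatch loop down with it.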

  9-11. Three-pronged approach
     1. Contain crash
     2. Undo changes
     3. Handle message
     [Diagram over three slides: App1, App2, … and the Controller, with "in" and "out" message paths; each slide labels one prong.]

  12. Controller architecture must support two new abstractions

  13. Current architecture
      [Diagram: App1, App2 and the Controller.]

  14-16. Isolate SDN-Apps from the controller
      [Diagram over three slides: App1 and App2, each in a Sandbox, and the Controller.]
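
One way to picture this isolation is to run every application in its own operating-system process, so that a crash stays inside that process. The sketch below is an assumption-laden illustration rather than LegoSDN's mechanism: it launches a fresh Python process per message, and the `APPS` table and `dispatch` function are hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical app bodies, each run as a separate Python process (its "sandbox").
APPS = {
    # Illustrative bug: indexing an empty dict raises KeyError.
    "buggy": "import sys, json; m = json.loads(sys.argv[1]); print({}[m['dst']])",
    "healthy": "import sys, json; m = json.loads(sys.argv[1]); print(len(m))",
}


def dispatch(message):
    """Deliver one message to every sandboxed app and report each outcome."""
    outcome = {}
    for name, source in APPS.items():
        result = subprocess.run(
            [sys.executable, "-c", source, json.dumps(message)],
            capture_output=True,
            text=True,
        )
        # A non-zero exit code means the app crashed inside its own process.
        outcome[name] = "ok" if result.returncode == 0 else "crashed"
    return outcome


outcome = dispatch({"dst": "h2"})
```

The dispatching process survives the crash of `buggy` and still delivers the message to `healthy`. A real deployment would keep long-running application processes instead of starting one per message.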

  17-18. Isolate SDN-Apps from the network
      [Diagram over two slides: App1 in a Sandbox, and the Controller.]

  19. LegoSDN
      § AppVisor Stub: lightweight wrapper around the SDN-App, inside the sandbox
      § AppVisor Proxy: message dispatcher, inside the controller
      § NetLog: transactional support
      § SDN-App is treated as a black box.
      § Stub and proxy allow SDN-Apps to talk to the controller.
      [Diagram: Sandbox containing App1 and the AppVisor Stub; Controller containing the AppVisor Proxy and NetLog.]
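
The division of labour between stub and proxy might look roughly like the following sketch. The class names come from the slide, but every method, the JSON wire format, and the reply fields are assumptions made for illustration; the slides do not describe the real interface.

```python
import json


class AppVisorStub:
    """Lightweight wrapper around one app; the app itself is a black box."""

    def __init__(self, app):
        self.app = app

    def deliver(self, wire_message):
        message = json.loads(wire_message)
        try:
            commands = self.app(message)
            return json.dumps({"status": "ok", "commands": commands})
        except Exception as exc:
            # Report the crash to the proxy instead of letting it spread.
            return json.dumps({"status": "crashed", "error": type(exc).__name__})


class AppVisorProxy:
    """Controller-side message dispatcher with one stub per registered app."""

    def __init__(self):
        self.stubs = {}

    def register(self, name, app):
        self.stubs[name] = AppVisorStub(app)

    def dispatch(self, message):
        wire = json.dumps(message)  # stands in for a real transport
        return {name: json.loads(stub.deliver(wire))
                for name, stub in self.stubs.items()}


def forwarding_app(message):
    return ["install dst=" + message["dst"]]


def buggy_app(message):
    return [{}[message["dst"]]]  # illustrative bug: KeyError


proxy = AppVisorProxy()
proxy.register("forwarding", forwarding_app)
proxy.register("buggy", buggy_app)
replies = proxy.dispatch({"type": "packet_in", "dst": "h2"})
```

Because the proxy only ever sees serialized replies, it needs no knowledge of how an application is written, which matches the black-box treatment on the slide.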

  20. LegoSDN
      § Built on top of FloodLight
      § Ported three applications bundled with FloodLight to LegoSDN
      [Diagram: Sandbox containing App1 and the AppVisor Stub; Controller containing the AppVisor Proxy and NetLog.]

  21. Three-pronged approach
      3. Handle message

  22. How do you handle the crash-inducing message?

  23. (1) Crash and burn
      § Halt the application
        – SDN-App cannot continue processing
        – Other SDN-Apps can continue unaffected
      § No Compromise
        – Think of security-related SDN-Apps
      Correctness: SDN-App’s ability to implement its functionality without change, according to the specification.

  24. (2) Induce amnesia
      § Ignore or drop the crash-inducing message
        – SDN-App will not see the message again
      § Complete Compromise

  25. (3) Apply transformations
      § Transform the offending message into another one that the application can handle
        – Application will continue with a modified input
      § Equivalence Compromise

  26. Course of action?
      § No Compromise
      § Apply Transformation(s)
      § Complete Compromise
      [Diagram: the operator picks one of the three options.]
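
The three options can be sketched as an operator-selected policy. Everything here is hypothetical: the `Policy` enum, the `recover` function, and in particular the `transform` rule, which rewrites a switch-down event into one link-down event per link purely as an example of a message the application might handle instead.

```python
from enum import Enum


class Policy(Enum):
    NO_COMPROMISE = "crash and burn"
    COMPLETE_COMPROMISE = "induce amnesia"
    EQUIVALENCE_COMPROMISE = "apply transformations"


def transform(message):
    """Hypothetical rule: replace a switch-down event with link-down events."""
    if message["type"] == "switch_down":
        return [{"type": "link_down", "link": link} for link in message["links"]]
    return [message]


def recover(policy, app, message):
    """Handle a crash-inducing message. Returns (app_keeps_running, results)."""
    if policy is Policy.NO_COMPROMISE:
        return False, []   # halt this app; other apps are unaffected
    if policy is Policy.COMPLETE_COMPROMISE:
        return True, []    # drop the message; the app never sees it again
    # Equivalence compromise: feed the app the transformed messages instead.
    return True, [app(m) for m in transform(message)]


def app(message):
    # Illustrative app that only understands link-down events.
    if message["type"] != "link_down":
        raise KeyError(message["type"])
    return "removed routes over " + message["link"]


crash_message = {"type": "switch_down", "links": ["s1-s2", "s1-s3"]}
results = {policy: recover(policy, app, crash_message) for policy in Policy}
```

Whether a transformed input is acceptable depends on the application, which is why the choice sits with the operator: a security-related application would likely be run under `NO_COMPROMISE`.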

  27. Related work
      § Fault tolerance
        – via reboots
        – applying Paxos for leader selection
      § Debugging SDN-Apps or the controller

  28. Message equivalence
      § How do you determine that two messages are equivalent?

  29. Rollbacks are non-trivial
      § Rolling back one or more installed rules changes the controller’s view of the network state
        – Might induce crashes in other SDN applications that rely on a consistent view of network state
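
A minimal sketch of transactional rule installation helps show where the difficulty lies. It assumes a flow table modelled as a plain dictionary and a hypothetical `NetLog` class with `install`, `commit`, and `rollback` methods; none of this is taken from the LegoSDN code.

```python
_MISSING = object()  # marks "no rule existed before this write"


class NetLog:
    """Hypothetical undo log over a flow table stored as {match: action}."""

    def __init__(self, flow_table):
        self.flow_table = flow_table
        self.undo = []  # stack of (match, previous action or _MISSING)

    def install(self, match, action):
        # Remember what was there before, then apply the change.
        self.undo.append((match, self.flow_table.get(match, _MISSING)))
        self.flow_table[match] = action

    def commit(self):
        self.undo.clear()

    def rollback(self):
        # Undo in reverse order so repeated writes to one match restore correctly.
        while self.undo:
            match, previous = self.undo.pop()
            if previous is _MISSING:
                self.flow_table.pop(match, None)
            else:
                self.flow_table[match] = previous


flow_table = {"dst=h1": "output:1"}
log = NetLog(flow_table)
log.install("dst=h1", "output:2")  # overwrite an existing rule
log.install("dst=h2", "output:3")  # add a new rule
# The app crashes before committing, so its changes are undone.
log.rollback()
```

The table itself is restored, but any other application that read `flow_table` between `install` and `rollback` acted on state that no longer exists, which is the consistency problem the slide raises.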

  30. Error propagation
      § The last message received by the SDN-App prior to the crash need not be the culprit!
        – How far back in history should we go to find the root cause of the crash?
        – Recovery from an earlier checkpoint: how many checkpoints should we maintain?
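
The following sketch shows why the last message is not always to blame, and how a bounded checkpoint history limits how far recovery can reach. `CheckpointedApp`, `handler`, and the message fields are hypothetical, and the limit of three checkpoints is an arbitrary assumption.

```python
import copy
from collections import deque


class CheckpointedApp:
    """Hypothetical wrapper that snapshots app state before every message."""

    def __init__(self, handler, state, max_checkpoints=3):
        self.handler = handler
        self.state = state
        # Bounded history: the oldest checkpoint is discarded when full.
        self.checkpoints = deque(maxlen=max_checkpoints)

    def deliver(self, message):
        self.checkpoints.append((copy.deepcopy(self.state), message))
        self.handler(self.state, message)

    def roll_back(self, steps):
        """Restore the state from `steps` messages ago; return undone messages."""
        undone = []
        for _ in range(min(steps, len(self.checkpoints))):
            snapshot, message = self.checkpoints.pop()
            self.state = snapshot
            undone.append(message)
        undone.reverse()  # oldest first
        return undone


def handler(state, message):
    if message["type"] == "learn":
        # Illustrative bug: a missing port is silently stored as None.
        state["ports"][message["host"]] = message.get("port")
    else:
        port = state["ports"][message["dst"]]
        state["forwarded"].append("output:%d" % port)  # TypeError if port is None


app = CheckpointedApp(handler, {"ports": {}, "forwarded": []})
app.deliver({"type": "learn", "host": "h1", "port": 1})
app.deliver({"type": "learn", "host": "h2"})  # culprit: poisons the state
try:
    app.deliver({"type": "packet", "dst": "h2"})  # the crash surfaces here
    crashed = False
except TypeError:
    crashed = True

undone = app.roll_back(2)  # one step back would keep the poisoned entry
```

Rolling back a single step restores a state that still contains the bad entry, so the same crash would recur. With `max_checkpoints=3`, a culprit more than three messages in the past could not be undone at all.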

  31. Road ahead
      § Rethink controller architecture
        – LegoSDN is only the tip of the iceberg.
      § Resilient controllers can catalyze adoption
      § Failures need to be a first-class citizen
