Tolerating Application Failures with LegoSDN
Balakrishnan Chandrasekaran, Theophilus Benson
Duke University
Quality of Code
“In C, I never learned to use the debugger, so I used to never make mistakes …”
“I went millions and millions of hours with no problems, probably tens of millions of hours with no problems.”
(Arthur Whitney, creator of A, K, and Q; ACM Queue, Feb 2009)
October 28, 2014 HotNets 2014 | LegoSDN 2
Bugs are endemic in software!
§ Bugs can be deterministic or non-deterministic
§ [STS] POX Premature PacketIn
 – The l2_multi routing module failed unexpectedly with a KeyError.
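The ordering dependence behind a "premature PacketIn" can be illustrated with a toy module. This is a minimal sketch under stated assumptions, not POX's l2_multi code: the class `ToyL2Routing`, its handler names, and the `deliver` helper are hypothetical. The module indexes per-switch state that only exists after the switch's connection event, so the same two events succeed in one order and raise a KeyError in the other.

```python
# Hypothetical sketch (not POX code): a toy L2 routing module whose
# PacketIn handler assumes the switch's ConnectionUp was already processed.

class ToyL2Routing:
    def __init__(self):
        # dpid -> {mac -> port}; an entry is created only when a switch connects.
        self.mac_tables = {}

    def on_connection_up(self, dpid):
        self.mac_tables[dpid] = {}

    def on_packet_in(self, dpid, src_mac, in_port):
        # Raises KeyError if this PacketIn arrives before on_connection_up(dpid).
        self.mac_tables[dpid][src_mac] = in_port


def deliver(app, events):
    """Feed (handler_name, args) events in order; return the KeyError that
    stopped delivery, or None if every event was handled."""
    for kind, args in events:
        try:
            getattr(app, kind)(*args)
        except KeyError as exc:
            return exc
    return None


if __name__ == "__main__":
    ok = deliver(ToyL2Routing(), [("on_connection_up", (1,)),
                                  ("on_packet_in", (1, "aa:bb", 3))])
    bad = deliver(ToyL2Routing(), [("on_packet_in", (1, "aa:bb", 3)),
                                   ("on_connection_up", (1,))])
    print(ok)         # None
    print(repr(bad))  # KeyError(1)
```

Whether such a bug shows up as deterministic or not depends on whether the event ordering is fixed or timing-dependent in a given network.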
Cascading Crashes
[Figure: SDN applications App1, App2, … running on the controller, with messages flowing in and out]
LegoSDN
§ Availability is of utmost importance
 – Second only to security
Fate-sharing
§ Fate-sharing relationships exist between
 – the SDN controller and the SDN application(s) (and also among SDN applications)
 – the SDN application and the network
§ A failure in any one SDN application brings down the other applications and the SDN controller.
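Fate-sharing between apps and the controller can be made concrete with a minimal sketch, assuming (hypothetically) that apps are plain functions called from a single controller loop in one process. The names `buggy_app`, `healthy_app`, and `controller_loop` are illustrative only. One uncaught exception in one app escapes the loop, so the healthy app never sees later messages.

```python
# Hypothetical sketch: apps share one process and one dispatch loop with the
# controller, so an exception in any app ends delivery for all of them.

def buggy_app(msg, log):
    if msg == "bad":
        raise KeyError(msg)  # stand-in for an application bug
    log.append(("app1", msg))


def healthy_app(msg, log):
    log.append(("app2", msg))


def controller_loop(apps, messages, log):
    for msg in messages:
        for app in apps:
            app(msg, log)  # no isolation: an exception here ends the loop


if __name__ == "__main__":
    log = []
    try:
        controller_loop([buggy_app, healthy_app], ["m1", "bad", "m2"], log)
    except KeyError:
        print("controller crashed")
    print(log)  # [('app1', 'm1'), ('app2', 'm1')]
```

Note that `healthy_app` is never buggy, yet it misses both "bad" and "m2": that is the fate-sharing the slide describes.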
Three-pronged approach
1. Contain the crash
2. Undo changes
3. Handle the message
[Figure: App1, App2, … running on the controller, with messages flowing in and out]
Controller architecture must support two new abstractions
Current architecture
[Figure: App1 and App2 run directly on the controller]
Isolate SDN-Apps from the controller
[Figure: App1 and App2, each in its own sandbox, on top of the controller]
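Sandboxing an app in its own process contains its crashes. The following is a minimal sketch, assuming each app can be run as a separate Python interpreter that reads one message on stdin; the app bodies in `APP_SOURCES` and the helper names are hypothetical. A crash ends only that child process, and the parent (standing in for the controller) records it and keeps going.

```python
# Hypothetical sketch of crash containment via process isolation: each app
# handles a message in its own child interpreter, so its crash cannot take
# down the parent ("controller") process.
import subprocess
import sys

# Hypothetical app bodies; each reads one message from stdin.
APP_SOURCES = {
    "app1": "import sys\nmsg = sys.stdin.read()\nraise KeyError(msg)\n",
    "app2": "import sys\nmsg = sys.stdin.read()\nprint('app2 handled ' + msg)\n",
}


def run_sandboxed(source, message, timeout=10):
    """Run app code in a child process; return (exit_code, stripped stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        input=message,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def dispatch(message):
    """Deliver one message to every sandboxed app and collect the outcomes."""
    results = {}
    for name, source in APP_SOURCES.items():
        code, out = run_sandboxed(source, message)
        results[name] = "crashed" if code != 0 else out
    return results


if __name__ == "__main__":
    print(dispatch("PacketIn"))
    # {'app1': 'crashed', 'app2': 'app2 handled PacketIn'}
```

Starting one process per message keeps the sketch short but is expensive; a practical sandbox would more likely keep a long-lived process per app and exchange messages over some IPC channel.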
Isolate SDN-Apps from the network
[Figure: App1 in a sandbox on top of the controller]
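Isolating the network from a crashing app means that a half-finished set of rule changes should not stay installed. The following is a minimal sketch, assuming rule changes can be logged together with their inverses; `FlowTable`, `RuleTransaction`, and `handle_with_transaction` are hypothetical names, and the flow table is a local dict standing in for remote switch state. If the handler raises, the undo log is replayed newest-first to restore the prior rules.

```python
# Hypothetical sketch of transactional rule installation: each change made
# while handling one message is logged with the value it replaced, so all
# changes can be committed together or undone if the app crashes.

_ABSENT = object()  # marks "no rule existed for this match"


class FlowTable:
    """Local stand-in for switch state: match -> action."""
    def __init__(self):
        self.rules = {}


class RuleTransaction:
    def __init__(self, table):
        self.table = table
        self.undo_log = []  # (match, previous action or _ABSENT)

    def install(self, match, action):
        self.undo_log.append((match, self.table.rules.get(match, _ABSENT)))
        self.table.rules[match] = action

    def delete(self, match):
        self.undo_log.append((match, self.table.rules.get(match, _ABSENT)))
        self.table.rules.pop(match, None)

    def commit(self):
        self.undo_log.clear()

    def rollback(self):
        # Undo the newest change first so the oldest saved values win.
        while self.undo_log:
            match, previous = self.undo_log.pop()
            if previous is _ABSENT:
                self.table.rules.pop(match, None)
            else:
                self.table.rules[match] = previous


def handle_with_transaction(table, handler, message):
    """Run handler(txn, message); commit on success, roll back on any exception."""
    txn = RuleTransaction(table)
    try:
        handler(txn, message)
    except Exception:
        txn.rollback()
        return False
    txn.commit()
    return True
```

For example, a handler that installs `"dst=10.0.0.2"` and then raises leaves `table.rules` exactly as it was before the message. Against real switches, the same idea would mean sending inverse flow modifications rather than editing a dict.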
LegoSDN
[Figure: App1 runs in a sandbox with the AppVisor Stub; the AppVisor Proxy and NetLog sit alongside the controller]
§ AppVisor Stub: a lightweight wrapper; the SDN-App is treated as a black box
§ AppVisor Proxy: message dispatcher
§ The stub and proxy allow SDN-Apps to talk to the controller
§ NetLog: transactional support
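The stub/proxy split can be sketched as control flow. The code below is a minimal, hypothetical sketch, not AppVisor's actual interface: `Stub`, `Proxy`, `handle`, and the restart-from-factory behavior are assumptions. In the deck's design the stub sits with the app inside its sandbox and the proxy sits at the controller. Here both share one process for brevity, so the sketch shows dispatch and restart logic but not isolation.

```python
# Hypothetical sketch of a proxy/stub split: the stub wraps an unmodified
# (black-box) app and reports crashes as values; the proxy dispatches each
# message to every stub and restarts any app whose stub reports a crash.

class Stub:
    def __init__(self, app_factory):
        self.app_factory = app_factory
        self.app = app_factory()

    def deliver(self, message):
        try:
            return ("ok", self.app.handle(message))
        except Exception as exc:
            return ("crash", repr(exc))

    def restart(self):
        self.app = self.app_factory()  # fresh instance; in-memory state is lost


class Proxy:
    def __init__(self, stubs):
        self.stubs = stubs  # app name -> Stub
        self.crashes = []   # (app name, message, error repr)

    def dispatch(self, message):
        replies = {}
        for name, stub in self.stubs.items():
            status, value = stub.deliver(message)
            if status == "crash":
                self.crashes.append((name, message, value))
                stub.restart()
            else:
                replies[name] = value
        return replies


# Example black-box apps (hypothetical).
class Counter:
    def __init__(self):
        self.seen = 0

    def handle(self, message):
        self.seen += 1
        return self.seen


class Fragile:
    def handle(self, message):
        if message == "bad":
            raise KeyError(message)
        return "fine"
```

With `Proxy({"counter": Stub(Counter), "fragile": Stub(Fragile)})`, dispatching `"bad"` records one crash for `fragile` while `counter` keeps counting. Restarting from a factory discards the app's state, which is one reason the later slides turn to checkpoints.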
LegoSDN
§ Built on top of FloodLight
§ Ported three applications bundled with FloodLight to LegoSDN
[Figure: App1 in a sandbox with the AppVisor Stub; the AppVisor Proxy and NetLog alongside the controller]
How do you handle the crash-inducing message?
1. Crash and burn
§ Halt the application
 – The SDN-App cannot continue processing
 – Other SDN-Apps continue unaffected
§ No compromise
 – Think of security-related SDN-Apps
Correctness: an SDN-App's ability to implement its functionality, without change, according to its specification.
2. Induce amnesia
§ Ignore or drop the crash-inducing message
 – The SDN-App will never see the message again
§ Complete compromise
3. Apply transformations
§ Transform the offending message into another one that the application can handle
 – The application continues with a modified input
§ Equivalence compromise
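Applying transformations amounts to a search over candidate rewrites of the offending message. The following is a minimal sketch under stated assumptions: the `TopologyApp` bug, the message dicts, and the `split_switch_down` transformation (one switch-down becomes one link-down per attached link) are all hypothetical examples, not taken from LegoSDN. Each candidate runs against a deep copy of the app, so a failed attempt never corrupts the live state.

```python
# Hypothetical sketch: try candidate transformations of a crash-inducing
# message, each against a deep copy of the app, and keep the first one the
# app handles without raising.
import copy


class TopologyApp:
    """Toy app tracking live links; its switch_down handler has a bug."""

    def __init__(self, links):
        self.links = set(links)

    def handle(self, msg):
        if msg["type"] == "link_down":
            self.links.discard(msg["link"])
        elif msg["type"] == "switch_down":
            raise KeyError(msg["dpid"])  # stand-in for a handler bug


def identity(msg):
    return [msg]


def split_switch_down(msg):
    """Assumed transformation: switch_down -> one link_down per attached link."""
    if msg["type"] != "switch_down":
        return [msg]
    return [{"type": "link_down", "link": link} for link in msg["links"]]


def apply_transformations(app, msg, transformations):
    """Return (new_app, messages) for the first transformation the app
    survives, or (None, None) if every candidate still crashes it."""
    for transform in transformations:
        candidate = transform(msg)
        trial = copy.deepcopy(app)  # never mutate the live app while probing
        try:
            for m in candidate:
                trial.handle(m)
        except Exception:
            continue  # discard the partially updated copy
        return trial, candidate
    return None, None
```

Whether a transformed input is "close enough" to the original is exactly the equivalence compromise the slide names; the sketch only checks that the app survives, not that the result is semantically equivalent.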
Course of action?
[Figure: the operator chooses among No Compromise, Apply Transformation(s), and Complete Compromise]
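The operator's choice can be expressed as a per-app policy table. This is a minimal sketch under stated assumptions: the `Policy` names, the example app names, and the fallback to halting when no transformation works are hypothetical choices, not LegoSDN's configuration format. It maps the three options on the previous slides (crash and burn, apply transformations, induce amnesia) to recovery actions.

```python
# Hypothetical sketch: an operator assigns each app a compromise level, and
# the recovery path maps that level to an action for the crash-inducing message.
from enum import Enum


class Policy(Enum):
    NO_COMPROMISE = 1        # "crash and burn": halt the app
    EQUIVALENCE = 2          # apply transformation(s) to the message
    COMPLETE_COMPROMISE = 3  # "induce amnesia": drop the message


def choose_action(policy, transformation_found):
    """Map an app's policy to a recovery action."""
    if policy is Policy.NO_COMPROMISE:
        return "halt"
    if policy is Policy.EQUIVALENCE:
        # Assumed fallback: halt rather than silently drop when no
        # transformation lets the app survive.
        return "deliver_transformed" if transformation_found else "halt"
    return "drop"


# Hypothetical per-app settings an operator might choose.
OPERATOR_POLICIES = {
    "firewall": Policy.NO_COMPROMISE,
    "router": Policy.EQUIVALENCE,
    "stats": Policy.COMPLETE_COMPROMISE,
}


def recover(app_name, transformation_found, policies=OPERATOR_POLICIES,
            default=Policy.NO_COMPROMISE):
    """Look up the app's policy (defaulting to no compromise) and pick an action."""
    return choose_action(policies.get(app_name, default), transformation_found)
```

Defaulting unknown apps to no compromise is a conservative assumption: it matches the slide's point that security-related apps should not run with altered inputs.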
Related work
§ Fault tolerance
 – via reboots
 – applying Paxos for leader election
§ Debugging SDN-Apps or the controller
Message equivalence
§ How do you determine that two messages are equivalent?
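One simple, admittedly limited, answer to the equivalence question is to compare messages only on the fields an app actually reads. This is a minimal sketch assuming those "relevant fields" are known per app; the field names below (`in_port`, `src`, `dst`, `xid`, `buffer_id`) are hypothetical dict keys, not a real controller's message API.

```python
# Hypothetical sketch: two messages are treated as equivalent for an app if
# they agree on every field that app is assumed to read.

def project(message, relevant_fields):
    """Keep only the fields the app is assumed to depend on."""
    return {field: message.get(field) for field in relevant_fields}


def equivalent(m1, m2, relevant_fields):
    return project(m1, relevant_fields) == project(m2, relevant_fields)


# Assumed: a learning switch reads ports and addresses, not transaction IDs.
LEARNING_SWITCH_FIELDS = ("in_port", "src", "dst")
```

Projection captures only syntactic equivalence. It would not, for example, recognize a switch-down event as equivalent to the corresponding set of link-down events, and choosing the relevant fields automatically remains the open question the slide raises.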
Rollbacks are non-trivial
§ Rolling back one or more installed rules changes the controller's view of the network state
 – This might crash other SDN applications that rely on a consistent view of network state
Error propagation
§ The last message the SDN-App received before the crash need not be the culprit!
 – How far back in history should we go to find the root cause of the crash?
 – Recovery from an earlier checkpoint: how many checkpoints should we maintain?
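The trade-off between history depth and checkpoint count can be made concrete with a bounded checkpoint log. This is a minimal sketch under stated assumptions: `CheckpointLog`, the deep-copy snapshots, and the idea of replaying while skipping a set of suspected messages are hypothetical illustrations, not LegoSDN's mechanism. Keeping `max_checkpoints` bounds memory, but it also bounds how far back recovery can reach.

```python
# Hypothetical sketch: keep the last K snapshots of an app, each with the
# messages delivered since it. Recovery restores a snapshot `depth` steps
# back and replays later messages, skipping suspected culprits.
import copy
from collections import deque


class CheckpointLog:
    def __init__(self, max_checkpoints):
        # Once full, appending a new checkpoint evicts the oldest one.
        self.entries = deque(maxlen=max_checkpoints)

    def checkpoint(self, app):
        self.entries.append((copy.deepcopy(app), []))

    def record(self, message):
        """Log a delivered message; checkpoint() must have been called first."""
        self.entries[-1][1].append(message)

    def recover(self, depth, suspects):
        """Restore the checkpoint `depth` back (1 = latest) and replay every
        later message except those in `suspects`."""
        if not 1 <= depth <= len(self.entries):
            raise ValueError("checkpoint no longer retained")
        history = list(self.entries)[-depth:]
        app = copy.deepcopy(history[0][0])
        for _, messages in history:
            for m in messages:
                if m not in suspects:
                    app.handle(m)
        return app


class Recorder:
    """Toy app that remembers every message it handled."""
    def __init__(self):
        self.seen = []

    def handle(self, message):
        self.seen.append(message)
```

For example, with checkpoints taken before messages `a, b` and before `c, d`, `recover(depth=2, suspects={"b"})` yields an app that has seen `a, c, d`. Going deeper tolerates errors that propagated from older messages but requires retaining more checkpoints, which is the tension the slide points out.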
Road ahead
§ Rethink the controller architecture
 – LegoSDN is only the tip of the iceberg
§ Resilient controllers can catalyze SDN adoption
§ Failures need to be first-class citizens