
Defending Distributed Cyber-Physical Systems with Bounded Time Recovery



  1. Defending Distributed Cyber-Physical Systems with Bounded Time Recovery
 Brian Sandler, Neeraj Gandhi, Linh Thi Xuan Phan, Andreas Haeberlen
 NSF/Intel CPS PI Meeting, July 2018

  2. Machines in Control
 • Vulnerable CPS can cause disaster:
   • Explosion
   • Equipment damage
   • Power outages
   • …
 • Bellingham, WA: Oil pipeline explosion after the two controlling computers failed.
 • Iran: Stuxnet vulnerability destroyed centrifuges used for nuclear enrichment.
 • Ivano-Frankivsk, Ukraine: Controlling power grid systems were compromised, leaving residents in the dark.
 We want to prevent disaster.
 BTR - NSF/Intel PI Meeting - July 2018

  3. Goal: General Defense
 [Figure: fault classes labeled Crashes, Non-Crash Bugs, Hacking, and Byzantine Faults]

  4. Example: Industrial Automation
 Let’s take a simple example system…
 [Figure: network of nodes N1, N2, N3, N4 with S1, S2 and A1, A2, A3, A4]

  5. Example: Industrial Automation
 This system will run four applications.
 [Figure: the same network with items numbered 1 to 8 placed on it]

  6. Example: Industrial Automation
 We’ll focus on the burner control application…
 [Figure: the same network with items numbered 1 to 8 placed on it]

  7. Example: Impact of Failures
 What can go wrong?
 • N4 can drop or delay messages and ruin the chemical processing.
 • N4 can send an incorrect value to A1 and light the building on fire.
 [Figure: the same network, highlighting N4]

  8. State of the Art: Byzantine Fault Tolerance
 • Benefits
   • Adversarial Scenarios
   • Strong Guarantees
   • Nice Programming Model

  9. Is continuous perfection required?
 • How bad is it if the adversary gains control?
 • Many CPS have properties that resist quick changes
   • inertia
   • thermal capacity
 • We don’t have to always be perfect
 We can leverage this!
 [Figure: chemical vat controlled by N4]

  10. For how long is faulty behavior okay?
 • Different applications have different tolerances.
   Application                           Tolerance
   DC/DC converters (STM)                20 μs
   Direct torque control (ABB)           25 μs
   AC/DC converters                      50 μs
   Electronic throttle control (Ford)    5 ms
   Traction control (Ford)               20 ms
   Micro-scale race cars                 40 ms
   Autonomous vehicle steering           50 ms
   Energy-efficient building control     500 ms
 Source: M. Morari. Fast model predictive control (MPC).
 A time period usually exists where faulty behavior is OK, so long as the system returns to its correct behavior within that period.
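
 As a rough illustration of how these tolerances might be used, the sketch below stores the slide's figures in seconds and checks which applications could live with a given recovery time. The helper names `tolerates` and `applications_tolerating` are hypothetical, and treating each figure as a hard upper bound on recovery time is an assumption made for this example.

```python
from typing import Dict, List

# Tolerances copied from the slide, converted to seconds.
TOLERANCE_S: Dict[str, float] = {
    "DC/DC converters (STM)": 20e-6,
    "Direct torque control (ABB)": 25e-6,
    "AC/DC converters": 50e-6,
    "Electronic throttle control (Ford)": 5e-3,
    "Traction control (Ford)": 20e-3,
    "Micro-scale race cars": 40e-3,
    "Autonomous vehicle steering": 50e-3,
    "Energy-efficient building control": 500e-3,
}


def tolerates(application: str, recovery_time_s: float) -> bool:
    """Hypothetical check: does recovery finish within the application's tolerance?"""
    return recovery_time_s <= TOLERANCE_S[application]


def applications_tolerating(recovery_time_s: float) -> List[str]:
    """List the applications (in table order) that could absorb this recovery time."""
    return [app for app, limit in TOLERANCE_S.items() if recovery_time_s <= limit]
```

 For example, a recovery time of 45 ms would fit autonomous vehicle steering and building control in this table, but not traction control.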

  11. Approach: Bounded Time Recovery
 • BTR guarantees that the system recovers from any fault within a short period of time, so that the end goal will be met
 • Weaker guarantee is often sufficient
 [Figure: timeline showing Correct Operation, then a Recovery Period between the Fault and Recovered markers, then Correct Operation again]
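
 One way to read the BTR guarantee is as a property of an execution trace: every interval between a fault and the matching recovery is no longer than some bound. The sketch below checks that property; the function name `satisfies_btr` and the representation of a trace as (fault time, recovered time) pairs are assumptions for illustration, not part of the talk.

```python
from typing import List, Tuple


def satisfies_btr(faulty_intervals: List[Tuple[float, float]], bound: float) -> bool:
    """Return True if every (fault_time, recovered_time) interval lasts at most `bound`.

    Times and the bound share one unit (for example seconds).
    """
    return all(recovered - fault <= bound for fault, recovered in faulty_intervals)
```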

  12. So, how do we make this happen?
 REBOUND

  13. REBOUND 1. Planning
 • Before the system is compromised, think about what it should do.
 • System operates in different modes for any given set of faults.
 • Can drop less critical tasks as necessary.
 [Figure: per-node schedules for N1 to N4 before and after N2 fails]

  14. REBOUND 2. Detection
 Nodes watch over each other to detect faults.
 [Figure: a node inspects a log of SEND… and RECV… entries and produces evidence that N4 is faulty]

  15. REBOUND 3. Consistency
 Flood evidence throughout the system.
 [Figure: the evidence "N4 is faulty" spreading through the network]

  16. REBOUND 4. Adaptation
 Each node independently transitions to a new mode.
 [Figure: each node's view changing from "All nodes OK" to "N4 is faulty"]

  17. Outline
 • Problem Introduction
 • Bounded Time Recovery
 • REBOUND
 • Technical Components
   1. Planning
   2. Detection
   3. Consistency
   4. Adaptation
 • Results

  18. 1. Planning
 For every* mode, we have a precomputed schedule and plan for every node.
 • Schedule generated offline
 • When tasks should run and where
 • Many constraints
 • Dependent scheduling problem
 • Builds a tree
 * Can limit the number of faults to improve computation time.
 [Figure: tree rooted at "No Faults" with children such as "Node 1 Faulty" and "Link 1-2 Faulty", continuing down to "Nodes 1&4 Faulty"]
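
 The offline planning step can be pictured as filling a table with one plan per fault set, up to a chosen number of simultaneous faults. The sketch below is only a toy: it uses a flat dictionary rather than the tree the slide mentions, considers node faults only (not link faults), and its planner simply keeps the most critical tasks that fit. The task names `T1` to `T8`, the capacity `SLOTS_PER_NODE`, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List

Mode = FrozenSet[str]        # set of components currently known to be faulty
Plan = Dict[str, List[str]]  # healthy node -> tasks it runs in that mode

NODES = ["N1", "N2", "N3", "N4"]
TASKS = [f"T{i}" for i in range(1, 9)]  # hypothetical tasks, most critical first
SLOTS_PER_NODE = 2                      # assumed capacity of each node


def enumerate_modes(components: Iterable[str], max_faults: int) -> List[Mode]:
    """All fault sets with at most `max_faults` members, smallest first."""
    ordered = sorted(components)
    return [
        frozenset(combo)
        for k in range(max_faults + 1)
        for combo in combinations(ordered, k)
    ]


def toy_planner(faulty: Mode) -> Plan:
    """Toy stand-in for the real scheduler: keep the most critical tasks that fit
    on the healthy nodes and spread them round-robin; the rest are dropped."""
    healthy = [n for n in NODES if n not in faulty]
    plan: Plan = {n: [] for n in healthy}
    kept = TASKS[: SLOTS_PER_NODE * len(healthy)]
    for i, task in enumerate(kept):
        plan[healthy[i % len(healthy)]].append(task)
    return plan


def build_plan_table(
    components: Iterable[str], max_faults: int, planner: Callable[[Mode], Plan]
) -> Dict[Mode, Plan]:
    """Precompute one plan per mode, as would be done offline."""
    return {mode: planner(mode) for mode in enumerate_modes(components, max_faults)}


table = build_plan_table(NODES, max_faults=1, planner=toy_planner)
```

 Capping `max_faults` mirrors the footnote on the slide: fewer tolerated faults means fewer modes to plan for.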

  19. 2. Detection
 Omission Faults
 • Declare a link faulty if an expected message from a neighbor is not received
 • Declaration causes other nodes to change mode.
 • Leverage synchrony.
 Commission Faults
 • Witness/Audit Nodes and Replicas
 • If a fault is found, the log is used as proof of misbehavior.
 • Large improvement over PeerReview
   • Adding synchrony
 Challenge: Bounding Time of Detection
 [Figure: a node declares a link faulty after a message between N1 and N2 goes missing; an audit/witness node runs a replica of a task and checks SEND… and RECV… log entries]
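
 The two kinds of detection can be sketched in a few lines. The code below is a simplified illustration, not REBOUND's protocol: omission detection assumes synchronized clocks and known message deadlines, and the audit assumes the task is deterministic and that the log is an ordered list of `("RECV", value)` and `("SEND", value)` entries. Logs are unsigned here, and the names `detect_omissions` and `audit_log` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]  # (sender, receiver)


def detect_omissions(
    expected: Dict[Link, float], received: Set[Link], now: float
) -> Set[Link]:
    """Return the links whose expected message is overdue.

    `expected` maps each link to the deadline of its next message; with
    synchrony, a message missing after its deadline counts as an omission.
    """
    return {
        link
        for link, deadline in expected.items()
        if now > deadline and link not in received
    }


def audit_log(
    log: List[Tuple[str, int]], task: Callable[[int], int]
) -> Optional[int]:
    """Replay logged inputs through a replica of `task`.

    Returns the index of the first SEND entry that the replica cannot
    reproduce, or None if the log is consistent with correct behavior.
    """
    pending: List[int] = []
    for index, (kind, value) in enumerate(log):
        if kind == "RECV":
            pending.append(value)
        elif kind == "SEND":
            if not pending or task(pending.pop(0)) != value:
                return index
    return None
```

 In this sketch the returned log index plays the role of evidence: it points at the entry that a correct replica would not have produced.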

  20. 3. Consistency
 We need a solution where…
 • Any two good nodes agree on the state of the system, or
 • The two become aware they cannot communicate
 Strawman: flood the system periodically with signed attestations of current mode
 • Actual solution is more efficient
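
 The strawman can be sketched as a breadth-first flood of a signed attestation over the network graph. This is an illustration under loud assumptions: an HMAC with a per-node key stands in for a real digital signature (a deployment would use public-key signatures so that any node can verify without sharing secrets), the topology is a plain adjacency dictionary, and the names `sign`, `verify`, and `flood` are hypothetical.

```python
import hashlib
import hmac
from collections import deque
from typing import Dict, List, Set, Tuple

Attestation = Tuple[str, str, bytes]  # (signer, mode description, tag)


def sign(key: bytes, node: str, mode: str) -> bytes:
    """Stand-in for a signature: HMAC-SHA256 over the node name and mode."""
    return hmac.new(key, f"{node}|{mode}".encode(), hashlib.sha256).digest()


def verify(key: bytes, node: str, mode: str, tag: bytes) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(sign(key, node, mode), tag)


def flood(
    graph: Dict[str, List[str]],
    origin: str,
    attestation: Attestation,
    keys: Dict[str, bytes],
) -> Set[str]:
    """Return the nodes that accept the attestation when it is flooded from `origin`.

    Invalid attestations are accepted by nobody; valid ones reach every node
    connected to the origin.
    """
    signer, mode, tag = attestation
    if not verify(keys[signer], signer, mode, tag):
        return set()
    accepted = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in accepted:
                accepted.add(neighbor)
                queue.append(neighbor)
    return accepted
```

 Nodes outside the returned set never see the attestation, which corresponds to the second case on the slide: they cannot communicate with the origin.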

  21. 4. Adaptation
 • Each node individually transitions when its mode changes.
 • When evidence is received, a mode change occurs within a bounded period of time.
 [Figure: per-node schedules for N1 to N4 as N4, N1, and N2 fail in turn, with modes labeled "N2 Faulty", "N1 & N2 Faulty", and "N1, N2, N4 Faulty"]
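
 A node's side of adaptation can be sketched as a lookup: when new evidence arrives, the node adds the faulty component to its current mode and switches to the precomputed plan for that mode. The `Node` class, the tiny `PLAN_TABLE`, and the task names below are hypothetical, and the sketch does not model the timing bound itself.

```python
from typing import Dict, FrozenSet, List

Mode = FrozenSet[str]
Plan = Dict[str, List[str]]

# Hypothetical precomputed table: one plan per known mode.
PLAN_TABLE: Dict[Mode, Plan] = {
    frozenset(): {"N1": ["T1"], "N2": ["T2"], "N3": ["T3"], "N4": ["T4"]},
    frozenset({"N4"}): {"N1": ["T1"], "N2": ["T2"], "N3": ["T3", "T4"]},
}


class Node:
    """A node that switches plans on its own when it learns of a fault."""

    def __init__(self, name: str, plan_table: Dict[Mode, Plan]) -> None:
        self.name = name
        self.plan_table = plan_table
        self.mode: Mode = frozenset()  # start in the fault-free mode

    def on_evidence(self, faulty_component: str) -> bool:
        """Apply evidence; return True if the node changed mode."""
        new_mode = self.mode | {faulty_component}
        if new_mode == self.mode:
            return False  # already known, nothing to do
        if new_mode not in self.plan_table:
            raise KeyError(f"no precomputed plan for mode {sorted(new_mode)}")
        self.mode = new_mode
        return True

    def tasks(self) -> List[str]:
        """Tasks this node runs in its current mode (none if it is excluded)."""
        return self.plan_table[self.mode].get(self.name, [])
```

 Because every node consults the same table, nodes that hold the same evidence end up in the same mode without coordinating further.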

  22. Challenges
 • Bounding every step of the algorithms
 • Overhead of periodic flood
   • Multisignatures drastically reduce traffic
 • Handling equivocation
   • Different nodes notifying of different faults to their neighbors
 • Proving everything
   • Correctness
   • Completeness
   • Bounded detection
   • Bounded stabilization
   • …
 • Planning
   • Unique problem
   • …

  23. Outline
 • Problem Introduction
 • Bounded Time Recovery
 • REBOUND
 • Technical Components
   1. Planning
   2. Detection
   3. Consistency
   4. Adaptation
 • Results

  24. Overhead of Schedule Tree
 f = # of faulty nodes protected against
 • Time depends on:
   • The number of nodes.
   • Degree of network.
   • Number of faulty nodes, f.
 • Only compute once for the lifetime of the system.
 • Subtrees easily parallelizable.
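
 To get a feel for why the number of nodes and f matter, one can count how many distinct sets of faulty nodes exist with at most f members. This is only a rough proxy chosen for this example: it ignores link faults and the network degree, and the talk does not state that planning time is proportional to this count. The function name `count_node_fault_sets` is hypothetical.

```python
from math import comb


def count_node_fault_sets(num_nodes: int, f: int) -> int:
    """Number of node subsets of size 0..f, i.e. sum of C(num_nodes, k) for k <= f."""
    return sum(comb(num_nodes, k) for k in range(f + 1))
```

 For the four-node example, protecting against one faulty node gives 5 fault sets and protecting against two gives 11.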

  25. Recovery
 [Figure: Unprotected System, N2 Compromised]

  26. Recovery
 [Figure: Protected System, N2 Compromised, with the Recovery Period marked]

  27. Recovery
 [Figure: Protected System, N1, N2, N3 Compromised]

  28. Key Idea: Period of Imperfection
 Many CPS can tolerate a short period of faulty behavior.
 Approach: Bounded Time Recovery
 Bounded time recovery guarantees that the system quickly returns to correct behavior after a fault.
 Solution: REBOUND
 Algorithms and protocols to provide BTR for distributed systems.
 Thank you.
