

  1. Safe AI for CPS André Platzer Carnegie Mellon University Joint work with Nathan Fulton

  2. Safety-Critical Systems "How can we provide people with cyber-physical systems they can bet their lives on?" - Jeannette Wing

  3. Safety-Critical Systems "How can we provide people with cyber-physical systems they can bet their lives on?" - Jeannette Wing

  4. This Talk: Ensure the safety of autonomous Cyber-Physical Systems.
     Best of both worlds: learning together with CPS safety
     ● Flexibility of learning
     ● Guarantees of CPS formal methods
     Diametrically opposed: flexibility + adaptability versus predictability + simplicity
     1. Cyber-Physical Systems with Differential Dynamic Logic
     2. Sandboxed reinforcement learning is provably safe
     3. Model-update learning addresses uncertainty with multiple models

  5. Airborne Collision Avoidance System ACAS X (STTT'17)
     ● Developed by the FAA to replace the current TCAS in aircraft
     ● Approximately optimizes an MDP on a grid
     ● Advisory from lookup tables with 5D interpolation regions
     ● Identified a safe region per advisory and proved it in KeYmaera X
     [Figure: advisory sequence over time with pilot delay]
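To illustrate the lookup-table idea (not the ACAS X implementation, whose tables are 5-dimensional and produced by MDP optimization), here is a minimal Python sketch over an invented 2D slice of a score table: each advisory has a table of scores on a grid, the score at the current state is obtained by bilinear interpolation, and the highest-scoring advisory is issued. All grids, values, and advisory names below are made up for illustration.

```python
# Hypothetical sketch of grid-based advisory selection with interpolation.
# The axes, table values, and advisories are invented; real ACAS X tables differ.
import numpy as np

alt_grid = np.array([-1000.0, 0.0, 1000.0])   # relative altitude (ft)
tau_grid = np.array([0.0, 20.0, 40.0])        # time to closest approach (s)

# scores[advisory][i, j] = score of issuing `advisory` at grid point (alt_i, tau_j)
scores = {
    "COC":   np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]),
    "CLIMB": np.array([[2.0, 1.0, 0.0], [3.0, 2.0, 1.0], [-1.0, 0.0, 1.0]]),
}

def interpolate(table, alt, tau):
    """Bilinear interpolation of a 2D score table at the point (alt, tau)."""
    i = int(np.clip(np.searchsorted(alt_grid, alt) - 1, 0, len(alt_grid) - 2))
    j = int(np.clip(np.searchsorted(tau_grid, tau) - 1, 0, len(tau_grid) - 2))
    u = (alt - alt_grid[i]) / (alt_grid[i + 1] - alt_grid[i])
    v = (tau - tau_grid[j]) / (tau_grid[j + 1] - tau_grid[j])
    return ((1 - u) * (1 - v) * table[i, j] + u * (1 - v) * table[i + 1, j]
            + (1 - u) * v * table[i, j + 1] + u * v * table[i + 1, j + 1])

def advisory(alt, tau):
    """Issue the advisory whose interpolated score is highest."""
    return max(scores, key=lambda adv: interpolate(scores[adv], alt, tau))

print(advisory(alt=350.0, tau=12.5))
```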

  6. Comparison: ACAS X issues DNC
     [Plot: ownship altitude h (ft) over time to crossing (s), showing the intruder path, the ownship path with no change, and the ownship path when following the DNC advisory.]
     But CL1500 or no change would not lead to a collision.

  7. Model-Based Verification Reinforcement Learning φ

  8. Model-Based Verification Reinforcement Learning pos < stopSign

  9. Model-Based Verification Reinforcement Learning ctrl pos < stopSign

  10. Model-Based Verification Reinforcement Learning ctrl pos < stopSign Approach: prove that control software achieves a specification with respect to a model of the physical system.

  11. Model-Based Verification Reinforcement Learning ctrl pos < stopSign Approach: prove that control software achieves a specification with respect to a model of the physical system.

  12. Model-Based Verification vs. Reinforcement Learning
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Automated analysis

  13. Model-Based Verification vs. Reinforcement Learning
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Automated analysis
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”

  14. Model-Based Verification vs. Reinforcement Learning
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Automated analysis
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
      ● Assumes an accurate model

  15. Model-Based Verification vs. Reinforcement Learning (Act/Observe loop)
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Automated analysis
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
      ● Assumes an accurate model

  16. Model-Based Verification vs. Reinforcement Learning (Act/Observe loop)
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Automated analysis
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
      ● Assumes an accurate model
      Reinforcement learning benefits:
      ● No need for a complete model
      ● Optimal (effective) policies

  17. Model-Based Verification vs. Reinforcement Learning (Act/Observe loop)
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Automated analysis
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
      ● Assumes an accurate model
      Reinforcement learning benefits:
      ● No need for a complete model
      ● Optimal (effective) policies
      Reinforcement learning drawbacks:
      ● No strong safety guarantees
      ● Proofs are obtained and checked by hand
      ● Formal proofs = decades-long proof development

  18. Model-Based Verification vs. Reinforcement Learning (Act/Observe loop)
      Goal: Provably correct reinforcement learning
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Computational aids (ATP)
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
      ● Assumes an accurate model
      Reinforcement learning benefits:
      ● No need for a complete model
      ● Optimal (effective) policies
      Reinforcement learning drawbacks:
      ● No strong safety guarantees
      ● Proofs are obtained and checked by hand
      ● Formal proofs = decades-long proof development

  19. Model-Based Verification vs. Reinforcement Learning (Act/Observe loop)
      Goal: Provably correct reinforcement learning
      1. Learn Safety
      2. Learn a Safe Policy
      3. Justify claims of safety
      Model-based verification benefits:
      ● Strong safety guarantees
      ● Computational aids (ATP)
      Model-based verification drawbacks:
      ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
      ● Assumes an accurate model
      Reinforcement learning benefits:
      ● No need for a complete model
      ● Optimal (effective) policies
      Reinforcement learning drawbacks:
      ● No strong safety guarantees
      ● Proofs are obtained and checked by hand
      ● Formal proofs = decades-long proof development

  20. Part I: Differential Dynamic Logic Trustworthy Proofs for Hybrid Systems

  21. Hybrid Programs
      x := t        discrete assignment: x is set to the value of t; all other variables (y, z, ...) are unchanged

  22. Hybrid Programs
      x := t        discrete assignment
      a; b          sequential composition: run a, then run b

  23. Hybrid Programs
      x := t        discrete assignment
      a; b          sequential composition
      ?P            test: if P is true, no change; if P is false, terminate

  24. Hybrid Programs
      x := t        discrete assignment
      a; b          sequential composition
      ?P            test: if P is true, no change; if P is false, terminate
      a*            repetition: run a any number of times (a; a; ...; a)

  25. Hybrid Programs
      x := t        discrete assignment
      a; b          sequential composition
      a ∪ b         nondeterministic choice: run either a or b
      ?P            test: if P is true, no change; if P is false, terminate
      a*            repetition: run a any number of times (a; a; ...; a)

  26. Hybrid Programs
      x := t        discrete assignment
      a; b          sequential composition
      a ∪ b         nondeterministic choice: run either a or b
      ?P            test: if P is true, no change; if P is false, terminate
      a*            repetition: run a any number of times (a; a; ...; a)
      x' = f(x)     continuous evolution: starting from x = F(0), follow the differential equation for some duration T, ending at x = F(T)
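As a concrete illustration of these constructs (a simulation sketch only, not the dL semantics and not KeYmaera X), the following hypothetical Python snippet encodes each construct as a state transformer over dictionaries; nondeterministic choice is resolved by random sampling, the test aborts the run, and the differential equation is approximated by explicit Euler steps.

```python
# Minimal simulation sketch of hybrid-program constructs (illustrative only).
import random

def assign(x, expr):                 # x := t
    def run(s):
        s2 = dict(s); s2[x] = expr(s); return s2   # only x changes
    return run

def seq(a, b):                       # a; b : run a, then b
    return lambda s: b(a(s))

def choice(a, b):                    # a ∪ b : nondeterminism resolved randomly here
    return lambda s: random.choice([a, b])(s)

def test(p):                         # ?P : abort this run if P is false
    def run(s):
        if not p(s):
            raise RuntimeError("test failed: run discarded")
        return s
    return run

def loop(a, n):                      # a* : simulated as a fixed number of repetitions
    def run(s):
        for _ in range(n):
            s = a(s)
        return s
    return run

def ode(derivs, duration, dt=0.01):  # {x' = f(x)} : explicit Euler integration
    def run(s):
        s = dict(s)
        for _ in range(round(duration / dt)):
            ds = {x: f(s) for x, f in derivs.items()}   # evaluate all derivatives first
            for x, d in ds.items():
                s[x] += d * dt
        return s
    return run

# One control-plus-motion round of a car that either accelerates or brakes
# (constants are made up; the evolution-domain constraint vel >= 0 is omitted here).
step = seq(choice(assign("acc", lambda s: 2.0), assign("acc", lambda s: -4.0)),
           seq(assign("t", lambda s: 0.0),
               ode({"pos": lambda s: s["vel"], "vel": lambda s: s["acc"],
                    "t": lambda s: 1.0}, duration=1.0)))
print(loop(step, 3)({"pos": 0.0, "vel": 5.0, "acc": 0.0, "t": 0.0}))
```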

  27. Approaching a Stopped Car (own car, stopped car ahead)
      Is this property true?
      [ { {accel ∪ brake}; t := 0;
          {pos' = vel, vel' = accel, t' = 1 & vel ≥ 0 & t ≤ T} }*
      ] (pos ≤ stoppedCarPos)

  28. Approaching a Stopped Car
      Assuming we only accelerate when it is safe to do so, is this property true?
      [ { {accel ∪ brake}; t := 0;
          {pos' = vel, vel' = accel, t' = 1 & vel ≥ 0 & t ≤ T} }*
      ] (pos ≤ stoppedCarPos)

  29. Approaching a Stopped Car
      ... and if we also assume the system is safe initially, safeDistance(pos, vel, stoppedCarPos, B):
      safeDistance(pos, vel, stoppedCarPos, B) →
      [ { {accel ∪ brake}; t := 0;
          {pos' = vel, vel' = accel, t' = 1 & vel ≥ 0 & t ≤ T} }*
      ] (pos ≤ stoppedCarPos)

  30. Approaching a Stopped Car
      safeDistance(pos, vel, stoppedCarPos, B) →
      [ { {accel ∪ brake}; t := 0;
          {pos' = vel, vel' = accel, t' = 1 & vel ≥ 0 & t ≤ T} }*
      ] (pos ≤ stoppedCarPos)
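To give a feel for what such conditions can look like, here is a small numeric Python sketch. The concrete formulas below (stopping distance under maximum braking B, and worst-case progress during one control cycle of length T before braking) and all constants are assumptions made for this sketch, not the definitions or the proof from the talk.

```python
# Hypothetical instantiation of the safety conditions (all constants made up).
# A = maximum acceleration, B = maximum braking, T = control cycle length.
A, B, T = 2.0, 4.0, 0.5
STOPPED_CAR_POS = 50.0

def safe_distance(pos, vel):
    """Invariant sketch: even under full braking we stop before the stopped car."""
    return pos + vel**2 / (2 * B) <= STOPPED_CAR_POS

def safe_to_accelerate(pos, vel):
    """Guard sketch: accelerating with A for one cycle of length T and then
    braking still keeps us behind the stopped car."""
    return (pos + vel * T + A * T**2 / 2
            + (vel + A * T)**2 / (2 * B)) <= STOPPED_CAR_POS

def controller(pos, vel):
    """Deterministic refinement of {accel ∪ brake}: accelerate only when safe."""
    return A if safe_to_accelerate(pos, vel) else -B

# Numeric check that the invariant is maintained along one sampled run.
pos, vel, dt = 0.0, 5.0, 0.01
assert safe_distance(pos, vel)
for _ in range(200):                       # 200 control cycles
    acc = controller(pos, vel)
    for _ in range(round(T / dt)):         # continuous motion, Euler-approximated
        pos += vel * dt
        vel = max(0.0, vel + acc * dt)     # evolution domain vel >= 0
    assert safe_distance(pos, vel), (pos, vel)
print("final pos =", round(pos, 2), "<", STOPPED_CAR_POS)
```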

  31. The Fundamental Question: Proofs give strong mathematical evidence of safety. Why would our program not work if we have a proof?

  32. The Fundamental Question: Why would our program not work if we have a proof? 1. Was the proof correct?

  33. The Fundamental Question: Why would our program not work if we have a proof? 1. Was the proof correct? 2. Was the model accurate enough?

  34. The Fundamental Question: Why would our program not work if we have a proof?
      1. Was the proof correct? → KeYmaera X
      2. Was the model accurate enough?
      [Diagram: KeYmaera X architecture. ODE & controls tooling and Bellerophon tactic programs (such as the dI tactic) reduce proof obligations step by step until the small KyX axiom core closes the proof (qed).
       DI axiom: [{x'=f&Q}]P ↔ ([?Q]P ← (Q → [{x'=f&Q}](P)'))
       Example: [v' = r_p·v² − g, t' = 1] v ≥ v₀ − g·t
       Side derivation: [v' := r_p·v² − g][t' := 1] v' ≥ −g·t', i.e. r_p·v² − g ≥ −g, which follows from H → r_p ≥ 0, where H = r_p ≥ 0 & r_a ≥ 0 & g > 0 & ...]

  35. The Fundamental Question: Why would our program not work if we have a proof?
      1. Was the proof correct? → KeYmaera X
      2. Was the model accurate enough? → Safe RL
      [Diagram: the same KeYmaera X architecture as on the previous slide.]
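Read off the slide, the differential-invariant step for the example can be spelled out as follows. This is a sketch of the reasoning the slide suggests; the model v' = r_p·v² − g and the assumption list H are taken from the slide, not reconstructed in full.

```latex
% Differential-invariant (dI) step for the example on the slide (sketch).
% To prove the postcondition v >= v_0 - g t along v' = r_p v^2 - g, t' = 1,
% it suffices (under the assumptions H) to show that its differential holds:
\[
  [\,v' = r_p v^2 - g,\; t' = 1\,]\; v \ge v_0 - g\,t
  \quad\text{reduces to}\quad
  [\,v' := r_p v^2 - g\,][\,t' := 1\,]\; v' \ge -g\,t' ,
\]
\[
  \text{i.e.}\quad r_p v^2 - g \ge -g ,
  \qquad\text{which follows from}\quad H \rightarrow r_p \ge 0 ,
  \qquad H \equiv r_p \ge 0 \,\wedge\, r_a \ge 0 \,\wedge\, g > 0 \,\wedge\, \ldots
\]
```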

  36. Part II: Justified Speculative Control: Safe reinforcement learning in partially modeled environments (AAAI 2018)

  37. Model-Based Verification: Accurate, analyzable models often exist!
      { {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};
        {pos' = vel, vel' = acc}
      }*

  38. Model-Based Verification: Accurate, analyzable models often exist!
      { {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};      (discrete control)
        {pos' = vel, vel' = acc}                            (continuous motion)
      }*

  39. Model-Based Verification: Accurate, analyzable models often exist!
      { {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};      (discrete, non-deterministic control)
        {pos' = vel, vel' = acc}                            (continuous motion)
      }*

  40. Model-Based Verification: Accurate, analyzable models often exist!
      init →
      [ { {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};
          {pos' = vel, vel' = acc}
        }*
      ] pos < stopSign
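Part II's sandboxing idea can be sketched schematically in Python. This is an illustrative, hypothetical example, not the implementation from the AAAI 2018 paper: a simple tabular Q-learning agent proposes actions, and a runtime monitor derived from the verified guard replaces any action whose guard fails in the current state with braking, so every action actually executed complies with the proved model. The constants, reward, and discretization are invented.

```python
# Schematic sketch of sandboxed ("justified speculative") reinforcement learning.
# Environment, reward, discretization, and constants are invented for illustration.
import random
from collections import defaultdict

A, B, T, STOP = 2.0, 4.0, 0.5, 50.0
ACTIONS = {"accel": A, "brake": -B}

def guard_ok(action, pos, vel):
    """Runtime monitor derived from the verified guard (?safeAccel in the model):
    accelerating is admissible only if braking afterwards still avoids the stop."""
    if action == "brake":
        return True                       # braking is always admissible
    return pos + vel * T + A * T**2 / 2 + (vel + A * T)**2 / (2 * B) <= STOP

def sandbox(action, pos, vel):
    """Replace any action the monitor rejects by the safe fallback (brake)."""
    return action if guard_ok(action, pos, vel) else "brake"

def step(pos, vel, action):
    """One control cycle of the (only partially modeled) environment."""
    acc = ACTIONS[action]
    for _ in range(50):                   # Euler steps of length T / 50
        pos += vel * (T / 50)
        vel = max(0.0, vel + acc * (T / 50))
    reward = vel                          # learn to go fast; safety comes from the sandbox
    return pos, vel, reward

def state(pos, vel):
    """Coarse state discretization for the tabular learner."""
    return (round(pos), round(vel))

Q = defaultdict(float)
for _ in range(200):                      # learning episodes
    pos, vel = 0.0, 0.0
    for _ in range(60):                   # control cycles per episode
        s = state(pos, vel)
        proposed = (random.choice(list(ACTIONS)) if random.random() < 0.2
                    else max(ACTIONS, key=lambda a: Q[(s, a)]))
        action = sandbox(proposed, pos, vel)           # learning stays inside the sandbox
        pos, vel, reward = step(pos, vel, action)
        s2 = state(pos, vel)
        Q[(s, action)] += 0.1 * (reward + 0.9 * max(Q[(s2, a)] for a in ACTIONS)
                                 - Q[(s, action)])
        assert pos <= STOP                             # safety holds throughout learning
```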
