Towards Best-Effort Autonomy
1. Towards Best-Effort Autonomy — Rüdiger Ehlers, University of Bremen. Dagstuhl Seminar 17071, February 2017. Based on joint work with Salar Moarref & Ufuk Topcu (CDC 2016).

2. Motivation

Highly autonomous systems...
... degrade in performance over time
... need to work correctly in off-nominal conditions
... need to adapt without the need for a human operator

Problem: We do not always know in advance how they are degrading, so we should be able to synthesize an adapted strategy in the field.

3. Connecting Theory and Practice

[Figure: workflow diagram connecting a specification, estimated probabilities, an MDP computation, a control policy, the environment, and the resulting behavior.]

4. ω-regular control of MDPs – basic setting

[Figure: closed-loop setting — a controller implements a policy for the MDP; the run X0, X1, X2, … induces a trace ρ = ρ0 ρ1 ρ2 … that should satisfy ρ ⊨ ψ.]
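To make the setting concrete, here is a minimal sketch of the objects on this slide. The states, actions, and probabilities are invented for illustration; only the shape — an MDP, a memoryless controller, and the induced trace X0, X1, X2, … — follows the slide.

```python
import random

# A tiny MDP: {state: {action: [(successor, probability), ...]}}.
# States s0/s1 and actions a/b are made up for this sketch.
MDP = {
    "s0": {"a": [("s1", 0.8), ("s0", 0.2)], "b": [("s0", 1.0)]},
    "s1": {"a": [("s0", 1.0)], "b": [("s1", 0.5), ("s0", 0.5)]},
}

def simulate(mdp, policy, start, steps):
    """Sample a finite prefix X0, X1, ... of a trace under the policy."""
    trace, state = [start], start
    for _ in range(steps):
        succs, probs = zip(*mdp[state][policy(state)])
        state = random.choices(succs, weights=probs)[0]
        trace.append(state)
    return trace

# A memoryless controller that always plays action "a".
print(simulate(MDP, lambda s: "a", "s0", 10))
```

Whether the induced infinite trace satisfies ρ ⊨ ψ is a property of the probability measure over traces; the controller is chosen to make P(ρ ⊨ ψ) as large as possible.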

5. Simple example: patrolling

[Figure: a workspace with green, red, purple, and blue regions; motion primitives succeed with probability 0.8 and fail with probability 0.2.]

Specification: ψ = GF(green) ∧ GF(red) ∧ GF(purple) ∧ GF(blue), with requirement P(ρ ⊨ ψ) ≥ (0.8)^4.
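The workspace could be encoded in the same MDP format, as in the following sketch. The stay-in-place failure model is an assumption made for illustration; the slide only gives the 0.8/0.2 split of the motion primitive.

```python
def grid_mdp(width, height, p_succ=0.8):
    """Grid cells are states; each action moves to a neighbor cell.
    The move succeeds with probability p_succ and otherwise leaves
    the robot where it is (an assumed failure model)."""
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    mdp = {}
    for x in range(width):
        for y in range(height):
            actions = {}
            for a, (dx, dy) in moves.items():
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    actions[a] = [((nx, ny), p_succ), ((x, y), 1 - p_succ)]
            mdp[(x, y)] = actions
    return mdp
```

The colored regions would then be labels attached to cells, and each GF conjunct requires its region to be visited again and again.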

6. Using ω-regular specifications

Ideas:
- By assuming that traces are infinitely long, we can abstract from the unknown time until the system goes out of service.
- Using temporal logic with operators such as "finally" and "globally", we do not need to set time bounds for reaching the system goals, which helps with maximizing the probability that a trace satisfies the specification.
- ω-regular specifications allow us to specify relatively complex behaviors easily.

7. But do ω-regular specifications always make sense?

A thought experiment: Assume that a robot has to patrol between two regions (i.e., it needs to visit both regions infinitely often), and that at every second, P(robot breaks) > 10^−10. What is the maximum probability of satisfying the specification that some control policy can achieve? It is 0, as the robot will almost surely eventually break down.
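A one-line calculation makes the answer concrete (a sketch, assuming the per-second breakdown events are independent):

```latex
\Pr(\text{robot still functional after } n \text{ seconds})
  \;\le\; \left(1 - 10^{-10}\right)^{n} \;\xrightarrow{\,n \to \infty\,}\; 0.
```

Visiting both regions infinitely often requires the robot to stay functional forever, so under every policy the satisfaction probability is 0.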

8. Main question of this paper

How can we compute policies that work towards the satisfaction of ω-regular specifications even in the case of inevitable non-satisfaction?

9. Motivational example problem

[Figure.]

10. Solving the problem by intuition

A fact: We will all die, and it can happen any moment!
Human behavior: But that does not keep us from planning for the long term (e.g., getting a PhD)!
Rationale: We normally ignore the risk of catastrophic but very sparse events in decision making.
However... while planning for the long term, humans minimize the risk of catastrophic events. Example: not doing risky driving.
So what we want is... a method to compute risk-averse policies that are at the same time optimistic that the catastrophic event does not happen.

11. Towards optimistic, but risk-averse policies (1)

Try 1: Compute policies that, after reaching a goal, maximize the probability of reaching the respective next goal.

Example: [Figure: a road scenario with regions Goal 1 and Goal 2.]
Specification: GF(goal1) ∧ GF(goal2) ∧ G(¬crash)
Probability that the car breaks (every second): 10^−10
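A minimal sketch of the subroutine behind Try 1: maximal reachability probabilities via value iteration, in the nested-dictionary MDP encoding used in the earlier sketches. This is the textbook construction, not necessarily the paper's exact algorithm; the fixed iteration count stands in for a proper convergence test.

```python
def max_reach_prob(mdp, targets, iterations=1000):
    """For each state, approximate the maximal probability (over all
    policies) of eventually reaching a state in `targets`."""
    v = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iterations):
        for s in mdp:
            if s not in targets:
                v[s] = max(sum(p * v[t] for t, p in succs)
                           for succs in mdp[s].values())
    return v

def greedy_policy(mdp, v, targets):
    """Extract a memoryless policy that is greedy w.r.t. the values v."""
    return {s: max(mdp[s], key=lambda a: sum(p * v[t] for t, p in mdp[s][a]))
            for s in mdp if s not in targets}
```

After reaching goal 1, the controller would switch to the policy that is greedy for reaching goal 2, and vice versa.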

12. Towards optimistic, but risk-averse policies (2)

Try 2 (similar to the work by Svorenova et al., 2013): Compute policies that maximize some value p such that whenever a goal is reached, the probability of reaching the respective next goal is at least p.

The same example as before: [Figure: the road scenario with Goal 1 and Goal 2.]
Specification: GF(goal1) ∧ GF(goal2) ∧ G(¬crash)
Probability that the car breaks (every second): 10^−10
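Under one simple reading of Try 2 — each goal is a single state, and the policy may be re-planned at every goal visit — the best uniform guarantee p is determined by the weakest goal-to-goal leg. A sketch reusing max_reach_prob from above (goal_cycle, listing the goals in patrol order, is a hypothetical argument):

```python
def best_uniform_p(mdp, goal_cycle):
    """Largest p such that, from every goal, the next goal in the
    cycle is reached with probability at least p (one leg at a time)."""
    p = 1.0
    for i, g in enumerate(goal_cycle):
        nxt = goal_cycle[(i + 1) % len(goal_cycle)]
        v = max_reach_prob(mdp, targets={nxt})
        p = min(p, v[g])
    return p
```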

13. Towards optimistic, but risk-averse policies (3)

But what about general ω-regular specifications? Example:
(GF(red) ∧ (¬blue U green)) ∨ (FG(¬blue) ∧ GF(yellow))
What are the goals here, and how can we compute risk-averse policies?

Idea: Let the policy declare the goals. Then we can compute a policy together with its declaration.

14. Declaring goals

[Figure: an example specification — a disjunction built from the subformulas GF(green ∧ (¬blue U red)), FG(red), F(blue ∧ XG ¬green), and G ¬green — together with a deterministic parity automaton for it, whose states carry the colors 0, 1, and 2 and whose edges are labeled with propositions over green, red, and blue.]

15. Declaring goals (2)

[Figure: the deterministic parity automaton from the previous slide, with colors 0, 1, and 2.]

Definition of parity acceptance: A parity automaton accepts a trace if the highest color that occurs infinitely often along the automaton's run for the trace is even.

So what are possible goals to be reached? Colors 0 and 2.
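For an ultimately periodic ("lasso") run, the colors occurring infinitely often are exactly the colors on the repeated cycle, so checking parity acceptance reduces to one max and one parity test. A small sketch with invented example runs:

```python
def parity_accepts(cycle_colors):
    """A lasso run is accepted iff the highest color on its repeated
    cycle -- i.e., the highest color seen infinitely often -- is even."""
    return max(cycle_colors) % 2 == 0

print(parity_accepts([0, 2, 1]))  # True: highest recurring color 2 is even
print(parity_accepts([0, 1]))     # False: highest recurring color 1 is odd
```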

16. Declaring goals (3)

Main idea: We require the system to decrease goal colors at most k times (for some k ∈ ℕ), and whenever an odd-colored state is visited, the goal color must be higher than the odd color.

Effect: All infinite traces satisfying this new condition satisfy the original parity objective as well.
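One way to read the modified condition is as a bookkeeping monitor running alongside the policy: at each step the policy declares its current goal color, and the monitor rejects if the declared color decreases more than k times or fails to dominate an odd state color. The interface below is invented for illustration and captures only the bookkeeping; the declared goals are additionally meant to be actually reached, which is where the goal-to-goal reachability ideas of Try 1 and Try 2 plug in.

```python
class GoalColorMonitor:
    """Tracks the condition from this slide: the declared goal color may
    decrease at most k times, and whenever an odd-colored state is
    visited, the declared goal color must be higher than that odd color."""

    def __init__(self, k):
        self.decreases_left = k
        self.goal = None  # currently declared (even) goal color

    def step(self, declared_goal, state_color):
        """Process one step; returns False as soon as the run
        violates the condition."""
        if self.goal is not None and declared_goal < self.goal:
            self.decreases_left -= 1
            if self.decreases_left < 0:
                return False  # goal color decreased more than k times
        self.goal = declared_goal
        if state_color % 2 == 1 and not declared_goal > state_color:
            return False  # odd color not dominated by the goal color
        return True
```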
