(Learning to) Learn to Control
Jan Křetínský, Technical University of Munich, Germany


  1. (Learning to) Learn to Control
Jan Křetínský, Technical University of Munich, Germany
Joint work with P. Ashok, T. Meggendorfer (TUM), T. Brázdil (Masaryk University Brno), K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov, V. Toman (IST Austria), V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University), D. Parker (University of Birmingham)
Dagstuhl seminar: Computer-Assisted Engineering for Robotics and Autonomous Systems, February 14, 2017

  2. Controller synthesis and verification 2/12

  4. Formal methods and machine learning 3/12
◮ Formal methods: + precise; – scalability issues (MEM-OUT); – can be hard to use
◮ Learning: + scalable; + simpler solutions; – weaker guarantees; – different objectives
◮ Combining them: precise computation, with the focus on the important stuff

  10. Examples 4/12
◮ Reinforcement learning for efficient controller synthesis
    ◮ MDP with functional spec (reachability, LTL) [1]
    ◮ MDP with performance spec (mean payoff / average reward) [2]
◮ Decision tree learning for efficient controller representation
    ◮ MDP [3]
    ◮ Games [4]
[1] Brázdil, Chatterjee, Chmelík, Forejt, K., Kwiatkowska, Parker, Ujma: Verification of Markov Decision Processes Using Learning Algorithms. ATVA 2014; Daca, Henzinger, K., Petrov: Faster Statistical Model Checking for Unbounded Temporal Properties. TACAS 2016
[2] Ashok, Chatterjee, Daca, K., Meggendorfer: Value Iteration for Long-run Average Reward in Markov Decision Processes. Submitted
[3] Brázdil, Chatterjee, Chmelík, Fellner, K.: Counterexample Explanation by Learning Small Strategies in Markov Decision Processes. CAV 2015
[4] Brázdil, Chatterjee, K., Toman: Strategy Representation by Decision Trees in Reactive Synthesis. Submitted

  11. Example: Markov decision processes 5/12
[MDP diagram: states s_init, v, p, t, goal (further states elided in the diagram); actions up, down, a, b, c; transition probabilities 1, 0.5, 0.5, 0.01, 0.99, 1]
Objective: find a controller σ maximizing P^σ[ ◊ goal ]
[The chosen decisions are also displayed as a decision-tree node: ACTION = down? with Y / N branches]
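The exact transitions of the diagram did not survive extraction, so the following minimal Python sketch only fixes ideas: an MDP in the state/action vocabulary of the example, and plain value iteration for the maximal probability of reaching goal. All transition probabilities in the dictionary are illustrative assumptions.

# A small MDP in the shape of the example: state -> action -> [(successor, probability)].
# Only the state/action names come from the slide; the transitions are assumed.
mdp = {
    "s_init": {"up":   [("p", 1.0)],
               "down": [("v", 0.5), ("goal", 0.5)]},
    "p":      {"a":    [("p", 1.0)]},                      # assumed trap: goal unreachable
    "v":      {"b":    [("s_init", 0.99), ("t", 0.01)]},
    "t":      {"c":    [("goal", 1.0)]},
    "goal":   {"stay": [("goal", 1.0)]},
}

def max_reach_value_iteration(mdp, target, eps=1e-8):
    """Max probability of eventually reaching `target`, by plain value iteration."""
    val = {s: (1.0 if s == target else 0.0) for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s == target:
                continue
            best = max(sum(pr * val[t2] for t2, pr in dist)
                       for dist in mdp[s].values())
            delta = max(delta, best - val[s])
            val[s] = best
        if delta < eps:
            return val

print(max_reach_value_iteration(mdp, "goal")["s_init"])    # max P[reach goal] from s_init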

  18. Example 1: Computing controllers faster 6/12
Value iteration with upper and lower bounds updates every transition in every round:

repeat
    for all transitions s --a--> do
        Update(s --a-->)
until UpBound(s_init) − LoBound(s_init) < ε

More frequently update what is visited more frequently by reasonably good controllers:

repeat
    sample a path from s_init        ⊲ pick action arg max_a UpBound(s --a-->)
    for all visited transitions s --a--> do
        Update(s --a-->)
until UpBound(s_init) − LoBound(s_init) < ε

⇒ faster & sure: updates important parts of the system
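A hedged Python sketch of the sampling loop above, in the spirit of BRTDP (the technique named in the summary): keep a lower and an upper bound on the reachability value, repeatedly sample a path that greedily follows the upper bound, and update only the visited states. It reuses the toy `mdp` dictionary from the previous sketch; the graph preprocessing of unreachable states merely stands in for the end-component treatment the real algorithm needs, so this is an illustration rather than the published algorithm.

import random

def cannot_reach(mdp, target):
    """States from which `target` is unreachable under every controller (graph analysis)."""
    can = {target}
    changed = True
    while changed:
        changed = False
        for s, acts in mdp.items():
            if s not in can and any(t in can for dist in acts.values() for t, _ in dist):
                can.add(s)
                changed = True
    return set(mdp) - can

def brtdp_reach(mdp, s_init, target, eps=1e-6, max_len=10_000):
    lo = {s: (1.0 if s == target else 0.0) for s in mdp}
    up = {s: (0.0 if s in cannot_reach(mdp, target) else 1.0) for s in mdp}

    def backup(bound, s, a):                     # one-step backup of a bound for (s, a)
        return sum(pr * bound[t] for t, pr in mdp[s][a])

    while up[s_init] - lo[s_init] >= eps:
        # sample a path from s_init, always picking the action with the best upper bound
        path, s = [], s_init
        while s != target and len(path) < max_len:
            a = max(mdp[s], key=lambda act: backup(up, s, act))
            path.append(s)
            succs, probs = zip(*mdp[s][a])
            s = random.choices(succs, probs)[0]
        # update the bounds of the visited states, back to front
        for s in reversed(path):
            up[s] = max(backup(up, s, a) for a in mdp[s])
            lo[s] = max(backup(lo, s, a) for a in mdp[s])
    return lo[s_init], up[s_init]

print(brtdp_reach(mdp, "s_init", "goal"))        # converging (lower, upper) bounds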

  25. Example 1: Experimental results 7/12
Visited states
Example     PRISM         with RL
zeroconf    4,427,159     977
wlan        5,007,548     1,995
firewire    19,213,802    32,214
mer         26,583,064    1,950

  26. Example 2: Computing small controllers 8/12
◮ explicit map σ : S → A
◮ BDD (binary decision diagrams) encoding its bit representation
◮ DT (decision tree)

Example     #states      Value       Explicit   BDD    DT   Rel.err(DT) %
firewire    481,136      1.0         479,834    4233   1    0.0
investor    35,893       0.958       28,151     783    27   0.886
mer         1,773,664    0.200016        MEM-OUT (see *)
zeroconf    89,586       0.00863     60,463     409    7    0.106

* MEM-OUT in PRISM, whereas RL yields: Explicit 1887, BDD 619, DT 13, Rel.err(DT) 0.00014 %
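To make the Explicit-vs-DT comparison concrete, here is a small sketch (not the tool behind the table) that represents a memoryless strategy σ : S → A as a decision tree over the state variables using scikit-learn; the grid-shaped state space and the strategy itself are made-up assumptions.

from sklearn.tree import DecisionTreeClassifier

# Toy state space: states are valuations (x, y) of two integer variables,
# and the assumed strategy depends only on y, so a tiny tree suffices.
states = [(x, y) for x in range(20) for y in range(20)]
sigma = {(x, y): ("down" if y > 10 else "up") for (x, y) in states}   # assumed strategy

X = list(states)                             # feature vectors (x, y)
actions = [sigma[s] for s in states]         # label = action chosen by the strategy
tree = DecisionTreeClassifier().fit(X, actions)

print(len(states), "explicit entries vs.", tree.tree_.node_count, "tree nodes")
assert list(tree.predict(X)) == actions      # the tree reproduces sigma on all states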

  30. Example 2: Computing small controllers 9/12
From precise decisions to a DT, via the importance of decisions.
Importance of a decision in s with respect to ◊ goal and controller σ: P^σ[ ◊ s | ◊ goal ]
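One way to read this definition operationally, sketched below under the same toy MDP as above: estimate the importance of each state by simulating σ and measuring in what fraction of goal-reaching runs the state is visited, then pass these estimates to the tree learner as sample weights, so that decisions of unimportant states can be traded for a smaller tree. The Monte-Carlo estimator, the state encoding, and the weighting are illustrative assumptions, not the exact procedure of the cited paper.

import random
from collections import defaultdict
from sklearn.tree import DecisionTreeClassifier

def importance(mdp, sigma, s_init, target, runs=10_000, max_len=1_000):
    """Monte-Carlo estimate of P^sigma[reach s | reach target] for every state s."""
    visits, reached = defaultdict(int), 0
    for _ in range(runs):
        s, seen = s_init, set()
        for _ in range(max_len):
            if s == target:
                reached += 1
                for q in seen:
                    visits[q] += 1
                break
            seen.add(s)
            succs, probs = zip(*mdp[s][sigma[s]])
            s = random.choices(succs, probs)[0]
    return {q: visits[q] / max(reached, 1) for q in sigma}

# Using the toy `mdp` from the first sketch, with an assumed controller and state encoding:
sigma = {"s_init": "down", "p": "a", "v": "b", "t": "c"}
encode = {"s_init": [0], "p": [1], "v": [2], "t": [3]}

imp = importance(mdp, sigma, "s_init", "goal")
X = [encode[s] for s in sigma]
actions = [sigma[s] for s in sigma]
weights = [imp[s] for s in sigma]                              # importance as sample weight
tree = DecisionTreeClassifier().fit(X, actions, sample_weight=weights)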

  32. Some related work 10/12
Further examples on decision trees
◮ Garg, Neider, Madhusudan, Roth: Learning Invariants using Decision Trees and Implication Counterexamples. POPL 2016
◮ Krishna, Puhrsch, Wies: Learning Invariants Using Decision Trees.
Further examples on reinforcement learning
◮ Junges, Jansen, Dehnert, Topcu, Katoen: Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016
◮ David, Jensen, Larsen, Legay, Lime, Sorensen, Taankvist: On Time with Minimal Expected Cost! ATVA 2014

  33. Summary 11/12
Machine learning in verification
◮ Scalable heuristics
◮ Example 1: Speeding up value iteration
    ◮ technique: reinforcement learning, BRTDP
    ◮ idea: focus on updating the “most important parts” = those most often visited by good strategies
◮ Example 2: Small and readable strategies
    ◮ technique: decision tree learning
    ◮ idea: based on the importance of states, feed the decisions to the learning algorithm
◮ Learning in Verification (LiVe) at ETAPS
◮ Explainable Verification (FEVer) at CAV

  34. Discussion 12/12
Verification using machine learning
◮ How far do we want to compromise?
◮ Do we have to compromise?
    ◮ BRTDP, invariant generation, strategy representation don’t
◮ Don’t we want more than ML?
    ◮ (ε-)optimal controllers?
    ◮ arbitrary controllers – is it still verification?
◮ What do we actually want?
    ◮ scalability shouldn’t overrule guarantees?
    ◮ when is PAC enough?
    ◮ oracle usage seems fine
◮ How much of it can work for examples from robotics?
