(Learning to) Learn to Control
Jan Křetínský, Technical University of Munich, Germany
Joint work with P. Ashok, T. Meggendorfer (TUM), T. Brázdil (Masaryk University Brno), K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov, V. Toman (IST Austria), V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University), D. Parker (University of Birmingham)
Dagstuhl seminar: Computer-Assisted Engineering for Robotics and Autonomous Systems, February 14, 2017
Controller synthesis and verification
Formal methods and machine learning

Formal methods:
+ precise
– scalability issues (MEM-OUT)

Learning:
+ scalable
+ simpler solutions
– weaker guarantees
– can be hard to use with different objectives

Goal of the combination: precise computation that focuses on the important parts.
Examples

◮ Reinforcement learning for efficient controller synthesis
  ◮ MDP with functional spec (reachability, LTL) [1]
  ◮ MDP with performance spec (mean payoff / average reward) [2]
◮ Decision tree learning for efficient controller representation
  ◮ MDP [3]
  ◮ Games [4]

[1] Brázdil, Chatterjee, Chmelík, Forejt, K., Kwiatkowska, Parker, Ujma: Verification of Markov Decision Processes Using Learning Algorithms. ATVA 2014
    Daca, Henzinger, K., Petrov: Faster Statistical Model Checking for Unbounded Temporal Properties. TACAS 2016
[2] Ashok, Chatterjee, Daca, K., Meggendorfer: Value Iteration for Long-run Average Reward in Markov Decision Processes. Submitted
[3] Brázdil, Chatterjee, Chmelík, Fellner, K.: Counterexample Explanation by Learning Small Strategies in Markov Decision Processes. CAV 2015
[4] Brázdil, Chatterjee, K., Toman: Strategy Representation by Decision Trees in Reactive Synthesis. Submitted
Example: Markov decision processes

[Figure: a small MDP with initial state s_init, a target state goal, actions including up, down, a, b, c, and transition probabilities such as 0.5, 0.01, and 0.99; one slide overlay shows the choice in s_init as a decision-tree node "ACTION = down" with branches Y/N.]

Task: find a controller σ achieving max_σ P^σ[◊ goal].
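To make the objective concrete, here is a minimal value-iteration sketch in Python for maximal reachability; the toy transition structure is an illustrative assumption, only loosely inspired by the slide's picture, not the exact model.

```python
# MDP encoding: state -> action -> list of (successor, probability).
# The concrete states, actions and probabilities are illustrative assumptions.
mdp = {
    "s_init": {"up":   [("u", 1.0)],
               "down": [("t", 0.5), ("goal", 0.5)]},
    "u":      {"a":    [("s_init", 1.0)]},
    "t":      {"c":    [("goal", 0.01), ("t", 0.99)]},
    "goal":   {},  # absorbing target state
}

def value_iteration(mdp, target, eps=1e-6):
    # v[s] approximates max_sigma P^sigma[<> target] from state s (from below).
    v = {s: (1.0 if s == target else 0.0) for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if s == target or not actions:
                continue
            # Bellman update: the best action maximizes the expected successor value.
            new = max(sum(p * v[s2] for s2, p in succ) for succ in actions.values())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < eps:
            return v

print(value_iteration(mdp, "goal")["s_init"])  # close to 1.0 in this toy model
```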
Example 1: Computing controllers faster

Idea: more frequently update what is visited more frequently by reasonably good controllers.

repeat
    sample a path from s_init        ▷ pick action arg max_a UpBound(s −a→)
    for all visited transitions s −a→ do
        Update(s −a→)
until UpBound(s_init) − LoBound(s_init) < ε

Effect: faster and still with sure (guaranteed) bounds, because the updates concentrate on the important parts of the system.
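The following Python sketch mirrors this sampling loop in the style of BRTDP, reusing the toy `mdp` dictionary from the earlier value-iteration sketch. It is only an illustration of the idea: the full algorithm (Brázdil et al., ATVA 2014) additionally handles end components, which is needed for the upper bound to converge on general MDPs and is omitted here; updates are applied per visited state rather than per transition.

```python
import random

def brtdp(mdp, s_init, target, eps=1e-3, max_path_len=200):
    # Lower and upper bounds on max_sigma P^sigma[<> target] for every state.
    lo = {s: (1.0 if s == target else 0.0) for s in mdp}
    up = {s: 1.0 for s in mdp}

    def pick_action(s):
        # Greedy in the upper bound (ties broken randomly): follow what a
        # "reasonably good" controller still considers promising.
        vals = {a: sum(p * up[s2] for s2, p in succ) for a, succ in mdp[s].items()}
        best = max(vals.values())
        return random.choice([a for a, v in vals.items() if v >= best - 1e-12])

    def update(s):
        # Bellman backup of both bounds over all actions of a visited state.
        lo[s] = max(sum(p * lo[s2] for s2, p in succ) for succ in mdp[s].values())
        up[s] = max(sum(p * up[s2] for s2, p in succ) for succ in mdp[s].values())

    while up[s_init] - lo[s_init] > eps:
        # Sample a path from s_init, picking actions with maximal upper bound.
        path, s = [], s_init
        while mdp[s] and len(path) < max_path_len:
            a = pick_action(s)
            path.append(s)
            succs, probs = zip(*mdp[s][a])
            s = random.choices(succs, probs)[0]
        # Update the visited states along the path, last one first.
        for s in reversed(path):
            update(s)
    return lo[s_init], up[s_init]

print(brtdp(mdp, "s_init", "goal"))  # two bounds at most eps apart
```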
Example 1: Experimental results

Visited states:

Example      PRISM         with RL
zeroconf      4,427,159        977
wlan          5,007,548      1,995
firewire     19,213,802     32,214
mer          26,583,064      1,950
Example 2: Computing small controllers

Controller representations:
◮ explicit map σ : S → A
◮ BDD (binary decision diagram) encoding its bit representation
◮ DT (decision tree)

Example     #states      Value       Explicit    BDD    DT    Rel.err(DT) %
firewire      481,136    1.0          479,834   4233     1    0.0
investor       35,893    0.958         28,151    783    27    0.886
mer         1,773,664    0.200016    ——— MEM-OUT ——— *
zeroconf       89,586    0.00863       60,463    409     7    0.106

* MEM-OUT in PRISM; with RL instead: Explicit 1887, BDD 619, DT 13, Rel.err(DT) 0.00014
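As a rough illustration of the decision-tree option, the sketch below fits an off-the-shelf classifier to an explicit strategy. The state features, the tiny strategy, and the use of scikit-learn are assumptions made for the example; the cited work uses its own DT-learning setup.

```python
# Sketch: compress an explicit memoryless strategy sigma: S -> A into a
# decision tree over the state variables. Features and strategy are made up.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical states described by two integer variables (x, y), together
# with the action sigma(s) chosen by the controller in each of them.
states  = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
actions = ["down", "down", "down", "up", "up", "up"]

tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(tree, feature_names=["x", "y"]))  # human-readable predicates
```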
Example 2: Computing small controllers

From precise decisions to a DT: weight the decisions by their importance.

Importance of a decision in state s with respect to ◊ goal and controller σ:

    P^σ[◊ s | ◊ goal]
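A minimal sketch of how this importance could be estimated by simulation under a fixed controller; representing the controller as a dictionary `sigma` and the plain Monte Carlo estimate are assumptions for illustration, not the exact procedure of the cited papers.

```python
import random

def estimate_importance(mdp, sigma, s_init, target, runs=10_000, max_len=200):
    # Monte Carlo estimate of P^sigma[<> s | <> goal] for every state s:
    # the fraction of goal-reaching runs under sigma that also visit s.
    visits = {s: 0 for s in mdp}
    goal_runs = 0
    for _ in range(runs):
        s, seen = s_init, set()
        for _ in range(max_len):
            seen.add(s)
            if s == target or not mdp[s]:
                break
            succs, probs = zip(*mdp[s][sigma[s]])
            s = random.choices(succs, probs)[0]
        if target in seen:
            goal_runs += 1
            for v in seen:
                visits[v] += 1
    return {s: (visits[s] / goal_runs if goal_runs else 0.0) for s in mdp}
```

States with importance near zero can then be dropped or down-weighted when the tree is learned, which is what allows the very small trees reported in the table above.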
Some related work

Further examples on decision trees
◮ Garg, Neider, Madhusudan, Roth: Learning Invariants using Decision Trees and Implication Counterexamples. POPL 2016
◮ Krishna, Puhrsch, Wies: Learning Invariants Using Decision Trees.

Further examples on reinforcement learning
◮ Junges, Jansen, Dehnert, Topcu, Katoen: Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016
◮ David, Jensen, Larsen, Legay, Lime, Sørensen, Taankvist: On Time with Minimal Expected Cost! ATVA 2014
Summary

Machine learning in verification
◮ Scalable heuristics
◮ Example 1: Speeding up value iteration
  ◮ technique: reinforcement learning, BRTDP
  ◮ idea: focus the updates on the "most important parts" = those most often visited by good strategies
◮ Example 2: Small and readable strategies
  ◮ technique: decision tree learning
  ◮ idea: based on the importance of states, feed the decisions to the learning algorithm
◮ Learning in Verification (LiVe) at ETAPS
◮ Explainable Verification (FEVer) at CAV
Discussion

Verification using machine learning
◮ How far do we want to compromise?
◮ Do we have to compromise?
  ◮ BRTDP, invariant generation, and strategy representation don't
◮ Don't we want more than ML?
  ◮ (ε-)optimal controllers?
  ◮ arbitrary controllers – is it still verification?
◮ What do we actually want?
  ◮ scalability shouldn't overrule guarantees?
  ◮ when is PAC enough?
  ◮ oracle usage seems fine
◮ How much of it can work for examples from robotics?