Learning Small Strategies Fast
Jan Křetínský, Technical University of Munich, Germany

Joint work with P. Ashok, E. Kelmendi, J. Krämer, T. Meggendorfer, M. Weininger (TUM); T. Brázdil (Masaryk University Brno); K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov, V. Toman (IST Austria); V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University); D. Parker (University of Birmingham)

Logic and Learning, The Alan Turing Institute, January 12, 2018
Controller synthesis and verification 2/13
Formal methods and machine learning 3/13

Formal methods
+ precise
– scalability issues (MEM-OUT)

Learning
+ scalable
+ simpler solutions
– weaker guarantees
– can be hard to use (different objectives)

Combining the two: precise computation, with the focus on the important stuff.
Examples 4/13

◮ Reinforcement learning for efficient strategy synthesis
    ◮ MDP with functional spec (reachability, LTL) [1, 2]
    ◮ MDP with performance spec (mean payoff / average reward) [3, 4]
    ◮ Simple stochastic games (reachability) [5]
◮ Decision tree learning for efficient strategy representation
    ◮ MDP [6]
    ◮ Games [7]

[1] Brázdil, Chatterjee, Chmelík, Forejt, K., Kwiatkowska, Parker, Ujma: Verification of Markov Decision Processes Using Learning Algorithms. ATVA 2014
[2] Daca, Henzinger, K., Petrov: Faster Statistical Model Checking for Unbounded Temporal Properties. TACAS 2016
[3] Ashok, Chatterjee, Daca, K., Meggendorfer: Value Iteration for Long-run Average Reward in Markov Decision Processes. CAV 2017
[4] K., Meggendorfer: Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes. ATVA 2017
[5] draft
[6] Brázdil, Chatterjee, Chmelík, Fellner, K.: Counterexample Explanation by Learning Small Strategies in Markov Decision Processes. CAV 2015
[7] Brázdil, Chatterjee, K., Toman: Strategy Representation by Decision Trees in Reactive Synthesis. TACAS 2018
Example: Markov decision processes 5/13

[Figure: a small MDP with initial state s, nondeterministic actions a, b, c, up, down, transition probabilities 0.5, 0.01, 0.99 and 1, further states p and v, and a goal state t; the final animation step overlays a decision-tree node "ACTION = down" with Y/N branches.]

Task: find a strategy σ maximising P^σ[◇ goal].
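To make the setting concrete, here is a minimal sketch in Python of an MDP as a nested dictionary, a memoryless strategy, and a Monte-Carlo estimate of P^σ[◇ goal]. The states, actions and probabilities are illustrative (loosely inspired by the picture above), not an exact encoding of the figure.

```python
import random

# Illustrative MDP: each state maps actions to distributions over successors.
mdp = {
    "s":    {"up":   {"v": 1.0},
             "down": {"goal": 0.99, "s": 0.01}},
    "v":    {"b":    {"s": 0.5, "goal": 0.5}},
    "goal": {"c":    {"goal": 1.0}},          # absorbing target
}

# a memoryless strategy sigma: S -> A
strategy = {"s": "down", "v": "b", "goal": "c"}

def estimate_reach(mdp, strategy, start="s", target="goal",
                   runs=10_000, steps=100):
    """Monte-Carlo estimate of P^sigma[<> target]."""
    hits = 0
    for _ in range(runs):
        state = start
        for _ in range(steps):
            if state == target:
                hits += 1
                break
            dist = mdp[state][strategy[state]]
            state = random.choices(list(dist), weights=list(dist.values()))[0]
    return hits / runs

print(estimate_reach(mdp, strategy))   # close to 1.0 for this toy model
```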
Example 1: Computing strategies faster 6/13

Value iteration with upper and lower bounds:

1: repeat
2:     for all transitions s −a→ do
3:         Update(s −a→)
4: until UpBound(s_init) − LoBound(s_init) < ε

1: procedure Update(s −a→)
2:     UpBound(s, a) := Σ_{s'∈S} Δ(s, a, s') · UpBound(s')
3:     LoBound(s, a) := Σ_{s'∈S} Δ(s, a, s') · LoBound(s')
4:     UpBound(s) := max_{a∈A} UpBound(s, a)
5:     LoBound(s) := max_{a∈A} LoBound(s, a)

Idea: more frequently update what is visited more frequently by reasonably good strategies.

1: repeat
2:     sample a path from s_init    ⊲ pick action arg max_a UpBound(s −a→)
3:     for all visited transitions s −a→ do
4:         Update(s −a→)
5: until UpBound(s_init) − LoBound(s_init) < ε

Result: faster and still sure (the bounds stay correct); the updates concentrate on the important parts of the system.
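A compact sketch of the sampling-based variant (in the spirit of BRTDP) in Python, reusing the dictionary MDP encoding from the earlier sketch. Names are illustrative, and the sketch ignores the end-component treatment that is needed for correctness on general MDPs.

```python
import random

def brtdp(mdp, init, goal, eps=1e-6, max_path_len=200):
    """Sampling-based value iteration sketch: sample paths under the strategy
    that maximises the upper bound, and run Update only on visited states.
    (End components other than the goal are not handled here.)"""
    up = {s: 1.0 for s in mdp}            # UpBound(s)
    lo = {s: 0.0 for s in mdp}            # LoBound(s)
    lo[goal] = 1.0

    def bound(b, s, a):                   # Sum_{s'} Delta(s, a, s') * b(s')
        return sum(p * b[t] for t, p in mdp[s][a].items())

    while up[init] - lo[init] > eps:
        # sample a path from init, picking arg max_a UpBound(s, a)
        path, s = [], init
        while s != goal and len(path) < max_path_len:
            a = max(mdp[s], key=lambda act: bound(up, s, act))
            path.append((s, a))
            dist = mdp[s][a]
            s = random.choices(list(dist), weights=list(dist.values()))[0]
        # back-propagate Update along the visited transitions
        for s, _ in reversed(path):
            up[s] = max(bound(up, s, act) for act in mdp[s])
            lo[s] = max(bound(lo, s, act) for act in mdp[s])
    return lo[init], up[init]
```

The numbers on the next slide come from the actual implementation on top of PRISM, not from this sketch.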
Example 1: Experimental results 7/13

Visited states
Example      PRISM         with RL
zeroconf     4,427,159     977
wlan         5,007,548     1,995
firewire     19,213,802    32,214
mer          26,583,064    1,950
Example 2: Computing small strategies 8/13

Representations of a strategy:
◮ explicit map σ : S → A
◮ BDD (binary decision diagram) encoding its bit representation
◮ DT (decision tree)
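As a rough illustration of the DT option, a sketch assuming scikit-learn is available: states are encoded as vectors of state-variable values, labelled with the action the explicit strategy picks, and a shallow decision tree is fitted to these pairs. The feature names and values are made up.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# (state-variable vector, chosen action) pairs taken from an explicit strategy;
# the variables "mode" and "counter" and their values are purely illustrative
states  = [[0, 0], [0, 1], [1, 0], [1, 1]]
actions = ["down", "down", "up", "up"]

tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(tree, feature_names=["mode", "counter"]))

# the learned tree then serves as the (small, readable) controller
print(tree.predict([[1, 0]])[0])    # -> "up"
```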
Example 2: Computing small strategies 9/13

From precise decisions to a DT, using the importance of decisions:
◮ Cut off states with zero importance (unreachable or useless).
◮ Cut off states with low importance (small error, ε-optimal strategy).

How to make use of the exact quantities? Importance of a decision in s with respect to ◇ goal and strategy σ:

    P^σ[◇ s | ◇ goal]
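A minimal sketch of how this importance could be estimated and used, reusing the toy mdp and strategy dictionaries from the earlier sketches; the threshold 0.01 and all names are illustrative, and a real implementation would compute the conditional probability exactly rather than by simulation.

```python
import random

def importance(mdp, strategy, init, goal, runs=10_000, steps=200):
    """Monte-Carlo estimate of P^sigma[<> s | <> goal] for every state s:
    among the runs that reach goal, how often is s visited on the way?"""
    reached = 0
    visited_and_reached = {s: 0 for s in mdp}
    for _ in range(runs):
        state, seen = init, {init}
        for _ in range(steps):
            if state == goal:
                break
            dist = mdp[state][strategy[state]]
            state = random.choices(list(dist), weights=list(dist.values()))[0]
            seen.add(state)
        if state == goal:
            reached += 1
            for s in seen:
                visited_and_reached[s] += 1
    return {s: visited_and_reached[s] / max(reached, 1) for s in mdp}

# keep only decisions in sufficiently important states and hand the remaining
# (state, action) pairs to the decision-tree learner
imp = importance(mdp, strategy, init="s", goal="goal")
training = [(s, strategy[s]) for s in mdp if imp[s] > 0.01]
```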
Example 2: Experimental results 10/13

Example     #states      Value      Explicit   BDD     DT    Rel.err(DT) %
firewire    481,136      1.0        479,834    4,233   1     0.0
investor    35,893       0.958      28,151     783     27    0.886
mer         1,773,664    0.200016   ——— MEM-OUT ———           *
zeroconf    89,586       0.00863    60,463     409     7     0.106

* MEM-OUT in PRISM, whereas RL yields: Explicit 1,887, BDD 619, DT 13, Rel.err(DT) 0.00014 %
Some related work 11/13

Reinforcement learning in verification
◮ Junges, Jansen, Dehnert, Topcu, Katoen: Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016
◮ David, Jensen, Larsen, Legay, Lime, Sorensen, Taankvist: On Time with Minimal Expected Cost! ATVA 2014

Strategy representation learning
◮ Neider, Topcu: An Automaton Learning Approach to Solving Safety Games over Infinite Graphs. TACAS 2016

Also: invariant generation, guidance of theorem provers, ...
Summary 12/13

Machine learning in verification
◮ Scalable heuristics
◮ Example 1: Speeding up value iteration
    ◮ technique: reinforcement learning, BRTDP
    ◮ idea: focus on updating the “most important parts” = those most often visited by good strategies
◮ Example 2: Small and readable strategies
    ◮ technique: decision tree learning
    ◮ idea: based on the importance of states, feed the decisions to the learning algorithm
◮ Learning in Verification (LiVe) at ETAPS