Learning to Use Learning in Verification

  1. Learning to Use Learning in Verification. Jan Křetínský, Technische Universität München, Germany. Joint work with T. Brázdil (Masaryk University Brno), K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov (IST Austria), V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University), D. Parker (University of Birmingham). Published at ATVA 2014, CAV 2015, TACAS 2016. Mysore Park Workshop "Trends and Challenges in Quantitative Verification", February 3, 2016.

  2. Approaches and their interaction. Formal methods: ◮ precise ◮ scalability issues

  3. Approaches and their interaction. Formal methods: ◮ precise ◮ scalability issues (MEM-OUT)

  5. Approaches and their interaction. Formal methods: ◮ precise ◮ scalability issues. [Figure: a strategy represented as a decision tree with Y/N branches over nodes such as ACTION = rec, the guarded command l>0 & b=1 & ip_mess=1 -> b'=0 & z'=0 & n1'=min(n1+1,8) & ip_mess'=0, and the predicate z ≤ 0.]

  7. Approaches and their interaction. Formal methods: ◮ precise ◮ scalability issues. Learning: ◮ weaker guarantees ◮ scalable ◮ different objectives

  9. Approaches and their interaction. Formal methods: ◮ precise ◮ scalability issues. Learning: ◮ weaker guarantees ◮ scalable. Interaction: formal methods contribute the precise computation; learning focuses that computation on the important parts of the system.

  10. What problems? ◮ Verification: (ε-)optimality → PAC? hard guarantees → probably correct? ◮ Controller synthesis: convergence is preferable; at least probably correct? ◮ Synthesis

  11. Markov decision processes. An MDP is a tuple (S, s₀ ∈ S, A, ∆: S → A → D(S)). [Figure: an example MDP with states s_init, p, v, t, err, actions a, b, c, up, down, and transition probabilities such as 0.5, 0.01, 0.99, 1.]

  12. Markov decision processes. (S, s₀ ∈ S, A, ∆: S → A → D(S)). [Figure: the same example MDP.] Task: find a strategy σ maximizing P^σ[Reach err].
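To make the definition concrete, here is a minimal Python sketch of an MDP as a nested mapping from states to enabled actions to successor distributions. The state and action names and the probabilities are illustrative only and are not meant to reproduce the MDP in the figure exactly.

```python
# Minimal MDP sketch: state -> action -> {successor: probability}.
# Names and probabilities are illustrative, loosely inspired by the figure.
MDP = {
    "s_init": {"a": {"p": 1.0},
               "b": {"p": 0.5, "v": 0.5}},
    "p":      {"up": {"t": 1.0}},
    "v":      {"down": {"err": 0.01, "v": 0.99},
               "c":    {"t": 1.0}},
    "t":      {"loop": {"t": 1.0}},    # absorbing
    "err":    {"loop": {"err": 1.0}},  # absorbing target
}

# Sanity check: every Delta(s, a) is a probability distribution.
for s, actions in MDP.items():
    for a, dist in actions.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9
```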

  17. Markov decision processes. (S, s₀ ∈ S, A, ∆: S → A → D(S)). [Figure: the example MDP with a maximizing strategy highlighted, together with its decision-tree representation: a node ACTION = down with Y/N branches.] Strategy σ maximizing P^σ[Reach err].

  18. Ex.1: Computing strategies faster: How? Fixed-point computation: V(s) := max_{a ∈ ∆(s)} V(s, a); V(s, a) := Σ_{s′ ∈ S} ∆(s, a, s′) · V(s′)
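As a sketch of this fixed-point computation, the following is plain value iteration for maximal reachability over the dictionary encoding above; the tolerance and iteration cap are arbitrary choices, and the stopping criterion based on a small change gives no guaranteed error bound.

```python
def value_iteration(mdp, targets, tol=1e-6, max_iters=100_000):
    """Iterate V(s) := max_a sum_{s'} Delta(s, a, s') * V(s'), with V fixed to 1 on targets."""
    V = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(max_iters):
        change = 0.0
        for s in mdp:
            if s in targets:
                continue
            new = max(sum(p * V[t] for t, p in dist.items())
                      for dist in mdp[s].values())
            change = max(change, abs(new - V[s]))
            V[s] = new
        if change < tol:
            break
    return V

# value_iteration(MDP, targets={"err"})["s_init"] approximates max_sigma P^sigma[Reach err]
```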

  19. Ex.1: Computing strategies faster: How? Fixed-point computation: V(s) := max_{a ∈ ∆(s)} V(s, a); V(s, a) := Σ_{s′ ∈ S} ∆(s, a, s′) · V(s′). Order of evaluation?

  22. Ex.1: Computing strategies faster: How? Fixed-point computation: V(s) := max_{a ∈ ∆(s)} V(s, a); V(s, a) := Σ_{s′ ∈ S} ∆(s, a, s′) · V(s′). Order of evaluation? [ATVA'14]: more frequently evaluate those states that are visited more frequently by reasonably good schedulers; such an order is obtained via reinforcement learning.

  26. Ex.1: Computing strategies faster: Algorithm
      1: U(·,·) ← 1, L(·,·) ← 0
      2: L(1,·) ← 1, U(0,·) ← 0
      3: repeat
      4:     sample a path from s₀ to {1, 0}
                 ⊲ actions uniformly from arg max_a U(s, a)
                 ⊲ successor states according to ∆(s, a, s′)
      5:     for all visited transitions (s, a, s′) do
      6:         Update(s, a, s′)
      7: until U(s₀) − L(s₀) < ε
      ------------------------------------------------------------
      1: procedure Update(s, a, ·)
      2:     U(s, a) := Σ_{s′ ∈ S} ∆(s, a, s′) · U(s′)
      3:     L(s, a) := Σ_{s′ ∈ S} ∆(s, a, s′) · L(s′)
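Below is a compact Python sketch of the loop above, using the dictionary MDP encoding from earlier. For brevity it stores state bounds U(s) = max_a U(s, a) instead of state-action bounds, and it ignores end components, which the full [ATVA'14] algorithm has to collapse to guarantee termination; target states get value 1, sink states value 0.

```python
import random

def brtdp_sketch(mdp, s0, targets, sinks, eps=1e-2, rng=random.Random(0)):
    """Sample paths guided by upper bounds; keep upper/lower bounds on max reachability."""
    U = {s: (0.0 if s in sinks else 1.0) for s in mdp}
    L = {s: (1.0 if s in targets else 0.0) for s in mdp}

    def q(bound, s, a):  # bound value of the state-action pair (s, a)
        return sum(p * bound[t] for t, p in mdp[s][a].items())

    while U[s0] - L[s0] >= eps:
        # sample a path from s0 to the targets/sinks
        path, s = [], s0
        while s not in targets and s not in sinks:
            best = max(q(U, s, a) for a in mdp[s])
            a = rng.choice([a for a in mdp[s] if q(U, s, a) == best])  # uniform over arg max
            path.append((s, a))
            s = rng.choices(list(mdp[s][a]), weights=list(mdp[s][a].values()))[0]
        for s, a in reversed(path):  # update all visited transitions
            U[s] = max(q(U, s, b) for b in mdp[s])
            L[s] = max(q(L, s, b) for b in mdp[s])
    return L[s0], U[s0]
```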

  28. Ex.1: Computing strategies faster. Reinforcement learning identifies the important parts of the system and makes the computation faster; value iteration contributes sure updates. Together: guaranteed upper and lower bounds at all times + practically fast convergence. Remark: ◮ PAC SMC for MDP and (unbounded) LTL [ATVA'14]: needs |S| and p_min ◮ practical PAC SMC for MC and (unbounded) LTL + mean payoff [TACAS'16]: needs only p_min
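To illustrate what "probably approximately correct" means in this setting, here is a generic Chernoff-Hoeffding Monte Carlo estimator for a step-bounded reachability probability of a Markov chain. This is only a textbook SMC sketch, not the algorithms of [ATVA'14] or [TACAS'16], which additionally handle unbounded properties using knowledge of |S| or p_min.

```python
import math
import random

def pac_smc_bounded_reach(mc, s0, targets, horizon, eps=0.01, delta=0.05,
                          rng=random.Random(1)):
    """Estimate P[Reach targets within `horizon` steps] up to +-eps with confidence 1-delta.
    Hoeffding bound: n >= ln(2/delta) / (2*eps^2) samples suffice.
    `mc` maps state -> {successor: probability} (a Markov chain, no actions)."""
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = 0
    for _ in range(n):
        s = s0
        for _ in range(horizon):
            if s in targets:
                break
            s = rng.choices(list(mc[s]), weights=list(mc[s].values()))[0]
        hits += s in targets
    return hits / n
```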

  29. Ex.1: Experimental results (number of visited states)
      Example    PRISM       BRTDP
      zeroconf   3,001,911   760
                 4,427,159   977
                 5,477,150   1411
      wlan       345,000     2018
                 1,295,218   2053
                 5,007,548   1995
      firewire   6,719,773   26,508
                 13,366,666  25,214
                 19,213,802  32,214
      mer        17,722,564  1950
                 17,722,564  2902
                 26,583,064  1950
                 26,583,064  2900

  30. Further examples on reinforcement learning
      Sebastian Junges, Nils Jansen, Christian Dehnert, Ufuk Topcu, Joost-Pieter Katoen: Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016.
      ◮ safe and cost-optimizing strategies
      ◮ (1) compute safe, permissive strategies
      ◮ (2) learn cost-optimal strategies (convergence) among them
      Alexandre David, Peter Gjøl Jensen, Kim Guldstrand Larsen, Axel Legay, Didier Lime, Mathias Grund Sørensen, Jakob Haahr Taankvist: On Time with Minimal Expected Cost! ATVA 2014.
      ◮ priced timed games → priced timed MDPs
      ◮ time-bounded cost-optimal (convergence) strategies
      ◮ (1) Uppaal TiGa for safe strategies
      ◮ (2) Uppaal SMC and learning for cost-optimal strategies

  31. Ex.2: Computing small strategies: Which decisions? Importance of a node s with respect to ◊target and strategy σ: P^σ[◊ s]

  34. Ex.2: Computing small strategies: Which decisions? Importance of a node s with respect to ◊target and strategy σ: P^σ[◊ s | ◊ target]. Cut off states with zero importance (unreachable or useless). Cut off states with low importance (small error, ε-optimal strategy).
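A sketch of how importance can be estimated by simulation under a fixed memoryless strategy σ: keep only the runs that reach the target and count, for each state, the fraction of those runs that visit it; states below a threshold are then cut off. The function names, run counts and threshold are illustrative.

```python
import random

def importance_by_simulation(mdp, sigma, s0, targets, runs=10_000, max_len=1_000,
                             rng=random.Random(2)):
    """Estimate importance(s) = P^sigma[reach s | reach target] by Monte Carlo simulation."""
    visits = {s: 0 for s in mdp}
    reaching = 0
    for _ in range(runs):
        s, seen = s0, {s0}
        for _ in range(max_len):
            if s in targets:
                break
            dist = mdp[s][sigma[s]]           # sigma: state -> chosen action
            s = rng.choices(list(dist), weights=list(dist.values()))[0]
            seen.add(s)
        if s in targets:
            reaching += 1
            for v in seen:
                visits[v] += 1
    return {s: (visits[s] / reaching if reaching else 0.0) for s in mdp}

def cut_off(sigma, importance, threshold=1e-3):
    """Drop strategy decisions at states whose importance is below the threshold."""
    return {s: a for s, a in sigma.items() if importance[s] > threshold}
```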

  36. Ex.2: Small strategies: Which representation? How to make use of the exact importance? Learn decisions in s in proportion to the importance of s.
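One way to realize "learn decisions in proportion to importance", in the spirit of the decision-tree representation shown on the earlier slides, is to hand each state (encoded by its variable valuation) to an off-the-shelf decision-tree learner with a sample weight equal to its importance. A sketch using scikit-learn; the encoding function and the depth bound are assumptions.

```python
from sklearn.tree import DecisionTreeClassifier

def learn_strategy_tree(states, sigma, importance, encode, max_depth=5):
    """Fit a small decision tree predicting sigma(s) from the valuation encode(s),
    weighting each training example by importance(s)."""
    X = [encode(s) for s in states]       # numeric feature vector per state
    y = [sigma[s] for s in states]        # chosen action as the class label
    w = [importance[s] for s in states]   # importance as sample weight
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X, y, sample_weight=w)
    return tree
```

High-importance states then dominate the tree's splits, so decisions at unimportant states may be represented only approximately, which is exactly the trade-off behind small ε-optimal strategies.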
