Probabilistic Model Checking, Michaelmas Term 2011
Lecture 13: Reachability in MDPs
Dr. Dave Parker, Department of Computer Science, University of Oxford

Recall - MDPs
• Markov decision process: M = (S, s_init, Steps, L)
• Adversary σ ∈ Adv resolves nondeterminism
• σ induces a set of paths Path^σ(s) and a DTMC D^σ
• D^σ yields a probability space Pr^σ_s over Path^σ(s)
• Prob^σ(s, ψ) = Pr^σ_s { ω ∈ Path^σ(s) | ω ⊨ ψ }
• MDP yields minimum/maximum probabilities:
  p_min(s, ψ) = inf_{σ∈Adv} Prob^σ(s, ψ)
  p_max(s, ψ) = sup_{σ∈Adv} Prob^σ(s, ψ)
DP/Probabilistic Model Checking, Michaelmas 2011

Probabilistic reachability
• Minimum and maximum probability of reaching a target set
  − target set = all states labelled with atomic proposition a
  p_min(s, F a) = inf_{σ∈Adv} Prob^σ(s, F a)
  p_max(s, F a) = sup_{σ∈Adv} Prob^σ(s, F a)
• Vectors: p_min(F a) and p_max(F a)
  − minimum/maximum probabilities for all states of the MDP

Overview
• Qualitative probabilistic reachability
  − case where p_min > 0 or p_max > 0
• Optimality equation
• Memoryless adversaries suffice
  − finitely many adversaries to consider
• Computing reachability probabilities
  − value iteration (fixed point computation)
  − linear programming problem
  − policy iteration

Qualitative probabilistic reachability
• Consider the problem of determining states for which p_min(s, F a) or p_max(s, F a) is zero (or non-zero)
  − max case: S^{max=0} = { s ∈ S | p_max(s, F a) = 0 }
  − this is just (non-probabilistic) reachability

    R := Sat(a)
    done := false
    while (done = false)
        R' := R ∪ { s ∈ S | ∃ (a,µ) ∈ Steps(s) . ∃ s' ∈ R . µ(s') > 0 }
        if (R' = R) then done := true
        R := R'
    endwhile
    return S \ R

Qualitative probabilistic reachability
• Min case: S^{min=0} = { s ∈ S | p_min(s, F a) = 0 }
  − note: quantification over all choices

    R := Sat(a)
    done := false
    while (done = false)
        R' := R ∪ { s ∈ S | ∀ (a,µ) ∈ Steps(s) . ∃ s' ∈ R . µ(s') > 0 }
        if (R' = R) then done := true
        R := R'
    endwhile
    return S \ R

Optimality (min)
• The values p_min(s, F a) are the unique solution of the following equations:

$$
x_s = \begin{cases}
1 & \text{if } s \in Sat(a) \\
0 & \text{if } s \in S^{\min=0} \\
\min \Big\{ \sum_{s' \in S} \mu(s') \cdot x_{s'} \;\Big|\; (a,\mu) \in Steps(s) \Big\} & \text{otherwise}
\end{cases}
$$

  − where S^{min=0} = { s | p_min(s, F a) = 0 }
  − the optimal solution for state s uses the optimal solution for its successors s'
• This is an instance of the Bellman equation
  − (basis of dynamic programming techniques)
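As a sanity check, the equations can be evaluated on the four-state example from later in this lecture, where p_min(F a) = [2/3, 14/15, 1, 0]. The sketch below uses exact fractions; the encoding of Steps and the helper name `bellman_rhs` are assumptions made for illustration.

```python
from fractions import Fraction as Fr

# Example MDP from this lecture, with exact probabilities
# (assumed encoding: one dict per choice, mapping successor -> probability).
STEPS = {
    "s0": [{"s1": Fr(1)}, {"s0": Fr(1, 4), "s3": Fr(1, 4), "s2": Fr(1, 2)}],
    "s1": [{"s0": Fr(1, 10), "s1": Fr(1, 2), "s2": Fr(2, 5)}],
    "s2": [{"s2": Fr(1)}],
    "s3": [{"s3": Fr(1)}],
}
SAT_A, S_MIN0 = {"s2"}, {"s3"}

# Candidate solution: the values stated for this example in the lecture.
candidate = {"s0": Fr(2, 3), "s1": Fr(14, 15), "s2": Fr(1), "s3": Fr(0)}


def bellman_rhs(s, x):
    """Right-hand side of the optimality equation (min) for state s."""
    if s in SAT_A:
        return Fr(1)
    if s in S_MIN0:
        return Fr(0)
    return min(sum(p * x[t] for t, p in mu.items()) for mu in STEPS[s])


satisfies = all(bellman_rhs(s, candidate) == candidate[s] for s in STEPS)
print(satisfies)
```

The check only confirms that the candidate satisfies the equations. Uniqueness is the claim made on the slide and is not tested by this code.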

Optimality (max)
• Likewise, the values p_max(s, F a) are the unique solution of the following equations:

$$
x_s = \begin{cases}
1 & \text{if } s \in Sat(a) \\
0 & \text{if } s \in S^{\max=0} \\
\max \Big\{ \sum_{s' \in S} \mu(s') \cdot x_{s'} \;\Big|\; (a,\mu) \in Steps(s) \Big\} & \text{otherwise}
\end{cases}
$$

  − where S^{max=0} = { s | p_max(s, F a) = 0 }

Memoryless adversaries
• Memoryless adversaries suffice for probabilistic reachability
  − i.e. there exist memoryless adversaries σ_min and σ_max such that:
  − Prob^{σ_min}(s, F a) = p_min(s, F a) for all states s ∈ S
  − Prob^{σ_max}(s, F a) = p_max(s, F a) for all states s ∈ S
• Construct adversaries from the optimal solution:

$$
\sigma_{\min}(s) = \operatorname{argmin} \Big\{ \sum_{s' \in S} \mu(s') \cdot p_{\min}(s', F\,a) \;\Big|\; (a,\mu) \in Steps(s) \Big\}
$$

$$
\sigma_{\max}(s) = \operatorname{argmax} \Big\{ \sum_{s' \in S} \mu(s') \cdot p_{\max}(s', F\,a) \;\Big|\; (a,\mu) \in Steps(s) \Big\}
$$

Computing reachability probabilities
• Several approaches…
• 1. Value iteration (preferable in practice, e.g. in PRISM)
  − approximate with iterative solution method
  − corresponds to fixed point computation
• 2. Reduction to a linear programming (LP) problem (better complexity; good for small examples)
  − solve with linear optimisation techniques
  − exact solution using well-known methods
• 3. Policy iteration
  − iteration over adversaries

Method 1 - Value iteration (min)
• For minimum probabilities p_min(s, F a) it can be shown that:
  − p_min(s, F a) = lim_{n→∞} x_s^(n) where:

$$
x_s^{(n)} = \begin{cases}
1 & \text{if } s \in Sat(a) \\
0 & \text{if } s \in S^{\min=0} \\
0 & \text{if } s \in S^{?} \text{ and } n = 0 \\
\min \Big\{ \sum_{s' \in S} \mu(s') \cdot x_{s'}^{(n-1)} \;\Big|\; (a,\mu) \in Steps(s) \Big\} & \text{if } s \in S^{?} \text{ and } n > 0
\end{cases}
$$

  − where: S? = S \ ( Sat(a) ∪ S^{min=0} )
• Approximate iterative solution technique
  − iterations terminated when solution converges sufficiently

Method 1 - Value iteration (max)
• Value iteration applies to maximum probabilities in the same way…
  − p_max(s, F a) = lim_{n→∞} x_s^(n) where:

$$
x_s^{(n)} = \begin{cases}
1 & \text{if } s \in Sat(a) \\
0 & \text{if } s \in S^{\max=0} \\
0 & \text{if } s \in S^{?} \text{ and } n = 0 \\
\max \Big\{ \sum_{s' \in S} \mu(s') \cdot x_{s'}^{(n-1)} \;\Big|\; (a,\mu) \in Steps(s) \Big\} & \text{if } s \in S^{?} \text{ and } n > 0
\end{cases}
$$

  − where: S? = S \ ( Sat(a) ∪ S^{max=0} )

Example
• Minimum/maximum probability of reaching an a-state
[Figure: four-state MDP over s0, s1, s2, s3, with s2 labelled {a}; transition probabilities 0.1, 0.25, 0.4, 0.5 and 1]

Example - Value iteration (min)
Compute: p_min(s_i, F a)
Sat(a) = {s2}, S^{min=0} = {s3}, S? = {s0, s1}
[ x_0^(n), x_1^(n), x_2^(n), x_3^(n) ]
n=0: [ 0, 0, 1, 0 ]
n=1: [ min(1·0, 0.25·0+0.25·0+0.5·1), 0.1·0+0.5·0+0.4·1, 1, 0 ] = [ 0, 0.4, 1, 0 ]
n=2: [ min(1·0.4, 0.25·0+0.25·0+0.5·1), 0.1·0+0.5·0.4+0.4·1, 1, 0 ] = [ 0.4, 0.6, 1, 0 ]
n=3: …
[Figure: the example MDP, with Sat(a) = {s2} and S^{min=0} = {s3} marked]

Example - Value iteration (min)
[ x_0^(n), x_1^(n), x_2^(n), x_3^(n) ]
n=0: [ 0.000000, 0.000000, 1, 0 ]
n=1: [ 0.000000, 0.400000, 1, 0 ]
n=2: [ 0.400000, 0.600000, 1, 0 ]
n=3: [ 0.600000, 0.740000, 1, 0 ]
n=4: [ 0.650000, 0.830000, 1, 0 ]
n=5: [ 0.662500, 0.880000, 1, 0 ]
n=6: [ 0.665625, 0.906250, 1, 0 ]
n=7: [ 0.666406, 0.919688, 1, 0 ]
n=8: [ 0.666602, 0.926484, 1, 0 ]
…
n=20: [ 0.666667, 0.933332, 1, 0 ]
n=21: [ 0.666667, 0.933332, 1, 0 ]
≈ [ 2/3, 14/15, 1, 0 ]
p_min(F a) = [ 2/3, 14/15, 1, 0 ]
[Figure: the example MDP, with Sat(a) = {s2} and S^{min=0} = {s3} marked]

Generating an optimal adversary
• Min adversary σ_min
[ x_0^(n), x_1^(n), x_2^(n), x_3^(n) ]
…
n=20: [ 0.666667, 0.933332, 1, 0 ]
n=21: [ 0.666667, 0.933332, 1, 0 ]
≈ [ 2/3, 14/15, 1, 0 ]
s0: min(1·14/15, 0.5·1+0.25·0+0.25·2/3) = min(14/15, 2/3)
[Figure: the example MDP, with Sat(a) = {s2} and S^{min=0} = {s3} marked]

Generating an optimal adversary
• DTMC D^{σ_min}
[ x_0^(n), x_1^(n), x_2^(n), x_3^(n) ]
…
n=20: [ 0.666667, 0.933332, 1, 0 ]
n=21: [ 0.666667, 0.933332, 1, 0 ]
≈ [ 2/3, 14/15, 1, 0 ]
s0: min(1·14/15, 0.5·1+0.25·0+0.25·2/3) = min(14/15, 2/3)
[Figure: the DTMC D^{σ_min} induced on the example MDP]
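As a worked check, under the assumption that σ_min picks in s0 the choice with probabilities 0.25, 0.25 and 0.5 (the one attaining 2/3 in the minimum above), the reachability probabilities in the induced DTMC solve a small linear system. Writing x_i for the probability of reaching an a-state from s_i, with x_2 = 1 and x_3 = 0:

```latex
\begin{align*}
x_0 &= 0.25\,x_0 + 0.25\,x_3 + 0.5\,x_2 = 0.25\,x_0 + 0.5
      &&\Longrightarrow\quad x_0 = \tfrac{0.5}{0.75} = \tfrac{2}{3} \\
x_1 &= 0.1\,x_0 + 0.5\,x_1 + 0.4\,x_2 = \tfrac{1}{15} + 0.5\,x_1 + 0.4
      &&\Longrightarrow\quad x_1 = 2\left(\tfrac{1}{15} + \tfrac{6}{15}\right) = \tfrac{14}{15}
\end{align*}
```

These agree with p_min(F a) = [ 2/3, 14/15, 1, 0 ], as expected if σ_min is optimal.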

Value iteration as a fixed point
• Can view value iteration as a fixed point computation over vectors of probabilities y ∈ [0,1]^S, e.g. for minimum:

$$
F(y)(s) = \begin{cases}
1 & \text{if } s \in Sat(a) \\
0 & \text{if } s \in S^{\min=0} \\
\min \Big\{ \sum_{s' \in S} \mu(s') \cdot y(s') \;\Big|\; (a,\mu) \in Steps(s) \Big\} & \text{otherwise}
\end{cases}
$$

• Let:
  − x^(0) = 0 (i.e. x^(0)(s) = 0 for all s)
  − x^(n+1) = F(x^(n))
• Then:
  − x^(0) ≤ x^(1) ≤ x^(2) ≤ x^(3) ≤ …
  − p_min(F a) = lim_{n→∞} x^(n)
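The monotone chain can be observed directly on the lecture's example. The sketch below applies F with exact fractions and checks x^(n) ≤ x^(n+1) pointwise for the first few iterates; the encoding of Steps and the variable names are assumptions. Because x^(0) is zero everywhere here (including on Sat(a)), the chain is shifted by one step relative to the earlier value iteration table.

```python
from fractions import Fraction as Fr

# Example MDP from this lecture, with exact probabilities
# (assumed encoding: one dict per choice, mapping successor -> probability).
STEPS = {
    "s0": [{"s1": Fr(1)}, {"s0": Fr(1, 4), "s3": Fr(1, 4), "s2": Fr(1, 2)}],
    "s1": [{"s0": Fr(1, 10), "s1": Fr(1, 2), "s2": Fr(2, 5)}],
    "s2": [{"s2": Fr(1)}],
    "s3": [{"s3": Fr(1)}],
}
SAT_A, S_MIN0 = {"s2"}, {"s3"}


def F(y):
    """One application of the operator F (min case) to a vector y."""
    out = {}
    for s, choices in STEPS.items():
        if s in SAT_A:
            out[s] = Fr(1)
        elif s in S_MIN0:
            out[s] = Fr(0)
        else:
            out[s] = min(sum(p * y[t] for t, p in mu.items()) for mu in choices)
    return out


chain = [{s: Fr(0) for s in STEPS}]  # x^(0) = 0
for _ in range(8):
    chain.append(F(chain[-1]))       # x^(n+1) = F(x^(n))

# Pointwise comparison of consecutive iterates.
monotone = all(a[s] <= b[s] for a, b in zip(chain, chain[1:]) for s in STEPS)
print(monotone)
```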

Linear programming
• Linear programming
  − optimisation of a linear objective function
  − subject to linear (in)equality constraints
• General form:
  − n variables: x_1, x_2, …, x_n
  − maximise (or minimise):
    • c_1 x_1 + c_2 x_2 + … + c_n x_n
  − subject to constraints
    • a_11 x_1 + a_12 x_2 + … + a_1n x_n ≤ b_1
    • a_21 x_1 + a_22 x_2 + … + a_2n x_n ≤ b_2
    • …
    • a_m1 x_1 + a_m2 x_2 + … + a_mn x_n ≤ b_m
• In matrix/vector form: maximise (or minimise) c·x subject to A·x ≤ b
• Many standard solution techniques exist, e.g. Simplex, ellipsoid method, interior point method

Method 2 - Linear programming problem
• Min probabilities p_min(s, F a) can be computed as follows:
  − p_min(s, F a) = 1 if s ∈ Sat(a)
  − p_min(s, F a) = 0 if s ∈ S^{min=0}
  − values for the remaining states in the set S? = S \ (Sat(a) ∪ S^{min=0}) can be obtained as the unique solution of the following linear programming problem:

$$
\text{maximise } \sum_{s \in S^{?}} x_s \quad \text{subject to the constraints:}
$$

$$
x_s \le \sum_{s' \in S^{?}} \mu(s') \cdot x_{s'} + \sum_{s' \in Sat(a)} \mu(s')
\quad \text{for all } s \in S^{?} \text{ and for all } (a,\mu) \in Steps(s)
$$
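As a worked instance, the LP for the lecture's four-state example can be written out by hand. Here S? = {s0, s1}, Sat(a) = {s2} and S^{min=0} = {s3}; the variable names x_0 and x_1 for the values of s0 and s1 are a notational assumption. There is one constraint per state and choice:

```latex
\begin{align*}
\text{maximise } \; & x_0 + x_1 \quad \text{subject to:} \\
x_0 &\le x_1                          && \text{($s_0$, choice moving to $s_1$ with probability 1)} \\
x_0 &\le 0.25\,x_0 + 0.5              && \text{($s_0$, choice with probabilities 0.25, 0.25, 0.5)} \\
x_1 &\le 0.1\,x_0 + 0.5\,x_1 + 0.4    && \text{($s_1$, single choice)}
\end{align*}
```

The second constraint gives x_0 ≤ 2/3 and the third gives x_1 ≤ 0.2·x_0 + 0.8. Both bounds are attained at the maximum, so x_0 = 2/3 and x_1 = 0.2·(2/3) + 0.8 = 14/15, and the first constraint x_0 ≤ x_1 also holds. This matches the result of value iteration on the same example.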
