Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes

Aaron Bohy¹, Véronique Bruyère¹, Jean-François Raskin²
¹ Université de Mons   ² Université Libre de Bruxelles

SYNT 2014, 3rd Workshop on Synthesis
Overview (1/2)

Motivations:
• Markov decision processes with large state spaces
• Explicit enumeration exhausts the memory
• Symbolic representations like MTBDDs are useful
• No easy use of (MT)BDDs for solving linear systems

Recent contributions of [WBB+10]¹:
• Symblicit algorithm
• Mixes symbolic and explicit data structures (hence "symblicit")
• Expected mean-payoff in Markov decision processes
• Using (MT)BDDs

¹ R. Wimmer, B. Braitling, B. Becker, E. M. Hahn, P. Crouzen, H. Hermanns, A. Dhama, and O. E. Theel. Symblicit calculation of long-run averages for concurrent probabilistic systems. In QEST, pages 27–36. IEEE Computer Society, 2010.
Overview (2/2)

Our motivations:
• Antichains sometimes outperform BDDs (e.g. [WDHR06, DR07])
• Use antichains instead of (MT)BDDs in symblicit algorithms

Our contributions:
• New structure of pseudo-antichains (an extension of antichains)
  • Closed under negation
• Monotonic Markov decision processes
• Two quantitative settings:
  • Stochastic shortest path (focus of this talk)
  • Expected mean-payoff
• Two applications:
  • Automated planning
  • LTL synthesis

Full paper available on arXiv: abs/1402.1076
Table of contents

• Definitions
• Symblicit approach
• Antichains and pseudo-antichains
• Monotonic Markov decision processes
• Applications
• Conclusion and future work
Definitions
Markov decision processes (MDPs)

• M = (S, Σ, P) where:
  • S is a finite set of states
  • Σ is a finite set of actions
  • P : S × Σ → Dist(S) is a stochastic transition function
• Cost function c : S × Σ → ℝ_{>0}
• (Memoryless) strategy λ : S → Σ

[Figure: running example — an MDP with states s0, s1, s2 and actions σ1, σ2, edges labelled with transition probabilities and per-action costs]
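To make these objects concrete before the symbolic treatment, here is a minimal explicit encoding of an MDP with a cost function and a memoryless strategy in Python. The nested-dictionary layout and the concrete numbers are illustrative assumptions (the original example lives in the figure above), not the paper's data structures.

```python
# Explicit encoding of an MDP (S, Sigma, P) with cost function c.
# P[s][a] is the distribution over successors of playing action a in state s;
# c[s, a] is the strictly positive cost of that choice.
P = {
    "s0": {"sigma1": {"s1": 5/6, "s2": 1/6},
           "sigma2": {"s1": 1/2, "s2": 1/2}},
    "s1": {"sigma1": {"s1": 1.0}},
    "s2": {"sigma1": {"s0": 4/5, "s2": 1/5}},
}
c = {("s0", "sigma1"): 1.0, ("s0", "sigma2"): 3.0,
     ("s1", "sigma1"): 2.0, ("s2", "sigma1"): 1.0}

# A memoryless strategy fixes one action per state.
strategy = {"s0": "sigma1", "s1": "sigma1", "s2": "sigma1"}
```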
Markov chains (MCs)

• MDP (S, Σ, P) with P : S × Σ → Dist(S), plus a strategy λ : S → Σ
  ⇒ induced MC (S, P_λ) with P_λ : S → Dist(S)
• Cost function c : S × Σ → ℝ_{>0}, plus a strategy λ : S → Σ
  ⇒ induced cost function c_λ : S → ℝ_{>0}

[Figure: the MC induced from the example MDP by a fixed strategy]
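Fixing a memoryless strategy resolves all nondeterminism. A small sketch, building on the explicit encoding above:

```python
def induce(P, c, strategy):
    """Project an explicit MDP onto a memoryless strategy, yielding the
    induced MC P_lambda and the induced cost function c_lambda."""
    P_lam = {s: P[s][strategy[s]] for s in P}
    c_lam = {s: c[s, strategy[s]] for s in P}
    return P_lam, c_lam

P_lam, c_lam = induce(P, c, strategy)
# e.g. P_lam["s0"] == {"s1": 5/6, "s2": 1/6} and c_lam["s0"] == 1.0
```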
Expected truncated sum

• Let M_λ = (S, P_λ) be an MC with cost function c_λ
• Let G ⊆ S be a set of goal states
• TS_G(ρ = s_0 s_1 s_2 …) = Σ_{i=0}^{n−1} c_λ(s_i), where n is the first index such that s_n ∈ G
• E^{TS_G}_λ(s) = Σ_ρ P_λ(ρ) · TS_G(ρ), the sum ranging over paths ρ = s_0 s_1 … s_n with s_0 = s, s_n ∈ G, and s_0, …, s_{n−1} ∉ G
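On the induced MC, the expected truncated sum is the unique solution of a linear system, provided G is reached with probability 1: v(s) = 0 for s ∈ G, and v(s) = c_λ(s) + Σ_{s'} P_λ(s)(s') · v(s') otherwise. A minimal NumPy sketch over the explicit encoding above (the probability-1 reachability assumption is not checked):

```python
import numpy as np

def expected_truncated_sum(P_lam, c_lam, goal):
    """Solve (I - P') v = c_lam, where P' is P_lam restricted to non-goal
    states; goal states have value 0 by definition."""
    states = [s for s in P_lam if s not in goal]
    idx = {s: i for i, s in enumerate(states)}
    A = np.eye(len(states))
    b = np.array([c_lam[s] for s in states])
    for s in states:
        for t, p in P_lam[s].items():
            if t not in goal:
                A[idx[s], idx[t]] -= p
    v = np.linalg.solve(A, b)
    return {**{g: 0.0 for g in goal}, **{s: v[idx[s]] for s in states}}

values = expected_truncated_sum(P_lam, c_lam, goal={"s1"})
```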
Stochastic shortest path (SSP)

• Let M = (S, Σ, P) be an MDP with cost function c
• Let G ⊆ S be a set of goal states
• λ* is optimal if E^{TS_G}_{λ*}(s) = inf_{λ∈Λ} E^{TS_G}_λ(s)
• SSP problem: compute an optimal strategy λ*
• Complexity and strategies [BT96]:
  • Polynomial time via linear programming
  • Memoryless optimal strategies exist
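As a reminder of the standard SSP theory behind [BT96] (not restated on the slide), the optimal values are characterized by the Bellman optimality equations, and any memoryless strategy that picks a minimizing action in every state is optimal:

```latex
% Optimal expected truncated sum v*(s) (standard SSP characterization):
\[
  v^*(s) =
  \begin{cases}
    0, & s \in G,\\[4pt]
    \displaystyle\min_{\sigma \in \Sigma}
      \Bigl( c(s,\sigma) + \sum_{s' \in S} P(s,\sigma)(s')\, v^*(s') \Bigr),
      & s \notin G.
  \end{cases}
\]
% A memoryless strategy lambda* choosing, in each state s, an action
% attaining the minimum is an optimal strategy.
```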
Symblicit approach
Ingredients

• Strategy iteration algorithm [How60, BT96]
  • Generates a sequence of monotonically improving strategies
  • Two phases:
    • strategy evaluation, by solving a linear system
    • strategy improvement, at each state (sketched below)
  • Stops as soon as no further improvement can be made
  • Returns the optimal strategy along with its value function
• Bisimulation lumping [LS91, Buc94, KS60]
  • Applies to MCs
  • Gathers states that behave equivalently
  • Produces a (hopefully smaller) bisimulation quotient
  • Interested in the largest bisimulation ∼_L
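A minimal sketch of the improvement phase on the explicit encoding used earlier: it greedily re-selects actions by one-step lookahead against the current value function. Keeping the current action on ties is an assumption made here (a standard choice that guarantees termination), not something stated on the slide.

```python
def improve_strategy(P, c, strategy, value, goal):
    """One improvement step of strategy iteration: in each non-goal state,
    switch to an action minimizing c(s,a) + sum_t P(s,a)(t) * value[t]."""
    def lookahead(s, a):
        return c[s, a] + sum(p * value[t] for t, p in P[s][a].items())
    new_strategy = dict(strategy)
    for s in P:
        if s in goal:
            continue
        best = min(P[s], key=lambda a: lookahead(s, a))
        if lookahead(s, best) < lookahead(s, strategy[s]):
            new_strategy[s] = best   # switch only on strict improvement
    return new_strategy
```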
Symblicit algorithm

• Mix of symbolic and explicit data structures

Algorithm 1: Symblicit(MDP M^S, cost function c^S, goal states G^S)
 1: n := 0; λ^S_n := InitialStrategy(M^S, G^S)
 2: repeat
 3:   (M^S_λn, c^S_λn) := InducedMCAndCost(M^S, c^S, λ^S_n)
 4:   (M^S_λn,∼L, c^S_λn,∼L) := Lump(M^S_λn, c^S_λn)
 5:   (M_λn,∼L, c_λn,∼L) := Explicit(M^S_λn,∼L, c^S_λn,∼L)
 6:   v_n := SolveLinearSystem(M_λn,∼L, c_λn,∼L)
 7:   v^S_n := Symbolic(v_n)
 8:   λ^S_{n+1} := ImproveStrategy(M^S, λ^S_n, v^S_n)
 9:   n := n + 1
10: until λ^S_n = λ^S_{n−1}
11: return (λ^S_{n−1}, v^S_{n−1})

Key: superscript S denotes symbolic representations
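Putting the pieces together, a skeleton of the loop of Algorithm 1 in Python. All helper names (initial_strategy, induced_mc_and_cost, lump, to_explicit, solve_linear_system, to_symbolic, improve_strategy) are hypothetical placeholders for the symbolic and explicit operations named in the pseudocode, not a real API.

```python
def symblicit(mdp_sym, cost_sym, goal_sym):
    """Skeleton of Algorithm 1; helpers are placeholders, not a real API."""
    strat = initial_strategy(mdp_sym, goal_sym)                    # line 1
    while True:
        mc_sym, c_sym = induced_mc_and_cost(mdp_sym, cost_sym, strat)  # line 3
        mc_q_sym, c_q_sym = lump(mc_sym, c_sym)        # line 4: quotient by ~L
        mc_q, c_q = to_explicit(mc_q_sym, c_q_sym)     # line 5: quotient is small
        v = solve_linear_system(mc_q, c_q)             # line 6: strategy evaluation
        v_sym = to_symbolic(v)                         # line 7
        new_strat = improve_strategy(mdp_sym, strat, v_sym)        # line 8
        if new_strat == strat:                         # line 10: fixpoint reached
            return strat, v_sym                        # line 11
        strat = new_strat
```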
Antichains and pseudo-antichains