Parity Objectives in Countable MDPs
Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Dominik Wojtczak
LICS 2017, Reykjavik, 20 June 2017
Countable MDPs

[Figure: an infinite-chain MDP with priorities 1 (bad/odd) and 2 (good/even), transition probabilities 1 − 1/2^i and 1/2^i; legend: controlled states vs. random states, with example branching probabilities 0.3 and 0.7.]
Countable MDPs

[Figure: the same infinite-chain MDP with priorities 1 and 2 and probabilities 1 − 1/2^i and 1/2^i.]

There is no almost-surely winning strategy, yet

  sup_σ Pr^σ(Parity) = 1.

All finite-memory strategies lose almost surely.
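The arithmetic behind this slide: under one natural reading of the figure (each round, the controller may take the risky transition at the i-th state, which succeeds with probability 1 − 1/2^i and restarts the chain, and fails otherwise; this wiring is an assumption, not stated on the slide), a strategy that gambles at states i_1, i_2, ... in successive rounds wins with probability ∏_k (1 − 2^{-i_k}). A quick Python check that the supremum of this product is 1, while any fixed per-round choice drives it to 0:

```python
# Win probability of gambling at state i_k in round k, assuming each
# round succeeds independently with probability 1 - 2**(-i_k)
# (a reading of the figure above; an assumption, not on the slide).
import math

def win_prob(schedule):
    return math.prod(1 - 2.0 ** (-i) for i in schedule)

# MD-like behaviour: gamble at the same state every round.  The product
# (1 - 2^-i)^n tends to 0, matching "finite memory loses almost surely".
print(win_prob([5] * 2000))            # ~ 3e-28, heading to 0

# Ever more patient schedules i_k = N + k (needs infinite memory: count
# the rounds).  Truncating the infinite product at 60 factors already
# shows it tending to 1 as N grows: sup is 1 but never attained.
for N in (1, 5, 10, 20):
    print(N, win_prob(range(N, N + 60)))
```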
{1,2,3}-Parity

[Figure: the same MDP, but with one priority-1 state relabelled to priority 3.]

There exists an almost-surely winning strategy.
All finite-memory strategies lose almost surely.
Our Results in the Mostowski Hierarchy

[Figure: the Mostowski hierarchy of parity objectives: Safety and Reach at the bottom, then {0,1}-Parity and {1,2}-Parity, then {0,1,2}-Parity and {1,2,3}-Parity, then {0,1,2,3}-Parity and {1,2,3,4}-Parity; the labels "optimal MD" and "ε-optimal MD" mark where these strategy classes suffice, shown alongside the running example MDPs.]

ε-optimal MD means: sup_{MD σ} Pr^σ(Parity) = sup_σ Pr^σ(Parity)

optimal MD means: if ∃ an optimal σ, then ∃ an optimal σ that is MD

Dichotomy between MD and infinite memory; contrast to finite MDPs.
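For readers outside the area, the parity objective that all these classes refine can be written out as follows (a standard definition, with λ the priority labelling; not verbatim from the slides):

```latex
% Parity: the largest priority seen infinitely often is even.
% With a priority labelling  lambda : S -> {0, ..., d},
\[
  \mathit{Parity} \;=\; \bigl\{\, s_0 s_1 s_2 \cdots \;\bigm|\;
      \limsup_{n \to \infty} \lambda(s_n) \text{ is even} \,\bigr\}.
\]
% Special cases in the hierarchy above: {1,2}-parity is Buchi
% ("see priority 2 infinitely often") and {0,1}-parity is co-Buchi
% ("see priority 1 only finitely often").
```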
Optimal MD-Strategies

Theorem. Consider a countable-state MDP with a {0,1,2}-parity objective. If there exists an optimal strategy, then there exists an optimal strategy that is MD.

"Optimal strategies for {0,1,2}-parity may be chosen MD."
Optimal MD-Strategies for Co-Büchi

Theorem. Almost-surely winning strategies for co-Büchi may be chosen MD.

Suppose there is an almost-surely winning strategy σ.
Focus on the states used by σ: they all have an a.s. winning strategy.
Set a more ambitious goal: Safety (= never see priority 1 again).

[Figure: a chain of states with priorities 0 and 1, annotated with safety values such as 7/8, 1/2, 3/4, ...]

Always playing for safety is too greedy.
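The safety values in the figure are the greatest fixed point of the usual Bellman operator: 0 at priority-1 states, max over choices at controlled states, expectation at random states. A minimal value-iteration sketch on a small hypothetical fragment (my own toy example, not the slide's chain):

```python
# Value iteration for safety values  val(s) = max_sigma Pr_s(never see
# a priority-1 state again), on a toy fragment (hypothetical example).
# mdp[s] = ("max", [successors])              controlled state
#        | ("avg", [(prob, successor), ...])  random state
#        | ("bad", None)                      priority-1 state, value 0
mdp = {
    "s0":   ("max", ["s1", "r0"]),
    "s1":   ("max", ["safe", "r1"]),
    "r0":   ("avg", [(0.5, "safe"), (0.5, "bad")]),
    "r1":   ("avg", [(0.75, "safe"), (0.25, "bad")]),
    "safe": ("max", ["safe"]),      # absorbing and safe forever
    "bad":  ("bad", None),
}

val = {s: 1.0 for s in mdp}         # safety value: greatest fixed point
for _ in range(100):                # iterate the Bellman operator down
    for s, (kind, succ) in mdp.items():
        if kind == "bad":
            val[s] = 0.0
        elif kind == "max":
            val[s] = max(val[t] for t in succ)
        else:
            val[s] = sum(p * val[t] for p, t in succ)

print({s: round(v, 3) for s, v in val.items()})
# The "safest action" at each controlled state is the argmax; the
# slide's point is that on infinite chains, always playing it can be
# too greedy.
```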
An Optimal MD-Strategy for Co-Büchi

  max_σ Pr^σ(never see a priority-1 state again)

[Figure: an MDP with priorities 0 and 1; a blue and a dark-blue region, with value thresholds 1/3 and 2/3 marked.]

0. Playing the safest action everywhere is not ok.
1. Fixing the safest action in the blue region is ok.
2. Once we are in dark blue: with prob ≥ 1/2 we stay in blue.
3. The a.s. winning strategy from step 1 gets us into dark blue a.s.
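Assembled into one recipe, the construction might be sketched as below. This is a schematic illustration of the slide's four steps under assumed thresholds (1/3 for blue, 2/3 for dark blue, as drawn in the figure), not the paper's formal proof:

```python
# Schematic MD strategy for co-Buchi, following steps 0-3 above.
# Assumed ingredients (hypothetical helpers): safety_value(s) and
# safest_action(s) from the previous slide's value iteration, and a
# per-state action as_winning_action(s) used outside the blue region.

BLUE, DARK_BLUE = 1 / 3, 2 / 3   # thresholds as drawn in the figure

def md_strategy(s, safety_value, safest_action, as_winning_action):
    # Step 1: inside the blue region, fix the safest action once and
    # for all; this is where the strategy is memoryless.
    if safety_value(s) >= BLUE:
        return safest_action(s)
    # Step 3: elsewhere, steer toward dark blue (reached a.s.).
    return as_winning_action(s)

# Why this wins a.s.: each visit to dark blue keeps the run inside
# blue forever with probability >= 1/2 (step 2); after an escape,
# step 3 re-enters dark blue a.s., so Pr(all attempts fail) = 0.
```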
When MD Suffices for Finitely Branching MDPs

[Figure: the Mostowski hierarchy again, from Safety and Reach up to {0,1,2,3}- and {1,2,3,4}-Parity, with "optimal MD" and "ε-optimal MD" marking where MD strategies suffice for finitely branching MDPs.]
When MD Suffices for Infinitely Branching MDPs

[Figure: the same hierarchy; the "optimal MD" and "ε-optimal MD" regions are drawn differently (rotated labels in the original slide), marking where MD strategies suffice for infinitely branching MDPs.]
Context of the Paper

Our work: countable MDPs.
Other work: mostly finite MDPs.

Our work: maximizing the probability of parity objectives.
Other work: maximizing expected (discounted) total/average reward/cost.

Our work: general countable MDPs.
Other work: countable MDPs arising from specific models:
  recursive MDPs,
  nondeterministic probabilistic lossy channel systems,
  VASS-induced MDPs,
  one-counter MDPs,
  controlled queueing systems,
  controlled multitype branching processes, ...
Conditioning a Markov Chain

[Figure: a Markov chain with branching probabilities 1/3, 1/3, 1/3 and priorities 0 and 1, together with an event A.]

  Pr(Parity | A) = Pr(Parity ∩ A) / Pr(A)

[Figure: the chain conditioned on A; the transition probabilities are reweighted, e.g. to values such as 1/6, 2/3 and 1/2.]
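Concretely, the conditioned chain reweights each edge by how likely A is from its endpoint: P'(s,t) = P(s,t) · Pr_t(A) / Pr_s(A). A minimal sketch where A is taken to be a reachability event (my choice for illustration; the slide's A is abstract):

```python
# Conditioning a finite Markov chain on A = "reach target", via the
# usual reweighting  P'(s,t) = P(s,t) * Pr_t(A) / Pr_s(A).
import numpy as np

# Toy chain: 0 -> {1, 2, 3} each w.p. 1/3; 1 is the target (absorbing),
# 3 is an absorbing trap; 2 -> {1 w.p. 1/2, 3 w.p. 1/2}.
P = np.array([
    [0.0, 1/3, 1/3, 1/3],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1/2, 0.0, 1/2],
    [0.0, 0.0, 0.0, 1.0],
])
target, trap = 1, 3

# Pr_s(A): fixed-point iteration on the reachability equations.
h = np.zeros(4)
h[target] = 1.0
for _ in range(200):
    h = P @ h
    h[target], h[trap] = 1.0, 0.0

# Reweight edges; states with Pr_s(A) = 0 drop out of the new chain.
Pc = np.zeros_like(P)
for s in range(4):
    if h[s] > 0:
        Pc[s] = P[s] * h / h[s]

print(h)        # [0.5, 1.0, 0.5, 0.0]
print(Pc[0])    # from state 0: 2/3 to target, 1/3 to state 2, 0 to trap
```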
Countable Markov Chains

Infinite Markov chains are very different from finite ones.

Gambler's ruin:
[Figure: a random walk on the non-negative integers, moving right with probability 2/3 and left with probability 1/3; state 0 is absorbing.]

Dependence on exact probabilities.
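To make "dependence on exact probabilities" concrete: from state 1, the ruin probability is (1 − p)/p when the right-probability p exceeds 1/2, and 1 otherwise, so the qualitative behaviour flips at exactly p = 1/2 (standard random-walk facts, not on the slide). A quick closed-form-versus-simulation check:

```python
# Gambler's ruin: from state i > 0 move right w.p. p, left w.p. 1 - p;
# state 0 is absorbing.  From state 1, Pr(ruin) = min(1, (1 - p) / p).
import random

def ruin_prob(p):
    return min(1.0, (1 - p) / p)

def simulate(p, runs=2_000, cap=1_000):
    # cap approximates "escaping to infinity" in the infinite chain
    ruined = 0
    for _ in range(runs):
        x = 1
        while 0 < x < cap:
            x += 1 if random.random() < p else -1
        ruined += (x == 0)
    return ruined / runs

for p in (1/3, 1/2, 2/3):
    print(f"p = {p:.3f}: closed form {ruin_prob(p):.3f}, "
          f"simulated {simulate(p):.3f}")
```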