Meet Your Expectations With Guarantees: Beyond Worst-Case Synthesis

Meet Your Expectations With Guarantees: Beyond Worst-Case Synthesis in Quantitative Games. V. Bruyère (UMONS), E. Filiot (ULB), M. Randour (UMONS-ULB), J.-F. Raskin (ULB). Paris, 24.01.2014. GDR IM GT Jeux: Annual Meeting.


  1. Sections: Context, BWC Synthesis, Mean-Payoff, Shortest Path, Conclusion
     Example: going to work
     - Weights = minutes.
     - Goal: minimize our expected time to reach "work".
     - But there is an important meeting in one hour! This requires strict guarantees on the worst-case reaching time.
     [Figure: commuting graph from "home" to "work" with three options (car, train, bicycle); intermediate states include traffic (light, medium, heavy), station, waiting room, delay, departs and back home.]
     Beyond Worst-Case Synthesis. Bruyère, Filiot, Randour, Raskin.

  3. Example: going to work
     [Figure: same commuting graph as in slide 1.]
     - Optimal expectation strategy: take the car. E = 33, WC = 71 > 60.
     - Optimal worst-case strategy: bicycle. E = WC = 45 < 60.
     - Sample BWC strategy: try the train up to 3 delays, then switch to bicycle. E ≈ 37.56, WC = 59 < 60. This is the optimal E under the WC constraint, and it uses finite memory.
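The comparison of the three strategies can be replayed with a minimal sketch. It only uses the (E, WC) values quoted on the slide; the dictionary layout and the function name `best_under_worst_case` are assumptions made for illustration, and the selection is over these three candidates only, not over all strategies of the game.

```python
# (expected time, worst-case time) in minutes, as quoted on the slide.
strategies = {
    "car": (33.0, 71.0),
    "bicycle": (45.0, 45.0),
    "train up to 3 delays, then bicycle": (37.56, 59.0),
}

def best_under_worst_case(candidates, wc_bound):
    """Return the candidate with the smallest expectation among those
    whose worst case is strictly below wc_bound (None if there is none)."""
    safe = {name: ev for name, (ev, wc) in candidates.items() if wc < wc_bound}
    if not safe:
        return None
    return min(safe, key=safe.get)

# The meeting is in one hour, so the worst case must stay below 60 minutes.
print(best_under_worst_case(strategies, 60))
```

The car is excluded by its worst case, and among the remaining candidates the mixed train/bicycle strategy has the better expectation.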

  5. Beyond worst-case synthesis: formal definition
     Given a game $G = (\mathcal{G}, S_1, S_2)$, with $\mathcal{G} = (S, E, w)$ its underlying graph, an initial state $s_{\mathsf{init}} \in S$, a finite-memory stochastic model $\lambda_2^{\mathsf{stoch}} \in \Lambda_2^{F}$ of the adversary, represented by a stochastic Moore machine, a measurable value function $f : \mathsf{Plays}(\mathcal{G}) \to \mathbb{R} \cup \{-\infty, \infty\}$, and two rational thresholds $\mu, \nu \in \mathbb{Q}$, the beyond worst-case (BWC) problem asks to decide if $P_1$ has a finite-memory strategy $\lambda_1 \in \Lambda_1^{F}$ such that
     (1) $\forall \lambda_2 \in \Lambda_2,\ \forall \pi \in \mathsf{Outs}_G(s_{\mathsf{init}}, \lambda_1, \lambda_2),\ f(\pi) > \mu$,
     (2) $\mathbb{E}^{G[\lambda_1, \lambda_2^{\mathsf{stoch}}]}_{s_{\mathsf{init}}}(f) > \nu$,
     and the BWC synthesis problem asks to synthesize such a strategy if one exists.
     Notice the highlighted parts!

  8. Related work
     Common philosophy: avoiding outlier outcomes.
     1. Our strategies are strongly risk averse:
        - avoid risk at all costs and optimize among safe strategies.
     2. Other notions of risk ensure a low probability of risky behavior [WL99, FKR95]:
        - without worst-case guarantee,
        - without good expectation.
     3. Trade-off between expectation and variance [BCFK13, MT11]:
        - statistical measure of the stability of the performance,
        - no strict guarantee on individual outcomes.

  9. Outline
     1. Context
     2. BWC Synthesis
     3. Mean-Payoff
     4. Shortest Path
     5. Conclusion

  11. Mean-payoff value function
      $\mathsf{MP}(\pi) = \liminf_{n \to \infty} \left[ \frac{1}{n} \cdot \sum_{i=0}^{n-1} w(s_i, s_{i+1}) \right]$
      Sample play: $\pi = 2, -1, -4, 5, (2, 2, 5)^{\omega}$, for which $\mathsf{MP}(\pi) = 3$.
      - long-run average weight
      - prefix-independent

                   worst-case   expected value   BWC
      complexity   NP ∩ coNP    P                NP ∩ coNP
      memory       memoryless   memoryless       pseudo-polynomial

      - [LL69, EM79, ZP96, Jur98, GS09, Put94, FV97]
      - Additional modeling power for free!
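The sample play is ultimately periodic, so its mean payoff is simply the average weight of the repeated cycle. The sketch below checks this on the slide's example; the function names `mean_payoff_lasso` and `running_average` are illustrative assumptions, not part of the talk.

```python
from fractions import Fraction

def mean_payoff_lasso(prefix, cycle):
    """Mean payoff of the play prefix . cycle^omega, given as edge weights.
    The finite prefix is ignored: the value is prefix-independent."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    return Fraction(sum(cycle), len(cycle))

def running_average(prefix, cycle, n):
    """Average of the first n weights of the play prefix . cycle^omega."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    weights = list(prefix)
    while len(weights) < n:
        weights.extend(cycle)
    return Fraction(sum(weights[:n]), n)

prefix, cycle = [2, -1, -4, 5], [2, 2, 5]
print(mean_payoff_lasso(prefix, cycle))            # average over the cycle
print(float(running_average(prefix, cycle, 3004))) # finite average, close to the limit
```

The running average shows the prefix being washed out as n grows, which is what prefix-independence means in practice.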

  14. Philosophy of the algorithm
      - Classical worst-case and expected value results and algorithms as nuts and bolts.
      - Screw them together in an adequate way.
      Three key ideas:
      1. To characterize the expected value, look at end-components (ECs).
      2. Winning ECs vs. losing ECs: the latter must be avoided to preserve the worst-case requirement!
      3. Inside a winning EC (WEC), we have an interesting way to play...
      ⇒ Let's focus on an ideal case.

  17. An ideal situation
      [Figure: weighted game graph over states s5, s6, s7.]
      Game interpretation:
      - Worst-case threshold is $\mu = 0$.
      - All states are winning: memoryless optimal worst-case strategy $\lambda_1^{wc} \in \Lambda_1^{PM}(G)$, ensuring $\mu^* = 1 > 0$.
      MDP interpretation:
      - Memoryless optimal expected value strategy $\lambda_1^{e} \in \Lambda_1^{PM}(P)$ achieves $\nu^* = 2$.

  19. A cornerstone of our approach
      [Figure: weighted game graph over states s5, s6, s7.]
      BWC problem: what kind of thresholds $(0, \nu)$ can we achieve?
      Key result: for all $\varepsilon > 0$, there exists a finite-memory strategy of $P_1$ that satisfies the BWC problem for the thresholds pair $(0, \nu^* - \varepsilon)$.
      - We can be arbitrarily close to the optimal expectation while ensuring the worst-case!

  21. Combined strategy
      [Figure: weighted game graph over states s5, s6, s7.]
      Outcomes are sequences of periods of K steps. A period whose sum of weights is > 0 is directly followed by the next period; a period whose sum is ≤ 0 is followed by L compensation steps. This gives WC > 0, but what about E?
      What we want: with $K, L \to \infty$, E tends to $\nu^* = 2$.

  24. Combined strategy: crux of the proof
      Precise reasoning on convergence rates using involved techniques:
      - When K grows, L needs to grow linearly to ensure WC.
      - When K grows, the probability that a period needs compensation tends to 0, and it decreases exponentially fast: application of Chernoff bounds and Hoeffding's inequality for Markov chains [Tra09, GO02].
      Overall we are good: WC > 0 and $E > \nu^* - \varepsilon$ for sufficiently large K, L.

  27. Shortest path - truncated sum
      Assume strictly positive integer weights, $w : E \to \mathbb{N}_0$.
      Let $T \subseteq S$ be a target set that $P_1$ wants to reach with a path of bounded value (cf. introductory example).
      - Inequalities are reversed, $\nu < \mu$.
      $\mathsf{TS}_T(\pi = s_0 s_1 s_2 \ldots) = \sum_{i=0}^{n-1} w((s_i, s_{i+1}))$, with $n$ the first index such that $s_n \in T$, and $\mathsf{TS}_T(\pi) = \infty$ if $\forall n,\ s_n \notin T$.

                   worst-case   expected value   BWC
      complexity   P            P                pseudo-poly. / NP-hard
      memory       memoryless   memoryless       pseudo-poly.

      - [BT91, dA99]
      - Problem inherently harder than worst-case and expectation.
      - NP-hardness by the $K$th largest subset problem [JK78, GJ79].
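The truncated sum can be computed directly from its definition. The sketch below is an illustration under two assumptions: the play is given as a finite list of states that is treated as the whole play (so a list that never visits the target yields infinity), and the edge weights in `weight` are hypothetical values chosen for the example.

```python
import math

def truncated_sum(states, weight, target):
    """Sum of the edge weights up to the first visit of a target state.
    Returns math.inf when no state of the given sequence is in target."""
    total = 0
    for i, s in enumerate(states):
        if s in target:
            return total
        if i + 1 < len(states):
            total += weight[(s, states[i + 1])]
    return math.inf

# Hypothetical strictly positive integer weights on the edges.
weight = {("s1", "s2"): 1, ("s2", "s1"): 1, ("s2", "s3"): 1, ("s1", "s3"): 5}

print(truncated_sum(["s1", "s2", "s1", "s3"], weight, {"s3"}))  # 1 + 1 + 5
print(truncated_sum(["s1", "s2", "s1"], weight, {"s3"}))        # target never reached
```

Because only the first visit to the target counts, anything after index n is ignored, which is why the sum is called truncated.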

  28. Key difference with MP case
      Useful observation: the set of all worst-case winning strategies for the shortest path can be represented through a finite game.
      Sequential approach solving the BWC problem:
      1. represent all WC winning strategies,
      2. optimize the expected value within those strategies.

  30. Pseudo-polynomial algorithm: sketch
      [Figure: weighted game graph on states s1, s2, s3.]
      1. Start from $G = (\mathcal{G}, S_1, S_2)$, $\mathcal{G} = (S, E, w)$, $T = \{s_3\}$, $\mathcal{M}(\lambda_2^{\mathsf{stoch}})$, $\mu = 8$, and $\nu \in \mathbb{Q}$.
      2. Build $G'$ by unfolding $G$, tracking the current sum up to the worst-case threshold $\mu$, and integrating it in the states of $G'$.

  37. Pseudo-polynomial algorithm: sketch
      [Figure: the unfolding $G'$, whose states pair a state of $G$ with the current sum: $(s_1, 0)$, $(s_2, 1)$, $(s_1, 2)$, $(s_2, 3)$, $(s_1, 4)$, $(s_2, 5)$, $(s_1, 6)$, $(s_2, 7)$, $(s_1, \top)$, and the target copies $(s_3, 5)$, $(s_3, 2)$, $(s_3, 7)$, $(s_3, 4)$, $(s_3, \top)$, $(s_3, 6)$.]

  39. Pseudo-polynomial algorithm: sketch
      3. Compute $R$, the attractor of $T$ with cost $< \mu = 8$.
      4. Consider $G_\mu = G' \downharpoonright R$.
      [Figure: the restricted game $G_\mu$, with remaining states $(s_1, 0)$, $(s_2, 1)$, $(s_1, 2)$, $(s_3, 5)$, $(s_3, 2)$, $(s_3, 7)$.]

  40. Pseudo-polynomial algorithm: sketch
      5. Consider $P = G_\mu \otimes \mathcal{M}(\lambda_2^{\mathsf{stoch}})$.
      6. Compute a memoryless optimal expectation strategy.
      7. If $\nu^* < \nu$, answer Yes, otherwise answer No.
      Here, $\nu^* = 9/2$.
      [Figure: the restricted game $G_\mu$, with states $(s_1, 0)$, $(s_2, 1)$, $(s_1, 2)$, $(s_3, 5)$, $(s_3, 2)$, $(s_3, 7)$.]
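The steps of the sketch can be illustrated on a small instance. Everything concrete below is an assumption: the edges, weights and probabilities are a hypothetical reconstruction of the three-state example from the flattened figure, the stochastic model is taken to be memoryless (so the product with its Moore machine is trivial), and the unfolding is explored by recursion on pairs (state, current sum) rather than built explicitly.

```python
from fractions import Fraction

# Hypothetical reconstruction of the example (assumed weights/probabilities).
P1_EDGES = {"s1": [("s2", 1), ("s3", 5)]}            # choices of player 1
STOCH_EDGES = {"s2": [("s1", 1, Fraction(1, 2)),      # adversary state, with the
                      ("s3", 1, Fraction(1, 2))]}     # stochastic model's probabilities
TARGET = {"s3"}
MU = 8                                                # worst-case threshold

def wc_winning(state, total):
    """Steps 2-4: can player 1 force the target with truncated sum < MU
    from the unfolded state (state, total), against any adversary choice?"""
    if total >= MU:
        return False
    if state in TARGET:
        return True
    if state in P1_EDGES:
        return any(wc_winning(t, total + w) for t, w in P1_EDGES[state])
    return all(wc_winning(t, total + w) for t, w, _ in STOCH_EDGES[state])

def optimal_expectation(state, total):
    """Steps 5-6: minimal expected truncated sum when player 1 only uses
    worst-case safe moves. Call it on worst-case winning states only."""
    if state in TARGET:
        return Fraction(total)
    if state in P1_EDGES:
        safe = [(t, w) for t, w in P1_EDGES[state] if wc_winning(t, total + w)]
        return min(optimal_expectation(t, total + w) for t, w in safe)
    return sum(p * optimal_expectation(t, total + w)
               for t, w, p in STOCH_EDGES[state])

def bwc_answer(nu, init="s1"):
    """Step 7: Yes iff the initial state is safe and nu* < nu."""
    return wc_winning(init, 0) and optimal_expectation(init, 0) < nu

print(optimal_expectation("s1", 0))
```

The recursion terminates because weights are strictly positive and the tracked sum is cut off at the threshold, which is also the reason the state space is only pseudo-polynomial. On this reconstructed instance the computed value agrees with the $\nu^* = 9/2$ quoted on the slide.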

  44. In a nutshell
      The BWC framework combines worst-case and expected value requirements:
      - a natural wish in many practical applications,
      - little existing theoretical support.
      Mean-payoff: additional modeling power for no complexity cost (decision-wise).
      Shortest path: harder than the worst-case, pseudo-polynomial with NP-hardness result.
      In both cases, pseudo-polynomial memory is both sufficient and necessary,
      - but strategies have natural representations based on states of the game and simple integer counters.

  46. Beyond BWC synthesis?
      Possible future work includes:
      - study of other quantitative objectives,
      - extension of our results to more general settings (multi-dimension [CDHR10, CRR12], decidable classes of games with imperfect information [DDG+10], etc.),
      - application of the BWC problem to various practical cases.
      Thanks! Do not hesitate to discuss with us!

  47. References I
      [BCFK13] T. Brázdil, K. Chatterjee, V. Forejt, and A. Kučera. Trading performance for stability in Markov decision processes. In Proc. of LICS, pages 331–340. IEEE Computer Society, 2013.
      [BFRR14] V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. Meet your expectations with guarantees: beyond worst-case synthesis in quantitative games. In Proc. of STACS, LIPIcs. Schloss Dagstuhl - LZI, 2014.
      [BT91] D.P. Bertsekas and J.N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16:580–595, 1991.
      [CDHR10] K. Chatterjee, L. Doyen, T.A. Henzinger, and J.-F. Raskin. Generalized mean-payoff and energy games. In Proc. of FSTTCS, LIPIcs 8, pages 505–516. Schloss Dagstuhl - LZI, 2010.
      [CDRR13] K. Chatterjee, L. Doyen, M. Randour, and J.-F. Raskin. Looking at mean-payoff and total-payoff through windows. In Proc. of ATVA, LNCS 8172, pages 118–132. Springer, 2013.
      [CH12] K. Chatterjee and M. Henzinger. An $O(n^2)$ time algorithm for alternating Büchi games. In Proc. of SODA, pages 1386–1399. SIAM, 2012.
      [CRR12] K. Chatterjee, M. Randour, and J.-F. Raskin. Strategy synthesis for multi-dimensional quantitative objectives. In Proc. of CONCUR, LNCS 7454, pages 115–131. Springer, 2012.

  48. References II
      [CY95] C. Courcoubetis and M. Yannakakis. The complexity of probabilistic verification. J. ACM, 42(4):857–907, 1995.
      [dA97] L. de Alfaro. Formal verification of probabilistic systems. PhD thesis, Stanford University, 1997.
      [dA99] L. de Alfaro. Computing minimum and maximum reachability times in probabilistic systems. In Proc. of CONCUR, LNCS 1664, pages 66–81. Springer, 1999.
      [DDG+10] A. Degorre, L. Doyen, R. Gentilini, J.-F. Raskin, and S. Toruńczyk. Energy and mean-payoff games with imperfect information. In Proc. of CSL, LNCS 6247, pages 260–274. Springer, 2010.
      [EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. Int. Journal of Game Theory, 8(2):109–113, 1979.
      [FKR95] J.A. Filar, D. Krass, and K.W. Ross. Percentile performance criteria for limiting average Markov decision processes. Transactions on Automatic Control, pages 2–10, 1995.
      [FV97] J. Filar and K. Vrieze. Competitive Markov decision processes. Springer, 1997.

  49. References III
      [GJ79] M.R. Garey and D.S. Johnson. Computers and intractability: a guide to the Theory of NP-Completeness. Freeman New York, 1979.
      [GO02] P.W. Glynn and D. Ormoneit. Hoeffding's inequality for uniformly ergodic Markov chains. Statistics & Probability Letters, 56(2):143–146, 2002.
      [GS09] T. Gawlitza and H. Seidl. Games through nested fixpoints. In Proc. of CAV, LNCS 5643, pages 291–305. Springer, 2009.
      [JK78] D.B. Johnson and S.D. Kashdan. Lower bounds for selection in X + Y and other multisets. Journal of the ACM, 25(4):556–570, 1978.
      [Jur98] M. Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Inf. Process. Lett., 68(3):119–124, 1998.
      [LL69] T.M. Liggett and S.A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Review, 11(4):604–607, 1969.
      [MT11] S. Mannor and J.N. Tsitsiklis. Mean-variance optimization in Markov decision processes. In Proc. of ICML, pages 177–184. Omnipress, 2011.

  50. References IV
      [Put94] M.L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, 1994.
      [Tra09] M. Tracol. Fast convergence to state-action frequency polytopes for MDPs. Oper. Res. Lett., 37(2):123–126, 2009.
      [WL99] C. Wu and Y. Lin. Minimizing risk models in Markov decision processes with policies depending on target values. Journal of Mathematical Analysis and Applications, 231(1):47–67, 1999.
      [ZP96] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.

  53. An ideal situation
      [Figure: weighted game graph over states s5, s6, s7.]
      MDP interpretation:
      - All states are reachable with probability one (even surely).
      - The highest achievable expected value is the same in all states: $\nu^* = 2$.
      - Memoryless optimal expected value strategy $\lambda_1^{e} \in \Lambda_1^{PM}(P)$.

  56. Combined strategy
      [Figure: weighted game graph over states s5, s6, s7.]
      We define $\lambda_1^{cmb} \in \Lambda_1^{PF}$ as follows, for some well-chosen $K, L \in \mathbb{N}$.
      (a) Play $\lambda_1^{e}$ for $K$ steps and memorize $\mathsf{Sum} \in \mathbb{Z}$, the sum of weights encountered during these $K$ steps.
      (b) If $\mathsf{Sum} > 0$, then go to (a). Else, play $\lambda_1^{wc}$ during $L$ steps, then go to (a).
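Phases (a) and (b) amount to a small finite-memory controller with two counters. The sketch below only tracks which of the two memoryless strategies should be followed at each step; the class name `CombinedStrategy`, the mode labels `"e"` and `"wc"`, and the `observe` interface are assumptions made for illustration, and the underlying strategies themselves are not modeled.

```python
class CombinedStrategy:
    """Scheduler for the combined strategy: mode "e" stands for the
    expected-value strategy, mode "wc" for the worst-case strategy."""

    def __init__(self, K, L):
        if K < 1 or L < 1:
            raise ValueError("K and L must be positive")
        self.K, self.L = K, L
        self.mode = "e"          # phase (a) comes first
        self.steps_left = K
        self.total = 0           # Sum of weights seen in the current phase (a)

    def current_mode(self):
        return self.mode

    def observe(self, weight):
        """Report the weight of the step just played and update the mode."""
        if self.mode == "e":
            self.total += weight
        self.steps_left -= 1
        if self.steps_left > 0:
            return
        if self.mode == "e" and self.total <= 0:
            # Phase (b): losses must be compensated for L steps.
            self.mode, self.steps_left = "wc", self.L
        else:
            # Either Sum > 0 or compensation is over: back to phase (a).
            self.mode, self.steps_left = "e", self.K
        self.total = 0

strategy = CombinedStrategy(K=2, L=3)
modes = []
for w in [1, -2, 1, 1, 1, 2, 1, 0]:
    modes.append(strategy.current_mode())
    strategy.observe(w)
print(modes)
```

The memory needed is the pair of counters bounded by K and L plus the running sum, which matches the remark that strategies can be represented with game states and simple integer counters.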

  58. Combined strategy
      [Figure: weighted game graph over states s5, s6, s7.]
      Intuitions:
      - Phase (a): try to increase the expectation and approach the optimal one.
      - Phase (b): compensate, if needed, losses that occurred in (a).
      Proving that the strategy is up to the job requires some technical work, but let's review the key ideas:
      - $\exists K, L \in \mathbb{N}$ for any thresholds pair $(0, \nu^* - \varepsilon)$,
      - plays = sequences of periods starting with phase (a).

  59. Combined strategy: worst-case requirement
      Does any consistent outcome have a strictly positive MP?
      - $\forall K, \exists L(K)$, linear in $K$, s.t. (a) + (b) has $\mathsf{MP} \geq 1/(K+L) > 0$, because $\mu^* = 1 > \mu = 0$.
      - Periods (a) that are not followed by (b) induce $\mathsf{MP} \geq 1/K$.
      - Weights are integers and the period length is bounded, so the inequality remains strict for the whole play.

  62. Combined strategy: expected value requirement
      Can we ensure an ε-optimal expected value?
      - When $K \to \infty$, $\mathbb{E}_{(a)} \to \nu^*$.
      - As $K \to \infty$, we have $L(K) \to \infty$ (potentially bigger losses to compensate), which may prevent $\mathbb{E}_{(a)+(b)} \to \nu^*$.
      - But as $K \to \infty$, we also have $\mathbb{P}_{(b)} \to 0$: losses after period (a) are less probable.
      - Intuition through a Bernoulli process.

  66. Bernoulli process
      Assume our phase (a) is a simple fair coin tossing sequence, with heads granting 1 and tails granting 0.
      - The expected MP is 1/2 whatever the number of tosses.
      - Let ε = 1/6: what is the probability to witness an MP > 1/2 − 1/6 = 1/3 after K tosses?
        K = 1 ⇒ P(MP > 1/3) = 1/2
        K = 2 ⇒ P(MP > 1/3) = 3/4
        ...
        For any ε > 0, when K → ∞, it tends to one.
      [Figure: binary tree of coin-toss outcomes, each branch taken with probability 1/2.]
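The two probabilities on the slide can be recomputed exactly by counting heads. This is a minimal sketch for the fair-coin illustration only; the function name `prob_mean_above` is an assumption.

```python
from fractions import Fraction
from math import comb

def prob_mean_above(K, threshold=Fraction(1, 3)):
    """Exact probability that the average of K fair coin tosses
    (heads = 1, tails = 0) is strictly greater than threshold."""
    if K < 1:
        raise ValueError("K must be positive")
    favorable = sum(comb(K, heads) for heads in range(K + 1)
                    if Fraction(heads, K) > threshold)
    return Fraction(favorable, 2 ** K)

for K in (1, 2, 10, 100):
    print(K, float(prob_mean_above(K)))
```

For small K the values need not increase monotonically (the strict threshold interacts with the parity of K), but they approach one as K grows, which is the point of the slide.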

  69. Bounding the gap
      One can lower bound the measure of paths such that $\mathsf{MP} > \nu^* - \varepsilon$ for a sufficiently large $K$.
      Using Chernoff bounds and Hoeffding's inequality for Markov chains [Tra09, GO02], we can bound the probability of being far from the optimal after $K$ steps of (a) in our combined strategy:
      - $\mathbb{P}_{(b)}$ decreases exponentially while $L(K)$ only needs to increase polynomially.
      - The overall contribution of (b) tends to zero when $K \to \infty$.
      - Hence $\mathbb{E}_{(a)+(b)} \to \nu^*$, as claimed.
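The exponential-versus-polynomial argument can be illustrated numerically on the fair-coin example. Note the substitution: the sketch uses the classical Hoeffding bound for independent tosses, $\exp(-2K\varepsilon^2)$, not the Markov-chain versions cited in the talk, and the linear compensation length $L(K) = K$ is an arbitrary assumption.

```python
from fractions import Fraction
from math import comb, exp

def exact_tail(K, eps):
    """Exact P(average of K fair tosses <= 1/2 - eps)."""
    bound = Fraction(1, 2) - Fraction(eps)
    bad = sum(comb(K, heads) for heads in range(K + 1)
              if Fraction(heads, K) <= bound)
    return Fraction(bad, 2 ** K)

def hoeffding_bound(K, eps):
    """Classical one-sided Hoeffding bound for i.i.d. variables in [0, 1]."""
    return exp(-2 * K * float(eps) ** 2)

eps = Fraction(1, 6)
for K in (10, 100, 1000):
    L = K  # assumed linear compensation length
    print(K, float(exact_tail(K, eps)), hoeffding_bound(K, eps),
          L * hoeffding_bound(K, eps))
```

The last column, a polynomially growing length times an exponentially small probability, shrinks toward zero, which mirrors why the contribution of phase (b) vanishes.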

  70. The ideal case: wrap-up
      The combined strategy works in any subgame such that
      1. it constitutes an EC in the MDP,
      2. all states are worst-case winning in the subgame.
      Such winning ECs (WECs) are the crux of BWC strategies in arbitrary games. But to explain that, let's first zoom out and consider the big picture.

  71. Zooming out
      [Figure: weighted game graph on states s1 to s7, containing the ideal case (s5, s6, s7) as a subgame.]
      Arbitrary game, with the ideal case as a subgame. We assume all states are worst-case winning.
      - BWC strategies must avoid WC losing states at all times: an antagonistic adversary can force WC losing outcomes from there (due to prefix-independence).
      - Some preprocessing can be done and, in the remaining game, $P_1$ has a memoryless WC winning strategy from all states.

  76. End-components: what they are
      [Figure: weighted game graph on states s1 to s7, with the end-components U1, U2, U3 outlined.]
      An EC of the MDP $P = G[\lambda_2^{\mathsf{stoch}}]$ is a subgraph in which $P_1$ can ensure to stay despite stochastic states [dA97], i.e., a set $U \subseteq S$ s.t.
      (i) $(U, E \cap (U \times U))$ is strongly connected,
      (ii) $\forall s \in U \cap S_\Delta$, $\mathsf{Supp}(\Delta(s)) \subseteq U$, i.e., in stochastic states, all outgoing edges stay in $U$.
      - ECs: $\mathcal{E} = \{U_1, U_2, U_3, \{s_5, s_6\}, \{s_6, s_7\}, \{s_1, s_3, s_4, s_5\}\}$
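Conditions (i) and (ii) can be checked mechanically for a candidate set U. The sketch below is an illustration on a hypothetical three-state graph (the slide's graph is not recoverable from the transcript); it assumes every outgoing edge of a stochastic state has positive probability, and it additionally requires each state of U to keep at least one edge inside U so that a play can actually stay there.

```python
def is_end_component(U, edges, stochastic):
    """edges maps each state to the set of its successors;
    stochastic is the set of stochastic states."""
    U = set(U)
    if not U:
        return False
    # (ii) stochastic states of U cannot leave U.
    for s in U & set(stochastic):
        if not edges.get(s, set()) <= U:
            return False
    # Subgraph induced by U; every state needs a successor inside U.
    inner = {s: edges.get(s, set()) & U for s in U}
    if any(not succ for succ in inner.values()):
        return False
    reverse = {s: set() for s in U}
    for s, succ in inner.items():
        for t in succ:
            reverse[t].add(s)

    def reachable(start, graph):
        seen, stack = {start}, [start]
        while stack:
            for t in graph[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    # (i) strongly connected: everything reachable forwards and backwards.
    start = next(iter(U))
    return reachable(start, inner) == U and reachable(start, reverse) == U

# Hypothetical example: "a" is controlled by player 1, "b" is stochastic.
edges = {"a": {"b", "c"}, "b": {"a"}, "c": {"c"}}
print(is_end_component({"a", "b"}, edges, stochastic={"b"}))
print(is_end_component({"a", "b"}, edges, stochastic={"a"}))
```

In the first call player 1 can keep the play inside {a, b} by always choosing the edge to b; in the second, a is stochastic and may leave toward c, so the set is not an end-component.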
