
Operator approach to stochastic games with varying stage duration

G. Vigeral (with S. Sorin), CEREMADE, Université Paris Dauphine. 26 January 2016, ADGO II, Santiago de Chile.


  1. Operator approach to stochastic games with varying stage duration. G. Vigeral (with S. Sorin), CEREMADE, Université Paris Dauphine. 26 January 2016, ADGO II, Santiago de Chile.

  2. Table of contents
     1. Zero-sum stochastic games
     2. Exact games with varying stage duration: finite horizon; discounted evaluation
     3. Discretization of a continuous timed game
     4. Conclusion and remarks

  3. Zero-sum stochastic games

  4. Zero-sum stochastic game. A zero-sum stochastic game $\Gamma$ is a 5-tuple $(\Omega, I, J, g, \rho)$ where:
     - $\Omega$ is the set of states.
     - $I$ (resp. $J$) is the action set of Player 1 (resp. Player 2).
     - $g : I \times J \times \Omega \to [-1, 1]$ is the payoff function (that Player 1 maximizes and Player 2 minimizes).
     - $\rho : I \times J \times \Omega \to \Delta(\Omega)$ is the transition probability.

  5. How the game is played. An initial state $\omega_1$ is given, known by each player. At each stage $k \in \mathbb{N}$:
     - The players observe the current state $\omega_k$.
     - According to the past history, Player 1 (resp. Player 2) chooses a mixed action $x_k$ in $X = \Delta(I)$ (resp. $y_k$ in $Y = \Delta(J)$). The two choices are made independently.
     - An action $i_k$ of Player 1 (resp. $j_k$ of Player 2) is drawn according to the mixed action $x_k$ (resp. $y_k$).
     - This gives the payoff at stage $k$: $g_k = g(i_k, j_k, \omega_k)$.
     - A new state $\omega_{k+1}$ is drawn according to $\rho(i_k, j_k, \omega_k)$.
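
The stage-by-stage description of play can be sketched in code. The two-state game data `G` and `RHO`, the helper names `draw` and `play`, and the restriction to stationary mixed actions (one distribution per state, whereas strategies in general may depend on the whole past history) are all assumptions made for illustration; none of them comes from the slides.

```python
import random

# Hypothetical two-state game with two actions per player (illustrative data only).
# G[w][i][j] is the payoff g(i, j, w); RHO[w][i][j] is the law of the next state.
G = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
RHO = [[[(1.0, 0.0), (0.5, 0.5)], [(0.5, 0.5), (1.0, 0.0)]],
       [[(0.0, 1.0), (0.0, 1.0)], [(0.0, 1.0), (0.0, 1.0)]]]


def draw(dist, rng):
    """Sample an index from a finite probability vector."""
    return rng.choices(range(len(dist)), weights=dist)[0]


def play(x, y, omega1, n, rng):
    """Simulate n stages. x[w] and y[w] are the mixed actions used in state w
    (stationary strategies: a simplifying assumption)."""
    omega, payoffs = omega1, []
    for _ in range(n):
        i = draw(x[omega], rng)              # action of Player 1
        j = draw(y[omega], rng)              # action of Player 2, drawn independently
        payoffs.append(G[omega][i][j])       # stage payoff g_k
        omega = draw(RHO[omega][i][j], rng)  # next state
    return payoffs


rng = random.Random(0)
uniform = [(0.5, 0.5), (0.5, 0.5)]
payoffs = play(uniform, uniform, omega1=0, n=10, rng=rng)
print(payoffs)
```

States are indexed from 0 here, so `omega1=0` plays the role of the initial state $\omega_1$.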

  6. The $n$-stage game. For any stochastic game $\Gamma$, any finite horizon $n \in \mathbb{N}$, and any starting state $\omega_1$, the $n$-stage game $\Gamma_n(\omega_1)$ is the zero-sum game with payoff
     $$\mathbb{E}\left[\sum_{k=1}^{n} g_k\right],$$
     that Player 1 maximizes and Player 2 minimizes. The value of $\Gamma_n(\omega_1)$ is denoted by $V_n(\omega_1)$. Normalized value: $v_n = \frac{V_n}{n}$.

  7. The discounted game. For any stochastic game $\Gamma$, any discount factor $\lambda \in\, ]0, 1[$, and any starting state $\omega_1$, the discounted game $\Gamma_\lambda(\omega_1)$ is the zero-sum game with payoff
     $$\mathbb{E}\left[\sum_{k=1}^{+\infty} (1-\lambda)^{k-1} g_k\right],$$
     that Player 1 maximizes and Player 2 minimizes. The value of $\Gamma_\lambda(\omega_1)$ is denoted by $W_\lambda(\omega_1)$. Normalized value: $w_\lambda = \lambda W_\lambda$.
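
As a quick sanity check of the two normalizations, here is a worked example that is not on the slides: assume a game with a single state in which every pair of actions yields the same payoff $c \in [-1, 1]$. Then:

```latex
\begin{aligned}
V_n &= \sum_{k=1}^{n} c = n c, & v_n &= \frac{V_n}{n} = c, \\
W_\lambda &= c \sum_{k=1}^{+\infty} (1-\lambda)^{k-1} = \frac{c}{\lambda}, & w_\lambda &= \lambda W_\lambda = c.
\end{aligned}
```

In this toy case both normalized values equal the stage payoff, which is why $v_n$ and $w_\lambda$ can be compared on the same scale.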

  8. Recursive structure. Shapley (1953) proved that the values satisfy a recursive structure:
     $$\begin{aligned}
     V_n(\omega) &= \sup_{x \in X} \inf_{y \in Y} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\left(V_{n-1}(\cdot)\right) \right\} \\
     &= \inf_{y \in Y} \sup_{x \in X} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\left(V_{n-1}(\cdot)\right) \right\}, \\
     W_\lambda(\omega) &= \sup_{x \in X} \inf_{y \in Y} \left\{ g(x, y, \omega) + (1-\lambda)\, \mathbb{E}_{\rho(x, y, \omega)}\left(W_\lambda(\cdot)\right) \right\} \\
     &= \inf_{y \in Y} \sup_{x \in X} \left\{ g(x, y, \omega) + (1-\lambda)\, \mathbb{E}_{\rho(x, y, \omega)}\left(W_\lambda(\cdot)\right) \right\}.
     \end{aligned}$$

  9. Shapley operator. This can be summarized by:
     $$V_n = \Psi(V_{n-1}) = \Psi^n(0), \qquad W_\lambda = \Psi\left((1-\lambda) W_\lambda\right),$$
     $$w_\lambda = \lambda \Psi\left(\frac{1-\lambda}{\lambda}\, w_\lambda\right) = \left(\lambda \Psi\left(\frac{1-\lambda}{\lambda}\, \cdot\right)\right)^{\infty},$$
     for some operator $\Psi$:
     $$\begin{aligned}
     \Psi(f)(\omega) &= \sup_{x \in X} \inf_{y \in Y} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\left(f(\cdot)\right) \right\} \\
     &= \inf_{y \in Y} \sup_{x \in X} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\left(f(\cdot)\right) \right\}.
     \end{aligned}$$
     $\Psi$ is nonexpansive for the infinite norm: $\|\Psi(f) - \Psi(f')\|_\infty \le \|f - f'\|_\infty$.
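
A minimal sketch of the Shapley operator for a finite game, under stated assumptions: the two-state, two-action game data `G` and `RHO` are hypothetical, and `value_2x2` uses the classical closed form for the value of a 2x2 matrix game (pure saddle point if one exists, mixed formula otherwise) instead of a general linear-programming solver. For finite action sets, the sup-inf over mixed actions is the value of the matrix game with entries $g(i, j, \omega) + \mathbb{E}_{\rho(i, j, \omega)}(f)$.

```python
# Hypothetical game data: G[w][i][j] = g(i, j, w), RHO[w][i][j] = law of the next state.
G = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
RHO = [[[(1.0, 0.0), (0.5, 0.5)], [(0.5, 0.5), (1.0, 0.0)]],
       [[(0.0, 1.0), (0.0, 1.0)], [(0.0, 1.0), (0.0, 1.0)]]]


def value_2x2(m):
    """Value of the zero-sum 2x2 matrix game m (row player maximizes)."""
    (a, b), (c, d) = m
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin >= minmax:              # a pure saddle point exists
        return maxmin
    return (a * d - b * c) / (a + d - b - c)   # both players mix


def shapley(f):
    """Psi(f)(w) = val over (i, j) of g(i, j, w) + E_rho(i, j, w)[f]."""
    out = []
    for w in range(2):
        m = [[G[w][i][j] + sum(p * fw for p, fw in zip(RHO[w][i][j], f))
              for j in range(2)] for i in range(2)]
        out.append(value_2x2(m))
    return out


def n_stage_value(n):
    """V_n = Psi^n(0), the unnormalized value of the n-stage game."""
    v = [0.0, 0.0]
    for _ in range(n):
        v = shapley(v)
    return v


def sup_norm(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


f1, f2 = [0.3, -0.7], [1.1, 0.4]
print(n_stage_value(2))
print(sup_norm(shapley(f1), shapley(f2)), "<=", sup_norm(f1, f2))
```

The last line checks nonexpansiveness on one pair of vectors only; it illustrates the inequality rather than proving it.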

  10. Framework. This was proven by Shapley in the finite case but holds in a much wider framework, for example when:
     - $\Omega$ is finite, $X$ and $Y$ are compact, and $g$ and $\rho$ are continuous; or
     - $\Omega$, $X$ and $Y$ are compact metric, and $g$ and $\rho$ are continuous.
     See Maitra-Parthasarathy, Nowak, and Mertens-Sorin-Zamir for more general frameworks.

  11. Exact games with varying stage duration

  12. Definition. Definition due to Neyman (2013). Instead of playing at times $1, 2, \dots, n, \dots$, players play at times $t_1, t_2, \dots, t_n, \dots$ The intensity of both payoff and transition at time $t_k$ is $h_k = t_{k+1} - t_k$. That is, $g_h = h g$ and $\rho_h = (1-h)\,\mathrm{Id} + h \rho$. Shapley operator of the "exact game" with duration $h$:
     $$\Psi_h = (1-h)\,\mathrm{Id} + h \Psi.$$
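
The identity $\Psi_h = (1-h)\,\mathrm{Id} + h\Psi$ can be checked numerically on a small example: build the game with payoff $hg$ and transition $(1-h)\,\mathrm{Id} + h\rho$, apply its Shapley operator, and compare. The game data and the helper names `exact_game` and `shapley` are hypothetical, and the 2x2 closed-form value is an assumption that only fits two actions per player.

```python
G = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
RHO = [[[(1.0, 0.0), (0.5, 0.5)], [(0.5, 0.5), (1.0, 0.0)]],
       [[(0.0, 1.0), (0.0, 1.0)], [(0.0, 1.0), (0.0, 1.0)]]]


def value_2x2(m):
    """Value of the zero-sum 2x2 matrix game m (row player maximizes)."""
    (a, b), (c, d) = m
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin >= minmax:
        return maxmin
    return (a * d - b * c) / (a + d - b - c)


def shapley(f, g, rho):
    """Shapley operator of the game (g, rho) applied to f."""
    return [value_2x2([[g[w][i][j] + sum(p * fw for p, fw in zip(rho[w][i][j], f))
                        for j in range(2)] for i in range(2)])
            for w in range(2)]


def exact_game(g, rho, h):
    """Game with stage duration h: g_h = h g and rho_h = (1 - h) Id + h rho."""
    g_h = [[[h * g[w][i][j] for j in range(2)] for i in range(2)] for w in range(2)]
    rho_h = [[[tuple((1 - h) * (w2 == w) + h * rho[w][i][j][w2] for w2 in range(2))
               for j in range(2)] for i in range(2)] for w in range(2)]
    return g_h, rho_h


h, f = 0.25, [0.3, -0.7]
g_h, rho_h = exact_game(G, RHO, h)
lhs = shapley(f, g_h, rho_h)                       # Shapley operator of the exact game
psi_f = shapley(f, G, RHO)
rhs = [(1 - h) * fw + h * pw for fw, pw in zip(f, psi_f)]   # (1 - h) f + h Psi(f)
print(lhs, rhs)
```

The two vectors agree because the term $(1-h) f(\omega)$ does not depend on the actions and the value of a matrix game is positively homogeneous.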

  13. Some natural questions.
     - (1) What happens, for a fixed horizon $t$ or discount factor $\lambda$, when the duration $h_i$ of each stage vanishes? Does the value converge, and to which limit?
     - (2) What happens, for a fixed sequence of stage durations $h_i$, when the horizon goes to infinity or the discount factor goes to 0? Does the normalized value converge, and to which limit?
     - (3) What happens when both $\lambda$ (or $\frac{1}{n}$) and $h_i$ go to 0?
     - (4) What can be said of optimal strategies in games with varying duration?
     Neyman answers questions 1, 3 and 4 for finite discounted games. Here we use the operator approach to give a general answer to questions 1, 2 and 3.

  14. Finite horizon: game with finite horizon and varying duration. Finite horizon $t$, finite sequence of stage durations $h_1, \dots, h_n$ with $\sum_i h_i = t$. The value $V$ of such a game satisfies $V = z_n$ with
     $$z_{i+1} = \Psi_{h_i}(z_i) = (1-h_i)\, z_i + h_i \Psi(z_i), \qquad \frac{z_{i+1} - z_i}{h_i} = -(\mathrm{Id} - \Psi)(z_i).$$
     This is the Eulerian scheme associated to $f' = -(\mathrm{Id} - \Psi)(f)$. One can use general results associated to such schemes, for any nonexpansive operator defined on a Banach space.
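
The recursion can be sketched on a toy example where the continuous-time equation has a closed-form solution. Everything below is an assumption made for illustration: the game is degenerate (players have no choice), state 0 pays 1 and jumps to an absorbing state 1 with probability `P`, state 1 pays 0, and the recursion starts from $z_0 = 0$. For this operator, $f' = -(\mathrm{Id} - \Psi)(f)$ reduces to $f_0' = 1 - P f_0$ with $f_1 = 0$.

```python
import math

P = 0.5  # hypothetical probability of moving to the absorbing state


def psi(f):
    """Shapley operator of the degenerate two-state game described above."""
    return [1.0 + (1 - P) * f[0] + P * f[1], f[1]]


def euler_value(durations, z0=(0.0, 0.0)):
    """Iterate z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i) over the given durations."""
    z = list(z0)
    for h in durations:
        pz = psi(z)
        z = [(1 - h) * zi + h * pzi for zi, pzi in zip(z, pz)]
    return z


def continuous_value(t):
    """Solution at time t of f' = -(Id - Psi)(f) with f(0) = 0, for this psi."""
    return [(1 - math.exp(-P * t)) / P, 0.0]


def uneven_durations(t, n):
    """n stage durations alternating between two sizes and summing to t."""
    weights = [1 + (i % 2) for i in range(n)]
    total = sum(weights)
    return [t * w / total for w in weights]


t = 2.0
for n in (10, 100, 1000):
    z = euler_value(uneven_durations(t, n))
    print(n, z[0], abs(z[0] - continuous_value(t)[0]))
```

On this example the gap between the discrete value and the continuous-time solution shrinks as the stages get shorter, even though the durations are not uniform.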

  15. Finite horizon: Eulerian schemes in Banach spaces. For general nonexpansive $\Psi$, let $f_t(z_0)$ denote the solution at time $t$ of $f' = -(\mathrm{Id} - \Psi)(f)$ starting from $z_0$.
     Proposition (Miyadera-Oharu '70, Crandall-Liggett '71):
     $$\left\| f_{nh}(z_0) - \Psi_h^n(z_0) \right\| \le \left\| z_0 - \Psi(z_0) \right\| h \sqrt{n}.$$
     Proposition (V. '10): if $z_{i+1} = (1-h_i)\, z_i + h_i \Psi(z_i)$, then
     $$\left\| f_t(z_0) - z_n \right\| \le \left\| z_0 - \Psi(z_0) \right\| \sqrt{\sum_{i=1}^{n} h_i^2},$$
     with $t = \sum_{i=1}^{n} h_i$.

  16. Finite horizon: result with $t$ fixed. Let $h = \max_i h_i$ and $t = \sum_i h_i$, then
     $$\left\| V - f(t) \right\| \le K \sqrt{h t}.$$
     Hence as the mesh $h$ goes to 0, the value of the game goes to $f(t)$. $f(t)$ can be interpreted as the value of a game played in continuous time (Neyman '13).
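
One way to see where this bound comes from is to combine the estimate for Euler schemes with nonuniform steps, $\|f_t(z_0) - z_n\| \le \|z_0 - \Psi(z_0)\| \sqrt{\sum_i h_i^2}$, with $h_i \le h$. The sketch below assumes the recursion starts from $z_0 = 0$, so that $V = z_n$ and $f(t) = f_t(0)$; this starting point is not spelled out on the slide.

```latex
\begin{aligned}
\sum_{i=1}^{n} h_i^2 &\le h \sum_{i=1}^{n} h_i = h t, \\
\left\| V - f(t) \right\| = \left\| z_n - f_t(0) \right\|
  &\le \left\| 0 - \Psi(0) \right\| \sqrt{\sum_{i=1}^{n} h_i^2}
  \le \left\| \Psi(0) \right\| \sqrt{h t}.
\end{aligned}
```

Under this reading one can take $K = \|\Psi(0)\|_\infty$, which is at most 1 because payoffs lie in $[-1, 1]$.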

  17. Finite horizon: asymptotic results. For any $h_i$,
     $$\frac{\left\| V - f(t) \right\|}{t} \le \frac{K}{\sqrt{t}}.$$
     All the repeated games with varying stage duration have the same (normalized) asymptotic behavior. Same asymptotic behavior for the normalized value in continuous time $\frac{f(t)}{t}$ and for the normalized value of the original game $v_n$.
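
The normalized estimate can be recovered from the fixed-horizon bound $\|V - f(t)\| \le K\sqrt{ht}$ under the assumption, not stated on this slide, that every stage duration satisfies $h_i \le 1$ (which is what makes $\rho_h = (1-h)\,\mathrm{Id} + h\rho$ a transition probability):

```latex
\begin{aligned}
\frac{\left\| V - f(t) \right\|}{t} &\le \frac{K \sqrt{h t}}{t} = K \sqrt{\frac{h}{t}} \le \frac{K}{\sqrt{t}}
  && \text{since } h = \max_i h_i \le 1, \\
\left\| v_n - \frac{f(n)}{n} \right\| &\le \frac{K}{\sqrt{n}}
  && \text{for } h_i = 1,\ t = n,\ V = V_n.
\end{aligned}
```

The second line is the special case of the original game with unit stage duration.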

  18. Discounted evaluation: game with discount factor and varying duration. The discount factor $\lambda$ is the weight on the payoff on $[0, 1]$ compared to $[0, +\infty]$. Infinite sequence of stage durations $h_1, \dots, h_n, \dots$ When $h$ is constant, the normalized value satisfies
     $$w^h_\lambda = \lambda \Psi_h\left(\frac{1 - \lambda h}{\lambda}\, w^h_\lambda\right).$$
     In general $w$ is
     $$w = \left(\prod_{i=1}^{+\infty} D^{h_i}_\lambda\right)(0), \qquad \text{with } D^h_\lambda(f) = \lambda \Psi_h\left(\frac{1 - \lambda h}{\lambda}\, f\right).$$
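
A minimal numerical sketch of these operators, under stated assumptions: the operator `psi` comes from a hypothetical degenerate game (state 0 pays 1 and jumps to an absorbing, zero-payoff state 1 with probability `P`), the infinite product is read as the composition $D^{h_1}_\lambda \circ D^{h_2}_\lambda \circ \cdots$ applied to 0, and it is truncated after finitely many stages. The truncation is harmless when the durations sum to a large total, since each $D^h_\lambda$ is a contraction with factor $1 - \lambda h$.

```python
P, LAM = 0.5, 0.5   # hypothetical transition probability and discount factor


def psi(f):
    """Shapley operator of the degenerate two-state game described above."""
    return [1.0 + (1 - P) * f[0] + P * f[1], f[1]]


def psi_h(f, h):
    """Psi_h = (1 - h) Id + h Psi."""
    return [(1 - h) * fw + h * pw for fw, pw in zip(f, psi(f))]


def d_op(f, h, lam):
    """D^h_lam(f) = lam * Psi_h(((1 - lam h) / lam) f)."""
    scaled = [(1 - lam * h) / lam * fw for fw in f]
    return [lam * v for v in psi_h(scaled, h)]


def discounted_value(durations, lam):
    """Truncated composition D^{h_1} o ... o D^{h_N} applied to 0
    (the operator of the last stage is applied first)."""
    w = [0.0, 0.0]
    for h in reversed(durations):
        w = d_op(w, h, lam)
    return w


w_const = discounted_value([0.5] * 200, LAM)          # constant duration h = 0.5
w_vary = discounted_value([0.5, 0.25] * 200, LAM)     # alternating durations
print(w_const, d_op(w_const, 0.5, LAM))               # w_const is (nearly) a fixed point
print(w_vary)
```

With a constant duration the truncated value is numerically a fixed point of `d_op`, matching the fixed-point equation for $w^h_\lambda$.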

  19. Discounted evaluation: result with $\lambda$ fixed and vanishing duration. For a uniform duration $h$, $w^h_\lambda = w_\mu$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$. For any $\lambda$ and $h_i \le h$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
     $$\left\| w - \hat{w}_\lambda \right\| \le K h, \qquad \text{with } \hat{w}_\lambda := w_{\frac{\lambda}{1+\lambda}}.$$
     Hence as the mesh $h$ goes to 0, the value of the game goes to $w_{\frac{\lambda}{1+\lambda}}$. This was already known when the game is finite (Neyman 2013). $\hat{w}_\lambda$ can be interpreted as the value of a game played in continuous time (Neyman '13).
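
The change of discount factor for a uniform duration can be checked by hand. The computation below is a sketch that assumes the fixed-point characterization $w^h_\lambda = \lambda \Psi_h\left(\frac{1-\lambda h}{\lambda} w^h_\lambda\right)$ and expands $\Psi_h = (1-h)\,\mathrm{Id} + h\Psi$:

```latex
\begin{aligned}
w^h_\lambda &= (1-h)(1-\lambda h)\, w^h_\lambda
  + \lambda h\, \Psi\!\left(\tfrac{1-\lambda h}{\lambda}\, w^h_\lambda\right), \\
\bigl(1 - (1-h)(1-\lambda h)\bigr)\, w^h_\lambda
  &= h\,(1 + \lambda - \lambda h)\, w^h_\lambda
  = \lambda h\, \Psi\!\left(\tfrac{1-\lambda h}{\lambda}\, w^h_\lambda\right), \\
w^h_\lambda &= \mu\, \Psi\!\left(\tfrac{1-\mu}{\mu}\, w^h_\lambda\right),
  \qquad \mu = \frac{\lambda}{1 + \lambda - \lambda h},
  \quad \frac{1-\mu}{\mu} = \frac{1 - \lambda h}{\lambda}.
\end{aligned}
```

The last line is the equation that characterizes $w_\mu$, and $f \mapsto \mu \Psi\left(\frac{1-\mu}{\mu} f\right)$ is a contraction with factor $1 - \mu$, so its fixed point is unique and $w^h_\lambda = w_\mu$. Letting $h$ go to 0 gives $\mu \to \frac{\lambda}{1+\lambda}$.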

  20. Discounted evaluation: asymptotic results. Assumption: there exist nondecreasing $k : \,]0, 1] \to \mathbb{R}_+$ and $\ell : [0, +\infty] \to \mathbb{R}_+$ with $k(\lambda) = o(\sqrt{\lambda})$ as $\lambda$ goes to 0 and
     $$\left\| D^1_\lambda(z) - D^1_\mu(z) \right\| \le k(|\lambda - \mu|)\, \ell(\|z\|)$$
     for all $(\lambda, \mu) \in\, ]0, 1]^2$ and $z \in Z$. This is always true for Shapley operators of games with bounded payoff. Then for any $\lambda$ and $h_i$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
     $$\left\| w - w_\lambda \right\| \le K \lambda.$$
     All the repeated games with varying stage duration have the same (normalized) asymptotic behavior as $\lambda$ goes to 0. Same asymptotic behavior for the normalized value in continuous time $\hat{w}_\lambda$ and for the normalized value of the original game $w_\lambda$.
