Operator approach to stochastic games with varying stage duration
G. Vigeral (with S. Sorin)
CEREMADE, Université Paris Dauphine
26 January 2016, ADGO II, Santiago de Chile

Table of contents
1. Zero-sum stochastic games
2. Exact games with varying stage duration
   - Finite horizon
   - Discounted evaluation
3. Discretization of a continuous timed game
4. Conclusion and remarks

Zero-sum stochastic games
Zero-sum stochastic game

A zero-sum stochastic game $\Gamma$ is a 5-tuple $(\Omega, I, J, g, \rho)$ where:
- $\Omega$ is the set of states.
- $I$ (resp. $J$) is the action set of Player 1 (resp. Player 2).
- $g : I \times J \times \Omega \to [-1, 1]$ is the payoff function (that Player 1 maximizes and Player 2 minimizes).
- $\rho : I \times J \times \Omega \to \Delta(\Omega)$ is the transition probability.

How the game is played

An initial state $\omega_1$ is given, known by each player. At each stage $k \in \mathbb{N}$:
- The players observe the current state $\omega_k$.
- According to the past history, Player 1 (resp. Player 2) chooses a mixed action $x_k$ in $X = \Delta(I)$ (resp. $y_k$ in $Y = \Delta(J)$). The two choices are made independently.
- An action $i_k$ of Player 1 (resp. $j_k$ of Player 2) is drawn according to his mixed action $x_k$ (resp. $y_k$).
- This gives the payoff at stage $k$: $g_k = g(i_k, j_k, \omega_k)$.
- A new state $\omega_{k+1}$ is drawn according to $\rho(i_k, j_k, \omega_k)$.

The $n$-stage game

For any stochastic game $\Gamma$, any finite horizon $n \in \mathbb{N}$, and any starting state $\omega_1$, the $n$-stage game $\Gamma_n(\omega_1)$ is the zero-sum game with payoff
$$\mathbb{E}\left[\sum_{k=1}^{n} g_k\right],$$
that Player 1 maximizes and Player 2 minimizes.
- The value of $\Gamma_n(\omega_1)$ is denoted by $V_n(\omega_1)$.
- Normalized value: $v_n = \frac{V_n}{n}$.

The discounted game

For any stochastic game $\Gamma$, any discount factor $\lambda \in ]0, 1[$, and any starting state $\omega_1$, the discounted game $\Gamma_\lambda(\omega_1)$ is the zero-sum game with payoff
$$\mathbb{E}\left[\sum_{k=1}^{+\infty} (1-\lambda)^{k-1} g_k\right],$$
that Player 1 maximizes and Player 2 minimizes.
- The value of $\Gamma_\lambda(\omega_1)$ is denoted by $W_\lambda(\omega_1)$.
- Normalized value: $w_\lambda = \lambda W_\lambda$.

Recursive structure

Shapley (1953) proved that the values satisfy a recursive structure:
$$V_n(\omega) = \sup_{x \in X} \inf_{y \in Y} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(V_{n-1}(\cdot)\big) \right\} = \inf_{y \in Y} \sup_{x \in X} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(V_{n-1}(\cdot)\big) \right\},$$
$$W_\lambda(\omega) = \sup_{x \in X} \inf_{y \in Y} \left\{ g(x, y, \omega) + (1-\lambda)\, \mathbb{E}_{\rho(x, y, \omega)}\big(W_\lambda(\cdot)\big) \right\} = \inf_{y \in Y} \sup_{x \in X} \left\{ g(x, y, \omega) + (1-\lambda)\, \mathbb{E}_{\rho(x, y, \omega)}\big(W_\lambda(\cdot)\big) \right\}.$$

Shapley operator

This can be summarized by:
$$V_n = \Psi(V_{n-1}) = \Psi^n(0),$$
$$W_\lambda = \Psi\big((1-\lambda) W_\lambda\big),$$
$$w_\lambda = \lambda \Psi\left(\frac{1-\lambda}{\lambda}\, w_\lambda\right) = \left(\lambda \Psi\left(\frac{1-\lambda}{\lambda}\, \cdot\, \right)\right)^{\infty},$$
for some operator $\Psi$:
$$\Psi(f)(\omega) = \sup_{x \in X} \inf_{y \in Y} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(f(\cdot)\big) \right\} = \inf_{y \in Y} \sup_{x \in X} \left\{ g(x, y, \omega) + \mathbb{E}_{\rho(x, y, \omega)}\big(f(\cdot)\big) \right\}.$$
$\Psi$ is nonexpansive for the sup norm:
$$\|\Psi(f) - \Psi(f')\|_\infty \le \|f - f'\|_\infty.$$
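
The operator $\Psi$ and the two iterations it summarizes can be illustrated numerically. The Python sketch below is not from the talk: it assumes a made-up game with two states and two actions per player (the arrays `G` and `RHO` and all function names are hypothetical), so that each sup inf reduces to the value of a 2x2 matrix game, for which a classical closed form is available.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin == minmax:            # a pure saddle point exists
        return maxmin
    # No pure saddle point: both players mix (classical 2x2 formula).
    return (a * d - b * c) / ((a - b) + (d - c))

# Hypothetical game data (assumption, for illustration only).
# G[w][i][j]: payoff in state w; RHO[w][i][j]: law of the next state.
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.0], [-0.5, 1.0]]]
RHO = [[[[0.5, 0.5], [1.0, 0.0]], [[1.0, 0.0], [0.5, 0.5]]],
       [[[0.0, 1.0], [0.5, 0.5]], [[0.5, 0.5], [0.0, 1.0]]]]

def shapley(f):
    """Psi(f)(w): value of the matrix game g + E_rho[f] in each state w."""
    out = []
    for w in range(len(G)):
        m = [[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
              for j in range(2)] for i in range(2)]
        out.append(matrix_game_value(m))
    return out

def n_stage_value(n):
    """V_n = Psi^n(0)."""
    v = [0.0] * len(G)
    for _ in range(n):
        v = shapley(v)
    return v

def discounted_value(lam, iters=2000):
    """Normalized w_lam: iterate f -> lam * Psi((1 - lam) / lam * f)."""
    w = [0.0] * len(G)
    for _ in range(iters):
        w = [lam * x for x in shapley([(1 - lam) / lam * y for y in w])]
    return w

def sup_norm(f, f2):
    return max(abs(x - y) for x, y in zip(f, f2))

print("v_50  =", [x / 50 for x in n_stage_value(50)])
print("w_0.1 =", discounted_value(0.1))
```

The discounted iteration converges because the map $f \mapsto \lambda \Psi(\frac{1-\lambda}{\lambda} f)$ is a contraction with factor $1 - \lambda$, a direct consequence of the nonexpansiveness of $\Psi$.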

Framework

This was proven by Shapley in the finite case, but it holds in a much wider framework. For example:
- $\Omega$ finite, $X$ and $Y$ compact, $g$ and $\rho$ continuous.
- $\Omega$, $X$ and $Y$ compact metric, $g$ and $\rho$ continuous.
See Maitra-Parthasarathy, Nowak, Mertens-Sorin-Zamir for more general frameworks.

Exact games with varying stage duration
Definition

Definition due to Neyman (2013).
- Instead of playing at times $1, 2, \dots, n, \dots$, the players play at times $t_1, t_2, \dots, t_n, \dots$
- The intensity of both payoff and transition at time $t_k$ is $h_k = t_{k+1} - t_k$.
- That is, for a stage of duration $h$, $g_h = h g$ and $\rho_h = (1-h)\,\mathrm{Id} + h \rho$.
- Shapley operator of the "exact game" with duration $h$: $\Psi_h = (1-h)\,\mathrm{Id} + h \Psi$.
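
The identity $\Psi_h = (1-h)\,\mathrm{Id} + h\Psi$ can be checked numerically: the Shapley operator built directly from the rescaled data $g_h = hg$, $\rho_h = (1-h)\,\mathrm{Id} + h\rho$ should agree with the convex combination of $\mathrm{Id}$ and $\Psi$. The sketch below does this on a made-up game with two states and two actions per player; the data `G`, `RHO` and every function name are assumptions for illustration.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin == minmax:            # pure saddle point
        return maxmin
    return (a * d - b * c) / ((a - b) + (d - c))

# Hypothetical game data (assumption, for illustration only).
G = [[[1.0, -1.0], [-1.0, 1.0]],
     [[0.5, 0.0], [-0.5, 1.0]]]
RHO = [[[[0.5, 0.5], [1.0, 0.0]], [[1.0, 0.0], [0.5, 0.5]]],
       [[[0.0, 1.0], [0.5, 0.5]], [[0.5, 0.5], [0.0, 1.0]]]]

def shapley(f, g, rho):
    """Shapley operator of the game with payoff g and transition rho."""
    out = []
    for w in range(len(g)):
        m = [[g[w][i][j] + sum(p * fv for p, fv in zip(rho[w][i][j], f))
              for j in range(2)] for i in range(2)]
        out.append(matrix_game_value(m))
    return out

def psi_h(f, h):
    """Psi_h = (1 - h) Id + h Psi, computed from the original data."""
    pf = shapley(f, G, RHO)
    return [(1 - h) * x + h * y for x, y in zip(f, pf)]

def exact_game(h):
    """Data of the exact game: g_h = h g and rho_h = (1 - h) Id + h rho."""
    n = len(G)
    g_h = [[[h * G[w][i][j] for j in range(2)] for i in range(2)]
           for w in range(n)]
    rho_h = [[[[(1 - h) * (1.0 if s == w else 0.0) + h * RHO[w][i][j][s]
                for s in range(n)] for j in range(2)] for i in range(2)]
             for w in range(n)]
    return g_h, rho_h

f, h = [0.4, -0.7], 0.25
g_h, rho_h = exact_game(h)
direct = shapley(f, g_h, rho_h)     # operator of the rescaled game
via_formula = psi_h(f, h)           # (1 - h) f + h Psi(f)
print(direct, via_formula)
```

The two results agree up to rounding because, in each state, the matrix of the rescaled game is $(1-h) f(\omega) + h M$, where $M$ is the matrix of the original game, and the value of a matrix game commutes with positive affine transformations.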

Some natural questions

1. What happens, for a fixed horizon $t$ or discount factor $\lambda$, when the duration $h_i$ of each stage vanishes? Does the value converge, and to which limit?
2. What happens, for a fixed sequence of stage durations $h_i$, when the horizon goes to infinity or the discount factor goes to 0? Does the normalized value converge, and to which limit?
3. What happens when both $\lambda$ (or $\frac{1}{n}$) and $h_i$ go to 0?
4. What can be said of optimal strategies in games with varying duration?

Neyman answers questions 1, 3 and 4 for finite discounted games. Here we use the operator approach to give a general answer to questions 1, 2 and 3.

Finite horizon
Game with finite horizon and varying duration

Finite horizon $t$, finite sequence of stage durations $h_1, \dots, h_n$ with $\sum h_i = t$.
The value $V$ of such a game satisfies $V = z_n$ with
$$z_{i+1} = \Psi_{h_i}(z_i) = (1-h_i) z_i + h_i \Psi(z_i),$$
$$\frac{z_{i+1} - z_i}{h_i} = -(\mathrm{Id} - \Psi)(z_i).$$
This is an Eulerian scheme associated to $f' = -(\mathrm{Id} - \Psi)(f)$.
One can use general results associated to such schemes, for any nonexpansive operator defined on a Banach space.
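
The recursion $z_{i+1} = (1-h_i) z_i + h_i \Psi(z_i)$ is short to implement once $\Psi$ is available. The following sketch assumes a made-up one-player operator `psi` on two states (a hypothetical example, simpler than a two-player game but still nonexpansive) and takes $z_0 = 0$, an assumption consistent with $V_n = \Psi^n(0)$.

```python
def psi(f):
    """Hypothetical one-player Shapley operator on two states (assumption).

    State 0: stay (payoff 0) or move to state 1 (payoff 0.5).
    State 1: payoff -0.2, then each state with probability 1/2.
    """
    return [max(f[0], 0.5 + f[1]), -0.2 + 0.5 * (f[0] + f[1])]

def euler_scheme(durations, z0=(0.0, 0.0)):
    """Iterate z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i); return all iterates."""
    zs = [list(z0)]
    for h in durations:
        z = zs[-1]
        pz = psi(z)
        zs.append([(1 - h) * x + h * y for x, y in zip(z, pz)])
    return zs

# Unit durations give back the usual n-stage recursion V_n = Psi^n(0).
unit = euler_scheme([1.0] * 5)[-1]

# Varying durations with the same total horizon t = 5.
durations = [0.5, 0.25, 0.25, 1.0, 0.5, 0.5, 1.0, 1.0]
iterates = euler_scheme(durations)
varying = iterates[-1]

print("unit stages   :", unit)
print("varying stages:", varying)
```

With all $h_i = 1$ the scheme reduces to $z_{i+1} = \Psi(z_i)$, so the first result is $V_5$; the second is the value for a finer, uneven subdivision of the same horizon.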

Eulerian schemes in Banach spaces

For general nonexpansive $\Psi$, let $f_t(z_0)$ denote the value at time $t$ of the solution of $f' = -(\mathrm{Id} - \Psi)(f)$ starting from $z_0$.

Proposition (Miyadera-Oharu '70, Crandall-Liggett '71)
$$\|f_{nh}(z_0) - \Psi_h^n(z_0)\| \le \|z_0 - \Psi(z_0)\|\, h \sqrt{n}.$$

Proposition (V. '10)
If $z_{i+1} = (1-h_i) z_i + h_i \Psi(z_i)$, then
$$\|f_t(z_0) - z_n\| \le \|z_0 - \Psi(z_0)\| \sqrt{\sum_{i=1}^{n} h_i^2},$$
with $t = \sum_{i=1}^{n} h_i$.

Result with $t$ fixed

Let $h = \max h_i$ and $t = \sum h_i$. Since $\sum h_i^2 \le h \sum h_i = ht$,
$$\|V - f(t)\| \le K \sqrt{ht}.$$
- Hence as the mesh $h$ goes to 0, the value of the game goes to $f(t)$.
- $f(t)$ can be interpreted as the value of a game played in continuous time (Neyman '13).
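
As a sanity check of this estimate, the sketch below uses a case where $f(t)$ is known in closed form. It assumes a made-up two-state Markov chain without players, $\Psi(f) = g + Pf$, so that $f' = -(\mathrm{Id} - \Psi)(f)$ is a linear equation. The constants `A`, `B`, `G` and all function names are hypothetical, and the constant $K$ is taken to be $\|\Psi(0)\|$, that is $\|z_0 - \Psi(z_0)\|$ with $z_0 = 0$ as in the Eulerian-scheme proposition.

```python
import math

A, B = 0.3, 0.6          # hypothetical transition probabilities (assumption)
G = (1.0, -0.5)          # hypothetical stage payoffs in [-1, 1] (assumption)

def psi(f):
    """Psi(f) = g + P f for the chain P = [[1-A, A], [B, 1-B]]."""
    return [G[0] + (1 - A) * f[0] + A * f[1],
            G[1] + B * f[0] + (1 - B) * f[1]]

def continuous_value(t):
    """Closed-form solution of f' = -(Id - Psi)(f) with f(0) = 0."""
    c, delta = A + B, G[0] - G[1]
    # integral over [0, t] of d(s) = f_1(s) - f_2(s)
    integral = delta / c * (t - (1 - math.exp(-c * t)) / c)
    return [G[0] * t - A * integral, G[1] * t + B * integral]

def discrete_value(durations):
    """Euler scheme z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i) from z_0 = 0."""
    z = [0.0, 0.0]
    for h in durations:
        z = [(1 - h) * x + h * y for x, y in zip(z, psi(z))]
    return z

def error_and_bound(durations):
    t = sum(durations)
    v, f = discrete_value(durations), continuous_value(t)
    err = max(abs(x - y) for x, y in zip(v, f))
    k = max(abs(x) for x in psi([0.0, 0.0]))      # ||z0 - Psi(z0)||, z0 = 0
    bound = k * math.sqrt(sum(h * h for h in durations))
    return err, bound

def mesh(n, t=2.0):
    """n uneven stages (two alternating step sizes) summing to t."""
    return [1.5 * t / n, 0.5 * t / n] * (n // 2)

for n in (4, 16, 64, 256):
    err, bound = error_and_bound(mesh(n))
    print(n, err, bound)
```

On this linear example the observed error should shrink faster than the bound as the mesh is refined; the $\sqrt{ht}$ rate is a worst case over all nonexpansive operators.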

Asymptotic results

For any $h_i$,
$$\frac{\|V - f(t)\|}{t} \le \frac{K}{\sqrt{t}}.$$
- All the repeated games with varying stage duration have the same (normalized) asymptotic behavior.
- Same asymptotic behavior for the normalized value in continuous time $\frac{f(t)}{t}$ and for the normalized value of the original game $v_n$.

Discounted evaluation
Game with discount factor and varying duration

- Discount factor $\lambda$ = weight on the payoff on $[0, 1]$ compared to $[0, +\infty[$.
- Infinite sequence of stage durations $h_1, \dots, h_n, \dots$
- When $h$ is constant, the normalized value $w_\lambda^h$ satisfies
$$w_\lambda^h = \lambda \Psi_h\left(\frac{1-\lambda h}{\lambda}\, w_\lambda^h\right).$$
- In general, $w$ is
$$\prod_{i=1}^{+\infty} D_\lambda^{h_i}(0) \quad \text{with} \quad D_\lambda^h(f) = \lambda \Psi_h\left(\frac{1-\lambda h}{\lambda}\, f\right),$$
where the product denotes composition of operators.

Result with $\lambda$ fixed and vanishing duration

- For a uniform duration $h$, $w_\lambda^h = w_\mu$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$.
- For any $\lambda$ and $h_i \le h$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
$$\|w - \hat{w}_\lambda\| \le K h \quad \text{with} \quad \hat{w}_\lambda := w_{\frac{\lambda}{1+\lambda}}.$$
- Hence as the mesh $h$ goes to 0, the value of the game goes to $w_{\frac{\lambda}{1+\lambda}}$. Already known when the game is finite (Neyman 2013).
- $\hat{w}_\lambda$ can be interpreted as the value of a game played in continuous time (Neyman '13).
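
The identity $w_\lambda^h = w_\mu$ with $\mu = \frac{\lambda}{1+\lambda-\lambda h}$ follows from expanding $\Psi_h = (1-h)\,\mathrm{Id} + h\Psi$ in the fixed-point equation, and it can be checked numerically. The sketch below assumes a made-up one-player operator `psi` with payoffs in $[-1, 1]$; the operator and all function names are hypothetical, and fixed points are approximated by plain iteration.

```python
def psi(f):
    """Hypothetical one-player Shapley operator on two states (assumption)."""
    return [max(f[0], 0.5 + f[1]), -0.2 + 0.5 * (f[0] + f[1])]

def fixed_point(op, tol=1e-13, max_iter=1_000_000):
    """Iterate a contraction from 0 until successive iterates are tol-close."""
    f = [0.0, 0.0]
    for _ in range(max_iter):
        nxt = op(f)
        if max(abs(x - y) for x, y in zip(f, nxt)) < tol:
            return nxt
        f = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def w_discounted(lam):
    """w_lam: fixed point of f -> lam * Psi((1 - lam) / lam * f)."""
    def op(f):
        return [lam * x for x in psi([(1 - lam) / lam * y for y in f])]
    return fixed_point(op)

def w_exact(lam, h):
    """w^h_lam: fixed point of D(f) = lam * Psi_h((1 - lam * h) / lam * f)."""
    def op(f):
        z = [(1 - lam * h) / lam * y for y in f]
        return [lam * ((1 - h) * x + h * y) for x, y in zip(z, psi(z))]
    return fixed_point(op)

def dist(f, f2):
    return max(abs(x - y) for x, y in zip(f, f2))

lam = 0.3
for h in (1.0, 0.5, 0.1):
    mu = lam / (1 + lam - lam * h)
    print(h, mu, dist(w_exact(lam, h), w_discounted(mu)))

# As h goes to 0, mu tends to lam / (1 + lam).
w_hat = w_discounted(lam / (1 + lam))
print("distance to the limit at h = 0.01:", dist(w_exact(lam, 0.01), w_hat))
```

The iteration converges since $D_\lambda^h$ is a contraction with factor $1 - \lambda h$; the price is that smaller durations need more iterations.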

Asymptotic results

Assumption: there exist nondecreasing $k : ]0, 1] \to \mathbb{R}^+$ and $\ell : [0, +\infty[ \to \mathbb{R}^+$ with $k(\lambda) = o(\sqrt{\lambda})$ as $\lambda$ goes to 0 and
$$\|D_\lambda^1(z) - D_\mu^1(z)\| \le k(|\lambda - \mu|)\, \ell(\|z\|)$$
for all $(\lambda, \mu) \in ]0, 1]^2$ and $z \in Z$.
- Always true for Shapley operators of games with bounded payoff.
- Then for any $\lambda$ and $h_i$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies $\|w - w_\lambda\| \le K \lambda$.
- All the repeated games with varying stage duration have the same (normalized) asymptotic behavior as $\lambda$ goes to 0.
- Same asymptotic behavior for the normalized value in continuous time $\hat{w}_\lambda$ and for the normalized value of the original game $w_\lambda$.
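
The last point can be illustrated by comparing $\hat{w}_\lambda = w_{\lambda/(1+\lambda)}$ with $w_\lambda$ for decreasing $\lambda$. The sketch below again assumes a made-up one-player operator `psi` with payoffs in $[-1, 1]$ (hypothetical, as are the function names). For such payoffs, a standard contraction argument gives $\|w_\lambda - w_\mu\| \le 2|\lambda - \mu| / \lambda$, hence a gap of at most $2\lambda$ here; this explicit constant is specific to the sketch and is not claimed to be the $K$ of the talk.

```python
def psi(f):
    """Hypothetical one-player Shapley operator on two states (assumption)."""
    return [max(f[0], 0.5 + f[1]), -0.2 + 0.5 * (f[0] + f[1])]

def w_discounted(lam, tol=1e-13, max_iter=1_000_000):
    """w_lam: fixed point of f -> lam * Psi((1 - lam) / lam * f)."""
    f = [0.0, 0.0]
    for _ in range(max_iter):
        nxt = [lam * x for x in psi([(1 - lam) / lam * y for y in f])]
        if max(abs(x - y) for x, y in zip(f, nxt)) < tol:
            return nxt
        f = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def gap(lam):
    """Sup-norm distance between w_hat_lam = w_{lam / (1 + lam)} and w_lam."""
    w = w_discounted(lam)
    w_hat = w_discounted(lam / (1 + lam))
    return max(abs(x - y) for x, y in zip(w, w_hat))

for lam in (0.4, 0.2, 0.1, 0.05):
    g = gap(lam)
    print(lam, g, g / lam)   # the ratio g / lam should stay bounded
```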