Interactions and dynamics: some aspects of repeated zero-sum games

Sylvain Sorin
Laboratoire d'Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris
and Equipe Combinatoire, UFR 921, Université P. et M. Curie - Paris 6, 175 Rue du Chevaleret, 75013 Paris, France
sorin@poly.polytechnique.fr

Winter School on Complex Systems, December 9-13, 2002
Ecole Normale Supérieure de Lyon

Contents

1 Introduction
  1.1 Zero-sum games
  1.2 Repetition, information and interaction
  1.3 Evaluation: asymptotic approach, uniform approach
2 Stochastic games
  2.1 Description
  2.2 Results
3 Incomplete information games
  3.1 Description
  3.2 Results
4 Recursive structure and discrete dynamics
  4.1 Representation of a game with incomplete information as a stochastic game
  4.2 General repeated game
  4.3 Recursive formula
  4.4 Examples
    4.4.1 Stochastic games
    4.4.2 Incomplete information games
5 Operator approach
6 Uniform approach
7 Open problems
8 References

1 Introduction

1.1 Zero-sum games

Example: the 2 × 2 game with payoff matrix

  A = [  2   0 ]
      [ −1   1 ]

has value v = 1/2, with optimal strategies x = (1/2, 1/2) for the row player and y = (1/4, 3/4) for the column player.

Minmax theorem (von Neumann): Let A be an I × J matrix. There exist v ∈ IR, x ∈ ∆(I), y ∈ ∆(J) such that

  x A y′ ≥ v, for all y′ ∈ ∆(J)
  x′ A y ≤ v, for all x′ ∈ ∆(I).

1.2 Repetition, information and interaction

Repetition allows for:
- coordination
- threats
as a function of the information along the play. In the zero-sum case, the impact is only through the evolution of a jointly controlled state variable.

1.3 Evaluation: asymptotic approach, uniform approach

Given the sequence of stage payoffs g_n, n = 1, 2, . . .:
- asymptotic approach: for each averaging rule θ, a value v_θ; study the limiting behavior of the family v_θ.
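The 2 × 2 example above can be verified with the classical closed-form solution of fully mixed 2 × 2 games; a minimal sketch (the function name is ours, and the formula assumes no saddle point in pure strategies):

```python
def solve2x2(A):
    """Value and optimal mixed strategies of the 2x2 zero-sum game A
    (row player maximizes); assumes no saddle point in pure strategies."""
    (a, b), (c, d) = A
    D = a + d - b - c              # nonzero when the game is fully mixed
    v = (a * d - b * c) / D        # classical 2x2 value formula
    x = ((d - c) / D, (a - b) / D) # row mixture equalizing the two columns
    y = ((d - b) / D, (a - c) / D) # column mixture equalizing the two rows
    return v, x, y

# The game displayed above:
v, x, y = solve2x2([[2, 0], [-1, 1]])
```

This recovers v = 1/2, x = (1/2, 1/2) and y = (1/4, 3/4), matching the displayed solution.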

- uniform approach: properties independent of the (long) duration of the interaction.

2 Stochastic games

2.1 Description

Finite two-person zero-sum stochastic game:
- state space Ω
- action spaces I and J
- payoff function g from Ω × I × J to IR
- initial state ω₁, known to both players
- at each stage t + 1, a transition Q(· | ω_t, i_t, j_t) ∈ ∆(Ω) determines the law of the new state ω_{t+1}, which is announced to each player.

Let X = ∆(I), Y = ∆(J); g and Q are extended by bilinearity to X × Y.

Example (absorbing game), with rows a, b and columns α, β, where a star denotes an absorbing entry (once played, the payoff is fixed forever):

        α    β
  a    1*   0*
  b    0    1
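The starred-matrix convention can be encoded directly; a minimal sketch (the dict-based representation and names are our illustrative choices, not from the notes):

```python
# Payoffs g(active state, i, j) of the absorbing game displayed above.
payoff = {
    ("a", "alpha"): 1, ("a", "beta"): 0,
    ("b", "alpha"): 0, ("b", "beta"): 1,
}
# Starred entries: once such an entry is played, the play is absorbed
# and the payoff stays fixed forever.
absorbing = {
    ("a", "alpha"): True, ("a", "beta"): True,
    ("b", "alpha"): False, ("b", "beta"): False,
}

def transition(i, j):
    """Q(.| active, i, j): stay in the active state unless the entry absorbs."""
    return "absorbed" if absorbing[(i, j)] else "active"
```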

2.2 Results

Shapley's theorem (1953). The value v_λ of the λ-discounted game is the only fixed point of the operator f ↦ Φ(λ, f) from IR^Ω to itself:

  Φ(λ, f)(ω) = val_{X×Y} { λ g(ω, x, y) + (1 − λ) ∫_Ω f(ω′) Q(dω′ | ω, x, y) }

where val_{X×Y} stands for the value operator:

  val_{X×Y} = max_X min_Y = min_Y max_X.

Bewley and Kohlberg (1976a, 1976b). Algebraic approach: v_λ has an expansion in Puiseux series, hence lim_{λ→0} v_λ exists and lim_{n→∞} v_n = lim_{λ→0} v_λ.

Mertens and Neyman (1981). General stochastic game: v_λ of bounded variation implies lim v_n = lim v_λ (and the existence of v_∞ under standard signalling).

Lehrer and Sorin (1992). Markov decision process: uniform convergence of v_λ is equivalent to uniform convergence of v_n, and the limits are the same. There is an example with Ω countable where both limits exist and differ.
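Since Φ(λ, ·) is a contraction with modulus 1 − λ, v_λ can be approximated by straightforward iteration. A minimal sketch for 2 × 2 action sets, using the closed-form value of a 2 × 2 matrix game; the two-state game below is a made-up illustration, not one from the notes:

```python
def val2x2(A):
    """Value of the 2x2 zero-sum matrix game A (row player maximizes)."""
    (a, b), (c, d) = A
    lower = max(min(a, b), min(c, d))  # best guarantee in pure rows
    upper = min(max(a, c), max(b, d))  # best cap in pure columns
    if lower >= upper:                 # saddle point in pure strategies
        return lower
    return (a * d - b * c) / (a + d - b - c)  # fully mixed value

def phi(lam, f, g, Q):
    """Shapley operator: Phi(lam, f)(s) = val { lam*g + (1-lam)*E_Q[f] }."""
    S = range(len(f))
    return [val2x2([[lam * g[s][i][j]
                     + (1 - lam) * sum(Q[s][i][j][t] * f[t] for t in S)
                     for j in (0, 1)] for i in (0, 1)]) for s in S]

# Toy game: state 1 is absorbing with payoff 0; in state 0 the stage payoff
# is the identity matrix and the action pair (0, 0) moves the play to state 1.
g = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]
Q = [[[[0, 1], [1, 0]], [[1, 0], [1, 0]]],   # transitions from state 0
     [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]]   # state 1 stays put
lam = 0.5
f = [0.0, 0.0]
for _ in range(200):           # contraction: error shrinks like (1-lam)^n
    f = phi(lam, f, g, Q)
```

For this toy game with λ = 1/2 one can check by hand that the fixed point at the initial state solves v² − 3v + 1 = 0, i.e. v_λ = (3 − √5)/2 ≈ 0.382.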

3 Incomplete information games

3.1 Description

Two-person zero-sum repeated games with incomplete information, Aumann and Maschler (1995). Simple case: independent information and standard signalling.
- parameter space K × L, endowed with a product probability π = p ⊗ q ∈ ∆(K) × ∆(L) according to which (k, ℓ) is chosen
- k is told to Player 1 and ℓ to Player 2; hence the players have partial private information on the parameter (k, ℓ), which is fixed for the duration of the play
- after each stage t, the players are told the previous moves (i_t, j_t).

A one-stage strategy of Player 1 is an element x in X = ∆(I)^K (resp. y in Y = ∆(J)^L for Player 2).

3.2 Results

Aumann and Maschler (1966-68). Lack of information on one side:

  lim v_n = lim v_λ = v (= v_∞).

Mertens and Zamir (1971-72). Lack of information on both sides:

  lim v_n = lim v_λ = v.

Characterization of v: existence and uniqueness of the solution of the functional equations

  v = Cav_p min(u, v)
  v = Vex_q max(u, v)

where u is the value of the non-revealing game, in which neither player transmits (uses) his own information, and Cav (resp. Vex) is the concavification (resp. convexification) operator: given f from a convex set C to IR, Cav_C f is the smallest concave function greater than f on C.
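On a one-dimensional grid (K binary, so ∆(K) is identified with [0, 1]), Cav f can be computed as the upper concave envelope of the sampled points. A minimal sketch; the grid approach, function names and the test function u(p) = |p − 1/2| are illustrative assumptions, not from the notes:

```python
def cav_vertices(ps, us):
    """Vertices of the upper concave envelope of the points (ps[i], us[i]);
    ps must be increasing (monotone-chain upper hull)."""
    hull = []
    for p, u in zip(ps, us):
        # drop the last vertex while it lies on or below the chord to (p, u)
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (p - x2) <= (u - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((p, u))
    return hull

def cav_at(hull, p):
    """Evaluate the piecewise-linear envelope at p by interpolation."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= p <= x2:
            return y1 + (y2 - y1) * (p - x1) / (x2 - x1)
    raise ValueError("p outside the grid")

# u(p) = |p - 1/2| is convex on [0, 1], so Cav u is the chord between
# the endpoints: the constant 1/2.
ps = [i / 100 for i in range(101)]
hull = cav_vertices(ps, [abs(p - 0.5) for p in ps])
```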

4 Recursive structure and discrete dynamics

4.1 Representation of a game with incomplete information as a stochastic game

- state space χ = ∆(K) × ∆(L) (beliefs of the players on the parameter along the play)

Recall that a one-stage strategy of Player 1 is an element x in X = ∆(I)^K (resp. y in Y = ∆(J)^L for Player 2).

- transition Π : χ × X × Y → ∆(χ):
  • Π((p(i), q(j)) | (p, q), x, y) = x(i) y(j)
  • p(i) is the conditional probability on K given the move i
  • x(i) is the probability of this move (similarly y(j) for Player 2).

Explicitly: x(i) = Σ_k p^k x^k_i and p^k(i) = p^k x^k_i / x(i).
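The belief dynamics above is just Bayes' rule, and the posteriors satisfy the martingale (splitting) property Σ_i x(i) p(i) = p. A minimal sketch (function names and the two example strategies are ours):

```python
def update_belief(p, x, i):
    """Posterior p(i) on K after Player 1, of type k ~ p, plays move i
    using the one-stage strategy x = (x^k)_k with x^k in Delta(I)."""
    xbar_i = sum(p[k] * x[k][i] for k in range(len(p)))  # x(i) = sum_k p^k x^k_i
    return [p[k] * x[k][i] / xbar_i for k in range(len(p))], xbar_i

p = [0.5, 0.5]

# Non-revealing strategy (same mixed action for every type): beliefs stay put.
x_nr = [[0.3, 0.7], [0.3, 0.7]]
post_nr, _ = update_belief(p, x_nr, 0)

# Fully revealing strategy: the move identifies the type.
x_rev = [[1.0, 0.0], [0.0, 1.0]]
post_rev, _ = update_belief(p, x_rev, 0)
```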

4.2 General repeated game

- parameter space M
- action spaces I and J for Player 1 and 2 respectively
- payoff function g from I × J × M to IR
- signal sets A and B (all sets are assumed finite, avoiding measurability issues)
- initial position: parameter m₁, signal a₁ (resp. b₁) for Player 1 (resp. Player 2), chosen according to a probability π on M × A × B
- transition Q from M × I × J to probabilities on M × A × B. At stage t, given the state m_t and the moves (i_t, j_t):

  (m_{t+1}, a_{t+1}, b_{t+1}) ∼ Q(m_t, i_t, j_t)

- play of the game: m₁, a₁, b₁, i₁, j₁, m₂, a₂, b₂, i₂, j₂, . . .
- information of Player 1 before his play at stage t: a private history of the form (a₁, i₁, a₂, i₂, . . ., a_t) (similarly for Player 2)
- sequence of payoffs: g₁, g₂, . . ., g_t, . . . with g_t = g(i_t, j_t, m_t)
- strategy for Player 1: σ, a map from private histories to ∆(I), probabilities on the set I of actions; τ is defined similarly for Player 2.

A pair (σ, τ) induces, together with the components of the game π and Q, a distribution P_{σ,τ} on plays, hence on the sequence of payoffs. Two standard evaluations:

1) the finite n-stage game Γ_n, with payoff given by the average of the first n rewards:

  γ_n(σ, τ) = E_{σ,τ} ( (1/n) Σ_{t=1}^n g_t )

2) the λ-discounted game Γ_λ, with payoff equal to the discounted sum of the rewards:

  γ_λ(σ, τ) = E_{σ,τ} ( Σ_{t=1}^∞ λ (1 − λ)^{t−1} g_t )

The values of these games are denoted by v_n and v_λ respectively. The analysis of their asymptotic behavior, as n goes to ∞ or λ goes to 0, is the study of the asymptotic game.
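The two evaluations weight the same payoff stream differently: the Cesàro average treats all stages equally, while the λ-discounted sum overweights early stages. A minimal sketch on a fixed stream (the alternating stream is our illustrative choice):

```python
def average_payoff(gs, n):
    """gamma_n: Cesaro average of the first n stage payoffs."""
    return sum(gs[:n]) / n

def discounted_payoff(gs, lam):
    """gamma_lambda, truncated at len(gs); the neglected tail is bounded
    by (1 - lam) ** len(gs) times the payoff bound."""
    return sum(lam * (1 - lam) ** t * g for t, g in enumerate(gs))

# Alternating stream 1, 0, 1, 0, ...: the average tends to 1/2, while for
# lam = 1/2 the discounted sum is sum_k (1/2)(1/4)^k = 2/3.
gs = [1, 0] * 500
```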

4.3 Recursive formula

The recursive structure relies on the construction of the universal belief space, Mertens and Zamir (1985). The infinite hierarchy of beliefs on M is canonically represented by Ξ = M × Θ¹ × Θ², where Θ^i, homeomorphic to ∆(M × Θ^{−i}), is the type set of Player i.

An information scheme is a probability on M × A × B (parameter × signals). It induces a consistent distribution Q on Ξ: for any Borel subset B of Ξ,

  Q(B) = ∫_Ξ θ^i(ζ)(B) Q(dζ)

where θ^i is the canonical projection from Ξ to Θ^i.

- The strategies of the players and the signalling structure in the game, before the moves at stage t, define a probability on t-histories, hence an information scheme, thus a consistent distribution on Ξ: the entrance law P_t.
- P_t and the (behavioral) strategies at stage t (maps from types to mixed actions, α_t : Θ¹ → ∆(I) for Player 1, resp. β_t for Player 2) determine the current payoff g_t and the new entrance law P_{t+1} = L(P_t, α_t, β_t).
- The stationary aspect of the repeated game is expressed by the fact that L does not depend on the stage t.

The Shapley operator maps the set of real bounded functions defined on the space of consistent probabilities (in ∆(Ξ)) to itself:

  Ψ(f)(P) = val_{α×β} { g(P, α, β) + f(L(P, α, β)) }

Mertens, Sorin and Zamir (1994), Sections III.1, III.2, IV.3:

  n v_n = Ψ^n(0),   v_λ / λ = Ψ ( (1 − λ) v_λ / λ ).

Problems: asymptotic behavior of v_λ as λ → 0 or of v_n as n → ∞. Convergence? Convergence to the same limit w?

4.4 Examples

4.4.1 Stochastic games

Ψ operates on IR^Ω:

  Ψ(f)(ω) = val_{X×Y} { g(ω, x, y) + ∫_Ω f(ω′) Q(dω′ | ω, x, y) }

4.4.2 Incomplete information games

Ψ is an operator on the set of real bounded saddle (concave/convex) functions on χ:

  Ψ(f)(p, q) = val_{X×Y} { g(p, q, x, y) + ∫_χ f(p′, q′) Π(d(p′, q′) | (p, q), x, y) }

with g(p, q, x, y) = Σ_{k,ℓ} p^k q^ℓ g(k, ℓ, x^k, y^ℓ).
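The relation n v_n = Ψ^n(0) gives the n-stage values by iterating the unnormalized operator. A minimal sketch on the same toy stochastic game as before (an illustration of ours, not from the notes): state 1 absorbs with payoff 0, state 0 pays the identity matrix, and the action pair (0, 0) sends the play to state 1. Here Player 2 can always play the first column, capping the total payoff at 1, so v_n = Ψ^n(0)/n vanishes like 1/n, consistently with lim v_n = lim v_λ = 0 for this game:

```python
def val2x2(A):
    """Value of the 2x2 zero-sum matrix game A (row player maximizes)."""
    (a, b), (c, d) = A
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower >= upper:                         # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)   # fully mixed value

def psi(f, g, Q):
    """Unnormalized Shapley operator: Psi(f)(s) = val { g + E_Q[f] }."""
    S = range(len(f))
    return [val2x2([[g[s][i][j] + sum(Q[s][i][j][t] * f[t] for t in S)
                     for j in (0, 1)] for i in (0, 1)]) for s in S]

g = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]
Q = [[[[0, 1], [1, 0]], [[1, 0], [1, 0]]],
     [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]]

f = [0.0, 0.0]
vn = []
for n in range(1, 201):
    f = psi(f, g, Q)          # f = Psi^n(0)
    vn.append(f[0] / n)       # v_n = Psi^n(0) / n at the initial state
```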
