Previously in Game Theory
◮ decision makers: choices, preferences
◮ solution concepts: best response, Nash equilibrium
Rock, paper, scissors

      R       P       S
R   0, 0    −1, 1   1, −1
P   1, −1   0, 0    −1, 1
S   −1, 1   1, −1   0, 0
Learning in games
Repeated games
Best Response learning
1. Guess what the opponent(s) will play
2. Play a Best Response to that guess
3. Observe the play
4. Update the guess
BR learning: Cournot dynamics
Guess = last action played

      C       D
C   2, 2    −1, 3
D   3, −1   0, 0

      R       P       S
R   0, 0    −1, 1   1, −1
P   1, −1   0, 0    −1, 1
S   −1, 1   1, −1   0, 0
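A minimal simulation sketch of Cournot (best-response-to-last-action) dynamics on the Prisoner's Dilemma above; it is not from the slides, and the initial actions and tie-breaking rule are assumptions. From (C, C) both players jump to (D, D) and stay there, whereas on Rock, paper, scissors the same dynamics cycles R → P → S → R forever.

```python
import numpy as np

# Prisoner's Dilemma from the slide: actions 0 = C, 1 = D.
# U1[a1, a2] is player 1's payoff; the game is symmetric, so U2 = U1.T.
U1 = np.array([[2, -1],
               [3,  0]])
U2 = U1.T

a1, a2 = 0, 0                            # assumed initial actions: (C, C)
for t in range(10):
    new_a1 = int(np.argmax(U1[:, a2]))   # best response to the opponent's last action
    new_a2 = int(np.argmax(U2[a1, :]))
    a1, a2 = new_a1, new_a2
    print(t, a1, a2)                     # settles at (1, 1) = (D, D), the unique Nash equilibrium
```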
BR learning: Fictitious play
Guess = empirical distribution of play

      R       P       S
R   0, 0    −1, 1   1, −1
P   1, −1   0, 0    −1, 1
S   −1, 1   1, −1   0, 0

      L       C       R
U   0, 0    0, 1    1, 0
M   1, 0    0, 0    0, 1
D   0, 1    1, 0    0, 0
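A sketch of fictitious play on the zero-sum Rock, paper, scissors game above; the uniform initial counts and argmax tie-breaking are assumptions. In zero-sum games the empirical frequencies converge to a Nash equilibrium, here the uniform mixture; on the second (Shapley) game, fictitious play is known not to converge.

```python
import numpy as np

# Rock-paper-scissors payoffs for player 1; player 2 gets the negative (zero-sum).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

counts1 = np.ones(3)   # empirical counts of player 1's past actions (assumed uniform prior)
counts2 = np.ones(3)

for t in range(10_000):
    belief2 = counts2 / counts2.sum()     # player 1's guess about player 2
    belief1 = counts1 / counts1.sum()     # player 2's guess about player 1
    a1 = int(np.argmax(A @ belief2))      # best response to the empirical distribution
    a2 = int(np.argmax(-(belief1 @ A)))   # player 2 maximizes its own payoff, i.e. minimizes player 1's
    counts1[a1] += 1
    counts2[a2] += 1

print(counts1 / counts1.sum())   # ≈ (1/3, 1/3, 1/3)
print(counts2 / counts2.sum())
```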
Evolutionary learning
Action set: A    Utility function: u
Replicator dynamics: for p ∈ ∆(A) and k ∈ A,

\dot{p}_k = p_k \big( u(k, p) - u(p, p) \big)
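A sketch of these dynamics simulated with simple Euler steps on the symmetric Rock, paper, scissors game, where u(k, p) is the k-th entry of Ap and u(p, p) = pᵀAp; the initial mixture, step size, and horizon are assumptions.

```python
import numpy as np

# Symmetric rock-paper-scissors: u(k, p) = (A p)[k], u(p, p) = p . A p.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

p = np.array([0.5, 0.3, 0.2])   # assumed initial population mixture
dt = 0.01                       # assumed Euler step size

for _ in range(10_000):
    fitness = A @ p                           # u(k, p) for each action k
    avg = p @ fitness                         # u(p, p)
    p = p + dt * p * (fitness - avg)          # Euler step of  p_k' = p_k (u(k, p) - u(p, p))
    p = np.clip(p, 0, None); p /= p.sum()     # guard against numerical drift

print(p)   # the continuous-time trajectory cycles around the uniform mixture (1/3, 1/3, 1/3)
```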
Battle of the Sexes

      O      F
O   3, 2   0, 0
F   0, 0   2, 3
Correlated equilibrium (CE)

a^* \in A = \prod_i A_i is a NE:
\forall i, \forall a'_i: \quad u_i(a^*_i, a^*_{-i}) \ge u_i(a'_i, a^*_{-i})

\alpha \in \prod_i \Delta(A_i) is a NE:
\forall i, \forall a_i, \forall a'_i: \quad \sum_{a_{-i}} u_i(a_i, a_{-i}) \, \alpha(a) \ge \sum_{a_{-i}} u_i(a'_i, a_{-i}) \, \alpha(a)

\pi \in \Delta(A) is a CE:
\forall i, \forall a_i, \forall a'_i: \quad \sum_{a_{-i}} u_i(a_i, a_{-i}) \, \pi(a) \ge \sum_{a_{-i}} u_i(a'_i, a_{-i}) \, \pi(a)
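A minimal numerical check of the CE inequalities (not on the slides), using the Battle of the Sexes payoffs above and the distribution π that puts probability 1/2 on (O, O) and 1/2 on (F, F), i.e. a public fair coin telling both players where to meet; this π is a correlated equilibrium giving each player an expected payoff of 2.5.

```python
import numpy as np

# Battle of the Sexes from the slide: actions 0 = O, 1 = F.
u1 = np.array([[3, 0], [0, 2]])
u2 = np.array([[2, 0], [0, 3]])
pi = np.array([[0.5, 0.0],
               [0.0, 0.5]])   # candidate correlated equilibrium (public coin flip)

def is_ce(u1, u2, pi, tol=1e-12):
    # For every recommended action a and every deviation d, obeying must pay at least as much.
    for a in range(2):
        for d in range(2):
            if np.dot(u1[a] - u1[d], pi[a]) < -tol:           # player 1: sum over a2
                return False
            if np.dot(u2[:, a] - u2[:, d], pi[:, a]) < -tol:  # player 2: sum over a1
                return False
    return True

print(is_ce(u1, u2, pi))   # True
```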
No regret learning

Instantaneous regret of having played j rather than k: u_i(k, a_{-i}) - u_i(j, a_{-i})

Cumulative regret:
R^i_{jk}(t) = \sum_{\tau = 0,\; a_i(\tau) = j}^{t} \big[ u_i(k, a_{-i}(\tau)) - u_i(j, a_{-i}(\tau)) \big]

Regret matching: the empirical distribution of play converges to the set of correlated equilibria.
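A sketch of regret matching on Rock, paper, scissors, in the spirit of Hart and Mas-Colell: after playing j, switch to k with probability proportional to the positive part of the average regret R^i_{jk}(t)/t. The normalization constant, seed, and horizon below are assumptions.

```python
import numpy as np

# Regret matching on rock-paper-scissors.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)   # player 1's payoffs; player 2 gets the negative

rng = np.random.default_rng(0)
n, T, mu = 3, 50_000, 10.0                  # mu: normalization constant (assumed large enough)
R = [np.zeros((n, n)), np.zeros((n, n))]    # R[i][j, k] = cumulative regret of j vs. k for player i
last = [int(rng.integers(n)), int(rng.integers(n))]
joint = np.zeros((n, n))                    # empirical distribution of joint play

for t in range(1, T + 1):
    actions = []
    for i in range(2):
        j = last[i]
        p = np.maximum(R[i][j], 0.0) / (mu * t)   # switch probabilities from average positive regret
        p[j] = 0.0
        p[j] = max(1.0 - p.sum(), 0.0)            # stay with the remaining probability
        actions.append(int(rng.choice(n, p=p / p.sum())))
    a1, a2 = actions
    joint[a1, a2] += 1
    payoffs = [A[:, a2], -A[a1, :]]   # each player's payoff vector over its own actions this round
    for i, (a, u) in enumerate(zip(actions, payoffs)):
        R[i][a] += u - u[a]           # update row j = a: regret of k vs. the action actually played
    last = actions

print(joint / T)   # empirical joint distribution; approaches the set of correlated equilibria
```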
Learning in games
◮ Best response
◮ Replicator dynamics
◮ No regret
Repeated games
Markov Decision Process (MDP)
state space X
action space U
transition P : X × U → ∆(X)
reward r : X × U → R
discount factor δ ∈ [0, 1)

U(x(\cdot), u(\cdot)) = \sum_{t=0}^{\infty} \delta^t \, r(x(t), u(t))
MDP (continued)
history H ∈ ∏_t (X × U)
policy π : H → ∆(U)

V^\pi(x_0) = \mathbb{E}^\pi \big[ U(x(\cdot), u(\cdot)) \big]

V(x_0) = \max_\pi V^\pi(x_0)
Principle of Optimality
Bellman's equation:

V(x_0) = \max_{u_0} \Big[ r(x_0, u_0) + \delta \sum_{x_1} P(x_1 \mid x_0, u_0) \, V(x_1) \Big]
Dynamic Programming
Solving the MDP:
◮ knowing P : value iteration
◮ not knowing P : online learning
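A sketch of value iteration on a small, made-up MDP (the transition, reward, and discount values below are purely illustrative): repeatedly apply the Bellman operator until the value function stops changing, then read off the greedy policy.

```python
import numpy as np

# A tiny illustrative MDP: 2 states, 2 actions.
# P[u, x, y] = probability of moving from state x to state y under action u.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
r = np.array([[1.0, 0.0],    # r[x, u] = reward in state x for action u
              [0.0, 2.0]])
delta = 0.9

V = np.zeros(2)
for _ in range(10_000):
    # Bellman operator: Q[x, u] = r(x, u) + delta * sum_y P(y | x, u) V(y)
    Q = r + delta * np.einsum("uxy,y->xu", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)   # greedy policy with respect to the converged values
print(V, policy)
```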
Repeated game
Game (I, ∏_i A_i, (u_i)_{i ∈ I})
Discount factor δ

U_i(a(\cdot)) = \sum_{t=0}^{\infty} \delta^t \, u_i(a(t))

Strategy σ : H → ∏_i ∆(A_i)

V_i(\sigma) = \mathbb{E}^{\sigma} \big[ U_i(a(\cdot)) \big]
Nash equilibrium
Player i:
◮ choices σ_i
◮ utility V_i
Nash equilibrium is not strong enough! (Explanation on the whiteboard ⇒)
Information structure
◮ perfect
◮ imperfect
◮ public
◮ private (beliefs)
Folk theorem
Any feasible, strictly individually rational payoff can be sustained by a sequentially rational equilibrium.
Holy grail for repeated games.
[Figure: feasible payoff set in the (u_1, u_2) plane, spanned by the payoffs of CC, CD, DC, DD in the Prisoner's Dilemma]
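One standard construction behind such equilibria is grim trigger: cooperate until anyone defects, then defect forever. A minimal check (not from the slides) using the Prisoner's Dilemma payoffs above and an assumed discount factor δ = 0.5: cooperating forever is worth 2/(1 − δ), the best one-shot deviation is worth 3, so cooperation is sustainable whenever δ ≥ 1/3.

```python
# Grim trigger in the repeated Prisoner's Dilemma with stage payoffs
# (C,C) -> 2, (C,D) -> -1, (D,C) -> 3, (D,D) -> 0, as in the matrix above.
def discounted(stream, delta, horizon=10_000):
    """Truncated discounted sum of a payoff stream (a callable t -> stage payoff)."""
    return sum(delta**t * stream(t) for t in range(horizon))

delta = 0.5   # assumed discount factor

cooperate_forever = discounted(lambda t: 2, delta)              # 2 / (1 - delta) = 4
deviate_once = discounted(lambda t: 3 if t == 0 else 0, delta)  # defect once, then be punished forever

print(cooperate_forever, deviate_once)   # 4 > 3: no profitable deviation at delta = 0.5
```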
Research
Weakly belief-free equilibria
Characterization of repeated games with correlated equilibria.
Repeated games
◮ Dynamic programming
◮ Repeated games
◮ Folk theorem
Learning in games
Repeated games
Questions, Comments