

  1. Previously in Game Theory

  2. Previously in Game Theory ◮ decision makers: ◮ choices ◮ preferences

  3. Previously in Game Theory ◮ decision makers: ◮ choices ◮ preferences ◮ solution concepts: ◮ best response ◮ Nash equilibrium

  4. Rock, paper, scissors

  5. Rock, paper, scissors

              R        P        S
        R   0, 0    −1, 1    1, −1
        P   1, −1    0, 0    −1, 1
        S   −1, 1    1, −1    0, 0

  6. Learning in games

  7. Learning in games Repeated games

  8. Learning in games

  9. Best Response learning

  10. Best Response learning 1. Guess what the opponent(s) will play

  11. Best Response learning 1. Guess what the opponent(s) will play 2. Play a Best Response to that guess

  12. Best Response learning 1. Guess what the opponent(s) will play 2. Play a Best Response to that guess 3. Observe the play

  13. Best Response learning 1. Guess what the opponent(s) will play 2. Play a Best Response to that guess 3. Observe the play 4. Update the guess

  14. BR learning: Cournot dynamics

  15. BR learning: Cournot dynamics Guess = last action played

  16. BR learning: Cournot dynamics Guess = last action played

              C        D
        C   2, 2    −1, 3
        D   3, −1    0, 0

  17. BR learning: Cournot dynamics Guess = last action played

              C        D
        C   2, 2    −1, 3
        D   3, −1    0, 0

              R        P        S
        R   0, 0    −1, 1    1, −1
        P   1, −1    0, 0    −1, 1
        S   −1, 1    1, −1    0, 0
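A minimal sketch of this best-response procedure with the Cournot rule (guess = the opponent's last action), assuming numpy; the function names and starting actions are my own. On the Prisoner's-Dilemma-style game above it locks into (D, D) after one step, while on rock-paper-scissors it cycles forever.

```python
import numpy as np

def best_response(payoff, opponent_action):
    """Row player's best response to the guess that the opponent repeats `opponent_action`."""
    return int(np.argmax(payoff[:, opponent_action]))

def cournot_dynamics(payoff1, payoff2, a1, a2, steps=10):
    """Cournot dynamics: each period, best-respond to the opponent's last observed action."""
    history = [(a1, a2)]
    for _ in range(steps):
        a1, a2 = (best_response(payoff1, a2),
                  best_response(payoff2.T, a1))   # player 2's payoffs, transposed to row form
        history.append((a1, a2))
    return history

# Prisoner's-Dilemma-style game from the slide (C = 0, D = 1).
pd1 = np.array([[2, -1], [3, 0]])    # row player's payoffs
pd2 = np.array([[2, 3], [-1, 0]])    # column player's payoffs
print(cournot_dynamics(pd1, pd2, 0, 0, steps=4))   # settles at (D, D)

# Rock-paper-scissors (R = 0, P = 1, S = 2): the same dynamics never settle.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(cournot_dynamics(rps, -rps, 0, 1, steps=6))  # cycles
```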

  18. BR learning: Fictitious play

  19. BR learning: Fictitious play Guess = empirical distribution of play

  20. BR learning: Fictitious play Guess = empirical distribution of play

              R        P        S
        R   0, 0    −1, 1    1, −1
        P   1, −1    0, 0    −1, 1
        S   −1, 1    1, −1    0, 0

  21. BR learning: Fictitious play Guess = empirical distribution of play

              R        P        S
        R   0, 0    −1, 1    1, −1
        P   1, −1    0, 0    −1, 1
        S   −1, 1    1, −1    0, 0

              L        C        R
        U   0, 0     0, 1     1, 0
        M   1, 0     0, 0     0, 1
        D   0, 1     1, 0     0, 0
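A minimal sketch of fictitious play, assuming numpy; the function name, the uniform prior counts and the number of steps are my own. In rock-paper-scissors the empirical frequencies approach the uniform mixed equilibrium (fictitious play converges in two-player zero-sum games); the 3×3 game on this slide appears to be Shapley's example, where the empirical frequencies are known not to converge.

```python
import numpy as np

def fictitious_play(payoff1, payoff2, steps=2000):
    """Fictitious play: each period, best-respond to the empirical distribution of the
    opponent's past actions."""
    n1, n2 = payoff1.shape
    counts1, counts2 = np.ones(n1), np.ones(n2)   # action counts, initialized to a uniform prior
    for _ in range(steps):
        a1 = int(np.argmax(payoff1 @ (counts2 / counts2.sum())))    # BR to opponent's empirical mix
        a2 = int(np.argmax(payoff2.T @ (counts1 / counts1.sum())))
        counts1[a1] += 1
        counts2[a2] += 1
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Rock-paper-scissors: both empirical mixes approach (1/3, 1/3, 1/3).
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(fictitious_play(rps, -rps))
```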

  22. Evolutionary learning

  23. Evolutionary learning Action set: A Utility function: u

  24. Evolutionary learning
      Action set: A   Utility function: u
      For p ∈ ∆(A) and k ∈ A:  ṗ_k = p_k ( u(k, p) − u(p, p) )
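A minimal Euler-discretized simulation of this replicator equation, assuming numpy; the symmetric 2×2 payoff matrix, the starting mix and the step size are my own illustration, not from the slides.

```python
import numpy as np

def replicator_step(p, payoff, dt=0.01):
    """One Euler step of the replicator dynamics  p_k' = p_k (u(k, p) - u(p, p))."""
    u_k = payoff @ p          # u(k, p): payoff of each pure action k against the population mix p
    u_p = p @ u_k             # u(p, p): average payoff in the population
    return p + dt * p * (u_k - u_p)

# Illustrative symmetric coordination game: the population drifts to a pure equilibrium.
payoff = np.array([[2.0, 0.0], [0.0, 1.0]])
p = np.array([0.4, 0.6])
for _ in range(2000):
    p = replicator_step(p, payoff)
print(p)   # close to (1, 0): from this starting point the higher-payoff convention takes over
```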

  25. Battle of the Sexes

  26. Battle of the Sexes

              O        F
        O   3, 2     0, 0
        F   0, 0     2, 3

  27. Correlated equilibrium (CE)

  28. Correlated equilibrium (CE)
      a* ∈ A = ∏_i A_i is a NE:  ∀i, ∀a'_i:  u_i(a*_i, a*_{-i}) ≥ u_i(a'_i, a*_{-i})

  29. Correlated equilibrium (CE)
      a* ∈ A = ∏_i A_i is a NE:  ∀i, ∀a'_i:  u_i(a*_i, a*_{-i}) ≥ u_i(a'_i, a*_{-i})
      α ∈ ∏_i ∆(A_i) is a NE:  ∀i, ∀a_i, ∀a'_i:  ∑_{a_{-i}} u_i(a_i, a_{-i}) α(a) ≥ ∑_{a_{-i}} u_i(a'_i, a_{-i}) α(a)

  30. Correlated equilibrium (CE)
      a* ∈ A = ∏_i A_i is a NE:  ∀i, ∀a'_i:  u_i(a*_i, a*_{-i}) ≥ u_i(a'_i, a*_{-i})
      α ∈ ∏_i ∆(A_i) is a NE:  ∀i, ∀a_i, ∀a'_i:  ∑_{a_{-i}} u_i(a_i, a_{-i}) α(a) ≥ ∑_{a_{-i}} u_i(a'_i, a_{-i}) α(a)
      π ∈ ∆(A) is a CE:  ∀i, ∀a_i, ∀a'_i:  ∑_{a_{-i}} u_i(a_i, a_{-i}) π(a) ≥ ∑_{a_{-i}} u_i(a'_i, a_{-i}) π(a)
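A small numerical check of the CE inequalities above for a two-player game, assuming numpy; the function name and tolerance are my own. As an illustration, putting probability 1/2 on (O, O) and 1/2 on (F, F) in the Battle of the Sexes from slide 26, i.e. a public coin toss between the two pure Nash equilibria, is a correlated equilibrium.

```python
import numpy as np
from itertools import product

def is_correlated_eq(pi, payoff1, payoff2, tol=1e-9):
    """Check, for each player and each recommended action a_i and deviation a_i', that
    sum_{a_-i} u_i(a_i, a_-i) pi(a)  >=  sum_{a_-i} u_i(a_i', a_-i) pi(a)."""
    n1, n2 = pi.shape
    for a1, b1 in product(range(n1), repeat=2):      # player 1: deviate from a1 to b1
        if sum(pi[a1, a2] * (payoff1[a1, a2] - payoff1[b1, a2]) for a2 in range(n2)) < -tol:
            return False
    for a2, b2 in product(range(n2), repeat=2):      # player 2: deviate from a2 to b2
        if sum(pi[a1, a2] * (payoff2[a1, a2] - payoff2[a1, b2]) for a1 in range(n1)) < -tol:
            return False
    return True

# Battle of the Sexes (O = 0, F = 1): a fair coin toss between (O, O) and (F, F) is a CE.
u1 = np.array([[3, 0], [0, 2]])
u2 = np.array([[2, 0], [0, 3]])
pi = np.array([[0.5, 0.0], [0.0, 0.5]])
print(is_correlated_eq(pi, u1, u2))   # True
```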

  31. No regret learning

  32. No regret learning
      Regret of having played j instead of k:  u_i(k, a_{-i}) − u_i(j, a_{-i})

  33. No regret learning
      Regret of having played j instead of k:  u_i(k, a_{-i}) − u_i(j, a_{-i})
      Cumulative regret:  R^i_{jk}(t) = ∑_{τ ≤ t : a_i(τ) = j} [ u_i(k, a_{-i}(τ)) − u_i(j, a_{-i}(τ)) ]

  34. No regret learning
      Regret of having played j instead of k:  u_i(k, a_{-i}) − u_i(j, a_{-i})
      Cumulative regret:  R^i_{jk}(t) = ∑_{τ ≤ t : a_i(τ) = j} [ u_i(k, a_{-i}(τ)) − u_i(j, a_{-i}(τ)) ]
      Regret matching converges to the correlated equilibria set.
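A minimal sketch of the Hart and Mas-Colell regret-matching procedure built on the conditional regrets R^i_{jk} above, assuming numpy; the averaging by 1/t, the normalization constant μ (which must dominate the averaged regrets; payoffs here lie in [−1, 1], so μ = 5 is enough) and the function names are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def switch(j, avg_regret_row, mu):
    """Move away from the current action j with probabilities proportional to positive regret."""
    p = avg_regret_row / mu
    p[j] = 0.0
    p[j] = max(1.0 - p.sum(), 0.0)   # stay with j with the remaining probability
    return int(rng.choice(len(p), p=p))

def regret_matching(payoff1, payoff2, steps=20000, mu=5.0):
    """Both players play regret matching; returns the empirical distribution of joint play."""
    n1, n2 = payoff1.shape
    R1, R2 = np.zeros((n1, n1)), np.zeros((n2, n2))   # cumulative conditional regrets R[j, k]
    a1, a2 = 0, 0
    joint = np.zeros((n1, n2))
    for t in range(1, steps + 1):
        joint[a1, a2] += 1
        R1[a1, :] += payoff1[:, a2] - payoff1[a1, a2]   # payoff of k minus payoff of the action played
        R2[a2, :] += payoff2[a1, :] - payoff2[a1, a2]
        a1 = switch(a1, np.maximum(R1[a1], 0.0) / t, mu)
        a2 = switch(a2, np.maximum(R2[a2], 0.0) / t, mu)
    return joint / steps   # this empirical distribution approaches the set of correlated equilibria

# Rock-paper-scissors: the empirical joint distribution approaches the uniform CE.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(np.round(regret_matching(rps, -rps), 3))
```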

  35. Learning in games

  36. Learning in games ◮ Best response

  37. Learning in games ◮ Best response ◮ Replicator dynamics

  38. Learning in games ◮ Best response ◮ Replicator dynamics ◮ No regret

  39. Repeated games

  40. Markov Decision Process (MDP)

  41. Markov Decision Process (MDP)
      state space X, action space U
      transition P : X × U → ∆(X)
      reward r : X × U → R
      discount factor δ ∈ [0, 1]

  42. Markov Decision Process (MDP)
      state space X, action space U
      transition P : X × U → ∆(X)
      reward r : X × U → R
      discount factor δ ∈ [0, 1]
      U(x(·), u(·)) = ∑_{t=0}^{+∞} δ^t r(x(t), u(t))

  43. MDP (continued)
      history H ∈ ∏(X × U), policy π : H → ∆(U)

  44. MDP (continued)
      history H ∈ ∏(X × U), policy π : H → ∆(U)
      V^π(x_0) = E_π[ U(x(·), u(·)) ]

  45. MDP (continued)
      history H ∈ ∏(X × U), policy π : H → ∆(U)
      V^π(x_0) = E_π[ U(x(·), u(·)) ]
      V(x_0) = max_π V^π(x_0)

  46. Principle of Optimality
      Bellman's equation:  V(x_0) = max_{u_0} [ r(x_0, u_0) + δ E_{x_1 ∼ P(x_0, u_0)}[ V(x_1) ] ]

  47. Dynamic Programming Solving the MDP:

  48. Dynamic Programming Solving the MDP: ◮ knowing P : value iteration

  49. Dynamic Programming Solving the MDP: ◮ knowing P : value iteration ◮ not knowing P : online learning
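For the "knowing P" case (value iteration), here is a minimal sketch assuming numpy; the tensor layout P[x, u, x'], the toy two-state MDP and the function name are my own illustration, not from the slides.

```python
import numpy as np

def value_iteration(P, r, delta, iters=1000, tol=1e-8):
    """Iterate Bellman's equation  V(x) <- max_u [ r(x, u) + delta * sum_x' P(x, u, x') V(x') ].
    P has shape (|X|, |U|, |X|), r has shape (|X|, |U|)."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + delta * (P @ V)        # Q[x, u] = r(x, u) + delta * E[V(next state)]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)         # optimal value function and a greedy optimal policy

# Illustrative 2-state, 2-action MDP (numbers are not from the slides).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(value_iteration(P, r, delta=0.9))
```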

  50. Repeated game

  51. Repeated game Game (I, ∏_i A_i, ∏_i u_i)

  52. Repeated game
      Game (I, ∏_i A_i, ∏_i u_i)
      Discount factor δ
      U_i(a(·)) = ∑_{t=0}^{+∞} δ^t u_i(a(t))

  53. Repeated game
      Game (I, ∏_i A_i, ∏_i u_i)
      Discount factor δ
      U_i(a(·)) = ∑_{t=0}^{+∞} δ^t u_i(a(t))
      Strategy σ : H → ∏_i ∆(A_i)
      V_i(σ) = E_σ[ U_i(a(·)) ]

  54. Nash equilibrium Player i : ◮ choices σ i ◮ utility V i

  55. Nash equilibrium Player i : ◮ choices σ i ◮ utility V i Nash equilibrium is not strong enough! (Explanation on the whiteboard ⇒ )

  56. Information structure

  57. Information structure ◮ perfect ◮ imperfect

  58. Information structure ◮ perfect ◮ imperfect ◮ public ◮ private (beliefs)

  59. Folk theorem Any feasible, strictly individually rational payoff can be sustained by a sequentially rational equilibrium.

  60. Folk theorem Any feasible, strictly individually rational payoff can be sustained by a sequentially rational equilibrium. Holy grail for repeated games.
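As a small worked check of what "sustained" means here (my own example, using the Prisoner's-Dilemma payoffs from slide 16): under a grim-trigger strategy, a player who keeps cooperating earns 2 + 2δ + 2δ² + … = 2/(1 − δ), while a one-period deviation earns 3 today followed by the punishment payoff 0 forever. Mutual cooperation is therefore sustainable whenever 2/(1 − δ) ≥ 3, i.e. δ ≥ 1/3, and as δ → 1 the folk theorem says that every payoff in the feasible, strictly individually rational region (sketched on the next slides) can be sustained this way.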

  61.–65. [Figure: the feasible payoff set of the Prisoner's Dilemma in the (u_1, u_2) plane, built up across the slides from the payoff points DD, DC, CD and CC.]

  66. Research

  67. Weakly belief-free equilibria Characterization of repeated games with correlated equilibria.

  68. Repeated games

  69. Repeated games ◮ Dynamic programming

  70. Repeated games ◮ Dynamic programming ◮ Repeated games

  71. Repeated games ◮ Dynamic programming ◮ Repeated games ◮ Folk theorem

  72. Learning in games

  73. Learning in games Repeated games

  74. Questions, Comments
