Stability and Selection in Game Theoretic Learning
Jeff S. Shamma, Georgia Institute of Technology
Joint work with Gürdal Arslan, Georgios Chasparis & Michael J. Fox
Valuetools 2011, Georgia Institute of Technology, May 18, 2011
Networked interaction: Societal, engineered, & hybrid
Game formulations
• Game elements:
  – Actors/players
  – Choices
  – Preferences over collective choices
  – Solution concept (e.g., Nash equilibrium)
• Descriptive agenda:
  – Modeling of natural systems
  – Game elements inherited
  – Modeling metrics
• Prescriptive agenda:
  – Distributed optimization for engineered (programmable!) systems
  – Game elements designed
  – Performance metrics
Main message
Arrow, 1987: "The attainment of equilibrium requires a disequilibrium process."
Skyrms, 1992: "The explanatory significance of the equilibrium concept depends on the underlying dynamics."
Background: Game theoretic learning
Arrow: "The attainment of equilibrium requires a disequilibrium process."
Skyrms: "The explanatory significance of the equilibrium concept depends on the underlying dynamics."
• Monographs:
  – Weibull, Evolutionary Game Theory, 1997.
  – Young, Individual Strategy and Social Structure, 1998.
  – Fudenberg & Levine, The Theory of Learning in Games, 1998.
  – Samuelson, Evolutionary Games and Equilibrium Selection, 1998.
  – Young, Strategic Learning and Its Limits, 2004.
  – Sandholm, Population Games and Evolutionary Dynamics, 2010.
• Surveys:
  – Hart, "Adaptive heuristics", Econometrica, 2005.
  – Fudenberg & Levine, "Learning and equilibrium", Annual Review of Economics, 2009.
Learning among learners
• Single agent adaptation:
  – Stationary environment
  – Asymptotic guarantees
• Multiagent adaptation: Environment = Other learning agents ⇒ Non-stationary
• A is learning about B, whose behavior depends on A, whose behavior depends on B, ... i.e., feedback
• The resulting non-stationarity has major implications for achievable outcomes.
Illustration: Fictitious play & stability
• Setup: Repeated play
• Each player:
  – Maintains empirical frequencies (histograms) of the other players' actions
  – Forecasts (incorrectly) that others are playing randomly and independently according to the empirical frequencies
  – Selects an action that maximizes expected payoff
• Convergence: Zero-sum games (1951); 2 × 2 games (1961); Potential games (1996); 2 × N games (2003).
• Non-convergence: Shapley fashion game (1964); Jordan anti-coordination game (1993); Foster & Young merry-go-round game (1998).
A minimal simulation sketch of fictitious play follows.
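As a concrete illustration, here is a minimal fictitious-play loop for a two-player matrix game; the coordination payoffs, horizon, and initial counts are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Minimal fictitious play in a two-player matrix game. Identical-interest
# coordination payoffs are used here so that the process converges.
A = np.array([[4.0, 0.0],
              [0.0, 3.0]])   # row player's payoffs
B = A.copy()                 # column player's payoffs (identical interest)

T = 500
counts = [np.ones(2), np.ones(2)]    # action counts (ones avoid division by zero)

for t in range(T):
    f1 = counts[1] / counts[1].sum()   # player 1's empirical forecast of player 2
    f0 = counts[0] / counts[0].sum()   # player 2's empirical forecast of player 1
    a0 = int(np.argmax(A @ f1))        # best response to the forecast
    a1 = int(np.argmax(B.T @ f0))
    counts[0][a0] += 1
    counts[1][a1] += 1

print("empirical frequencies:",
      counts[0] / counts[0].sum(), counts[1] / counts[1].sum())
```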
Illustration: RPS & chaos
• Setup: Continuous-time "replicator dynamics" on perturbed rock-paper-scissors (RPS)
• Sato et al. (PNAS 2002): Chaos in learning a simple two-person game
  "Many economists have noted the lack of any compelling account of how agents might learn to play a Nash equilibrium. Our results strongly reinforce this concern, in a game simple enough for children to play."
A simulation sketch of replicator dynamics on perturbed RPS follows.
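The following is a structural sketch of replicator dynamics on a perturbed RPS payoff matrix; the tie payoff eps and the initial condition are placeholders, and Sato et al. analyze a two-person learning version rather than this single-population form.

```python
import numpy as np

# Single-population replicator dynamics on perturbed rock-paper-scissors,
# integrated with forward Euler.
eps = 0.1
M = np.array([[ eps, -1.0,  1.0],
              [ 1.0,  eps, -1.0],
              [-1.0,  1.0,  eps]])

def replicator_rhs(x):
    fitness = M @ x
    return x * (fitness - x @ fitness)   # dx_k/dt = x_k (fitness_k - average fitness)

x = np.array([0.5, 0.3, 0.2])
dt, steps = 0.01, 20000
for _ in range(steps):
    x = x + dt * replicator_rhs(x)
    x = np.clip(x, 0.0, None)
    x /= x.sum()                         # re-project onto the simplex (Euler drift)

print("strategy after integration:", x)
```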
Illustration: Stochastic adaptive play & selection

Typewriter Game:              Stag Hunt:
        A      B                      S         H
  A    4,4    0,0               S   3/2,3/2    0,1
  B    0,0    3,3               H    1,0       1,1

• How to distinguish equilibria?
• Payoff-based distinctions: Payoff dominance vs risk dominance
• Evolutionary (i.e., dynamic) distinction:
  – Young (1993), "The evolution of conventions"
  – Kandori/Mailath/Rob (1993), "Learning, mutation, and long run equilibria in games"
  – many more...
• Adaptive play:
  – "Two" players sparsely sample from a finite history
  – Players either:
    ∗ Play a best response to the sample
    ∗ Experiment with small probability
  – Young (1993): Risk dominance is "stochastically stable"
A simulation sketch of adaptive play on the Stag Hunt follows.
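A sketch of adaptive play on the Stag Hunt above: each player best-responds to a small sample drawn from a finite history of the opponent's play and experiments uniformly with small probability. The history length m, sample size k, experimentation rate eps, and horizon are illustrative choices.

```python
import numpy as np

# Adaptive play with sparse sampling and experimentation on the Stag Hunt.
rng = np.random.default_rng(0)

U = np.array([[1.5, 0.0],     # action 0 = S, action 1 = H; symmetric payoffs
              [1.0, 1.0]])
m, k, eps, T = 10, 5, 0.05, 100000

history = [rng.integers(2, size=m), rng.integers(2, size=m)]
visits = np.zeros(2)

def best_response(opponent_sample):
    freq = np.bincount(opponent_sample, minlength=2) / len(opponent_sample)
    return int(np.argmax(U @ freq))

for t in range(T):
    acts = []
    for i in (0, 1):
        if rng.random() < eps:                      # experimentation
            acts.append(int(rng.integers(2)))
        else:                                       # best response to a sparse sample
            sample = rng.choice(history[1 - i], size=k, replace=False)
            acts.append(best_response(sample))
    for i in (0, 1):
        history[i] = np.append(history[i][1:], acts[i])
    visits[acts[0]] += 1

# The risk-dominant convention H should absorb most of the time.
print("fraction of time on S, H:", visits / visits.sum())
```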
Outline

                 Stability       Selection
Descriptive      explanation     refinement
Prescriptive     adaptation      efficiency

• Transient phenomena & stability
• Transient phenomena & selection
• Stochastic stability & self-organization
• Network formation, self-assembly, language evolution
Setup: Basic notions
• Setup:
  – Players: {1, ..., p}
  – Actions: a_i ∈ A_i
  – Action profiles: (a_1, a_2, ..., a_p) ∈ A = A_1 × A_2 × ... × A_p
  – Payoffs: u_i : A → R, with (a_1, a_2, ..., a_p) written as (a_i, a_{-i})
• Nash equilibrium: Action profile a* ∈ A is a NE if, for all players i and all a'_i ∈ A_i,
    u_i(a*_1, a*_2, ..., a*_p) = u_i(a*_i, a*_{-i}) ≥ u_i(a'_i, a*_{-i})
• Learning dynamics:
  – t = 0, 1, 2, ...
  – Pr[a_i(t)] = p_i(t), p_i(t) ∈ Δ(A_i)
  – p_i(t) = F_i(available info at time t)
A code sketch that enumerates pure Nash equilibria of a two-player matrix game follows.
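To make the definition concrete, here is a small utility that enumerates pure Nash equilibria of a two-player finite game directly from the inequality above; the payoff matrices are illustrative placeholders.

```python
import numpy as np
from itertools import product

# Enumerate pure Nash equilibria of a two-player matrix game.
A = np.array([[4, 0],
              [0, 3]])    # u_1(a_1, a_2): row player's payoffs
B = np.array([[4, 0],
              [0, 3]])    # u_2(a_1, a_2): column player's payoffs

def pure_nash(A, B):
    eq = []
    for i, j in product(range(A.shape[0]), range(A.shape[1])):
        row_ok = A[i, j] >= A[:, j].max()   # no profitable deviation for player 1
        col_ok = B[i, j] >= B[i, :].max()   # no profitable deviation for player 2
        if row_ok and col_ok:
            eq.append((i, j))
    return eq

print(pure_nash(A, B))   # [(0, 0), (1, 1)] for these payoffs
```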
Setup: Continuous vs discrete time dynamics
• Stochastic approximation:
    x(t+1) = x(t) + (1/(t+1)) rand[F(x(t))]   ⟹   dx/dt = F(x)
• Summary: Continuous-time analysis has discrete-time implications
• Illustrations (two players):
  – Smooth fictitious play:
      f_i(t+1) = f_i(t) + (1/(t+1)) ( β_i(f_{-i}(t)) − f_i(t) )
      ⟹   df_i/dt = −f_i + β_i(f_{-i})
  – Reinforcement learning:
      p_i(t+1) = p_i(t) + (1/(t+1)) · u_i(a(t)) · ( a_i(t) − p_i(t) )
      ⟹   dp_i/dt = ( diag[M_i p_{-i}] − (p_i^T M_i p_{-i}) I ) p_i   (replicator dynamics)
A code sketch of the stochastic-approximation connection follows.
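The stochastic-approximation connection in miniature: a decreasing-step-size recursion driven by noisy, unbiased evaluations of F tracks the ODE dx/dt = F(x). The scalar vector field F and the noise level below are illustrative.

```python
import numpy as np

# Decreasing-step-size recursion tracking its mean ODE dx/dt = 1 - x.
rng = np.random.default_rng(1)

def F(x):
    return 1.0 - x                         # the ODE settles at x = 1

x = 5.0
for t in range(100000):
    noisy = F(x) + rng.normal(scale=1.0)   # "rand[F(x(t))]": unbiased noisy sample
    x += noisy / (t + 1)                   # step size 1/(t+1)

print("iterate:", x, "(ODE equilibrium is 1.0)")
```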
Uncoupled dynamics & nonconvergence
• Uncoupled dynamics:
  – The learning rule for each player does not depend (explicitly) on the payoff functions of the other players.
  – Satisfied by fictitious play & replicator dynamics
• Hart & Mas-Colell (2003): There are no uncoupled dynamics that are guaranteed to converge to Nash equilibrium.
  Analysis: The Jordan anti-coordination game is a universal counterexample. (cf. Saari & Simon (1978))
• Three players & two actions; each player wants to differ from the next:
  – Player 1 ≠ Player 2
  – Player 2 ≠ Player 3
  – Player 3 ≠ Player 1
A simulation sketch of fictitious play on the Jordan game follows.
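A sketch of fictitious play on the three-player Jordan anti-coordination game, where each player earns one for choosing a different action than the "next" player; the payoff normalization, horizon, and initialization are illustrative. The empirical frequencies oscillate rather than settle.

```python
import numpy as np

# Fictitious play in the Jordan anti-coordination game (three players, two actions).
T = 5000
counts = np.ones((3, 2))                      # empirical action counts

def payoff(a_i, a_next):
    return 1.0 if a_i != a_next else 0.0      # reward for mismatching the next player

trace = []
for t in range(T):
    freqs = counts / counts.sum(axis=1, keepdims=True)
    acts = []
    for i in range(3):
        nxt = (i + 1) % 3
        exp_pay = [sum(freqs[nxt, b] * payoff(a, b) for b in range(2))
                   for a in range(2)]
        acts.append(int(np.argmax(exp_pay)))   # best response to empirical forecast
    for i in range(3):
        counts[i, acts[i]] += 1
    trace.append(counts[0, 0] / counts[0].sum())

print("player 1's empirical frequency of action 0 (last few):",
      [round(v, 3) for v in trace[-5:]])
```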
Uncoupled dynamics & convergence?
Dynamic vs static processing
• Negative results only apply to static learning rules
    dp_i/dt (t) = F_i( p_i(t), p_{-i}(t); M_i )
  (applies to fictitious play & replicator dynamics)
• What about dynamic learning rules?
    dp_i/dt (t) = F_i( p_i(·), p_{-i}(·); M_i )
• Marginal forecast dynamics:
  – React to myopic predictions:  q(t+γ) ≈ q(t) + γ (dq/dt)_est(t)
  – FP: Best response to forecast empirical frequency
  – Replicator dynamics: React to forecast fitness
• Features:
  – Purely transient
  – Still uncoupled!
Marginal forecasts
• ATL traffic "Jam Factor": Holding, Building, Clearing
• Background:
  – Basar (1987), "Relaxation techniques and asynchronous algorithms for online computation of noncooperative equilibria"
  – Selten (1991), "Anticipatory learning in two-person games"
  – Conlisk (1993), "Adaptation in games: Two solutions to the Crawford puzzle"
  – Tang (2001), "Anticipatory learning in two-person games: Some experimental results"
  – Hess & Modjtahedzadeh (1990), "A control theoretic model of driver steering behavior"
  – McRuer (1980), "Human dynamics in man-machine systems"
Analysis: Marginal forecast fictitious play
    dr_i/dt = λ ( f_i − r_i )
    df_i/dt = −f_i + β_i( f_{-i} + γ dr_{-i}/dt )
• Approximation for λ ≫ 1:
    ‖ df_i/dt − dr_i/dt ‖ ≤ (1/λ) max ‖ d²f_i/dt² ‖
• Note: Auxiliary variables are absent from the prior impossibility result!
• JSS & Arslan, 2005: For large λ,
  – FP stable at NE p* implies marginal foresight FP stable at p* for 0 ≤ γ < 1
  – FP unstable at p* with eigenvalues x_k + j y_k satisfying
        max_k x_k / (x_k² + y_k²)  <  γ / (1 − γ)  <  1 / (max_k x_k)
    implies marginal foresight FP stable at p*.
• Similar results:
  – Marginal foresight replicator dynamics
  – Marginal foresight tatonnement
A simulation sketch of the marginal-forecast FP dynamics follows.
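A sketch of the marginal-forecast (derivative-action) smooth fictitious play ODEs above, integrated with forward Euler for a two-player matrix game. The payoff matrices, softmax temperature, and the parameters lam, gamma, dt are illustrative choices, not values from the talk.

```python
import numpy as np

# Marginal-forecast smooth fictitious play: r_i filters f_i, so dr_i/dt
# approximates df_i/dt, and each player best-responds (smoothly) to the
# opponent's forecast f_{-i} + gamma * dr_{-i}/dt.
def softmax(v, tau=0.1):
    w = np.exp((v - v.max()) / tau)
    return w / w.sum()

M = [np.array([[0.0, 1.0], [1.0, 0.0]]),    # player 1's payoffs (anti-coordination)
     np.array([[0.0, 1.0], [1.0, 0.0]])]    # player 2's payoffs

lam, gamma, dt, steps = 50.0, 0.5, 5e-3, 40000
f = [np.array([0.7, 0.3]), np.array([0.4, 0.6])]   # empirical-frequency states
r = [f[0].copy(), f[1].copy()]                     # filtered copies

for _ in range(steps):
    dr = [lam * (f[i] - r[i]) for i in (0, 1)]     # dr_i/dt = lam (f_i - r_i)
    df = []
    for i in (0, 1):
        j = 1 - i
        forecast = f[j] + gamma * dr[j]            # marginal forecast of the opponent
        df.append(-f[i] + softmax(M[i] @ forecast))
    for i in (0, 1):
        f[i] = f[i] + dt * df[i]
        r[i] = r[i] + dt * dr[i]

print("f1 =", f[0], " f2 =", f[1])
```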
Transient behavior & equilibrium selection
• Reinforcement learning: x_i = action propensities
    x_i(t+1) = x_i(t) + δ(t) ( a_i(t) − x_i(t) ),   δ(t) = u_i(a(t)) / (t+1)
    p_i(t) = (1 − ε) x_i(t) + (ε/N) 1
    δ_std(t) = u_i(a(t)) / ( 1^T U_i(t) + u_i(a(t)) )
  Interpretation: Increased probability of the utilized action.
• Dynamic reinforcement learning: Introduce a running average
    y_i(t+1) = y_i(t) + (1/(t+1)) ( x_i(t) − y_i(t) )
    p_i(t) = (1 − ε) Π_Δ[ x_i(t) + γ ( x_i(t) − y_i(t) ) ] + (ε/N) 1
  where γ ( x_i(t) − y_i(t) ) is the new term.
A simulation sketch of the dynamic reinforcement-learning rule follows.
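A sketch of the dynamic reinforcement-learning rule above, run on the Typewriter game with payoffs rescaled to [0, 1] so the step size stays below one; the exploration rate eps, foresight gain gamma, and horizon are illustrative.

```python
import numpy as np

# Dynamic reinforcement learning: propensities x_i are reinforced by realized
# payoffs, y_i is a running average of x_i, and play is biased by the trend
# term gamma * (x_i - y_i) before projection onto the simplex.
rng = np.random.default_rng(2)

U = np.array([[1.0, 0.0],
              [0.0, 0.75]])        # rescaled 4/3 coordination payoffs
N, eps, gamma, T = 2, 0.01, 0.3, 50000

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sorting-based)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

x = [np.full(N, 1.0 / N) for _ in range(2)]   # propensities x_i
y = [xi.copy() for xi in x]                   # running averages y_i

for t in range(1, T + 1):
    p = [(1 - eps) * project_simplex(x[i] + gamma * (x[i] - y[i])) + eps / N
         for i in range(2)]
    a = [rng.choice(N, p=p[i]) for i in range(2)]
    for i in range(2):
        reward = U[a[i], a[1 - i]]
        e = np.zeros(N)
        e[a[i]] = 1.0
        x[i] = x[i] + (reward / (t + 1)) * (e - x[i])   # delta(t) = u_i(a(t)) / (t+1)
        y[i] = y[i] + (x[i] - y[i]) / (t + 1)           # running average update

print("final propensities:", x[0], x[1])
```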
Marginal foresight dominance
• Chasparis & JSS (2009): The pure NE a* has positive probability of convergence iff
    0 < γ_i < [ u_i(a*_i, a*_{-i}) − u_i(a'_i, a*_{-i}) ] / u_i(a'_i, a*_{-i}) + 1   ∀ a'_i ≠ a*_i
  (as opposed to all pure NE)
  Proof: ODE method of stochastic approximation.
• Implications:
  – Introduction of a "forward looking" agent can destabilize equilibria
  – Surviving equilibria = equilibrium selection
• For 2 × 2 symmetric coordination games (RD = risk dominant, PD = payoff dominant):
  – RD & not PD ⇒ foresight dominance
  – RD & PD & identical interest ⇒ foresight dominance
  – RD & PD together ⇏ foresight dominance
A small arithmetic check of the γ bound follows.
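A small arithmetic check of the γ bound above, evaluated on the Stag Hunt from the earlier slide; the bound simplifies to the ratio of the equilibrium payoff to the deviation payoff, and the reading of which equilibria survive is purely illustrative.

```python
# Evaluate the gamma bound 0 < gamma < (u* - u_dev)/u_dev + 1 for a pure NE.
def gamma_bound(u_star, u_dev):
    return (u_star - u_dev) / u_dev + 1 if u_dev > 0 else float("inf")

# (S, S): equilibrium payoff 3/2; deviating to H against S pays 1
print("bound at (S,S):", gamma_bound(1.5, 1.0))   # 1.5
# (H, H): equilibrium payoff 1; deviating to S against H pays 0
print("bound at (H,H):", gamma_bound(1.0, 0.0))   # inf -> survives for any gamma
```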
Illustration: Network formation
• Setup:
  – Agents form costly links with other agents
  – Benefits inherited from connectivity:
      u_i(a(t)) = (# of connections to i) − κ · (# of links formed by i)
• Properties:
  – Nash networks are "critically connected"
  – The wheel network is the unique efficient network
  – Chasparis & JSS (2009): The wheel network is foresight dominant.
• Recent work considers transient establishment costs
A code sketch of this link-formation payoff follows.
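A sketch of the link-formation payoff above, reading "benefits inherited from connectivity" (as an assumption) in the Bala-Goyal one-way flow sense: agent i earns one unit per agent it can reach through the directed network and pays κ per link it forms, evaluated here on the wheel network.

```python
from collections import deque

# Connectivity benefit via reachability, minus a per-link cost kappa.
def reachable(links, i):
    seen, queue = {i}, deque([i])
    while queue:
        v = queue.popleft()
        for w in links.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) - 1                    # agents reachable from i, excluding i

def utility(links, i, kappa):
    return reachable(links, i) - kappa * len(links.get(i, ()))

n, kappa = 5, 0.2
wheel = {i: [(i + 1) % n] for i in range(n)}   # each agent links to its successor
print([utility(wheel, i, kappa) for i in range(n)])   # [3.8, 3.8, 3.8, 3.8, 3.8]
```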