  1. Preference-dependent learning in the Centipede Game
Astrid Gamba (University of Milan-Bicocca) and Tobias Regner (Max Planck Institute, Jena)
Game Theory at the Universities of Milano III, 24 May 2013
A. Gamba, T. Regner 07/05 1 / 36

  2. Introduction
Aim of this paper: to explain heterogeneous behavior in the Centipede Game by means of an experiment driven by the theory of self-confirming equilibrium (Battigalli, 1987; Fudenberg and Levine, 1993; Dekel et al., 2004).
Contribution: behavior in the long run is the result of a learning process driven by players' preference types and based on each player's own observations of co-players' behavior. Given very plausible limitations on the evidence that can be collected, off-path prediction errors may persist in the long run and help sustain heterogeneity of behavior (some agents unravel and some don't).

  3. Self-confirming equilibrium
Hahn, 1973; Battigalli, 1987; Battigalli and Guaitoli, 1988; Fudenberg and Levine, 1993a.
SCE describes steady states where agents best respond to confirmed beliefs about the play. Two conditions:
rationality: players maximize their (subjective) expected utility;
confirmation of beliefs (instead of correctness, as in Nash): agents' equilibrium conjectures about the opponents' strategies are consistent with the evidence they can collect, given the strategies adopted.

  4. Learning interpretation of self-confirming equilibrium
Basic intuition: agents' beliefs come from a large set of observations of the opponent's play acquired along recurrent play of the same game.
Evidence on the opponent's strategy is only partial (own payoff? terminal node? ...), so subjective probabilities of the opponent's strategies may differ from their objective probabilities.
Crucial: especially in an extensive-form game, off-path prediction errors may persist in the long run.
Learning foundations for SCE: Fudenberg and Levine, 1993b and 2006; Fudenberg and Kreps, 1995.

  5. Other experiments related to self-confirming equilibrium
Fudenberg and Levine (1997): measure losses in payoffs due to limited information about the play.
Maniadis (2011): experimental study of whether aggregate information release causes more or less pro-social behavior in the Centipede Game (SCE of an incomplete-information game with a small fraction of altruists).

  6. Other solution concepts applied to the Centipede Game
Agent Quantal Response equilibrium (McKelvey and Palfrey, 1992): agents imperfectly respond to correct beliefs about the play.
Analogy-based Expectation equilibrium (Jehiel, 2005; Huck and Jehiel, 2004): agents best respond to coarse beliefs about the play (they bundle the opponent's information sets into analogy classes).
Cox and James (2012): "Exploration of the impact of exogenously varied provision of information on past play in these games is an interesting topic for future research, and one that could help further establish the suitability of candidate explanatory models." (Econometrica, 80(2), p. 902)

  7. The model underlying our experiment
Two-player extensive-form game (6-stage Centipede Game, CG).
For each role/player i there is a large population of agents with heterogeneous preferences (more or less joint-payoff maximizing): θ ∈ [0, 1], with distributions q_i and q_j.
Agents are drawn at random to play the CG and play pure strategies.
Each player (role) i plays a mixed strategy σ_i ∈ Δ(S_i), induced by q_i and the pure strategy s_{i,θ} adopted by each preference type θ in role i.
We allow agents to have heterogeneous conjectures about the opponent's (mixed) strategy: µ_{i,θ} ∈ Δ(S_j).
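As a toy illustration of how the population mixed strategy arises (a sketch with hypothetical names, not the paper's code), σ_i can be computed by aggregating the type distribution q_i over the type-dependent pure strategies:

```python
def induced_mixed_strategy(q_i, pure_strategy):
    """Mixed strategy sigma_i of role i induced by the distribution of
    preference types q_i (dict: theta -> probability) and the pure
    strategy s_{i,theta} each type adopts (function: theta -> strategy)."""
    sigma = {}
    for theta, prob in q_i.items():
        s = pure_strategy(theta)
        sigma[s] = sigma.get(s, 0.0) + prob
    return sigma
```

For instance, if 60% of role-1 agents are low (selfish) types who stop at once and 40% are high (joint-payoff maximizing) types who go across, the induced σ_1 puts 0.6 on the first pure strategy and 0.4 on the second.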

  8. The model underlying our experiment
Assume that agents do not know the distribution of preference types in either population.
Denote by π(z | s_{i,θ}; q_j, σ) the objective probability that preference type θ observes terminal node z, given his own move, the move by Nature and the mixed strategy of the opponent.
Denote by ρ(z | s_{i,θ}; µ_{i,θ}) the subjective probability of observing terminal node z as assessed by preference type θ, given his own strategy and his conjecture about the opponent's mixed strategy.
Assume that after having played, agents can only observe the terminal node reached.
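A minimal sketch (hypothetical helper names) of how a distribution over terminal nodes arises in a centipede structure: a terminal node is reached exactly when every earlier mover goes across and the mover at that node stops. Plugging in the agent's own pure choices at his nodes and either the conjectured or the true conditional move probabilities at the opponent's nodes yields ρ or π, respectively.

```python
def terminal_node_probs(p_across):
    """Distribution over terminal nodes of an n-stage centipede game.

    p_across[k]: probability that the mover at stage k plays 'across'
    (1.0 or 0.0 at the agent's own nodes, given his pure strategy;
    conditional move probabilities at the opponent's nodes).
    Returns the n 'stop' node probabilities plus the final 'all across' node."""
    probs, reach = [], 1.0
    for p in p_across:
        probs.append(reach * (1.0 - p))  # play stops at this stage
        reach *= p                       # play continues past it
    probs.append(reach)                  # nobody ever stopped
    return probs
```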

  9. Self-confirming equilibrium of an extensive-form game with heterogeneous preference types
Definition. A profile of mixed strategies (σ_i)_{i∈I} is a self-confirming equilibrium if for each preference type θ we can find a conjecture µ_{i,θ} such that, for each s_{i,θ} ∈ supp σ_i:
i) s_{i,θ} ∈ arg max_{s_i ∈ S_i} Σ_{s_j ∈ S_j} µ_{i,θ}(s_j) U_θ(s_i, s_j), and
ii) ∀ z ∈ Z: ρ(z | s_{i,θ}; µ_{i,θ}) = π(z | s_{i,θ}; q_j, σ).
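The two conditions can be checked mechanically on a toy 2-stage centipede game (a sketch with hypothetical names and payoffs, not the paper's model): rationality against the conjecture, and equality of the subjective and objective terminal-node distributions given the agent's own strategy.

```python
def is_self_confirming(s_i, mu, sigma_j, utility, strategies_i, strategies_j, terminal):
    """Toy check of the two SCE conditions for one agent:
      i) rationality: s_i maximizes expected utility under conjecture mu;
      ii) confirmation: the subjective distribution over terminal nodes
          (from mu) equals the objective one (from sigma_j), given s_i.
    mu, sigma_j: dicts mapping opponent strategies to probabilities.
    terminal(s_i, s_j): terminal node reached by the pure profile."""
    def eu(s):
        return sum(mu[sj] * utility(s, sj) for sj in strategies_j)
    if any(eu(s) > eu(s_i) + 1e-12 for s in strategies_i):
        return False  # a profitable deviation exists
    def node_dist(belief):
        d = {}
        for sj, p in belief.items():
            z = terminal(s_i, sj)
            d[z] = d.get(z, 0.0) + p
        return d
    rho, pi = node_dist(mu), node_dist(sigma_j)
    return all(abs(rho.get(z, 0.0) - pi.get(z, 0.0)) < 1e-9
               for z in set(rho) | set(pi))
```

Note that an agent who stops immediately has any conjecture trivially confirmed: the terminal node reached is the same whatever the opponent would have done, which is exactly how objectively wrong off-path beliefs survive.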

  10. Self-confirming equilibrium of an extensive-form game with heterogeneous preference types: an example (Gamba, 2013)
Joint-payoff maximizers always choose "across", whatever their conjectures.
Assume that selfish agents in role 1 believe that A′ and a are unlikely to be played (probability < 1/3 assigned both to the set of co-player strategies that prescribe A′ and to the set that prescribe a); then they always play "down".

  11. The experiment
Research question: what is the role of incorrect off-path beliefs in determining long-run outcomes of the CG, and how do they interact with social preferences?
How: we track the behavior of different (social) preference types along 40 rounds of the CG.
We manipulate access to information about the opponent's play (personal/public) and study how the long-run outcomes vary across treatments (from SCE to Nash?).

  12. The design
Jena, 8 sessions: 32 subjects per session (256 in total); 40 repetitions of the 6-stage Centipede Game.
Anonymous matches.
Elicitation of preferences two weeks before the sessions.
Elicitation of behavior in the CG: sequential response method and strategy method.
Elicitation of beliefs in round 1 (before they play) and in rounds 17, 18, 19 and 40 (after they have played).
Two treatments with two different ex post information structures: personal versus public information.

  13. Preference Elicitation
Two steps. (1) Elicitation via Social Value Orientation (Murphy et al., 2011): 15 menus of allocations of payoff for self and payoff for other; we consider only the subset of sliders (4) that are relevant for joint-payoff-maximizing concerns.
We obtain θ by computing (and normalizing) arctan((π_o − 50) / (π_s − 50)), and we split types at θ = 1/2.
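A sketch of this computation (function names are ours; the normalization of the SVO angle onto θ ∈ [0, 1] via a 45° cap is an illustrative assumption, not stated on the slide):

```python
import math

def svo_angle(mean_pi_self, mean_pi_other):
    """SVO angle (degrees) from the mean payoff allocated to self and to
    the other across the slider menus; 50 is the scale midpoint."""
    return math.degrees(math.atan2(mean_pi_other - 50.0,
                                   mean_pi_self - 50.0))

def theta_from_angle(angle, max_angle=45.0):
    """Normalize the angle onto theta in [0, 1]; max_angle = 45 degrees
    (equal weight on self and other) is an assumption of this sketch."""
    return min(max(angle / max_angle, 0.0), 1.0)
```

A purely selfish chooser (all payoff to self) gets an angle of 0° and θ = 0; an agent weighting self and other equally gets 45° and θ = 1.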

  14. Preference Elicitation
(2) Check that the types elicited via the SVO test are meaningful in the context of a trust game (played before the 40 rounds of the CG).

  15. Preference Elicitation
67% of the agents choosing b_2 are high types, and 68% of the agents choosing a_2 are high types.

  16. The Centipede Game

  17. Two ex post information structures
We manipulate access to information, i.e., the information feedback about the opponent's moves after each round of play.
In the SEQUENTIAL RESPONSE METHOD:
Personal information: agents observe the terminal node reached in their own past match (the actions of the agent they just met).
Public information: agents are informed about average conditional frequencies of the opponent's actions (averaged across all agents in the opponent population).

  18. Two ex post information structures
In the STRATEGY METHOD:
Personal information (as above): agents observe the terminal node reached in their own past match (the actions of the agent they just met).
Public information: players are informed about the frequencies of strategies implemented by agents in the opponent population in the round just played.

  19. Ex post information structures and learning
Personal information: agents in population i learn the conditional frequencies of the opponent's actions at the opponent's information sets they personally visit with positive frequency under (s_i, σ_j).
Public information in the sequential response method: agents in population i learn the conditional frequencies of the opponent's actions at the opponent's information sets visited with positive frequency by population i under (σ_i, σ_j).
Public information in the strategy method: agents in population i learn the objective probabilities of the strategies adopted by agents in the opponent population j.
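The personal-information case can be sketched as a fictitious-play-style frequency update (hypothetical names; a sketch under our assumptions, not the paper's learning model): only opponent information sets on the realized path of the agent's own match receive new observations, so beliefs at never-visited sets are never corrected.

```python
def update_beliefs(counts, observed_path):
    """Update empirical conditional action frequencies after one match,
    under personal information: the agent infers the opponent's moves
    only along the realized path to the terminal node they observed.

    counts: dict info_set -> dict action -> observation count (mutated).
    observed_path: list of (info_set, action) pairs on the realized path.
    Returns beliefs: info_set -> dict action -> relative frequency."""
    for h, a in observed_path:
        counts.setdefault(h, {})
        counts[h][a] = counts[h].get(a, 0) + 1
    return {h: {a: n / sum(acts.values()) for a, n in acts.items()}
            for h, acts in counts.items()}
```

An information set the agent never reaches (e.g. because they always stop early) accumulates no data, which is precisely how off-path prediction errors can persist in the long run.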

  20. Results: aggregate behavior
There is no significant difference between the direct response and the strategy method in the personal information treatment.
