Preference-dependent learning in the Centipede Game
Astrid Gamba¹ and Tobias Regner²
¹ University of Milan-Bicocca; ² Max Planck Institute, Jena
Game Theory at the Universities of Milano III, 24 May 2013
A. Gamba, T. Regner 07/05 1 / 36
Introduction
Aim of this paper: to explain heterogeneous behavior in the Centipede Game by means of an experiment driven by the theory of self-confirming equilibrium (Battigalli, 1987; Fudenberg and Levine, 1993; Dekel et al., 2004).
Contribution: long-run behavior is the result of a learning process driven by players' preference types and based on their own observations of co-players' behavior. Given very plausible limits on the evidence that can be collected, off-path prediction errors may persist in the long run and help sustain heterogeneous behavior (some agents unravel and some do not).
Self-confirming equilibrium
Hahn, 1973; Battigalli, 1987; Battigalli and Guaitoli, 1988; Fudenberg and Levine, 1993a
SCE describes steady states in which agents best respond to confirmed beliefs about play. Two conditions:
rationality: players maximize their (subjective) expected utility;
confirmation of beliefs (instead of correctness, as in Nash equilibrium): agents' equilibrium conjectures about the opponents' strategies are consistent with the evidence they can collect given the strategies they adopt.
Learning interpretation of self-confirming equilibrium
Basic intuition: agents' beliefs come from a large set of observations of the opponents' play, accumulated over recurrent play of the same game.
Evidence on the opponent's strategy is only partial (own payoff? terminal node? ...), so subjective probabilities of the opponent's strategies may differ from their objective probabilities.
Crucially, especially in an extensive-form game, off-path prediction errors may persist in the long run.
Learning foundations for SCE: Fudenberg and Levine, 1993b and 2006; Fudenberg and Kreps, 1995.
Other experiments related to self-confirming equilibrium
Fudenberg and Levine (1997) measure payoff losses due to limited information about play.
Maniadis (2011) studies experimentally whether releasing aggregate information leads to more or less pro-social behavior in the Centipede Game (SCE of an incomplete-information game with a small fraction of altruists).
Other solution concepts applied to the Centipede Game
Other solution concepts used to rationalize behavior in the Centipede Game:
Agent Quantal Response equilibrium (McKelvey and Palfrey, 1992): agents respond imperfectly to correct beliefs about play.
Analogy-based Expectation equilibrium (Jehiel, 2005; Huck and Jehiel, 2004): agents best respond to coarse beliefs about play (they bundle the opponent's information sets into analogy classes).
Cox and James (2012): “Exploration of the impact of exogenously varied provision of information on past play in these games is an interesting topic for future research, and one that could help further establish the suitability of candidate explanatory models.” (Econometrica, 80(2), p. 902)
The model underlying our experiment
Two-player extensive-form game (6-stage Centipede Game, CG).
For each role/player i there is a large population of agents with heterogeneous preferences (more or less strongly joint-payoff-maximizing), indexed by type $\theta \in [0,1]$, with distributions $q_i$ and $q_j$.
Agents are drawn at random to play the CG and use pure strategies.
Each player (role) i therefore plays a mixed strategy $\sigma_i \in \Delta(S_i)$, induced by $q_i$ and by the pure strategy $s_{i,\theta}$ adopted by each preference type in role i.
We allow agents to hold heterogeneous conjectures about the opponent's (mixed) strategy: $\mu_{i,\theta} \in \Delta(S_j)$.
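The way $\sigma_i$ is induced can be made concrete. $\sigma_i(s)$ is the total mass, under $q_i$, of the types whose pure strategy $s_{i,\theta}$ equals s. The Python sketch below illustrates this under several hypothetical assumptions, none of which come from the experiment:

- a discretized type distribution;
- a strategy encoded as the stage at which the agent stops (None for never stopping);
- a threshold rule that maps types to strategies.

```python
from collections import defaultdict
from fractions import Fraction


def induced_mixed_strategy(q, strategy_of):
    """Return sigma_i as a dict {pure strategy: probability}.

    q           -- hypothetical discretized type distribution {theta: probability}
    strategy_of -- function mapping a type theta to its pure strategy s_{i,theta}
    """
    sigma = defaultdict(Fraction)  # Fraction() == 0, so every strategy starts at zero mass
    for theta, prob in q.items():
        sigma[strategy_of(theta)] += prob
    return dict(sigma)


def threshold_rule(theta):
    # Hypothetical rule for role 1: types below 1/2 stop at stage 1,
    # types at or above 1/2 never stop (always play "across").
    return 1 if theta < Fraction(1, 2) else None


q_1 = {
    Fraction(1, 10): Fraction(3, 10),
    Fraction(4, 10): Fraction(2, 10),
    Fraction(7, 10): Fraction(3, 10),
    Fraction(9, 10): Fraction(2, 10),
}
sigma_1 = induced_mixed_strategy(q_1, threshold_rule)
print(sigma_1)  # {1: Fraction(1, 2), None: Fraction(1, 2)}
```

Exact fractions are used so that probabilities sum to exactly one and can be compared without rounding error.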
The model underlying our experiment
Agents do not know the distribution of preference types in either population.
Denote by $\pi(z \mid s_{i,\theta}; q_j, \sigma)$ the objective probability that preference type $\theta$ observes terminal node z, given his own strategy, the move by Nature and the mixed strategy of the opponent.
Denote by $\rho(z \mid s_{i,\theta}; \mu_{i,\theta})$ the subjective probability of observing terminal node z as assessed by preference type $\theta$, given his own strategy and his conjecture about the opponent's mixed strategy.
After playing, agents observe only the terminal node reached.
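Agents observe only terminal nodes, so $\pi$ and $\rho$ are both distributions over Z computed from the agent's own strategy. They differ only in what is plugged in for the opponent: the actual mixed strategy, or the agent's conjecture. The Python sketch below rests on two hypothetical assumptions:

- A reduced strategy is encoded as the stage at which a player stops (None for never). The game therefore ends at the earlier of the two stopping stages, and the node reached when nobody stops is labeled 7.
- The opponent's mixed strategy is already aggregated over types via $q_j$.

```python
from fractions import Fraction

NO_STOP_NODE = 7  # hypothetical label of the terminal node reached if nobody stops


def terminal_node(own_stop, opp_stop):
    """Stage at which the 6-stage game ends; None means 'never stop'."""
    stops = [s for s in (own_stop, opp_stop) if s is not None]
    return min(stops, default=NO_STOP_NODE)


def terminal_distribution(own_stop, opp_mixed):
    """Distribution over terminal nodes given one's own pure strategy and a mixed
    strategy for the opponent (the actual one yields pi, a conjecture yields rho)."""
    dist = {}
    for opp_stop, prob in opp_mixed.items():
        if prob > 0:
            z = terminal_node(own_stop, opp_stop)
            dist[z] = dist.get(z, 0) + prob
    return dist


# Hypothetical role-2 behavior: stop at stage 2 w.p. 1/2, at stage 6 w.p. 1/4, never w.p. 1/4.
sigma_2 = {2: Fraction(1, 2), 6: Fraction(1, 4), None: Fraction(1, 4)}
# A role-1 agent who stops at stage 5 wrongly believes role 2 never stops at stage 6.
mu = {2: Fraction(1, 2), None: Fraction(1, 2)}

pi = terminal_distribution(5, sigma_2)
rho = terminal_distribution(5, mu)
print(pi == rho)  # True: stage-6 behavior lies off the path of this agent's own play
```

The wrong conjecture about stage 6 generates exactly the same observations as the truth, so observing terminal nodes alone cannot correct it.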
Self-confirming equilibrium of an extensive-form game with heterogeneous preference types
Definition. A profile of mixed strategies $(\sigma_i)_{i \in I}$ is a self-confirming equilibrium if for each preference type $\theta$ we can find a conjecture $\mu_{i,\theta}$ such that, for each $s_{i,\theta} \in \operatorname{supp} \sigma_i$:
i) $s_{i,\theta} \in \arg\max_{s_i \in S_i} \sum_{s_j \in S_j} \mu_{i,\theta}(s_j)\, U_\theta(s_i, s_j)$, and
ii) $\rho(z \mid s_{i,\theta}; \mu_{i,\theta}) = \pi(z \mid s_{i,\theta}; q_j, \sigma)$ for all $z \in Z$.
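For a single role-1 type, the two conditions can be checked mechanically once a conjecture is given. Note that the definition only requires that some conjecture passes both checks; the function below tests one given conjecture. This Python sketch is illustrative only and relies on the following hypothetical choices, none of them the experiment's:

- The payoffs are hypothetical, not those used in the experiment.
- Strategies are encoded as stopping stages (None for never), with node 7 reached when nobody stops.
- $U_\theta$ is assumed to equal own payoff plus $\theta$ times the co-player's payoff.

```python
from fractions import Fraction

# Hypothetical payoffs (role 1, role 2): node k < 7 means the game stopped at stage k,
# node 7 means nobody stopped. NOT the payoffs used in the experiment.
PAYOFFS = {1: (4, 1), 2: (2, 8), 3: (16, 4), 4: (8, 32),
           5: (64, 16), 6: (32, 128), 7: (256, 64)}
S1 = [1, 3, 5, None]  # role-1 reduced strategies: stage at which to stop (None = never)


def terminal_node(stop_1, stop_2):
    stops = [s for s in (stop_1, stop_2) if s is not None]
    return min(stops, default=7)


def utility(theta, z):
    # Assumed form of U_theta: own payoff plus theta times the co-player's payoff.
    own, other = PAYOFFS[z]
    return own + theta * other


def expected_utility(theta, s1, belief):
    return sum(p * utility(theta, terminal_node(s1, s2)) for s2, p in belief.items())


def outcome_distribution(s1, mixed):
    dist = {}
    for s2, p in mixed.items():
        if p > 0:
            z = terminal_node(s1, s2)
            dist[z] = dist.get(z, 0) + p
    return dist


def satisfies_sce_conditions(theta, s1, conjecture, sigma_2):
    """Check conditions i) and ii) for one role-1 type, its strategy and a given conjecture."""
    # Condition i): s1 is a best response to the conjecture.
    best_value = max(expected_utility(theta, s, conjecture) for s in S1)
    rational = expected_utility(theta, s1, conjecture) == best_value
    # Condition ii): the conjecture predicts the observed terminal-node distribution.
    confirmed = outcome_distribution(s1, conjecture) == outcome_distribution(s1, sigma_2)
    return rational and confirmed


sigma_2 = {2: Fraction(1, 4), None: Fraction(3, 4)}  # hypothetical role-2 behavior
pessimistic = {2: Fraction(1)}                         # conjecture: role 2 always stops at stage 2

print(satisfies_sce_conditions(0, 1, pessimistic, sigma_2))  # True
print(satisfies_sce_conditions(1, None, sigma_2, sigma_2))   # True
```

The selfish type ($\theta = 0$) that stops immediately is self-confirming under the pessimistic conjecture. This holds even though, against the actual $\sigma_2$, never stopping would yield a higher expected payoff.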
Self-confirming equilibrium of an extensive-form game with heterogeneous preference types: an example (Gamba 2013)
Joint payoff maximizers always choose “across”, whatever their conjectures.
Suppose that selfish agents in role 1 believe that A′ and a are unlikely to be played (probability < 1/3 both on the set of co-player's strategies that prescribe A′ and on the set of co-player's strategies that prescribe a). Then they always play “down”.
The experiment
Research question: what role do incorrect off-path beliefs play in determining the long-run outcomes of the CG, and how do they interact with social preferences?
How: we observe the behavior of different (social) preference types over 40 rounds of the CG. We manipulate access to information about the opponents' play (personal vs. public) and study how long-run outcomes vary across treatments (from SCE to Nash?).
The design
Jena, 8 sessions with 32 subjects per session (256 in total);
40 repetitions of the 6-stage Centipede Game;
anonymous matches;
elicitation of preferences two weeks before the sessions;
elicitation of behavior in the CG: sequential response method and strategy method;
elicitation of beliefs in round 1 (before play) and in rounds 17, 18, 19 and 40 (after play);
two treatments with two different ex post information structures: personal versus public information.
Preference Elicitation
Two steps: (1) elicitation via Social Value Orientation (Murphy et al., 2011), i.e., 15 menus of allocations of payoff for self and payoff for the other.
We consider only a subset of the sliders (4), those relevant to joint-payoff-maximizing concerns.
We obtain $\theta$ by computing (and normalizing) the angle $\arctan\left(\frac{\pi_o - 50}{\pi_s - 50}\right)$, where $\pi_s$ and $\pi_o$ are the average payoffs allocated to self and to the other over these sliders.
We split types at $\theta = 1/2$.
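The angle computation can be illustrated with a short Python sketch. The slides do not state how the angle is normalized. As a hypothetical choice, this sketch divides by 45° (the joint-payoff-maximizing angle) and clips to [0, 1]. Under that assumption, the cut at $\theta = 1/2$ corresponds to 22.5°, close to the 22.45° boundary between individualistic and prosocial types in Murphy et al. (2011). The example allocations are made up, and assigning ties at the cutoff to high types is also an assumption.

```python
import math


def svo_angle(mean_self, mean_other):
    """SVO angle in degrees, arctan((pi_o - 50) / (pi_s - 50)), from average allocations."""
    # atan2 equals arctan of the ratio whenever mean_self > 50, and avoids division by zero.
    return math.degrees(math.atan2(mean_other - 50, mean_self - 50))


def normalized_theta(angle_deg):
    # ASSUMPTION: 45 degrees (joint payoff maximization) maps to 1; result clipped to [0, 1].
    return min(max(angle_deg / 45.0, 0.0), 1.0)


def preference_type(theta):
    # Split at theta = 1/2; assigning ties to the high type is an assumption.
    return "high" if theta >= 0.5 else "low"


# Hypothetical average allocations (self, other) over the selected sliders.
for mean_self, mean_other in [(100, 50), (85, 85), (90, 65)]:
    theta = normalized_theta(svo_angle(mean_self, mean_other))
    print(mean_self, mean_other, round(theta, 3), preference_type(theta))
```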
Preference Elicitation (2)
We check that the types elicited via the SVO test are meaningful in the context of a trust game (played before the 40 rounds of the CG).
Preference Elicitation
67% of the agents choosing $b_2$ are high types, and 68% of the agents choosing $a_2$ are high types.
The Centipede Game
[Figure: The Centipede Game]
Two ex post information structures
We manipulate access to information, i.e., feedback about the opponent's moves after each round of play.
In the SEQUENTIAL RESPONSE METHOD:
Personal information: agents observe the terminal node reached in their own past match (the actions of the agent they just met).
Public information: agents are informed about the average conditional frequencies of the opponent's actions (averaged across all agents in the opponent population).
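Under public information in the sequential response method, the feedback amounts to a conditional frequency at each of the opponent's decision nodes. Among the matches that reached the node, it is the share in which the opponent stopped there. The Python sketch below assumes, hypothetically, that:

- each match is summarized by its terminal node, labeled by the stage at which the game stopped (7 if nobody stopped);
- role-2 agents move at stages 2, 4 and 6.

Under personal information, the agent would instead see only the terminal node of his own match.

```python
from fractions import Fraction


def conditional_stop_frequencies(terminal_nodes, opponent_stages=(2, 4, 6)):
    """Share of matches reaching each opponent stage in which the opponent stopped there.

    terminal_nodes -- terminal node of every match in the round
                      (k = stopped at stage k, 7 = nobody stopped)
    Stages that no match reached are omitted: the round provides no evidence about them.
    """
    freqs = {}
    for stage in opponent_stages:
        reached = [z for z in terminal_nodes if z >= stage]
        if reached:
            stopped = sum(1 for z in reached if z == stage)
            freqs[stage] = Fraction(stopped, len(reached))
    return freqs


# Hypothetical terminal nodes of the matches played in one round.
round_outcomes = [1, 1, 2, 3, 4, 7, 7, 6]
print(conditional_stop_frequencies(round_outcomes))
# {2: Fraction(1, 6), 4: Fraction(1, 4), 6: Fraction(1, 3)}
```

If every match ends at stage 1, the function returns an empty dict. Public feedback, too, is silent about information sets that nobody reaches.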
Two ex post information structures
In the STRATEGY METHOD:
Personal information (as above): agents observe the terminal node reached in their own past match (the actions of the agent they just met).
Public information: players are informed about the frequencies of the strategies implemented by agents in the opponent population in the round just played.
Ex post information structures and learning
Personal information: agents in population i learn the conditional frequencies of the opponent's actions at the opponent's information sets they personally visit with positive frequency under $(s_i, \sigma_j)$;
Public information in the sequential response method: agents in population i learn the conditional frequencies of the opponent's actions at the opponent's information sets visited with positive frequency by population i under $(\sigma_i, \sigma_j)$;
Public information in the strategy method: agents in population i learn the objective probabilities of the strategies adopted in the opponent population j.
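The personal and public (sequential response) regimes differ in which opponent information sets generate evidence. Under personal information, it is those reached under the agent's own $s_i$. Under public information, it is those reached under any strategy in the support of $\sigma_i$. The Python sketch below is minimal and assumes, hypothetically, that:

- strategies are encoded as stopping stages (None for never);
- role 2 moves at stages 2, 4 and 6;
- the mixed strategies are illustrative.

In the strategy method with public information, the whole of $\sigma_j$ is revealed, so this computation is not needed there.

```python
from fractions import Fraction

OPPONENT_STAGES = (2, 4, 6)  # assumed role-2 decision stages in the 6-stage game


def terminal_node(stop_1, stop_2):
    """Stage at which the game ends (None = never stop; 7 = nobody stopped)."""
    stops = [s for s in (stop_1, stop_2) if s is not None]
    return min(stops, default=7)


def observed_opponent_stages(own_strategies, sigma_2):
    """Role-2 stages reached with positive probability when role 1 uses any of own_strategies."""
    visited = set()
    for s1 in own_strategies:
        for s2, p in sigma_2.items():
            if p > 0:
                z = terminal_node(s1, s2)
                visited.update(stage for stage in OPPONENT_STAGES if stage <= z)
    return visited


sigma_2 = {2: Fraction(1, 4), None: Fraction(3, 4)}  # hypothetical role-2 behavior
support_sigma_1 = [1, None]                            # hypothetical support of sigma_1

# Personal information: an agent who stops at stage 1 never sees role 2 move.
print(observed_opponent_stages([1], sigma_2))                      # set()
# Public information (sequential response): evidence pooled over the population's strategies.
print(sorted(observed_opponent_stages(support_sigma_1, sigma_2)))  # [2, 4, 6]
```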
Results: aggregate behavior
In the personal information treatment, there is no significant difference between the sequential response method and the strategy method.