SLIDE 1

An Approximate Subgame-Perfect Equilibrium Computation Technique for Repeated Games

Andriy Burkov

Université Laval, Canada

July 15, 2010

Andriy Burkov, Université Laval, Canada An Approximate Subgame-Perfect Equilibrium Computation Technique for Repeated Games 1/60

SLIDE 2

Plan

  • Motivation
  • Game Theory Background
  • Problem and Approach
  • Conclusion and Future Work

SLIDE 4

Motivation

Find an algorithmic way of:

  • Finding equilibrium solutions of dynamic games
  • Computing equilibrium strategies for the players of dynamic games

SLIDE 5

Motivation: Example

Prisoner’s Dilemma (rows: Player 1, columns: Player 2)

             C         D
    C      2, 2     −1, 4
    D     4, −1      0, 0

When the discount factor is close enough to 1, the long-term average payoff profile (2, 2) is an equilibrium point, and there is a strategy that each player can adopt to generate that point: Tit-For-Tat.

For an arbitrary discount factor, we usually do not know:

  • What is the set of equilibrium points?
  • Which player strategies generate those equilibrium points?

SLIDE 8

Stage-games

A stage-game is a tuple $(N, \{A_i\}_{i \in N}, \{r_i\}_{i \in N})$:

  • $N$ is a finite set of players
  • $A_i$ is a finite set of pure actions of player $i \in N$
  • $r_i$ is the payoff function of player $i$: $r_i : A \to \mathbb{R}$

where $A \equiv \times_{i \in N} A_i$ is the set of action profiles.

Example: Prisoner’s Dilemma (rows: Player 1, columns: Player 2)

             C         D
    C      2, 2     −1, 4
    D     4, −1      0, 0

$N = \{1, 2\}$, $A_1 = A_2 = \{C, D\}$, $r_1(C, C) = 2$, $r_1(C, D) = -1$, $r_1(D, C) = 4$, ...
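The tuple definition maps directly onto a small data structure. The following is a minimal sketch, assuming a dictionary encoding of the payoff table; the names `players`, `actions`, `payoffs` and `r` are illustrative and do not come from the talk.

```python
from itertools import product

# Hypothetical encoding of the stage-game tuple (N, {A_i}, {r_i}).
players = (1, 2)
actions = {1: ("C", "D"), 2: ("C", "D")}

# Prisoner's Dilemma payoff table: action profile -> (r_1, r_2).
payoffs = {
    ("C", "C"): (2, 2),
    ("C", "D"): (-1, 4),
    ("D", "C"): (4, -1),
    ("D", "D"): (0, 0),
}


def r(i, profile):
    """Payoff r_i(a) of player i (1-based) for the pure action profile a."""
    return payoffs[profile][i - 1]


# The set A of action profiles is the Cartesian product of the A_i.
A = list(product(*(actions[i] for i in players)))
```

With this encoding, `r(1, ("C", "D"))` reads off the entry $r_1(C, D)$ of the table.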

SLIDE 10

Repeated games

In an infinitely repeated game, a given stage-game is played repeatedly by the same set of players for an a priori unknown number of time-steps.

After each stage-game, the repeated game continues with probability γ.

[Figure: timeline of stage-games at t = 0, t = 1, ...]
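A short worked equation may help here. It is a standard consequence of the continuation probability rather than something stated on the slide: the stage-game at time-step $t$ is reached with probability $\gamma^t$, so the expected number of stage-games played is

$$\sum_{t=0}^{\infty} \gamma^t = \frac{1}{1 - \gamma}.$$

For instance, with $\gamma = 0.7$ the game lasts $1 / 0.3 \approx 3.33$ stage-games on average.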

SLIDE 11

Strategies

The set of histories up to time-step $t$ of the repeated game is given by $H^t \equiv \times^t A$.

The set of all possible histories is given by $H \equiv \bigcup_{t=0}^{\infty} H^t$, with $h \in H$ being a particular history.

A mixed strategy of player $i$ is a mapping $\sigma_i : H \to \Delta(A_i)$, with $\alpha_i \in \Delta(A_i)$ being a mixed action of player $i$.
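As an illustration of a strategy as a mapping from histories to mixed actions, here is a minimal sketch of Tit-For-Tat for the Prisoner's Dilemma. The representation is an assumption made for this example: a history is a tuple of action profiles, and a mixed action is a dictionary from actions to probabilities.

```python
def tit_for_tat(i, history):
    """Strategy sigma_i for player i (1 or 2) in the two-player Prisoner's Dilemma.

    history is a tuple of action profiles (a_1, a_2), oldest first.
    Returns a mixed action as a dict action -> probability; Tit-For-Tat is
    pure, so one action always gets probability 1.
    """
    if not history:
        # Empty history: cooperate on the first time-step.
        return {"C": 1.0, "D": 0.0}
    opponent_last = history[-1][2 - i]  # index 1 for player 1, index 0 for player 2
    if opponent_last == "C":
        return {"C": 1.0, "D": 0.0}
    return {"C": 0.0, "D": 1.0}
```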

SLIDE 12

Nash equilibrium

Let $\sigma_i \in \Sigma_i$ be a strategy of player $i$, and let $\sigma \in \Sigma \equiv \times_i \Sigma_i$ be a strategy profile.

An outcome path is a possibly infinite sequence $a \equiv (a^0, a^1, \ldots)$ of action profiles.

The discounted average payoff of $\sigma$ for player $i$ is defined as

$$u_i^{\gamma}(\sigma) \equiv (1 - \gamma)\, \mathbb{E}_{a \sim \sigma} \sum_{t=0}^{\infty} \gamma^t r_i(a^t).$$

The discount factor can be seen as the patience of the players: the higher it is, the more important future payoffs are.

A Nash equilibrium is a strategy profile $\sigma \equiv (\sigma_i, \sigma_{-i})$ such that, for each player $i$ and for every $\sigma'_i \in \Sigma_i$,

$$u_i^{\gamma}(\sigma) \geq u_i^{\gamma}(\sigma'_i, \sigma_{-i}).$$
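The discounted average payoff can be evaluated numerically for a deterministic outcome path. The sketch below is only an approximation under stated assumptions: it truncates the infinite sum to a finite prefix and drops the expectation, so it applies to pure outcome paths only. The function name is illustrative.

```python
def discounted_average_payoff(stage_payoffs, gamma):
    """(1 - gamma) * sum_t gamma**t * r_t over a finite prefix of stage payoffs."""
    return (1 - gamma) * sum(gamma**t * r_t for t, r_t in enumerate(stage_payoffs))


gamma = 0.7
# Mutual cooperation in the Prisoner's Dilemma pays 2 at every time-step.
u_cooperate = discounted_average_payoff([2] * 500, gamma)
# A single defection paying 4, followed by payoff 0 forever after.
u_defect_once = discounted_average_payoff([4] + [0] * 499, gamma)
```

The factor $(1 - \gamma)$ normalizes the sum so that a constant stream of payoff 2 is worth 2, which makes repeated-game payoffs directly comparable with stage-game payoffs.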

SLIDE 13

Subgame-perfect equilibrium

A subgame is a repeated game which continues after a certain history.

For a pair (σ, h), the subgame strategy profile induced by h is denoted as σ|h.

A strategy profile σ is a subgame-perfect equilibrium (SPE) in a repeated game if, for all histories h ∈ H, the subgame strategy profile σ|h is a Nash equilibrium in the subgame.

SLIDE 14

Augmented games

Consider a stage-game (rows: Player 1, columns: Player 2):

              C          D
    C      r(C, C)    r(C, D)
    D      r(D, C)    r(D, D)

Given a strategy profile σ, after any history $h^t$, one can represent the (infinite) subgame as an augmented stage-game (rows: Player 1, columns: Player 2):

    C, C:  $(1 - \gamma)\, r(C, C) + \gamma\, u^{\gamma}(\sigma|_{h^t \cdot (C, C)})$
    C, D:  $(1 - \gamma)\, r(C, D) + \gamma\, u^{\gamma}(\sigma|_{h^t \cdot (C, D)})$
    D, C:  $(1 - \gamma)\, r(D, C) + \gamma\, u^{\gamma}(\sigma|_{h^t \cdot (D, C)})$
    D, D:  $(1 - \gamma)\, r(D, D) + \gamma\, u^{\gamma}(\sigma|_{h^t \cdot (D, D)})$

The strategy profile σ is a subgame-perfect equilibrium if it induces a Nash equilibrium in each augmented stage-game.
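The augmented-game test can be tried on a concrete profile. The sketch below assumes a grim-trigger profile for the Prisoner's Dilemma (both players cooperate until any defection, then defect forever); this profile is chosen for illustration and is not the one discussed in the talk. Its continuation payoffs are (2, 2) while cooperation lasts and (0, 0) once punishment starts.

```python
# Prisoner's Dilemma payoffs: action profile -> (r_1, r_2).
PAYOFFS = {("C", "C"): (2, 2), ("C", "D"): (-1, 4),
           ("D", "C"): (4, -1), ("D", "D"): (0, 0)}
ACTIONS = ("C", "D")


def augmented_game(continuation, gamma):
    """Cell a -> (1 - gamma) * r(a) + gamma * u(sigma | h.a), for both players."""
    return {a: tuple((1 - gamma) * PAYOFFS[a][i] + gamma * continuation[a][i]
                     for i in range(2))
            for a in PAYOFFS}


def is_pure_nash(game, profile):
    """True if no player gains from a unilateral pure deviation from profile."""
    for i in range(2):
        for dev in ACTIONS:
            alt = tuple(dev if j == i else profile[j] for j in range(2))
            if game[alt][i] > game[profile][i] + 1e-12:
                return False
    return True


def grim_trigger_games(gamma):
    """Augmented games of the assumed grim-trigger profile in its two phases."""
    coop_cont = {a: (2, 2) if a == ("C", "C") else (0, 0) for a in PAYOFFS}
    punish_cont = {a: (0, 0) for a in PAYOFFS}
    return augmented_game(coop_cont, gamma), augmented_game(punish_cont, gamma)
```

Under these assumptions, cooperation is sustained exactly when $2 \geq 4(1 - \gamma)$, that is, when $\gamma \geq 0.5$.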

SLIDE 16

Problem and Approach

Problem: given a discount factor γ and the payoff functions of the players, find the set of SPE, entirely or partially.

Previous work includes:

  • All work on computing stage-game equilibria (e.g., Lemke & Howson (1965), Porter et al. (2004))
  • Littman & Stone (2004): only for the average payoff (i.e., γ = 1)
  • Judd et al. (2003): arbitrary γ, but only pure action equilibria

Our approach: dynamic programming over the set of equilibrium payoff profiles

  • Permits computing SPE for an arbitrary γ, including pure and mixed action equilibria
  • Based on two ideas: self-generating sets and partitioning of hypercubes

SLIDE 17

Self-generation

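Let $BR_i(\alpha)$ be a best response of player $i$ in the stage-game to the mixed action profile $\alpha \equiv (\alpha_i, \alpha_{-i})$:

$$BR_i(\alpha) \equiv \arg\max_{a_i \in A_i} r_i(a_i, \alpha_{-i}).$$

We define the map $B^{\gamma}$ on a set $W \subset \mathbb{R}^{|N|}$ as

$$B^{\gamma}(W) \equiv \bigcup_{(\alpha, w) \in \times_{i \in N} \Delta(A_i) \times W} (1 - \gamma)\, r(\alpha) + \gamma w,$$

where $w$ is a continuation promise that satisfies, for all $i \in N$,

$$(1 - \gamma)\, r_i(\alpha) + \gamma w_i - (1 - \gamma)\, r_i(BR_i(\alpha), \alpha_{-i}) - \gamma \underline{w}_i \geq 0, \qquad \underline{w}_i \equiv \inf_{w \in W} w_i.$$

The largest fixed point of $B^{\gamma}(W)$ is the set of all SPE payoff profiles of the repeated game (Abreu, 1990).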
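The incentive constraint on the continuation promise can be checked mechanically. The following is a minimal sketch restricted to pure action profiles, which is a simplification of the mixed-action setting above; `w_min` stands for the per-player infimum of payoffs over W and is supplied by the caller. All names are illustrative.

```python
def is_enforceable(alpha, w, w_min, gamma, payoffs, actions):
    """Check the incentive constraint for a pure profile alpha and promise w.

    Following alpha earns (1 - gamma) * r_i(alpha) + gamma * w[i]; the best
    one-shot deviation earns (1 - gamma) * r_i(best deviation) + gamma * w_min[i].
    """
    for i in range(len(alpha)):
        deviations = (alpha[:i] + (a,) + alpha[i + 1:] for a in actions[i])
        best_dev = max(payoffs[d][i] for d in deviations)
        follow = (1 - gamma) * payoffs[alpha][i] + gamma * w[i]
        deviate = (1 - gamma) * best_dev + gamma * w_min[i]
        if follow - deviate < -1e-12:
            return False
    return True


# Prisoner's Dilemma, with an assumed worst continuation payoff of 0 per player.
pd_payoffs = {("C", "C"): (2, 2), ("C", "D"): (-1, 4),
              ("D", "C"): (4, -1), ("D", "D"): (0, 0)}
pd_actions = (("C", "D"), ("C", "D"))
```

For example, promising (2, 2) after mutual cooperation enforces (C, C) at γ = 0.7 but not at γ = 0.3, where the one-shot gain from defecting outweighs the lost continuation value.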

SLIDE 18

Self-generation

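Recall the two self-generation equations:

$$B^{\gamma}(W) \equiv \bigcup_{(\alpha, w) \in \times_{i \in N} \Delta(A_i) \times W} (1 - \gamma)\, r(\alpha) + \gamma w \qquad (1)$$

$$(1 - \gamma)\, r_i(\alpha) + \gamma w_i - (1 - \gamma)\, r_i(BR_i(\alpha), \alpha_{-i}) - \gamma \underline{w}_i \geq 0 \quad \forall i \qquad (2)$$

where $\underline{w}_i \equiv \inf_{w \in W} w_i$.

  • Equation (1) promises player $i \in N$ a better payoff tomorrow to compensate for a possible loss today if player $i$ follows a given strategy.
  • Equation (2) guarantees a sufficient punishment, imposed on player $i$ by the other players, if player $i$ deviates from the given strategy.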

SLIDE 19

Updates by hypercubes

Our algorithm starts with an initial approximation W of the set of SPE payoff profiles.

The set W, in turn, is represented by a union of disjoint hypercubes belonging to the set C.

Initially, the set C contains only one hypercube, which contains all possible payoff profiles.

Each iteration of the algorithm consists of verifying, for each hypercube c ∈ C, whether it has to be withdrawn.
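One possible representation of the hypercubes is sketched below. It assumes axis-aligned cubes described by their lowest corner and side length, and builds the initial cube as the smallest such cube covering every stage-game payoff profile (discounted average payoffs are convex combinations of these, so they lie inside it). The class and function names are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypercube:
    origin: tuple  # lowest corner, one coordinate per player
    side: float    # common side length along every axis

    def contains(self, v):
        """True if the payoff profile v lies inside the cube (borders included)."""
        return all(o <= x <= o + self.side for o, x in zip(self.origin, v))


def initial_hypercube(payoffs):
    """Smallest equal-sided cube covering all stage-game payoff profiles."""
    profiles = list(payoffs.values())
    n = len(profiles[0])
    lows = [min(p[i] for p in profiles) for i in range(n)]
    highs = [max(p[i] for p in profiles) for i in range(n)]
    side = max(h - l for h, l in zip(highs, lows))
    return Hypercube(tuple(lows), side)


pd_payoffs = {("C", "C"): (2, 2), ("C", "D"): (-1, 4),
              ("D", "C"): (4, -1), ("D", "D"): (0, 0)}
C = [initial_hypercube(pd_payoffs)]  # the set C starts with a single cube
```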

SLIDES 20-29

Updates by hypercubes: Example

[Figure sequence omitted. Axes: payoffs of Player 1, payoffs of Player 2. Labeled points: w and w'.]

SLIDE 30

Partitioning the hypercubes

If, after testing all hypercubes in C, we have not withdrawn any hypercube, we partition each remaining hypercube into a number of smaller hypercubes.

We then retest the remaining hypercubes in the same way. This improves the precision of the approximation of the set of equilibria.

The algorithm terminates when the required precision is achieved.
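The alternation between withdrawal and partitioning can be sketched as a loop. This is an outline under stated assumptions, not the algorithm as implemented by the author: each cube is an `(origin, side)` pair, partitioning halves every side (the talk only says "a number of smaller hypercubes"), and `keep` is a hypothetical placeholder for the withdrawal test.

```python
from itertools import product


def split(origin, side):
    """Partition a cube into 2**n cubes whose side is half the original."""
    half = side / 2
    return [(tuple(o + d * half for o, d in zip(origin, offsets)), half)
            for offsets in product((0, 1), repeat=len(origin))]


def refine(cubes, keep, precision):
    """Withdraw cubes until none can be withdrawn, then partition; repeat.

    keep(cube, cubes) returns False when the cube has to be withdrawn.
    All cubes share the same side, so checking the first one is enough.
    """
    while True:
        survivors = [c for c in cubes if keep(c, cubes)]
        if len(survivors) < len(cubes):
            cubes = survivors  # something was withdrawn: retest before splitting
            continue
        if not cubes or cubes[0][1] <= precision:
            return cubes
        cubes = [piece for origin, side in cubes for piece in split(origin, side)]
```

A call such as `refine([((0.0, 0.0), 1.0)], lambda c, cs: True, 0.25)` never withdraws anything and simply subdivides the unit square down to side 0.25.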

SLIDES 31-33

Partitioning the hypercubes: Example

[Figure sequence omitted. Axes: payoffs of Player 1, payoffs of Player 2.]

SLIDE 34

The Main Theorem

Theorem

For any repeated game, any discount factor γ, and any level of approximation:

  (i) our algorithm terminates in finite time;
  (ii) the set of hypercubes C contains, at any moment, at least one hypercube;
  (iii) for any input v ∈ W, the algorithm returns a strategy profile (represented by a finite automaton) that satisfies the required approximation properties.

SLIDE 35

Example: The Prisoner’s Dilemma

Rows: Player 1, columns: Player 2

             C         D
    C      2, 2     −1, 4
    D     4, −1      0, 0

γ = 0.7

SLIDES 36-42

The Prisoner’s Dilemma

[Figure sequence omitted: plots labeled with the stage-game payoff profiles r(C, C), r(C, D), r(D, C) and r(D, D), shown at iterations 1, 4, 8, 12, 20, 30 and 50.]

SLIDE 44

Conclusion and Future Work

We proposed an algorithmic approach for approximating the set of subgame-perfect equilibrium payoff profiles in repeated games.

Our algorithm is capable of computing a profile of player strategies that approximately induces any given SPE point.

Future work will aim at extending the proposed approach to more complex dynamic games, such as Markov chain games and stochastic games.

SLIDE 45

Thank you!

SLIDE 46

Another Example: Battle of the Sexes (γ = 0.45)

             O        F
    O      1, 2     0, 0
    F      0, 0     2, 1

Stage-game equilibrium payoff profiles: (1, 2), (2, 1), (2/3, 2/3)
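The third payoff profile, (2/3, 2/3), comes from the mixed stage-game equilibrium and can be recomputed from the indifference conditions. The sketch below assumes that the first payoff in each cell belongs to the row player and the second to the column player; the helper names are illustrative.

```python
from fractions import Fraction

# Battle of the Sexes: (row action, column action) -> (row payoff, column payoff).
R = {("O", "O"): (1, 2), ("O", "F"): (0, 0),
     ("F", "O"): (0, 0), ("F", "F"): (2, 1)}


def indifference_prob(a, b, c, d):
    """Solve x*a + (1 - x)*b == x*c + (1 - x)*d for x."""
    return Fraction(d - b, a - b - c + d)


# q: probability that the column player plays O, chosen so that the row
# player is indifferent between O and F.
q = indifference_prob(R["O", "O"][0], R["O", "F"][0], R["F", "O"][0], R["F", "F"][0])
# p: probability that the row player plays O, chosen so that the column
# player is indifferent between O and F.
p = indifference_prob(R["O", "O"][1], R["F", "O"][1], R["O", "F"][1], R["F", "F"][1])


def expected_payoff(i):
    """Expected stage payoff of player i (0 = row, 1 = column) under (p, q)."""
    prob = {"O": (p, q), "F": (1 - p, 1 - q)}
    return sum(prob[a1][0] * prob[a2][1] * R[a1, a2][i] for a1, a2 in R)
```

Here the row player plays O with probability 1/3 and the column player with probability 2/3, and both expected payoffs equal 2/3.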

SLIDES 47-59

Example: Repeated Battle of the Sexes

[Figure sequence omitted.]

SLIDE 60

Automaton implementation

Let M ≡ (Q, q0, f, τ) be an automaton implementation of a strategy profile σ, where

  • Q is the set of automaton states, with q0 ∈ Q being the initial state
  • f ≡ (f_i)_{i∈N}, where f_i : Q → ∆(A_i) is the decision function of player i
  • τ : Q × A → Q is the transition function

Theorem (Kalai and Stanford, 1988)

Any SPE can be approximated by a finite automaton
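The automaton tuple can be made concrete with a two-state example. The sketch below encodes an assumed grim-trigger profile for the Prisoner's Dilemma, chosen only for illustration; the decision functions are pure, so `f` maps each state directly to an action profile instead of a profile of mixed actions.

```python
# Hypothetical automaton M = (Q, q0, f, tau) for a grim-trigger profile.
Q = ("coop", "punish")
q0 = "coop"
f = {"coop": ("C", "C"), "punish": ("D", "D")}  # state -> prescribed action profile


def tau(q, a):
    """Transition function: stay in 'coop' only while both players cooperate."""
    return "coop" if q == "coop" and a == ("C", "C") else "punish"


def run(steps, deviations=None):
    """Generate an outcome path of the given length.

    deviations maps a time-step to the action profile actually played at that
    step, overriding the profile prescribed by f.
    """
    deviations = deviations or {}
    q, path = q0, []
    for t in range(steps):
        a = deviations.get(t, f[q])
        path.append(a)
        q = tau(q, a)
    return path
```

Calling `run(4, {1: ("D", "C")})` shows the punishment state taking over after player 1 defects at time-step 1.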
