Plan for today

■ NE in normal form games that are repeated a finite number of times.
  ● Principle of backward induction.
■ NE in normal form games that are repeated an indefinite number of times.
  ● Discount factor. Models the probability of continuation.
  ● Folk theorem. (Actually many Folk Theorems.) Repeated games generally do have infinitely many Nash equilibria.
  ● Trigger strategy. "On-path" vs. "off-path" play, "minmax" as a threat.

This presentation draws heavily on (Peters, 2008).

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4. Ch. 8: Repeated games.
Part I: Nash equilibria in normal form games that are repeated a finite number of times
Nash equilibria in playing the PD twice

Prisoners' Dilemma:

                          Other:
                     Cooperate    Defect
    You:  Cooperate   (3, 3)      (0, 5)
          Defect      (5, 0)      (1, 1)

■ Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1).
■ This equilibrium is Pareto sub-optimal.
■ Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?
■ The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.
Nash equilibria in playing the PD twice

The payoff tree of the slide, tabulated: rows give the first-round action profile, columns the second-round action profile; entries are the cumulative payoffs over both rounds.

                  C C        C D        D C        D D
    C C          (6, 6)     (3, 8)     (8, 3)     (4, 4)
    C D          (3, 8)     (0, 10)    (5, 5)     (1, 6)
    D C          (8, 3)     (5, 5)     (10, 0)    (6, 1)
    D D          (4, 4)     (1, 6)     (6, 1)     (2, 2)

P.S. This is just a payoff tree, not a game in extensive form!
Nash equilibria in playing the PD twice

In normal form:

                          Other:
                  CC         CD         DC         DD
    You:  CC     (6, 6)     (3, 8)     (3, 8)     (0, 10)
          CD     (8, 3)     (4, 4)     (5, 5)     (1, 6)
          DC     (8, 3)     (5, 5)     (4, 4)     (1, 6)
          DD     (10, 0)    (6, 1)     (6, 1)     (2, 2)

■ The action profile (DD, DD) is the only Nash equilibrium.
■ With 3 successive games, we obtain a 2^3 × 2^3 matrix, where the action profile (DDD, DDD) would still be the only Nash equilibrium.
■ Generalise to N repetitions: the all-defect profile (D···D, D···D), with D played in each of the N rounds, is still the only Nash equilibrium of the repeated game in which the PD is played N times in succession.
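To make this claim easy to check, here is a minimal Python sketch (names and structure are mine, not from the slides). It enumerates the reduced normal form used above — a strategy is a fixed sequence of actions rather than a history-contingent rule — and reports the pure-strategy Nash equilibria:

    from itertools import product

    # Stage-game payoffs of the Prisoners' Dilemma: (payoff to You, payoff to Other).
    STAGE = {
        ('C', 'C'): (3, 3),
        ('C', 'D'): (0, 5),
        ('D', 'C'): (5, 0),
        ('D', 'D'): (1, 1),
    }

    def repeated_payoff(your_plan, other_plan):
        """Cumulative payoffs when both players commit to fixed action sequences."""
        you = other = 0
        for a, b in zip(your_plan, other_plan):
            pa, pb = STAGE[(a, b)]
            you, other = you + pa, other + pb
        return you, other

    def pure_nash(n_rounds):
        """Pure-strategy Nash equilibria of the n-round PD in this reduced normal form."""
        plans = [''.join(p) for p in product('CD', repeat=n_rounds)]
        equilibria = []
        for yours, others in product(plans, plans):
            you, other = repeated_payoff(yours, others)
            you_best = all(repeated_payoff(alt, others)[0] <= you for alt in plans)
            other_best = all(repeated_payoff(yours, alt)[1] <= other for alt in plans)
            if you_best and other_best:
                equilibria.append((yours, others))
        return equilibria

    print(pure_nash(2))  # [('DD', 'DD')]
    print(pure_nash(3))  # [('DDD', 'DDD')]

Only deviations to other fixed plans are checked here, which is exactly what the matrix above captures; the backward-induction argument mentioned in the plan covers general, history-dependent strategies.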
Part II: Nash equilibria in normal form games that are repeated an indefinite number of times
Terminology: finite, indefinite, infinite

To repeat an experiment . . .

■ . . . ten times. That's hopefully clear.
■ . . . a finite number of times. May mean: a fixed number of times, where the number of repetitions is determined beforehand. Or it may mean: an indefinite number of times. Depends on the context!
■ . . . an indefinite number of times. Means: a finite number of times, but nothing is known beforehand about the number of repetitions.
■ . . . an infinite number of times. When throwing a die this must mean a countably infinite number of times.
Indefinite number of repetitions

■ A Pareto-suboptimal outcome can be avoided if the following three conditions are met.
  1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).
  2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
  3. The probability to continue, δ, is large enough.
■ Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of riches (Peters, 2008).
■ Various Folk theorems state the existence of multiple equilibria in games that are repeated an indefinite number of times.
■ Here we discuss one version of "the" Folk Theorem.
Family of Folk Theorems

There actually exist many Folk Theorems.

■ Horizon. The game may be repeated indefinitely (present case) or there may be an upper bound to the number of plays.
■ Information. Players may act on the basis of CKR, common knowledge of rationality (present case), or certain parts of the history may be hidden.
■ Reward. Players may collect their payoff through a discount factor (present case) or through average rewards.
■ Subgame perfectness. Subgame perfect equilibria (present case) or plain Nash equilibria.
■ Equilibrium. We may be interested in Nash equilibria (present case), or other types of equilibria, such as so-called ε-Nash equilibria or so-called correlated equilibria.
The concept of a repeated game

■ Let G be a game in normal form.
■ The repeated game G*(δ) is G, played an indefinite number of times, where δ represents the probability that the game will be played another time. Exercise: give P{ G*(δ) lasts at least t rounds }. Answer: δ^t.
■ G is called the stage game.
■ A history h of length t of a repeated game is a sequence of action profiles of length t. Example (for the Prisoners' Dilemma):

    Round:           0 1 2 3 4 5 6 7 8 9
    Row player:      C D D D C C D D D D
    Column player:   C D D D D D D C D D
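A small simulation sketch (mine, not from the slides) of how long G*(δ) lasts, under one reading of the convention: rounds are indexed from 0 as in the history above, round 0 is always played, and after every round the game continues with probability δ. Under that reading, round t is reached with probability δ^t, which matches the answer given above:

    import random

    def last_round_index(delta, rng=random.random):
        """Index of the last round that gets played: round 0 is always played,
        and after each round the game continues with probability delta."""
        t = 0
        while rng() < delta:
            t += 1
        return t

    def prob_round_reached(t, delta, trials=200_000):
        """Monte Carlo estimate of the probability that round t is played."""
        return sum(last_round_index(delta) >= t for _ in range(trials)) / trials

    delta = 0.5
    for t in range(5):
        # The estimated probability and delta ** t should roughly agree.
        print(t, round(prob_round_reached(t, delta), 3), delta ** t)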
The concept of a repeated game (II)

■ The set of all possible histories (of any length) is denoted by H.
■ A strategy for Player i is a function s_i : H → Δ{C, D} such that

    Pr(Player i plays C in round |h| + 1 | h) = s_i(h)(C).

■ A strategy profile s is a combination of strategies, one for each player.
■ The expected payoff for player i given s can be computed. It is

    Expected payoff_i(s) = ∑_{t=0}^∞ δ^t · Expected payoff_{i,t}(s).

  Example on the next page.
The concept of a repeated game (II)

Repeated from the previous slide: the expected payoff for player i given s can be computed. It is

    Expected payoff_i(s) = ∑_{t=0}^∞ δ^t · Expected payoff_{i,t}(s).

Example: Prisoners' Dilemma; the strategy of Player 1 is s_1 = "always cooperate 80%"; the strategy of Player 2 is s_2 = "always cooperate 70%"; δ = 1/2.

    Expected payoff_1(s) = ∑_{t=0}^∞ (1/2)^t · [ 0.8 (0.7 · 3 + 0.3 · 0) + 0.2 (0.7 · 5 + 0.3 · 1) ]
                         = (1 / (1 − 1/2)) · 2.44
                         = 2 × 2.44
                         = 4.88.
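A minimal Python check of this calculation (function and variable names are mine, not from the slides). The expected one-round payoff of Player 1 under these two mixed actions is 2.44, and the geometric series ∑_t δ^t contributes the factor 1/(1 − δ) = 2:

    # Stage-game payoffs: (payoff to Player 1, payoff to Player 2).
    STAGE = {
        ('C', 'C'): (3, 3),
        ('C', 'D'): (0, 5),
        ('D', 'C'): (5, 0),
        ('D', 'D'): (1, 1),
    }

    def expected_stage_payoff(p_coop_1, p_coop_2, player=0):
        """Expected one-round payoff when Player 1 cooperates with probability
        p_coop_1 and Player 2 with probability p_coop_2."""
        probs = {
            ('C', 'C'): p_coop_1 * p_coop_2,
            ('C', 'D'): p_coop_1 * (1 - p_coop_2),
            ('D', 'C'): (1 - p_coop_1) * p_coop_2,
            ('D', 'D'): (1 - p_coop_1) * (1 - p_coop_2),
        }
        return sum(probs[a] * STAGE[a][player] for a in STAGE)

    def expected_repeated_payoff(p_coop_1, p_coop_2, delta, player=0):
        """Discounted sum over rounds: the per-round expectation times 1/(1 - delta)."""
        return expected_stage_payoff(p_coop_1, p_coop_2, player) / (1 - delta)

    print(expected_stage_payoff(0.8, 0.7))          # ≈ 2.44
    print(expected_repeated_payoff(0.8, 0.7, 0.5))  # ≈ 4.88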
Subgame perfect equilibria of G*(δ): D*

Definition. A strategy profile s for G*(δ) is a subgame-perfect Nash equilibrium if (1) it is a Nash equilibrium of this repeated game, and (2) for every subgame (i.e., tail game) of this repeated game, s restricted to that subgame is a Nash equilibrium as well.

Consider the strategy of iterated defection D*: "always defect, no matter what". [1]

Claim. The strategy profile (D*, D*) is a subgame perfect equilibrium in G*(δ).

Proof. Consider any tail game starting at round t ≥ 0. We are done if we can show that (D*, D*) is a NE for this subgame. This is true: given that one player always defects, it never pays off for the other player to play C at any time. Therefore, everyone sticks to D*.

[1] A notation like D* or (worse) D^∞ is suggestive. Mathematically it makes no sense, but intuitively it does.
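A quick arithmetic check of the last step (my illustration, not on the slide): against an opponent who plays D*, switching from D to C in some round t replaces a stage payoff of 1 by 0 in that round and changes nothing afterwards, so the expected discounted payoff drops by δ^t · 1 > 0. Hence D* is a best response to D* in every tail game.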
Part III: Trigger strategies
Cost of deviating in Round N

Consider the so-called trigger strategy T: "always play C unless D has been played at least once. In that case play D forever".

Claim. The strategy profile (T, T) is a subgame perfect equilibrium in G*(δ), provided the probability of continuation, δ, is sufficiently large.

Proof. Suppose one player starts to defect at Round N.
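The continuation of the argument is not shown in these slides. A sketch of the standard comparison, using the PD payoffs above (my completion of the step, not the slides'): if the opponent plays T and a player defects for the first time in Round N, the opponent plays D from Round N + 1 onward, so the deviator's best continuation is to defect as well. Measured from Round N, deviating yields at most

    5 + δ · 1 + δ^2 · 1 + · · · = 5 + δ / (1 − δ),

whereas continuing to cooperate yields

    3 + 3δ + 3δ^2 + · · · = 3 / (1 − δ).

Deviating therefore does not pay if and only if 3 / (1 − δ) ≥ 5 + δ / (1 − δ), i.e. if and only if δ ≥ 1/2.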