Two-player, two-action, general-sum games

In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by

                L            R
  M  =   T   r11, c11     r12, c12
         B   r21, c21     r22, c22

Row plays mixed (α, 1−α). Column plays mixed (β, 1−β). Expected payoffs:

  u1(α, β) = α[β·r11 + (1−β)·r12] + (1−α)[β·r21 + (1−β)·r22]
           = u·αβ + α(r12 − r22) + β(r21 − r22) + r22,

  u2(α, β) = β[α·c11 + (1−α)·c21] + (1−β)[α·c12 + (1−α)·c22]
           = u′·αβ + α(c12 − c22) + β(c21 − c22) + c22,

where u = (r11 − r12) − (r21 − r22) and u′ = (c11 − c21) − (c12 − c22).
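The identity above is easy to check numerically. Below is a small Python/NumPy sketch (an illustration added here, not code from the slides; function names are arbitrary) that evaluates u1 and u2 both directly from the payoff matrices and via the rewritten affine-in-αβ form:

```python
import numpy as np

def expected_payoffs(R, C, alpha, beta):
    """Expected payoffs (u1, u2) when Row plays T with prob. alpha
    and Column plays L with prob. beta, for 2x2 payoff matrices R, C."""
    x = np.array([alpha, 1 - alpha])   # Row's mixed strategy
    y = np.array([beta, 1 - beta])     # Column's mixed strategy
    return x @ R @ y, x @ C @ y

def expected_payoffs_affine(R, C, alpha, beta):
    """Same payoffs via the rewritten form with u and u'."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    u1 = u  * alpha * beta + alpha * (R[0, 1] - R[1, 1]) + beta * (R[1, 0] - R[1, 1]) + R[1, 1]
    u2 = up * alpha * beta + alpha * (C[0, 1] - C[1, 1]) + beta * (C[1, 0] - C[1, 1]) + C[1, 1]
    return u1, u2

# Example: the Stag Hunt payoffs used later in these slides.
R = np.array([[5.0, 0.0], [3.0, 2.0]])
C = np.array([[5.0, 3.0], [0.0, 2.0]])
print(expected_payoffs(R, C, 0.7, 0.4))
print(expected_payoffs_affine(R, C, 0.7, 0.4))  # should print the same pair
```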
Gradient of expected payoff

Gradient:

  ∂u1(α, β)/∂α = β·u + (r12 − r22)
  ∂u2(α, β)/∂β = α·u′ + (c21 − c22)

As an affine map:

  ( ∂u1/∂α )         ( α )                      ( 0   u  )           ( r12 − r22 )
  ( ∂u2/∂β )  =  U · ( β )  +  C,   where  U =  ( u′  0  )  and  C = ( c21 − c22 ).

Stationary point:

  (α*, β*) = ( (c22 − c21)/u′ , (r22 − r12)/u )

Remarks:

■ There is at most one stationary point.
■ If a stationary point exists, it may lie outside [0, 1]².
■ If there is a stationary point inside [0, 1]², it is a weak (i.e., non-strict) Nash equilibrium.
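In code, the gradient and stationary point above can be computed as follows (a sketch added for illustration; the function name and the example values of α, β are arbitrary):

```python
import numpy as np

def gradient_and_stationary(R, C, alpha, beta):
    """Gradient of (u1, u2) w.r.t. (alpha, beta) and, if U is invertible,
    the stationary point (alpha*, beta*)."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    grad = np.array([beta * u + (R[0, 1] - R[1, 1]),     # d u1 / d alpha
                     alpha * up + (C[1, 0] - C[1, 1])])  # d u2 / d beta
    stationary = None
    if u != 0 and up != 0:                               # U invertible
        stationary = ((C[1, 1] - C[1, 0]) / up,          # alpha*
                      (R[1, 1] - R[0, 1]) / u)           # beta*
    return grad, stationary

# Coordination game from a later slide: stationary point (1/2, 1/2).
R = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0], [0.0, 1.0]])
print(gradient_and_stationary(R, C, 0.8, 0.3))
```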
Example: payoffs in Stag Hunt (r = 4, t = 3, s = 1, p = 3)

(Figure.) Player 1 may only move "back – front"; Player 2 may only move "left – right".
Part 2: IGA
Gradient ascent

Affine differential map:

  (α, β)_{t+1} = (α, β)_t + η · ( ∂u1/∂α , ∂u2/∂β )_t

■ Because α, β ∈ [0, 1], the dynamics must be confined to [0, 1]².
■ Suppose the state (α, β) is on the boundary of the probability space [0, 1]² and the gradient vector points outwards. Intuition: one of the players has an incentive to improve, but cannot improve further.
■ To maintain the dynamics within [0, 1]², the gradient is projected back onto [0, 1]² (see the sketch below). Intuition: if one of the players has an incentive to improve but cannot improve, then he will not improve.
■ If nonzero, the projected gradient is parallel to the (closest) boundary of [0, 1]².
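A minimal sketch of one projected update (an illustration, not the authors' implementation; the function name, step size η and iteration count are arbitrary choices). For the box [0, 1]², projecting the updated point back into the feasible set amounts to clipping each coordinate, which matches the intuition above: an outward-pointing component at the boundary is simply dropped.

```python
import numpy as np

def iga_step(R, C, alpha, beta, eta=0.01):
    """One projected gradient-ascent step for both players of a 2x2 game.
    Projection onto the box [0,1]^2 is componentwise clipping."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    d_alpha = beta * u + (R[0, 1] - R[1, 1])    # d u1 / d alpha
    d_beta  = alpha * up + (C[1, 0] - C[1, 1])  # d u2 / d beta
    alpha_new = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
    beta_new  = float(np.clip(beta + eta * d_beta, 0.0, 1.0))
    return alpha_new, beta_new

# Example: Prisoners' Dilemma payoffs from a later slide;
# repeated steps drive (alpha, beta) to the corner (0, 0).
R = np.array([[3.0, 0.0], [5.0, 1.0]])
C = np.array([[3.0, 5.0], [0.0, 1.0]])
a, b = 0.9, 0.9
for _ in range(2000):
    a, b = iga_step(R, C, a, b)
print(a, b)   # approximately (0, 0): mutual defection
```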
Infinitesimal Gradient Ascent: IGA (Singh et al., 2000)

Affine differential map:

  (α, β)_{t+1} = (α, β)_t + η · ( U·(α, β)_t + C ),

with U and C as defined on the previous slide.

Theorem (Singh, Kearns and Mansour, 2000). If both players follow IGA with step size η → 0, their average payoffs converge to the (expected) payoffs of a Nash equilibrium. If, in addition, their strategies converge, they converge to that same NE.

The proof is based on a qualitative result from the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U:

1. If U is invertible and its eigenvalues λ (the solutions of Ux = λx, equivalently of det[U − λI] = 0) are real, there is a stationary point, which is a saddle point.
2. If U is invertible and its eigenvalues λ are purely imaginary, there is a stationary point, which is a centric point (a centre).
3. If U is not invertible (which is the case iff u = 0 or u′ = 0), there is no stationary point.
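This case analysis can be automated. A Python/NumPy sketch (added for illustration, not from the slides; the function name is arbitrary) that builds U for a given game, computes the stationary point when U is invertible, and reports which of the three cases applies. The eigenvalues of U are λ = ±√(u·u′): real when u·u′ > 0, purely imaginary when u·u′ < 0.

```python
import numpy as np

def classify_iga_dynamics(R, C):
    """Classify the IGA dynamics of a 2x2 general-sum game by the
    matrix U, following the three cases listed above."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    U = np.array([[0.0, u], [up, 0.0]])
    if u == 0 or up == 0:
        return U, None, "U not invertible: no stationary point"
    # Eigenvalues are +/- sqrt(u * u'); stationary point from gradient = 0.
    stationary = ((C[1, 1] - C[1, 0]) / up, (R[1, 1] - R[0, 1]) / u)
    if u * up > 0:
        return U, stationary, "real eigenvalues: saddle point"
    return U, stationary, "imaginary eigenvalues: centric point (centre)"

# Matching pennies (a later slide): centric point at (1/2, 1/2).
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(classify_iga_dynamics(R, -R))
```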
Saddle point

(Figure.)
Gradient ascent: Coordination game

■ Symmetric, but not zero sum:

         L        R
    T   1, 1     0, 0
    B   0, 0     1, 1

■ Gradient: ( 2·β − 1 , 2·α − 1 )
■ Stationary at (1/2, 1/2).
■ Matrix

      U = ( 0  2 )
          ( 2  0 )

  has real eigenvalues: λ² − 4 = 0. Saddle point inside [0, 1]².
Gradient ascent: Prisoners' Dilemma

■ Symmetric, but not zero sum:

         L        R
    T   3, 3     0, 5
    B   5, 0     1, 1

■ Gradient: ( −1·β − 1 , −1·α − 1 )
■ Stationary at (−1, −1).
■ Matrix

      U = (  0  −1 )
          ( −1   0 )

  has real eigenvalues: λ² − 1 = 0. Saddle point outside [0, 1]².
Gradient ascent: Stag hunt

■ Symmetric, but not zero sum:

         L        R
    T   5, 5     0, 3
    B   3, 0     2, 2

■ Gradient: ( 4·β − 2 , 4·α − 2 )
■ Stationary at (1/2, 1/2).
■ Matrix

      U = ( 0  4 )
          ( 4  0 )

  has real eigenvalues: λ² − 16 = 0. Saddle point inside [0, 1]².
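Because the interior stationary point is a saddle, the projected dynamics typically move away from it and settle in one of the two pure Nash equilibria, (T, L) or (B, R), depending on where they start. A NumPy sketch of this behaviour (illustration only; function name, step size and iteration count are arbitrary choices):

```python
import numpy as np

def run_projected_iga(R, C, alpha, beta, eta=0.01, steps=5000):
    """Iterate the projected gradient-ascent update and return the
    final strategy pair."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    for _ in range(steps):
        d_alpha = beta * u + (R[0, 1] - R[1, 1])
        d_beta  = alpha * up + (C[1, 0] - C[1, 1])
        alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
        beta  = float(np.clip(beta + eta * d_beta, 0.0, 1.0))
    return alpha, beta

R = np.array([[5.0, 0.0], [3.0, 2.0]])   # Stag Hunt, row payoffs
C = np.array([[5.0, 3.0], [0.0, 2.0]])   # Stag Hunt, column payoffs
print(run_projected_iga(R, C, 0.9, 0.8))  # -> (1, 1): the pure NE (T, L)
print(run_projected_iga(R, C, 0.2, 0.1))  # -> (0, 0): the pure NE (B, R)
```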
Gradient ascent: Game of Chicken

■ Symmetric, but not zero sum:

         L         R
    T   0, 0     −1, 1
    B   1, −1    −3, −3

■ Gradient: ( −3·β + 2 , −3·α + 2 )
■ Stationary at (2/3, 2/3).
■ Matrix

      U = (  0  −3 )
          ( −3   0 )

  has real eigenvalues: λ² − 9 = 0. Saddle point inside [0, 1]².
Gradient ascent: Battle of the Sexes

■ Symmetric, but not zero sum:

         L        R
    T   0, 0     2, 3
    B   3, 2     1, 1

■ Gradient: ( −4·β + 1 , −4·α + 1 )
■ Stationary at (1/4, 1/4).
■ Matrix

      U = (  0  −4 )
          ( −4   0 )

  has real eigenvalues: λ² − 16 = 0. Saddle point inside [0, 1]².
Gradient ascent: Matching pennies

■ Symmetric, zero sum:

          L         R
    T   1, −1    −1, 1
    B   −1, 1     1, −1

■ Gradient: ( 4·β − 2 , −4·α + 2 )
■ Stationary at (1/2, 1/2).
■ Matrix

      U = (  0  4 )
          ( −4  0 )

  has imaginary eigenvalues: λ² + 16 = 0. Centric point inside [0, 1]².
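With a centric point, the strategies need not converge at all: they orbit (1/2, 1/2), while — as the IGA theorem states — the running-average payoffs approach the equilibrium payoffs (0, 0). A NumPy sketch of this behaviour (illustration only; function name, step size and horizon are arbitrary choices):

```python
import numpy as np

def iga_matching_pennies(alpha, beta, eta=0.001, steps=20000):
    """Projected IGA for matching pennies, tracking running-average
    expected payoffs, to illustrate the centric-point case."""
    R = np.array([[1.0, -1.0], [-1.0, 1.0]])
    C = -R
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])    # = 4
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])    # = -4
    avg1 = avg2 = 0.0
    for t in range(1, steps + 1):
        x, y = np.array([alpha, 1 - alpha]), np.array([beta, 1 - beta])
        avg1 += (x @ R @ y - avg1) / t                # running average of u1
        avg2 += (x @ C @ y - avg2) / t                # running average of u2
        d_alpha = beta * u + (R[0, 1] - R[1, 1])
        d_beta  = alpha * up + (C[1, 0] - C[1, 1])
        alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
        beta  = float(np.clip(beta + eta * d_beta, 0.0, 1.0))
    return (alpha, beta), (avg1, avg2)

print(iga_matching_pennies(0.9, 0.2))
# Strategies keep circling (1/2, 1/2); average payoffs end up close to (0, 0).
```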
Gradient ascent: another game with a centric point

■ Symmetric, zero sum:

         L         R
    T   1, 1     −2, 2
    B   3, −3    −2, 1