Dynamics in Near-Potential Games

Ozan Candogan, Asu Ozdaglar, and Pablo Parrilo
Laboratory for Information and Decision Systems
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Innovations in Algorithmic Game Theory Workshop, May 2011
Introduction: Motivating Example

Two games that are close (in terms of payoffs) may have significantly different limiting dynamics: for any small θ > 0, consider the two games Ĝ and G below.

Ĝ:          A       B          G:          A       B
     A    0, 1    0, 0              A    0, 1    0, 0
     B    1, 0    θ, 2              B    1, 0   −θ, 2

The unique Nash equilibrium of Ĝ: (B, B).
The unique Nash equilibrium of G: ( (2/3) A + (1/3) B , (θ/(1+θ)) A + (1/(1+θ)) B ).

We consider convergence of the sequence of pure strategy profiles generated by better-response dynamics (at any strategy profile, a player chosen at random updates its strategy unilaterally to one that yields a better payoff).

For Ĝ, the sequence converges to the Nash equilibrium (B, B).
For G, the sequence follows the better-response cycle (A, A), (B, A), (B, B), (A, B), hence it is not contained in any (pure) ε-equilibrium set for ε < 2.
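The contrast is easy to reproduce in simulation. Below is a minimal sketch (our own, not from the slides) that encodes both 2×2 games as numpy payoff matrices, with strategy 0 = A and 1 = B, and runs the random better-response rule described above; all function and variable names are assumptions for illustration.

```python
import random
import numpy as np

def better_response_run(payoffs, start, steps, rng):
    """Random better-response dynamics: at each step a uniformly chosen
    player moves to a uniformly chosen strictly better reply, if any."""
    profile = list(start)
    trajectory = []
    for _ in range(steps):
        m = rng.randrange(len(payoffs))
        current = payoffs[m][tuple(profile)]
        better = []
        for s in range(payoffs[m].shape[m]):
            candidate = profile.copy()
            candidate[m] = s
            if payoffs[m][tuple(candidate)] > current:
                better.append(s)
        if better:
            profile[m] = rng.choice(better)
        trajectory.append(tuple(profile))
    return trajectory

theta = 0.1  # any small positive perturbation
# Row player's and column player's payoffs; strategy 0 = A, 1 = B.
G_hat = [np.array([[0.0, 0.0], [1.0,  theta]]), np.array([[1.0, 0.0], [0.0, 2.0]])]
G     = [np.array([[0.0, 0.0], [1.0, -theta]]), np.array([[1.0, 0.0], [0.0, 2.0]])]

rng = random.Random(0)
print(better_response_run(G_hat, (0, 0), 50, rng)[-5:])  # absorbed at (1, 1) = (B, B)
print(better_response_run(G,     (0, 0), 50, rng)[-5:])  # keeps cycling through all four profiles
```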
Introduction: This Paper

Can we identify classes of games in which convergence of adaptive dynamics is robust to small perturbations or misspecifications of payoffs, i.e., in which the limiting dynamics are contained in approximate equilibrium sets?

As is well known, for (finite) potential games many reasonable adaptive dynamics, including best-response and fictitious play dynamics, "converge" to a Nash equilibrium. [Monderer, Shapley 96, 96], [Young 93, 04]

Does this convergence behavior extend to near-potential games?

Relatedly, for a given game, can we find a nearby potential game and use the distance between these games to obtain a quantitative measure of the size of the limiting approximate equilibrium set?
Introduction: Our Contributions

We study convergence properties of dynamics in finite strategic form games by exploiting their relation to close potential games. Our approach relies on using the potential function of a close potential game for the analysis of dynamics.

We show that for a given game, we can find the "closest" potential game by solving a convex optimization problem.

We show that many reasonable adaptive dynamics converge to an approximate equilibrium set whose size is a function of the distance from a close potential game. For near-potential games, we obtain convergence to a small approximate equilibrium set.
Introduction: This Talk

We focus on three commonly studied update rules and show the following.

Discrete-time better-response dynamics: The sequence of pure strategy profiles converges to a pure ε-equilibrium set (ε is proportional to the distance of the game to a potential game).

Discrete-time fictitious play dynamics: The sequence of empirical frequencies converges to a neighborhood of a (mixed) equilibrium (the size of the neighborhood increases with the distance of the game to a potential game).

Logit-response dynamics: The stochastically stable strategy profiles are pure ε-equilibria (ε is proportional to the distance of the game to a potential game).
Preliminaries: Strategies and Nash Equilibrium

We consider finite strategic form games G = ⟨M, {E^m}_{m∈M}, {u^m}_{m∈M}⟩:
M: set of players.
E^m: set of strategies of player m; E = ∏_{m∈M} E^m: set of strategy profiles.
u^m: payoff function of player m (u^m: E → R).
Notation: p ∈ E, p^{−m} ∈ E^{−m} = ∏_{k≠m} E^k.

Let ΔE^m denote the set of probability distributions on E^m. We refer to x^m ∈ ΔE^m as a mixed strategy of player m, and to a collection of mixed strategies x = {x^m}_m as a mixed strategy profile.

A mixed strategy profile x is a mixed ε-(Nash) equilibrium if
u^m(x^m, x^{−m}) ≥ u^m(y^m, x^{−m}) − ε   for all m ∈ M and y^m ∈ ΔE^m.
If ε = 0, then x is a Nash equilibrium. If x^m is degenerate for all m, i.e., it assigns probability 1 to a single strategy, then we refer to the strategy profile as a pure equilibrium (or pure ε-equilibrium).
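As a concrete companion to the definition, here is a minimal sketch (our own, not from the slides) that computes the smallest ε for which a given mixed profile of a two-player game is a mixed ε-equilibrium. Since expected payoff is linear in a player's own mixed strategy, it suffices to compare against pure deviations.

```python
import numpy as np

def epsilon_of(u, x):
    """Smallest ε such that the mixed profile x = (x1, x2) is a mixed
    ε-equilibrium of a two-player game: the largest gain available from
    a unilateral deviation (a pure best reply suffices, since expected
    payoff is linear in a player's own mixed strategy)."""
    x1, x2 = x
    gain1 = np.max(u[0] @ x2) - x1 @ u[0] @ x2   # row player's best deviation gain
    gain2 = np.max(x1 @ u[1]) - x1 @ u[1] @ x2   # column player's best deviation gain
    return max(gain1, gain2, 0.0)

theta = 0.1
G = [np.array([[0.0, 0.0], [1.0, -theta]]),
     np.array([[1.0, 0.0], [0.0, 2.0]])]
x = (np.array([2/3, 1/3]), np.array([theta/(1+theta), 1/(1+theta)]))
print(epsilon_of(G, x))  # ~0: the unique mixed Nash equilibrium of G
```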
Preliminaries: Potential Games

A game G is an exact potential game if there exists a function φ: E → R such that
φ(p^m, p^{−m}) − φ(q^m, p^{−m}) = u^m(p^m, p^{−m}) − u^m(q^m, p^{−m}),
for all m ∈ M, p^m, q^m ∈ E^m, and p^{−m} ∈ E^{−m}.

Let γ = (p_0, ..., p_N) be a simple closed path (i.e., p_i and p_{i+1} differ in the strategy of only one player, and p_0 = p_N). Define I(γ) to be the total utility improvement along the path, i.e.,
I(γ) = Σ_{i=1}^{N} [u^{m_i}(p_i) − u^{m_i}(p_{i−1})],
where m_i is the player whose strategy changes in the i-th step of the path.

Proposition (Monderer and Shapley): A game is a potential game if and only if I(γ) = 0 for all simple closed paths γ.
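It is known (also due to Monderer and Shapley) that checking closed paths of length 4 suffices, which gives a direct test. The sketch below (our own two-player encoding) computes I(γ) on every length-4 cycle; an identical-interest game passes, as expected, since its common payoff is itself a potential.

```python
import itertools
import numpy as np

def four_cycle_improvement(u, i, j, k, l):
    """I(γ) along the closed path (i,k) -> (j,k) -> (j,l) -> (i,l) -> (i,k),
    where player 1 deviates i <-> j and player 2 deviates k <-> l."""
    u1, u2 = u
    return ((u1[j, k] - u1[i, k]) + (u2[j, l] - u2[j, k])
            + (u1[i, l] - u1[j, l]) + (u2[i, k] - u2[i, l]))

def is_exact_potential(u, tol=1e-9):
    """Monderer-Shapley test for two-player games: exact potential
    iff I(γ) = 0 on every simple closed path of length 4."""
    n1, n2 = u[0].shape
    return all(abs(four_cycle_improvement(u, i, j, k, l)) <= tol
               for i, j in itertools.combinations(range(n1), 2)
               for k, l in itertools.combinations(range(n2), 2))

# Identical-interest games are exact potential games (take φ = the common payoff):
common = np.array([[3.0, 0.0], [0.0, 2.0]])
print(is_exact_potential([common, common]))  # True

theta = 0.1
G = [np.array([[0.0, 0.0], [1.0, -theta]]),
     np.array([[1.0, 0.0], [0.0, 2.0]])]
print(is_exact_potential(G))  # False: G has a better-response cycle
```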
Near-Potential Games: Maximal Pairwise Difference

We adopt the following metric to measure the distance between games.

Definition (Maximal pairwise difference): Let G and Ĝ be two games with set of players M, set of strategy profiles E, and utility functions {u^m} and {û^m}. The maximal pairwise difference (MPD) between these games is defined as
d(G, Ĝ) ≜ max_{m, p, q : p^{−m} = q^{−m}} | (u^m(p) − u^m(q)) − (û^m(p) − û^m(q)) |.

MPD captures how different two games are in terms of utility improvements due to unilateral deviations. We use differences of utility improvements rather than differences of utility values since the former better represent strategic similarities (equilibrium and dynamic properties) [Candogan, Menache, Ozdaglar, Parrilo 10].

We refer to games with small MPD to a potential game as near-potential games.
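In a two-player encoding, the MPD is a straightforward maximization over unilateral deviations; a minimal sketch (our own names and encoding, not from the slides) follows. On the motivating example it returns 2θ, since the two games differ only in the row player's payoff at (B, B).

```python
import numpy as np

def mpd(u, u_hat):
    """Maximal pairwise difference d(G, Ĝ) for two-player games:
    the largest gap between the utility improvement of a unilateral
    deviation in G and that of the same deviation in Ĝ."""
    d = 0.0
    n1, n2 = u[0].shape
    for k in range(n2):                      # player 1 deviates i -> j, column k fixed
        for i in range(n1):
            for j in range(n1):
                d = max(d, abs((u[0][j, k] - u[0][i, k])
                               - (u_hat[0][j, k] - u_hat[0][i, k])))
    for i in range(n1):                      # player 2 deviates k -> l, row i fixed
        for k in range(n2):
            for l in range(n2):
                d = max(d, abs((u[1][i, l] - u[1][i, k])
                               - (u_hat[1][i, l] - u_hat[1][i, k])))
    return d

theta = 0.1
G_hat = [np.array([[0.0, 0.0], [1.0,  theta]]), np.array([[1.0, 0.0], [0.0, 2.0]])]
G     = [np.array([[0.0, 0.0], [1.0, -theta]]), np.array([[1.0, 0.0], [0.0, 2.0]])]
print(mpd(G, G_hat))  # 2θ = 0.2
```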
Near-Potential Games: Finding Close Potential Games

We consider the problem of finding the closest potential game to a given game, where the distance is measured in terms of the MPD. Potential games are characterized by linear equality constraints.

Given a game with utility functions {u^m}, the closest potential game, with utility functions {û^m}, can be obtained by solving the following convex optimization problem:

min_{φ, {û^m}}  max_{m ∈ M, p ∈ E, q^m ∈ E^m} | [u^m(q^m, p^{−m}) − u^m(p^m, p^{−m})] − [û^m(q^m, p^{−m}) − û^m(p^m, p^{−m})] |
s.t.  φ(q̄^m, p̄^{−m}) − φ(p̄^m, p̄^{−m}) = û^m(q̄^m, p̄^{−m}) − û^m(p̄^m, p̄^{−m}),  for all m ∈ M, p̄ ∈ E, q̄^m ∈ E^m.

We study extensions to other norms and weighted potential games in [Candogan, Ozdaglar, Parrilo 2010].
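This becomes a linear program once the max is moved into an epigraph variable. Below is a minimal two-player sketch using cvxpy as the solver (an assumption; the slides name no solver). Since the exact-potential constraint forces every unilateral improvement in û to equal the corresponding change in φ, the sketch optimizes over φ alone; a valid closest potential game can then be recovered as û^m = φ for all m.

```python
import cvxpy as cp
import numpy as np

def closest_potential(u):
    """LP for the closest exact potential game under the MPD (two-player
    version). The potential-game constraint lets us eliminate û: every
    unilateral improvement in û must equal the change in φ, so we
    minimize, over φ, the worst gap between improvements in u and in φ."""
    n1, n2 = u[0].shape
    phi = cp.Variable((n1, n2))
    t = cp.Variable()                      # epigraph variable for the max
    cons = []
    for k in range(n2):                    # player 1 deviations i -> j
        for i in range(n1):
            for j in range(n1):
                gap = (u[0][j, k] - u[0][i, k]) - (phi[j, k] - phi[i, k])
                cons += [gap <= t, -gap <= t]
    for i in range(n1):                    # player 2 deviations k -> l
        for k in range(n2):
            for l in range(n2):
                gap = (u[1][i, l] - u[1][i, k]) - (phi[i, l] - phi[i, k])
                cons += [gap <= t, -gap <= t]
    cp.Problem(cp.Minimize(t), cons).solve()
    return phi.value, t.value              # potential, and δ = d(G, Ĝ)

theta = 0.1
G = [np.array([[0.0, 0.0], [1.0, -theta]]),
     np.array([[1.0, 0.0], [0.0, 2.0]])]
phi_star, delta = closest_potential(G)
print(delta)  # ≈ 1 + θ/4: G is far from every potential game
```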
Dynamics: Discrete-Time Better-Response Dynamics – 1

We first focus on discrete-time better-response dynamics:
At each time step t, a single player is chosen at random (using a probability distribution with full support over the set of players).
Suppose player m is chosen and r ∈ E is the current strategy profile. Player m updates its strategy to a strategy in {q^m ∈ E^m | u^m(q^m, r^{−m}) > u^m(r)}, chosen uniformly at random.

We consider convergence of the sequence of generated pure strategy profiles {p_t}_{t=0}^∞, which we refer to as the trajectory of the dynamics.

In finite potential games, convergence of the trajectory to a Nash equilibrium is established using the fact that with each update the potential strictly increases.
Dynamics: Discrete-Time Better-Response Dynamics – 2

Theorem: Consider a game G and let Ĝ be a close potential game with d(G, Ĝ) = δ. In G, the trajectory of the better-response dynamics is contained in the pure ε-equilibrium set after finite time with probability 1, where ε = δ|E|.

Proof sketch: The evolution of trajectories can be represented by a Markov chain: the set of states is given by the set of strategy profiles, and there is a nonzero transition probability from r to q if r and q differ in the strategy of a single player, say m, and q^m is a (strictly) better response of player m to r^{−m}.
With probability 1, we have convergence to a recurrence class in finite time.
For any transition between two states in the same recurrence class, we can construct a closed improvement path. Using the zero total utility improvement along this path in the close potential game, together with the proximity of our game to the potential game, we can establish a bound on the utility improvement between any two states in this recurrence class.
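To see how the potential function controls improvement cycles, consider the motivating example again (a worked check of our own, not from the slides). Along the better-response cycle γ = ((A,A), (B,A), (B,B), (A,B), (A,A)) in G, the total improvement is I(γ) = 1 + 2 + θ + 1 = 4 + θ. For any potential game Ĝ with d(G, Ĝ) = δ we have I_Ĝ(γ) = 0, and each of the four legs of γ changes by at most δ when passing from G to Ĝ, so 4δ ≥ I(γ), i.e., δ ≥ 1 + θ/4. The resulting guarantee ε = δ|E| ≥ 4 + θ exceeds every payoff difference in G, so all profiles are trivially ε-equilibria: consistent with the motivating example, G is simply not a near-potential game, and the theorem promises nothing sharp for it.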
Dynamics: Discrete-Time Fictitious Play – 1

In fictitious play, agents form predictions about opponent strategies using the entire history of play: they forecast other players' strategies to be (independent) empirical frequency distributions.

Let 1(p^m_t = q^m) be the indicator function, equal to 1 if p^m_t = q^m and 0 otherwise. The empirical frequency with which player m uses strategy q^m up to time T is given by
μ^m_T(q^m) = (1/T) Σ_{t=0}^{T−1} 1(p^m_t = q^m).
Let μ^m_T denote the empirical frequency distribution (vector) of player m at time T.

At each time instant t, every player m chooses a strategy p^m_t such that
p^m_t ∈ argmax_{q^m ∈ E^m} u^m(q^m, μ^{−m}_t).

It is known that empirical frequency distributions converge to the (mixed) equilibrium set in potential games [Monderer and Shapley 96].
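A minimal two-player sketch of this update rule follows (our own encoding; np.argmax is an arbitrary tie-breaking convention, which the slides leave unspecified). Run on the motivating game G, which is 2×2 and hence has the fictitious play property even though it is not a potential game, the empirical frequencies approach the unique mixed equilibrium ((2/3) A + (1/3) B, (θ/(1+θ)) A + (1/(1+θ)) B).

```python
import numpy as np

def fictitious_play(u, T, start=(0, 0)):
    """Discrete-time fictitious play in a two-player game: at every step
    each player best-responds to the opponent's empirical frequencies."""
    n1, n2 = u[0].shape
    counts = [np.zeros(n1), np.zeros(n2)]
    counts[0][start[0]] += 1
    counts[1][start[1]] += 1
    for t in range(1, T):
        mu1, mu2 = counts[0] / t, counts[1] / t   # empirical frequencies μ¹_t, μ²_t
        p1 = np.argmax(u[0] @ mu2)   # row player's best reply to μ²_t
        p2 = np.argmax(mu1 @ u[1])   # column player's best reply to μ¹_t
        counts[0][p1] += 1
        counts[1][p2] += 1
    return counts[0] / T, counts[1] / T

theta = 0.1
G = [np.array([[0.0, 0.0], [1.0, -theta]]),
     np.array([[1.0, 0.0], [0.0, 2.0]])]
print(fictitious_play(G, 200_000))
# ≈ (0.667, 0.333) and (0.091, 0.909): the unique mixed equilibrium of G
```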