Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games Adrian Rivera Cardoso Joint work with: Jacob Abernethy, He Wang, Huan Xu Georgia Institute of Technology June 7, 2019 Adrian Rivera Cardoso (GATECH) June 7, 2019 1 / 10
Overview

1. Matrix Games
2. Online Matrix Games
3. An Impossibility Result
4. Good News
Matrix Games

One of the canonical problems in game theory is the zero-sum matrix game. Finding a Nash Equilibrium is core to many problems in statistics, optimization and economics.

Setup:
- Player 1 chooses a probability distribution $x$ over $d_1$ actions.
- Player 2 chooses a probability distribution $y$ over $d_2$ actions.
- The payoffs are specified by a matrix $A \in \mathbb{R}^{d_1 \times d_2}$.
- $A_{ij}$ encodes the loss of Player 1, which equals the reward of Player 2, when they play actions $i$ and $j$ respectively.
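To make the setup concrete, here is a minimal sketch of the expected loss $x^\top A y$ under mixed strategies. The rock-paper-scissors matrix and the helper name `expected_loss` are illustrative assumptions, not part of the talk.

```python
def expected_loss(x, A, y):
    """Expected loss of Player 1 (= expected reward of Player 2): x^T A y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))

# Hypothetical example: rock-paper-scissors, actions ordered (rock, paper, scissors).
# Rows are Player 1's action, columns are Player 2's action.
# Entry 1 means Player 1 loses, -1 means Player 1 wins, 0 is a tie.
RPS = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]

uniform = [1 / 3, 1 / 3, 1 / 3]
rock = [1.0, 0.0, 0.0]
paper = [0.0, 1.0, 0.0]
```

With these definitions, `expected_loss(rock, RPS, paper)` is the loss Player 1 suffers when playing rock against paper, and `expected_loss(uniform, RPS, y)` is zero (up to rounding) for any distribution `y`, because every column of `RPS` sums to zero.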
Matrix Games

The goal is to find a Nash Equilibrium (NE) of the game. A NE is a pair $(x^*, y^*)$ such that for all $x \in \Delta_{d_1}$, $y \in \Delta_{d_2}$ (the probability simplices over the two action sets) it holds that

$$x^{*\top} A y \le x^{*\top} A y^* \le x^\top A y^*.$$

$x^{*\top} A y^*$ is called the value of the game, and it holds that

$$x^{*\top} A y^* = \min_{x \in \Delta_{d_1}} \max_{y \in \Delta_{d_2}} x^\top A y = \max_{y \in \Delta_{d_2}} \min_{x \in \Delta_{d_1}} x^\top A y.$$

How to find a NE? Run two OCO (online convex optimization) algorithms in parallel and then average the history of iterates.

But what if the payoff matrix changes with time?
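The "two OCO algorithms in parallel" recipe for a fixed matrix can be sketched as follows. This is a minimal illustration that assumes both players run Hedge (exponential weights) as their OCO algorithm; the talk does not say which algorithm is meant. The names `hedge_weights` and `solve_matrix_game`, the step size, and the horizon are all hypothetical choices.

```python
import math

def hedge_weights(cum_loss, eta):
    """Exponential-weights distribution for a vector of cumulative losses."""
    m = min(cum_loss)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (c - m)) for c in cum_loss]
    s = sum(w)
    return [v / s for v in w]

def solve_matrix_game(A, T=5000):
    """Approximate a NE of min_x max_y x^T A y by self-play and averaging."""
    d1, d2 = len(A), len(A[0])
    # Assumed step size: a common Hedge choice for payoffs of order one.
    eta = math.sqrt(8 * math.log(max(d1, d2, 2)) / T)
    loss_x = [0.0] * d1  # cumulative loss of each Player 1 action
    loss_y = [0.0] * d2  # cumulative negative reward of each Player 2 action
    avg_x = [0.0] * d1
    avg_y = [0.0] * d2
    for _ in range(T):
        x = hedge_weights(loss_x, eta)
        y = hedge_weights(loss_y, eta)
        # Player 1 sees the loss vector A y; Player 2 sees the reward vector A^T x.
        Ay = [sum(A[i][j] * y[j] for j in range(d2)) for i in range(d1)]
        xA = [sum(x[i] * A[i][j] for i in range(d1)) for j in range(d2)]
        for i in range(d1):
            loss_x[i] += Ay[i]
            avg_x[i] += x[i] / T
        for j in range(d2):
            loss_y[j] -= xA[j]  # maximizing reward = minimizing negative reward
            avg_y[j] += y[j] / T
    # The averaged iterates form the approximate equilibrium.
    return avg_x, avg_y
```

As a sanity check, the matrix `[[2, -1], [-1, 1]]` has value $1/5$ with equilibrium strategies $(2/5, 3/5)$ for both players, so `solve_matrix_game([[2, -1], [-1, 1]])` should return averaged strategies close to those. Averaging matters here: the individual iterates of such self-play need not converge, while the averages have a duality gap bounded by the sum of the two players' average regrets.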
Online Matrix Games: Problem Setup

Two players play a sequence of matrix games for $T$ time steps.
- In step $t$ they must each choose a distribution over actions, $x_t \in \Delta_{d_1}$ and $y_t \in \Delta_{d_2}$.
- An adversary chooses the payoff matrix $A_t$.
- They receive loss/reward equal to $x_t^\top A_t y_t$ and observe $A_t$.
- Using this new information they choose $x_{t+1}$, $y_{t+1}$.

Their goal is to achieve sublinear Nash Equilibrium Regret:

$$\text{NE.Regret} = \left| \sum_{t=1}^{T} x_t^\top A_t y_t - \min_{x \in \Delta_{d_1}} \max_{y \in \Delta_{d_2}} \sum_{t=1}^{T} x^\top A_t y \right|.$$
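The NE regret of a played sequence can be evaluated directly once the value of the summed game is known. The sketch below restricts itself to $2 \times 2$ games, where that value has a closed form; this restriction and the names `value_2x2`, `bilinear`, and `ne_regret` are assumptions made for illustration (larger games would need a linear-programming solver).

```python
def value_2x2(S):
    """Value min_x max_y x^T S y of a 2x2 zero-sum game (row player minimizes)."""
    (a, b), (c, d) = S
    upper = min(max(a, b), max(c, d))  # best pure guarantee for the minimizer
    lower = max(min(a, c), min(b, d))  # best pure guarantee for the maximizer
    if upper == lower:                 # a pure saddle point exists
        return upper
    # No pure saddle point: closed-form value of the mixed equilibrium.
    return (a * d - b * c) / (a + d - b - c)

def bilinear(x, A, y):
    """x^T A y for 2-dimensional x and y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

def ne_regret(xs, ys, As):
    """|sum_t x_t^T A_t y_t - min_x max_y sum_t x^T A_t y| for 2x2 games."""
    realized = sum(bilinear(x, A, y) for x, A, y in zip(xs, As, ys))
    # The comparator is the value of the game whose matrix is the sum of all A_t.
    S = [[sum(A[i][j] for A in As) for j in range(2)] for i in range(2)]
    return abs(realized - value_2x2(S))

# Hypothetical two-round sequence of payoff matrices.
As = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
per_round_xs = [[0, 1], [1, 0]]   # Player 1 plays the per-round equilibrium
per_round_ys = [[1, 0], [0, 1]]
uniform_xs = [[0.5, 0.5], [0.5, 0.5]]
uniform_ys = [[0.5, 0.5], [0.5, 0.5]]
```

In this toy sequence each round's game has value 0, yet the summed matrix is the identity, whose value is $1/2$. Playing the per-round equilibria therefore yields `ne_regret(per_round_xs, per_round_ys, As) == 0.5`, while uniform play in both rounds matches the comparator exactly. The comparator is the equilibrium of the cumulative game, not the sequence of per-round equilibria.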
An Impossibility Result

We know that when $A_t = A$ for all $t = 1, \dots, T$, if each player minimizes its own Individual Regret,

$$\sum_{t=1}^{T} f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} f_t(x),$$

and we average their iterates, we find a NE.

Is it still a good strategy to minimize Individual Regret when $A_t \neq A$, that is, when the payoff matrix changes over time?
An Impossibility Result

Theorem. Consider any algorithm that selects a sequence of pairs $(x_t, y_t)$ given the past payoff matrices $A_1, \dots, A_{t-1}$. Consider the following three objectives:

$$\left| \sum_{t=1}^{T} x_t^\top A_t y_t - \min_{x \in \Delta_{d_1}} \max_{y \in \Delta_{d_2}} \sum_{t=1}^{T} x^\top A_t y \right| = o(T), \quad (1)$$

$$\sum_{t=1}^{T} x_t^\top A_t y_t - \min_{x \in \Delta_{d_1}} \sum_{t=1}^{T} x^\top A_t y_t = o(T), \quad (2)$$

$$\max_{y \in \Delta_{d_2}} \sum_{t=1}^{T} x_t^\top A_t y - \sum_{t=1}^{T} x_t^\top A_t y_t = o(T). \quad (3)$$

Then there exists an (adversarially chosen) sequence $A_1, A_2, \dots$ such that (1), (2), and (3) cannot all hold.
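The individual-regret quantities in (2) and (3) are easy to evaluate for a given play sequence, because a linear function over the simplex is optimized at a pure action. The following sketch computes both; the function name `individual_regrets` and the two-round example are hypothetical and are not the adversarial construction used to prove the theorem.

```python
def individual_regrets(xs, ys, As):
    """Return the left-hand sides of (2) and (3) for a played sequence."""
    d1, d2 = len(As[0]), len(As[0][0])
    realized = 0.0
    fixed_row_loss = [0.0] * d1  # sum_t (A_t y_t)_i: loss of always playing row i
    fixed_col_gain = [0.0] * d2  # sum_t (x_t^T A_t)_j: reward of always playing column j
    for x, y, A in zip(xs, ys, As):
        for i in range(d1):
            for j in range(d2):
                realized += x[i] * A[i][j] * y[j]
                fixed_row_loss[i] += A[i][j] * y[j]
                fixed_col_gain[j] += x[i] * A[i][j]
    # Best fixed mixed strategy in hindsight is attained at a pure action.
    regret_player1 = realized - min(fixed_row_loss)
    regret_player2 = max(fixed_col_gain) - realized
    return regret_player1, regret_player2

# Hypothetical two-round example.
As = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
xs = [[0, 1], [1, 0]]
ys = [[1, 0], [0, 1]]
```

For this sequence `individual_regrets(xs, ys, As)` returns `(-1.0, 0.0)`: neither player regrets their play against the best fixed action in hindsight. The value of the summed game is nevertheless $1/2$ while the realized payoff is 0, so the quantity inside (1) equals $1/2$ after two rounds. This illustrates, on a toy instance only, that small individual regret does not by itself control the NE regret.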
Good News

Theorem. There exists an algorithm (see paper or poster) that guarantees:

$$\text{NE.Regret} \le O\left( \sqrt{T}\,\ln(T) + \sqrt{\max\{\ln(d_1), \ln(d_2)\}\, T} \right).$$
Some Preliminary Results

Our algorithm seems to be useful for training GANs.

Figure: Different algorithms used for training GANs on the mixture of Gaussians data set.
Thank you!

See you at Pacific Ballroom 151 from 6:30-9:00 pm.