Poster #212: Variance Reduction for Matrix Games
Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian (presenting)
Zero-sum games: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$

Super useful!
• Constraints: $y$ checks feasibility (e.g., GANs)
• Robustness: $y$ represents uncertainty (e.g., adversarial training)

Ideal (approximate) solution: an $\epsilon$-Nash equilibrium $(x, y)$, where both players are happy:
$$f(x, y) \ge \max_{y' \in \mathcal{Y}} f(x, y') - \epsilon \quad \text{(the $y$ player is happy)}$$
$$f(x, y) \le \min_{x' \in \mathcal{X}} f(x', y) + \epsilon \quad \text{(the $x$ player is happy)}$$

We assume $f$ is convex-concave $\Longrightarrow$ a Nash equilibrium exists.
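For the matrix games studied below, the two conditions can be checked with one matrix-vector product each. A minimal sketch (the helper name `nash_gap` and the random instance are illustrative, not from the paper):

```python
import numpy as np

def nash_gap(A, x, y):
    """Duality gap of (x, y) for min_x max_y y^T A x over probability simplices.

    Over a simplex the inner max/min are attained at vertices, so the
    y player's best-response value is max_i (Ax)_i and the x player's is
    min_j (A^T y)_j. A gap of at most eps certifies an eps-Nash equilibrium.
    """
    return np.max(A @ x) - np.min(A.T @ y)

# Hypothetical example: uniform strategies in a random 50x50 matrix game.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
x = y = np.full(50, 1 / 50)
print(nash_gap(A, x, y))
```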
Our contributions

1. A variance reduction framework for general convex-concave $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$: a centered gradient estimator plus a fast algorithm (geometry matters).

2. Concrete centered gradient estimators for $f(x, y) = y^\top A x$ via "sampling from the difference" (see the sketch below) $\Longrightarrow$ new runtimes for $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$.
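To make "sampling from the difference" concrete, here is a minimal sketch of one such estimator for the $x$-gradient $\nabla_x f(x, y) = A^\top y$ in the simplex setting. The helper name and sampling constants are illustrative; the paper's estimators include further details (the $y$-gradient, other geometries):

```python
import numpy as np

def centered_grad_x(A, y, y0, Aty0, rng):
    """Estimate grad_x f = A^T y by 'sampling from the difference' y - y0.

    Keeps the exact gradient A^T y0 at the reference point y0 and corrects
    it with one row of A, sampled with probability proportional to
    |y_i - y0_i|. Unbiased: E[g] = A^T y0 + A^T (y - y0) = A^T y, and the
    correction shrinks as y approaches y0.
    """
    d = y - y0
    total = np.abs(d).sum()
    if total == 0:
        return Aty0  # at the reference point the estimator is exact
    p = np.abs(d) / total
    i = rng.choice(len(y), p=p)
    return Aty0 + (d[i] / p[i]) * A[i, :]
```

Each call touches a single row of $A$, so it costs $O(n)$ time after the one-time $O(n^2)$ computation of $A^\top y_0$; this is the source of the $n^2 + n^{3/2} L / \epsilon$ runtime below.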
Bilinear games: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$, $A \in \mathbb{R}^{m \times n}$
• Simplest case
• Local model for smooth zero-sum games
• Important by themselves

Geometry matters:
• $\mathcal{X} = \mathcal{Y} =$ simplex: matrix games / LP
• $\mathcal{X} =$ Euclidean ball, $\mathcal{Y} =$ simplex: hard-margin SVM
• $\mathcal{X} = \mathcal{Y} =$ Euclidean ball: linear regression

Prior variance reduction (Balamurugan & Bach '16) addresses the Euclidean setting; our work covers all three geometries.
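As a sanity check on the SVM entry, here is one standard reduction (a sketch; up to a sign flip of $A$ to match the min-max convention above). For labeled examples $(a_i, b_i)$, set the rows $A_{i:} = b_i a_i^\top$; then maximizing the margin is a ball-simplex bilinear game:

$$\max_{\|x\|_2 \le 1} \, \min_{i \in [m]} \, b_i \langle a_i, x \rangle \;=\; \max_{\|x\|_2 \le 1} \, \min_{y \in \Delta^m} \, y^\top A x .$$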
Algorithms and rates for $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$, $A \in \mathbb{R}^{m \times n}$.
For simplicity take $m \approx n$, so computing $x \mapsto Ax$ takes $n^2$ time. Geometry matters:
$$L = \begin{cases} \max_{ij} |A_{ij}| & \text{simplex-simplex} \\ \max_i \|A_{i:}\|_2 & \text{simplex-ball} \end{cases}$$

• Exact gradient (Nemirovski '04, Nesterov '07): $n^2 \cdot L/\epsilon$
• Stochastic gradient (GK95, NJLS09, CHW10): $n \cdot L^2/\epsilon^2$
• Variance reduction (our approach): $n^2 + n^{3/2} \cdot L/\epsilon$

VR is always better than the exact-gradient rate, and better than stochastic gradient whenever $\Omega(1)$ passes over the data are required.

[Runtime comparison figure omitted; image credit: Chawit Waewsawangwong]
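A quick check of the crossover (up to constant factors): the stochastic-gradient runtime exceeds the variance-reduction runtime exactly when

$$\frac{n L^2}{\epsilon^2} \;\gtrsim\; n^2 + \frac{n^{3/2} L}{\epsilon} \quad \Longleftrightarrow \quad \epsilon \;\lesssim\; \frac{L}{\sqrt{n}},$$

and $\epsilon \le L/\sqrt{n}$ is precisely when $n L^2/\epsilon^2 \ge n^2$, i.e., when stochastic gradient needs at least one full pass over $A$.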
It's all in the gradient estimator

Given a reference point $x_0$, a centered gradient estimator $g_{x_0}(\cdot)$ satisfies
$$\mathbb{E}\, g_{x_0}(x) = \nabla f(x) \quad \text{(unbiased)}$$
$$\mathbb{E}\, \|g_{x_0}(x) - \nabla f(x)\|_*^2 \le L^2 \|x - x_0\|^2 \quad \text{(variance bounded by the distance to $x_0$)}$$

Also using this concept in the Euclidean setting: VR for non-convex optimization (AH`16, RHSPS`16, FLLZ`18, ZXG`18) and bilinear saddle-point problems (BB`16).
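The estimator sketched earlier satisfies this definition in the $\ell_1$ geometry (with the $\ell_\infty$ dual norm). A hypothetical numerical check, reusing `centered_grad_x` from the sketch above; the constant $2 \max_{ij} |A_{ij}|$ is a crude stand-in for $L$, not the paper's sharp constant:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 30
A = rng.standard_normal((m, n))
y0 = rng.dirichlet(np.ones(m))   # reference point on the simplex
y = rng.dirichlet(np.ones(m))    # current iterate
Aty0 = A.T @ y0                  # exact gradient at the reference point
exact = A.T @ y                  # exact gradient at the current iterate

g = np.array([centered_grad_x(A, y, y0, Aty0, rng) for _ in range(20000)])

# Unbiasedness: the empirical mean of g should match A^T y.
print("max bias:", np.abs(g.mean(axis=0) - exact).max())

# Centering: E ||g - grad||_inf^2 <= L^2 ||y - y0||_1^2 with L = 2 max|A_ij|.
mse = (np.abs(g - exact).max(axis=1) ** 2).mean()
bound = (2 * np.abs(A).max() * np.abs(y - y0).sum()) ** 2
print("mse:", mse, "<= bound:", bound)
```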