Variance Reduction for Matrix Games
1. Variance Reduction for Matrix Games. Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian (presenting). Poster #212.

2–4. Zero-sum games: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$. Super useful!
- Constraints: check feasibility (e.g., GANs)
- Robustness: represent uncertainty (e.g., adversarial training)
Ideal (approximate) solution: an $\epsilon$-Nash equilibrium $(x, y)$, where both players are happy: the $x$ player is happy when $f(x, y) \le \min_{x' \in \mathcal{X}} f(x', y) + \epsilon$, and the $y$ player is happy when $f(x, y) \ge \max_{y' \in \mathcal{Y}} f(x, y') - \epsilon$.
We assume $f$ is convex-concave $\Rightarrow$ a Nash equilibrium exists. (A concrete check of this condition for the bilinear case appears in the sketch below.)
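To make the $\epsilon$-Nash condition concrete for the bilinear games treated below, here is a minimal NumPy sketch (illustrative, not from the talk) that computes the duality gap of a candidate pair $(x, y)$ for a simplex-simplex matrix game; the pair is an $\epsilon$-Nash equilibrium exactly when the gap is at most $\epsilon$.

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of (x, y) for min_{x in simplex} max_{y in simplex} y^T A x.

    Over the simplex, the inner max and min are attained at vertices, so
    gap = max_i (A x)_i - min_j (A^T y)_j. The gap is nonnegative, and
    (x, y) is an epsilon-Nash equilibrium iff it is at most epsilon.
    """
    return np.max(A @ x) - np.min(A.T @ y)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 80))
x = np.ones(80) / 80   # uniform strategies as a naive candidate
y = np.ones(50) / 50
print(f"duality gap of uniform play: {duality_gap(A, x, y):.3f}")
```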

5–10. Our contributions:
1. A variance reduction framework for general (convex-concave) $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$: centered gradient estimator $\Rightarrow$ fast algorithm. Geometry matters.
2. Concrete centered gradient estimators for $f(x, y) = y^\top A x$, via "sampling from the difference" $\Rightarrow$ new runtimes for $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$.

11–17. Bilinear games: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$, with $A \in \mathbb{R}^{m \times n}$.
- Simplest case
- Local model for a smooth zero-sum game
- Important by themselves
Geometry matters:
- $\mathcal{X} = \mathcal{Y} =$ simplex: matrix games / LP (our work)
- $\mathcal{X} =$ Euclidean ball, $\mathcal{Y} =$ simplex: hard-margin SVM (our work)
- $\mathcal{X} = \mathcal{Y} =$ Euclidean ball: linear regression (Balamurugan & Bach '16)

18–29. Algorithms and rates for $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$, $A \in \mathbb{R}^{m \times n}$. For simplicity take $m \asymp n$, so $x \mapsto Ax$ takes $n^2$ time. Geometry matters through the Lipschitz constant:
$L = \max_{ij} |A_{ij}|$ (simplex-simplex) or $L = \max_i \|A_{i:}\|_2$ (simplex-ball).
Runtimes:
- Exact gradient (Nemirovski '04, Nesterov '07): $n^2 \cdot L/\epsilon$
- Stochastic gradient (GK95, NJLS09, CHW10): $n \cdot L^2/\epsilon^2$
- Variance reduction (our approach): $n^2 + n^{3/2} \cdot L/\epsilon$
VR is always better than the exact-gradient rate, and better than stochastic gradient whenever the latter needs $\Omega(1)$ passes over the data; see the sketch below. [Runtime-comparison figure omitted; image credit: Chawit Waewsawangwong]
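As a quick sanity check on these rates (a sketch with constants ignored, not from the talk), one can tabulate the three bounds and see the crossover: VR overtakes SGD once $L/\epsilon \gtrsim \sqrt{n}$, which is exactly when SGD needs at least a constant number of full passes over the $n^2$-sized data.

```python
# Compare the three runtime bounds for m ≍ n (constants ignored).
def exact_grad(n, L_over_eps):
    return n**2 * L_over_eps

def sgd(n, L_over_eps):
    return n * L_over_eps**2

def variance_reduction(n, L_over_eps):
    return n**2 + n**1.5 * L_over_eps

n = 10_000
for L_over_eps in (10, 100, 1_000):  # sqrt(n) = 100 marks the SGD/VR crossover
    print(f"L/eps = {L_over_eps:>5}: "
          f"exact = {exact_grad(n, L_over_eps):.1e}, "
          f"sgd = {sgd(n, L_over_eps):.1e}, "
          f"vr = {variance_reduction(n, L_over_eps):.1e}")
```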

30–32. It's all in the gradient estimator. Fix a reference point $x_0$. A centered gradient estimator $g_{x_0}(\cdot)$ satisfies
$\mathbb{E}\, g_{x_0}(x) = \nabla f(x)$ and $\mathbb{E}\|g_{x_0}(x) - \nabla f(x)\|_*^2 \le L^2 \|x - x_0\|^2$ (the variance bound).
Also using this concept in the Euclidean setting: VR for non-convex optimization (AH'16, RHSPS'16, FLLZ'18, ZXG'18) & bilinear saddle-point problems (BB'16).
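To make this concrete for the $x$-block gradient $\nabla_x (y^\top A x) = A^\top y$, here is a minimal NumPy sketch of the "sampling from the difference" idea named above: pay $n^2$ once for $A^\top y_0$ at the reference point, then estimate the correction $A^\top(y - y_0)$ from a single row of $A$, sampled with probability proportional to $|y_i - [y_0]_i|$. The estimator is unbiased and its fluctuation scales with $\|y - y_0\|_1$, matching the centering property; the specific normalization and code details here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 80
A = rng.standard_normal((m, n))

def centered_grad_x(A, y, y0, ATy0, rng):
    """One-sample 'sampling from the difference' estimate of A^T y.

    A^T y0 is precomputed (one n^2 cost per reference point); the
    correction A^T (y - y0) is estimated from a single row of A,
    sampled with probability p_i proportional to |y_i - y0_i|.
    """
    d = y - y0
    norm1 = np.abs(d).sum()
    if norm1 == 0.0:
        return ATy0.copy()
    p = np.abs(d) / norm1
    i = rng.choice(len(d), p=p)
    # Unbiased: E[A[i] * d[i] / p[i]] = sum_i A[i] * d[i] = A^T (y - y0).
    return ATy0 + A[i] * (d[i] / p[i])

# Sanity check: the estimator averages to the true gradient A^T y.
y0 = np.ones(m) / m
y = rng.dirichlet(np.ones(m))
ATy0 = A.T @ y0
est = np.mean([centered_grad_x(A, y, y0, ATy0, rng) for _ in range(20_000)], axis=0)
print("max abs error vs A^T y:", np.max(np.abs(est - A.T @ y)))
```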
