Poster #212: Variance Reduction for Matrix Games
Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian (presenting)
Zero-sum games: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$

Super useful!
• Constraints: $y$ checks feasibility (e.g., GANs)
• Robustness: $y$ represents uncertainty (e.g., adversarial training)

Ideal (approximate) solution: an $\epsilon$-Nash equilibrium $(x, y)$, where both players are happy:
$$f(x, y) \ge \max_{y' \in \mathcal{Y}} f(x, y') - \epsilon \quad \text{(the $y$ player is happy)}$$
$$f(x, y) \le \min_{x' \in \mathcal{X}} f(x', y) + \epsilon \quad \text{(the $x$ player is happy)}$$

We assume $f$ is convex-concave $\Longrightarrow$ a Nash equilibrium exists.
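For the matrix games studied below, the two conditions can be checked with one matrix-vector product each. A minimal sketch (the helper name `nash_gap` and the random instance are illustrative, not from the paper):

```python
import numpy as np

def nash_gap(A, x, y):
    """Duality gap of (x, y) for min_x max_y y^T A x over probability simplices.

    Over a simplex the inner max/min are attained at vertices, so the
    y player's best-response value is max_i (Ax)_i and the x player's is
    min_j (A^T y)_j. A gap of at most eps certifies an eps-Nash equilibrium.
    """
    return np.max(A @ x) - np.min(A.T @ y)

# Hypothetical example: uniform strategies in a random 50x50 matrix game.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
x = y = np.full(50, 1 / 50)
print(nash_gap(A, x, y))
```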
Our contributions

1. A variance reduction framework for general convex-concave $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$: a centered gradient estimator plus a fast algorithm (geometry matters).

2. Concrete centered gradient estimators for $f(x, y) = y^\top A x$ via "sampling from the difference" (see the sketch below) $\Longrightarrow$ new runtimes for $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$.
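To make "sampling from the difference" concrete, here is a minimal sketch of one such estimator for the $x$-gradient $\nabla_x f(x, y) = A^\top y$ in the simplex setting. The helper name and sampling constants are illustrative; the paper's estimators include further details (the $y$-gradient, other geometries):

```python
import numpy as np

def centered_grad_x(A, y, y0, Aty0, rng):
    """Estimate grad_x f = A^T y by 'sampling from the difference' y - y0.

    Keeps the exact gradient A^T y0 at the reference point y0 and corrects
    it with one row of A, sampled with probability proportional to
    |y_i - y0_i|. Unbiased: E[g] = A^T y0 + A^T (y - y0) = A^T y, and the
    correction shrinks as y approaches y0.
    """
    d = y - y0
    total = np.abs(d).sum()
    if total == 0:
        return Aty0  # at the reference point the estimator is exact
    p = np.abs(d) / total
    i = rng.choice(len(y), p=p)
    return Aty0 + (d[i] / p[i]) * A[i, :]
```

Each call touches a single row of $A$, so it costs $O(n)$ time after the one-time $O(n^2)$ computation of $A^\top y_0$; this is the source of the $n^2 + n^{3/2} L / \epsilon$ runtime below.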
Bilinear games: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$, $A \in \mathbb{R}^{m \times n}$
• Simplest case
• Local model for smooth zero-sum games
• Important by themselves

Geometry matters:
• $\mathcal{X} = \mathcal{Y} =$ simplex: matrix games / LP
• $\mathcal{X} =$ Euclidean ball, $\mathcal{Y} =$ simplex: hard-margin SVM
• $\mathcal{X} = \mathcal{Y} =$ Euclidean ball: linear regression

Prior variance reduction (Balamurugan & Bach '16) addresses the Euclidean setting; our work covers all three geometries.
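As a sanity check on the SVM entry, here is one standard reduction (a sketch; up to a sign flip of $A$ to match the min-max convention above). For labeled examples $(a_i, b_i)$, set the rows $A_{i:} = b_i a_i^\top$; then maximizing the margin is a ball-simplex bilinear game:

$$\max_{\|x\|_2 \le 1} \, \min_{i \in [m]} \, b_i \langle a_i, x \rangle \;=\; \max_{\|x\|_2 \le 1} \, \min_{y \in \Delta^m} \, y^\top A x .$$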
Algorithms and rates for $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$, $A \in \mathbb{R}^{m \times n}$.
For simplicity take $m \approx n$, so computing $x \mapsto Ax$ takes $n^2$ time. Geometry matters:
$$L = \begin{cases} \max_{ij} |A_{ij}| & \text{simplex-simplex} \\ \max_i \|A_{i:}\|_2 & \text{simplex-ball} \end{cases}$$

• Exact gradient (Nemirovski '04, Nesterov '07): $n^2 \cdot L/\epsilon$
• Stochastic gradient (GK95, NJLS09, CHW10): $n \cdot L^2/\epsilon^2$
• Variance reduction (our approach): $n^2 + n^{3/2} \cdot L/\epsilon$

VR is always better than the exact-gradient rate, and better than stochastic gradient whenever $\Omega(1)$ passes over the data are required.

[Runtime comparison figure omitted; image credit: Chawit Waewsawangwong]
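A quick check of the crossover (up to constant factors): the stochastic-gradient runtime exceeds the variance-reduction runtime exactly when

$$\frac{n L^2}{\epsilon^2} \;\gtrsim\; n^2 + \frac{n^{3/2} L}{\epsilon} \quad \Longleftrightarrow \quad \epsilon \;\lesssim\; \frac{L}{\sqrt{n}},$$

and $\epsilon \le L/\sqrt{n}$ is precisely when $n L^2/\epsilon^2 \ge n^2$, i.e., when stochastic gradient needs at least one full pass over $A$.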
It's all in the gradient estimator

Given a reference point $x_0$, a centered gradient estimator $g_{x_0}(\cdot)$ satisfies
$$\mathbb{E}\, g_{x_0}(x) = \nabla f(x) \quad \text{(unbiased)}$$
$$\mathbb{E}\, \|g_{x_0}(x) - \nabla f(x)\|_*^2 \le L^2 \|x - x_0\|^2 \quad \text{(variance bounded by the distance to $x_0$)}$$

Also using this concept in the Euclidean setting: VR for non-convex optimization (AH`16, RHSPS`16, FLLZ`18, ZXG`18) and bilinear saddle-point problems (BB`16).
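The estimator sketched earlier satisfies this definition in the $\ell_1$ geometry (with the $\ell_\infty$ dual norm). A hypothetical numerical check, reusing `centered_grad_x` from the sketch above; the constant $2 \max_{ij} |A_{ij}|$ is a crude stand-in for $L$, not the paper's sharp constant:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 30
A = rng.standard_normal((m, n))
y0 = rng.dirichlet(np.ones(m))   # reference point on the simplex
y = rng.dirichlet(np.ones(m))    # current iterate
Aty0 = A.T @ y0                  # exact gradient at the reference point
exact = A.T @ y                  # exact gradient at the current iterate

g = np.array([centered_grad_x(A, y, y0, Aty0, rng) for _ in range(20000)])

# Unbiasedness: the empirical mean of g should match A^T y.
print("max bias:", np.abs(g.mean(axis=0) - exact).max())

# Centering: E ||g - grad||_inf^2 <= L^2 ||y - y0||_1^2 with L = 2 max|A_ij|.
mse = (np.abs(g - exact).max(axis=1) ** 2).mean()
bound = (2 * np.abs(A).max() * np.abs(y - y0).sum()) ** 2
print("mse:", mse, "<= bound:", bound)
```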