Frank-Wolfe Algorithms for Saddle Point Problems



  1. Frank-Wolfe Algorithms for Saddle Point Problems. Author: Gauthier Gidel. Supervisors: Simon Lacoste-Julien & Tony Jebara. INRIA Paris, Sierra Team & Columbia University. September 15th, 2016.

  2. Overview ◮ Machine learning needs to tackle complicated optimization problems ⇒ ML needs optimization. ◮ The Frank-Wolfe algorithm (FW) has gained popularity over the last couple of years. ◮ It is a convex optimization algorithm for solving constrained problems. ◮ We extended FW to saddle-point optimization, which is non-trivial (we partially answered a 30-year-old conjecture).

  3. Motivations: games. Zero-sum games with two players: ◮ Player 1 has actions {1, ..., I} available. ◮ Player 2 has actions {1, ..., J} available. ◮ If Player 1 plays action i and Player 2 plays action j, Player 1 receives the reward M_ij. ◮ When both players play randomly, x ∈ Δ_I, y ∈ Δ_J, the expected reward is E[M_ij] = xᵀMy. Nash equilibrium: (x*, y*) ∈ X × Y such that (x*)ᵀMy ≤ (x*)ᵀMy* ≤ xᵀMy* for all (x, y) ∈ X × Y.
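As a quick illustration (a hypothetical sketch, not from the slides): for a matrix game min_x max_y xᵀMy, the distance to a Nash equilibrium can be measured by the duality gap max_{y'} xᵀMy' − min_{x'} x'ᵀMy, which vanishes exactly at equilibrium. The payoff matrix below is an illustrative choice:

```python
import numpy as np

def nash_gap(M, x, y):
    """Duality gap of the matrix game min_x max_y x^T M y.

    Over simplices the inner optima are attained at vertices, so the gap
    is max_j (x^T M)_j - min_i (M y)_i; it is zero iff (x, y) is a Nash
    equilibrium.
    """
    return np.max(x @ M) - np.min(M @ y)

# Matching pennies: the uniform strategies form the equilibrium.
M = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = y = np.array([0.5, 0.5])
print(nash_gap(M, x, y))  # 0.0
```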

  4. Saddle-point setting. Let L : X × Y → ℝ, where X and Y are convex and compact. • Intuition from two-player games: ◮ L is a score function. ◮ P1 chooses an action in X and wants to minimize the score. ◮ P2 chooses an action in Y and wants to maximize the score. ◮ The saddle point is the pair of best choices for the two players. • L is said to be convex-concave if: 1. ∀ y ∈ Y, x ↦ L(x, y) is convex. 2. ∀ x ∈ X, y ↦ L(x, y) is concave. • A saddle point is a pair (x*, y*) such that, ∀ (x, y) ∈ X × Y, L(x*, y) ≤ L(x*, y*) ≤ L(x, y*).

  5. Motivations: more applications. Robust learning:¹ we want to learn

  min_{θ∈Θ} (1/n) Σ_{i=1}^n ℓ(f_θ(x_i), y_i) + λΩ(θ)    (1)

  with an uncertainty regarding the data:

  min_{θ∈Θ} max_{w∈Δ_n} Σ_{i=1}^n w_i ℓ(f_θ(x_i), y_i) + λΩ(θ)    (2)

  ¹ Junfeng Wen, Chun-Nam Yu, and Russell Greiner. "Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification". In: ICML. 2014, pp. 631–639.
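A one-line consequence of (2), shown as a hedged sketch with made-up loss values: when w ranges over the whole simplex Δ_n, the inner maximization puts all its mass on the worst sample, so the adversarial term equals the maximum per-sample loss.

```python
import numpy as np

# Inner max of (2) over the full simplex: max_{w in Δ_n} Σ_i w_i * loss_i
# concentrates on the largest loss, so it equals max_i loss_i.
# (Practical robust-learning variants restrict w to a subset of Δ_n.)
losses = np.array([0.2, 1.5, 0.7])   # illustrative per-sample losses
w_star = np.zeros_like(losses)
w_star[np.argmax(losses)] = 1.0      # worst-case weighting
assert w_star @ losses == losses.max()
```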

  6. Standard approaches in the literature. The standard algorithm to solve saddle-point problems is the projected gradient algorithm:

  x^(t+1) = P_X(x^(t) − η ∇_x L(x^(t), y^(t)))
  y^(t+1) = P_Y(y^(t) + η ∇_y L(x^(t), y^(t)))

  When the gradient is uniformly bounded, the averaged iterates converge:

  (1/T) Σ_{t=1}^T (x^(t), y^(t)) → (x*, y*) as T → ∞    (3)
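A minimal runnable sketch of this scheme on the bilinear game L(x, y) = xᵀMy over two simplices; the matrix M, step size η, and horizon T are illustrative choices, not from the slides. Note that it is the averaged iterates of (3) that converge; the last iterates typically cycle on bilinear games.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
x, y = np.full(5, 0.2), np.full(5, 0.2)
x_avg, y_avg = np.zeros(5), np.zeros(5)
eta, T = 0.1, 2000
for t in range(T):
    # Simultaneous projected gradient step: descent in x, ascent in y.
    x, y = (proj_simplex(x - eta * (M @ y)),
            proj_simplex(y + eta * (M.T @ x)))
    x_avg += x / T
    y_avg += y / T
```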

  7. The FW algorithm. Initialize x^(0). For t = 0, ..., T do: ◮ Compute s^(t) := argmin_{s∈X} ⟨s, ∇f(x^(t))⟩. ◮ Let γ_t = 2/(2+t). ◮ Update x^(t+1) = x^(t) + γ_t (s^(t) − x^(t)). End for. Figure: one step of the FW algorithm.
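A minimal sketch of these three steps on a toy problem, min over the simplex of f(x) = ½‖x − b‖² (the objective, b, and the horizon are illustrative, not from the slides); the LMO over a simplex simply returns a vertex.

```python
import numpy as np

d = 10
b = np.linspace(0.0, 1.0, d)           # illustrative target
grad = lambda x: x - b                 # gradient of f(x) = 0.5*||x - b||^2

x = np.full(d, 1.0 / d)                # x(0): uniform starting point
for t in range(200):
    g = grad(x)
    s = np.zeros(d)
    s[np.argmin(g)] = 1.0              # LMO over the simplex: best vertex
    gamma = 2.0 / (2.0 + t)            # the universal step size
    x = x + gamma * (s - x)            # convex combination stays feasible
```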

  8. SP-FW. A saddle-point version of the Frank-Wolfe algorithm: ◮ Let z^(0) = (x^(0), y^(0)) ∈ X × Y. ◮ For t = 0, ..., T: ◮ Compute G = (∇_x L(x^(t), y^(t)), −∇_y L(x^(t), y^(t))). ◮ Compute s^(t) := argmin_{s∈X×Y} ⟨s, G⟩. ◮ Let γ_t = 2/(2+t). ◮ Update z^(t+1) := (1 − γ_t) z^(t) + γ_t s^(t). ◮ Return (x^(T), y^(T)).
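A minimal SP-FW sketch on the bilinear game L(x, y) = xᵀMy over a product of simplices (M is an illustrative random payoff matrix; the LMO over a product set splits into one vertex per block). Note that the convergence theory on slide 12 requires strong convex-concavity, which a bilinear game lacks, so this only illustrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 6))
x, y = np.full(4, 0.25), np.full(6, 1.0 / 6)

for t in range(500):
    gx, gy = M @ y, -(M.T @ x)                        # G = (grad_x L, -grad_y L)
    sx = np.zeros_like(x); sx[np.argmin(gx)] = 1.0    # LMO, x-block
    sy = np.zeros_like(y); sy[np.argmin(gy)] = 1.0    # LMO, y-block
    gamma = 2.0 / (2.0 + t)
    x = (1.0 - gamma) * x + gamma * sx
    y = (1.0 - gamma) * y + gamma * sy
```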

  9. Advantages of SP-FW. Why would we use SP-FW? ◮ Only an LMO (linear minimization oracle) is needed. ◮ Gap certificate for free. ◮ Simplicity of implementation. ◮ Universal step size γ_t = 2/(2+t), adaptive step size γ_t = g_t/(2C_L), ... ◮ Sparsity of the solution. ◮ Lots of improvements easily available: block-coordinate, away steps, ... When the constraint set is a "complicated" polytope, the projection can be very hard whereas the LMO might be tractable.
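The gap certificate really is a by-product: with the iterate z, the LMO output s, and the stacked gradient G from slide 8, one extra inner product gives g_t. A hedged one-liner, with names matching the SP-FW sketch above:

```python
# g_t = <z - s, G>; in the sketch above, with z = (x, y) and s = (sx, sy):
g_t = (x - sx) @ gx + (y - sy) @ gy   # >= 0, and 0 only at a saddle point
```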

  10. Problems with hard projection. The structured SVM:

  min_ω λΩ(ω) + (1/n) Σ_{i=1}^n H̃_i(ω)

  where H̃_i(ω) = max_{y∈Y_i} L_i(y) − ⟨ω, φ_i(y)⟩ is the structured hinge loss. Then we can rewrite the problem as

  min_{Ω(ω)≤β} max_{y_i∈Y_i} (1/n) Σ_{i=1}^n (L_iᵀ y_i − ωᵀ M_i y_i)

  and, since the function is bilinear,

  min_{Ω(ω)≤β} max_{α∈Δ(|Y|)} bᵀα − ωᵀMα

  If Ω(·) is a group-lasso norm with overlapping groups, the projection is hard, and projecting onto Y is intractable.
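For intuition, an illustrative sketch (not the paper's oracles): LMOs over simple sets return extreme points. Over a simplex the LMO picks the single best structure (loss-augmented decoding); over an ℓ2 ball of radius β, standing in here for a generic norm ball Ω(ω) ≤ β, it scales the negative gradient. The true overlapping group-lasso LMO differs.

```python
import numpy as np

def lmo_simplex(scores):
    """argmin_{s in Δ} <s, scores>: the vertex of the best coordinate.

    For the α-block this is loss-augmented decoding: pick one structure.
    """
    s = np.zeros_like(scores)
    s[np.argmin(scores)] = 1.0
    return s

def lmo_l2_ball(grad, beta):
    """argmin_{||w||_2 <= beta} <w, grad> = -beta * grad / ||grad||_2."""
    return -beta * grad / max(np.linalg.norm(grad), 1e-12)
```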

  11. Problems with hard projection. University game: 1. A game between two universities (A and B). 2. Each admits d students and has to assign pairs of students to dorms. 3. The game has a payoff matrix M belonging to ℝ^{(d(d−1)/2) × (d(d−1)/2)}. 4. M_{ij,kl} is the expected tuition that B gets (or A gives up) if A pairs student i with j and B pairs student k with l. 5. Here the actions both live in the marginal polytope of all perfect unipartite matchings. It is hard to project onto this polytope, whereas the LMO can be solved efficiently with the blossom algorithm.² ² J. Edmonds. "Paths, Trees, and Flowers". In: Canadian Journal of Mathematics (1965).
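A hedged sketch of that LMO (the pair encoding and the use of networkx's blossom implementation are my illustrative choices): minimizing ⟨s, G⟩ over the matching polytope amounts to a maximum-weight perfect matching with weights −G.

```python
import itertools
import networkx as nx
import numpy as np

def lmo_matching(grad, d):
    """LMO over the perfect-matching polytope on d students (d even).

    grad[i, j] is the gradient entry for pairing students i and j; the
    minimizer of <s, grad> over the polytope is a vertex, i.e. the
    maximum-weight perfect matching for weights -grad, found by
    Edmonds' blossom algorithm (here via networkx).
    """
    G = nx.Graph()
    for i, j in itertools.combinations(range(d), 2):
        G.add_edge(i, j, weight=-grad[i, j])
    match = nx.max_weight_matching(G, maxcardinality=True)
    s = np.zeros((d, d))          # return the matching as a 0/1 vertex
    for i, j in match:
        s[i, j] = s[j, i] = 1.0
    return s

# Illustrative use on a random gradient for d = 6 students.
rng = np.random.default_rng(0)
print(lmo_matching(rng.standard_normal((6, 6)), 6))
```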

  12. Our contributions. Theoretical contributions: ◮ We introduced an SP extension of FW with away steps and proved its convergence over a polytope under some conditions (the strong convexity of the function must be big enough), partially answering a 30-year-old conjecture.³ ◮ With a step size γ_t ∼ g_t, the suboptimality satisfies

  h_t = O((1 − ρ)^{t/3})    (4)

  ³ Janice H. Hammond. "Solving Asymmetric Variational Inequality Problems and Systems of Equations with Generalized Nonlinear Programming Algorithms". PhD thesis. Massachusetts Institute of Technology, 1984.

  13. Toy experiments. Figure: SP-AFW on a toy example, d = 30 (duality gap vs. iteration). Figure: SP-AFW on a toy example, d = 30, with heuristic step size (duality gap vs. iteration). The curves compare the step-size rules γ = 2/(2+k), γ heuristic, and γ adaptive for various values of τ.

  14. Experiments. Figure: SP-FW on the university game (duality gap vs. iteration, for dimensions d = 28 up to d = 32640). Figure: structural SVM with the OCR dataset, highly regularized (primal suboptimality vs. effective passes; SP-FW with γ = 2/(2+k) and γ = 1/(1+k), SP-BCFW with γ = 2n/(2n+k), and subgradient/SSG baselines).

  15. Conclusion ◮ There already exist a lot of saddle-point problems in the machine learning literature, and most of the time they are solved by a trick. ◮ There exist only a few algorithms that solve SP problems directly (and they are not well known)! ◮ SP-FW works directly on SPs and is the only existing algorithm able to solve some of these problems.

  16. Thank You!
