Frank-Wolfe Algorithms for Saddle Point Problems
Gauthier Gidel 1,3   Tony Jebara 2   Simon Lacoste-Julien 3
1 INRIA Paris, Sierra Team   2 Department of CS, Columbia University   3 Department of CS & OR (DIRO), Université de Montréal
25th May 2017
Overview
◮ The Frank-Wolfe algorithm (FW) has gained popularity in recent years.
◮ Main advantage: FW only needs a linear minimization oracle (LMO).
◮ We extend the properties of FW to solve saddle point problems. 1
◮ The extension is straightforward, but the analysis is non-trivial.
Question for the audience: a call for applications.

1 Gauthier Gidel, Tony Jebara, and Simon Lacoste-Julien. "Frank-Wolfe Algorithms for Saddle Point Problems". In: AISTATS. 2017.
Saddle points and the link with variational inequalities
Let $\mathcal{L} : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, where $\mathcal{X}$ and $\mathcal{Y}$ are convex and compact.
Saddle point problem: solve
$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \mathcal{L}(x, y).$$
A solution $(x^*, y^*)$ is called a saddle point.
◮ Necessary stationarity conditions:
$$\langle x - x^*, \nabla_x \mathcal{L}(x^*, y^*) \rangle \geq 0 \quad \forall x \in \mathcal{X},$$
$$\langle y - y^*, -\nabla_y \mathcal{L}(x^*, y^*) \rangle \geq 0 \quad \forall y \in \mathcal{Y}.$$
◮ Variational inequality:
$$\langle z - z^*, g(z^*) \rangle \geq 0 \quad \forall z \in \mathcal{X} \times \mathcal{Y},$$
where $z^* = (x^*, y^*)$ and $g(z) = (\nabla_x \mathcal{L}(z), -\nabla_y \mathcal{L}(z))$.
◮ Sufficient condition: these stationarity conditions give a global solution if $\mathcal{L}$ is convex-concave, i.e., for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$, $x' \mapsto \mathcal{L}(x', y)$ is convex and $y' \mapsto \mathcal{L}(x, y')$ is concave.
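To make these conditions concrete, here is a minimal worked example (our own illustration, not from the talk): the bilinear game $\mathcal{L}(x, y) = xy$ on $\mathcal{X} = \mathcal{Y} = [-1, 1]$.

```latex
% Bilinear toy example L(x,y) = xy on [-1,1]^2 (our illustration).
% The VI operator and its value at the candidate z^* = (0,0):
\[
  g(z) = \bigl(\nabla_x \mathcal{L}(z),\, -\nabla_y \mathcal{L}(z)\bigr)
       = (y,\, -x),
  \qquad g(z^*) = (0, 0).
\]
% The variational inequality then holds trivially:
\[
  \langle z - z^*,\, g(z^*) \rangle = 0 \;\geq\; 0
  \quad \forall z \in [-1,1]^2 .
\]
% Since L is linear (hence convex) in x and linear (hence concave) in y,
% (0,0) is a global saddle point: min_x max_y xy = max_y min_x xy = 0.
```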
Motivations: games and robust learning
◮ Zero-sum games with two players:
$$\min_{x \in \Delta(I)} \max_{y \in \Delta(J)} x^\top M y.$$
◮ Robust learning: 2 we want to learn
$$\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n \ell(f_\theta(x_i), y_i) + \lambda \Omega(\theta),$$
but with uncertainty regarding the data distribution:
$$\min_{\theta \in \Theta} \max_{\omega \in \Delta_n} \sum_{i=1}^n \omega_i \, \ell(f_\theta(x_i), y_i) + \lambda \Omega(\theta).$$
Minimizing the worst case gives robustness; a sketch of this worst-case objective follows below.

2 J. Wen, C. Yu, and R. Greiner. "Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification". In: ICML. 2014.
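As a quick illustration (our sketch, not code from the talk): for a fixed $\theta$, the inner maximum of the weighted loss over the full simplex is linear in $\omega$, so it is attained at a vertex, i.e., the worst-case weights put all mass on the largest per-example loss.

```python
import numpy as np

def worst_case_weights(losses):
    """Inner max of sum_i w_i * losses[i] over the probability simplex.

    A linear function over the simplex is maximized at a vertex, so the
    worst-case weights put all mass on the largest per-example loss.
    """
    w = np.zeros_like(losses)
    w[np.argmax(losses)] = 1.0
    return w, float(losses @ w)

# Toy per-example losses for some fixed model theta (made-up numbers):
losses = np.array([0.2, 1.5, 0.7])
w_star, value = worst_case_weights(losses)
print(w_star, value)  # [0. 1. 0.] 1.5
```

In practice, robust formulations typically restrict $\omega$ to a smaller uncertainty set around the uniform weights, which keeps the objective from collapsing to the single worst example.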
Problems with hard projections
The structured SVM:
$$\min_{\omega \in \mathbb{R}^d} \lambda \Omega(\omega) + \underbrace{\frac{1}{n} \sum_{i=1}^n \max_{y \in \mathcal{Y}_i} \bigl( L_i(y) - \langle \omega, \phi_i(y) \rangle \bigr)}_{\text{structured empirical loss}}.$$
Moving the regularization from penalized to constrained form yields a bilinear saddle point problem:
$$\min_{\Omega(\omega) \leq \beta} \max_{\alpha \in \Delta(|\mathcal{Y}|)} b^\top \alpha - \omega^\top M \alpha.$$
Projection is difficult when:
◮ $\Omega$ is a structured sparsity norm (e.g., a group lasso norm).
◮ The output space $\mathcal{Y}$ is structured, hence of exponential size.
In both cases a linear minimization oracle can remain cheap, as sketched below.
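To illustrate why an LMO can be cheap where a projection is not (our sketch, with the $\ell_1$ ball standing in for a structured-sparsity ball): the FW linear minimization oracle over $\{\|\omega\|_1 \leq \beta\}$ only inspects the largest-magnitude gradient coordinate, whereas a Euclidean projection onto the same ball already requires a sort-based algorithm.

```python
import numpy as np

def lmo_l1_ball(grad, beta):
    """Linear minimization oracle over the l1 ball {||s||_1 <= beta}:
    argmin_s <grad, s>. The minimizer is a signed, scaled vertex of the
    ball, found in a single pass over the gradient -- no projection.
    """
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -beta * np.sign(grad[i])
    return s

grad = np.array([0.3, -2.0, 1.1])
print(lmo_l1_ball(grad, beta=1.0))  # [0. 1. 0.]: all mass on coordinate 1
```

For structured outputs, the analogous LMO over $\Delta(|\mathcal{Y}|)$ reduces to a loss-augmented decoding (MAP) call, which is exactly the oracle that structured SVM solvers already assume.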
Standard approaches in the literature
◮ Projected gradient algorithm:
$$x^{(t+1)} = P_\mathcal{X}\bigl(x^{(t)} - \eta \nabla_x \mathcal{L}(x^{(t)}, y^{(t)})\bigr),$$
$$y^{(t+1)} = P_\mathcal{Y}\bigl(y^{(t)} + \eta \nabla_y \mathcal{L}(x^{(t)}, y^{(t)})\bigr).$$
◮ Projected extragradient: 3 first an extrapolation step,
$$\bar{x}^{(t+1)} = P_\mathcal{X}\bigl(x^{(t)} - \eta \nabla_x \mathcal{L}(x^{(t)}, y^{(t)})\bigr),$$
$$\bar{y}^{(t+1)} = P_\mathcal{Y}\bigl(y^{(t)} + \eta \nabla_y \mathcal{L}(x^{(t)}, y^{(t)})\bigr),$$
then the actual update, using the gradient at the extrapolated point:
$$x^{(t+1)} = P_\mathcal{X}\bigl(x^{(t)} - \eta \nabla_x \mathcal{L}(\bar{x}^{(t+1)}, \bar{y}^{(t+1)})\bigr),$$
$$y^{(t+1)} = P_\mathcal{Y}\bigl(y^{(t)} + \eta \nabla_y \mathcal{L}(\bar{x}^{(t+1)}, \bar{y}^{(t+1)})\bigr).$$
Intuition: a lookahead move: look at what your opponent would do before deciding your own move. This prevents oscillations for non-strongly-convex objectives; see the sketch below.

3 G. M. Korpelevich. "The extragradient method for finding saddle points and other problems". In: Matecon (1976).
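A minimal numerical sketch of both updates (our illustration, not from the talk) on the unconstrained bilinear problem $\min_x \max_y \, xy$, whose saddle point is $(0, 0)$: the plain gradient iterates spiral outward, while the extragradient iterates contract.

```python
import numpy as np

def grad_step(x, y, eta):
    # Simultaneous gradient descent/ascent on L(x, y) = x * y.
    return x - eta * y, y + eta * x

def extragrad_step(x, y, eta):
    # Extrapolate (lookahead), then update with the lookahead gradient.
    xb, yb = x - eta * y, y + eta * x
    return x - eta * yb, y + eta * xb

x, y = 1.0, 1.0
for _ in range(100):
    x, y = grad_step(x, y, eta=0.1)
print(np.hypot(x, y))  # ~2.3: gradient iterates spiral away from (0, 0)

x, y = 1.0, 1.0
for _ in range(100):
    x, y = extragrad_step(x, y, eta=0.1)
print(np.hypot(x, y))  # ~0.86: extragradient iterates contract toward (0, 0)
```

The per-step behavior is easy to read off in complex notation $z = x + iy$: the gradient map multiplies $z$ by $1 + i\eta$ (modulus $> 1$), while the extragradient map multiplies it by $1 - \eta^2 + i\eta$ (modulus $< 1$ for small $\eta$).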
Standard approaches in the literature
◮ The gradient method works for non-smooth optimization, but only the averaged iterates converge:
$$\frac{1}{T} \sum_{t=1}^T \bigl(x^{(t)}, y^{(t)}\bigr) \xrightarrow{T \to \infty} (x^*, y^*).$$
◮ The extragradient method works for smooth optimization, with convergence of the iterates themselves: 4
$$\bigl(x^{(t)}, y^{(t)}\bigr) \to (x^*, y^*).$$

4 N. He and Z. Harchaoui. "Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization". In: NIPS. 2015.
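Continuing the bilinear toy example above, now constrained to the compact set $[-1, 1]^2$ (our illustration): the last projected-gradient iterate keeps orbiting the boundary, but the running average of the iterates drifts toward the saddle point $(0, 0)$.

```python
import numpy as np

def proj_grad_step(x, y, eta):
    # Projected gradient descent/ascent on L(x, y) = x * y over [-1, 1]^2.
    clip = lambda v: float(np.clip(v, -1.0, 1.0))
    return clip(x - eta * y), clip(y + eta * x)

x, y = 1.0, 1.0
avg_x, avg_y = 0.0, 0.0
T = 2000
for t in range(1, T + 1):
    x, y = proj_grad_step(x, y, eta=0.1)
    avg_x += (x - avg_x) / t  # running average of the iterates
    avg_y += (y - avg_y) / t
print(np.hypot(x, y))          # stays ~1-1.4: last iterate keeps orbiting
print(np.hypot(avg_x, avg_y))  # much smaller: the average approaches (0, 0)
```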