Boosting Frank-Wolfe by Chasing Gradients
Cyrille W. Combettes, with Sebastian Pokutta
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
37th International Conference on Machine Learning, July 12–18, 2020
Outline
1. Introduction
2. The Frank-Wolfe algorithm
3. Boosting Frank-Wolfe
4. Computational experiments
Introduction

Let H be a Euclidean space (e.g., ℝ^n or ℝ^{m×n}) and consider

  min f(x)  s.t.  x ∈ C

where
• f : H → ℝ is a smooth convex function
• C ⊂ H is a compact convex set, C = conv(V)

Examples
• Sparse logistic regression:

  min_{x ∈ ℝ^n}  (1/m) ∑_{i=1}^{m} ln(1 + exp(−y_i a_i^⊤ x))   s.t.  ‖x‖_1 ≤ τ

• Low-rank matrix completion:

  min_{X ∈ ℝ^{m×n}}  (1/(2|I|)) ∑_{(i,j) ∈ I} (Y_{i,j} − X_{i,j})^2   s.t.  ‖X‖_nuc ≤ τ
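To make the first example concrete, here is a minimal NumPy sketch of the sparse logistic regression objective and its gradient. The function name and the data layout (A ∈ ℝ^{m×n} with rows a_i, labels y ∈ {−1, +1}^m) are our own illustration, not from the talk; the ℓ1-ball constraint is left to the solver.

```python
import numpy as np

def logistic_loss_and_grad(x, A, y):
    """f(x) = (1/m) * sum_i ln(1 + exp(-y_i * <a_i, x>)).

    A: (m, n) data matrix with rows a_i; y: labels in {-1, +1}.
    The constraint ||x||_1 <= tau is enforced by the solver, not here.
    """
    m = A.shape[0]
    margins = -y * (A @ x)                        # z_i = -y_i <a_i, x>
    loss = np.mean(np.logaddexp(0.0, margins))    # stable ln(1 + exp(z_i))
    # d/dx ln(1 + exp(z_i)) = sigmoid(z_i) * dz_i/dx, with dz_i/dx = -y_i a_i
    coeffs = -y / (1.0 + np.exp(-margins))        # sigmoid(z_i) * (-y_i)
    grad = A.T @ coeffs / m
    return loss, grad
```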
Introduction

• A natural approach is to use any efficient method and add projections back onto C to ensure feasibility

[Figure: the gradient step x_t − γ_t ∇f(x_t) leaves C and is projected back onto C to obtain x_{t+1}]

• However, in many situations projections onto C are very expensive
• This is an issue with the method of projections, not necessarily with the geometry of C: linear minimizations over C can still be relatively cheap

  Feasible region C             Linear minimization    Projection
  ℓ1/ℓ2/ℓ∞-ball                 O(n)                   O(n)
  ℓp-ball, p ∈ ]1,∞[ \ {2}      O(n)                   N/A
  Nuclear norm-ball             O(nnz)                 O(mn min{m,n})
  Flow polytope                 O(n)                   O(n^3.5)
  Birkhoff polytope             O(n^3)                 N/A
  Matroid polytope              O(n ln(n))             O(poly(n))

  N/A: no closed form exists and the solution must be computed via nontrivial optimization

• Can we avoid projections?
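To illustrate the first table row, a linear minimization over the ℓ1-ball reduces to one pass over the gradient: a linear function over conv(V) is minimized at a vertex, here a signed, scaled coordinate vector. A minimal sketch (the function name is ours):

```python
import numpy as np

def lmo_l1_ball(grad, tau):
    """argmin_{||v||_1 <= tau} <grad, v>, computed in O(n).

    The l1-ball is conv({+/- tau * e_1, ..., +/- tau * e_n}), so the
    minimizer is the vertex opposing the largest-magnitude gradient entry.
    """
    i = np.argmax(np.abs(grad))       # best coordinate: a single O(n) pass
    v = np.zeros_like(grad)
    v[i] = -tau * np.sign(grad[i])    # oppose the sign of that entry
    return v
```

By contrast, projecting onto, e.g., the nuclear norm-ball requires a full SVD, which is where the gap in the table comes from.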
The Frank-Wolfe algorithm

The Frank-Wolfe algorithm (Frank & Wolfe, 1956), a.k.a. the conditional gradient algorithm (Levitin & Polyak, 1966):

Algorithm Frank-Wolfe (FW)
Input: x_0 ∈ C, step-sizes γ_t ∈ [0, 1]
1: for t = 0 to T − 1 do
2:   v_t ← argmin_{v ∈ V} ⟨∇f(x_t), v⟩
3:   x_{t+1} ← x_t + γ_t (v_t − x_t)

[Figure: from x_t, the vertex v_t minimizes ⟨∇f(x_t), ·⟩ over C, and the iterate moves along v_t − x_t to x_{t+1}]

• x_{t+1} is obtained by convex combination of x_t ∈ C and v_t ∈ C, thus x_{t+1} ∈ C
• FW uses linear minimizations (the "FW oracle") instead of projections
• FW = pick a vertex (using gradient information) and move in that direction
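Putting the pieces together, here is a minimal sketch of the FW loop. It uses the standard open-loop schedule γ_t = 2/(t + 2) from the FW literature (the algorithm above only requires γ_t ∈ [0, 1]); grad_f and lmo are placeholders, e.g., the two sketches above.

```python
def frank_wolfe(grad_f, lmo, x0, T):
    """Vanilla Frank-Wolfe: T linear minimizations, no projections.

    grad_f: x -> gradient of f at x.
    lmo:    g -> argmin_{v in V} <g, v> (the FW oracle).
    x0:     a feasible starting point in C.
    """
    x = x0.copy()
    for t in range(T):
        v = lmo(grad_f(x))         # line 2: pick a vertex via the gradient
        gamma = 2.0 / (t + 2.0)    # open-loop step-size in [0, 1]
        x = x + gamma * (v - x)    # line 3: convex combination, stays in C
    return x

# Illustrative use: sparse logistic regression over the l1-ball
# x_hat = frank_wolfe(lambda x: logistic_loss_and_grad(x, A, y)[1],
#                     lambda g: lmo_l1_ball(g, tau),
#                     x0=np.zeros(n), T=1000)
```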