  1. Proximal Method with Contractions for Smooth Convex Optimization
     Nikita Doikov, Yurii Nesterov
     Catholic University of Louvain, Belgium
     Grenoble, September 23, 2019

  2. Plan of the Talk
     1. Proximal Method with Contractions
     2. Application to Second-Order Methods
     3. Numerical Example
     2 / 19

  3. Plan of the Talk
     1. Proximal Method with Contractions
     2. Application to Second-Order Methods
     3. Numerical Example
     3 / 19

  4. Review: Proximal Method
     $f^* = \min_{x \in \mathbb{R}^n} f(x)$
     Proximal Method [Rockafellar, 1976]:
     $x_{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} \Big\{ f(y) + \frac{1}{2 a_{k+1}} \|y - x_k\|^2 \Big\}.$
     ◮ If $f$ is convex, the objective of the subproblem
       $h_{k+1}(y) = f(y) + \frac{1}{2 a_{k+1}} \|y - x_k\|^2$ is strongly convex.
     ◮ Let $f$ have a Lipschitz continuous gradient with constant $L_1$. The Gradient
       Method needs $\tilde{O}(a_{k+1} L_1)$ iterations to minimize $h_{k+1}$.
     ◮ It is enough to use for $x_{k+1}$ an inexact minimizer of $h_{k+1}$.
       [Solodov-Svaiter, 2001; Schmidt-Roux-Bach, 2011; Salzo-Villa, 2012]
     Set $a_{k+1} = \frac{1}{L_1}$. Then $f(\bar{x}_k) - f^* \le \frac{L_1 \|x_0 - x^*\|^2}{2k}$.
     4 / 19
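
     As a concrete illustration of the scheme above, here is a minimal Python sketch (not from the slides; the function names, the inner gradient-descent solver, and the toy least-squares problem are my own assumptions):

```python
import numpy as np

def proximal_point_method(f_grad, x0, a, outer_iters=50, inner_iters=200):
    """Proximal method: x_{k+1} ~ argmin_y f(y) + 1/(2a) ||y - x_k||^2.

    The strongly convex subproblem h_{k+1} is minimized inexactly by
    plain gradient descent (an inexact minimizer is enough, see above).
    """
    x = x0.copy()
    for _ in range(outer_iters):
        y = x.copy()
        step = a / 2.0          # safe step if a <= 1/L_1, since then L(h_{k+1}) <= 2/a
        for _ in range(inner_iters):
            y -= step * (f_grad(y) + (y - x) / a)   # gradient of h_{k+1}
        x = y
    return x

# toy usage: f(x) = 0.5 * ||Mx - b||^2, with a = 1/L_1
rng = np.random.default_rng(0)
M, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
L1 = np.linalg.norm(M.T @ M, 2)
x_hat = proximal_point_method(lambda x: M.T @ (M @ x - b), np.zeros(5), a=1.0 / L1)
```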

  5. Accelerated Proximal Method
     Denote $A_k \stackrel{\mathrm{def}}{=} \sum_{i=1}^{k} a_i$. Two sequences: $\{x_k\}_{k \ge 0}$ and $\{v_k\}_{k \ge 0}$. Initialization: $v_0 = x_0$.
     Iterations, $k \ge 0$:
     1. Put $y_{k+1} = \frac{a_{k+1} v_k + A_k x_k}{A_{k+1}}$.
     2. Compute $x_{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} \Big\{ f(y) + \frac{A_{k+1}}{2 a_{k+1}^2} \|y - y_{k+1}\|^2 \Big\}$.
     3. Put $v_{k+1} = x_{k+1} + \frac{A_k}{a_{k+1}} (x_{k+1} - x_k)$.
     Set $\frac{a_{k+1}^2}{A_{k+1}} = \frac{1}{L_1}$. Then
     $f(x_k) - f^* \le \frac{8 L_1 \|x_0 - x^*\|^2}{3 (k+1)^2}.$
     [Nesterov, 1983; Güler, 1992; Lin-Mairal-Harchaoui, 2015]
     ◮ A Universal Catalyst for First-Order Optimization.
     ◮ What about Second-Order Optimization?
     5 / 19
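
     For comparison, a sketch of this accelerated scheme in the same style (my own code, with the coefficient rule $\frac{a_{k+1}^2}{A_{k+1}} = \frac{1}{L_1}$ and the same inexact gradient-descent inner solver assumed):

```python
import numpy as np

def accelerated_proximal_method(f_grad, x0, L1, outer_iters=50, inner_iters=200):
    """Accelerated proximal method (Slide 5), with a_{k+1}^2 / A_{k+1} = 1/L1."""
    x, v, A = x0.copy(), x0.copy(), 0.0
    for _ in range(outer_iters):
        # a_{k+1} > 0 solving L1 * a^2 = A + a
        a = (1.0 + np.sqrt(1.0 + 4.0 * L1 * A)) / (2.0 * L1)
        A_next = A + a
        y = (a * v + A * x) / A_next
        # subproblem: f(z) + A_next/(2 a^2) ||z - y||^2, here A_next / a^2 = L1
        z = y.copy()
        for _ in range(inner_iters):
            z -= (f_grad(z) + L1 * (z - y)) / (2.0 * L1)
        v = z + (A / a) * (z - x)      # v_{k+1}
        x, A = z, A_next               # x_{k+1}, A_{k+1}
    return x

# usage on the same toy least-squares problem as before
rng = np.random.default_rng(0)
M, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
L1 = np.linalg.norm(M.T @ M, 2)
x_hat = accelerated_proximal_method(lambda x: M.T @ (M @ x - b), np.zeros(5), L1)
```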

  6. New Algorithm: Proximal Method with Contractions
     Iterations, $k \ge 0$:
     1. Compute $v_{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} \Big\{ A_{k+1} f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big) + \beta_d(v_k; y) \Big\}$.
     2. Put $x_{k+1} = \frac{a_{k+1} v_{k+1} + A_k x_k}{A_{k+1}}$.
     $\beta_d(x; y)$ is the Bregman Divergence. Basic setup: $\beta_d(x; y) = \frac{1}{2} \|y - x\|^2$. Then
     $A_{k+1} f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big) + \frac{1}{2} \|y - v_k\|^2 = A_{k+1} \Big( f(\tilde{y}) + \frac{A_{k+1}}{2 a_{k+1}^2} \|\tilde{y} - y_{k+1}\|^2 \Big),$
     where $\tilde{y} \equiv \frac{a_{k+1} y + A_k x_k}{A_{k+1}}$ and $y_{k+1} \equiv \frac{a_{k+1} v_k + A_k x_k}{A_{k+1}}$,
     since $y - v_k = \frac{A_{k+1}}{a_{k+1}} (\tilde{y} - y_{k+1})$.
     ◮ The same iteration as in the Accelerated Proximal Method.
     ◮ Generalization to an arbitrary prox-function $d(\cdot)$.
     6 / 19

  7. Bregman Divergence
     Let $d(y)$ be a convex differentiable function. Denote the Bregman Divergence of $d(\cdot)$, centered at $x$, as
     $\beta_d(x; y) \stackrel{\mathrm{def}}{=} d(y) - d(x) - \langle \nabla d(x), y - x \rangle \ge 0.$
     ◮ Mirror Descent [Nemirovski-Yudin, 1979]
     ◮ Gradient Methods with Relative Smoothness [Lu-Freund-Nesterov, 2016; Bauschke-Bolte-Teboulle, 2016]
     Consider regularization of a convex $g(\cdot)$ by the Bregman Divergence:
     $h(y) \equiv g(y) + \beta_d(v; y).$
     Main Lemma. Let $T = \operatorname*{argmin}_{y \in \mathbb{R}^n} h(y)$. Then
     $h(y) \ge h(T) + \beta_d(T; y).$
     7 / 19
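
     A small numerical check of the Main Lemma (my own illustration; the convex quadratic $g$, the prox-function $d(x) = \frac{1}{3}\|x\|^3$, and the use of scipy's generic solver are assumptions, not part of the talk):

```python
import numpy as np
from scipy.optimize import minimize

def bregman(d, grad_d, x, y):
    """beta_d(x; y) = d(y) - d(x) - <grad d(x), y - x>."""
    return d(y) - d(x) - grad_d(x) @ (y - x)

d = lambda x: np.linalg.norm(x) ** 3 / 3.0         # convex, differentiable
grad_d = lambda x: np.linalg.norm(x) * x

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 4)); Q = Q.T @ Q       # g: convex quadratic
g = lambda x: 0.5 * x @ Q @ x
v = rng.standard_normal(4)
h = lambda y: g(y) + bregman(d, grad_d, v, y)      # regularized objective

T = minimize(h, np.zeros(4)).x                     # T = argmin h (numerically)
y = rng.standard_normal(4)
assert h(y) >= h(T) + bregman(d, grad_d, T, y) - 1e-6   # Main Lemma holds
```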

  8. Proximal Method with Contractions: the Main Idea
     We want, for all $y \in \mathbb{R}^n$:
     $\beta_d(x_0; y) + A_k f(y) \ge \beta_d(v_k; y) + A_k f(x_k).$   ($)
     How to propagate it to $k+1$? Denote $a_{k+1} \stackrel{\mathrm{def}}{=} A_{k+1} - A_k > 0$.
     $\beta_d(x_0; y) + A_{k+1} f(y) \equiv \beta_d(x_0; y) + A_k f(y) + a_{k+1} f(y)$
     $\stackrel{(\$)}{\ge} \beta_d(v_k; y) + A_k f(x_k) + a_{k+1} f(y)$
     $\ge \beta_d(v_k; y) + A_{k+1} f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big) \equiv h_{k+1}(y),$
     where the last step uses convexity of $f$.
     Let $v_{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} h_{k+1}(y)$. Then, by the Main Lemma,
     $h_{k+1}(y) \ge h_{k+1}(v_{k+1}) + \beta_d(v_{k+1}; y) \ge A_{k+1} f\Big( \underbrace{\tfrac{a_{k+1} v_{k+1} + A_k x_k}{A_{k+1}}}_{\equiv\, x_{k+1}} \Big) + \beta_d(v_{k+1}; y).$
     8 / 19
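
     Taking $y = x^*$ in the invariant ($) immediately gives the rate stated on the next slide; a one-line derivation (my own filling-in of this step, using $\beta_d(v_k; x^*) \ge 0$):
     $\beta_d(x_0; x^*) + A_k f(x^*) \ge \beta_d(v_k; x^*) + A_k f(x_k) \ge A_k f(x_k)
       \;\Longrightarrow\; f(x_k) - f^* \le \frac{\beta_d(x_0; x^*)}{A_k}.$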

  9. Proximal Method with Contractions
     Iterations, $k \ge 0$:
     1. Compute $v_{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} \Big\{ A_{k+1} f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big) + \beta_d(v_k; y) \Big\}$.
     2. Put $x_{k+1} = \frac{a_{k+1} v_{k+1} + A_k x_k}{A_{k+1}}$.
     Rate of convergence:
     $f(x_k) - f^* \le \frac{\beta_d(x_0; x^*)}{A_k}.$
     Questions:
     ◮ How to choose $A_k$? Which prox-function $d(\cdot)$?
     ◮ How to compute $v_{k+1}$?
     9 / 19
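
     A compact sketch of the whole scheme with the basic Euclidean prox-function (my own code; the generic scipy solver for the subproblem and the linear schedule $A_k = k$ in the usage example are assumptions for illustration only):

```python
import numpy as np
from scipy.optimize import minimize

def contracting_proximal_method(f, x0, A_schedule, outer_iters=30):
    """Proximal Method with Contractions (Slide 9), Euclidean setup.

    Uses d(x) = 0.5 ||x - x0||^2, so beta_d(v; y) = 0.5 ||y - v||^2.
    A_schedule(k) returns A_k (interface of this sketch, not of the talk).
    """
    x, v = x0.copy(), x0.copy()
    A = A_schedule(0)
    for k in range(outer_iters):
        A_next = A_schedule(k + 1)
        a = A_next - A
        # subproblem: A_{k+1} f((a y + A x_k)/A_{k+1}) + 0.5 ||y - v_k||^2
        h = lambda y: A_next * f((a * y + A * x) / A_next) + 0.5 * np.sum((y - v) ** 2)
        v = minimize(h, v).x            # v_{k+1} (inexact is fine)
        x = (a * v + A * x) / A_next    # x_{k+1}
        A = A_next
    return x

# usage on a smooth convex toy objective
rng = np.random.default_rng(2)
M, b = rng.standard_normal((10, 4)), rng.standard_normal(10)
f = lambda x: 0.5 * np.sum((M @ x - b) ** 2)
x_hat = contracting_proximal_method(f, np.zeros(4), A_schedule=lambda k: float(k))
```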

  10. Plan of the Talk
     1. Proximal Method with Contractions
     2. Application to Second-Order Methods
     3. Numerical Example
     10 / 19

  11. Newton Method with Cubic Regularization
     $h^* = \min_{x \in \mathbb{R}^n} h(x)$
     $h$ is convex, with Lipschitz continuous Hessian:
     $\|\nabla^2 h(x) - \nabla^2 h(y)\| \le L_2 \|x - y\|.$
     Model of the objective:
     $\Omega_M(x; y) \stackrel{\mathrm{def}}{=} h(x) + \langle \nabla h(x), y - x \rangle + \frac{1}{2} \langle \nabla^2 h(x)(y - x), y - x \rangle + \frac{M}{6} \|y - x\|^3$
     Iterations:
     $z_{t+1} := \operatorname*{argmin}_{y \in \mathbb{R}^n} \Omega_M(z_t; y), \quad t \ge 0.$
     Newton method with Cubic regularization [Nesterov-Polyak, 2006]
     ◮ Global convergence:
       $h(z_t) - h^* \le O\Big( \frac{L_2 R^3}{t^2} \Big).$
     11 / 19
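
     A sketch of one cubic-regularized Newton step for the convex case (my own implementation via the standard one-dimensional reformulation $r = \|s\|$; not code from the talk):

```python
import numpy as np

def cubic_newton_step(grad, hess, M, tol=1e-10, max_iter=100):
    """Minimize <g, s> + 0.5 <H s, s> + (M/6) ||s||^3 over s (H assumed PSD).

    Stationarity gives s = -(H + (M r / 2) I)^{-1} g with r = ||s||; the
    scalar r is found by bisection on phi(r) = ||s(r)|| - r, which is
    strictly decreasing.
    """
    g, H = grad, hess
    n = g.shape[0]
    solve = lambda r: np.linalg.solve(H + 0.5 * M * r * np.eye(n), -g)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(solve(hi)) > hi:      # bracket the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(solve(mid)) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return solve(hi)

# usage: one cubic step on h(z) = 0.5 z^T Q z - c^T z, starting from z = 0
rng = np.random.default_rng(3)
Q = rng.standard_normal((5, 5)); Q = Q.T @ Q
c = rng.standard_normal(5)
z = np.zeros(5)
z_next = z + cubic_newton_step(Q @ z - c, Q, M=1.0)
```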

  12. Computing inexact Proximal Step
     Apply Cubic Newton to compute the Proximal Step:
     $h_{k+1}(y) \equiv A_{k+1} f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big) + \beta_d(v_k; y) \to \min_{y \in \mathbb{R}^n}$
     ◮ Pick $d(x) = \frac{1}{3} \|x - x_0\|^3$.
     ◮ Uniformly convex objective: $\beta_h(x; y) \ge \frac{1}{6} \|y - x\|^3$.
     Linear rate of convergence for Cubic Newton:
     $h(z_t) - h^* \le \exp\Big( -O\Big( \frac{t}{\sqrt{L_2}} \Big) \Big) \, (h(z_0) - h^*).$
     ◮ Let $v_{k+1}$ be an inexact Proximal Step: $\|\nabla h_{k+1}(v_{k+1})\|_* \le \delta_{k+1}$.
     Theorem.
     $f(x_k) - f^* \le \frac{\big( 3^{-2/3} \|x_0 - x^*\|^2 + 6^{1/3} \sum_{i=1}^{k} \delta_i \big)^{3/2}}{A_k}$
     ◮ $O\big( \sqrt{L_2(h_{k+1})} \, \log \frac{1}{\delta_{k+1}} \big)$ iterations of Cubic Newton for step $k$.
     12 / 19

  13. The choice of $A_k$
     Contracted objective: $g_{k+1}(y) \equiv A_{k+1} f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big)$.
     Derivatives:
     1. $D g_{k+1}(y) = a_{k+1} \, D f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big)$,
     2. $D^2 g_{k+1}(y) = \frac{a_{k+1}^2}{A_{k+1}} \, D^2 f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big)$,
     3. $D^3 g_{k+1}(y) = \frac{a_{k+1}^3}{A_{k+1}^2} \, D^3 f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big)$,
     ...
     Notice: $D^{p+1} f \preceq L_p(f) \;\Rightarrow\; D^{p+1} g_{k+1} \preceq \frac{a_{k+1}^{p+1}}{A_{k+1}^p} L_p(f)$. Therefore,
     if we have $\frac{a_{k+1}^{p+1}}{A_{k+1}^p} \le \frac{1}{L_p(f)}$, then $L_p(g_{k+1}) \le 1$.
     ◮ For Cubic Newton ($p = 2$) set $A_k = \frac{k^3}{L_2(f)}$. We obtain the accelerated rate of convergence: $O\big( \frac{1}{k^3} \big)$.
     13 / 19
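
     These scalings follow from the chain rule; writing $c(y) = \frac{a_{k+1} y + A_k x_k}{A_{k+1}}$, so that $Dc(y) = \frac{a_{k+1}}{A_{k+1}} I$, gives (my own one-line justification of the list above):
     $D^j g_{k+1}(y) = A_{k+1} \Big( \frac{a_{k+1}}{A_{k+1}} \Big)^j D^j f(c(y)) = \frac{a_{k+1}^j}{A_{k+1}^{\,j-1}} \, D^j f(c(y)), \qquad j \ge 1.$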

  14. High-Order Proximal Accelerated Scheme
     Basic Method:
     $p = 1$: Gradient Method.
     $p = 2$: Newton method with Cubic regularization.
     $p = 3$: Third-order methods (admit an effective implementation) [Grapiglia-Nesterov, 2019]
     ...
     ◮ Prox-function: $d(x) = \frac{1}{p+1} \|x - x_0\|^{p+1}$. Set $A_k = \frac{k^{p+1}}{L_p(f)}$.
     ◮ Let $\delta_k = \frac{c}{k^2}$, so that $\sum_i \delta_i$ stays bounded.
     Theorem.
     $f(x_k) - f^* \le O\Big( \frac{L_p(f) \|x_0 - x^*\|^{p+1}}{k^{p+1}} \Big).$
     ◮ $O\big( \log \frac{1}{\delta_k} \big)$ steps of the Basic Method every iteration.
     14 / 19

  15. Plan of the Talk
     1. Proximal Method with Contractions
     2. Application to Second-Order Methods
     3. Numerical Example
     15 / 19

  16. Log-sum-exp
     $\min_{x \in \mathbb{R}^n} f(x), \qquad f(x) = \log\Big( \sum_{i=1}^{m} e^{\langle a_i, x \rangle} \Big).$
     ◮ $a_1, \ldots, a_m \in \mathbb{R}^n$ are the given data.
     ◮ Denote $B \equiv \sum_{i=1}^{m} a_i a_i^T \succeq 0$, and use $\|x\| \equiv \langle Bx, x \rangle^{1/2}$.
     ◮ We have $L_1 \le 1$, $L_2 \le 2$.
     16 / 19
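
     A small Python oracle for this test function (my own sketch, with a numerically stable soft-max; the sizes n = 10, m = 30 match the experiment on the next slide):

```python
import numpy as np

def logsumexp_oracle(A, x):
    """Objective, gradient, and Hessian of f(x) = log(sum_i exp(<a_i, x>)),
    where the vectors a_i are the rows of A."""
    z = A @ x
    z_max = z.max()
    w = np.exp(z - z_max)                # stable soft-max
    p = w / w.sum()
    f = z_max + np.log(w.sum())
    grad = A.T @ p
    hess = A.T @ (p[:, None] * A) - np.outer(grad, grad)
    return f, grad, hess

# B = sum_i a_i a_i^T defines the norm ||x|| = <Bx, x>^{1/2} used on the slide,
# relative to which L1 <= 1 and L2 <= 2
rng = np.random.default_rng(4)
A = rng.standard_normal((30, 10))        # m = 30, n = 10
B = A.T @ A
f0, g0, H0 = logsumexp_oracle(A, np.zeros(10))
```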

  17. Log-sum-exp: convergence
     [Plot: "Minimizing log-sum-exp, n=10, m=30"; squared gradient norm (log scale, $10^0$ down to $10^{-8}$) vs. iterations (0 to 100) for GD, AGD, APM (p=1), CN, ACN, and APM (p=2).]
     17 / 19

  18. Log-sum-exp: inner steps
     [Plot: "APM, p = 2"; number of inner iterations $t_k$ (0 to 7) vs. outer iterations $k$ (0 to 50).]
     18 / 19

  19. Conclusion
     Two ingredients:
     ◮ Bregman divergence $\beta_d(v_k; y)$.
     ◮ Contraction operator $f(y) \mapsto f\Big( \frac{a_{k+1} y + A_k x_k}{A_{k+1}} \Big)$.
     Direct acceleration vs. Proximal acceleration:
     ◮ The rates are $O\big( \frac{1}{k^{p+1}} \big)$ and $\tilde{O}\big( \frac{1}{k^{p+1}} \big)$, respectively, for methods of order $p \ge 1$.
     ◮ In practice, the number of inner steps is a constant.
     ◮ Proximal acceleration is more general: useful for stochastic and distributed optimization.
     Thank you for your attention!
     19 / 19
