
Convex optimization based on global lower second-order models



  1. Convex optimization based on global lower second-order models
     Nikita Doikov, Yurii Nesterov (UCLouvain, Belgium). NeurIPS 2020.

  2. Problem
     Composite convex optimization problem:
       min_x F(x) := f(x) + ψ(x)
     ◮ f is convex, differentiable.
     ◮ ψ : R^n → R ∪ {+∞} is convex, simple.
     ◮ dom ψ is bounded, D := diam(dom ψ).
     Example: ψ(x) = 0 if ‖x‖ ≤ D/2, and +∞ otherwise.
     ⇒ The problem with ball-regularization: min_{‖x‖ ≤ D/2} f(x).

  3. Review: Gradient Methods
     Let ∇f be Lipschitz continuous: ‖∇f(y) − ∇f(x)‖_* ≤ L‖y − x‖.
     The Gradient Method:
       x_{k+1} = argmin_y { f(x_k) + ⟨∇f(x_k), y − x_k⟩ + (L/2)‖y − x_k‖² + ψ(y) }.
     ◮ Global convergence: F(x_k) − F* ≤ O(1/k).
     The Conditional Gradient Method [Frank-Wolfe, 1956]:
       v_{k+1} = argmin_y { f(x_k) + ⟨∇f(x_k), y − x_k⟩ + ψ(y) },
       x_{k+1} = γ_k v_{k+1} + (1 − γ_k) x_k.
     ◮ Set γ_k = 2/(k+2). Then F(x_k) − F* ≤ O(1/k).
     Note: Near-optimal for ‖·‖_∞-balls [Guzmán-Nemirovski, 2015].
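
As a concrete illustration of the conditional gradient update on this slide, here is a minimal NumPy sketch for the ball-constrained case (ψ the indicator of a Euclidean ball), where the linear minimization step has a closed form. The names grad_f, x0, and radius are assumed inputs and are not part of the original slides.

```python
import numpy as np

def frank_wolfe_ball(grad_f, x0, radius, iters=200):
    """Conditional gradient (Frank-Wolfe) method on {x : ||x||_2 <= radius}.

    For this feasible set the linear minimization oracle is explicit:
    argmin_{||v|| <= r} <g, v> = -r * g / ||g||.
    """
    x = x0.copy()
    for k in range(iters):
        g = grad_f(x)
        v = -radius * g / (np.linalg.norm(g) + 1e-12)  # LMO over the ball
        gamma = 2.0 / (k + 2)                          # step size from the slide
        x = gamma * v + (1 - gamma) * x
    return x
```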

  4. Review: Second-Order Methods
     Let ∇²f be Lipschitz continuous: ‖∇²f(x) − ∇²f(y)‖ ≤ L‖x − y‖.
     Newton Method:
       x_{k+1} = argmin_y { ⟨∇f(x_k), y − x_k⟩ + (1/2)⟨∇²f(x_k)(y − x_k), y − x_k⟩ + ψ(y) }.
     ◮ Quadratic convergence (if ∇²f(x*) ≻ 0 and x_0 is close to x*).
     ◮ No global convergence. A heuristic: use line search in practice.
     Newton Method with Cubic Regularization:
       x_{k+1} = argmin_y { ⟨∇f(x_k), y − x_k⟩ + (1/2)⟨∇²f(x_k)(y − x_k), y − x_k⟩ + (L/6)‖y − x_k‖³ + ψ(y) }.
     ◮ Global rate: F(x_k) − F* ≤ O(1/k²) [Nesterov-Polyak, 2006].
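
In the unconstrained case (ψ ≡ 0, f convex so the Hessian is positive semidefinite), the cubically regularized step admits a simple solver: the minimizer h satisfies (∇²f(x_k) + (Lr/2)I)h = −∇f(x_k) with r = ‖h‖, so r can be found by bisection. The sketch below illustrates that reduction; it is not the authors' implementation.

```python
import numpy as np

def cubic_newton_step(g, H, L):
    """One cubically regularized Newton step for psi = 0:
    minimize <g, h> + 0.5*h@H@h + (L/6)*||h||^3 over h.

    For convex f (H PSD), the minimizer satisfies
    (H + 0.5*L*r*I) h = -g with r = ||h||; we find r by bisection.
    """
    n = g.shape[0]
    eye = np.eye(n)

    def h_of(r):
        return np.linalg.solve(H + 0.5 * L * r * eye, -g)

    hi = 1.0
    while np.linalg.norm(h_of(hi)) > hi:   # grow the bracket until ||h(r)|| <= r
        hi *= 2.0
    lo = 0.0
    for _ in range(60):                    # bisection on the root of ||h(r)|| - r
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(h_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return h_of(hi)
```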

  5. Overview of the Contributions
     New second-order algorithms with global convergence proofs.
     ◮ The methods are universal (no unknown parameters).
     ◮ Affine-invariant (the norm is not fixed).
     Stochastic methods (basic and with variance reduction).
     Numerical experiments.

  6. Second-Order Lower Model
     1. f is convex: f(y) ≥ f(x) + ⟨∇f(x), y − x⟩.
     2. ∇²f is Lipschitz continuous: ‖∇²f(x) − ∇²f(y)‖ ≤ L‖x − y‖.
     Convexity + Smoothness ⇒ tighter lower bound, for all t ∈ [0, 1]:
       f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (t/2)⟨∇²f(x)(y − x), y − x⟩ − (t²L/6)‖y − x‖³.
     [Figure: the first-order and second-order lower models of a convex function.]
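
A quick numerical sanity check of this bound (not from the slides): for the one-dimensional logistic loss f(x) = log(1 + eˣ), the second derivative is Lipschitz with constant 1/(6√3) ≈ 0.096, so L = 0.1 is a valid constant. The sketch samples random triples (x, y, t) and verifies the inequality.

```python
import numpy as np

# Sanity check of the second-order lower bound for f(x) = log(1 + exp(x)).
# |f'''| <= 1/(6*sqrt(3)) ~= 0.0962, so L = 0.1 is a valid Lipschitz constant of f''.
f   = lambda x: np.logaddexp(0.0, x)
df  = lambda x: 1.0 / (1.0 + np.exp(-x))
d2f = lambda x: df(x) * (1.0 - df(x))
L = 0.1

rng = np.random.default_rng(0)
for _ in range(10_000):
    x, y, t = rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(0, 1)
    lower = (f(x) + df(x) * (y - x)
             + 0.5 * t * d2f(x) * (y - x) ** 2
             - (t ** 2) * L / 6.0 * abs(y - x) ** 3)
    assert f(y) >= lower - 1e-12
print("lower bound held on all sampled triples (x, y, t)")
```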

  7. New Algorithm
     Contracting-Domain Newton Method:
       v_{k+1} = argmin_y { ⟨∇f(x_k), y − x_k⟩ + (γ_k/2)⟨∇²f(x_k)(y − x_k), y − x_k⟩ + ψ(y) },
       x_{k+1} = γ_k v_{k+1} + (1 − γ_k) x_k.
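
Below is a minimal sketch of how this method could look for the ball-constrained example of slide 2 (ψ the indicator of ‖x‖ ≤ D/2), with assumed oracles grad_f and hess_f. The inner projected-gradient loop is just a simple stand-in for an exact solver of the quadratic subproblem, and γ_k = 3/(k+3) follows Theorem 1 on slide 9.

```python
import numpy as np

def contracting_domain_newton(grad_f, hess_f, x0, radius, iters=50, inner=100):
    """Contracting-Domain Newton Method for min f(x) s.t. ||x||_2 <= radius
    (psi = indicator of the ball).  The quadratic subproblem over the ball is
    solved approximately by projected gradient descent."""
    def project(z):
        nz = np.linalg.norm(z)
        return z if nz <= radius else z * (radius / nz)

    x = project(x0.copy())
    for k in range(iters):
        g, H = grad_f(x), hess_f(x)
        gamma = 3.0 / (k + 3)                   # contraction parameter (Theorem 1)
        # Subproblem: min_{||v|| <= radius} <g, v - x> + (gamma/2) <H(v - x), v - x>
        step = 1.0 / (gamma * np.linalg.norm(H, 2) + 1e-12)  # 1 / Lipschitz constant
        v = x.copy()
        for _ in range(inner):
            v = project(v - step * (g + gamma * H @ (v - x)))
        x = gamma * v + (1 - gamma) * x         # contracted update
    return x
```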

  8. Trust-Region Interpretation
     Contracting-Domain Newton Method (reformulation):
       x_{k+1} = argmin_y { ⟨∇f(x_k), y − x_k⟩ + (1/2)⟨∇²f(x_k)(y − x_k), y − x_k⟩ + γ_k ψ(x_k + (1/γ_k)(y − x_k)) }.
     Regularization of the quadratic model by the asymmetric trust region.

  9. Global Convergence
     Let ∇²f be Lipschitz continuous: ‖∇²f(x) − ∇²f(y)‖ ≤ L‖x − y‖ (w.r.t. an arbitrary norm).
     Theorem 1. Set γ_k = 3/(k+3). Then
       F(x_k) − F* ≤ O(LD³/k²).
     Theorem 2. Let ψ be strongly convex with parameter μ > 0.
     ◮ Set γ_k = 5/(k+5). Then
       F(x_k) − F* ≤ O((LD/μ) · (LD³/k⁴)).
     ◮ Set γ_k = 1/(1+ω), where ω := (LD/(2μ))^{1/2}. Then
       F(x_k) − F* ≤ (LD³/2) · exp(−(k−1)/(1+ω)).

  10. Experiments: Logistic Regression
      min_{‖x‖₂ ≤ D/2} ∑_{i=1}^{M} f_i(x),   f_i(x) = log(1 + exp(⟨a_i, x⟩)).
      D plays the role of a regularization parameter.
      [Figure: function residual vs. iterations on w8a for D = 20 and D = 100, comparing Frank-Wolfe, the Gradient Method, the Fast Gradient Method, Contracting Newton, and Aggregated Newton, with running times annotated.]
      For bigger D the problem becomes more ill-conditioned.
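
For reproducing this kind of experiment, the full gradient and Hessian of the objective on the slide have simple closed forms. The sketch below builds these oracles from a data matrix A whose rows are the a_i; the actual w8a data and the authors' code are not included here, so the usage line is only illustrative.

```python
import numpy as np

def make_logistic_oracles(A):
    """Gradient/Hessian oracles for f(x) = sum_i log(1 + exp(<a_i, x>)),
    with the rows a_i stacked in the matrix A (shape M x n)."""
    def grad_f(x):
        s = 1.0 / (1.0 + np.exp(-A @ x))       # sigmoid of the margins
        return A.T @ s

    def hess_f(x):
        s = 1.0 / (1.0 + np.exp(-A @ x))
        w = s * (1.0 - s)                      # diagonal Hessian weights
        return A.T @ (w[:, None] * A)
    return grad_f, hess_f

# Illustrative usage with synthetic data, e.g. with the sketch from slide 7:
# A = np.random.default_rng(0).standard_normal((1000, 50))
# grad_f, hess_f = make_logistic_oracles(A)
# x = contracting_domain_newton(grad_f, hess_f, np.zeros(50), radius=10.0)
```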

  11. Stochastic Methods for Logistic Regression
      Approximate ∇f(x) and ∇²f(x) by stochastic estimates.
      [Figure: function residual vs. epochs on YearPredictionMSD with D = 20, comparing SGD, SVRG, SNewton, and SVRNewton, with running times annotated.]
      A problem with a large dataset (M = 463715) and small dimension (n = 90).
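
The basic stochastic estimates can be formed by subsampling rows of the data matrix, as in the hedged sketch below; the variance-reduced variants on the slide (SVRG, SVRNewton) additionally use control variates, which are omitted here.

```python
import numpy as np

def subsampled_oracles(A, batch, rng):
    """Stochastic estimates of the logistic gradient/Hessian from a random
    mini-batch of rows (a simple illustration of the subsampling idea)."""
    M = A.shape[0]
    idx = rng.choice(M, size=batch, replace=False)
    S = A[idx]
    scale = M / batch                          # rescale so the estimates are unbiased

    def grad_est(x):
        s = 1.0 / (1.0 + np.exp(-S @ x))
        return scale * (S.T @ s)

    def hess_est(x):
        s = 1.0 / (1.0 + np.exp(-S @ x))
        w = s * (1.0 - s)
        return scale * (S.T @ (w[:, None] * S))
    return grad_est, hess_est
```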

  12. Conclusions
      Second-order information helps in the case of
      ◮ ill-conditioning;
      ◮ small or moderate dimension (the subproblems are more expensive).
      No need to tune the stepsize.
      Can be preferable for solving problems over sets with a non-Euclidean geometry.

  13. Follow-Up Results
      Nikita Doikov and Yurii Nesterov. “Affine-invariant contracting-point methods for Convex Optimization”. arXiv:2009.08894 (2020).
      ◮ General framework of Contracting-Point Methods.
      ◮ Contracting-Point Tensor Methods of order p ≥ 1: F(x_k) − F* ≤ O(1/k^p).
      ◮ Affine-invariant smoothness condition ⇒ affine-invariant analysis.
      Thank you for your attention!
