

SLIDE 1

Benefiting from Negative Curvature

Daniel P. Robinson

Johns Hopkins University
Department of Applied Mathematics and Statistics
Collaborator: Frank E. Curtis (Lehigh University)
US and Mexico Workshop on Optimization and Its Applications
Huatulco, Mexico, January 8, 2018

Negative Curvature US-Mexico-2018 1 / 31

SLIDE 2

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 3

Motivation

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 4

Motivation

Problem of interest (deterministic setting):

    minimize_{x ∈ R^n} f(x)

  • f : R^n → R is assumed to be twice-continuously differentiable
  • L denotes the Lipschitz constant for ∇f
  • σ denotes the Lipschitz constant for ∇^2 f
  • f may be nonconvex

Notation: g(x) := ∇f(x) and H(x) := ∇^2 f(x)
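This setting is easy to instantiate; the following sketch uses a hypothetical test function of my own choosing (a 2-D "double well", not an example from the talk) to make f, g, and H concrete:

```python
import numpy as np

# Illustrative nonconvex instance: f(x) = (x0^2 - 1)^2 + x1^2 is
# twice-continuously differentiable and has negative curvature
# wherever 12*x0^2 < 4.
def f(x):
    return (x[0]**2 - 1.0)**2 + x[1]**2

def g(x):                       # g(x) := grad f(x)
    return np.array([4.0*x[0]*(x[0]**2 - 1.0), 2.0*x[1]])

def H(x):                       # H(x) := grad^2 f(x)
    return np.array([[12.0*x[0]**2 - 4.0, 0.0],
                     [0.0, 2.0]])

x = np.array([0.1, 0.5])
lam_min = np.linalg.eigvalsh(H(x))[0]   # leftmost eigenvalue: 12*0.01 - 4 = -3.88
print(lam_min < 0)                      # True: x sits in a negative-curvature region
```

Points with a negative leftmost Hessian eigenvalue are exactly where the directions dk discussed later become available.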

SLIDE 5

Motivation

Much work has been done on convergence to second-order points:

  • D. Goldfarb (1979) [6]
      - proves a convergence result to second-order optimal points (unconstrained)
      - curvilinear search using a descent direction and a negative curvature direction
  • D. Goldfarb, C. Mu, J. Wright, and C. Zhou (2017) [7]
      - consider equality constrained problems
      - prove a convergence result to second-order optimal points
      - extend the curvilinear search from the unconstrained setting
  • F. Facchinei and S. Lucidi (1998) [3]
      - consider inequality constrained problems
      - exact penalty function, directions of negative curvature, and line search
  • P. Gill, V. Kungurtsev, and D. Robinson (2017) [4, 5]
      - consider inequality constrained problems
      - convergence to second-order optimal points under weak assumptions
  • J. Moré and D. Sorensen (1979), A. Forsgren, P. Gill, and W. Murray (1995), and many more . . .

None consistently perform better by using directions of negative curvature!

SLIDE 6

Motivation

Others hope to avoid saddle points:

  • J. Lee, M. Simchowitz, M. Jordan, and B. Recht (2016) [8]
      - Gradient descent converges to a local minimizer almost surely.
      - Uses random initialization.
  • Y. Dauphin et al. (2014) [2]
      - Present a saddle-free Newton method (a modified-Newton method).
      - Goal is to escape saddle points (move away when close).

These (and others) try to avoid the ill effects of negative curvature.

SLIDE 7

Motivation

Purpose of this research: Design a method that consistently performs better by using directions of negative curvature. Do not try to avoid negative curvature. Use it!

SLIDE 8

Deterministic Setting

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 9

Deterministic Setting: The Method

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 10

Deterministic Setting: The Method

Overview:
  • Compute a descent direction (sk) and a negative curvature direction (dk).
  • Predict which step will make more progress in reducing the objective f.
  • If the predicted decrease is not realized, adjust parameters.
  • Iterate until an approximate second-order solution is obtained.

SLIDE 11

Deterministic Setting: The Method

Requirements on the descent direction sk
Compute sk to satisfy
    −g(xk)^T sk ≥ δ ||sk||_2 ||g(xk)||_2
  • for some δ ∈ (0, 1]
  • Examples:
      sk = −g(xk)
      Bk sk = −gk with Bk appropriately chosen

Requirements on the negative curvature direction dk
Compute dk to satisfy
    dk^T H(xk) dk ≤ γ λk ||dk||_2^2 < 0
  • for some γ ∈ (0, 1]
  • g(xk)^T dk ≤ 0
  • Examples:
      dk = ±vk with (λk, vk) the left-most eigenpair of H(xk)
      dk a sufficiently accurate estimate of ±vk
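Both conditions can be checked mechanically. A sketch with the simplest admissible choices; the concrete values of δ, γ and the sample gradient/Hessian below are illustrative, not values from the talk:

```python
import numpy as np

delta, gamma = 0.5, 0.5             # some delta, gamma in (0, 1]

def descent_direction(gk):
    # sk = -g(xk): then -g^T s = ||g||^2 >= delta*||s||_2*||g||_2 for delta <= 1
    return -gk

def negative_curvature_direction(gk, Hk):
    # dk = +/- vk with (lam_k, vk) the leftmost eigenpair of H(xk);
    # the sign is chosen so that g(xk)^T dk <= 0
    lam, V = np.linalg.eigh(Hk)     # eigenvalues in ascending order
    lam_k, vk = lam[0], V[:, 0]
    if lam_k >= 0.0:
        return None, lam_k          # no direction of negative curvature exists
    dk = -vk if gk @ vk > 0.0 else vk
    return dk, lam_k

gk = np.array([1.0, -2.0])
Hk = np.array([[-3.0, 0.0], [0.0, 2.0]])
sk = descent_direction(gk)
dk, lam_k = negative_curvature_direction(gk, Hk)
print(dk @ Hk @ dk <= gamma * lam_k * (dk @ dk))   # True: curvature condition holds
```

The sign flip on vk is what enforces the extra requirement g(xk)^T dk ≤ 0.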

SLIDE 12

Deterministic Setting: The Method

How should sk and dk be used?

Use both in a curvilinear linesearch?
  • Often taints good descent directions with "poorly scaled" directions of negative curvature.
  • No consistent performance gains!

Start using dk only once g(xk) is "small"?
  • No consistent performance gains!
  • Misses areas of the space in which great decrease in f is possible.

Use sk when g(xk) is big relative to |(λk)−|; otherwise, use dk?
  • Better, but still inconsistent performance gains!

We propose to use upper-bounding models. It works!

SLIDE 13

Deterministic Setting: The Method

Predicted decrease along the descent direction sk

If Lk ≥ L, then
    f(xk + α sk) ≤ f(xk) − ms,k(α) for all α,
with
    ms,k(α) := −α g(xk)^T sk − (Lk/2) α^2 ||sk||_2^2,
and define the quantity
    αk := −g(xk)^T sk / (Lk ||sk||_2^2) = argmax_{α≥0} ms,k(α).

Comments:
  • ms,k(αk) is the best predicted decrease along sk.
  • If sk = −g(xk), then αk = 1/Lk.
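The model and its maximizer can be written down directly; a small sketch with illustrative numbers (not from the talk):

```python
import numpy as np

# The model m_{s,k} and its maximizer alpha_k, exactly as defined above.
def m_s(alpha, gk, sk, Lk):
    return -alpha * (gk @ sk) - 0.5 * Lk * alpha**2 * (sk @ sk)

def alpha_k(gk, sk, Lk):
    return -(gk @ sk) / (Lk * (sk @ sk))

gk = np.array([3.0, -4.0])      # illustrative gradient, ||g||_2 = 5
Lk = 2.0
sk = -gk                        # steepest-descent choice
a = alpha_k(gk, sk, Lk)
print(a)                        # 0.5, i.e. 1/Lk, matching the comment above
```

Setting the derivative of ms,k to zero recovers the closed form for αk, which is why no inner line search is needed along sk.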

SLIDE 14

Deterministic Setting: The Method

Predicted decrease along the negative curvature direction dk

If σk ≥ σ, then
    f(xk + β dk) ≤ f(xk) − md,k(β) for all β,
with
    md,k(β) := −β g(xk)^T dk − (1/2) β^2 dk^T H(xk) dk − (σk/6) β^3 ||dk||_2^3,
and define, with ck := dk^T H(xk) dk, the quantity
    βk := [ −ck + sqrt( ck^2 − 2 σk ||dk||_2^3 g(xk)^T dk ) ] / ( σk ||dk||_2^3 ) = argmax_{β≥0} md,k(β).

Comments:
  • md,k(βk) is the best predicted decrease along dk.
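Here βk is the positive root of the quadratic md,k'(β) = 0. A sketch with illustrative data (my own numbers, not from the talk):

```python
import numpy as np

# The cubic model m_{d,k} and its maximizer beta_k as defined above.
def m_d(beta, gk, dk, Hk, sigma_k):
    nd3 = np.linalg.norm(dk)**3
    return (-beta * (gk @ dk) - 0.5 * beta**2 * (dk @ Hk @ dk)
            - (sigma_k / 6.0) * beta**3 * nd3)

def beta_k(gk, dk, Hk, sigma_k):
    nd3 = np.linalg.norm(dk)**3
    ck = dk @ Hk @ dk                  # c_k < 0 for a negative curvature dk
    disc = ck * ck - 2.0 * sigma_k * nd3 * (gk @ dk)
    return (-ck + np.sqrt(disc)) / (sigma_k * nd3)

gk = np.array([0.2, 0.0])
Hk = np.array([[-2.0, 0.0], [0.0, 1.0]])
dk = np.array([-1.0, 0.0])             # unit leftmost direction with g^T d <= 0
bk = beta_k(gk, dk, Hk, sigma_k=1.0)
# bk maximizes m_d over beta >= 0:
print(m_d(bk, gk, dk, Hk, 1.0) >= m_d(bk + 0.1, gk, dk, Hk, 1.0))   # True
```

Since ck < 0 and g(xk)^T dk ≤ 0, the discriminant is at least ck^2 > 0, so βk is always well defined and strictly positive.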

SLIDE 15

Deterministic Setting: The Method

Choose the step that predicts the larger decrease in f:
  • If ms,k(αk) ≥ md,k(βk), then try the step sk.
  • If md,k(βk) > ms,k(αk), then try the step dk.

Question: Why "try" instead of "use"?
Answer: We do not know whether Lk ≥ L and σk ≥ σ.
  • If Lk < L, then it could be the case that f(xk + αk sk) > f(xk) − ms,k(αk).
  • If σk < σ, then it could be the case that f(xk + βk dk) > f(xk) − md,k(βk).
SLIDE 16

Deterministic Setting: The Method

Dynamic Step-Size Algorithm

 1: for k ∈ N do
 2:     compute sk and dk satisfying the required step conditions
 3:     loop
 4:         compute αk = argmax_{α≥0} ms,k(α) and βk = argmax_{β≥0} md,k(β)
 5:         if ms,k(αk) ≥ md,k(βk) then
 6:             if f(xk + αk sk) ≤ f(xk) − ms,k(αk) then
 7:                 set xk+1 ← xk + αk sk and then exit loop
 8:             else
 9:                 set Lk ← ρ Lk   [ρ ∈ (1, ∞)]
10:         else
11:             if f(xk + βk dk) ≤ f(xk) − md,k(βk) then
12:                 set xk+1 ← xk + βk dk and then exit loop
13:             else
14:                 set σk ← ρ σk
15:     set (Lk+1, σk+1) ∈ (Lmin, Lk] × (σmin, σk]
SLIDE 17

Deterministic Setting: Convergence Results

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 18

Deterministic Setting: Convergence Results

Key decrease inequality: For all k ∈ N it holds that

    f(xk) − f(xk+1) ≥ max{ (δ^2 / (2 Lk)) ||g(xk)||_2^2 ,  (2γ^3 / (3 σk^2)) |(λk)−|^3 }.

Comments:
  • The first term in the max holds when xk+1 = xk + αk sk.
  • The second term in the max holds when xk+1 = xk + βk dk.
  • The max holds because we choose whether to try sk or dk based on ms,k(αk) ≥ md,k(βk).
  • One can prove that {Lk} and {σk} remain uniformly bounded.
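The first term can be recovered in two lines from the earlier definitions; a sketch, using only the closed form of αk and the step condition −g(xk)^T sk ≥ δ||sk||_2 ||g(xk)||_2:

```latex
m_{s,k}(\alpha_k)
  = \frac{\bigl(g(x_k)^T s_k\bigr)^2}{2 L_k \,\|s_k\|_2^2}
  \;\ge\; \frac{\delta^2 \,\|s_k\|_2^2 \,\|g(x_k)\|_2^2}{2 L_k \,\|s_k\|_2^2}
  \;=\; \frac{\delta^2}{2 L_k}\,\|g(x_k)\|_2^2 .
```

Since an accepted step sk satisfies f(xk) − f(xk+1) ≥ ms,k(αk), this gives the first term of the max; the second term follows from an analogous bound on md,k(βk).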

SLIDE 19

Deterministic Setting: Convergence Results

Theorem (Limit points satisfy second-order necessary conditions)
The computed iterates satisfy
    lim_{k→∞} ||g(xk)||_2 = 0 and lim inf_{k→∞} λk ≥ 0.

Theorem (Complexity result)
The number of iterations, function evaluations, and derivative (i.e., gradient and Hessian) evaluations required until some iteration k ∈ N is reached with ||g(xk)||_2 ≤ ε_g and |(λk)−| ≤ ε_H is at most O(max{ε_g^-2, ε_H^-3}).

SLIDE 20

Deterministic Setting: Numerical Results

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 21

Deterministic Setting: Numerical Results

Refined parameter increase strategy

    L̂k ← Lk + 2 [ f(xk + αk sk) − f(xk) + ms,k(αk) ] / ( αk^2 ||sk||_2^2 )
    σ̂k ← σk + 6 [ f(xk + βk dk) − f(xk) + md,k(βk) ] / ( βk^3 ||dk||_2^3 )

then, with ρ ← 2, use the update

    Lk ← max{ ρ Lk, min{ 10^3 Lk, L̂k } }
    σk ← max{ ρ σk, min{ 10^3 σk, σ̂k } }

Refined parameter decrease strategy

    Lk+1 ← max{ 10^-3, 10^-3 Lk, L̂k } and σk+1 ← σk   when xk+1 ← xk + αk sk
    σk+1 ← max{ 10^-3, 10^-3 σk, σ̂k } and Lk+1 ← Lk   when xk+1 ← xk + βk dk
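As a sanity check on the hat-L formula (with illustrative numbers, not data from the talk): L̂k is exactly the model constant that makes the quadratic upper bound tight at the rejected trial point.

```python
# m_{s,k}(alpha) with an explicit model constant L, so it can be
# evaluated at both the current Lk and the refined estimate L_hat.
def m_s(alpha, gTs, L, s2):
    return -alpha * gTs - 0.5 * L * alpha * alpha * s2

# Illustrative data: g^T s, ||s||^2, current Lk, trial step alpha_k, and a
# trial value f(xk + alpha_k sk) that violates the current model.
f0, gTs, s2, Lk, ak = 1.0, -4.0, 4.0, 1.0, 1.0
f_trial = 0.5
L_hat = Lk + 2.0 * (f_trial - f0 + m_s(ak, gTs, Lk, s2)) / (ak * ak * s2)
# With L_hat in place of Lk, the bound f_trial <= f0 - m holds with equality:
print(abs(f_trial - (f0 - m_s(ak, gTs, L_hat, s2))) < 1e-12)   # True
# The safeguarded increase then keeps the growth between rho and 10^3:
rho = 2.0
Lk_new = max(rho * Lk, min(1e3 * Lk, L_hat))
```

The outer max/min pair prevents L̂k from shrinking the constant after a rejection and from blowing it up by more than three orders of magnitude in one step.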

SLIDE 22

Deterministic Setting: Numerical Results

Termination condition

    ||g(xk)|| ≤ 10^-5 max{1, ||g(x0)||} and |(λk)−| ≤ 10^-5 max{1, |(λ0)−|}.

Measures of interest (runs using sk only vs. runs using both sk and dk)

  • Final objective value:
      [ ffinal(sk) − ffinal(sk, dk) ] / max{ |ffinal(sk)|, |ffinal(sk, dk)|, 1 } ∈ [−1, 1]
  • Required number of iterations:
      [ #its(sk) − #its(sk, dk) ] / max{ #its(sk), #its(sk, dk), 1 } ∈ [−1, 1]
  • Required number of function evaluations:
      [ #fevals(sk) − #fevals(sk, dk) ] / max{ #fevals(sk), #fevals(sk, dk), 1 } ∈ [−1, 1]
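The relative measure has the same shape in all three cases; a one-line sketch with sample numbers of my own (not results from the talk):

```python
# Relative-difference measure as defined above: positive values favor the
# run that also uses negative curvature directions (sk, dk).
def relative_measure(val_s, val_sd):
    return (val_s - val_sd) / max(abs(val_s), abs(val_sd), 1.0)

# E.g. 120 iterations without dk vs. 80 with dk (illustrative numbers):
print(relative_measure(120.0, 80.0))   # 0.333..., i.e. dk saved a third
```

Dividing by the max of the two magnitudes (floored at 1) keeps every measure in [−1, 1] and well defined even when both values are tiny.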

SLIDE 23

Deterministic Setting: Numerical Results

Steepest descent: sk = −g(xk) and dk = ±vk

[Figure: paired bar charts, (a) Final objective value and (b) Required number of iterations, over the test problems BIGGS6, RAT43LS, VIBRBEAM, HELIX, MGH09LS, HEART6LS, RAT42LS, HUMPS, MISRA1ALS, HATFLDD, DENSCHNE, LANCZOS2LS, GROWTHLS, GULF, LANCZOS3LS, THURBERLS, MEYER3, LANCZOS1LS, ROSENBR, VESUVIALS, NELSONLS, SINEVAL, CUBE, HIMMELBF, MARATOSB, MGH17LS, ENGVAL2, HEART8LS, KIRBY2LS, HYDC20LS, SNAIL.]

Figure: Only problems for which at least one negative curvature direction is used and the difference in final f-values is larger than 10^-5 in absolute value are presented.

SLIDE 24

Deterministic Setting: Numerical Results

Shifted Newton: Bk = H(xk) + δk I, Bk sk = −g(xk), and dk = ±vk

[Figure: paired bar charts, (a) Final objective value and (b) Required number of iterations, over the test problems HEART8LS, BIGGS6, HEART6LS, ENGVAL2, ECKERLE4LS, OSBORNEB, LOGHAIRY, LANCZOS3LS, HUMPS, LANCZOS2LS, BEALE, BENNETT5LS, MISRA1ALS, ROSZMAN1LS, DENSCHND, DENSCHNE, NELSONLS, HAHN1LS, MEYER3, MGH10LS, OSBORNEA, GROWTHLS, HATFLDE, MGH09LS, SINEVAL, HATFLDD, THURBERLS, MGH17LS, LANCZOS1LS, POWELLBSLS, CHWIRUT1LS, CHWIRUT2LS, HYDC20LS, DECONVU, GULF, VIBRBEAM, KIRBY2LS, SNAIL, HELIX.]

Figure: Only problems for which at least one negative curvature direction is used and the difference in final f-values is larger than 10^-5 in absolute value are presented.

SLIDE 25

Deterministic Setting: Comments

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 26

Deterministic Setting: Comments

Comments:

  • If L and σ are known, then in theory Lk and σk never need to be updated. In practice, we still allow increases and decreases for efficiency.
  • Currently, one function evaluation per trial step. If evaluating f is very cheap, one could consider evaluating both trial steps during each iteration.
  • Relevance to strict saddle points:
      - We do not make any non-degeneracy assumption.
      - Our convergence result holds regardless of the types of saddle points.
      - When the strict saddle point property holds, our theory implies that
          * any limit point of the sequence {xk} is a minimizer of f;
          * the iterates eventually enter a region that contains only minimizers.
      - We get a stronger convergence theory (cf. Paternain, Mokhtari, and Ribeiro (2017)) because we incorporate directions of negative curvature.
  • The complexity result for our method is not "optimal" from a traditional complexity perspective.
      - F. Curtis and I have been intrigued by alternate complexity perspectives:
      - Typically, results are for general problems and based on the worst case.
      - From some perspective, the algorithm I presented today is "optimal".
      - See his talk later this afternoon!
SLIDE 27

Stochastic Setting

Outline

1. Motivation
2. Deterministic Setting
  • The Method
  • Convergence Results
  • Numerical Results
  • Comments
3. Stochastic Setting

SLIDE 28

Stochastic Setting

Summary

  • Apply the same ideas as in the deterministic case, but in the mini-batch setting.
  • Add a negative curvature direction dk = ±vk with the sign chosen randomly. Can be thought of as a "smart noise" approach.
  • Small gain in performance relative to a similar algorithm without dk.
  • See our paper [1] for additional details.

SLIDE 29

Stochastic Setting

References I

[1] F. E. Curtis and D. P. Robinson, Exploiting negative curvature directions in stochastic optimization, arXiv:1703.00412, submitted to Mathematical Programming (Special Issue on Nonconvex Optimization for Statistical Learning), 2017.

[2] Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds., Curran Associates, Inc., 2014, pp. 2933–2941.

[3] F. Facchinei and S. Lucidi, Convergence to second order stationary points in inequality constrained optimization, Mathematics of Operations Research, 23 (1998), pp. 746–766.

SLIDE 30

Stochastic Setting

References II

[4] P. E. Gill, V. Kungurtsev, and D. P. Robinson, A stabilized SQP method: global convergence, IMA Journal of Numerical Analysis, 37 (2017), pp. 407–443.

[5] P. E. Gill, V. Kungurtsev, and D. P. Robinson, A stabilized SQP method: superlinear convergence, Mathematical Programming, 163 (2017), pp. 369–410.

[6] D. Goldfarb, Curvilinear path steplength algorithms for minimization which use directions of negative curvature, Mathematical Programming, 18 (1980), pp. 31–40.

[7] D. Goldfarb, C. Mu, J. Wright, and C. Zhou, Using negative curvature in solving nonlinear programs, arXiv preprint arXiv:1706.00896, 2017.

SLIDE 31

Stochastic Setting

References III

[8] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, Gradient descent only converges to minimizers, in Conference on Learning Theory, 2016, pp. 1246–1257.
