Benefiting from Negative Curvature

Daniel P. Robinson
Johns Hopkins University, Department of Applied Mathematics and Statistics
Collaborator: Frank E. Curtis (Lehigh University)

US and Mexico Workshop on Optimization and Its Applications
Huatulco, Mexico, January 8, 2018
Outline

1. Motivation
2. Deterministic Setting
   - The Method
   - Convergence Results
   - Numerical Results
   - Comments
3. Stochastic Setting
Motivation
Problem of interest: deterministic setting

    minimize_{x ∈ R^n} f(x)

where f : R^n → R is assumed to be twice-continuously differentiable.

- L will denote the Lipschitz constant for ∇f
- σ will denote the Lipschitz constant for ∇²f
- f may be nonconvex

Notation: g(x) := ∇f(x) and H(x) := ∇²f(x)
Much work has been done on convergence to second-order points:

D. Goldfarb (1979) [6]
- proves a convergence result to second-order optimal points (unconstrained)
- curvilinear search using a descent direction and a negative curvature direction

D. Goldfarb, C. Mu, J. Wright, and C. Zhou (2017) [7]
- considers equality constrained problems
- proves a convergence result to second-order optimal points
- extends the curvilinear search from the unconstrained setting

F. Facchinei and S. Lucidi (1998) [3]
- considers inequality constrained problems
- exact penalty function, directions of negative curvature, and line search

P. Gill, V. Kungurtsev, and D. Robinson (2017) [4, 5]
- considers inequality constrained problems
- convergence to second-order optimal points under weak assumptions

J. Moré and D. Sorensen (1979), A. Forsgren, P. Gill, and W. Murray (1995), and many more...

None consistently perform better by using directions of negative curvature!
Others hope to avoid saddle points:

J. Lee, M. Simchowitz, M. Jordan, and B. Recht (2016) [8]
- Gradient descent converges to a local minimizer almost surely.
- Uses random initialization.

Y. Dauphin et al. (2016) [2]
- Presents a saddle-free Newton method (a modified-Newton method)
- Goal is to escape saddle points (move away when close)

These (and others) try to avoid the ill effects of negative curvature.
Purpose of this research:

Design a method that consistently performs better by using directions of negative curvature.

Do not try to avoid negative curvature. Use it!
Deterministic Setting
The Method
Overview:

- Compute a descent direction (s_k) and a negative curvature direction (d_k).
- Predict which step will make more progress in reducing the objective f.
- If the predicted decrease is not realized, adjust parameters.
- Iterate until an approximate second-order solution is obtained.
Requirements on the descent direction s_k

Compute s_k to satisfy

    -g(x_k)^T s_k ≥ δ ||s_k||_2 ||g(x_k)||_2   for some δ ∈ (0, 1]

Examples:
- s_k = -g(x_k)
- B_k s_k = -g_k with B_k appropriately chosen

Requirements on the negative curvature direction d_k

Compute d_k to satisfy

    d_k^T H(x_k) d_k ≤ γ λ_k ||d_k||_2^2 < 0   for some γ ∈ (0, 1]
    g(x_k)^T d_k ≤ 0

Examples:
- d_k = ±v_k with (λ_k, v_k) being the leftmost eigenpair of H(x_k)
- d_k a sufficiently accurate estimate of ±v_k
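As a concrete sketch (our own helper, not code from the slides), the simplest admissible choices take s_k = -g(x_k), which satisfies the descent condition with δ = 1, and d_k = ±v_k from the leftmost eigenpair of H(x_k), which satisfies the curvature condition with γ = 1 when the sign is chosen so that g(x_k)^T d_k ≤ 0:

```python
import numpy as np

def step_directions(g, H):
    """Illustrative directions satisfying the two step conditions:
    s = -g (delta = 1) and d = +/- leftmost eigenvector (gamma = 1)."""
    s = -g  # -g^T s = ||g||^2 = ||s||_2 ||g||_2, so the condition holds

    lam, V = np.linalg.eigh(H)        # eigenvalues in ascending order
    lam_min, v = lam[0], V[:, 0]      # leftmost eigenpair of H
    if lam_min < 0.0:
        d = -v if g @ v > 0.0 else v  # sign chosen so g^T d <= 0
    else:
        d = np.zeros_like(g)          # no negative curvature available
    return s, d, lam_min
```

A Quasi-Newton choice B_k s_k = -g_k or an inexact eigenvector estimate (e.g. from a few Lanczos iterations) would also satisfy the conditions with smaller δ and γ.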
How to use s_k and d_k?

Use both in a curvilinear line search?
- Often taints good descent directions with "poorly scaled" directions of negative curvature.
- No consistent performance gains!

Start using d_k only once ||g(x_k)|| is "small"?
- No consistent performance gains!
- Misses areas of the space in which great decrease in f is possible.

Use s_k when ||g(x_k)|| is big relative to |(λ_k)_-|; otherwise, use d_k?
- Better, but still inconsistent performance gains!

We propose to use upper-bounding models. It works!
Predicted decrease along the descent direction s_k

If L_k ≥ L, then

    f(x_k + α s_k) ≤ f(x_k) - m_{s,k}(α)   for all α

with

    m_{s,k}(α) := -α g(x_k)^T s_k - (1/2) L_k α^2 ||s_k||_2^2

and define the quantity

    α_k := -g(x_k)^T s_k / (L_k ||s_k||_2^2) = argmax_{α ≥ 0} m_{s,k}(α)

Comments
- m_{s,k}(α_k) is the best predicted decrease along s_k
- If s_k = -g(x_k), then α_k = 1/L_k
Predicted decrease along the negative curvature direction d_k

If σ_k ≥ σ, then

    f(x_k + β d_k) ≤ f(x_k) - m_{d,k}(β)   for all β

with

    m_{d,k}(β) := -β g(x_k)^T d_k - (1/2) β^2 d_k^T H(x_k) d_k - (σ_k/6) β^3 ||d_k||_2^3

and define, with c_k := d_k^T H(x_k) d_k, the quantity

    β_k := ( -c_k + sqrt( c_k^2 - 2 σ_k ||d_k||_2^3 g(x_k)^T d_k ) ) / ( σ_k ||d_k||_2^3 ) = argmax_{β ≥ 0} m_{d,k}(β)

Comments
- m_{d,k}(β_k) is the best predicted decrease along d_k
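The two model maximizers are available in closed form; a minimal sketch (helper names are ours) that evaluates α_k, β_k, and the corresponding best predicted decreases:

```python
import numpy as np

def alpha_k(g, s, L):
    # maximizer of m_s(alpha) = -alpha g^T s - (L/2) alpha^2 ||s||^2
    return -(g @ s) / (L * (s @ s))

def beta_k(g, d, H, sigma):
    # maximizer of m_d(beta); c = d^T H d < 0 by the step condition,
    # so the square root is real and beta_k > 0
    c = d @ H @ d
    nd3 = np.linalg.norm(d) ** 3
    return (-c + np.sqrt(c * c - 2.0 * sigma * nd3 * (g @ d))) / (sigma * nd3)

def m_s(alpha, g, s, L):
    return -alpha * (g @ s) - 0.5 * L * alpha**2 * (s @ s)

def m_d(beta, g, d, H, sigma):
    c = d @ H @ d
    return (-beta * (g @ d) - 0.5 * beta**2 * c
            - (sigma / 6.0) * beta**3 * np.linalg.norm(d)**3)
```

For s_k = -g(x_k) these recover α_k = 1/L_k and the familiar decrease ||g(x_k)||^2 / (2 L_k); β_k solves the quadratic stationarity condition of the cubic model.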
Choose the step that predicts the larger decrease in f.

- If m_{s,k}(α_k) ≥ m_{d,k}(β_k), then try the step s_k.
- If m_{d,k}(β_k) > m_{s,k}(α_k), then try the step d_k.

Question: Why "try" instead of "use"?

Answer: We do not know if L_k ≥ L and σ_k ≥ σ.
- If L_k < L, then it could be the case that f(x_k + α_k s_k) > f(x_k) - m_{s,k}(α_k)
- If σ_k < σ, then it could be the case that f(x_k + β_k d_k) > f(x_k) - m_{d,k}(β_k)
Dynamic Step-Size Algorithm

 1: for k ∈ N do
 2:     compute s_k and d_k satisfying the required step conditions
 3:     loop
 4:         compute α_k = argmax_{α ≥ 0} m_{s,k}(α) and β_k = argmax_{β ≥ 0} m_{d,k}(β)
 5:         if m_{s,k}(α_k) ≥ m_{d,k}(β_k) then
 6:             if f(x_k + α_k s_k) ≤ f(x_k) - m_{s,k}(α_k) then
 7:                 set x_{k+1} ← x_k + α_k s_k and then exit loop
 8:             else
 9:                 set L_k ← ρ L_k   [ρ ∈ (1, ∞)]
10:         else
11:             if f(x_k + β_k d_k) ≤ f(x_k) - m_{d,k}(β_k) then
12:                 set x_{k+1} ← x_k + β_k d_k and then exit loop
13:             else
14:                 set σ_k ← ρ σ_k
15:     set (L_{k+1}, σ_{k+1}) ∈ (L_min, L_k] × (σ_min, σ_k]
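The algorithm above can be sketched end to end. This is an illustrative implementation under stated assumptions, not the authors' code: it takes s_k = -g(x_k), d_k = ±(leftmost eigenvector of H(x_k)), and keeps (L_{k+1}, σ_{k+1}) = (L_k, σ_k), which is one admissible choice on line 15:

```python
import numpy as np

def dynamic_step_method(f, grad, hess, x0, L0=1.0, sigma0=1.0, rho=2.0,
                        eps_g=1e-8, eps_H=1e-8, max_iter=500):
    """Sketch of the dynamic step-size algorithm with illustrative
    direction choices (delta = gamma = 1)."""
    x, L, sigma = np.asarray(x0, dtype=float), L0, sigma0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        lam, V = np.linalg.eigh(H)           # eigenvalues in ascending order
        lam_min, v = lam[0], V[:, 0]
        if np.linalg.norm(g) <= eps_g and lam_min >= -eps_H:
            break                            # approximate second-order point
        s = -g                               # descent direction
        d = ((-v if g @ v > 0.0 else v)      # negative curvature direction
             if lam_min < -eps_H else None)
        while True:                          # inner loop: adjust L or sigma
            if s @ s > 0.0:
                a = -(g @ s) / (L * (s @ s))
                ms = -a * (g @ s) - 0.5 * L * a**2 * (s @ s)
            else:
                a, ms = 0.0, -np.inf
            if d is not None:
                c, nd3, gtd = d @ H @ d, np.linalg.norm(d)**3, g @ d
                b = (-c + np.sqrt(c*c - 2.0*sigma*nd3*gtd)) / (sigma * nd3)
                md = -b*gtd - 0.5*b*b*c - (sigma/6.0)*b**3*nd3
            else:
                b, md = 0.0, -np.inf
            if ms >= md:                     # try the descent step
                if f(x + a*s) <= f(x) - ms:
                    x = x + a*s
                    break
                L *= rho                     # predicted decrease not realized
            else:                            # try the negative curvature step
                if f(x + b*d) <= f(x) - md:
                    x = x + b*d
                    break
                sigma *= rho
    return x
```

On a toy nonconvex function such as f(x) = x_1^4/4 - x_1^2/2 + x_2^2/2 (a saddle at the origin, minimizers at (±1, 0)), the method uses the negative curvature direction to leave the saddle region and settles at a minimizer.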
Convergence Results
Key decrease inequality: For all k ∈ N it holds that

    f(x_k) - f(x_{k+1}) ≥ max{ (δ^2 / (2 L_k)) ||g(x_k)||_2^2 , (2 γ^3 / (3 σ_k^2)) |(λ_k)_-|^3 }.

Comments:
- The first term in the max holds when x_{k+1} = x_k + α_k s_k.
- The second term in the max holds when x_{k+1} = x_k + β_k d_k.
- The max holds because we choose whether to try s_k or d_k based on m_{s,k}(α_k) ≥ m_{d,k}(β_k).
- Can prove that {L_k} and {σ_k} remain uniformly bounded.
Theorem (Limit points satisfy second-order necessary conditions)

The computed iterates satisfy

    lim_{k → ∞} ||g(x_k)||_2 = 0   and   liminf_{k → ∞} λ_k ≥ 0

Theorem (Complexity result)

The number of iterations, function, and derivative (i.e., gradient and Hessian) evaluations required until some iteration k ∈ N is reached with ||g(x_k)||_2 ≤ ε_g and |(λ_k)_-| ≤ ε_H is at most

    O( max{ ε_g^{-2}, ε_H^{-3} } )
Numerical Results
Refined parameter increase strategy

    L̂_k ← L_k + 2 ( f(x_k + α_k s_k) - f(x_k) + m_{s,k}(α_k) ) / ( α_k^2 ||s_k||^2 )
    σ̂_k ← σ_k + 6 ( f(x_k + β_k d_k) - f(x_k) + m_{d,k}(β_k) ) / ( β_k^3 ||d_k||^3 )

then, with ρ ← 2, use the update

    L_k ← max{ ρ L_k, min{ 10^3 L_k, L̂_k } }
    σ_k ← max{ ρ σ_k, min{ 10^3 σ_k, σ̂_k } }

Refined parameter decrease strategy

    L_{k+1} ← max{ 10^{-3}, 10^{-3} L_k, L̂_k } and σ_{k+1} ← σ_k   when x_{k+1} ← x_k + α_k s_k
    σ_{k+1} ← max{ 10^{-3}, 10^{-3} σ_k, σ̂_k } and L_{k+1} ← L_k   when x_{k+1} ← x_k + β_k d_k
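The refined increase can be read as follows: f(x_k + α_k s_k) - f(x_k) + m_{s,k}(α_k) is the amount by which the trial point violates the model bound, and L̂_k is the smallest Lipschitz estimate that would have made the quadratic upper bound hold with equality. A small sketch (function names are ours, not from the slides):

```python
import numpy as np

def refined_L_increase(f_x, f_trial, m_val, alpha, s, L, rho=2.0):
    """L_hat makes the quadratic upper bound tight at the trial point;
    the new value is kept within [rho*L, 10^3 * L]."""
    L_hat = L + 2.0 * (f_trial - f_x + m_val) / (alpha**2 * (s @ s))
    return max(rho * L, min(1e3 * L, L_hat))

def refined_sigma_increase(f_x, f_trial, m_val, beta, d, sigma, rho=2.0):
    """Analogous refined increase for sigma_k using the cubic model."""
    sigma_hat = sigma + 6.0 * (f_trial - f_x + m_val) / (beta**3 * np.linalg.norm(d)**3)
    return max(rho * sigma, min(1e3 * sigma, sigma_hat))
```

For a quadratic f with curvature c along s_k, one inner-loop failure with any L_k < c yields L̂_k = c exactly, so the refined update jumps straight to a valid constant instead of doubling repeatedly.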