Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property (PowerPoint PPT presentation)


1. Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property

2. Cubic-regularization (CR)

CR update:
  x_{k+1} ∈ argmin_{y ∈ ℝ^d} ⟨∇f(x_k), y − x_k⟩ + (1/2)⟨∇²f(x_k)(y − x_k), y − x_k⟩ + (M/6)‖y − x_k‖³

 Converges to a 2nd-order stationary point (Nesterov '06)
 Escapes strict saddle points
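The saddle-escaping behavior of the update above can be sketched in one dimension. This is a minimal illustration, not the authors' implementation: the toy function f(x) = x⁴ − x² (which has a strict saddle, here a local maximum, at x = 0), the penalty M = 6, and the exact 1D subproblem solver are all my own choices. At x = 0 a gradient step does nothing since f′(0) = 0, while the CR step uses the negative curvature f″(0) = −2 to move to a point of strictly lower function value.

```python
import math

def cr_step(x, g, h, M):
    """Exact minimizer of the 1D cubic model m(s) = g*s + h*s**2/2 + M*|s|**3/6,
    returned as the next iterate x + s."""
    cands = [0.0]
    for sign in (1.0, -1.0):          # stationary points on s > 0 and on s < 0
        disc = h*h - 2.0*sign*M*g     # discriminant of g + h*s + sign*(M/2)*s**2 = 0
        if disc >= 0:
            r = math.sqrt(disc)
            cands += [s for s in ((-h + r)/(sign*M), (-h - r)/(sign*M)) if s*sign > 0]
    return x + min(cands, key=lambda s: g*s + 0.5*h*s*s + M*abs(s)**3/6)

x = 0.0                               # start exactly at the strict saddle of x**4 - x**2
g, h = 4*x**3 - 2*x, 12*x**2 - 2      # f'(0) = 0, f''(0) = -2
x1 = cr_step(x, g, h, M=6.0)          # CR escapes: |x1| = 2/3, f(x1) < f(0) = 0
```

Since the model is a piecewise quadratic-plus-cubic in s, its global minimizer is always among the stationary points of the two branches, which is why enumerating the quadratic roots suffices.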

3. Motivation and Contribution

 General nonconvex optimization
  • global sublinear convergence (Nesterov '06)
 Nonconvex + local geometry
  • gradient dominance (Nesterov '06): super-linear convergence
  • error bound (Yue '18): quadratic convergence
  • limited function class
 Our contribution
  • general Łojasiewicz property

4. Łojasiewicz Property

Definition (Łojasiewicz property). Let f take a constant value f* on a compact set Ω. Then there exist ε, C > 0 such that for all x with dist(x, Ω) ≤ ε,
  ‖∇f(x)‖ ≥ C |f(x) − f*|^{1/θ},
where θ is the Łojasiewicz exponent.

 Satisfied by a large function class:
  • analytic functions, polynomials, exp-log functions, etc.
 ML examples: Lasso, phase retrieval, blind deconvolution, etc.
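The exponent in the definition can be read off numerically for a concrete function. A sketch (my own example, assuming the parametrization ‖∇f(x)‖ ≥ C|f(x) − f*|^{1/θ} above): for f(x) = x⁴, which attains f* = 0 at x* = 0, we have |f′(x)| = 4|x|³ = 4(f(x) − f*)^{3/4}, so a log-log fit of |f′| against f − f* near x* recovers the slope 1/θ = 3/4, i.e. θ = 4/3.

```python
import math

def loglog_slope(xs):
    """Least-squares slope of log|f'(x)| versus log(f(x) - f*) for f(x) = x**4."""
    pts = [(math.log(x**4), math.log(4 * abs(x)**3)) for x in xs]
    n = len(pts)
    mu = sum(u for u, _ in pts) / n
    mv = sum(v for _, v in pts) / n
    num = sum((u - mu) * (v - mv) for u, v in pts)
    den = sum((u - mu)**2 for u, _ in pts)
    return num / den

# sample points approaching the minimizer x* = 0
slope = loglog_slope([0.5**i for i in range(1, 11)])   # slope = 1/theta = 3/4
theta = 1.0 / slope                                    # theta = 4/3
```

Note that θ = 4/3 lies in the "flat" regime (1, 3/2) of the rate tables that follow.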

5. Convergence to 2nd-order Stationary Point

(μ(x_k) denotes the measure of second-order stationarity at iterate x_k.)

  Łojasiewicz exponent θ   | Convergence rate
  θ = +∞ (sharp)           | finite-step: μ(x_k) = 0 for all large k
  θ ∈ (3/2, +∞)            | super-linear: μ(x_k) ≤ Θ exp(−(2(θ − 1))^{k − k₀})
  θ = 3/2                  | linear: μ(x_k) ≤ Θ exp(−c(k − k₀))
  θ ∈ (1, 3/2) (flat)      | sub-linear: μ(x_k) ≤ Θ (k − k₀)^{−p}, polynomial rate with exponent p depending on θ

6. Convergence of Function Value

  Łojasiewicz exponent θ   | Convergence rate
  θ = +∞                   | finite-step: f(x_k) − f* = 0 for all large k
  θ ∈ (3/2, +∞)            | super-linear: f(x_k) − f* ≤ Θ exp(−ρ^{k − k₀}), ρ > 1 depending on θ
  θ = 3/2                  | linear: f(x_k) − f* ≤ Θ exp(−c(k − k₀))
  θ ∈ (1, 3/2)             | sub-linear: f(x_k) − f* ≤ Θ (k − k₀)^{−p}, polynomial rate with exponent p depending on θ

7. Convergence of Variable Sequence

Theorem. Assume f satisfies the Łojasiewicz property. Then the sequence {x_k} generated by CR is absolutely summable:
  Σ_{k=0}^{∞} ‖x_{k+1} − x_k‖ < +∞.

 Implies that {x_k} is Cauchy-convergent
 Compare (Nesterov '06): only cubic-summable, Σ_{k=0}^{∞} ‖x_{k+1} − x_k‖³ < +∞

8. Convergence of Variable Sequence

  Łojasiewicz exponent θ   | Convergence rate
  θ = +∞                   | finite-step: ‖x_k − x*‖ = 0 for all large k
  θ ∈ (3/2, +∞)            | super-linear: ‖x_k − x*‖ ≤ Θ exp(−ρ^{k − k₀}), ρ > 1 depending on θ
  θ = 3/2                  | linear: ‖x_k − x*‖ ≤ Θ exp(−c(k − k₀))
  θ ∈ (1, 3/2)             | sub-linear: ‖x_k − x*‖ ≤ Θ (k − k₀)^{−p}, polynomial rate with exponent p depending on θ

9. Comparison with First-order Algorithms

  Łojasiewicz exponent θ   | Gradient descent | Cubic-regularization
  θ = +∞                   | finite-step      | finite-step
  θ ∈ [2, +∞)              | linear           | super-linear
  θ ∈ [3/2, 2)             | sub-linear       | super-linear
  θ ∈ (1, 3/2)             | sub-linear       | sub-linear, with a faster polynomial order
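The qualitative gap in the table can be observed on a toy problem. This is my own experiment, not from the slides: f(x) = x⁴ − x², gradient-descent step size 0.05, cubic penalty M = 30, both methods started at x₀ = 0.1, counting iterations until |f′(x)| ≤ 10⁻⁸. The CR iterates, whose local rate is faster than linear here, reach the tolerance in far fewer steps than gradient descent.

```python
import math

def df(x):  return 4*x**3 - 2*x        # gradient of f(x) = x**4 - x**2
def d2f(x): return 12*x**2 - 2         # second derivative

def cr_step(x, M=30.0):
    """Exact 1D CR step: minimize g*s + h*s**2/2 + M*|s|**3/6 over s."""
    g, h = df(x), d2f(x)
    cands = [0.0]
    for sign in (1.0, -1.0):           # stationary points on s > 0 and on s < 0
        disc = h*h - 2.0*sign*M*g
        if disc >= 0:
            r = math.sqrt(disc)
            cands += [s for s in ((-h + r)/(sign*M), (-h - r)/(sign*M)) if s*sign > 0]
    return x + min(cands, key=lambda s: g*s + 0.5*h*s*s + M*abs(s)**3/6)

def iters_to_tol(step, x=0.1, tol=1e-8, cap=100000):
    """Iterations of `step` until |f'(x)| <= tol (capped for safety)."""
    k = 0
    while abs(df(x)) > tol and k < cap:
        x, k = step(x), k + 1
    return k

gd_iters = iters_to_tol(lambda x: x - 0.05*df(x))   # gradient descent
cr_iters = iters_to_tol(cr_step)                    # cubic regularization
```

Both methods converge to the local minimizer x* = 1/√2, but the iteration counts differ by roughly an order of magnitude in this setup.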

10. Come to Our Poster

Thursday, 05:00 PM, Room 210 & 230 AB, Poster #4

Thank you!
