Convergence of Cubic Regularization for Nonconvex Optimization under Łojasiewicz Property
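
Cubic-regularization (CR)

$\min_{x \in \mathbb{R}^d} f(x)$

CR: $x_{k+1} \in \operatorname{argmin}_{y} \; \langle y - x_k, \nabla f(x_k) \rangle + \frac{1}{2} (y - x_k)^\top \nabla^2 f(x_k) (y - x_k) + \frac{M}{6} \|y - x_k\|^3$

• Converges to a 2nd-order stationary point (Nesterov'06)
• Escapes strict-saddle points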
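
The CR update can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the subproblem solver `cubic_step` (eigendecomposition plus bisection on the step length), the toy objective, the starting point, and the value `M = 10` are all assumptions made for the example, and the degenerate "hard case" of the subproblem is ignored.

```python
import numpy as np

def cubic_step(g, H, M, n_bisect=200):
    """Approximately minimize <s, g> + 0.5 s^T H s + (M/6) ||s||^3 over s.

    Uses the optimality condition (H + (M/2) r I) s = -g with r = ||s||
    and finds r by bisection. Hypothetical helper; ignores the hard case.
    """
    lam, Q = np.linalg.eigh(H)          # H = Q diag(lam) Q^T, lam ascending
    g_hat = Q.T @ g

    def step_norm(r):
        # Norm of s(r) = -(H + (M/2) r I)^{-1} g, decreasing in r
        return np.linalg.norm(g_hat / (lam + 0.5 * M * r))

    # Smallest r for which H + (M/2) r I is positive definite
    r_lo = max(0.0, -2.0 * lam[0] / M) + 1e-12
    r_hi = max(2.0 * r_lo, 1.0)
    while step_norm(r_hi) > r_hi:       # grow until the root is bracketed
        r_hi *= 2.0
    for _ in range(n_bisect):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
    r = 0.5 * (r_lo + r_hi)
    return Q @ (-g_hat / (lam + 0.5 * M * r))

# Toy nonconvex objective (an assumption for this example):
# strict saddle at (0, 0), minimizers at (0, +1) and (0, -1), f* = -0.25.
def f(x):
    return 0.5 * x[0] ** 2 + 0.25 * x[1] ** 4 - 0.5 * x[1] ** 2

def grad(x):
    return np.array([x[0], x[1] ** 3 - x[1]])

def hess(x):
    return np.array([[1.0, 0.0], [0.0, 3.0 * x[1] ** 2 - 1.0]])

M = 10.0                                # assumed cubic penalty parameter
x = np.array([1.0, 0.1])                # start near the saddle, not on it
values = [f(x)]
for _ in range(50):
    x = x + cubic_step(grad(x), hess(x), M)   # y - x_k is the step s
    values.append(f(x))

print(x, np.linalg.norm(grad(x)))
```

Starting exactly at the saddle would give a zero gradient, which is the hard case this sketch does not handle; a full solver would move along the negative-curvature direction there.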

Motivation and Contribution

General nonconvex optimization
• global sublinear convergence (Nesterov'06)

Nonconvex + local geometry
• gradient dominance (Nesterov'06): super-linear convergence
• error bound (Yue'18): quadratic convergence
• limited function class

Our contributions
• general Łojasiewicz property
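
Łojasiewicz Property

Definition (Łojasiewicz Property): Let $f$ take a constant value $f^*$ on a compact set $\Omega$. There exist $\varepsilon, \lambda > 0$ such that for all $x \in \{z : \operatorname{dist}(z, \Omega) < \varepsilon, \; f^* < f(z) < f^* + \lambda\}$,

$f(x) - f^* \le C \|\nabla f(x)\|^{\theta}$,

where $\theta \in (1, +\infty]$ is the Łojasiewicz exponent.

• Satisfied by a large function class: analytic functions, polynomials, exp-log functions, etc.
• ML examples: Lasso, phase retrieval, blind deconvolution, etc.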
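
As a worked illustration, assuming the property takes the form $f(x) - f^* \le C \|\nabla f(x)\|^{\theta}$ with $\theta \in (1, +\infty]$, two one-dimensional functions (chosen here only as examples) land in different exponent regimes:

```latex
% Example 1: f(x) = x^2, with f^* = 0 attained on \Omega = \{0\}
f(x) - f^* = x^2 = \tfrac{1}{4} (2|x|)^2 = \tfrac{1}{4} |f'(x)|^{2}
\quad\Longrightarrow\quad \theta = 2, \; C = \tfrac{1}{4}.

% Example 2: f(x) = x^4, with f^* = 0 attained on \Omega = \{0\}
|f'(x)| = 4|x|^3
\;\Longrightarrow\; |x| = \big(\tfrac{1}{4}|f'(x)|\big)^{1/3}
\;\Longrightarrow\; f(x) - f^* = x^4 = 4^{-4/3} |f'(x)|^{4/3},
\quad\text{so } \theta = \tfrac{4}{3}, \; C = 4^{-4/3}.
```

The flatter function $x^4$ has the smaller exponent, so a smaller $\theta$ corresponds to flatter local geometry around the minimizer.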
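
Convergence to 2nd-order Stationary Point

| Łojasiewicz exponent $\theta$ | Convergence rate | Type |
|---|---|---|
| $\theta = +\infty$ (sharp) | $\mu(x_{k_0}) = 0$ | finite-step |
| $\theta \in (\frac{3}{2}, +\infty)$ | $\mu(x_k) \le \Theta\big(\exp(-(2(\theta-1))^{k-k_0})\big)$ | super-linear |
| $\theta = \frac{3}{2}$ | $\mu(x_k) \le \Theta\big(\exp(-(k-k_0))\big)$ | linear |
| $\theta \in (1, \frac{3}{2})$ (flat) | $\mu(x_k) \le \Theta\big((k-k_0)^{-\frac{2(\theta-1)}{3-2\theta}}\big)$ | sub-linear |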
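
The four regimes can be summarized as a small lookup. This is a sketch assuming the rates read from this slide as $\exp(-(2(\theta-1))^{k-k_0})$ in the super-linear case and $(k-k_0)^{-2(\theta-1)/(3-2\theta)}$ in the sub-linear case; the function name `cr_rate` and its return convention are hypothetical.

```python
import math

def cr_rate(theta):
    """Map a Lojasiewicz exponent theta in (1, +inf] to (regime, parameter).

    The parameter is the base b in exp(-(b)^(k - k0)) for the super-linear
    regime, the decay power p in (k - k0)^(-p) for the sub-linear regime,
    and None otherwise. Rates are those assumed from the slide.
    """
    if not theta > 1:
        raise ValueError("expected theta in (1, +inf]")
    if math.isinf(theta):
        return ("finite-step", None)
    if theta > 1.5:
        return ("super-linear", 2.0 * (theta - 1.0))
    if theta == 1.5:
        return ("linear", None)
    return ("sub-linear", 2.0 * (theta - 1.0) / (3.0 - 2.0 * theta))

for theta in (math.inf, 2.0, 1.5, 1.25):
    print(theta, cr_rate(theta))
```

The super-linear base $2(\theta-1)$ exceeds 1 exactly when $\theta > \frac{3}{2}$, which is why $\theta = \frac{3}{2}$ is the boundary between the linear and super-linear regimes.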
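
Convergence of Function Value

| Łojasiewicz exponent $\theta$ | Convergence rate |
|---|---|
| $\theta = +\infty$ | $f(x_{k_0}) - f^* = 0$ |
| $\theta \in (\frac{3}{2}, +\infty)$ | $f(x_k) - f^* \le \Theta\big(\exp(-(\frac{2\theta}{3})^{k-k_0})\big)$ |
| $\theta = \frac{3}{2}$ | $f(x_k) - f^* \le \Theta\big(\exp(-(k-k_0))\big)$ |
| $\theta \in (1, \frac{3}{2})$ | $f(x_k) - f^* \le \Theta\big((k-k_0)^{-\frac{2\theta}{3-2\theta}}\big)$ |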

Convergence of Variable Sequence

Theorem: Assume $f$ satisfies the Łojasiewicz property. Then the sequence $\{x_k\}$ generated by CR is absolutely summable:

$\sum_{k=0}^{\infty} \|x_{k+1} - x_k\| < +\infty$

• Implies that $\{x_k\}$ is Cauchy-convergent
• (Nesterov'06): cubic-summable, $\sum_{k=0}^{\infty} \|x_{k+1} - x_k\|^{3} < +\infty$
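
The step from absolute summability to Cauchy convergence is short; the following derivation is a standard argument added for completeness, with the counterexample step lengths chosen only for illustration:

```latex
% For any m > n, the triangle inequality gives
\|x_m - x_n\|
  \le \sum_{k=n}^{m-1} \|x_{k+1} - x_k\|
  \le \sum_{k=n}^{\infty} \|x_{k+1} - x_k\|
  \xrightarrow[n \to \infty]{} 0,
% since the tail of a convergent series vanishes. Hence \{x_k\} is Cauchy.

% Cubic summability alone is weaker: with step lengths
% \|x_{k+1} - x_k\| = k^{-1/2},
\sum_{k \ge 1} k^{-3/2} < +\infty
\quad\text{but}\quad
\sum_{k \ge 1} k^{-1/2} = +\infty .
```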
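
Convergence of Variable Sequence

| Łojasiewicz exponent $\theta$ | Convergence rate |
|---|---|
| $\theta = +\infty$ | $\lVert x_{k_0} - x^* \rVert = 0$ |
| $\theta \in (\frac{3}{2}, +\infty)$ | $\lVert x_k - x^* \rVert \le \Theta\big(\exp(-(\frac{2(\theta-1)}{3} + \frac{2}{3})^{k-k_0})\big)$ |
| $\theta = \frac{3}{2}$ | $\lVert x_k - x^* \rVert \le \Theta\big(\exp(-(k-k_0))\big)$ |
| $\theta \in (1, \frac{3}{2})$ | $\lVert x_k - x^* \rVert \le \Theta\big((k-k_0)^{-\frac{2(\theta-1)}{3-2\theta}}\big)$ |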
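
Comparison with First-order Algorithm

| Łojasiewicz exponent $\theta$ | Gradient descent | Cubic-regularization |
|---|---|---|
| $\theta = +\infty$ | finite-step | finite-step |
| $\theta \in [2, +\infty)$ | linear | super-linear |
| $\theta \in [\frac{3}{2}, 2)$ | sub-linear | super-linear |
| $\theta \in (1, \frac{3}{2})$ | sub-linear, $\Theta\big(k^{-\frac{\theta-1}{2-\theta}}\big)$ | sub-linear, $\Theta\big(k^{-\frac{\theta-1}{1.5-\theta}}\big)$ |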
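
To make the sub-linear comparison concrete, assume the two rates in the last row read $\Theta(k^{-(\theta-1)/(2-\theta)})$ for gradient descent and $\Theta(k^{-(\theta-1)/(1.5-\theta)})$ for cubic regularization, and take $\theta = \frac{4}{3}$ as an illustrative value:

```latex
% Gradient descent
\frac{\theta - 1}{2 - \theta} = \frac{1/3}{2/3} = \frac{1}{2}
\quad\Longrightarrow\quad \Theta\big(k^{-1/2}\big)

% Cubic regularization
\frac{\theta - 1}{1.5 - \theta} = \frac{1/3}{1/6} = 2
\quad\Longrightarrow\quad \Theta\big(k^{-2}\big)

% Iterations needed to reach accuracy 10^{-4}, ignoring constants:
k^{-1/2} = 10^{-4} \Rightarrow k = 10^{8},
\qquad
k^{-2} = 10^{-4} \Rightarrow k = 10^{2}.
```

Both methods are sub-linear in this regime, but the polynomial order differs substantially.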

Come to our poster: Thursday 05:00 PM, Room 210 & 230 AB, #4

Thank You!