Unconstrained Minimization (II) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
General Descent Method. The Algorithm: given a starting point x ∈ dom f, repeat (1) determine a descent direction Δx; (2) line search: choose a step size t > 0; (3) update: x := x + tΔx; until a stopping criterion is satisfied. Descent direction: Δx must satisfy ∇f(x)ᵀΔx < 0, which guarantees that f decreases along Δx for sufficiently small t > 0.
Gradient Descent Method. The Algorithm: given a starting point x ∈ dom f, repeat (1) Δx := −∇f(x); (2) line search: choose a step size t via exact or backtracking line search; (3) update: x := x + tΔx; until a stopping criterion is satisfied. The stopping criterion is usually of the form ‖∇f(x)‖₂ ≤ ε for a small tolerance ε.
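A minimal Python sketch of the method, assuming the objective is supplied as a function f with gradient grad_f; the backtracking parameters, the tolerance, and the small quadratic in the usage example are illustrative choices rather than values fixed by the slides.

import numpy as np

def gradient_descent(f, grad_f, x0, alpha=0.1, beta=0.7, tol=1e-6, max_iter=1000):
    """Gradient descent with backtracking line search (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:        # stopping criterion: ||grad f(x)|| <= tol
            break
        dx = -g                             # descent direction
        t = 1.0
        # backtracking: shrink t until the sufficient-decrease condition holds
        while f(x + t * dx) > f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx                      # update
    return x

# Usage on an illustrative quadratic f(x) = 0.5 * (x1^2 + 10 * x2^2)
f = lambda x: 0.5 * (x[0]**2 + 10 * x[1]**2)
grad_f = lambda x: np.array([x[0], 10 * x[1]])
print(gradient_descent(f, grad_f, [10.0, 1.0]))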
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
Preliminary. Assume f is both strongly convex and smooth: mI ⪯ ∇²f(x) ⪯ MI on the initial sublevel set. Define x⁺ = x − t∇f(x). The upper bound on the Hessian gives a quadratic upper bound on f(x⁺) as a function of the step size t: f(x⁺) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂².
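For reference, the two quadratic bounds implied by mI ⪯ ∇²f(x) ⪯ MI, written out in LaTeX; the second display records the consequence of the lower bound that the later slides use.

\[
f(y) \;\le\; f(x) + \nabla f(x)^T (y - x) + \frac{M}{2}\|y - x\|_2^2,
\qquad
f(y) \;\ge\; f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2 .
\]
\[
\text{Minimizing the lower bound over } y:\quad
p^\star \;\ge\; f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2
\quad\Longleftrightarrow\quad
\|\nabla f(x)\|_2^2 \;\ge\; 2m\,\bigl(f(x) - p^\star\bigr).
\]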
Analysis for Exact Line Search. 1. Minimize both sides of the quadratic upper bound f(x − t∇f(x)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² over t. Left side: the minimum is f(x⁺), attained at t = t_exact, the step length chosen by exact line search. Right side: t = 1/M is the minimizer, giving f(x) − (1/(2M))‖∇f(x)‖₂². Hence f(x⁺) ≤ f(x) − (1/(2M))‖∇f(x)‖₂². 2. Subtracting p* from both sides: f(x⁺) − p* ≤ f(x) − p* − (1/(2M))‖∇f(x)‖₂².
Analysis for Exact Line Search. 3. f is strongly convex on S, so ‖∇f(x)‖₂² ≥ 2m(f(x) − p*). 4. Combining the two inequalities: f(x⁺) − p* ≤ (1 − m/M)(f(x) − p*). 5. Applying it recursively: f(x^(k)) − p* ≤ cᵏ(f(x^(0)) − p*) with c = 1 − m/M < 1, so f(x^(k)) converges to p* as k → ∞.
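Chaining the steps above into a single display (a worked restatement in LaTeX):

\[
f(x^+) \;\le\; f(x) - \frac{1}{2M}\|\nabla f(x)\|_2^2
\;\le\; f(x) - \frac{m}{M}\bigl(f(x) - p^\star\bigr)
\;\Longrightarrow\;
f(x^+) - p^\star \;\le\; \Bigl(1 - \frac{m}{M}\Bigr)\bigl(f(x) - p^\star\bigr),
\]
\[
\text{and therefore}\quad
f(x^{(k)}) - p^\star \;\le\; \Bigl(1 - \frac{m}{M}\Bigr)^{k}\bigl(f(x^{(0)}) - p^\star\bigr).
\]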
Discussions. Iteration Complexity: f(x^(k)) − p* ≤ ε after at most log((f(x^(0)) − p*)/ε) / log(1/c) iterations, where c = 1 − m/M. The numerator log((f(x^(0)) − p*)/ε) indicates that initialization is important. The denominator log(1/c) is a function of the condition number bound M/m: when M/m is large, log(1/c) = −log(1 − m/M) ≈ m/M, so the iteration count grows roughly like (M/m) log((f(x^(0)) − p*)/ε).
Discussions. Linear Convergence: since the error f(x^(k)) − p* decreases at least geometrically, by the factor c per iteration, it lies below a line on a log-linear plot of error versus iteration number. This behavior is called linear convergence.
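A worked instance of this bound, with illustrative numbers that are not taken from the slides: take m/M = 0.01, f(x^(0)) − p* = 100, ε = 10⁻⁶, and use natural logarithms.

\[
\frac{\log\bigl((f(x^{(0)}) - p^\star)/\epsilon\bigr)}{\log(1/c)}
= \frac{\log 10^{8}}{-\log(1 - 0.01)}
\approx \frac{18.4}{0.01005}
\approx 1833 \text{ iterations},
\]

which is close to the approximation (M/m) log((f(x^(0)) − p*)/ε) ≈ 100 × 18.4 = 1840.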
Analysis for Backtracking Line Search. Backtracking line search: given a descent direction Δx for f at x ∈ dom f and parameters α ∈ (0, 0.5), β ∈ (0, 1), set t := 1; while f(x + tΔx) > f(x) + αt∇f(x)ᵀΔx, set t := βt. 1. The exit condition holds for all 0 ≤ t ≤ 1/M: for such t we have −t + Mt²/2 ≤ −t/2, so the quadratic upper bound gives f(x − t∇f(x)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² ≤ f(x) − (t/2)‖∇f(x)‖₂² ≤ f(x) − αt‖∇f(x)‖₂², where the last step uses α < 1/2.
Analysis for Backtracking Line Search. 2. The backtracking line search terminates either with t = 1 (if t = 1 already satisfies the exit condition) or with a value t ≥ β/M (since the previous trial step t/β > 1/M failed). So in both cases f(x⁺) ≤ f(x) − min{α, βα/M}‖∇f(x)‖₂². 3. Subtracting p* from both sides: f(x⁺) − p* ≤ f(x) − p* − min{α, βα/M}‖∇f(x)‖₂².
Analysis for Backtracking Line Search. 4. Combining with strong convexity, i.e., ‖∇f(x)‖₂² ≥ 2m(f(x) − p*): f(x⁺) − p* ≤ (1 − min{2mα, 2βαm/M})(f(x) − p*) = c(f(x) − p*) with c < 1. 5. Applying it recursively: f(x^(k)) − p* ≤ cᵏ(f(x^(0)) − p*), so f(x^(k)) converges to p* at least geometrically, with an exponent that depends on the condition number bound M/m. Again, linear convergence.
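A small numerical check of this contraction factor in Python; the test quadratic, the values of m and M, and the backtracking parameters α = 0.3, β = 0.7 are illustrative choices rather than anything fixed by the slides.

import numpy as np

# Test quadratic f(x) = 0.5 * x^T diag(m, M) x, so m I <= Hessian <= M I and p* = 0.
m, M = 1.0, 20.0
f = lambda x: 0.5 * (m * x[0]**2 + M * x[1]**2)
grad = lambda x: np.array([m * x[0], M * x[1]])
p_star = 0.0
alpha, beta = 0.3, 0.7

c_bound = 1 - min(2 * m * alpha, 2 * beta * alpha * m / M)   # guaranteed contraction factor

x = np.array([1.0, 1.0])
for k in range(10):
    g = grad(x)
    t = 1.0
    while f(x - t * g) > f(x) - alpha * t * g @ g:           # backtracking line search
        t *= beta
    x_new = x - t * g
    ratio = (f(x_new) - p_star) / (f(x) - p_star)            # observed per-step contraction
    assert ratio <= c_bound + 1e-12                          # the theoretical bound holds
    x = x_new
print("guaranteed factor c =", c_bound)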
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
A Quadratic Problem in R². A quadratic objective function: f(x) = (1/2)(x₁² + γx₂²) with γ > 0. The optimal point is x* = (0, 0) and the optimal value is p* = 0. The Hessian of f is constant, ∇²f(x) = diag(1, γ), and has eigenvalues 1 and γ, so m = min{1, γ} and M = max{1, γ}.
A Quadratic Problem in R². For f(x) = (1/2)(x₁² + γx₂²), the gradient descent method with exact line search, starting at x^(0) = (γ, 1), has the closed-form iterates x₁^(k) = γ((γ − 1)/(γ + 1))ᵏ and x₂^(k) = (−(γ − 1)/(γ + 1))ᵏ, so convergence is exactly linear: f(x^(k)) = ((γ − 1)/(γ + 1))^(2k) f(x^(0)), i.e., the objective error is reduced by the factor ((γ − 1)/(γ + 1))² at every iteration.
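A quick Python check that the exact-line-search iterates match the closed form above; γ = 10 and the five iterations are arbitrary illustrative choices.

import numpy as np

gamma = 10.0
grad = lambda x: np.array([x[0], gamma * x[1]])

x = np.array([gamma, 1.0])                      # starting point x^(0) = (gamma, 1)
for k in range(5):
    g = grad(x)
    # exact line search for a quadratic 0.5 x^T H x: t = ||g||^2 / (g^T H g), H = diag(1, gamma)
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)
    x = x - t * g
    r = (gamma - 1) / (gamma + 1)
    x_closed = np.array([gamma * r**(k + 1), (-r)**(k + 1)])
    print(np.allclose(x, x_closed))             # True: iterates match the closed form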
A Quadratic Problem in R². Comparisons: from our general analysis, the error is reduced by the factor 1 − m/M per iteration; from the closed-form solution, it is reduced by ((γ − 1)/(γ + 1))² = ((1 − m/M)/(1 + m/M))². When the condition number M/m is large, −log(((1 − m/M)/(1 + m/M))²) ≈ 4m/M while −log(1 − m/M) ≈ m/M, so the iteration complexity predicted by the general bound differs from the true one by only a factor of about four.
A Quadratic Problem in R². Experiments: for γ not far from one, convergence is rapid.
A Non-Quadratic Problem in R². The objective function: f(x) = exp(x₁ + 3x₂ − 0.1) + exp(x₁ − 3x₂ − 0.1) + exp(−x₁ − 0.1). Gradient descent method with backtracking line search, α = 0.1, β = 0.7.
A Non-Quadratic Problem in R². The same objective function, f(x) = exp(x₁ + 3x₂ − 0.1) + exp(x₁ − 3x₂ − 0.1) + exp(−x₁ − 0.1), minimized by the gradient descent method with exact line search.
A Non-Quadratic Problem in R². Comparisons: both error curves are approximately linear on a log-linear plot, and exact line search is faster.
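A Python sketch of this experiment; the starting point, the iteration count, and the use of SciPy's bounded scalar minimization over t ∈ [0, 1] as a stand-in for exact line search are illustrative choices.

import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
               + np.exp(-x[0] - 0.1))

def grad(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

def backtracking(x, g, alpha=0.1, beta=0.7):
    t = 1.0
    while f(x - t * g) > f(x) - alpha * t * g @ g:
        t *= beta
    return t

exact = lambda x, g: minimize_scalar(lambda t: f(x - t * g),
                                     bounds=(0, 1), method='bounded').x

def run(x0, line_search, iters=25):
    x, errs = np.array(x0, dtype=float), []
    for _ in range(iters):
        g = grad(x)
        x = x - line_search(x, g) * g
        errs.append(f(x))
    return np.array(errs)

p_star = f(np.array([-np.log(2) / 2, 0.0]))   # analytic minimizer: x1 = -(1/2) log 2, x2 = 0
print(run([-1.0, 1.0], backtracking) - p_star)
print(run([-1.0, 1.0], exact) - p_star)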
A Problem in R¹⁰⁰. A larger problem: f(x) = cᵀx − Σᵢ₌₁⁵⁰⁰ log(bᵢ − aᵢᵀx), with x ∈ R¹⁰⁰ (100 variables and 500 logarithmic terms). Gradient descent method with backtracking line search (α = 0.1, β = 0.5) and gradient descent method with exact line search.
A Problem in R¹⁰⁰. Comparisons: both convergence curves are approximately linear, and exact line search is only a bit faster than backtracking.
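A sketch of this experiment in Python; the random generation of a, b, c, and the convention of returning +inf outside the domain so that backtracking rejects infeasible steps, are an illustrative setup, since the slides do not specify the problem data.

import numpy as np

rng = np.random.default_rng(0)
n, m_terms = 100, 500
A = rng.standard_normal((m_terms, n))          # rows are the vectors a_i
b = 1.0 + rng.random(m_terms)                  # chosen so x = 0 is strictly feasible
c = rng.standard_normal(n)

def f(x):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else c @ x - np.sum(np.log(s))

def grad(x):
    return c + A.T @ (1.0 / (b - A @ x))

x = np.zeros(n)
alpha, beta = 0.1, 0.5
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-6:
        break
    t = 1.0
    while f(x - t * g) > f(x) - alpha * t * g @ g:   # backtracking; infeasible steps give +inf
        t *= beta
    x = x - t * g
print(k, f(x))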
Gradient Method and Condition Number. Take the larger problem f(x) = cᵀx − Σᵢ₌₁⁵⁰⁰ log(bᵢ − aᵢᵀx) and replace x by Tx, where T = diag(1, γ^(1/n), γ^(2/n), …, γ^((n−1)/n)). This yields a family of optimization problems, minimize f(Tx), indexed by the scaling parameter γ.
Gradient Method and Condition Number. Number of iterations required to reduce f(x^(k)) − p* below a fixed tolerance, plotted as a function of γ. Backtracking line search with α = 0.3 and β = 0.7.
Gradient Method and Condition Number. The condition number of the Hessian at the optimum, κ(∇²f(x*)), is also shown as a function of γ. The larger the condition number, the larger the number of iterations.
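A sketch of how the family of scaled problems can be set up in Python; the random data and the sweep over γ values are illustrative, and the diagonal scaling follows the matrix T described above.

import numpy as np

rng = np.random.default_rng(0)
n, m_terms = 100, 500
A = rng.standard_normal((m_terms, n))
b = 1.0 + rng.random(m_terms)                  # x = 0 strictly feasible
c = rng.standard_normal(n)

def scaled_problem(gamma):
    """Objective and gradient of f(T x), with T = diag(gamma^(i/n)), i = 0, ..., n-1."""
    T = gamma ** (np.arange(n) / n)            # store only the diagonal of T
    AT, cT = A * T, c * T                      # A @ diag(T) and diag(T) @ c
    def f(x):
        s = b - AT @ x
        return np.inf if np.any(s <= 0) else cT @ x - np.sum(np.log(s))
    def grad(x):
        return cT + AT.T @ (1.0 / (b - AT @ x))
    return f, grad

for gamma in [0.1, 1.0, 10.0, 100.0]:
    f, grad = scaled_problem(gamma)
    # run the gradient-descent loop from the earlier sketch here and record its iteration count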
Conclusions 1. The gradient method often exhibits approximately linear convergence. 2. The convergence rate depends greatly on the condition number of the Hessian, or the sublevel sets. 3. An exact line search sometimes improves the convergence of the gradient method, but the effect is not large. 4. The choice of backtracking parameters has a noticeable but not dramatic effect on the convergence.
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
General Convex Functions. Assumptions: f is convex and Lipschitz continuous, i.e., ‖∇f(x)‖₂ ≤ G for all x. Gradient Descent Method: given a starting point x^(1) ∈ dom f; for k = 1, 2, …, K, update x^(k+1) = x^(k) − η∇f(x^(k)) with a fixed step size η; return the average iterate x̄ = (1/K) Σₖ₌₁ᴷ x^(k).
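A minimal Python sketch of this averaged-iterate scheme; the Lipschitz test function, the horizon K, and the step size (chosen in the form that the analysis on the following slides suggests) are illustrative choices.

import numpy as np

# Convex, Lipschitz test function: f(x) = sum_i sqrt(1 + x_i^2), minimized at x = 0,
# with ||grad f(x)||_2 <= sqrt(n) =: G for all x.
f = lambda x: np.sum(np.sqrt(1.0 + x**2))
grad = lambda x: x / np.sqrt(1.0 + x**2)

def averaged_gradient_descent(x1, eta, K):
    """Fixed-step gradient descent returning the average of the iterates x^(1), ..., x^(K)."""
    x = np.array(x1, dtype=float)
    avg = np.zeros_like(x)
    for k in range(K):
        avg += x / K
        x = x - eta * grad(x)
    return avg

n, K = 5, 1000
x1 = np.ones(n)
R = np.linalg.norm(x1)                  # ||x^(1) - x*||, here x* = 0
G = np.sqrt(n)
eta = R / (G * np.sqrt(K))              # step size of the form suggested by the final bound
x_bar = averaged_gradient_descent(x1, eta, K)
print(f(x_bar) - f(np.zeros(n)))        # suboptimality of the averaged iterate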
Analysis. Let x* be a minimizer of f and let η be the fixed step size, so that x^(k) − x^(k+1) = η∇f(x^(k)). For each iteration k, convexity gives
f(x^(k)) − f(x*) ≤ ∇f(x^(k))ᵀ(x^(k) − x*)
= (1/η)(x^(k) − x^(k+1))ᵀ(x^(k) − x*)
= (1/(2η)) (‖x^(k) − x*‖₂² + ‖x^(k) − x^(k+1)‖₂² − ‖x^(k+1) − x*‖₂²)
= (1/(2η)) (‖x^(k) − x*‖₂² − ‖x^(k+1) − x*‖₂²) + (η/2)‖∇f(x^(k))‖₂²
≤ (1/(2η)) (‖x^(k) − x*‖₂² − ‖x^(k+1) − x*‖₂²) + (η/2)G².
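Summing the per-iteration inequality over k = 1, …, K, telescoping the squared distances, and applying Jensen's inequality to the averaged iterate x̄ gives the final rate:

\[
f(\bar{x}) - f(x^\star)
\;\le\; \frac{1}{K}\sum_{k=1}^{K}\bigl(f(x^{(k)}) - f(x^\star)\bigr)
\;\le\; \frac{\|x^{(1)} - x^\star\|_2^2}{2\eta K} + \frac{\eta G^2}{2},
\]
\[
\text{and choosing } \eta = \frac{\|x^{(1)} - x^\star\|_2}{G\sqrt{K}}
\text{ gives } \quad
f(\bar{x}) - f(x^\star) \;\le\; \frac{G\,\|x^{(1)} - x^\star\|_2}{\sqrt{K}} = O(1/\sqrt{K}).
\]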