Unconstrained Minimization (II)


  1. Unconstrained Minimization (II) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj

  2. Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; $\ell_1$-norm; Convergence Analysis; Discussion and Examples)

  3. Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; $\ell_1$-norm; Convergence Analysis; Discussion and Examples)

  4. General Descent Method
  - The Algorithm:
    Given a starting point $x \in \operatorname{dom} f$.
    Repeat
      1. Determine a descent direction $\Delta x$.
      2. Line search: choose a step size $t > 0$.
      3. Update: $x := x + t\,\Delta x$.
    until a stopping criterion is satisfied.
  - Descent direction: $\Delta x$ must satisfy $\nabla f(x)^\top \Delta x < 0$.

  5. Gradient Descent Method
  - The Algorithm:
    Given a starting point $x \in \operatorname{dom} f$.
    Repeat
      1. $\Delta x := -\nabla f(x)$.
      2. Line search: choose a step size $t$ via exact or backtracking line search.
      3. Update: $x := x + t\,\Delta x$.
    until a stopping criterion is satisfied.
  - Stopping criterion: usually of the form $\|\nabla f(x)\|_2 \le \eta$ for a small tolerance $\eta$.
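
The slides contain no code; as an illustration, here is a minimal NumPy sketch of gradient descent with exact line search, specialized to a quadratic $f(x) = \frac{1}{2}x^\top A x$, for which the exact step has the closed form $t = \|g\|_2^2 / (g^\top A g)$. The function name, tolerance, and test matrix are illustrative choices, not from the slides.

```python
import numpy as np

def gradient_descent_exact_quadratic(A, x0, tol=1e-8, max_iter=1000):
    """Gradient descent with exact line search on f(x) = 0.5 * x^T A x (A positive definite).

    For this quadratic the exact minimizing step along -grad f(x) has the
    closed form t = ||g||^2 / (g^T A g).  Stops when ||grad f(x)||_2 <= tol.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = A @ x                      # gradient of 0.5 * x^T A x
        if np.linalg.norm(g) <= tol:   # stopping criterion from slide 5
            break
        t = (g @ g) / (g @ (A @ g))    # exact line search step
        x = x - t * g                  # update: x := x + t * dx with dx = -g
    return x, k

# Example: the 2-D quadratic from slide 18 with gamma = 10 (illustrative value).
gamma = 10.0
A = np.diag([1.0, gamma])
x_opt, iters = gradient_descent_exact_quadratic(A, x0=[gamma, 1.0])
print(iters, x_opt)                    # converges to the optimum x* = 0
```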

  6. Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; $\ell_1$-norm; Convergence Analysis; Discussion and Examples)

  7. Preliminary
  - Assume $f$ is both strongly convex and smooth on $S$: $mI \preceq \nabla^2 f(x) \preceq MI$.
  - Define $\tilde{f}(t) = f\bigl(x - t\nabla f(x)\bigr)$ as the restriction of $f$ to the ray in the negative gradient direction.
  - A quadratic upper bound on $\tilde{f}$:
    $\tilde{f}(t) \le f(x) - t\,\|\nabla f(x)\|_2^2 + \frac{M t^2}{2}\,\|\nabla f(x)\|_2^2$.
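
For completeness, here is the standard one-line derivation of the gradient lower bound that the analysis on the following slides relies on; it follows from strong convexity alone, by minimizing the right-hand side of the strong convexity inequality over $y$ (the minimizer is $y = x - \frac{1}{m}\nabla f(x)$):
\[
f(y) \;\ge\; f(x) + \nabla f(x)^\top (y - x) + \frac{m}{2}\|y - x\|_2^2
\;\;\Longrightarrow\;\;
p^\ast \;\ge\; f(x) - \frac{1}{2m}\,\|\nabla f(x)\|_2^2
\;\;\Longrightarrow\;\;
\|\nabla f(x)\|_2^2 \;\ge\; 2m\,\bigl(f(x) - p^\ast\bigr).
\]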

  8. Analysis for Exact Line Search
  1. Minimize both sides of the quadratic upper bound over $t$:
     - Left side: $\tilde{f}(t_{\text{exact}})$, where $t_{\text{exact}}$ is the step length that minimizes $\tilde{f}$.
     - Right side: $t = 1/M$ is the minimizer, so
       $f(x^+) = \tilde{f}(t_{\text{exact}}) \le f(x) - \frac{1}{2M}\,\|\nabla f(x)\|_2^2$.
  2. Subtracting $p^\ast$ from both sides:
     $f(x^+) - p^\ast \le f(x) - p^\ast - \frac{1}{2M}\,\|\nabla f(x)\|_2^2$.

  9. Analysis for Exact Line Search
  3. $f$ is strongly convex on $S$:
     $\|\nabla f(x)\|_2^2 \ge 2m\,\bigl(f(x) - p^\ast\bigr)$.
  4. Combining:
     $f(x^+) - p^\ast \le \bigl(1 - m/M\bigr)\bigl(f(x) - p^\ast\bigr)$.
  5. Applying it recursively:
     $f\bigl(x^{(k)}\bigr) - p^\ast \le c^k\,\bigl(f\bigl(x^{(0)}\bigr) - p^\ast\bigr)$, where $c = 1 - m/M < 1$.
     Hence $f\bigl(x^{(k)}\bigr)$ converges to $p^\ast$ as $k \to \infty$.
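
Written out, the combining step (item 4) chains the inequality from step 2 with the gradient bound from step 3:
\[
f(x^+) - p^\ast
\;\le\; f(x) - p^\ast - \frac{1}{2M}\,\|\nabla f(x)\|_2^2
\;\le\; f(x) - p^\ast - \frac{m}{M}\,\bigl(f(x) - p^\ast\bigr)
\;=\; \Bigl(1 - \frac{m}{M}\Bigr)\bigl(f(x) - p^\ast\bigr).
\]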

  10. Discussions
  - Iteration Complexity: $f\bigl(x^{(k)}\bigr) - p^\ast \le \epsilon$ after at most
    $\dfrac{\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr)}{\log(1/c)}$ iterations.
  - The numerator $\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr)$ indicates that initialization is important.
  - The denominator $\log(1/c)$ is a function of the condition number $M/m$:
    when $M/m$ is large, $\log(1/c) = -\log(1 - m/M) \approx m/M$.

  11. Discussions
  - Iteration Complexity: $f\bigl(x^{(k)}\bigr) - p^\ast \le \epsilon$ after at most
    $\dfrac{\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr)}{\log(1/c)}$ iterations.
  - The numerator $\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr)$ indicates that initialization is important.
  - The denominator $\log(1/c)$ is a function of the condition number $M/m$:
    when $M/m$ is large, $\log(1/c) = -\log(1 - m/M) \approx m/M$.

  12. Discussions
  - Iteration Complexity: $f\bigl(x^{(k)}\bigr) - p^\ast \le \epsilon$ after at most
    $\dfrac{\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr)}{\log(1/c)}$ iterations.
  - The numerator indicates that initialization is important; the denominator is a function of the condition number $M/m$.
  - Linear Convergence: the error lies below a line on a log-linear plot of error versus iteration number.
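
A worked instance of the iteration bound (the numbers are illustrative, not from the slides): with $M/m = 100$, $f(x^{(0)}) - p^\ast = 1$, and $\epsilon = 10^{-5}$,
\[
\frac{\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr)}{\log(1/c)}
= \frac{\log 10^{5}}{-\log(1 - 0.01)}
\approx \frac{11.51}{0.01005}
\approx 1146,
\]
close to the approximation $(M/m)\,\log\bigl(\bigl(f(x^{(0)}) - p^\ast\bigr)/\epsilon\bigr) \approx 1151$.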

  13. Analysis for Backtracking Line Search
  - Backtracking Line Search:
    given a descent direction $\Delta x$ for $f$ at $x \in \operatorname{dom} f$, $\alpha \in (0, 0.5)$, $\beta \in (0, 1)$.
    $t := 1$
    while $f(x + t\,\Delta x) > f(x) + \alpha t\,\nabla f(x)^\top \Delta x$:  $t := \beta t$.
  1. For all $0 \le t \le 1/M$: $-t + \frac{M t^2}{2} \le -\frac{t}{2}$, so
     $f\bigl(x - t\nabla f(x)\bigr) \le f(x) - \frac{t}{2}\,\|\nabla f(x)\|_2^2$.

  14. Analysis for Backtracking Line Search
  - Backtracking Line Search (repeated):
    given a descent direction $\Delta x$ for $f$ at $x \in \operatorname{dom} f$, $\alpha \in (0, 0.5)$, $\beta \in (0, 1)$.
    $t := 1$
    while $f(x + t\,\Delta x) > f(x) + \alpha t\,\nabla f(x)^\top \Delta x$:  $t := \beta t$.
  - For all $0 \le t \le 1/M$, since $\alpha < 1/2$:
    $f\bigl(x - t\nabla f(x)\bigr) \le f(x) - \frac{t}{2}\,\|\nabla f(x)\|_2^2 \le f(x) - \alpha t\,\|\nabla f(x)\|_2^2$,
    so the exit condition of the backtracking search is satisfied.
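
A minimal NumPy sketch of the backtracking line search on slides 13 and 14. The default parameters match the values used in the example on slide 22; the helper name and the toy objective in the usage example are illustrative.

```python
import numpy as np

def backtracking_line_search(f, grad_fx, x, dx, alpha=0.1, beta=0.7):
    """Backtracking line search (slides 13-14).

    Shrink t by beta until the sufficient-decrease condition
        f(x + t*dx) <= f(x) + alpha * t * grad_fx @ dx
    holds, with alpha in (0, 0.5) and beta in (0, 1).
    """
    t = 1.0
    fx = f(x)
    slope = grad_fx @ dx                     # directional derivative; < 0 for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t

# Illustrative use on f(x) = 0.5 * ||x||^2 with the negative-gradient direction.
f = lambda x: 0.5 * (x @ x)
x = np.array([2.0, -1.0])
g = x                                        # gradient of f at x
print(backtracking_line_search(f, g, x, -g)) # accepted step size (here t = 1)
```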

  15. Analysis for Backtracking Line Search
  2. The backtracking line search terminates:
     - either with $t = 1$,
     - or with a value $t \ge \beta/M$.
     So $f(x^+) \le f(x) - \alpha\,\min\{1, \beta/M\}\,\|\nabla f(x)\|_2^2$.
  3. Subtracting $p^\ast$ from both sides:
     $f(x^+) - p^\ast \le f(x) - p^\ast - \alpha\,\min\{1, \beta/M\}\,\|\nabla f(x)\|_2^2$.

  16. Analysis for Backtracking Line Search
  4. Combining with strong convexity ($\|\nabla f(x)\|_2^2 \ge 2m(f(x) - p^\ast)$):
     $f(x^+) - p^\ast \le \bigl(1 - \min\{2m\alpha,\; 2\beta\alpha m/M\}\bigr)\bigl(f(x) - p^\ast\bigr)$.
  5. Applying it recursively:
     $f\bigl(x^{(k)}\bigr) - p^\ast \le c^k\,\bigl(f\bigl(x^{(0)}\bigr) - p^\ast\bigr)$, where $c = 1 - \min\{2m\alpha,\; 2\beta\alpha m/M\} < 1$.
  - $f\bigl(x^{(k)}\bigr)$ converges to $p^\ast$ with an exponent that depends on the condition number.
  - Linear Convergence.
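
Spelled out, step 4 follows by inserting the gradient lower bound $\|\nabla f(x)\|_2^2 \ge 2m\bigl(f(x) - p^\ast\bigr)$ into the inequality from step 3:
\[
f(x^+) - p^\ast
\;\le\; f(x) - p^\ast - \alpha\,\min\{1, \beta/M\}\,\|\nabla f(x)\|_2^2
\;\le\; \bigl(1 - 2m\alpha\,\min\{1, \beta/M\}\bigr)\bigl(f(x) - p^\ast\bigr),
\]
and $2m\alpha\,\min\{1, \beta/M\} = \min\{2m\alpha,\; 2\beta\alpha m/M\}$, which is the constant appearing in $c$.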

  17. Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; $\ell_1$-norm; Convergence Analysis; Discussion and Examples)

  18. A Quadratic Problem in $\mathbf{R}^2$
  - A Quadratic Objective Function: $f(x) = \frac{1}{2}\bigl(x_1^2 + \gamma x_2^2\bigr)$, with $\gamma > 0$.
  - The optimal point is $x^\ast = 0$; the optimal value is $p^\ast = 0$.
  - The Hessian of $f$ is constant and has eigenvalues $1$ and $\gamma$, so
    $m = \min\{1, \gamma\}$ and $M = \max\{1, \gamma\}$.
  - Condition number: $\max\{1, \gamma\}/\min\{1, \gamma\} = \max\{\gamma, 1/\gamma\}$.

  19. A Quadratic Problem in $\mathbf{R}^2$
  - Gradient descent method with exact line search, starting at $x^{(0)} = (\gamma, 1)$:
    $x_1^{(k)} = \gamma\left(\dfrac{\gamma - 1}{\gamma + 1}\right)^{k}$,
    $x_2^{(k)} = \left(-\dfrac{\gamma - 1}{\gamma + 1}\right)^{k}$.
  - Convergence is exactly linear:
    $f\bigl(x^{(k)}\bigr) = \left(\dfrac{\gamma - 1}{\gamma + 1}\right)^{2k} f\bigl(x^{(0)}\bigr)$,
    i.e., the error is reduced by the factor $\left(\dfrac{\gamma - 1}{\gamma + 1}\right)^{2}$ at each iteration.
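
A short NumPy check of the closed-form iterates above against gradient descent with exact line search on the same quadratic; the value of $\gamma$ is an illustrative choice.

```python
import numpy as np

# Compare the closed-form iterates from slide 19 with gradient descent using
# exact line search on f(x) = 0.5 * (x1^2 + gamma * x2^2).
gamma = 10.0
A = np.diag([1.0, gamma])
x = np.array([gamma, 1.0])               # starting point x^(0) = (gamma, 1)
c = (gamma - 1.0) / (gamma + 1.0)

for k in range(1, 6):
    g = A @ x                            # gradient
    t = (g @ g) / (g @ (A @ g))          # exact line search step
    x = x - t * g
    x_closed = np.array([gamma * c**k, (-c)**k])
    print(k, np.allclose(x, x_closed))   # True: the iterates match the closed form
```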

  20. A Quadratic Problem in $\mathbf{R}^2$
  - Comparisons:
    - From our general analysis, the error is reduced by the factor $1 - \frac{m}{M}$ per iteration.
    - From the closed-form solution, the error is reduced by
      $\left(\dfrac{\gamma - 1}{\gamma + 1}\right)^{2} = \left(\dfrac{1 - m/M}{1 + m/M}\right)^{2}$ per iteration.
  - When $M/m$ is large, the iteration complexity differs by a factor of about $4$:
    $-\log\left(\dfrac{1 - m/M}{1 + m/M}\right)^{2} \approx \dfrac{4m}{M}$, versus $-\log\bigl(1 - m/M\bigr) \approx \dfrac{m}{M}$.

  21. A Quadratic Problem in $\mathbf{R}^2$
  - Experiments: for $\gamma$ not far from one, convergence is rapid.

  22. A Non-Quadratic Problem in $\mathbf{R}^2$
  - The Objective Function:
    $f(x_1, x_2) = e^{\,x_1 + 3x_2 - 0.1} + e^{\,x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$.
  - Gradient descent method with backtracking line search, $\alpha = 0.1$, $\beta = 0.7$.

  23. A Non-Quadratic Problem in $\mathbf{R}^2$
  - The Objective Function:
    $f(x_1, x_2) = e^{\,x_1 + 3x_2 - 0.1} + e^{\,x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$.
  - Gradient descent method with exact line search.
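
A self-contained NumPy sketch of the slide-22 setup: the objective above, its gradient, and gradient descent with backtracking line search ($\alpha = 0.1$, $\beta = 0.7$). The starting point, tolerance, and iteration cap are illustrative assumptions, not from the slides.

```python
import numpy as np

# The objective from slides 22-23 and its gradient.
def f(x):
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_f(x):
    x1, x2 = x
    e1 = np.exp(x1 + 3*x2 - 0.1)
    e2 = np.exp(x1 - 3*x2 - 0.1)
    e3 = np.exp(-x1 - 0.1)
    return np.array([e1 + e2 - e3, 3*e1 - 3*e2])

x = np.array([-1.0, 1.0])                            # illustrative starting point
alpha, beta = 0.1, 0.7                               # backtracking parameters from slide 22
for k in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) <= 1e-8:
        break
    t = 1.0
    while f(x - t*g) > f(x) - alpha * t * (g @ g):   # backtracking line search
        t *= beta
    x = x - t * g
print(k, x, f(x))                                    # approaches the minimizer of f
```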

  24. A Non-Quadratic Problem in $\mathbf{R}^2$
  - Comparisons: both convergence curves are linear, and exact line search is faster.

  25. A Problem in $\mathbf{R}^{100}$
  - A Larger Problem: $f(x) = c^\top x - \sum_{i=1}^{500} \log\bigl(b_i - a_i^\top x\bigr)$,
    with $x \in \mathbf{R}^{100}$ and $500$ log terms.
  - Gradient descent method with backtracking line search, $\alpha = 0.1$, $\beta = 0.5$.
  - Gradient descent method with exact line search.
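
A sketch of the slide-25 experiment on a randomly generated instance. The data-generation recipe (standard normal $a_i$ and $c$, positive $b_i$ so that $x = 0$ is strictly feasible) is an assumption; the slides do not specify how their instance was produced.

```python
import numpy as np

# Random instance of f(x) = c^T x - sum_i log(b_i - a_i^T x) with 500 terms in R^100.
rng = np.random.default_rng(0)
m, n = 500, 100
A = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, size=m)        # b_i > 0, so x = 0 is strictly feasible
c = rng.standard_normal(n)

def f(x):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else c @ x - np.sum(np.log(s))

def grad_f(x):
    s = b - A @ x
    return c + A.T @ (1.0 / s)

x = np.zeros(n)
alpha, beta = 0.1, 0.5                   # backtracking parameters from slide 25
for k in range(500):
    g = grad_f(x)
    if np.linalg.norm(g) <= 1e-6:
        break
    t = 1.0
    while f(x - t*g) > f(x) - alpha * t * (g @ g):   # the inf value keeps x in dom f
        t *= beta
    x = x - t * g
print(k, f(x))
```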

  26. A Problem in $\mathbf{R}^{100}$
  - Comparisons: both convergence curves are linear, and exact line search is only a bit faster than backtracking.

  27. Gradient Method and Condition Number
  - A Larger Problem: $f(x) = c^\top x - \sum_{i=1}^{500} \log\bigl(b_i - a_i^\top x\bigr)$.
  - Replace $x$ by $T\bar{x}$, where $T = \operatorname{diag}\bigl(1, \gamma^{1/n}, \gamma^{2/n}, \ldots, \gamma^{(n-1)/n}\bigr)$.
  - A Family of Optimization Problems:
    $\bar{f}(\bar{x}) = c^\top T\bar{x} - \sum_{i=1}^{500} \log\bigl(b_i - a_i^\top T\bar{x}\bigr)$, indexed by $\gamma$.
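
The effect of this change of variables on conditioning, written out: with $x = T\bar{x}$ and $T$ diagonal,
\[
\bar{f}(\bar{x}) = f(T\bar{x}), \qquad
\nabla \bar{f}(\bar{x}) = T\,\nabla f(T\bar{x}), \qquad
\nabla^2 \bar{f}(\bar{x}) = T\,\nabla^2 f(T\bar{x})\,T,
\]
so sweeping $\gamma$ changes the condition number of the Hessian at the optimum while leaving the optimal value unchanged.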

  28. Gradient Method and Condition Number
  - Number of iterations required to obtain $f\bigl(x^{(k)}\bigr) - p^\ast \le 10^{-5}$,
    using backtracking line search with $\alpha = 0.3$ and $\beta = 0.7$.

  29. Gradient Method and Condition Number
  - The condition number of the Hessian $\nabla^2 f(x^\ast)$ at the optimum.
  - The larger the condition number, the larger the number of iterations.

  30. Conclusions
  1. The gradient method often exhibits approximately linear convergence.
  2. The convergence rate depends greatly on the condition number of the Hessian, or the sublevel sets.
  3. An exact line search sometimes improves the convergence of the gradient method, but the effect is not large.
  4. The choice of backtracking parameters has a noticeable but not dramatic effect on the convergence.

  31. Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; $\ell_1$-norm; Convergence Analysis; Discussion and Examples)

  32. General Convex Functions
  - $f$ is convex.
  - $f$ is Lipschitz continuous: $\|\nabla f(x)\|_2 \le G$.
  - Gradient Descent Method:
    Given a starting point $x^{(1)} \in \operatorname{dom} f$.
    For $k = 1, 2, \ldots, K$ do
      Update: $x^{(k+1)} = x^{(k)} - \eta_k\,\nabla f\bigl(x^{(k)}\bigr)$.
    End for
    Return $\bar{x} = \frac{1}{K}\sum_{k=1}^{K} x^{(k)}$.
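
An illustrative NumPy sketch of this scheme: run $K$ gradient steps and return the average of the iterates. The test function $f(x) = \sum_i \log\cosh(x_i)$ (whose gradient $\tanh(x)$ is bounded, so $f$ is Lipschitz) and the diminishing step-size rule are assumptions for the demo, not from the slides.

```python
import numpy as np

def averaged_gradient_descent(grad_f, x1, K, step):
    """Run K gradient steps and return the average of the iterates (slide 32)."""
    x = np.asarray(x1, dtype=float)
    x_sum = np.zeros_like(x)
    for k in range(1, K + 1):
        x_sum += x
        x = x - step(k) * grad_f(x)      # x^(k+1) = x^(k) - eta_k * grad f(x^(k))
    return x_sum / K                     # xbar = (1/K) * sum_{k=1}^{K} x^(k)

# Demo on f(x) = sum_i log(cosh(x_i)): convex, gradient tanh(x) is bounded.
f = lambda x: np.sum(np.log(np.cosh(x)))
grad_f = lambda x: np.tanh(x)

xbar = averaged_gradient_descent(grad_f, x1=[3.0, -2.0], K=1000,
                                 step=lambda k: 1.0 / np.sqrt(k))
print(f(xbar))                           # approaches the optimal value p* = 0
```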

  33. Analysis
  - Define $x^\ast$ as a minimizer of $f$ and $R = \|x^{(1)} - x^\ast\|_2$. Let the step size be a constant $\eta$.
  - By convexity and the update rule $x^{(k+1)} = x^{(k)} - \eta\,\nabla f\bigl(x^{(k)}\bigr)$:
    $f\bigl(x^{(k)}\bigr) - f\bigl(x^\ast\bigr)
      \le \nabla f\bigl(x^{(k)}\bigr)^\top \bigl(x^{(k)} - x^\ast\bigr)
      = \frac{1}{\eta}\bigl(x^{(k)} - x^{(k+1)}\bigr)^\top \bigl(x^{(k)} - x^\ast\bigr)
      = \frac{1}{2\eta}\Bigl(\|x^{(k)} - x^\ast\|_2^2 + \|x^{(k)} - x^{(k+1)}\|_2^2 - \|x^{(k+1)} - x^\ast\|_2^2\Bigr)$.

  34. Analysis
  - Since $\|x^{(k)} - x^{(k+1)}\|_2 = \eta\,\|\nabla f\bigl(x^{(k)}\bigr)\|_2 \le \eta G$:
    $f\bigl(x^{(k)}\bigr) - f\bigl(x^\ast\bigr)
      \le \frac{1}{2\eta}\Bigl(\|x^{(k)} - x^\ast\|_2^2 - \|x^{(k+1)} - x^\ast\|_2^2\Bigr) + \frac{\eta}{2}\,\|\nabla f\bigl(x^{(k)}\bigr)\|_2^2
      \le \frac{1}{2\eta}\Bigl(\|x^{(k)} - x^\ast\|_2^2 - \|x^{(k+1)} - x^\ast\|_2^2\Bigr) + \frac{\eta G^2}{2}$.
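
For completeness, the standard way this per-step bound is turned into a rate for the averaged iterate $\bar{x}$: sum over $k = 1, \ldots, K$, telescope, and apply convexity (the constant step-size choice below is illustrative):
\[
\sum_{k=1}^{K}\bigl(f(x^{(k)}) - f(x^\ast)\bigr)
\;\le\; \frac{\|x^{(1)} - x^\ast\|_2^2}{2\eta} + \frac{\eta G^2 K}{2}
\;=\; \frac{R^2}{2\eta} + \frac{\eta G^2 K}{2},
\]
\[
f(\bar{x}) - f(x^\ast)
\;\le\; \frac{1}{K}\sum_{k=1}^{K}\bigl(f(x^{(k)}) - f(x^\ast)\bigr)
\;\le\; \frac{R^2}{2\eta K} + \frac{\eta G^2}{2},
\]
and choosing $\eta = \dfrac{R}{G\sqrt{K}}$ gives $f(\bar{x}) - f(x^\ast) \le \dfrac{RG}{\sqrt{K}}$.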
