Unconstrained Minimization (II) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
General Descent Method. The Algorithm: given a starting point x ∈ dom f, repeat (1) determine a descent direction Δx; (2) line search: choose a step size t > 0; (3) update: x := x + tΔx; until a stopping criterion is satisfied. Descent direction: Δx must satisfy ∇f(x)ᵀΔx < 0, which guarantees that f decreases along Δx for sufficiently small t > 0.
Gradient Descent Method. The Algorithm: given a starting point x ∈ dom f, repeat (1) Δx := −∇f(x); (2) line search: choose a step size t via exact or backtracking line search; (3) update: x := x + tΔx; until a stopping criterion is satisfied. The stopping criterion is usually of the form ‖∇f(x)‖₂ ≤ ε for a small tolerance ε.
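A minimal Python sketch of the method, assuming the objective is supplied as a function f with gradient grad_f; the backtracking parameters, the tolerance, and the small quadratic in the usage example are illustrative choices rather than values fixed by the slides.

import numpy as np

def gradient_descent(f, grad_f, x0, alpha=0.1, beta=0.7, tol=1e-6, max_iter=1000):
    """Gradient descent with backtracking line search (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:        # stopping criterion: ||grad f(x)|| <= tol
            break
        dx = -g                             # descent direction
        t = 1.0
        # backtracking: shrink t until the sufficient-decrease condition holds
        while f(x + t * dx) > f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx                      # update
    return x

# Usage on an illustrative quadratic f(x) = 0.5 * (x1^2 + 10 * x2^2)
f = lambda x: 0.5 * (x[0]**2 + 10 * x[1]**2)
grad_f = lambda x: np.array([x[0], 10 * x[1]])
print(gradient_descent(f, grad_f, [10.0, 1.0]))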
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
Preliminary. Assume f is both strongly convex and smooth: mI ⪯ ∇²f(x) ⪯ MI on the initial sublevel set. Define x⁺ = x − t∇f(x). The upper bound on the Hessian gives a quadratic upper bound on f(x⁺) as a function of the step size t: f(x⁺) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂².
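For reference, the two quadratic bounds implied by mI ⪯ ∇²f(x) ⪯ MI, written out in LaTeX; the second display records the consequence of the lower bound that the later slides use.

\[
f(y) \;\le\; f(x) + \nabla f(x)^T (y - x) + \frac{M}{2}\|y - x\|_2^2,
\qquad
f(y) \;\ge\; f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2 .
\]
\[
\text{Minimizing the lower bound over } y:\quad
p^\star \;\ge\; f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2
\quad\Longleftrightarrow\quad
\|\nabla f(x)\|_2^2 \;\ge\; 2m\,\bigl(f(x) - p^\star\bigr).
\]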
Analysis for Exact Line Search. 1. Minimize both sides of the quadratic upper bound f(x − t∇f(x)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² over t. Left side: the minimum is f(x⁺), attained at t = t_exact, the step length chosen by exact line search. Right side: t = 1/M is the minimizer, giving f(x) − (1/(2M))‖∇f(x)‖₂². Hence f(x⁺) ≤ f(x) − (1/(2M))‖∇f(x)‖₂². 2. Subtracting p* from both sides: f(x⁺) − p* ≤ f(x) − p* − (1/(2M))‖∇f(x)‖₂².
Analysis for Exact Line Search. 3. f is strongly convex on S, so ‖∇f(x)‖₂² ≥ 2m(f(x) − p*). 4. Combining the two inequalities: f(x⁺) − p* ≤ (1 − m/M)(f(x) − p*). 5. Applying it recursively: f(x^(k)) − p* ≤ cᵏ(f(x^(0)) − p*) with c = 1 − m/M < 1, so f(x^(k)) converges to p* as k → ∞.
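Chaining the steps above into a single display (a worked restatement in LaTeX):

\[
f(x^+) \;\le\; f(x) - \frac{1}{2M}\|\nabla f(x)\|_2^2
\;\le\; f(x) - \frac{m}{M}\bigl(f(x) - p^\star\bigr)
\;\Longrightarrow\;
f(x^+) - p^\star \;\le\; \Bigl(1 - \frac{m}{M}\Bigr)\bigl(f(x) - p^\star\bigr),
\]
\[
\text{and therefore}\quad
f(x^{(k)}) - p^\star \;\le\; \Bigl(1 - \frac{m}{M}\Bigr)^{k}\bigl(f(x^{(0)}) - p^\star\bigr).
\]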
Discussions. Iteration Complexity: f(x^(k)) − p* ≤ ε after at most log((f(x^(0)) − p*)/ε) / log(1/c) iterations, where c = 1 − m/M. The numerator log((f(x^(0)) − p*)/ε) indicates that initialization is important. The denominator log(1/c) is a function of the condition number bound M/m: when M/m is large, log(1/c) = −log(1 − m/M) ≈ m/M, so the iteration count grows roughly like (M/m) log((f(x^(0)) − p*)/ε).
Discussions. Linear Convergence: since the error f(x^(k)) − p* decreases at least geometrically, by the factor c per iteration, it lies below a line on a log-linear plot of error versus iteration number. This behavior is called linear convergence.
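A worked instance of this bound, with illustrative numbers that are not taken from the slides: take m/M = 0.01, f(x^(0)) − p* = 100, ε = 10⁻⁶, and use natural logarithms.

\[
\frac{\log\bigl((f(x^{(0)}) - p^\star)/\epsilon\bigr)}{\log(1/c)}
= \frac{\log 10^{8}}{-\log(1 - 0.01)}
\approx \frac{18.4}{0.01005}
\approx 1833 \text{ iterations},
\]

which is close to the approximation (M/m) log((f(x^(0)) − p*)/ε) ≈ 100 × 18.4 = 1840.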
Analysis for Backtracking Line Search. Backtracking line search: given a descent direction Δx for f at x ∈ dom f and parameters α ∈ (0, 0.5), β ∈ (0, 1), set t := 1; while f(x + tΔx) > f(x) + αt∇f(x)ᵀΔx, set t := βt. 1. The exit condition holds for all 0 ≤ t ≤ 1/M: for such t we have −t + Mt²/2 ≤ −t/2, so the quadratic upper bound gives f(x − t∇f(x)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² ≤ f(x) − (t/2)‖∇f(x)‖₂² ≤ f(x) − αt‖∇f(x)‖₂², where the last step uses α < 1/2.
Analysis for Backtracking Line Search. 2. The backtracking line search terminates either with t = 1 (if t = 1 already satisfies the exit condition) or with a value t ≥ β/M (since the previous trial step t/β > 1/M failed). So in both cases f(x⁺) ≤ f(x) − min{α, βα/M}‖∇f(x)‖₂². 3. Subtracting p* from both sides: f(x⁺) − p* ≤ f(x) − p* − min{α, βα/M}‖∇f(x)‖₂².
Analysis for Backtracking Line Search. 4. Combining with strong convexity, i.e., ‖∇f(x)‖₂² ≥ 2m(f(x) − p*): f(x⁺) − p* ≤ (1 − min{2mα, 2βαm/M})(f(x) − p*) = c(f(x) − p*) with c < 1. 5. Applying it recursively: f(x^(k)) − p* ≤ cᵏ(f(x^(0)) − p*), so f(x^(k)) converges to p* at least geometrically, with an exponent that depends on the condition number bound M/m. Again, linear convergence.
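A small numerical check of this contraction factor in Python; the test quadratic, the values of m and M, and the backtracking parameters α = 0.3, β = 0.7 are illustrative choices rather than anything fixed by the slides.

import numpy as np

# Test quadratic f(x) = 0.5 * x^T diag(m, M) x, so m I <= Hessian <= M I and p* = 0.
m, M = 1.0, 20.0
f = lambda x: 0.5 * (m * x[0]**2 + M * x[1]**2)
grad = lambda x: np.array([m * x[0], M * x[1]])
p_star = 0.0
alpha, beta = 0.3, 0.7

c_bound = 1 - min(2 * m * alpha, 2 * beta * alpha * m / M)   # guaranteed contraction factor

x = np.array([1.0, 1.0])
for k in range(10):
    g = grad(x)
    t = 1.0
    while f(x - t * g) > f(x) - alpha * t * g @ g:           # backtracking line search
        t *= beta
    x_new = x - t * g
    ratio = (f(x_new) - p_star) / (f(x) - p_star)            # observed per-step contraction
    assert ratio <= c_bound + 1e-12                          # the theoretical bound holds
    x = x_new
print("guaranteed factor c =", c_bound)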
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
A Quadratic Problem in R². A quadratic objective function: f(x) = (1/2)(x₁² + γx₂²) with γ > 0. The optimal point is x* = (0, 0) and the optimal value is p* = 0. The Hessian of f is constant, ∇²f(x) = diag(1, γ), and has eigenvalues 1 and γ, so m = min{1, γ} and M = max{1, γ}.
A Quadratic Problem in R². For f(x) = (1/2)(x₁² + γx₂²), the gradient descent method with exact line search, starting at x^(0) = (γ, 1), has the closed-form iterates x₁^(k) = γ((γ − 1)/(γ + 1))ᵏ and x₂^(k) = (−(γ − 1)/(γ + 1))ᵏ, so convergence is exactly linear: f(x^(k)) = ((γ − 1)/(γ + 1))^(2k) f(x^(0)), i.e., the objective error is reduced by the factor ((γ − 1)/(γ + 1))² at every iteration.
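A quick Python check that the exact-line-search iterates match the closed form above; γ = 10 and the five iterations are arbitrary illustrative choices.

import numpy as np

gamma = 10.0
grad = lambda x: np.array([x[0], gamma * x[1]])

x = np.array([gamma, 1.0])                      # starting point x^(0) = (gamma, 1)
for k in range(5):
    g = grad(x)
    # exact line search for a quadratic 0.5 x^T H x: t = ||g||^2 / (g^T H g), H = diag(1, gamma)
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)
    x = x - t * g
    r = (gamma - 1) / (gamma + 1)
    x_closed = np.array([gamma * r**(k + 1), (-r)**(k + 1)])
    print(np.allclose(x, x_closed))             # True: iterates match the closed form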
A Quadratic Problem in R². Comparisons: from our general analysis, the error is reduced by the factor 1 − m/M per iteration; from the closed-form solution, it is reduced by ((γ − 1)/(γ + 1))² = ((1 − m/M)/(1 + m/M))². When the condition number M/m is large, −log(((1 − m/M)/(1 + m/M))²) ≈ 4m/M while −log(1 − m/M) ≈ m/M, so the iteration complexity predicted by the general bound differs from the true one by only a factor of about four.
A Quadratic Problem in R². Experiments: for γ not far from one, convergence is rapid.
A Non-Quadratic Problem in R². The objective function: f(x) = exp(x₁ + 3x₂ − 0.1) + exp(x₁ − 3x₂ − 0.1) + exp(−x₁ − 0.1). Gradient descent method with backtracking line search, α = 0.1, β = 0.7.
A Non-Quadratic Problem in R². The same objective function, f(x) = exp(x₁ + 3x₂ − 0.1) + exp(x₁ − 3x₂ − 0.1) + exp(−x₁ − 0.1), minimized by the gradient descent method with exact line search.
A Non-Quadratic Problem in R². Comparisons: both error curves are approximately linear on a log-linear plot, and exact line search is faster.
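A Python sketch of this experiment; the starting point, the iteration count, and the use of SciPy's bounded scalar minimization over t ∈ [0, 1] as a stand-in for exact line search are illustrative choices.

import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
               + np.exp(-x[0] - 0.1))

def grad(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

def backtracking(x, g, alpha=0.1, beta=0.7):
    t = 1.0
    while f(x - t * g) > f(x) - alpha * t * g @ g:
        t *= beta
    return t

exact = lambda x, g: minimize_scalar(lambda t: f(x - t * g),
                                     bounds=(0, 1), method='bounded').x

def run(x0, line_search, iters=25):
    x, errs = np.array(x0, dtype=float), []
    for _ in range(iters):
        g = grad(x)
        x = x - line_search(x, g) * g
        errs.append(f(x))
    return np.array(errs)

p_star = f(np.array([-np.log(2) / 2, 0.0]))   # analytic minimizer: x1 = -(1/2) log 2, x2 = 0
print(run([-1.0, 1.0], backtracking) - p_star)
print(run([-1.0, 1.0], exact) - p_star)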
A Problem in R¹⁰⁰. A larger problem: f(x) = cᵀx − Σᵢ₌₁⁵⁰⁰ log(bᵢ − aᵢᵀx), with x ∈ R¹⁰⁰ (100 variables and 500 logarithmic terms). Gradient descent method with backtracking line search (α = 0.1, β = 0.5) and gradient descent method with exact line search.
A Problem in R¹⁰⁰. Comparisons: both convergence curves are approximately linear, and exact line search is only a bit faster than backtracking.
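A sketch of this experiment in Python; the random generation of a, b, c, and the convention of returning +inf outside the domain so that backtracking rejects infeasible steps, are an illustrative setup, since the slides do not specify the problem data.

import numpy as np

rng = np.random.default_rng(0)
n, m_terms = 100, 500
A = rng.standard_normal((m_terms, n))          # rows are the vectors a_i
b = 1.0 + rng.random(m_terms)                  # chosen so x = 0 is strictly feasible
c = rng.standard_normal(n)

def f(x):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else c @ x - np.sum(np.log(s))

def grad(x):
    return c + A.T @ (1.0 / (b - A @ x))

x = np.zeros(n)
alpha, beta = 0.1, 0.5
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-6:
        break
    t = 1.0
    while f(x - t * g) > f(x) - alpha * t * g @ g:   # backtracking; infeasible steps give +inf
        t *= beta
    x = x - t * g
print(k, f(x))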
Gradient Method and Condition Number. Take the larger problem f(x) = cᵀx − Σᵢ₌₁⁵⁰⁰ log(bᵢ − aᵢᵀx) and replace x by Tx, where T = diag(1, γ^(1/n), γ^(2/n), …, γ^((n−1)/n)). This yields a family of optimization problems, minimize f(Tx), indexed by the scaling parameter γ.
Gradient Method and Condition Number. Number of iterations required to reduce f(x^(k)) − p* below a fixed tolerance, plotted as a function of γ. Backtracking line search with α = 0.3 and β = 0.7.
Gradient Method and Condition Number. The condition number of the Hessian at the optimum, κ(∇²f(x*)), is also shown as a function of γ. The larger the condition number, the larger the number of iterations.
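A sketch of how the family of scaled problems can be set up in Python; the random data and the sweep over γ values are illustrative, and the diagonal scaling follows the matrix T described above.

import numpy as np

rng = np.random.default_rng(0)
n, m_terms = 100, 500
A = rng.standard_normal((m_terms, n))
b = 1.0 + rng.random(m_terms)                  # x = 0 strictly feasible
c = rng.standard_normal(n)

def scaled_problem(gamma):
    """Objective and gradient of f(T x), with T = diag(gamma^(i/n)), i = 0, ..., n-1."""
    T = gamma ** (np.arange(n) / n)            # store only the diagonal of T
    AT, cT = A * T, c * T                      # A @ diag(T) and diag(T) @ c
    def f(x):
        s = b - AT @ x
        return np.inf if np.any(s <= 0) else cT @ x - np.sum(np.log(s))
    def grad(x):
        return cT + AT.T @ (1.0 / (b - AT @ x))
    return f, grad

for gamma in [0.1, 1.0, 10.0, 100.0]:
    f, grad = scaled_problem(gamma)
    # run the gradient-descent loop from the earlier sketch here and record its iteration count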
Conclusions 1. The gradient method often exhibits approximately linear convergence. 2. The convergence rate depends greatly on the condition number of the Hessian, or the sublevel sets. 3. An exact line search sometimes improves the convergence of the gradient method, but the effect is not large. 4. The choice of backtracking parameters has a noticeable but not dramatic effect on the convergence.
Outline: Gradient Descent Method (Convergence Analysis; Examples; General Convex Functions); Steepest Descent Method (Euclidean and Quadratic Norms; ℓ₁-Norm; Convergence Analysis; Discussion and Examples)
General Convex Functions. Assumptions: f is convex and Lipschitz continuous, i.e., ‖∇f(x)‖₂ ≤ G for all x. Gradient Descent Method: given a starting point x^(1) ∈ dom f; for k = 1, 2, …, K, update x^(k+1) = x^(k) − η∇f(x^(k)) with a fixed step size η; return the average iterate x̄ = (1/K) Σₖ₌₁ᴷ x^(k).
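A minimal Python sketch of this averaged-iterate scheme; the Lipschitz test function, the horizon K, and the step size (chosen in the form that the analysis on the following slides suggests) are illustrative choices.

import numpy as np

# Convex, Lipschitz test function: f(x) = sum_i sqrt(1 + x_i^2), minimized at x = 0,
# with ||grad f(x)||_2 <= sqrt(n) =: G for all x.
f = lambda x: np.sum(np.sqrt(1.0 + x**2))
grad = lambda x: x / np.sqrt(1.0 + x**2)

def averaged_gradient_descent(x1, eta, K):
    """Fixed-step gradient descent returning the average of the iterates x^(1), ..., x^(K)."""
    x = np.array(x1, dtype=float)
    avg = np.zeros_like(x)
    for k in range(K):
        avg += x / K
        x = x - eta * grad(x)
    return avg

n, K = 5, 1000
x1 = np.ones(n)
R = np.linalg.norm(x1)                  # ||x^(1) - x*||, here x* = 0
G = np.sqrt(n)
eta = R / (G * np.sqrt(K))              # step size of the form suggested by the final bound
x_bar = averaged_gradient_descent(x1, eta, K)
print(f(x_bar) - f(np.zeros(n)))        # suboptimality of the averaged iterate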
Analysis. Let x* be a minimizer of f and let η be the fixed step size, so that x^(k) − x^(k+1) = η∇f(x^(k)). For each iteration k, convexity gives
f(x^(k)) − f(x*) ≤ ∇f(x^(k))ᵀ(x^(k) − x*)
= (1/η)(x^(k) − x^(k+1))ᵀ(x^(k) − x*)
= (1/(2η)) (‖x^(k) − x*‖₂² + ‖x^(k) − x^(k+1)‖₂² − ‖x^(k+1) − x*‖₂²)
= (1/(2η)) (‖x^(k) − x*‖₂² − ‖x^(k+1) − x*‖₂²) + (η/2)‖∇f(x^(k))‖₂²
≤ (1/(2η)) (‖x^(k) − x*‖₂² − ‖x^(k+1) − x*‖₂²) + (η/2)G².
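Summing the per-iteration inequality over k = 1, …, K, telescoping the squared distances, and applying Jensen's inequality to the averaged iterate x̄ gives the final rate:

\[
f(\bar{x}) - f(x^\star)
\;\le\; \frac{1}{K}\sum_{k=1}^{K}\bigl(f(x^{(k)}) - f(x^\star)\bigr)
\;\le\; \frac{\|x^{(1)} - x^\star\|_2^2}{2\eta K} + \frac{\eta G^2}{2},
\]
\[
\text{and choosing } \eta = \frac{\|x^{(1)} - x^\star\|_2}{G\sqrt{K}}
\text{ gives } \quad
f(\bar{x}) - f(x^\star) \;\le\; \frac{G\,\|x^{(1)} - x^\star\|_2}{\sqrt{K}} = O(1/\sqrt{K}).
\]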