Introductory Course on Non-smooth Optimisation
Lecture 01 - Gradient methods
Jingwei Liang, Department of Applied Mathematics and Theoretical Physics
Table of contents
1 Unconstrained smooth optimisation
2 Descent methods
3 Gradient of convex functions
4 Gradient descent
5 Heavy-ball method
6 Nesterov's optimal schemes
7 Dynamical system
Convexity

Convex set: a set S ⊂ R^n is convex if for any θ ∈ [0, 1] and any two points x, y ∈ S,
  θx + (1 − θ)y ∈ S.

Convex function: a function F : R^n → R is convex if dom(F) is convex and for all x, y ∈ dom(F) and θ ∈ [0, 1],
  F(θx + (1 − θ)y) ≤ θF(x) + (1 − θ)F(y).

Proper convex: F(x) < +∞ for at least one x, and F(x) > −∞ for all x.

1st-order condition: if F is continuously differentiable, then F is convex if and only if
  F(y) ≥ F(x) + ⟨∇F(x), y − x⟩,  ∀ x, y ∈ dom(F).

2nd-order condition: if F is twice differentiable, then F is convex if and only if
  ∇²F(x) ⪰ 0,  ∀ x ∈ dom(F).
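The second-order condition is easy to test numerically for simple functions. A minimal sketch in NumPy (an illustration added here, not from the slides), assuming a quadratic F(x) = (1/2)xᵀAx + bᵀx whose Hessian is the constant matrix A:

```python
import numpy as np

# For the quadratic F(x) = 0.5 x^T A x + b^T x, the Hessian is the
# constant matrix A, so the second-order condition reduces to checking
# that all eigenvalues of A are non-negative.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # symmetric
eigvals = np.linalg.eigvalsh(A)       # eigenvalues of a symmetric matrix
print("Hessian eigenvalues:", eigvals)
print("F is convex:", np.all(eigvals >= 0))
```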
Unconstrained smooth optimisation

Problem: unconstrained smooth optimisation,
  min_{x ∈ R^n} F(x),
where F : R^n → R is proper convex and continuously differentiable.

Optimality condition: let x⋆ be a minimiser of F(x); then 0 = ∇F(x⋆).

[Figure: graph of F with the gradient ∇F(x) at a generic point and ∇F(x⋆) = 0 at the minimiser.]
Example: quadratic minimisation

Quadratic programming: the general quadratic programming problem reads
  min_{x ∈ R^n} (1/2)xᵀAx + bᵀx + c,
where A ∈ R^{n×n} is symmetric positive definite, b ∈ R^n and c ∈ R.

Optimality condition: 0 = Ax⋆ + b.

Special case: least squares,
  ||Ax − b||² = xᵀ(AᵀA)x − 2(Aᵀb)ᵀx + bᵀb.
Optimality condition: AᵀAx⋆ = Aᵀb.
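A minimal sketch of solving the least-squares optimality condition in NumPy (added here for illustration; the data are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))   # overdetermined system: 50 equations, 3 unknowns
b = rng.standard_normal(50)

# Normal equations A^T A x* = A^T b (fine for small, well-conditioned problems).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Numerically preferred in practice: QR/SVD-based least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True: both satisfy the optimality condition
```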
Example: geometric programming

Geometric programming:
  min_{x ∈ R^n} log( Σ_{i=1}^m exp(aᵢᵀx + bᵢ) ).

Optimality condition:
  0 = (1 / Σ_{i=1}^m exp(aᵢᵀx⋆ + bᵢ)) Σ_{i=1}^m exp(aᵢᵀx⋆ + bᵢ) aᵢ.
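Evaluating this gradient directly can overflow; shifting the exponents by their maximum leaves the softmax weights unchanged. A hedged sketch (added here, not from the slides; `logsumexp_grad` is a hypothetical helper name):

```python
import numpy as np

def logsumexp_grad(Amat, b, x):
    """Gradient of F(x) = log(sum_i exp(a_i^T x + b_i))."""
    z = Amat @ x + b          # z_i = a_i^T x + b_i
    z = z - z.max()           # stabilise: exponentials cannot overflow
    w = np.exp(z)
    w /= w.sum()              # softmax weights, unchanged by the shift
    return Amat.T @ w         # gradient = sum_i w_i a_i

rng = np.random.default_rng(0)
Amat, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
print(logsumexp_grad(Amat, b, np.zeros(3)))
```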
Outline
1 Unconstrained smooth optimisation
2 Descent methods
3 Gradient of convex functions
4 Gradient descent
5 Heavy-ball method
6 Nesterov's optimal schemes
7 Dynamical system
Problem: unconstrained smooth optimisation

Consider minimising
  min_{x ∈ R^n} F(x),
where F : R^n → R is proper convex and continuously differentiable. The set of minimisers, i.e.
  Argmin(F) = { x ∈ R^n : F(x) = min_{x ∈ R^n} F(x) },
is non-empty. However, given x⋆ ∈ Argmin(F), there is in general no closed-form expression.

Iterative strategy to find one x⋆ ∈ Argmin(F): start from x_0 and generate a sequence {x_k}_{k∈N} such that
  lim_{k→∞} x_k = x⋆ ∈ Argmin(F).

[Figure: iterates x_{k−1}, x_k, x_{k+1}, x_{k+2} approaching the minimiser x⋆.]
Descent methods

Iterative scheme: for each k = 1, 2, ..., find γ_k > 0 and d_k ∈ R^n, and then update
  x_{k+1} = x_k + γ_k d_k,
where d_k is called the search/descent direction and γ_k the step-size.

Descent methods: an algorithm is called a descent method if there holds
  F(x_{k+1}) < F(x_k).

NB: if x_k ∈ Argmin(F), then x_{k+1} = x_k, and strict descent is no longer possible.
Conditions

From convexity of F, we have
  F(x_{k+1}) ≥ F(x_k) + ⟨∇F(x_k), x_{k+1} − x_k⟩,
which gives
  ⟨∇F(x_k), x_{k+1} − x_k⟩ ≥ 0 ⟹ F(x_{k+1}) ≥ F(x_k),
i.e. no descent. Since x_{k+1} − x_k = γ_k d_k, the direction d_k should be such that
  ⟨∇F(x_k), d_k⟩ < 0.

[Figure: at x_k, the descent directions form the open half-space opposite the gradient ∇F(x_k).]
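The canonical choice satisfying this condition is the negative gradient itself: ⟨∇F(x_k), −∇F(x_k)⟩ = −||∇F(x_k)||² < 0 whenever ∇F(x_k) ≠ 0. A minimal numerical check (an added sketch, assuming a hand-coded quadratic; not from the slides):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])

def grad_F(x):
    return A @ x + b                     # gradient of 0.5 x^T A x + b^T x

x = np.array([2.0, 0.5])
d = -grad_F(x)                           # steepest-descent direction
print(np.dot(grad_F(x), d))              # strictly negative away from x*
```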
General descent method

General descent method
  initial: x_0 ∈ dom(F);
  repeat:
    1. Find a descent direction d_k.
    2. Choose a step-size γ_k: line search.
    3. Update x_{k+1} = x_k + γ_k d_k.
  until: stopping criterion is satisfied.

Stopping criteria, with tolerance ε > 0:
- Function value: F(x_k) − F(x_{k+1}) ≤ ε (can be time consuming).
- Sequence: ||x_{k+1} − x_k|| ≤ ε.
- Optimality condition: ||∇F(x_k)|| ≤ ε.
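Putting the pieces together, a minimal sketch of the loop (added here for illustration; it assumes the steepest-descent direction d_k = −∇F(x_k), a fixed step-size, and the gradient-norm stopping criterion — the slides leave all three choices open):

```python
import numpy as np

def descent(grad_F, x0, step=0.1, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_F(x)
        if np.linalg.norm(g) <= tol:   # optimality-based stopping criterion
            break
        d = -g                         # a descent direction: <g, d> < 0
        x = x + step * d               # update x_{k+1} = x_k + gamma_k d_k
    return x

# Example: quadratic with minimiser x* solving A x* + b = 0.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = descent(lambda x: A @ x + b, x0=np.zeros(2))
print(np.allclose(A @ x_star + b, 0.0, atol=1e-6))  # True
```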
Exact line search

Exact line search: suppose that the direction d_k is given. Choose γ_k such that F is minimised along the ray x_k + γd_k, γ > 0:
  γ_k = argmin_{γ>0} F(x_k + γd_k).

- Useful when the minimisation problem for γ_k is simple.
- γ_k can be found analytically for special cases.

[Figure: F(x_k + γd_k) as a function of γ ≥ 0, minimised at γ = γ_k.]
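One such special case (a worked sketch under the quadratic assumption F(x) = (1/2)xᵀAx + bᵀx, which the slide does not spell out): setting the derivative of γ ↦ F(x_k + γd_k) to zero gives γ_k = −⟨∇F(x_k), d_k⟩ / (d_kᵀAd_k), which is positive whenever d_k is a descent direction and A is positive definite.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x = np.array([2.0, 0.5])

g = A @ x + b                       # grad F(x)
d = -g                              # steepest-descent direction
gamma = -(g @ d) / (d @ A @ d)      # exact minimiser of F(x + gamma d)
print(gamma > 0)                    # True: d is a descent direction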
Backtracking/inexact line search

Backtracking line search: suppose that the direction d_k is given. Choose δ ∈ ]0, 0.5[ and β ∈ ]0, 1[, let γ = 1:
  while F(x_k + γd_k) > F(x_k) + δγ⟨∇F(x_k), d_k⟩: γ = βγ.

- Reduces F enough along the direction d_k; since d_k is a descent direction, ⟨∇F(x_k), d_k⟩ < 0.
- Stopping criterion for backtracking: F(x_k + γd_k) ≤ F(x_k) + δγ⟨∇F(x_k), d_k⟩.
- When γ is small enough,
  F(x_k + γd_k) ≈ F(x_k) + γ⟨∇F(x_k), d_k⟩ < F(x_k) + δγ⟨∇F(x_k), d_k⟩,
  which means the backtracking eventually stops.
[Figure: backtracking line search; the curve F(x_k + γd_k) against the lines F(x_k) + γ⟨∇F(x_k), d_k⟩ and F(x_k) + δγ⟨∇F(x_k), d_k⟩, with the accepted step where the curve falls below the latter.]
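In code, the loop is a few lines. A minimal sketch (added here; δ = 0.25 and β = 0.5 are sample choices, any δ ∈ ]0, 0.5[ and β ∈ ]0, 1[ are admissible):

```python
import numpy as np

def backtrack(F, grad_F, x, d, delta=0.25, beta=0.5):
    gamma = 1.0
    slope = grad_F(x) @ d            # <grad F(x), d> < 0 for a descent d
    while F(x + gamma * d) > F(x) + delta * gamma * slope:
        gamma *= beta                # shrink until sufficient decrease
    return gamma

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
F = lambda x: 0.5 * x @ A @ x + b @ x
grad_F = lambda x: A @ x + b

x = np.array([2.0, 0.5])
d = -grad_F(x)
print(backtrack(F, grad_F, x, d))    # accepted step-size gamma_k
```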
Outline
1 Unconstrained smooth optimisation
2 Descent methods
3 Gradient of convex functions
4 Gradient descent
5 Heavy-ball method
6 Nesterov's optimal schemes
7 Dynamical system
Monotonicity

Monotonicity of gradient: let F : R^n → R be proper convex and continuously differentiable; then
  ⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0,  ∀ x, y ∈ dom(F).

Notation: C¹ denotes the class of proper convex, continuously differentiable functions on R^n.

Proof: owing to convexity, given x, y ∈ dom(F), we have
  F(y) ≥ F(x) + ⟨∇F(x), y − x⟩ and F(x) ≥ F(y) + ⟨∇F(y), x − y⟩.
Summing them up yields ⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0.

NB: a differentiable F is convex if and only if ∇F is monotone.
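A quick numerical illustration (added, not from the slides), reusing the geometric-programming objective with aᵢ the standard basis vectors, F(x) = log Σᵢ exp(xᵢ), whose gradient is the softmax map:

```python
import numpy as np

def grad_F(x):
    w = np.exp(x - x.max())    # stabilised softmax = gradient of log-sum-exp
    return w / w.sum()

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    # monotonicity: <grad F(x) - grad F(y), x - y> >= 0 (up to rounding)
    assert np.dot(grad_F(x) - grad_F(y), x - y) >= -1e-12
print("monotonicity holds on all samples")
```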
Lipschitz continuous gradient

Lipschitz continuity: the gradient of F is L-Lipschitz continuous if there exists L > 0 such that
  ||∇F(x) − ∇F(y)|| ≤ L||x − y||,  ∀ x, y ∈ dom(F).

Notation: C¹_L denotes the class of proper convex functions with L-Lipschitz continuous gradient on R^n.

If F ∈ C¹_L, then
  H(x) := (L/2)||x||² − F(x)
is convex.

Hint: monotonicity of ∇H(x), i.e.
  ⟨∇H(x) − ∇H(y), x − y⟩ = L||x − y||² − ⟨∇F(x) − ∇F(y), x − y⟩ ≥ L||x − y||² − L||x − y||² = 0.
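For the quadratic F(x) = (1/2)xᵀAx + bᵀx the gradient is Ax + b, so ||∇F(x) − ∇F(y)|| = ||A(x − y)|| ≤ ||A||₂ ||x − y||, and the sharp Lipschitz constant is L = λ_max(A) for symmetric positive semidefinite A. A small sketch (added here, not from the slides):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(A).max()        # sharp Lipschitz constant of grad F
print("Lipschitz constant of the gradient:", L)

# Empirical check on random pairs of points:
rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    assert np.linalg.norm(A @ (x - y)) <= L * np.linalg.norm(x - y) + 1e-12
```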
Descent lemma

Descent lemma (quadratic upper bound): let F ∈ C¹_L; then there holds
  F(y) ≤ F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||²,  ∀ x, y ∈ dom(F).

Proof: define H(t) = F(x + t(y − x)); then
  F(y) − F(x) = H(1) − H(0) = ∫₀¹ H′(t) dt = ∫₀¹ (y − x)ᵀ∇F(x + t(y − x)) dt
    ≤ ∫₀¹ (y − x)ᵀ∇F(x) dt + ∫₀¹ |(y − x)ᵀ(∇F(x + t(y − x)) − ∇F(x))| dt
    ≤ (y − x)ᵀ∇F(x) + ∫₀¹ ||y − x|| ||∇F(x + t(y − x)) − ∇F(x)|| dt
    ≤ (y − x)ᵀ∇F(x) + ∫₀¹ ||y − x|| · tL||y − x|| dt
    = (y − x)ᵀ∇F(x) + (L/2)||y − x||².

NB: this is the first-order condition of convexity for H(x) := (L/2)||x||² − F(x).
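A minimal numerical check of the lemma (a sketch added here, not from the slides), using the quadratic from the earlier examples, for which the bound with L = λ_max(A) holds exactly:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
F = lambda x: 0.5 * x @ A @ x + b @ x
grad_F = lambda x: A @ x + b
L = np.linalg.eigvalsh(A).max()

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    # quadratic upper bound: F(y) <= F(x) + <grad F(x), y-x> + (L/2)||y-x||^2
    upper = F(x) + grad_F(x) @ (y - x) + 0.5 * L * np.dot(y - x, y - x)
    assert F(y) <= upper + 1e-12
print("quadratic upper bound holds on all samples")
```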
Descent lemma: consequences

Corollary: let F ∈ C¹_L and x⋆ ∈ Argmin(F); then
  (1/(2L))||∇F(x)||² ≤ F(x) − F(x⋆) ≤ (L/2)||x − x⋆||²,  ∀ x ∈ dom(F).

Proof: right-hand inequality: since ∇F(x⋆) = 0,
  F(x) ≤ F(x⋆) + ⟨∇F(x⋆), x − x⋆⟩ + (L/2)||x − x⋆||²,  ∀ x ∈ dom(F).
Left-hand inequality:
  F(x⋆) ≤ min_{y ∈ dom(F)} { F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||² } = F(x) − (1/(2L))||∇F(x)||².
The corresponding minimiser is y = x − (1/L)∇F(x).
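The minimising y = x − (1/L)∇F(x) in the proof is exactly one gradient-descent step with step-size 1/L, and the argument shows it decreases F by at least ||∇F(x)||²/(2L). A sketch verifying this on the running quadratic example (added here, not from the slides):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
F = lambda x: 0.5 * x @ A @ x + b @ x
grad_F = lambda x: A @ x + b
L = np.linalg.eigvalsh(A).max()

x = np.array([2.0, 0.5])
y = x - grad_F(x) / L                  # one gradient step with step-size 1/L
decrease = F(x) - F(y)
# guaranteed decrease: at least ||grad F(x)||^2 / (2L)
print(decrease >= np.dot(grad_F(x), grad_F(x)) / (2 * L))   # True
```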