

  1. Introductory Course on Non-smooth Optimisation
Lecture 01 - Gradient methods
Jingwei Liang, Department of Applied Mathematics and Theoretical Physics

  2. Table of contents 1 Unconstrained smooth optimisation 2 Descent methods 3 Gradient of convex functions 4 Gradient descent 5 Heavy-ball method 6 Nesterov’s optimal schemes 7 Dynamical system

  3. Convexity
Convex set: a set S ⊂ R^n is convex if for any θ ∈ [0, 1] and any two points x, y ∈ S, θx + (1 − θ)y ∈ S.
Convex function: a function F : R^n → R is convex if dom(F) is convex and for all x, y ∈ dom(F) and θ ∈ [0, 1],
F(θx + (1 − θ)y) ≤ θF(x) + (1 − θ)F(y).
Proper convex: F(x) < +∞ for at least one x and F(x) > −∞ for all x.
1st-order condition: if F is continuously differentiable, then
F(y) ≥ F(x) + ⟨∇F(x), y − x⟩, ∀x, y ∈ dom(F).
2nd-order condition: if F is twice differentiable, then
∇^2 F(x) ⪰ 0, ∀x ∈ dom(F).
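As a quick sanity check, the sketch below (my own addition, not from the slides; assumes numpy, and the test function and tolerances are illustrative) verifies the defining inequality and the 1st-order condition numerically for the smooth convex function F(x) = ||x||^2:

```python
import numpy as np

rng = np.random.default_rng(0)

F = lambda x: np.dot(x, x)      # F(x) = ||x||^2, smooth and convex
grad_F = lambda x: 2.0 * x      # its gradient

x, y = rng.standard_normal(5), rng.standard_normal(5)
theta = rng.uniform()

# Defining inequality: F(theta x + (1 - theta) y) <= theta F(x) + (1 - theta) F(y)
assert F(theta * x + (1 - theta) * y) <= theta * F(x) + (1 - theta) * F(y) + 1e-12

# 1st-order condition: F(y) >= F(x) + <grad F(x), y - x>
assert F(y) >= F(x) + np.dot(grad_F(x), y - x) - 1e-12
```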

  4. Unconstrained smooth optimisation
Problem: unconstrained smooth optimisation,
min_{x ∈ R^n} F(x),
where F : R^n → R is proper convex and smooth (differentiable).
Optimality condition: let x⋆ be a minimiser of F(x), then 0 = ∇F(x⋆).
[Figure: graph of F with gradients ∇F(x); the gradient vanishes at the minimiser x⋆.]

  5. Example: quadratic minimisation
Quadratic programming: the general quadratic programming problem,
min_{x ∈ R^n} (1/2) x^T A x + b^T x + c,
where A ∈ R^{n×n} is symmetric positive definite, b ∈ R^n and c ∈ R.
Optimality condition: 0 = Ax⋆ + b.
Special case: least squares,
||Ax − b||^2 = x^T (A^T A) x − 2(A^T b)^T x + b^T b.
Optimality condition: A^T A x⋆ = A^T b.
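Both optimality conditions are linear systems, so the minimisers can be computed directly. A minimal sketch (my own addition, assuming numpy; the least-squares data are named C, d to avoid clashing with the A, b above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

# Quadratic programming: solve 0 = A x* + b
x_star = np.linalg.solve(A, -b)

# Least squares min ||C x - d||^2: solve the normal equations C^T C x* = C^T d
C = rng.standard_normal((8, n))  # overdetermined data matrix
d = rng.standard_normal(8)
x_ls = np.linalg.solve(C.T @ C, C.T @ d)
```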

  6. Example: geometric programming
Geometric programming:
min_{x ∈ R^n} log( Σ_{i=1}^m exp(a_i^T x + b_i) ).
Optimality condition:
0 = ( Σ_{i=1}^m exp(a_i^T x⋆ + b_i) a_i ) / ( Σ_{i=1}^m exp(a_i^T x⋆ + b_i) ).
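A sketch of the objective and its gradient (my own addition, assuming numpy; the rows of A play the role of the a_i). Subtracting the maximum before exponentiating is a standard trick to avoid overflow:

```python
import numpy as np

def lse_obj(x, A, b):
    """F(x) = log(sum_i exp(a_i^T x + b_i)), with the rows of A as the a_i."""
    z = A @ x + b
    zmax = z.max()               # shift for numerical stability
    return zmax + np.log(np.exp(z - zmax).sum())

def lse_grad(x, A, b):
    """grad F(x) = (sum_i exp(z_i) a_i) / (sum_j exp(z_j)), z_i = a_i^T x + b_i."""
    z = A @ x + b
    w = np.exp(z - z.max())
    w /= w.sum()                 # softmax weights, non-negative and summing to 1
    return A.T @ w               # a convex combination of the a_i

# At a minimiser x*, lse_grad(x*, A, b) = 0, matching the optimality condition.
```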

  7. Outline 1 Unconstrained smooth optimisation 2 Descent methods 3 Gradient of convex functions 4 Gradient descent 5 Heavy-ball method 6 Nesterov’s optimal schemes 7 Dynamical system

  8. Problem
Unconstrained smooth optimisation: consider minimising
min_{x ∈ R^n} F(x),
where F : R^n → R is proper convex and smooth (differentiable).
[Figure: graph of F with the minimiser x⋆ marked.]

  9. Problem
Unconstrained smooth optimisation: consider minimising
min_{x ∈ R^n} F(x),
where F : R^n → R is proper convex and smooth (differentiable).
The set of minimisers, i.e. Argmin(F) = {x ∈ R^n : F(x) = min_{x ∈ R^n} F(x)}, is non-empty. However, a minimiser x⋆ ∈ Argmin(F) has, in general, no closed-form expression.
Iterative strategy to find some x⋆ ∈ Argmin(F): start from x_0 and generate a sequence {x_k}_{k ∈ N} such that
lim_{k→∞} x_k = x⋆ ∈ Argmin(F).

  10. Problem
Unconstrained smooth optimisation: consider minimising
min_{x ∈ R^n} F(x),
where F : R^n → R is proper convex and smooth (differentiable).
[Figure: iterates x_{k−1}, x_k, x_{k+1}, x_{k+2} approaching the minimiser x⋆.]

  11. Descent methods
Iterative scheme: for each k = 1, 2, ..., find γ_k > 0 and d_k ∈ R^n, then update
x_{k+1} = x_k + γ_k d_k,
where d_k is called the search/descent direction and γ_k is called the step-size.
Descent methods: an algorithm is called a descent method if there holds
F(x_{k+1}) < F(x_k).
NB: if x_k ∈ Argmin(F), then x_{k+1} = x_k...

  12. Conditions
From convexity of F, we have
F(x_{k+1}) ≥ F(x_k) + ⟨∇F(x_k), x_{k+1} − x_k⟩,
which gives
⟨∇F(x_k), x_{k+1} − x_k⟩ ≥ 0 ⟹ F(x_{k+1}) ≥ F(x_k).
Since x_{k+1} − x_k = γ_k d_k, the direction d_k should be such that
⟨∇F(x_k), d_k⟩ < 0.
[Figure: at x_k, descent directions make an obtuse angle with ∇F(x_k).]

  13. General descent method
General descent method
initial: x_0 ∈ dom(F);
repeat:
1. Find a descent direction d_k.
2. Choose a step-size γ_k: line search.
3. Update x_{k+1} = x_k + γ_k d_k.
until: stopping criterion is satisfied.
Stopping criteria, with tolerance ε > 0:
Function value: F(x_k) − F(x_{k+1}) ≤ ε (can be time consuming).
Sequence: ||x_{k+1} − x_k|| ≤ ε.
Optimality condition: ||∇F(x_k)|| ≤ ε.
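A minimal sketch of this loop (my own addition, assuming numpy; grad_F, direction, and step_size are user-supplied callables), using the gradient-norm stopping criterion:

```python
import numpy as np

def descent_method(grad_F, direction, step_size, x0, eps=1e-8, max_iter=10_000):
    """Generic descent loop: x_{k+1} = x_k + gamma_k * d_k.

    direction(x, g): returns a descent direction d_k with <g, d_k> < 0.
    step_size(x, d): returns gamma_k, e.g. from exact or backtracking line search.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_F(x)
        if np.linalg.norm(g) <= eps:   # optimality-based stopping criterion
            break
        d = direction(x, g)            # e.g. d = -g for gradient descent
        x = x + step_size(x, d) * d
    return x
```

For gradient descent one would take direction = lambda x, g: -g together with, say, a constant step-size.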

  14. Exact line search
Exact line search: suppose that the direction d_k is given. Choose γ_k such that F(x) is minimised along the ray x_k + γ d_k, γ > 0:
γ_k = argmin_{γ > 0} F(x_k + γ d_k).
Useful when the minimisation problem for γ_k is simple; γ_k can be found analytically in special cases.
[Figure: F(x_k + γ d_k) as a function of γ, minimised at γ = γ_k.]
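One such special case is the quadratic of slide 5: F(x_k + γ d_k) is quadratic in γ, and setting its derivative to zero gives γ_k = −⟨∇F(x_k), d_k⟩ / ⟨d_k, A d_k⟩ in closed form. A sketch (my own addition, assuming numpy):

```python
import numpy as np

def exact_step_quadratic(A, b, x, d):
    """Exact line search for F(x) = 0.5 x^T A x + b^T x along direction d:
    dF(x + gamma d)/dgamma = <grad F(x), d> + gamma <d, A d> = 0."""
    g = A @ x + b                             # grad F(x)
    return -np.dot(g, d) / np.dot(d, A @ d)   # > 0 when d is a descent direction
```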

  15. Backtracking/inexact line search
Backtracking line search: suppose that the direction d_k is given. Choose δ ∈ ]0, 0.5[ and β ∈ ]0, 1[, let γ = 1, and
while F(x_k + γ d_k) > F(x_k) + δγ ⟨∇F(x_k), d_k⟩: γ = βγ.
Idea: reduce F enough along the direction d_k. Since d_k is a descent direction, ⟨∇F(x_k), d_k⟩ < 0.
Stopping criterion for backtracking: F(x_k + γ d_k) ≤ F(x_k) + δγ ⟨∇F(x_k), d_k⟩.
When γ is small enough,
F(x_k + γ d_k) ≈ F(x_k) + γ ⟨∇F(x_k), d_k⟩ < F(x_k) + δγ ⟨∇F(x_k), d_k⟩,
which means the backtracking eventually stops.
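The while loop translates directly into code. A sketch (my own addition, assuming numpy and callables F, grad_F):

```python
import numpy as np

def backtracking(F, grad_F, x, d, delta=0.25, beta=0.5):
    """Shrink gamma until the sufficient-decrease condition
    F(x + gamma d) <= F(x) + delta * gamma * <grad F(x), d> holds."""
    gamma = 1.0
    slope = np.dot(grad_F(x), d)   # < 0 since d must be a descent direction
    while F(x + gamma * d) > F(x) + delta * gamma * slope:
        gamma *= beta
    return gamma
```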

  16. Backtracking/inexact line search
Backtracking line search: suppose that the direction d_k is given. Choose δ ∈ ]0, 0.5[ and β ∈ ]0, 1[, let γ = 1, and
while F(x_k + γ d_k) > F(x_k) + δγ ⟨∇F(x_k), d_k⟩: γ = βγ.
[Figure: F(x_k + γ d_k) versus γ, with the lines F(x_k) + γ ∇F(x_k)^T d_k and F(x_k) + δγ ∇F(x_k)^T d_k; backtracking accepts step-sizes where the curve lies below the second line.]

  17. Outline 1 Unconstrained smooth optimisation 2 Descent methods 3 Gradient of convex functions 4 Gradient descent 5 Heavy-ball method 6 Nesterov’s optimal schemes 7 Dynamical system

  18. Monotonicity
Monotonicity of gradient: let F : R^n → R be proper convex and smooth (differentiable), then
⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0, ∀x, y ∈ dom(F).
C^1: the class of proper convex and smooth (differentiable) functions on R^n.
Proof: owing to convexity, given x, y ∈ dom(F), we have
F(y) ≥ F(x) + ⟨∇F(x), y − x⟩ and F(x) ≥ F(y) + ⟨∇F(y), x − y⟩.
Summing them up yields ⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0.
NB: for F ∈ C^1, F is convex if and only if ∇F is monotone.

  19. Lipschitz continuous gradient
Lipschitz continuity: the gradient of F is L-Lipschitz continuous if there exists L > 0 such that
||∇F(x) − ∇F(y)|| ≤ L ||x − y||, ∀x, y ∈ dom(F).
C_L^1: the class of proper convex functions with L-Lipschitz continuous gradient on R^n.
If F ∈ C_L^1, then H(x) := (L/2)||x||^2 − F(x) is convex.
Hint: monotonicity of ∇H, i.e.
⟨∇H(x) − ∇H(y), x − y⟩ = L||x − y||^2 − ⟨∇F(x) − ∇F(y), x − y⟩ ≥ L||x − y||^2 − L||x − y||^2 = 0,
using Cauchy-Schwarz and the Lipschitz continuity of ∇F.
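For the quadratic example, ∇F(x) = Ax + b, so ||∇F(x) − ∇F(y)|| = ||A(x − y)|| ≤ λ_max(A) ||x − y||, and L = λ_max(A) works as a Lipschitz constant. A quick numerical check (my own addition, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)            # symmetric positive definite
L = np.linalg.eigvalsh(A).max()    # L = lambda_max(A), the operator norm of A

x, y = rng.standard_normal(5), rng.standard_normal(5)
# ||grad F(x) - grad F(y)|| = ||A (x - y)|| <= L ||x - y||
assert np.linalg.norm(A @ (x - y)) <= L * np.linalg.norm(x - y) + 1e-12
```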

  20. Descent lemma
Descent lemma, quadratic upper bound: let F ∈ C_L^1, then there holds
F(y) ≤ F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||^2, ∀x, y ∈ dom(F).
Proof: define H(t) = F(x + t(y − x)), then
F(y) − F(x) = H(1) − H(0) = ∫_0^1 H′(t) dt = ∫_0^1 (y − x)^T ∇F(x + t(y − x)) dt
≤ ∫_0^1 (y − x)^T ∇F(x) dt + ∫_0^1 |(y − x)^T (∇F(x + t(y − x)) − ∇F(x))| dt
≤ (y − x)^T ∇F(x) + ∫_0^1 ||y − x|| ||∇F(x + t(y − x)) − ∇F(x)|| dt
≤ (y − x)^T ∇F(x) + ∫_0^1 ||y − x|| tL ||y − x|| dt
= (y − x)^T ∇F(x) + (L/2)||y − x||^2.
NB: this is the 1st-order condition of convexity for H(x) := (L/2)||x||^2 − F(x).
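As a numerical sanity check (my own addition, assuming numpy), for the quadratic F(x) = (1/2)x^T A x + b^T x the Taylor remainder is exactly (1/2)(y − x)^T A (y − x) ≤ (L/2)||y − x||^2, so the lemma can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)            # symmetric positive definite
b = rng.standard_normal(5)
L = np.linalg.eigvalsh(A).max()    # Lipschitz constant of grad F

F = lambda x: 0.5 * x @ A @ x + b @ x
grad_F = lambda x: A @ x + b

x, y = rng.standard_normal(5), rng.standard_normal(5)
upper = F(x) + grad_F(x) @ (y - x) + 0.5 * L * np.dot(y - x, y - x)
assert F(y) <= upper + 1e-10       # descent lemma: quadratic upper bound
```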

  21. Descent lemma: consequences
Corollary: let F ∈ C_L^1 and x⋆ ∈ Argmin(F), then
(1/(2L)) ||∇F(x)||^2 ≤ F(x) − F(x⋆) ≤ (L/2) ||x − x⋆||^2, ∀x ∈ dom(F).
Proof: right-hand inequality: since ∇F(x⋆) = 0,
F(x) ≤ F(x⋆) + ⟨∇F(x⋆), x − x⋆⟩ + (L/2)||x − x⋆||^2, ∀x ∈ dom(F).
Left-hand inequality:
F(x⋆) ≤ min_{y ∈ dom(F)} { F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||^2 } = F(x) − (1/(2L))||∇F(x)||^2.
The minimising y is y = x − (1/L)∇F(x).
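Note that the minimising y = x − (1/L)∇F(x) is precisely one gradient descent step with step-size 1/L, the scheme studied next. A sketch (my own addition, assuming numpy) running that iteration on the quadratic example and checking the corollary's two bounds along the way:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)            # symmetric positive definite
b = rng.standard_normal(5)
L = np.linalg.eigvalsh(A).max()

F = lambda x: 0.5 * x @ A @ x + b @ x
grad_F = lambda x: A @ x + b
x_star = np.linalg.solve(A, -b)    # optimality condition: A x* + b = 0

x = rng.standard_normal(5)
for _ in range(100):
    g = grad_F(x)
    gap = F(x) - F(x_star)
    assert g @ g / (2 * L) <= gap + 1e-10                         # left-hand bound
    assert gap <= 0.5 * L * (x - x_star) @ (x - x_star) + 1e-10   # right-hand bound
    x = x - g / L                  # gradient descent step y = x - (1/L) grad F(x)
```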
