Algorithms for unconstrained local optimization
Fabio Schoen, 2008
http://gol.dsi.unifi.it/users/schoen
Optimization Algorithms

Most common form for optimization algorithms: line search-based methods. Given a starting point $x_0$, a sequence is generated:
$$x_{k+1} = x_k + \alpha_k d_k$$
where $d_k \in \mathbb{R}^n$ is the search direction and $\alpha_k > 0$ is the step. Usually $d_k$ is chosen first and then the step is obtained, often from a one-dimensional optimization.
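As an illustration (not from the slides), a minimal Python sketch of this generic scheme; the rules `choose_direction` and `choose_step` are hypothetical placeholders for the direction and step selection discussed in later slides.

```python
import numpy as np

def line_search_method(f, grad, x0, choose_direction, choose_step,
                       tol=1e-8, max_iter=1000):
    """Generic line search iteration x_{k+1} = x_k + alpha_k d_k:
    the direction d_k is chosen first, then the step alpha_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:          # (approximately) stationary point
            break
        d = choose_direction(x, g)            # e.g. -g for steepest descent
        alpha = choose_step(f, grad, x, d)    # e.g. Armijo backtracking (later slides)
        x = x + alpha * d
    return x

# Tiny usage sketch: steepest descent with a fixed, purely illustrative step.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(line_search_method(f, grad, [5.0, 5.0],
                         choose_direction=lambda x, g: -g,
                         choose_step=lambda f, grad, x, d: 0.05))
```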
Trust-region algorithms

A model $m(x)$ and a trust region $U(x_k)$ containing $x_k$ are defined. The new iterate is chosen as the solution of the constrained optimization problem
$$\min_{x \in U(x_k)} m(x)$$
The model and the trust region are possibly updated at each iteration.
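A minimal Python sketch of one trust-region iteration, assuming a quadratic model minimized only approximately via the Cauchy point; the acceptance thresholds and radius updates below are standard textbook choices, not taken from the slide.

```python
import numpy as np

def trust_region_step(f, g, B, x, delta, eta=0.1):
    """One illustrative trust-region iteration: g = grad f(x), B ~ Hessian model.
    The quadratic model m(p) = f(x) + g^T p + 1/2 p^T B p is minimized over
    ||p|| <= delta only approximately, via the Cauchy point."""
    gnorm = np.linalg.norm(g)
    gBg = float(g @ B @ g)
    # step length along -g, clipped so that the step stays inside the region
    tau = delta / gnorm if gBg <= 0 else min(delta / gnorm, gnorm ** 2 / gBg)
    p = -tau * g
    predicted = -(float(g @ p) + 0.5 * float(p @ B @ p))   # m(0) - m(p) > 0
    rho = (f(x) - f(x + p)) / predicted                     # actual vs predicted reduction
    if rho < 0.25:
        delta *= 0.25                                       # poor model: shrink region
    elif rho > 0.75 and np.isclose(tau * gnorm, delta):
        delta *= 2.0                                        # good model, step on boundary: expand
    x_new = x + p if rho > eta else x                       # accept only if enough decrease
    return x_new, delta
```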
Speed measures

Let $x^\star$ be a local optimum. The error at $x_k$ might be measured, e.g., as $e(x_k) = \|x_k - x^\star\|$ or $e(x_k) = |f(x_k) - f(x^\star)|$.

Given $\{x_k\} \to x^\star$, if there exist $q > 0$, $\beta \in (0,1)$ such that (for $k$ large enough)
$$e(x_k) \le q\beta^k$$
then $\{x_k\}$ is linearly convergent, or converges with order 1; $\beta$ is the convergence rate.

A sufficient condition for linear convergence:
$$\limsup_{k\to\infty} \frac{e(x_{k+1})}{e(x_k)} \le \beta$$
Super-linear convergence

If for every $\beta \in (0,1)$ there exists $q$ such that $e(x_k) \le q\beta^k$, then convergence is super-linear.

Sufficient condition:
$$\limsup_{k\to\infty} \frac{e(x_{k+1})}{e(x_k)} = 0$$
Higher order convergence

If, given $p > 1$, there exist $q > 0$, $\beta \in (0,1)$ such that
$$e(x_k) \le q\beta^{(p^k)}$$
then $\{x_k\}$ is said to converge with order at least $p$. If $p = 2$, the convergence is quadratic.

Sufficient condition:
$$\limsup_{k\to\infty} \frac{e(x_{k+1})}{e(x_k)^p} < \infty$$
Examples

$1/k$ converges to 0 with order one (linear convergence)
$1/k^2$ converges to 0 with order 1
$2^{-k}$ converges to 0 with order 1
$k^{-k}$ converges to 0 with order 1; convergence is super-linear
$1/2^{2^k}$ converges to 0 with order 2: quadratic convergence
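A small numerical illustration (assumed test code, not part of the slides): inspecting the ratios $e(x_{k+1})/e(x_k)$ and $e(x_{k+1})/e(x_k)^2$ for these sequences distinguishes linear, super-linear and quadratic behaviour.

```python
# Inspect the ratios e_{k+1}/e_k and e_{k+1}/e_k^2 for the example sequences.
sequences = {
    "1/k":        lambda k: 1.0 / k,
    "2^-k":       lambda k: 2.0 ** (-k),
    "k^-k":       lambda k: float(k) ** (-k),
    "1/2^(2^k)":  lambda k: 1.0 / 2.0 ** (2.0 ** k),
}

for name, e in sequences.items():
    k = 8  # a moderately large index
    r1 = e(k + 1) / e(k)        # -> constant < 1 for linear, -> 0 for super-linear
    r2 = e(k + 1) / e(k) ** 2   # bounded for quadratic convergence
    print(f"{name:12s}  e(k+1)/e(k) = {r1:.3e}   e(k+1)/e(k)^2 = {r2:.3e}")
```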
Descent directions and the gradient

Let $f \in C^1(\mathbb{R}^n)$, $x_k \in \mathbb{R}^n$ with $\nabla f(x_k) \ne 0$, and let $d \in \mathbb{R}^n$. If $d^T \nabla f(x_k) < 0$ then $d$ is a descent direction.

Taylor expansion:
$$f(x_k + \alpha d) - f(x_k) = \alpha\, d^T \nabla f(x_k) + o(\alpha)$$
$$\frac{f(x_k + \alpha d) - f(x_k)}{\alpha} = d^T \nabla f(x_k) + o(1)$$
Thus, if $\alpha$ is small enough, $f(x_k + \alpha d) - f(x_k) < 0$.

NB: $d$ might be a descent direction even if $d^T \nabla f(x_k) = 0$.
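An illustrative check of the Taylor argument, using an assumed test function $f(x) = x_1^2 + 10 x_2^2$: along a direction with $d^T \nabla f(x_k) < 0$ the function decreases once $\alpha$ is small enough.

```python
import numpy as np

f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2      # assumed smooth test function
x_k = np.array([1.0, 1.0])
d = np.array([-1.0, 0.0])                        # d^T grad f(x_k) = -2 < 0: descent direction
for alpha in (3.0, 1.0, 0.1, 0.01):
    decreased = f(x_k + alpha * d) < f(x_k)
    print(alpha, decreased)                      # False for the too-large step, True once alpha is small
```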
Convergence of line search methods

If a sequence $x_{k+1} = x_k + \alpha_k d_k$ is generated in such a way that:
$L_0 = \{x : f(x) \le f(x_0)\}$ is compact;
$d_k \ne 0$ whenever $\nabla f(x_k) \ne 0$;
$f(x_{k+1}) \le f(x_k)$ if $\nabla f(x_k) \ne 0$, for all $k$;
$$\lim_{k\to\infty} \frac{d_k^T \nabla f(x_k)}{\|d_k\|} = 0;$$
if $d_k \ne 0$ then
$$\frac{|d_k^T \nabla f(x_k)|}{\|d_k\|} \ge \sigma(\|\nabla f(x_k)\|)$$
where $\sigma$ is such that $\lim_{k\to\infty} \sigma(t_k) = 0 \Rightarrow \lim_{k\to\infty} t_k = 0$ ($\sigma$ is called a forcing function);
then either there exists a finite index $\bar{k}$ such that $\nabla f(x_{\bar{k}}) = 0$, or otherwise:
$x_k \in L_0$ and all of its limit points are in $L_0$;
$\{f(x_k)\}$ admits a limit;
$\lim_{k\to\infty} \nabla f(x_k) = 0$;
for every limit point $\bar{x}$ of $\{x_k\}$ we have $\nabla f(\bar{x}) = 0$.
Comments on the assumptions

$f(x_{k+1}) \le f(x_k)$: most optimization methods choose $d_k$ as a descent direction. If $d_k$ is a descent direction, choosing $\alpha_k$ "sufficiently small" ensures the validity of the assumption.

$\lim_{k\to\infty} d_k^T \nabla f(x_k)/\|d_k\| = 0$: given a normalized direction $d_k$, the scalar product $d_k^T \nabla f(x_k)$ is the directional derivative of $f$ along $d_k$: it is required that this goes to zero. This can be achieved through exact line searches (choosing the step so that $f$ is minimized along $d_k$).

$|d_k^T \nabla f(x_k)|/\|d_k\| \ge \sigma(\|\nabla f(x_k)\|)$: letting, e.g., $\sigma(t) = ct$ with $c > 0$, if $d_k$ is such that $d_k^T \nabla f(x_k) < 0$, then the condition becomes
$$\frac{d_k^T \nabla f(x_k)}{\|d_k\|\,\|\nabla f(x_k)\|} \le -c$$
Recalling that
$$\cos\theta_k = \frac{d_k^T \nabla f(x_k)}{\|d_k\|\,\|\nabla f(x_k)\|}$$
the condition becomes $\cos\theta_k \le -c$, that is, the angle between $d_k$ and $\nabla f(x_k)$ is bounded away from orthogonality.

[Figure: the angle $\theta_k$ between $d_k$ and $\nabla f(x_k)$]
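A short sketch of this angle condition with $\sigma(t) = ct$; the test vectors below are assumptions used only for illustration.

```python
import numpy as np

def satisfies_angle_condition(grad, d, c=0.1):
    """Check cos(theta_k) = d^T grad f(x_k) / (||d|| ||grad f(x_k)||) <= -c."""
    cos_theta = float(d @ grad) / (np.linalg.norm(d) * np.linalg.norm(grad))
    return cos_theta <= -c

grad = np.array([2.0, 20.0])                       # assumed gradient at x_k
print(satisfies_angle_condition(grad, -grad))      # opposite to the gradient: True
print(satisfies_angle_condition(grad, np.array([10.0, -1.0])))  # orthogonal to it: False
```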
Gradient Algorithms

General scheme:
$$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$$
with $D_k \succ 0$ and $\alpha_k > 0$.

If $\nabla f(x_k) \ne 0$ then $d_k = -D_k \nabla f(x_k)$ is a descent direction. In fact
$$d_k^T \nabla f(x_k) = -\nabla^T f(x_k)\, D_k\, \nabla f(x_k) < 0$$
Steepest Descent

The steepest descent or "gradient" method: $D_k := I$, i.e.
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k).$$
If $\nabla f(x_k) \ne 0$ then $d_k = -\nabla f(x_k)$ is a descent direction. Moreover, it is the steepest (w.r.t. the Euclidean norm): it solves
$$\min_{d \in \mathbb{R}^n,\ \|d\| \le 1} \nabla^T f(x_k)\, d$$
. . .

$$\min_{d \in \mathbb{R}^n,\ \sqrt{d^T d} \le 1} \nabla^T f(x_k)\, d$$
KKT conditions: in the interior $\Rightarrow \nabla^T f(x_k) = 0$; if the constraint is active $\Rightarrow$
$$\nabla f(x_k) + \lambda \frac{d}{\|d\|} = 0, \qquad \sqrt{d^T d} = 1, \qquad \lambda \ge 0$$
$$\Rightarrow\ d = -\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|}.$$
Newton's method

$$D_k := \left[\nabla^2 f(x_k)\right]^{-1}$$
Motivation: Taylor expansion of $f$:
$$f(x) \approx f(x_k) + \nabla^T f(x_k)(x - x_k) + \frac{1}{2}(x - x_k)^T \nabla^2 f(x_k)(x - x_k)$$
Minimizing the approximation:
$$\nabla f(x_k) + \nabla^2 f(x_k)(x - x_k) = 0$$
If the Hessian is non-singular $\Rightarrow$
$$x = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$$
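A minimal sketch of one Newton step (solving the linear system rather than inverting the Hessian); the quadratic test problem is an assumption used only for illustration.

```python
import numpy as np

def newton_step(x_k, grad, hess):
    """One (pure) Newton step: solve hess(x_k) p = -grad(x_k) for the step p."""
    p = np.linalg.solve(hess(x_k), -grad(x_k))
    return x_k + p

# On a strictly convex quadratic f(x) = 1/2 x^T Q x + c^T x, a single Newton
# step from any point reaches the minimizer -Q^{-1} c exactly.
Q = np.array([[2.0, 0.0], [0.0, 20.0]])
c = np.array([-2.0, -4.0])
grad = lambda x: Q @ x + c
hess = lambda x: Q
print(newton_step(np.array([5.0, 5.0]), grad, hess))   # -> [1.0, 0.2]
```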
Step choice

Given $d_k$, how to choose $\alpha_k$ in $x_{k+1} = x_k + \alpha_k d_k$?

"Optimal" choice (one-dimensional optimization):
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k).$$
An analytical expression for the optimal step is available only in a few cases, e.g. if $f(x) = \frac{1}{2} x^T Q x + c^T x$ with $Q \succ 0$. Then
$$f(x_k + \alpha d_k) = \frac{1}{2}(x_k + \alpha d_k)^T Q (x_k + \alpha d_k) + c^T (x_k + \alpha d_k) = \frac{1}{2}\alpha^2 d_k^T Q d_k + \alpha (Q x_k + c)^T d_k + \beta$$
where $\beta$ does not depend on $\alpha$.
Minimizing w.r.t. $\alpha$:
$$\alpha\, d_k^T Q d_k + (Q x_k + c)^T d_k = 0 \ \Rightarrow\ \alpha = -\frac{(Q x_k + c)^T d_k}{d_k^T Q d_k} = -\frac{d_k^T \nabla f(x_k)}{d_k^T \nabla^2 f(x_k)\, d_k}$$
E.g., in steepest descent:
$$\alpha_k = \frac{\|\nabla f(x_k)\|^2}{\nabla^T f(x_k)\, \nabla^2 f(x_k)\, \nabla f(x_k)}$$
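A sketch of steepest descent with this exact step on an assumed small quadratic ($Q$ and $c$ below are illustrative data, not from the slides).

```python
import numpy as np

def exact_step(Q, c, x_k, d_k):
    """Exact line-search step for f(x) = 1/2 x^T Q x + c^T x along d_k:
    alpha = -d_k^T grad f(x_k) / (d_k^T Q d_k)."""
    grad = Q @ x_k + c
    return -float(d_k @ grad) / float(d_k @ Q @ d_k)

# Steepest descent with the exact step on a small strictly convex quadratic.
Q = np.array([[2.0, 0.0], [0.0, 20.0]])
c = np.array([-2.0, -4.0])
x = np.array([5.0, 5.0])
for _ in range(50):
    d = -(Q @ x + c)                  # steepest descent direction
    if np.linalg.norm(d) < 1e-10:
        break
    x = x + exact_step(Q, c, x, d) * d
print(x)                              # close to the minimizer [1.0, 0.2]
```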
Approximate step size

Rules for choosing a step size (from the sufficient conditions for convergence):
$$f(x_{k+1}) < f(x_k), \qquad \lim_{k\to\infty} \frac{d_k^T \nabla f(x_k)}{\|d_k\|} = 0$$
Often it is also required that
$$\|x_{k+1} - x_k\| \to 0, \qquad d_k^T \nabla f(x_k + \alpha_k d_k) \to 0$$
In general it is important to ensure a sufficient reduction of $f$ and a sufficiently large step $x_{k+1} - x_k$.
Avoid too large steps

[Figure: iterates with steps that are too large]
Avoid too small steps

[Figure: iterates with steps that are too small]
Armijo's rule

Input: $\delta \in (0,1)$, $\gamma \in (0,1/2)$, $\Delta_k > 0$
  $\alpha := \Delta_k$;
  while $f(x_k + \alpha d_k) > f(x_k) + \gamma \alpha\, d_k^T \nabla f(x_k)$ do
    $\alpha := \delta \alpha$;
  end
  return $\alpha$

Typical values: $\delta \in [0.1, 0.5]$, $\gamma \in [10^{-4}, 10^{-3}]$.

On exit the returned step is such that
$$f(x_k + \alpha d_k) \le f(x_k) + \gamma \alpha\, d_k^T \nabla f(x_k)$$
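A direct transcription of this rule into Python; the test function and direction at the bottom are assumptions used only for illustration.

```python
import numpy as np

def armijo_step(f, grad_fk, x_k, d_k, delta=0.5, gamma=1e-4, delta_k=1.0):
    """Backtracking line search implementing Armijo's rule as on the slide:
    shrink alpha by the factor delta until the sufficient-decrease condition
    f(x_k + alpha d_k) <= f(x_k) + gamma * alpha * d_k^T grad f(x_k) holds."""
    f_k = f(x_k)
    slope = float(d_k @ grad_fk)      # d_k^T grad f(x_k), must be < 0
    alpha = delta_k
    while f(x_k + alpha * d_k) > f_k + gamma * alpha * slope:
        alpha *= delta
    return alpha

# Tiny usage example on f(x) = x_1^2 + 10 x_2^2, steepest descent direction.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
x_k = np.array([1.0, 1.0])
g_k = np.array([2.0, 20.0])
print(armijo_step(f, g_k, x_k, -g_k))
```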
[Figure: acceptable steps $\alpha$, with the lines of slope $\gamma\, d_k^T \nabla f(x_k)$ and $d_k^T \nabla f(x_k)$]
Line search in practice

How to choose the initial step size $\Delta_k$? Let $\varphi(\alpha) = f(x_k + \alpha d_k)$. A possibility is to choose $\Delta_k = \alpha^\star$, the minimizer of a quadratic approximation to $\varphi(\cdot)$.

Example:
$$q(\alpha) = c_0 + c_1 \alpha + \frac{1}{2} c_2 \alpha^2$$
$$q(0) = c_0 := f(x_k), \qquad q'(0) = c_1 := d_k^T \nabla f(x_k)$$
Then $\alpha^\star = -c_1 / c_2$.
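A possible realization in Python; since the slide leaves the choice of $c_2$ open, this sketch assumes $c_2$ is fitted from one extra trial value $\varphi(\alpha_{\text{trial}})$, which is one common choice, not necessarily the author's.

```python
import numpy as np

def initial_step_quadratic(phi, phi0, dphi0, alpha_trial=1.0):
    """Fit q(alpha) = c0 + c1*alpha + 0.5*c2*alpha^2 to phi(0), phi'(0) and one
    extra value phi(alpha_trial); return its minimizer -c1/c2 as Delta_k."""
    c0, c1 = phi0, dphi0                              # q(0), q'(0)
    c2 = 2.0 * (phi(alpha_trial) - c0 - c1 * alpha_trial) / alpha_trial ** 2
    if c2 <= 0.0:                                     # model not convex: fall back
        return alpha_trial
    return -c1 / c2                                   # minimizer of q

# Usage with phi(alpha) = f(x_k + alpha d_k) for f(x) = x_1^2 + 10 x_2^2.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
x_k, g_k = np.array([1.0, 1.0]), np.array([2.0, 20.0])
d_k = -g_k
phi = lambda a: f(x_k + a * d_k)
print(initial_step_quadratic(phi, f(x_k), float(d_k @ g_k)))
```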