Quasi-Newton methods for minimization


  1. Quasi-Newton methods for minimization. Lectures for PhD course on Numerical Optimization. Enrico Bertolazzi, DIMS, Università di Trento. November 21 – December 14, 2011.

  2. Outline

     1. Quasi Newton Method
     2. The symmetric rank one update
     3. The Powell-symmetric-Broyden update
     4. The Davidon, Fletcher and Powell rank 2 update
     5. The Broyden, Fletcher, Goldfarb and Shanno (BFGS) update
     6. The Broyden class

  3. Quasi Newton Method

     Algorithm (General quasi-Newton algorithm)

       $k \leftarrow 0$; $x_0$ assigned; $g_0 \leftarrow \nabla f(x_0)^T$; $H_0 \leftarrow \nabla^2 f(x_0)^{-1}$;
       while $\|g_k\| > \epsilon$ do
         (compute search direction)
         $d_k \leftarrow -H_k g_k$;
         approximate $\arg\min_{\alpha > 0} f(x_k + \alpha d_k)$ by line search;
         (perform step)
         $x_{k+1} \leftarrow x_k + \alpha_k d_k$;
         $g_{k+1} \leftarrow \nabla f(x_{k+1})^T$;
         (update $H_{k+1}$)
         $H_{k+1} \leftarrow \text{some algorithm}(H_k, x_k, x_{k+1}, g_k, g_{k+1})$;
         $k \leftarrow k + 1$;
       end while
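The loop above can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the backtracking (Armijo) line search, its constants, the steepest-descent fallback and the placeholder update `keep_H` (which leaves $H$ unchanged) are not part of the slides. Any of the update formulas discussed on the later slides could be passed as `update` instead.

```python
import numpy as np

def quasi_newton(f, grad, x0, H0, update, eps=1e-8, max_iter=200):
    """Generic quasi-Newton loop; `update` must return H_{k+1}."""
    x = np.asarray(x0, dtype=float)
    H = np.asarray(H0, dtype=float)
    g = grad(x)
    iterations = 0
    while np.linalg.norm(g) > eps and iterations < max_iter:
        d = -H @ g                      # search direction d_k = -H_k g_k
        if g @ d >= 0.0:                # assumption: fall back to -g if d is not a descent direction
            d = -g
        # Assumption: simple backtracking (Armijo) line search, not specified on the slide.
        alpha, fx, slope = 1.0, f(x), g @ d
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d           # perform step
        g_new = grad(x_new)
        H = update(H, x, x_new, g, g_new)
        x, g = x_new, g_new
        iterations += 1
    return x, H, iterations

def keep_H(H, x, x_new, g, g_new):
    """Hypothetical placeholder update: keep H frozen."""
    return H

# Quadratic test problem q(x) = 1/2 x^T A x - b^T x with A symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x_min, H_final, n_iter = quasi_newton(f, grad, [5.0, -4.0], np.linalg.inv(A), keep_H)
print(x_min, n_iter)
```

With $H_0 = \nabla^2 f(x_0)^{-1}$ and a quadratic objective the first direction is the Newton direction, so this example stops after a single step.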

  4. The symmetric rank one update (outline slide, repeated from slide 2)

  5. The symmetric rank one update

     Let $B_k$ be an approximation of the Hessian of $f(x)$. Let $x_k$, $x_{k+1}$, $g_k$ and $g_{k+1}$ be the points and gradients at the $k$-th and $(k+1)$-th iterates. Using the Broyden update formula to enforce the secant condition on $B_{k+1}$ we obtain

     $$B_{k+1} \leftarrow B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k},$$

     where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. By using the Sherman–Morrison formula and setting $H_k = B_k^{-1}$ we obtain the update:

     $$H_{k+1} \leftarrow H_k - \frac{(H_k y_k - s_k)\, s_k^T H_k}{s_k^T H_k y_k}.$$

     When the full step $\alpha_k = 1$ is taken, $s_k = -H_k g_k$ and the denominator can be written as $s_k^T s_k + s_k^T H_k g_{k+1}$.

     The previous update does not maintain symmetry: if $H_k$ is symmetric, $H_{k+1}$ is not necessarily symmetric.
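As a small numerical illustration of this slide, the following NumPy sketch applies the Broyden update to $B$ and the Sherman–Morrison form to $H = B^{-1}$, then checks the secant condition and the loss of symmetry. The function names and the test data are assumptions chosen for the example.

```python
import numpy as np

def broyden_update(B, s, y):
    """Broyden rank-one update of the Hessian approximation B."""
    return B + np.outer(y - B @ s, s) / (s @ s)

def broyden_inverse_update(H, s, y):
    """Same update applied to H = B^{-1} through the Sherman-Morrison formula."""
    Hy = H @ y
    return H - np.outer(Hy - s, s @ H) / (s @ Hy)

# Illustrative data (not from the slides): a symmetric B and arbitrary s, y.
B = np.array([[2.0, 0.5], [0.5, 1.0]])
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

B1 = broyden_update(B, s, y)
H1 = broyden_inverse_update(np.linalg.inv(B), s, y)

print(np.allclose(B1 @ s, y))               # secant condition B_{k+1} s_k = y_k
print(np.allclose(H1, np.linalg.inv(B1)))   # both forms describe the same matrix
print(np.allclose(B1, B1.T))                # symmetry is not preserved in general
```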

  6. The symmetric rank one update

     To avoid the loss of symmetry we can consider an update of the form

     $$H_{k+1} \leftarrow H_k + u u^T.$$

     Imposing the secant condition (on the inverse) we obtain

     $$H_{k+1} y_k = s_k \quad\Rightarrow\quad H_k y_k + u u^T y_k = s_k.$$

     From the previous equality, multiplying on the left by $y_k^T$,

     $$y_k^T H_k y_k + y_k^T u u^T y_k = y_k^T s_k \quad\Rightarrow\quad y_k^T u = \left(y_k^T s_k - y_k^T H_k y_k\right)^{1/2},$$

     which assumes $y_k^T s_k - y_k^T H_k y_k > 0$. We obtain

     $$u = \frac{s_k - H_k y_k}{u^T y_k} = \frac{s_k - H_k y_k}{\left(y_k^T s_k - y_k^T H_k y_k\right)^{1/2}}.$$

  7. The symmetric rank one update

     Substituting the expression of $u$,

     $$u = \frac{s_k - H_k y_k}{\left(y_k^T s_k - y_k^T H_k y_k\right)^{1/2}},$$

     in the update formula, we obtain

     $$H_{k+1} \leftarrow H_k + \frac{w_k w_k^T}{w_k^T y_k}, \qquad w_k = s_k - H_k y_k.$$

     The previous update formula is the symmetric rank one formula (SR1). To be well defined the formula needs $w_k^T y_k \neq 0$. Moreover, if $w_k^T y_k < 0$ and $H_k$ is positive definite then $H_{k+1}$ may lose positive definiteness. Having $H_k$ symmetric and positive definite is important for global convergence.
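A minimal NumPy sketch of the SR1 formula follows. The vectors are illustrative assumptions, picked so that $w_k^T y_k < 0$ and the updated matrix becomes indefinite even though $y_k^T s_k > 0$.

```python
import numpy as np

def sr1_update(H, s, y):
    """SR1 update of the inverse-Hessian approximation H (requires w^T y != 0)."""
    w = s - H @ y
    return H + np.outer(w, w) / (w @ y)

# Illustrative data: H_k = I is positive definite, y^T s = 2 > 0, but w^T y = -8 < 0.
H = np.eye(2)
s = np.array([2.0, 0.0])
y = np.array([1.0, 3.0])

H1 = sr1_update(H, s, y)
print(np.allclose(H1 @ y, s))     # secant condition H_{k+1} y_k = s_k
print(np.allclose(H1, H1.T))      # symmetry is preserved
print(np.linalg.eigvalsh(H1))     # one eigenvalue is negative: positive definiteness is lost
```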

  8. The symmetric rank one update

     This lemma is used in the following theorems.

     Lemma. Let $q(x) = \frac{1}{2} x^T A x - b^T x + c$ with $A \in \mathbb{R}^{n \times n}$ symmetric and positive definite. Then

     $$y_k = g_{k+1} - g_k = A x_{k+1} - b - A x_k + b = A s_k,$$

     where $g_k = \nabla q(x_k)^T$.

  9. The symmetric rank one update

     Theorem (property of SR1 update). Let $q(x) = \frac{1}{2} x^T A x - b^T x + c$ with $A \in \mathbb{R}^{n \times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned. Let $x_k$ and $H_k$ be produced by

     1. $x_{k+1} = x_k + s_k$;
     2. $H_{k+1}$ updated by the SR1 formula

        $$H_{k+1} \leftarrow H_k + \frac{w_k w_k^T}{w_k^T y_k}, \qquad w_k = s_k - H_k y_k.$$

     If $s_0, s_1, \ldots, s_{n-1}$ are linearly independent then $H_n = A^{-1}$.

  10. The symmetric rank one update

      Proof (1/2). We prove by induction the hereditary property $H_i y_j = s_j$ for $j < i$.

      BASE: For $i = 1$ it is exactly the secant condition of the update.

      INDUCTION: Suppose the relation is valid for $k > 0$, i.e. $H_k y_j = s_j$ for $j < k$; we prove that it is valid for $k + 1$. In fact, from the update formula

      $$H_{k+1} y_j = H_k y_j + \frac{w_k^T y_j}{w_k^T y_k}\, w_k, \qquad w_k = s_k - H_k y_k.$$

      By the induction hypothesis for $j < k$, the symmetry of $H_k$, and the lemma on slide 8 we have

      $$w_k^T y_j = s_k^T y_j - y_k^T H_k y_j = s_k^T y_j - y_k^T s_j = s_k^T A s_j - s_k^T A s_j = 0,$$

      so that $H_{k+1} y_j = H_k y_j = s_j$ for $j = 0, 1, \ldots, k-1$. For $j = k$ we have $H_{k+1} y_k = s_k$ trivially by construction of the SR1 formula.

  11. The symmetric rank one update

      Proof (2/2). To prove that $H_n = A^{-1}$ notice that

      $$H_n y_j = s_j, \qquad A s_j = y_j, \qquad j = 0, 1, \ldots, n-1,$$

      and combining the equalities,

      $$H_n A s_j = s_j, \qquad j = 0, 1, \ldots, n-1.$$

      Due to the linear independence of the $s_j$ we have $H_n A = I$, i.e. $H_n = A^{-1}$.
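The theorem can be checked numerically on a small quadratic. In the sketch below the matrix $A$ and the steps (the coordinate vectors) are assumptions made for illustration. By the lemma on slide 8 each $y_k$ is computed as $A s_k$, and after $n = 3$ SR1 updates $H$ coincides with $A^{-1}$.

```python
import numpy as np

def sr1_update(H, s, y):
    """SR1 update of the inverse-Hessian approximation H."""
    w = s - H @ y
    return H + np.outer(w, w) / (w @ y)

# Illustrative symmetric positive definite matrix (not from the slides).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
steps = np.eye(3)          # s_0, s_1, s_2: linearly independent by construction
H = np.eye(3)              # H_0
pairs = []
for s in steps:
    y = A @ s              # y_k = A s_k for a quadratic objective
    H = sr1_update(H, s, y)
    pairs.append((s, y))

# Hereditary property H_n y_j = s_j and quadratic termination H_n = A^{-1}.
print(all(np.allclose(H @ y, s) for s, y in pairs))
print(np.allclose(H, np.linalg.inv(A)))
```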

  12. The symmetric rank one update

      Properties of SR1 update (1/2)

      1. The SR1 update possesses the natural quadratic termination property (like CG).
      2. SR1 satisfies the hereditary property $H_k y_j = s_j$ for $j < k$.
      3. SR1 maintains the positive definiteness of $H_k$ if $w_k^T y_k > 0$; when $w_k^T y_k < 0$ positive definiteness may be lost. However, the condition $w_k^T y_k > 0$ is difficult to guarantee.
      4. Sometimes $w_k^T y_k$ becomes very small or $0$. This results in serious numerical difficulty (roundoff) or even in the breakdown of the algorithm. We can avoid this breakdown with the following strategy.

      Breakdown workaround for SR1 update

      1. If $|w_k^T y_k| \ge \epsilon\, \|w_k\|\, \|y_k\|$ (i.e. the angle between $w_k$ and $y_k$ is far from 90 degrees), then we update with the SR1 formula.
      2. Otherwise we set $H_{k+1} = H_k$.
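The skip rule can be sketched as a guarded version of the SR1 update. The default `eps=1e-8`, the returned flag and the extra guard against an exactly zero denominator are assumptions of this sketch, not values given on the slide.

```python
import numpy as np

def sr1_update_safe(H, s, y, eps=1e-8):
    """SR1 update with the skip rule; returns (H_next, updated_flag)."""
    w = s - H @ y
    denom = w @ y
    # Update only if w and y are far from orthogonal (extra guard: denom must be nonzero).
    if denom != 0.0 and abs(denom) >= eps * np.linalg.norm(w) * np.linalg.norm(y):
        return H + np.outer(w, w) / denom, True
    return H, False            # otherwise keep H_{k+1} = H_k

H = np.eye(2)

# Case a: w^T y = -8, well away from zero, so the update is applied.
s_a, y_a = np.array([2.0, 0.0]), np.array([1.0, 3.0])
H_a, did_a = sr1_update_safe(H, s_a, y_a)

# Case b: w = (0, 5) is orthogonal to y = (1, 0), so the update is skipped.
s_b, y_b = np.array([1.0, 5.0]), np.array([1.0, 0.0])
H_b, did_b = sr1_update_safe(H, s_b, y_b)

print(did_a, did_b)
```

Returning a flag alongside the matrix is a design choice for the sketch: it lets the caller count skipped updates without comparing matrices.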

  13. The symmetric rank one update

      Properties of SR1 update (2/2)

      Theorem (Convergence of nonlinear SR1 update). Let $f(x)$ satisfy the standard assumptions. Let $\{x_k\}$ be a sequence of iterates such that $\lim_{k \to \infty} x_k = x^\star$. Suppose we use the breakdown workaround for the SR1 update and the steps $\{s_k\}$ are uniformly linearly independent. Then we have

      $$\lim_{k \to \infty} \left\| H_k - \nabla^2 f(x^\star)^{-1} \right\| = 0.$$

      Reference: A.R. Conn, N.I.M. Gould and P.L. Toint, Convergence of quasi-Newton matrices generated by the symmetric rank one update. Mathematics of Computation 50, 399–430, 1988.

  14. The Powell-symmetric-Broyden update (outline slide, repeated from slide 2)

  15. The Powell-symmetric-Broyden update

      The SR1 update, although symmetric, does not have a minimum property like the Broyden update for the non-symmetric case. The Broyden update

      $$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$

      solves the minimization problem

      $$\|B_{k+1} - B_k\|_F \le \|B - B_k\|_F \quad \text{for all } B \text{ such that } B s_k = y_k.$$

      If we solve a similar problem in the class of symmetric matrices we obtain the Powell-symmetric-Broyden (PSB) update.

  16. The Powell-symmetric-Broyden update

      Lemma (Powell-symmetric-Broyden update). Let $A \in \mathbb{R}^{n \times n}$ be symmetric and $s, y \in \mathbb{R}^n$ with $s \neq 0$. Consider the set

      $$\mathcal{B} = \left\{ B \in \mathbb{R}^{n \times n} \mid Bs = y,\ B = B^T \right\}.$$

      If $s^T y \neq 0$ (this is true if a Wolfe line search is performed) then there exists a unique matrix $B \in \mathcal{B}$ such that

      $$\|A - B\|_F \le \|A - C\|_F \quad \text{for all } C \in \mathcal{B};$$

      moreover $B$ has the following form

      $$B = A + \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2}, \qquad \omega = y - As,$$

      so $B$ is a rank two perturbation of the matrix $A$.
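The formula of the lemma can be tried on a small example. In this NumPy sketch the data are assumptions; the comparison matrix $C = y y^T / (s^T y)$ is another member of the set $\mathcal{B}$, and the PSB matrix is checked to be symmetric, to satisfy $Bs = y$, and to be at least as close to $A$ in the Frobenius norm.

```python
import numpy as np

def psb_update(A, s, y):
    """Powell-symmetric-Broyden update of the symmetric matrix A."""
    omega = y - A @ s
    ss = s @ s
    return (A
            + (np.outer(omega, s) + np.outer(s, omega)) / ss
            - (omega @ s) * np.outer(s, s) / ss**2)

# Illustrative data (not from the slides); note s^T y = 5 != 0.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

B = psb_update(A, s, y)
C = np.outer(y, y) / (s @ y)      # another symmetric matrix with C s = y

print(np.allclose(B @ s, y), np.allclose(B, B.T))
print(np.linalg.norm(A - B, "fro"), np.linalg.norm(A - C, "fro"))
```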

  17. The Powell-symmetric-Broyden update

      Proof (1/11). First of all notice that $\mathcal{B}$ is not empty; in fact

      $$\frac{1}{s^T y}\, y y^T \in \mathcal{B}, \qquad \left[ \frac{1}{s^T y}\, y y^T \right] s = y,$$

      so the feasible set of the problem is not empty. Next we reformulate the problem as a constrained minimum problem:

      $$\arg\min_{B \in \mathbb{R}^{n \times n}} \ \frac{1}{2} \sum_{i,j=1}^{n} (A_{ij} - B_{ij})^2 \quad \text{subject to } Bs = y \text{ and } B = B^T.$$

      The solution is a stationary point of the Lagrangian:

      $$g(B, \lambda, M) = \frac{1}{2} \|A - B\|_F^2 + \lambda^T (Bs - y) + \sum_{i<j} \mu_{ij} (B_{ij} - B_{ji}).$$

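  18. The Powell-symmetric-Broyden update

      Proof (2/11). Taking the gradient with respect to $B_{ij}$ and setting it to zero we have, up to a change of sign of the multipliers $\lambda$ and $\mu_{ij}$ (which does not affect the solution),

      $$A_{ij} - B_{ij} + \lambda_i s_j + M_{ij} = 0,$$

      where

      $$M_{ij} = \begin{cases} \mu_{ij} & \text{if } i < j; \\ -\mu_{ji} & \text{if } i > j; \\ 0 & \text{if } i = j. \end{cases}$$

      The previous equality can be written in matrix form as

      $$B = A + \lambda s^T + M.$$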

  19. The Powell-symmetric-Broyden update

      Proof (3/11). Imposing symmetry for $B$ (recall that $M^T = -M$):

      $$A + \lambda s^T + M = A^T + s \lambda^T + M^T = A + s \lambda^T - M.$$

      Solving for $M$ we have

      $$M = \frac{s \lambda^T - \lambda s^T}{2},$$

      and substituting in $B$ we have

      $$B = A + \frac{s \lambda^T + \lambda s^T}{2}.$$

  20. The Powell-symmetric-Broyden update

      Proof (4/11). Imposing $s^T B s = s^T y$:

      $$s^T A s + \frac{s^T s\, \lambda^T s + s^T \lambda\, s^T s}{2} = s^T y \quad\Rightarrow\quad \lambda^T s = \frac{s^T \omega}{s^T s},$$

      where $\omega = y - As$. Imposing $Bs = y$:

      $$As + \frac{s\, \lambda^T s + \lambda\, s^T s}{2} = y \quad\Rightarrow\quad \lambda = \frac{2\omega}{s^T s} - \frac{(s^T \omega)\, s}{(s^T s)^2}.$$

      Next we compute the explicit form of $B$.
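The multiplier expressions derived so far can be sanity-checked numerically. The sketch below uses assumed data; it builds $\lambda$ and the antisymmetric matrix $M$ from the formulas of slides 19 and 20 and verifies that $B = A + \lambda s^T + M$ is symmetric and satisfies $Bs = y$.

```python
import numpy as np

# Illustrative data (not from the slides).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

omega = y - A @ s
ss = s @ s
lam = 2.0 * omega / ss - (s @ omega) * s / ss**2       # multiplier lambda (slide 20)
M = (np.outer(s, lam) - np.outer(lam, s)) / 2.0        # antisymmetric part (slide 19)
B = A + np.outer(lam, s) + M                           # B = A + lambda s^T + M (slide 18)

print(np.isclose(lam @ s, (s @ omega) / ss))           # lambda^T s = s^T omega / s^T s
print(np.allclose(M, -M.T))                            # M is antisymmetric
print(np.allclose(B, B.T), np.allclose(B @ s, y))      # B is symmetric and B s = y
```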
