Trust Region Method
Lectures for PhD course on Numerical Optimization
Enrico Bertolazzi
DIMS – Università di Trento
November 21 – December 14, 2011
Trust Region Method 1 / 36
Outline
1 The Trust Region method
2 The exact solution of the trust region step
3 The dogleg trust region step
Introduction
Newton and quasi-Newton methods search for a solution iteratively by choosing, at each step, a search direction and then minimizing along that direction.
An alternative approach is to find a direction and a step-length together; if the step is successful in some sense, it is accepted. Otherwise another direction and step-length are chosen.
The choice of the step-length and direction is algorithm dependent, but a successful approach is the one based on a trust region.
Introduction
Newton and quasi-Newton methods at each step (approximately) solve the minimization problem

    min_s  m(x_k + s) = f(x_k) + ∇f(x_k) s + (1/2) s^T H_k s

in the case H_k is symmetric and positive definite (SPD).
If H_k is SPD the minimum is attained at

    s = −H_k^{−1} g_k,    g_k = ∇f(x_k)^T,

and s is the quasi-Newton step.
If H_k = ∇²f(x_k) and is SPD, then s = −H_k^{−1} g_k is the Newton step.
Introduction
If H_k is not positive definite, the search direction −H_k^{−1} g_k may fail to be a descent direction, and the previous minimization problem may have no solution.
The problem is that the model m(x_k + s) is only an approximation of f,

    m(x_k + s) ≈ f(x_k + s),

and this approximation is valid only in a small neighborhood of x_k. An alternative minimization problem is therefore

    min_s  m(x_k + s) = f(x_k) + ∇f(x_k) s + (1/2) s^T H_k s,
    subject to  ‖s‖ ≤ δ_k.

Here δ_k is the radius of the trust region of the model m(x), i.e. the region where we trust the model to be valid.
The generic trust region algorithm
Algorithm (Generic trust region algorithm)
    x assigned; δ assigned;
    g ← ∇f(x)^T; H ← ∇²f(x);
    while ‖g‖ > ε do
        s ← arg min_{‖s‖≤δ} m(x + s) = f(x) + g^T s + (1/2) s^T H s;
        pred ← m(x + s) − m(x);
        ared ← f(x + s) − f(x);
        if (ared/pred) < η₁ then
            x ← x; δ ← δ γ₁;            — reject step, reduce δ
        else
            x ← x + s;                   — accept step, update g and H
            if (ared/pred) > η₂ then
                δ ← max{δ, γ₂ ‖s‖};      — enlarge δ
            end if
        end if
    end while
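The loop above can be sketched in Python/NumPy. This is a minimal illustration, not the author's code: the subproblem is solved exactly by a simple bisection on the multiplier µ (valid when H is SPD, as the Lemma below justifies), and the function names `solve_subproblem` and `trust_region` are made up for the example.

```python
import numpy as np

def solve_subproblem(g, H, delta):
    # Exact trust-region step for SPD H: s(mu) = -(H + mu I)^{-1} g,
    # with mu found by bisection so that ||s(mu)|| = delta (if needed).
    n = len(g)
    s = np.linalg.solve(H, -g)
    if np.linalg.norm(s) <= delta:
        return s                                  # unconstrained minimizer is inside
    lo, hi = 0.0, 1.0
    while np.linalg.norm(np.linalg.solve(H + hi * np.eye(n), -g)) > delta:
        hi *= 2.0                                 # bracket the multiplier
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        s = np.linalg.solve(H + mu * np.eye(n), -g)
        if np.linalg.norm(s) > delta:
            lo = mu
        else:
            hi = mu
    return s

def trust_region(f, grad, hess, x, delta=1.0, eta1=0.25, eta2=0.75,
                 gamma1=0.5, gamma2=3.0, tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        s = solve_subproblem(g, H, delta)
        pred = g @ s + 0.5 * s @ H @ s            # m(x+s) - m(x), negative
        ared = f(x + s) - f(x)                    # actual reduction, negative
        if ared / pred < eta1:
            delta *= gamma1                       # reject step, reduce delta
        else:
            x = x + s                             # accept step
            if ared / pred > eta2:
                delta = max(delta, gamma2 * np.linalg.norm(s))
    return x
```

On a smooth function whose Hessian is SPD everywhere (e.g. f(x) = 0.1 x₁⁴ + x₁² + 2 x₂²) the iteration drives x to the minimizer at the origin.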
A fundamental lemma
The previous algorithm is based on two key ingredients:
1 The ratio r = ared/pred, i.e. the ratio of the actual reduction to the predicted reduction.
2 Enlarging or reducing the trust region radius δ.
If the ratio satisfies 0 < η₁ < r < η₂ < 1, the model is reasonably accurate; we accept the step and do not modify the trust region.
If the ratio is small, r ≤ η₁, the model is not accurate; we reject the step and reduce the trust region by a factor γ₁ < 1.
If the ratio is large, r ≥ η₂, the model is very accurate; we accept the step and enlarge the trust region by a factor γ₂ > 1.
The algorithm is quite insensitive to the constants η₁ and η₂. Typical values are η₁ = 0.25, η₂ = 0.75, γ₁ = 0.5 and γ₂ = 3.
A fundamental lemma
Lemma
Let f : ℝⁿ → ℝ be twice continuously differentiable and H ∈ ℝ^{n×n} symmetric and positive definite. Then the problem

    min_s  m(x + s) = f(x) + ∇f(x) s + (1/2) s^T H s,
    subject to  ‖s‖ ≤ δ

is solved by

    s(µ) := −(H + µI)^{−1} g,    g = ∇f(x)^T,

for the unique µ ≥ 0 such that ‖s(µ)‖ = δ, unless ‖s(0)‖ ≤ δ, in which case s(0) is the solution. For any µ ≥ 0, s(µ) defines a descent direction for f from x.
A fundamental lemma
Proof. (1/2)
If ‖s(0)‖ ≤ δ then s(0) is the global minimum inside the trust region. Otherwise consider the Lagrangian

    L(s, µ) = a + g^T s + (1/2) s^T H s + (1/2) µ (s^T s − δ²),

where a = f(x) and g = ∇f(x)^T. Then we have

    ∂L/∂s (s, µ) = Hs + µs + g = 0    ⇒    s = −(H + µI)^{−1} g,

together with s^T s = δ². Remember that if H is SPD then H + µI is SPD for all µ ≥ 0; moreover the inverse of an SPD matrix is SPD. From

    g^T s = −g^T (H + µI)^{−1} g < 0    for all µ ≥ 0

it follows that s(µ) is a descent direction for all µ ≥ 0.
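The descent property g^T s(µ) < 0 is easy to check numerically; a small sketch (the SPD matrix and gradient here are random example data, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4.0 * np.eye(4)                     # a random SPD matrix
g = rng.standard_normal(4)

for mu in [0.0, 0.1, 1.0, 100.0]:
    s = np.linalg.solve(H + mu * np.eye(4), -g)   # s(mu) = -(H + mu I)^{-1} g
    assert g @ s < 0                              # descent direction for every mu >= 0
```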
A fundamental lemma
Proof. (2/2)
To prove uniqueness, expand the gradient g in the eigenvectors of H:

    g = Σ_{i=1}^n α_i u_i.

H is SPD, so the u_i can be chosen orthonormal. It follows that

    (H + µI)^{−1} g = (H + µI)^{−1} Σ_{i=1}^n α_i u_i = Σ_{i=1}^n α_i/(λ_i + µ) u_i,

    ‖(H + µI)^{−1} g‖² = Σ_{i=1}^n α_i²/(λ_i + µ)²,

and therefore ‖(H + µI)^{−1} g‖ is a monotonically decreasing function of µ.
A fundamental lemma
Remark
As a consequence of the previous Lemma we have:
As the radius of the trust region becomes smaller, the scalar µ becomes larger. This means that the search direction becomes more and more oriented toward the (negative) gradient direction.
As the radius of the trust region becomes larger, the scalar µ becomes smaller. This means that the search direction becomes more and more oriented toward the Newton direction.
Thus a trust region technique changes not only the size of the step but also its direction. This results in a more robust numerical technique. The price to pay is that the solution of the minimization problem is more costly than an inexact line search.
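The remark can be verified numerically: for small µ the normalized step approaches the Newton direction, for large µ the steepest-descent direction. A small sketch with made-up data:

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # SPD model Hessian (example data)
g = np.array([1.0, 2.0])

def direction(mu):
    s = np.linalg.solve(H + mu * np.eye(2), -g)
    return s / np.linalg.norm(s)         # unit step direction

newton_dir = direction(0.0)              # -H^{-1} g, normalized
grad_dir = -g / np.linalg.norm(g)        # steepest descent, normalized

assert np.allclose(direction(1e-9), newton_dir, atol=1e-6)   # large radius
assert np.allclose(direction(1e9), grad_dir, atol=1e-6)      # tiny radius
```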
Solving the constrained minimization problem
As for the line-search problem, we have many alternatives for solving the constrained minimization problem:
We can solve the constrained minimization problem accurately, for example by an iterative method.
We can approximate the solution of the constrained minimization problem.
As for the line search, the accurate solution of the constrained minimization problem does not pay off, while a good, cheap approximation normally performs better.
The exact solution of the trust region step
The Newton approach (1/5)
Consider the Lagrangian

    L(s, µ) = a + g^T s + (1/2) s^T H s + (1/2) µ (s^T s − δ²),

where a = f(x) and g = ∇f(x)^T. Then we can try to solve the nonlinear system

    ∂L/∂(s, µ) (s, µ) = [ Hs + µs + g ; (s^T s − δ²)/2 ] = [ 0 ; 0 ].

Using the Newton method we have

    [ s_{k+1} ; µ_{k+1} ] = [ s_k ; µ_k ] − [ H + µ_k I , s_k ; s_k^T , 0 ]^{−1} [ H s_k + µ_k s_k + g ; (s_k^T s_k − δ²)/2 ].
The Newton approach (2/5)
A better approach is obtained by solving Φ(µ) = 0, where

    Φ(µ) = ‖s(µ)‖ − δ    and    s(µ) = −(H + µI)^{−1} g.

To build the Newton method we need to evaluate

    Φ′(µ) = s(µ)^T s′(µ) / ‖s(µ)‖,    s′(µ) = (H + µI)^{−2} g,

where s′(µ) is obtained by differentiating the relation

    H s(µ) + µ s(µ) = −g    ⇒    H s′(µ) + µ s′(µ) + s(µ) = 0.

Putting it all together, the Newton step becomes

    µ_{k+1} = µ_k − ( ‖s(µ_k)‖ / (s(µ_k)^T s′(µ_k)) ) ( ‖s(µ_k)‖ − δ ).
The Newton approach (3/5)
The Newton step can be reorganized as follows:

    s_k  = −(H + µ_k I)^{−1} g
    s′_k = −(H + µ_k I)^{−1} s_k
    β    = √(s_k^T s_k)
    µ_{k+1} = µ_k − β(β − δ) / (s_k^T s′_k)

Thus each Newton step requires the solution of two linear systems. However, the coefficient matrix is the same, so only one LU factorization is needed; the cost per step is therefore essentially the cost of the LU factorization.
The Newton approach (4/5)
Evaluating Φ″(µ) we have

    Φ″(µ) = ( ‖s′(µ)‖² + s(µ)^T s″(µ) ) / ‖s(µ)‖ − ( s(µ)^T s′(µ) )² / ‖s(µ)‖³.

In fact, from (H + µI) s′(µ) = −s(µ) we obtain

    (H + µI) s″(µ) + s′(µ) = −s′(µ)    ⇒    s″(µ) = −2 (H + µI)^{−1} s′(µ),

and, since s(µ) = −(H + µI) s′(µ),

    s(µ)^T s″(µ) = 2 s′(µ)^T s′(µ) = 2 ‖s′(µ)‖².

Hence, using the Cauchy–Schwarz inequality (s^T s′)² ≤ ‖s‖² ‖s′‖²,

    Φ″(µ) = 3 ‖s′(µ)‖² / ‖s(µ)‖ − ( s(µ)^T s′(µ) )² / ‖s(µ)‖³ ≥ 2 ‖s′(µ)‖² / ‖s(µ)‖ > 0

for all µ ≥ 0.
The Newton approach (5/5)
From Φ″(µ) > 0 it follows that the Newton step underestimates the root µ⋆ at each iteration, so the iterates increase monotonically toward µ⋆.
[Figure: ‖s(µ)‖ plotted as a decreasing, convex function of µ, crossing the level δ at µ = µ⋆; Φ(µ) is the same curve shifted down by δ.]