Core idea

Goal: solve r(w) = 0... how?

Key idea: guess w, then iterate on the linear model:

    r(w + ∆w) ≈ r(w) + ∇r(w)⊤ ∆w = 0

Algorithm: Newton method
    Input: w, tol
    while ‖r(w)‖∞ ≥ tol do
        Compute r(w) and ∇r(w)
        Compute the Newton direction: ∇r(w)⊤ ∆w = −r(w)
        Newton step, t ∈ ]0, 1]: w ← w + t ∆w
    return w

[Figure: successive Newton iterates on a scalar residual r(w).]

With t = 1 this is a full-step Newton iteration; reduced steps (t < 1) are often needed.
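The algorithm translates almost line-for-line into code. Below is a minimal sketch in Python/NumPy (not from the slides; the test residual r and its Jacobian are illustrative choices):

```python
import numpy as np

def newton(r, jac, w, tol=1e-10, max_iter=50):
    """Full-step Newton iteration for r(w) = 0.

    jac(w) returns the Jacobian of r, i.e. the matrix grad r(w)^T
    in the slides' notation.
    """
    for _ in range(max_iter):
        rw = r(w)
        if np.linalg.norm(rw, np.inf) < tol:
            break
        dw = np.linalg.solve(jac(w), -rw)  # Newton direction: grad r(w)^T dw = -r(w)
        w = w + dw                         # full Newton step (t = 1)
    return w

# Illustrative test: intersect the unit circle with the line w0 = w1
r   = lambda w: np.array([w[0]**2 + w[1]**2 - 1.0, w[0] - w[1]])
jac = lambda w: np.array([[2.0*w[0], 2.0*w[1]], [1.0, -1.0]])
print(newton(r, jac, np.array([2.0, 0.5])))  # ~ [0.7071, 0.7071]
```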
Why reduced steps?

Newton step with t ∈ ]0, 1]:

    ∇r(w)⊤ ∆w = −r(w),    w ← w + t ∆w

[Figure: Newton iterates on a scalar residual r(w); with t = 1 the iterates oscillate around the root with growing amplitude, with t = 0.8 they settle onto it.]

The full-step Newton iteration can be unstable, while the reduced-step Newton iteration is stable.
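A scalar experiment makes this visible. The residual r(w) = arctan(w) is a standard textbook choice for the effect (an illustrative stand-in, not the function plotted above): full steps from w = 1.5 overshoot the root w* = 0 with growing amplitude, while t = 0.8 settles down.

```python
import numpy as np

def damped_newton_arctan(w, t, n_iter=8):
    """Newton iteration on r(w) = arctan(w) with a fixed step size t."""
    trace = [w]
    for _ in range(n_iter):
        dw = -np.arctan(w) * (1.0 + w**2)  # solve r'(w) dw = -r(w), with r'(w) = 1/(1+w^2)
        w = w + t * dw
        trace.append(w)
    return trace

print(damped_newton_arctan(1.5, t=1.0))  # full steps: iterates oscillate and blow up
print(damped_newton_arctan(1.5, t=0.8))  # reduced steps: converges to w* = 0
```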
Does Newton always work?

Is the Newton step ∆w always providing a direction "improving" r(w)? I.e. is there always a t > 0 s.t. ‖r(w + t∆w)‖ < ‖r(w)‖?

Yes... but.

Proof: ‖r(w + t∆w)‖ < ‖r(w)‖ holds for some t > 0 if

    d/dt ‖r(w + t∆w)‖² |_{t=0} < 0

with ‖r(w)‖² differentiable, i.e. if

    2 r(w)⊤ d/dt r(w + t∆w) |_{t=0} < 0.

We have

    d/dt r(w + t∆w) |_{t=0} = ∇r(w)⊤ ∆w = −∇r(w)⊤ ∇r(w)^{−⊤} r(w) = −r(w).

Then

    d/dt ‖r(w + t∆w)‖² |_{t=0} = −2 ‖r(w)‖² < 0.
How to select the step size t ∈ ]0, 1]? Globalization...

- Line search: reduce t until some criterion of progress on ‖r‖ is met (a sketch follows below).
- Trust region: confine the step ∆w within a region where ∇r(w) provides a good model of r(w).
- Filter techniques: monitor progress on specific components of r(w) separately.
- ...

... ensures that progress is made in one way or another. Note: most of these techniques are specific to optimization.
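As a concrete instance of the line-search idea, here is a minimal sketch of a backtracking globalized Newton method in Python/NumPy. The acceptance test ‖r(w + t∆w)‖ ≤ (1 − αt)‖r(w)‖ is one common choice, motivated by the proof above (the directional derivative of ‖r‖² at t = 0 is −2‖r‖²); production solvers use more refined criteria.

```python
import numpy as np

def newton_linesearch(r, jac, w, tol=1e-10, max_iter=100, alpha=1e-4):
    """Newton's method globalized by backtracking on the residual norm."""
    for _ in range(max_iter):
        rw = r(w)
        if np.linalg.norm(rw, np.inf) < tol:
            break
        dw = np.linalg.solve(jac(w), -rw)      # Newton direction
        t, nrm = 1.0, np.linalg.norm(rw)
        # halve t until a sufficient decrease of ||r|| is achieved
        while np.linalg.norm(r(w + t*dw)) > (1.0 - alpha*t) * nrm and t > 1e-12:
            t *= 0.5
        w = w + t * dw                         # reduced Newton step
    return w
```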
But still, Newton can fail...

Solve r(w) = 0.

[Figure: a residual r(w) whose slope vanishes away from the root; the iterates stall where the tangent becomes horizontal.]

Newton stops with r(w) ≠ 0 and ∇r(w) singular, i.e. the Newton direction ∆w given by

    ∇r(w)⊤ ∆w = −r(w)

is undefined...

This is a common failure mode for Newton-based solvers when tackling very nonlinear r and starting from a poor initial guess!
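This failure is easy to reproduce. The cubic below (an illustrative choice, not the residual plotted above) has a real root near w ≈ −2.104, but its derivative vanishes at w = ±1; starting from w = 0, the first Newton step lands exactly on the singular point w = 1, where r(w) ≠ 0 and no Newton direction exists:

```python
r  = lambda w: w**3 - 3.0*w + 3.0   # one real root, near w = -2.104
dr = lambda w: 3.0*w**2 - 3.0       # vanishes at w = +/- 1

w = 0.0                             # poor initial guess
for k in range(5):
    if abs(dr(w)) < 1e-12:
        print(f"iter {k}: r(w) = {r(w):.3f} != 0, but the Jacobian is singular")
        break
    w = w - r(w) / dr(w)            # scalar Newton step
```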
Convergence of full-step Newton methods

Newton method:

    ∇r(w)⊤ ∆w = −r(w),    w ← w + ∆w

yields the iteration, k = 0, 1, ...:

    w_{k+1} ← w_k − ∇r(w_k)^{−⊤} r(w_k)

Newton-type method (Jacobian approximation M_k ≈ ∇r(w_k)⊤):

    M ∆w = −r(w),    w ← w + ∆w

yields the iteration, k = 0, 1, ...:

    w_{k+1} ← w_k − M_k^{−1} r(w_k)

Theorem: assume

- Nonlinearity of r: ‖M_k^{−1} (∇r(w)⊤ − ∇r(w*)⊤)‖ ≤ ω ‖w − w*‖, for w ∈ [w_k, w*]
- Jacobian approximation error: ‖M_k^{−1} (∇r(w_k)⊤ − M_k)‖ ≤ κ_k < 1
- Good initial guess: ‖w_0 − w*‖ ≤ (2/ω)(1 − max{κ_k})

Then w_k → w* with the following linear-quadratic contraction in each iteration:

    ‖w_{k+1} − w*‖ ≤ (κ_k + (ω/2) ‖w_k − w*‖) ‖w_k − w*‖.

What about reduced steps? Slow convergence while t < 1 (damped phase); when full steps become feasible, fast convergence to the solution.
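The linear-quadratic contraction can be observed numerically. The sketch below (an illustrative setup) solves r(w) = w² − 2 with the exact Newton iteration and with a Newton-type iteration that freezes the Jacobian at the initial guess (M_k = r'(w_0) for all k, which keeps κ_k < 1 near the solution): the first error sequence contracts quadratically, the second only linearly.

```python
import numpy as np

r, dr  = lambda w: w**2 - 2.0, lambda w: 2.0*w
w_star = np.sqrt(2.0)

w_exact, w_fixed = 2.0, 2.0
M = dr(2.0)                                   # frozen Jacobian approximation
for k in range(6):
    print(f"k={k}: exact err = {abs(w_exact - w_star):.2e}, "
          f"fixed-M err = {abs(w_fixed - w_star):.2e}")
    w_exact -= r(w_exact) / dr(w_exact)       # quadratic contraction
    w_fixed -= r(w_fixed) / M                 # linear contraction
```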
Newton methods - Short Survival Guide

Exact Newton method:

    ∇r(w)⊤ ∆w = −r(w),    w ← w + t ∆w

Newton-type method:

    M ∆w = −r(w),    w ← w + t ∆w

- The exact Newton direction ∆w improves r for a sufficiently small step size t ∈ ]0, 1].
- An inexact Newton direction ∆w improves r for a sufficiently small step size t ∈ ]0, 1] if M > 0.
- Exact full (t = 1) Newton steps converge quadratically if close enough to the solution.
- Inexact full (t = 1) Newton steps converge linearly if close enough to the solution and if the Jacobian approximation is "sufficiently good".
- The Newton iteration fails if ∇r becomes singular.
- Newton methods with globalization converge in two phases: a damped (slow) phase when reduced steps (t < 1) are needed, then quadratic/linear convergence when full steps are possible.
Outline

1. KKT conditions - Quick Reminder
2. The Newton method
3. Newton on the KKT conditions
4. Sequential Quadratic Programming
5. Hessian approximation
6. Maratos effect
Core idea

A vast majority of solvers try to find a KKT point w, µ, λ, i.e.:

    Primal feasibility:          g(w) = 0,  h(w) ≤ 0
    Dual feasibility:            ∇_w L(w, µ, λ) = 0,  µ ≥ 0
    Complementarity slackness:   µ_i h_i(w) = 0,  i = 1, ...

where L = Φ(w) + λ⊤ g(w) + µ⊤ h(w).

Let's consider for now equality-constrained problems, i.e. find w, λ s.t.:

    ∇_w L(w, λ) = 0,    g(w) = 0

Idea: apply the Newton method to the KKT conditions, i.e. solve

    r(w, λ) = [∇_w L(w, λ); g(w)] = 0

... by iterating

    ∇r(w, λ)⊤ [∆w; ∆λ] = −r(w, λ)
Newton method on the KKT conditions

KKT conditions:

    r(w, λ) = [∇_w L(w, λ); g(w)] = 0

Newton direction:

    ∇r(w, λ)⊤ [∆w; ∆λ] = −r(w, λ)

given by:

    ∇²_w L(w, λ) ∆w + ∇_{w,λ} L(w, λ) ∆λ = −∇_w L(w, λ)
    ∇g(w)⊤ ∆w = −g(w)

Using ∇_w L(w, λ) = ∇Φ(w) + ∇g(w) λ and ∇_{w,λ} L(w, λ) = ∇g(w), the first equation becomes

    ∇²_w L(w, λ) ∆w + ∇g(w) (λ + ∆λ) = −∇Φ(w)

The Newton direction on the KKT conditions:

    [ H(w, λ)    ∇g(w) ] [ ∆w ]     [ ∇Φ(w) ]
    [ ∇g(w)⊤       0   ] [ λ⁺ ] = − [ g(w)  ]

where the matrix on the left is the KKT matrix (symmetric indefinite) and H(w, λ) = ∇²_w L(w, λ) is the Hessian of the problem.

Notes:
- The update of the dual variable is λ⁺ = λ + ∆λ.
- ∇_w L(w, λ) is not needed for computing the Newton step.
- The updated dual variables λ⁺ are readily provided!
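Assembling and solving this linear system is the core of each iteration. A minimal sketch in Python/NumPy (the function names and calling convention are ours, not from the slides):

```python
import numpy as np

def newton_kkt_step(w, lam, grad_phi, hess_lag, g, jac_g):
    """One full Newton step on the KKT conditions of an equality-constrained NLP.

    Solves the symmetric indefinite KKT system
        [ H(w, lam)    grad_g(w) ] [ dw   ]     [ grad_phi(w) ]
        [ grad_g(w)^T      0     ] [ lam+ ] = - [ g(w)        ]
    and returns the new primal point and the updated multipliers lam+.
    """
    H  = hess_lag(w, lam)                     # Hessian of the Lagrangian
    Jg = jac_g(w)                             # grad g(w), shape (n, m)
    n, m = Jg.shape
    KKT = np.block([[H, Jg], [Jg.T, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_phi(w), g(w)])
    sol = np.linalg.solve(KKT, rhs)
    return w + sol[:n], sol[n:]
```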
Newton Iteration for Optimization - Example

    min_w  Φ(w) = ½ w⊤ [2 1; 1 4] w + [1 0] w
    s.t.   g(w) = w⊤w − 1 = 0

[Figure: contour lines of Φ over (w₁, w₂) with the unit circle g(w) = 0; the Newton iterates converge to a KKT point on the circle.]

Iterate:

    [ H    ∇g ] [ ∆w ]     [ ∇Φ ]
    [ ∇g⊤   0 ] [ λ⁺ ] = − [ g  ]

with:

    ∇g(w) = 2w = [2w₁; 2w₂]
    L(w, λ) = Φ(w) + λ g(w)
    ∇_w L(w, λ) = [2 1; 1 4] w + [1; 0] + 2λw
    H(w, λ) = [2 + 2λ   1; 1   4 + 2λ]
    ∇Φ(w) = [2w₁ + w₂ + 1; w₁ + 4w₂]

Algorithm: Newton method
    Input: guess w, λ (e.g. λ = 0), step t = 1
    while ‖∇L‖ or ‖g‖ ≥ tol do
        Compute H(w, λ), ∇g(w), ∇Φ(w), g(w)
        Compute the Newton direction:
            [ H    ∇g ] [ ∆w ]     [ ∇Φ ]
            [ ∇g⊤   0 ] [ λ⁺ ] = − [ g  ]
            ∆λ = λ⁺ − λ
        Newton step, t ∈ ]0, 1]: w ← w + t ∆w,  λ ← λ + t ∆λ
    return w, λ
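For completeness, here is a hedged Python/NumPy transcription of this example, reusing the `newton_kkt_step` sketch from above. The initial guess and the use of full steps (t = 1) are our choices; a poorer guess may require reduced steps, as discussed earlier.

```python
import numpy as np

# Phi(w) = 1/2 w^T Q w + q^T w,  g(w) = w^T w - 1  (data from the slide)
Q = np.array([[2.0, 1.0], [1.0, 4.0]])
q = np.array([1.0, 0.0])

grad_phi = lambda w: Q @ w + q
g        = lambda w: np.array([w @ w - 1.0])
jac_g    = lambda w: (2.0 * w).reshape(-1, 1)           # grad g(w)
hess_lag = lambda w, lam: Q + 2.0 * lam[0] * np.eye(2)  # H(w, lam)

w, lam = np.array([1.0, 1.0]), np.zeros(1)              # guess lambda = 0
for k in range(20):
    grad_L = grad_phi(w) + jac_g(w) @ lam
    if max(np.linalg.norm(grad_L, np.inf), np.linalg.norm(g(w), np.inf)) < 1e-10:
        break
    w, lam = newton_kkt_step(w, lam, grad_phi, hess_lag, g, jac_g)  # full step

print(w, lam)  # a KKT point on the unit circle g(w) = 0
```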