
In case you missed it - Who am I?

Name: Sébastien Gros
Nationality: Swiss
Residence: Göteborg, Sweden
Affiliation: Chalmers University of Technology
Department: Signals & Systems
Position: Assistant Professor


Core idea

Goal: solve r(w) = 0... how?!?

Key idea: guess w, then iterate on the linear model

    r(w + Δw) ≈ r(w) + ∇r(w)⊤Δw = 0

Algorithm: Newton method
    Input: w, tol
    while ‖r(w)‖∞ ≥ tol do
        Compute r(w) and ∇r(w)
        Compute the Newton direction: ∇r(w)⊤Δw = −r(w)
        Newton step, t ∈ ]0, 1]: w ← w + tΔw
    return w

With t = 1 this is a full-step Newton iteration; reduced steps (t < 1) are often needed.
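As a concrete illustration, here is a minimal NumPy sketch of the algorithm above (my own illustration, not code from the lecture); the residual r, its Jacobian jac and the damping factor t are user-supplied placeholders.

```python
import numpy as np

def newton(r, jac, w, t=1.0, tol=1e-10, max_iter=50):
    """Damped Newton iteration for r(w) = 0.

    r   : callable returning the residual vector r(w)
    jac : callable returning the Jacobian dr/dw, i.e. ∇r(w)⊤
    w   : initial guess
    t   : step size in ]0, 1]  (t = 1 gives the full-step iteration)
    """
    for _ in range(max_iter):
        rw = r(w)
        if np.linalg.norm(rw, np.inf) < tol:
            break
        dw = np.linalg.solve(jac(w), -rw)  # solve ∇r(w)⊤ Δw = -r(w)
        w = w + t * dw                     # Newton step
    return w

# Example: solve w² - 2 = 0 from the guess w = 1
sol = newton(lambda w: np.array([w[0]**2 - 2.0]),
             lambda w: np.array([[2.0 * w[0]]]),
             np.array([1.0]))
print(sol)  # ≈ [1.41421356]
```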

Why reduced steps?

Newton step with t ∈ ]0, 1]:

    ∇r(w)⊤Δw = −r(w),    w ← w + tΔw

[Figure: successive Newton iterates on a scalar residual r(w). With t = 1 the iterates overshoot and oscillate; with t = 0.8 they settle on the root.]

The full-step Newton iteration can be unstable, while the reduced-step Newton iteration is stable...
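A quick way to reproduce this behaviour numerically (an illustrative choice of residual, not necessarily the function plotted in the lecture) is r(w) = arctan(w), for which full Newton steps overshoot whenever the guess is far from the root:

```python
import numpy as np

def damped_newton_arctan(t, w=1.5, n=8):
    """Damped Newton on r(w) = arctan(w), with r'(w) = 1/(1 + w²)."""
    for k in range(n):
        w = w - t * np.arctan(w) * (1.0 + w**2)  # w ← w - t r(w)/r'(w)
        print(f"t = {t}: w_{k+1} = {w: .6f}")
    return w

damped_newton_arctan(1.0)  # full steps: the iterates blow up
damped_newton_arctan(0.8)  # reduced steps: converge to the root w = 0
```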

Does Newton always work?

Is the Newton step Δw always providing a direction "improving" r(w)? I.e. is there always a t > 0 s.t. ‖r(w + tΔw)‖ < ‖r(w)‖? Yes... but.

Proof: ‖r(w + tΔw)‖ < ‖r(w)‖ holds for some t > 0 if

    d/dt ‖r(w + tΔw)‖² |_{t=0} < 0

with ‖r(w)‖² differentiable, i.e. if 2 r(w)⊤ d/dt r(w + tΔw)|_{t=0} < 0. We have

    d/dt r(w + tΔw)|_{t=0} = ∇r(w)⊤Δw = −∇r(w)⊤∇r(w)⁻⊤ r(w) = −r(w)

and then

    d/dt ‖r(w + tΔw)‖² |_{t=0} = −2‖r(w)‖² < 0.

How to select the step size t ∈ ]0, 1]? Globalization...
- Line search: reduce t until some criterion of progress on ‖r‖ is met (see the sketch below)
- Trust region: confine the step Δw within a region where ∇r(w) provides a good model of r(w)
- Filter techniques: monitor progress on specific components of r(w) separately

... which ensures that progress is made in one way or another. Note: most of these techniques are specific to optimization.
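Below is a minimal backtracking line-search sketch on the merit function ‖r(w)‖², assuming a simple halving rule and an Armijo-style sufficient-decrease test; the constants (10⁻⁴, the cutoff 10⁻⁸) are illustrative choices, not prescribed by the lecture.

```python
import numpy as np

def newton_linesearch(r, jac, w, tol=1e-10, max_iter=100):
    """Newton iteration globalized by backtracking on m(w) = ||r(w)||²."""
    for _ in range(max_iter):
        rw = r(w)
        if np.linalg.norm(rw, np.inf) < tol:
            break
        dw = np.linalg.solve(jac(w), -rw)  # exact Newton direction
        t, m0 = 1.0, rw @ rw
        while t > 1e-8:
            rt = r(w + t * dw)
            # accept t once ||r||² has decreased sufficiently
            if rt @ rt < (1.0 - 1e-4 * t) * m0:
                break
            t *= 0.5  # otherwise reduce the step
        w = w + t * dw
    return w
```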

But still, Newton can fail...

Solve r(w) = 0.

[Figure: Newton iterates on a scalar r(w) stalling where the curve flattens out before crossing zero.]

Newton stops with r(w) ≠ 0 and ∇r(w) singular, i.e. the Newton direction Δw given by

    ∇r(w)⊤Δw = −r(w)

is undefined... This is a common failure mode for Newton-based solvers when tackling very nonlinear r and starting with a poor initial guess!!
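The singular-Jacobian failure is easy to trigger: with r(w) = cos(w), starting from w = 0 gives r(0) = 1 ≠ 0 while r′(0) = 0, so the very first Newton system has no solution (a toy illustration of mine, not the lecture's example):

```python
import numpy as np

w = 0.0
try:
    # Newton direction: r'(w) Δw = -r(w), here 0 · Δw = -1
    dw = np.linalg.solve(np.array([[-np.sin(w)]]),
                         np.array([-np.cos(w)]))
except np.linalg.LinAlgError as err:
    print("Newton direction undefined:", err)  # Singular matrix
```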

Convergence of full-step Newton methods

Newton method:
    ∇r(w)⊤Δw = −r(w),    w ← w + Δw
yields the iteration, k = 0, 1, ...:
    w_{k+1} ← w_k − ∇r(w_k)⁻⊤ r(w_k)

Newton-type method (Jacobian approximation):
    M Δw = −r(w),    w ← w + Δw
yields the iteration, k = 0, 1, ...:
    w_{k+1} ← w_k − M_k⁻¹ r(w_k)

Theorem: assume
- Nonlinearity of r: ‖M_k⁻¹(∇r(w)⊤ − ∇r(w*)⊤)‖ ≤ ω‖w − w*‖, for w ∈ [w_k, w*]
- Jacobian approximation error: ‖M_k⁻¹(∇r(w_k)⊤ − M_k)‖ ≤ κ_k < 1
- Good initial guess: ‖w₀ − w*‖ ≤ (2/ω)(1 − max{κ_k})

Then w_k → w* with the following linear-quadratic contraction in each iteration:

    ‖w_{k+1} − w*‖ ≤ (κ_k + (ω/2)‖w_k − w*‖) ‖w_k − w*‖

What about reduced steps? Slow convergence when t < 1 (damped phase); when full steps become feasible, fast convergence to the solution.
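The two contraction regimes are easy to observe numerically. The sketch below (an illustrative setup of my own: r(w) = w + w³ with root w* = 0, and M frozen at the initial Jacobian) contrasts the quadratic contraction of exact Newton with the linear contraction of the Newton-type iteration:

```python
import numpy as np

r = lambda w: w + w**3           # root at w* = 0
dr = lambda w: 1.0 + 3.0 * w**2  # exact Jacobian

w_exact, w_newton_type = 0.5, 0.5
M = dr(0.5)  # Jacobian approximation frozen at the initial guess
for k in range(6):
    w_exact -= r(w_exact) / dr(w_exact)    # exact Newton
    w_newton_type -= r(w_newton_type) / M  # Newton-type
    print(f"k={k+1}:  exact |w-w*| = {abs(w_exact):.2e}   "
          f"frozen-M |w-w*| = {abs(w_newton_type):.2e}")
# exact Newton: the error roughly squares each iteration (quadratic);
# frozen M: the error shrinks by a roughly constant factor (linear).
```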

Newton methods - Short Survival Guide

Exact Newton method:
    ∇r(w)⊤Δw = −r(w),    w ← w + tΔw
Newton-type method:
    M Δw = −r(w),    w ← w + tΔw

- The exact Newton direction Δw improves r for a sufficiently small step size t ∈ ]0, 1]
- The inexact Newton direction Δw improves r for a sufficiently small step size t ∈ ]0, 1] if M > 0
- Exact full (t = 1) Newton steps converge quadratically if close enough to the solution
- Inexact full (t = 1) Newton steps converge linearly if close enough to the solution and if the Jacobian approximation is "sufficiently good"
- The Newton iteration fails if ∇r becomes singular
- Newton methods with globalization converge in two phases: a damped (slow) phase when reduced steps (t < 1) are needed, then quadratic/linear convergence when full steps are possible

Outline
1. KKT conditions - Quick Reminder
2. The Newton method
3. Newton on the KKT conditions
4. Sequential Quadratic Programming
5. Hessian approximation
6. Maratos effect

Core idea

A vast majority of solvers try to find a KKT point w, µ, λ, i.e.:

    Primal feasibility:          g(w) = 0,   h(w) ≤ 0
    Dual feasibility:            ∇_w L(w, µ, λ) = 0,   µ ≥ 0
    Complementarity slackness:   µᵢ hᵢ(w) = 0,   i = 1, ...

where L = Φ(w) + λ⊤g(w) + µ⊤h(w).

Let's consider for now equality-constrained problems, i.e. find w, λ s.t.:

    ∇_w L(w, λ) = 0
    g(w) = 0

Idea: apply the Newton method on the KKT conditions, i.e. solve

    r(w, λ) = [∇_w L(w, λ); g(w)] = 0

... by iterating

    ∇r(w, λ)⊤ [Δw; Δλ] = −r(w, λ)
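As a small sketch of this reformulation (using the lecture's convention that ∇g(w) collects the constraint gradients column-wise; the helper names are mine), the KKT residual of an equality-constrained problem can be assembled as:

```python
import numpy as np

def kkt_residual(w, lam, grad_phi, g, jac_g):
    """r(w, λ) = [∇_w L(w, λ); g(w)] with L = Φ(w) + λ⊤ g(w).

    grad_phi(w) : gradient ∇Φ(w), shape (n,)
    g(w)        : constraint values, shape (m,)
    jac_g(w)    : Jacobian dg/dw, shape (m, n), so ∇g(w) = jac_g(w).T
    """
    grad_L = grad_phi(w) + jac_g(w).T @ lam  # ∇_w L = ∇Φ(w) + ∇g(w) λ
    return np.concatenate([grad_L, g(w)])
```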

Newton method on the KKT conditions

KKT conditions:
    r(w, λ) = [∇_w L(w, λ); g(w)] = 0
Newton direction:
    ∇r(w, λ)⊤ [Δw; Δλ] = −r(w, λ)

given by:

    ∇²_w L(w, λ)Δw + ∇g(w)Δλ = −∇_w L(w, λ)
    ∇g(w)⊤Δw = −g(w)

Using ∇_w L(w, λ) = ∇Φ(w) + ∇g(w)λ, the first equation can be rewritten as

    ∇²_w L(w, λ)Δw + ∇g(w)(λ + Δλ) = −∇Φ(w)

The Newton direction on the KKT conditions:

    [ H(w, λ)   ∇g(w) ] [ Δw ]     [ ∇Φ(w) ]
    [ ∇g(w)⊤      0   ] [ λ⁺ ] = − [ g(w)  ]

where H(w, λ) = ∇²_w L(w, λ) is the Hessian of the problem, and the matrix on the left is the KKT matrix (symmetric indefinite). Note:
- The update of the dual variable is λ⁺ = λ + Δλ
- ∇_w L(w, λ) is not needed for computing the Newton step
- The updated dual variables λ⁺ are readily provided!
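A minimal sketch of one such step, assembling and solving the symmetric-indefinite KKT system with a dense factorization (function and argument names are mine; a sparse solver would be used in practice):

```python
import numpy as np

def kkt_step(H, Jg, grad_phi, g_val):
    """Solve [H  ∇g; ∇g⊤  0] [Δw; λ⁺] = -[∇Φ; g].

    H        : Hessian ∇²_w L(w, λ), shape (n, n)
    Jg       : Jacobian dg/dw, shape (m, n), so ∇g = Jg.T
    grad_phi : ∇Φ(w), shape (n,)
    g_val    : g(w), shape (m,)
    Returns the primal step Δw and the updated multipliers λ⁺.
    """
    n, m = H.shape[0], Jg.shape[0]
    K = np.block([[H, Jg.T],
                  [Jg, np.zeros((m, m))]])  # KKT matrix
    sol = np.linalg.solve(K, -np.concatenate([grad_phi, g_val]))
    return sol[:n], sol[n:]
```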

Newton Iteration for Optimization - Example

    min_w  ½ w⊤ [2 1; 1 4] w + [1 0] w
    s.t.   g(w) = w⊤w − 1 = 0

Iterate:

    [ H   ∇g ] [ Δw ]     [ ∇Φ ]
    [ ∇g⊤  0 ] [ λ⁺ ] = − [ g  ]

with:

    ∇g(w) = 2w = [2w₁; 2w₂]
    L(w, λ) = Φ(w) + λ g(w)
    ∇_w L(w, λ) = [2 1; 1 4] w + [1; 0] + 2λw
    H(w, λ) = [2+2λ  1; 1  4+2λ]
    ∇Φ(w) = [2w₁ + w₂ + 1; w₁ + 4w₂]

[Figure: Newton iterates in the (w₁, w₂) plane, converging to a point on the unit circle g(w) = 0.]

Algorithm: Newton method
    Input: guess w, λ (e.g. λ = 0), step t = 1
    while ‖∇L‖ or ‖g‖ ≥ tol do
        Compute H(w, λ), ∇g(w), ∇Φ(w), g(w)
        Compute the Newton direction:
            [H ∇g; ∇g⊤ 0] [Δw; λ⁺] = −[∇Φ; g],    Δλ = λ⁺ − λ
        Newton step, t ∈ ]0, 1]: w ← w + tΔw, λ ← λ + tΔλ
    return w, λ
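Putting the pieces together, here is a complete sketch of the example (the initial guess w = (-1, 0.5), λ = 0 and the full step t = 1 are illustrative choices of mine; with this guess the iteration converges to the constrained minimizer on the unit circle):

```python
import numpy as np

Q = np.array([[2.0, 1.0], [1.0, 4.0]])  # objective: 1/2 w⊤Qw + q⊤w
q = np.array([1.0, 0.0])

w, lam, t, tol = np.array([-1.0, 0.5]), 0.0, 1.0, 1e-10
for k in range(50):
    grad_phi = Q @ w + q              # ∇Φ(w)
    g = w @ w - 1.0                   # g(w)
    grad_g = 2.0 * w                  # ∇g(w)
    grad_L = grad_phi + lam * grad_g  # ∇_w L, for the stopping test only
    if max(np.linalg.norm(grad_L, np.inf), abs(g)) < tol:
        break
    H = Q + 2.0 * lam * np.eye(2)     # H(w, λ) = ∇²_w L(w, λ)
    K = np.block([[H, grad_g[:, None]],
                  [grad_g[None, :], np.zeros((1, 1))]])
    sol = np.linalg.solve(K, -np.concatenate([grad_phi, [g]]))
    dw, lam_plus = sol[:2], sol[2]
    w, lam = w + t * dw, lam + t * (lam_plus - lam)

print("w* =", w, " λ* =", lam, " iterations:", k)
# converges to w* ≈ (-0.96, 0.29) with λ* ≈ -0.33
```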
