Convex Optimization (EE227A: UC Berkeley)
Lecture 25: Newton and quasi-Newton methods
23 Apr 2013, Suvrit Sra
Admin
♠ Project poster presentations: 306 Soda Hall (HP Auditorium), Fri May 10, 2013, 4pm to 8pm
♠ HW5 released today; due May 02, 2013
Newton method

◮ Recall from numerical analysis: Newton's method for solving an equation g(x) = 0, x ∈ R.
◮ Key idea: linear approximation.
◮ Suppose we are at some x close to x* (the root):
      g(x + ∆x) = g(x) + g'(x)∆x + o(|∆x|).
◮ The equation g(x + ∆x) = 0 is approximated by
      g(x) + g'(x)∆x = 0  ⇒  ∆x = −g(x)/g'(x).
◮ If x is close to x*, we can expect ∆x ≈ ∆x* = x* − x.
◮ Thus, we may write x* ≈ x − g(x)/g'(x),
◮ which suggests the iterative process
      x_{k+1} ← x_k − g(x_k)/g'(x_k).
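The scalar iteration above can be sketched in a few lines of Python; the example equation cos x = x, the tolerance, and the starting point are illustrative choices, not part of the slide:

```python
import math

def newton_1d(g, g_prime, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration: x_{k+1} = x_k - g(x_k)/g'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / g_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: solve cos(x) - x = 0 (the classic fixed point of cosine).
root = newton_1d(lambda x: math.cos(x) - x,
                 lambda x: -math.sin(x) - 1.0,
                 x0=1.0)
```

Only a handful of iterations are needed from x0 = 1, which previews the fast local convergence discussed later.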
Newton method

◮ Suppose we have a system of nonlinear equations G(x) = 0, where G : R^n → R^n.
◮ Arguing as above, we arrive at the Newton system
      G(x) + G'(x)∆x = 0,
  where G'(x) is the Jacobian.
◮ Assuming G'(x) is non-degenerate (invertible), we obtain
      x_{k+1} = x_k − [G'(x_k)]^{−1} G(x_k).
◮ This is Newton's method for solving nonlinear equations.
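For a concrete sketch of the multivariate case, here is Newton's method on a hand-picked 2-variable system (intersecting the unit circle with the line x = y); the Newton system J ∆x = −G is solved by Cramer's rule to keep the example dependency-free:

```python
def newton_system_2d(G, J, x0, tol=1e-12, max_iter=50):
    """Newton's method for a 2-variable system G(x) = 0.
    Each step solves the Newton system J(x) dx = -G(x) (Cramer's rule)."""
    x, y = x0
    for _ in range(max_iter):
        g1, g2 = G(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c          # assumes the Jacobian is invertible
        dx = (-g1 * d + g2 * b) / det
        dy = (-a * g2 + c * g1) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            break
    return x, y

# Example system: x^2 + y^2 - 1 = 0 and x - y = 0.
G = lambda x, y: (x * x + y * y - 1.0, x - y)
J = lambda x, y: ((2 * x, 2 * y), (1.0, -1.0))
sol = newton_system_2d(G, J, (1.0, 0.5))
```

In practice one would solve the Newton system with a linear-algebra routine rather than an explicit 2×2 formula; the structure of the iteration is the same.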
Newton method

      min f(x) such that x ∈ R^n

◮ ∇f(x) = 0 is necessary for optimality.
◮ Newton system: ∇f(x) + ∇²f(x)∆x = 0, which leads to
      x_{k+1} = x_k − [∇²f(x_k)]^{−1} ∇f(x_k),
  the Newton method for optimization.
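In one dimension the optimization iteration reads x_{k+1} = x_k − f'(x_k)/f''(x_k); a minimal sketch, with an illustrative objective f(x) = e^x − 2x whose minimizer is x* = ln 2:

```python
import math

def newton_minimize_1d(fp, fpp, x0, tol=1e-12, max_iter=50):
    """1-D Newton's method for optimization: drive f'(x) to zero
    via x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = fp(x) / fpp(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: minimize f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2, f''(x) = exp(x).
xmin = newton_minimize_1d(lambda x: math.exp(x) - 2.0, math.exp, x0=0.0)
```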
Newton method – remarks

◮ Newton's method for equations is more general than minimizing f(x) by finding roots of ∇f(x) = 0.
◮ Reason: not every function G : R^n → R^n is a derivative!
  Example: Consider the linear system Ax − b = 0. Unless A is symmetric, G(x) = Ax − b does not correspond to a derivative. (Why?)
◮ If it were a derivative, then its own derivative (the matrix A) would be a Hessian, and we know that Hessians must be symmetric. QED.
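One way to see this numerically (my illustration, not from the slide): a gradient field has zero line integral around any closed loop, but G(x, y) = (y, 0), i.e. G(x) = Ax with the asymmetric A = [[0, 1], [0, 0]], does not:

```python
def loop_integral(F, n=1000):
    """Line integral of a 2-D vector field F around the unit square,
    counterclockwise, via the midpoint rule (exact for linear fields).
    A gradient field integrates to zero around any closed loop."""
    total, h = 0.0, 1.0 / n
    for i in range(n):
        t = (i + 0.5) * h
        fx, fy = F(t, 0.0);        total += fx * h   # bottom edge
        fx, fy = F(1.0, t);        total += fy * h   # right edge
        fx, fy = F(1.0 - t, 1.0);  total -= fx * h   # top edge
        fx, fy = F(0.0, 1.0 - t);  total -= fy * h   # left edge
    return total

# Symmetric A = [[2, 1], [1, 3]]: Ax is the gradient of a quadratic.
sym = loop_integral(lambda x, y: (2 * x + y, x + 3 * y))
# Asymmetric A = [[0, 1], [0, 0]]: (y, 0) is not a gradient field.
asym = loop_integral(lambda x, y: (y, 0.0))
```

The symmetric case integrates to (numerically) zero; the asymmetric case gives −1, so no f with ∇f = G can exist.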
Newton method – remarks

◮ In general, the Newton method is highly nontrivial to analyze.
  Example: Consider the iteration
      x_{k+1} = x_k − 1/x_k,   x_0 = 2.
  This may be viewed as the Newton iteration for e^{x²/2} = 0 (which has no real solution).
  It is unknown whether this iteration generates a bounded sequence!
◮ Newton fractals (complex dynamics): z³ − 2z + 2, x⁸ + 15x⁴ − 16.
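A quick numerical look at the first few iterates (the iteration and x_0 = 2 are from the slide); the erratic sign changes and jumps in magnitude hint at why boundedness is a hard question:

```python
# Iterate x_{k+1} = x_k - 1/x_k from x_0 = 2 and record the trajectory.
x = 2.0
iterates = [x]
for _ in range(5):
    x = x - 1.0 / x
    iterates.append(x)
# Trajectory: 2, 1.5, 0.8333..., -0.3666..., 2.3606..., 1.9369...
```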
Newton method – alternative view

Quadratic approximation:
      φ(x) := f(x_k) + ⟨∇f(x_k), x − x_k⟩ + ½⟨∇²f(x_k)(x − x_k), x − x_k⟩.
Assuming ∇²f(x_k) ≻ 0, choose x_{k+1} as the argmin of φ(x):
      φ'(x_{k+1}) = ∇f(x_k) + ∇²f(x_k)(x_{k+1} − x_k) = 0.
Newton method – convergence

◮ The method breaks down if ∇²f(x_k) is not positive definite.
◮ It is only locally convergent.
  Example: Find the root of
      g(x) = x/√(1 + x²).
  Clearly, x* = 0.
  Exercise: Analyze the behavior of Newton's method for this problem.
  Hint: Consider the cases |x_0| < 1, x_0 = ±1, and |x_0| > 1.

Damped Newton method
      x_{k+1} = x_k − α_k [∇²f(x_k)]^{−1} ∇f(x_k)
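Running the iteration numerically illustrates the three regimes named in the hint (the function and starting points follow the slide; the specific values 0.5, 1.0, 1.5 are my choices):

```python
import math

def g(x):
    return x / math.sqrt(1.0 + x * x)

def gprime(x):
    return (1.0 + x * x) ** -1.5

def newton_iterates(x0, k=6):
    """Run k steps of Newton's method on g and return the trajectory."""
    xs = [x0]
    for _ in range(k):
        x = xs[-1]
        xs.append(x - g(x) / gprime(x))
    return xs

inside  = newton_iterates(0.5)   # |x0| < 1: rapid convergence to the root 0
border  = newton_iterates(1.0)   # x0 = 1: oscillates, never converges
outside = newton_iterates(1.5)   # |x0| > 1: diverges violently
```

So even though g is smooth with a single root, Newton's method converges only from starting points sufficiently close to it, which motivates the damped variant above.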
Newton – local convergence rate

◮ Suppose the method generates a sequence {x_k} → x*,
◮ where x* is a local min, i.e., ∇f(x*) = 0 and ∇²f(x*) ≻ 0.
◮ Let g(x_k) ≡ ∇f(x_k); by Taylor's theorem:
      0 = g(x*) = g(x_k) + ⟨∇g(x_k), x* − x_k⟩ + o(‖x_k − x*‖).
◮ Multiply by [∇g(x_k)]^{−1} to obtain
      x_k − x* − [∇g(x_k)]^{−1} g(x_k) = o(‖x_k − x*‖).
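The argument above gives superlinear convergence; under a Lipschitz Hessian the local rate is in fact quadratic, meaning the number of correct digits roughly doubles per step. A tiny experiment (Newton on the illustrative equation x² − 2 = 0, tracking the error against the known root √2) makes this visible:

```python
def newton_errors(g, gp, x0, root, k=4):
    """Track |x_k - x*| along a scalar Newton iteration."""
    x, errs = x0, []
    for _ in range(k):
        x = x - g(x) / gp(x)
        errs.append(abs(x - root))
    return errs

# Newton on g(x) = x^2 - 2, starting at x0 = 1; root is sqrt(2).
errs = newton_errors(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 2 ** 0.5)
# Errors shrink roughly as e_{k+1} ~ C e_k^2.
```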