Computational Optimization: Newton's Method (2/5/08)
Newton's Method
A method for finding a zero of a function. Recall the FONC: $\nabla f(x) = 0$.
For the quadratic case:
$$g(x) = \tfrac{1}{2}x'Qx + b'x, \qquad \nabla g(x) = Qx + b = 0$$
The minimum must satisfy $Qx^* = -b \;\Rightarrow\; x^* = -Q^{-1}b$ (unique if $Q$ is invertible).
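A quick MATLAB check of the quadratic case; the particular $Q$ and $b$ below are arbitrary illustrative data (with $Q$ p.d.), not from the slides:

```matlab
% Minimize g(x) = 0.5*x'*Q*x + b'*x for a small p.d. Q (example data).
Q = [4 2 1; 2 5 3; 1 3 7];   % symmetric positive definite
b = [1; -2; 3];
xstar = -Q\b;                 % solves Q*xstar = -b
grad  = Q*xstar + b;          % FONC residual: should be (numerically) zero
disp(norm(grad))
```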
General nonlinear functions
For non-quadratic $f$ (twice continuously differentiable): approximate $f$ by its 2nd-order Taylor series, then solve the FONC for the quadratic approximation.
$$f(y) \approx f(x) + (y-x)'\nabla f(x) + \tfrac{1}{2}(y-x)'\nabla^2 f(x)(y-x)$$
Calculate the FONC:
$$\nabla f(y) \approx \nabla f(x) + \nabla^2 f(x)(y-x) = 0$$
Solve for $y$:
$$\nabla^2 f(x)(y-x) = -\nabla f(x) \;\Rightarrow\; y = x \underbrace{- \left[\nabla^2 f(x)\right]^{-1}\nabla f(x)}_{\text{pure Newton direction}}$$
Basic Newton's Algorithm
Start with $x_0$.
For $k = 0, 1, \ldots, K$:
- If $x_k$ is optimal, then stop.
- Solve $\nabla^2 f(x_k)\, p_k = -\nabla f(x_k)$.
- Set $x_{k+1} = x_k + p_k$.
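A minimal MATLAB sketch of this pure Newton iteration; `gradf` and `hessf` are hypothetical user-supplied function handles for $\nabla f$ and $\nabla^2 f$:

```matlab
function x = pure_newton(gradf, hessf, x0, K, tol)
% Pure Newton's method: solve hessf(x)*p = -gradf(x), then step x = x + p.
x = x0;
for k = 1:K
    g = gradf(x);
    if norm(g) < tol          % stop when the FONC is (nearly) satisfied
        return
    end
    p = -(hessf(x)\g);        % Newton direction: solve, don't invert
    x = x + p;
end
end
```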
Theorem 3.5 (NW): Convergence of Pure Newton's Method
Let $f$ be twice continuously differentiable with a Lipschitz continuous Hessian in a neighborhood of a solution $x^*$ that satisfies the SOSC. Then for $x_0$ sufficiently close to $x^*$: Newton's method converges to $x^*$, the rate of convergence of $\{x_k\}$ is quadratic, and $\|\nabla f(x_k)\|$ converges quadratically to 0.
Analysis of Newton's Method
Newton's method converges to a zero of a function very fast (quadratic convergence).
It is expensive in both storage and time:
- Must compute and store the Hessian.
- Must solve the Newton equation.
It may not find a local minimum; it could find any stationary point.
Method 1: $d = -H^{-1}g$
Requires a matrix-vector multiplication with the $n \times n$ matrix $H^{-1}$: $n$ rows $\times$ ($n$ multiplies $+ \,(n-1)$ adds) $= 2n^2 - n$ FLOPS, say $O(n^2)$.
Also requires computing $H^{-1}$: $O(n^3)$.
MATLAB command: d = -inv(H)*g
Computing the Inverse by Gaussian Elimination
Augment the identity with $H$ and row-reduce:
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 4 & 2 & 1 \\ 0 & 1 & 0 & 2 & 5 & 3 \\ 0 & 0 & 1 & 1 & 3 & 7 \end{array}\right]$$
Multiply row 1 by $1/4$: $n+1$ operations. Multiply row 1 by $-2$ and add to row 2 (and similarly for each of the other $n-1$ rows): $(n-1)(n+1)$ operations. Total for the first column: $(n+1) + (n-1)(n+1) = n^2 + n$.
$$\left[\begin{array}{ccc|ccc} \tfrac{1}{4} & 0 & 0 & 1 & \tfrac{1}{2} & \tfrac{1}{4} \\ -\tfrac{1}{2} & 1 & 0 & 0 & 4 & \tfrac{5}{2} \\ -\tfrac{1}{4} & 0 & 1 & 0 & \tfrac{5}{2} & \tfrac{27}{4} \end{array}\right]$$
Repeat roughly $n$ times: $O(n^3)$.
Method 2: Solve $Hp = -g$
Solve by Gaussian elimination: takes about $2n^3/3$ FLOPS. Faster, but still $O(n^3)$.
MATLAB command: p = -H\g
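For example, the two commands below compute the same direction, but the backslash form avoids forming $H^{-1}$ explicitly; the test matrix and timings are purely illustrative, not from the slides:

```matlab
n = 2000;
A = randn(n); H = A'*A + n*eye(n);   % a random p.d. test matrix
g = randn(n,1);
tic; d1 = -inv(H)*g; t_inv = toc;    % Method 1: explicit inverse
tic; d2 = -H\g;      t_bsl = toc;    % Method 2: Gaussian elimination
fprintf('inv: %.3fs  backslash: %.3fs  diff: %g\n', t_inv, t_bsl, norm(d1 - d2));
```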
Method 3: Cholesky Factorization, $O(n^3)$
Factorize the matrix: $H = LDL' = LU$, where $L$ is lower triangular (unit diagonal) and $U = DL'$ is upper triangular.
Why? Solve $Hx = b$ by solving $LUx = b$: first solve $Ly = b$ by forward substitution, then $Ux = y$ by back substitution.
The MATLAB command to compute a Cholesky factorization is R = chol(H), but it only works if $H$ is p.d. It gives the factorization R'*R = H.
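A sketch of using chol to solve the Newton equation $Hp = -g$ (assuming $H$ is symmetric positive definite; H and g are as above):

```matlab
% Solve H*p = -g via Cholesky: two triangular solves.
R = chol(H);        % H = R'*R, with R upper triangular
y = R'\(-g);        % forward substitution:  R'*y = -g
p = R\y;            % back substitution:     R*p  = y
```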
Forward Substitution for $Ly = b$
$$\begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$
$$l_{11}y_1 = b_1 \;\Rightarrow\; y_1 = \frac{b_1}{l_{11}}$$
$$l_{21}y_1 + l_{22}y_2 = b_2 \;\Rightarrow\; y_2 = \frac{b_2 - l_{21}y_1}{l_{22}}$$
$$\vdots$$
$$l_{n1}y_1 + l_{n2}y_2 + \cdots + l_{nn}y_n = b_n \;\Rightarrow\; y_n = \frac{b_n - \sum_{m=1}^{n-1} l_{nm}y_m}{l_{nn}}$$
$O(n^2)$ operations.
Back Substitution for $Ux = y$
$$\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & u_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
$$u_{nn}x_n = y_n \;\Rightarrow\; x_n = \frac{y_n}{u_{nn}}$$
$$u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n = y_{n-1} \;\Rightarrow\; x_{n-1} = \frac{y_{n-1} - u_{n-1,n}x_n}{u_{n-1,n-1}}$$
$$\vdots$$
$$u_{ii}x_i + \cdots + u_{in}x_n = y_i \;\Rightarrow\; x_i = \frac{y_i - \sum_{m=i+1}^{n} u_{im}x_m}{u_{ii}}$$
$O(n^2)$ operations.
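The two substitution loops above translate directly into MATLAB; a sketch, assuming L is lower triangular and U is upper triangular, both nonsingular:

```matlab
function x = tri_solve(L, U, b)
% Solve L*U*x = b with one forward and one backward substitution, O(n^2) each.
n = length(b);
y = zeros(n,1);
for i = 1:n                              % forward substitution: L*y = b
    y(i) = (b(i) - L(i,1:i-1)*y(1:i-1)) / L(i,i);
end
x = zeros(n,1);
for i = n:-1:1                           % back substitution: U*x = y
    x(i) = (y(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);
end
end
```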
Problems with Newton's Method
Newton's method may converge to a local maximum or any other stationary point.
We would like to make sure we decrease the function at each iteration.
The Newton equation may not have a solution, or may not have a unique solution.
Guarantee Descent
We would like to guarantee that a descent direction is picked: $p_k'\nabla f(x_k) < 0$.
Take $p_k = -H_k \nabla f(x_k)$.
Any p.d. $H_k$ works, since $p_k'\nabla f(x_k) = -\nabla f(x_k)' H_k \nabla f(x_k) < 0$ (whenever $\nabla f(x_k) \neq 0$).
What if the Hessian is not p.d.?
Add a diagonal matrix $\Delta_k$ to $\nabla^2 f(x_k)$ so that $\nabla^2 f(x_k) + \Delta_k$ is p.d.
A modified Cholesky factorization does this automatically.
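A simple alternative to a full modified Cholesky factorization is to add a multiple of the identity and retry until chol succeeds; the sketch below uses that strategy, with the helper name and constants chosen for illustration:

```matlab
function [R, tau] = psd_shift(H)
% Add tau*I to H until chol succeeds, so that H + tau*I is positive definite.
beta = 1e-3;
if min(diag(H)) > 0
    tau = 0;
else
    tau = beta - min(diag(H));
end
while true
    [R, flag] = chol(H + tau*eye(size(H,1)));
    if flag == 0, return; end            % factorization succeeded
    tau = max(2*tau, beta);              % otherwise increase the shift
end
end
```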
Theorem: Superlinear Convergence of Newton-Like Methods
Assume $f$ is twice continuously differentiable on an open set $S$, with $\nabla^2 f(x)$ p.d. and Lipschitz continuous on $S$, i.e.
$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L\|x - y\| \quad \text{for all } x, y \in S$$
and some fixed finite $L$. Let $\{x_k\} \subset S$ be the sequence generated by $x_{k+1} = x_k + p_k$, with $\lim_{k\to\infty} x_k = x^* \in S$.
Theorem 10.1 (continued)
Then $x_k \to x^*$ superlinearly and $\nabla f(x^*) = 0$ if and only if
$$\lim_{k\to\infty} \frac{\|p_k - p_k^N\|}{\|p_k\|} = 0,$$
where $p_k^N$ is the Newton direction at $x_k$ and $x_{k+1} = x_k + p_k$.
Alternative results
In NW, the superlinear convergence result is stated as: convergence is superlinear if and only if
$$\lim_{k\to\infty} \frac{\|(B_k - \nabla^2 f(x^*))\,p_k\|}{\|p_k\|} = 0.$$
So we can use a quasi-Newton algorithm with a positive definite matrix $B_k$ approximating the Hessian.
Problems with Newton's Method
Newton's method may converge to a local maximum or other stationary point.
Pure Newton's method may not converge at all: it only has local convergence, i.e. it only works if started sufficiently close to the solution.
Stepsize problems too
Newton's method has only local convergence: if started too far from the solution, it may not converge at all. Try this example starting from $x_0 = 1.1$:
$$f(x) = \ln(e^x + e^{-x})$$
$$f'(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
$$f''(x) = 1 - \frac{(e^x - e^{-x})^2}{(e^x + e^{-x})^2} > 0$$
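A short MATLAB check of this example; note that $f'(x) = \tanh(x)$ and $f''(x) = \operatorname{sech}(x)^2$, so the pure Newton step from $x_0 = 1.1$ overshoots and the iterates grow in magnitude:

```matlab
x = 1.1;                       % starting point from the slide
for k = 1:6
    p = -tanh(x)/sech(x)^2;    % pure Newton step for f(x) = log(e^x + e^-x)
    x = x + p;
    fprintf('k = %d, x = %g\n', k, x);
end
% |x| grows at each iteration: pure Newton diverges from this starting point.
```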
Need an adaptive stepsize
Add a stepsize to each iteration: $x_{k+1} = x_k + \alpha_k p_k$.
We could use an exact linesearch algorithm like golden section search to solve
$$\alpha_k = \arg\min_{\alpha} f(x_k + \alpha p_k),$$
but that is unnecessarily expensive. Instead we can do an approximate linesearch like the Armijo search (next lecture), sketched below.
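A minimal backtracking (Armijo) linesearch sketch, anticipating the next lecture; the function name and the constants c and rho are illustrative choices:

```matlab
function alpha = armijo(f, x, p, g, c, rho)
% Backtrack until f(x + alpha*p) <= f(x) + c*alpha*g'*p (sufficient decrease).
% Assumes p is a descent direction, i.e. g'*p < 0.
alpha = 1;                               % always try the full Newton step first
while f(x + alpha*p) > f(x) + c*alpha*(g'*p)
    alpha = rho*alpha;                   % shrink the step, e.g. rho = 0.5
end
end
```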
Final Newton's Algorithm
Start with $x_0$.
For $k = 0, 1, \ldots, K$:
- If $x_k$ is optimal, then stop.
- Factor $\nabla^2 f(x_k) + E_k = LDL'$ using a modified Cholesky factorization, and solve $LDL'\,p_k = -\nabla f(x_k)$.
- Perform a linesearch to determine $\alpha_k$ and set $x_{k+1} = x_k + \alpha_k p_k$.
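Putting the pieces together, a sketch of the safeguarded algorithm; it uses the hypothetical psd_shift and armijo helpers sketched above (a diagonal shift standing in for the full modified Cholesky), with f, gradf, hessf as user-supplied handles:

```matlab
function x = newton_globalized(f, gradf, hessf, x0, K, tol)
x = x0;
for k = 1:K
    g = gradf(x);
    if norm(g) < tol, return; end        % FONC (approximately) satisfied
    R = psd_shift(hessf(x));             % factor H + tau*I = R'*R, p.d.
    p = -(R\(R'\g));                     % descent direction from two triangular solves
    alpha = armijo(f, x, p, g, 1e-4, 0.5);
    x = x + alpha*p;                     % linesearch-damped Newton step
end
end
```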
Newton's Method Summary
Pure Newton:
- Very fast convergence (if it converges).
- Each iteration is expensive.
- Must add a linesearch and a modified Cholesky factorization to guarantee convergence globally.
- Requires calculation/storage of the Hessian.
Do Lab 3