MATH 4211/6211 – Optimization
Newton's Method
Xiaojing Ye
Department of Mathematics & Statistics, Georgia State University
Newton's method

• Improve the gradient method by using second-order (Hessian) information.
• Approximate $f$ at $x^{(k)}$ locally by a quadratic function, and use the minimizer of that quadratic as $x^{(k+1)}$.
• Newton's method reduces to the iteration
$$x^{(k+1)} = x^{(k)} - (H^{(k)})^{-1} g^{(k)},$$
where $g^{(k)} = \nabla f(x^{(k)})$ and $H^{(k)} = \nabla^2 f(x^{(k)})$.
Newton's (or Newton–Raphson) method executes the following two steps in each iteration:

• Step 1: Solve $H^{(k)} d^{(k)} = -g^{(k)}$ for $d^{(k)}$;
• Step 2: Update $x^{(k+1)} = x^{(k)} + d^{(k)}$.

Therefore the key computation is solving a linear system in Step 1 at every iteration.
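A minimal sketch of this two-step iteration in Python/NumPy is given below; the function name `newton`, the stopping tolerance, and the iteration cap are our own illustrative choices, not part of the slides. Note that the linear system is solved directly rather than forming $(H^{(k)})^{-1}$ explicitly.

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-8, max_iter=50):
    """Basic Newton iteration: solve H^(k) d^(k) = -g^(k), then x^(k+1) = x^(k) + d^(k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # stop when the gradient is (numerically) zero
            break
        H = hess(x)
        d = np.linalg.solve(H, -g)        # Step 1: Newton direction from the linear system
        x = x + d                         # Step 2: full Newton update (no step size)
    return x
```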
• Pros:
– very fast convergence near the solution $x^*$ (more on this later).
• Cons:
– not necessarily a descent method;
– the Hessian may not be invertible;
– may diverge if the initial guess is bad.

We will see how fast Newton's method is, and how to remedy these issues.
Let us first see what happens when Newton's method is applied to minimize a quadratic function with $Q \succ 0$:
$$f(x) = \frac{1}{2} x^\top Q x - b^\top x.$$
We know that
$$\nabla f(x) = Qx - b \quad \text{and} \quad \nabla^2 f(x) = Q.$$
In addition, the unique minimizer is $x^* = Q^{-1} b$. Therefore, given any initial $x^{(0)}$, we have
$$x^{(1)} = x^{(0)} - (H^{(0)})^{-1} g^{(0)} = x^{(0)} - Q^{-1}(Q x^{(0)} - b) = Q^{-1} b = x^*,$$
which means Newton's method converges in one iteration.
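A quick numerical check of this one-step convergence; the specific $Q$, $b$, and starting point below are arbitrary illustrative choices.

```python
import numpy as np

# A small positive definite quadratic f(x) = 0.5 x^T Q x - b^T x
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x0 = np.array([10.0, -7.0])              # arbitrary starting point
g0 = Q @ x0 - b                          # gradient at x^(0)
x1 = x0 - np.linalg.solve(Q, g0)         # one Newton step

x_star = np.linalg.solve(Q, b)           # exact minimizer Q^{-1} b
print(np.allclose(x1, x_star))           # True: converged in one iteration
```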
Convergence of Newton's method in the general case.

Theorem. Suppose $f \in C^3(\mathbb{R}^n; \mathbb{R})$, and there exists $x^* \in \mathbb{R}^n$ such that $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is invertible. Then for all $x^{(0)}$ sufficiently close to $x^*$, Newton's method is well defined for all $k$, and $x^{(k)} \to x^*$ with order at least 2.

Proof. Since $f \in C^3$ and $\nabla^2 f(x^*)$ is invertible, there exist $r, c_1, c_2 > 0$ such that for all $x \in B(x^*; r)$:

• $\|\nabla f(x^*) - \nabla f(x) - \nabla^2 f(x)(x^* - x)\| \le c_1 \|x^* - x\|^2$;
• $\nabla^2 f(x)$ is invertible;
• $\|(\nabla^2 f(x))^{-1}\| \le c_2$.
Proof (cont). Let $\varepsilon = \min\big(r, \frac{1^-}{c_1 c_2}\big)$ (here $1^-$ means any number slightly smaller than $1$). If $x^{(k)} \in B(x^*; \varepsilon)$, then
$$\|x^{(k+1)} - x^*\| = \|x^{(k)} - (H^{(k)})^{-1} g^{(k)} - x^*\| = \big\|(H^{(k)})^{-1}\big(H^{(k)}(x^{(k)} - x^*) - g^{(k)}\big)\big\|$$
$$\le \|(H^{(k)})^{-1}\| \, \|H^{(k)}(x^{(k)} - x^*) - g^{(k)}\| = \|(H^{(k)})^{-1}\| \, \|0 - g^{(k)} - H^{(k)}(x^* - x^{(k)})\|$$
$$\le c_1 c_2 \|x^{(k)} - x^*\|^2 \le \|x^{(k)} - x^*\| \le \varepsilon,$$
where the last line uses $\nabla f(x^*) = 0$ together with the three bounds above, and $c_1 c_2 \|x^{(k)} - x^*\|^2 \le \|x^{(k)} - x^*\|$ because $c_1 c_2 \|x^{(k)} - x^*\| \le c_1 c_2 \varepsilon < 1$. This implies $x^{(k+1)} \in B(x^*; \varepsilon)$ and $\|x^{(k+1)} - x^*\| \le c_1 c_2 \|x^{(k)} - x^*\|^2$ for all $k$ by induction, so the convergence is of order at least $2$.
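To see this rate empirically, one can track the ratio $\|x^{(k+1)} - x^*\| / \|x^{(k)} - x^*\|^2$ along the Newton iterates; it should stay bounded. The test function, starting point, and iteration count below are our own illustrative choices.

```python
import numpy as np

# f(x) = exp(x1) - x1 + x2^2 has the unique minimizer x* = (0, 0)
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])

x_star = np.zeros(2)
x = np.array([1.0, 0.8])
for k in range(5):
    err_old = np.linalg.norm(x - x_star)
    x = x - np.linalg.solve(hess(x), grad(x))     # one Newton step
    err_new = np.linalg.norm(x - x_star)
    # the ratio err_new / err_old^2 remains bounded, consistent with order at least 2
    print(k, err_new, err_new / err_old**2)
```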
Now we consider modifications to overcome the issues of Newton's method.

Issue #1: $d^{(k)} = -(H^{(k)})^{-1} g^{(k)}$ may not be a descent direction.

Theorem. If $g^{(k)} \ne 0$ and $H^{(k)} \succ 0$, then $d^{(k)}$ is a descent direction.

Proof. Let $d^{(k)} = -(H^{(k)})^{-1} g^{(k)}$, and denote $\phi(\alpha) = f(x^{(k)} + \alpha d^{(k)})$. Then $\phi(0) = f(x^{(k)})$, and
$$\phi'(0) = \nabla f(x^{(k)})^\top d^{(k)} = -g^{(k)\top} (H^{(k)})^{-1} g^{(k)} < 0.$$
Therefore, there exists $\bar{\alpha} > 0$ such that $\phi(\alpha) < \phi(0)$, i.e., $f(x^{(k)} + \alpha d^{(k)}) < f(x^{(k)})$, for all $\alpha \in (0, \bar{\alpha})$. Therefore $d^{(k)}$ is a descent direction.
Issue #2: $H^{(k)}$ may not be positive definite (or invertible).

Observation. Suppose $H$ is symmetric; then it has the eigenvalue decomposition $H = U^\top \Lambda U$ for some orthogonal $U$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_1 \ge \cdots \ge \lambda_n$. Let $\mu > \max(0, -\lambda_n)$; then $\lambda_i + \mu > 0$ for all $i$, and hence
$$H + \mu I = U^\top (\Lambda + \mu I) U \succ 0$$
since all eigenvalues $\lambda_i + \mu > 0$.
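A small sketch of this shift in NumPy; the extra margin added on top of $-\lambda_n$ and the example matrix are our own illustrative choices.

```python
import numpy as np

def shift_to_pd(H, margin=1e-6):
    """Return H + mu*I with mu > max(0, -lambda_n), so the result is positive definite."""
    lam_min = np.linalg.eigvalsh(H).min()        # smallest eigenvalue of symmetric H
    mu = max(0.0, -lam_min) + margin             # any mu > max(0, -lambda_n) works
    return H + mu * np.eye(H.shape[0]), mu

H = np.array([[1.0, 2.0], [2.0, -3.0]])          # symmetric but indefinite
H_pd, mu = shift_to_pd(H)
print(mu, np.all(np.linalg.eigvalsh(H_pd) > 0))  # shifted matrix is positive definite
```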
Levenberg–Marquardt modification of Newton's method. Replace $H^{(k)}$ by $H^{(k)} + \mu_k I$ for sufficiently large $\mu_k > 0$; then

• $d^{(k)} = -(H^{(k)} + \mu_k I)^{-1} g^{(k)}$ is a descent direction;
• choosing $\alpha_k$ properly makes $x^{(k+1)} = x^{(k)} - \alpha_k (H^{(k)} + \mu_k I)^{-1} g^{(k)}$ a descent method.
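One possible sketch of a single modified Newton step, combining the eigenvalue shift with a simple backtracking line search; the rule for increasing $\mu$, the backtracking parameters, and the function name are our own assumptions rather than a prescription from the slides.

```python
import numpy as np

def lm_newton_step(f, grad, hess, x, mu=1e-2, beta=0.5, c=1e-4):
    """One Levenberg-Marquardt-style Newton step with a backtracking step size."""
    g, H = grad(x), hess(x)
    if np.linalg.norm(g) < 1e-12:                # already (numerically) stationary
        return x
    n = len(x)
    # increase mu until H + mu*I is positive definite (Cholesky succeeds)
    while True:
        try:
            np.linalg.cholesky(H + mu * np.eye(n))
            break
        except np.linalg.LinAlgError:
            mu *= 10.0
    d = np.linalg.solve(H + mu * np.eye(n), -g)  # descent direction
    alpha = 1.0
    # backtracking: shrink alpha until a sufficient decrease is achieved
    while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
        alpha *= beta
    return x + alpha * d
```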
Newton's method for nonlinear least squares. Suppose we want to solve
$$\text{minimize } f(x), \quad \text{where } f(x) = \sum_{i=1}^m \big(r_i(x)\big)^2$$
and the $r_i: \mathbb{R}^n \to \mathbb{R}$ may not be affine.

Now denote $r(x) = [r_1(x), \dots, r_m(x)]^\top \in \mathbb{R}^m$. Then the Jacobian of $r: \mathbb{R}^n \to \mathbb{R}^m$ is
$$J(x) = \begin{bmatrix} \frac{\partial r_1}{\partial x_1}(x) & \cdots & \frac{\partial r_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial r_m}{\partial x_1}(x) & \cdots & \frac{\partial r_m}{\partial x_n}(x) \end{bmatrix} \in \mathbb{R}^{m \times n}.$$
Note that $f(x) = \|r(x)\|^2$; therefore
$$\nabla f(x) = 2 J(x)^\top r(x), \qquad \nabla^2 f(x) = 2\big(J(x)^\top J(x) + S(x)\big),$$
where $S(x) = \sum_{i=1}^m r_i(x) \nabla^2 r_i(x) \in \mathbb{R}^{n \times n}$.

In this case, Newton's method yields
$$x^{(k+1)} = x^{(k)} - \big(J^{(k)\top} J^{(k)} + S^{(k)}\big)^{-1} J^{(k)\top} r^{(k)},$$
where $J^{(k)} = J(x^{(k)})$, $S^{(k)} = S(x^{(k)})$, and $r^{(k)} = r(x^{(k)})$.
• If $S^{(k)} \approx 0$, then we have
$$x^{(k+1)} = x^{(k)} - \big(J^{(k)\top} J^{(k)}\big)^{-1} J^{(k)\top} r^{(k)}.$$
This is known as the Gauss–Newton method.

• If $J^{(k)\top} J^{(k)}$ is not positive definite, then we modify it:
$$x^{(k+1)} = x^{(k)} - \big(J^{(k)\top} J^{(k)} + \mu_k I\big)^{-1} J^{(k)\top} r^{(k)}.$$
This is known as the Levenberg–Marquardt method.
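As a concrete sketch, here is the Levenberg–Marquardt iteration on a small exponential curve-fitting problem; the model $x_1 e^{x_2 t}$, the synthetic data, the fixed damping $\mu_k$, and the stopping rule are illustrative assumptions, not taken from the slides. Setting $\mu_k = 0$ recovers the Gauss–Newton iteration.

```python
import numpy as np

# Fit y ≈ x1 * exp(x2 * t): residuals r_i(x) = x1 * exp(x2 * t_i) - y_i
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)                       # synthetic noiseless data

def residual(x):
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x):
    # one row per residual: [dr_i/dx1, dr_i/dx2]
    return np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])

x = np.array([1.0, 0.0])                         # initial guess
mu = 1e-3                                        # fixed damping parameter
for _ in range(50):
    r, J = residual(x), jacobian(x)
    # LM step: solve (J^T J + mu I) d = -J^T r
    d = np.linalg.solve(J.T @ J + mu * np.eye(2), -J.T @ r)
    x = x + d
    if np.linalg.norm(d) < 1e-10:
        break
print(x)                                         # approximately [2.0, -1.5]
```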