Stat 451 Lecture Notes 02: Root-finding and Optimization
Ryan Martin, UIC
www.math.uic.edu/~rgmartin
Based on parts of: Dalgaard’s ISwR book, Chapter 1 in Givens & Hoeting, and Chapter 7 of Lange
Updated: February 8, 2016
Outline
1. Introduction
2. Univariate problems
3. Multivariate problems
4. Other miscellaneous things
Motivation
- In statistical applications, point estimation problems often boil down to maximizing a function:
  - maximum likelihood
  - least squares
  - maximum a posteriori
- When the function to be optimized is “smooth,” we can reformulate the optimization as a root-finding problem.
- Trouble: these problems often have no analytical solution, so we need numerical tools to solve them.
General setup
- Two kinds of problems:
  - Root-finding: solve $f(x) = 0$ for $x \in \mathbb{R}^d$, $d \ge 1$.
  - Optimization: maximize $g(x)$ for $x \in \mathbb{R}^d$, $d \ge 1$.
- The two are equivalent if $f$ is the derivative of $g$.
- We will address the univariate and multivariate cases separately.
- Methods construct a sequence $\{x_t : t \ge 0\}$ designed to converge (as $t \to \infty$) to the solution, denoted by $x^\star$.
General setup, cont.
- Theoretical considerations:
  - Under what conditions on $f$ (or $g$) and the initial guess $x_0$ can we prove that $x_t \to x^\star$?
  - If $x_t \to x^\star$, then how fast, i.e., what is its convergence order?
- Practical considerations:
  - How to write and implement the algorithm?
  - We can’t run the algorithm till $t = \infty$, so how do we decide when to stop?
Outline
1. Introduction
2. Univariate problems
  - Bisection
  - Newton’s method
  - Fisher scoring
  - Secant method
  - Fixed-point iteration
  - Available functions in R
3. Multivariate problems
4. Other miscellaneous things
Bisection – basic idea
- Goal: find the unique root $x^\star$ of $f$ in the interval $[a, b]$.
- Claim: $f(a)\,f(b) \le 0$. Why? (Opposite signs at the endpoints guarantee, via the intermediate value theorem, that a root lies between them.)
- Pick an initial guess $x_0 = (a + b)/2$. Then $x^\star$ must be in either $[a, x_0]$ or $[x_0, b]$; evaluate $f$ at the endpoints to determine which one.
- The selected interval, call it $[a_1, b_1]$, is now just like the initial interval, i.e., we know it must contain $x^\star$. Take $x_1 = (a_1 + b_1)/2$.
- Continue this process to construct a sequence $\{x_t : t \ge 0\}$.
Bisection algorithm
Assume $f(x)$ and the interval $[a, b]$ are given.
1. Set $x = (a + b)/2$.
2. If $f(a)\,f(x) \le 0$, then set $b = x$; else set $a = x$.
3. If “converged,” then stop; otherwise return to Step 1.
- The convergence criterion is usually something like $|x_{\text{new}} - x_{\text{old}}| < \varepsilon$, where $\varepsilon$ is a specified small number, e.g., $\varepsilon = 10^{-7}$.
- A relative convergence criterion might be better: $|x_{\text{new}} - x_{\text{old}}| / |x_{\text{old}}| < \varepsilon$.
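A minimal R sketch of the algorithm above (the function name `bisection`, the default tolerance, and the iteration cap are my own choices, not from the notes):

```r
# Bisection: find a root of f in [a, b], assuming f(a) * f(b) <= 0
bisection <- function(f, a, b, eps = 1e-7, max_iter = 1000) {
  if (f(a) * f(b) > 0) stop("f(a) and f(b) must have opposite signs")
  x_old <- a
  for (t in 1:max_iter) {
    x <- (a + b) / 2                      # midpoint of current interval
    if (f(a) * f(x) <= 0) b <- x else a <- x
    if (abs(x - x_old) < eps) return(x)   # absolute convergence criterion
    x_old <- x
  }
  x
}

bisection(function(x) x^2 - 2, 0, 2)  # ~1.414214, i.e., sqrt(2)
```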
Bisection theory
- Claim: if $f$ is continuous, then $x_t \to x^\star$.
- Proof: If $[a_t, b_t]$ is the bounding interval at step $t$, then $f(a_t)\,f(b_t) \le 0$ and $\lim_{t\to\infty} a_t = \lim_{t\to\infty} b_t$. So $x_t$ converges to some $x_\infty$, and by continuity $f(x_\infty)^2 \le 0$. Then $f(x_\infty) = 0$ and, since $x^\star$ is the unique root, $x_\infty = x^\star$.
- Convergence holds under very mild conditions on $f$, but this robustness comes at the price of a slow order of convergence.
Examples
- Find $x^\star$ to maximize the function $g(x) = \frac{\log x}{1 + x}$, $x \in [1, 5]$. Note that $g'(x) = \frac{1 + x^{-1} - \log x}{(1 + x)^2}$.
- Find the 100p-th percentile of a Student-t distribution, i.e., find $x^\star$ such that $F(x^\star) = p$, where $F$ is the $t_\nu$ distribution function, with df $= \nu$ fixed.
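Both examples can be run through the `bisection()` sketch above; the bracketing intervals below are my choices, and the percentile answer can be checked against R’s built-in `qt()`:

```r
# Example 1: maximize g(x) = log(x) / (1 + x) on [1, 5] by finding the
# root of g'; only the numerator of g' matters for locating the sign change
gprime_num <- function(x) 1 + 1 / x - log(x)
bisection(gprime_num, 1, 5)  # ~3.591, the maximizer x_star

# Example 2: 100p-th percentile of a Student-t with nu df, i.e., solve
# F(x) - p = 0 where F is pt(., df = nu); compare with qt(p, df = nu)
nu <- 5; p <- 0.95
bisection(function(x) pt(x, df = nu) - p, -50, 50)  # ~2.015
```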
Basic idea
- Newton’s method is usually presented in a calculus class.
- The idea is to approximate a nonlinear function near its root by a linear function, which can be solved by hand.
- Recall that Taylor’s theorem gives the linear approximation of a function $f(x)$ in a neighborhood of some point $x_0$ as $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$.
- Setting this approximation equal to 0 and solving gives $x = x_0 - \frac{f(x_0)}{f'(x_0)}$.
Newton method – algorithm
Assume the function $f(x)$, its derivative $f'(x)$, and an initial guess $x_0$ are given. Set $t = 0$.
1. Calculate $x_{t+1} = x_t - \frac{f(x_t)}{f'(x_t)}$.
2. If the convergence criterion is met, then stop; otherwise, set $t \leftarrow t + 1$ and return to Step 1.
Warnings:
- Convergence depends on the choice of $x_0$.
- Unlike bisection, Newton might not converge!
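A minimal R sketch of the update above, assuming both `f` and its derivative `fprime` are supplied as functions (names and defaults are mine):

```r
# Newton's method: iterate x <- x - f(x) / f'(x) from a starting value x0
newton <- function(f, fprime, x0, eps = 1e-7, max_iter = 100) {
  x <- x0
  for (t in 1:max_iter) {
    x_new <- x - f(x) / fprime(x)
    if (abs(x_new - x) < eps) return(x_new)
    x <- x_new
  }
  warning("may not have converged")  # Newton is not guaranteed to converge
  x
}

newton(function(x) x^2 - 2, function(x) 2 * x, x0 = 1)  # ~1.414214
```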
Newton method – theory
- Claim: If $f''$ is continuous and $x^\star$ is a root of $f$, with $f'(x^\star) \ne 0$, then there exists a neighborhood $N$ of $x^\star$ such that Newton’s method converges to $x^\star$ for any $x_0 \in N$.
- Proof uses the Taylor approximation (given in class).
- The proof also shows that the convergence order is quadratic.
- Other results about Newton’s method are available; see HW.
- If Newton converges, then it’s faster than bisection, but the added speed has a cost:
  - requires differentiability and the derivative $f'$;
  - convergence is sensitive to the choice of $x_0$.
Fisher scoring
- In maximum likelihood applications, the goal is to find roots of the score function, i.e., the derivative of the log-likelihood: $\ell'(\hat\theta) = 0$.
- In this context, Newton’s method looks like $\theta_{t+1} = \theta_t - \frac{\ell'(\theta_t)}{\ell''(\theta_t)}$, $t \ge 0$.
- But recall that $-\ell''(\theta)$ is an approximation to the Fisher information $I_n(\theta)$. So we can rewrite Newton’s method as $\theta_{t+1} = \theta_t + \frac{\ell'(\theta_t)}{I_n(\theta_t)}$, $t \ge 0$.
- This modification is called Fisher scoring.
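As a hypothetical illustration (my example, not from the notes), here is Fisher scoring in R for the Cauchy location model, where the expected information is $I_n(\theta) = n/2$ for every $\theta$, so no second derivative is ever computed:

```r
# Fisher scoring for the Cauchy location parameter theta
set.seed(451)
x <- rcauchy(100, location = 2)
score <- function(theta) sum(2 * (x - theta) / (1 + (x - theta)^2))
info <- length(x) / 2            # expected Fisher information, n / 2

theta <- median(x)               # robust starting value
for (t in 1:100) {
  theta_new <- theta + score(theta) / info
  if (abs(theta_new - theta) < 1e-7) break
  theta <- theta_new
}
theta                            # approximate MLE of the location
```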
Secant method – basic idea
- Newton’s method requires a formula for $f'(x)$.
- To avoid this, approximate $f'(x)$ by a difference ratio. That is, recall from calculus that $f'(x) \approx \frac{f(x + h) - f(x)}{h}$, with $h$ small and positive.
- Then the secant method follows Newton’s method exactly, except we substitute a difference ratio for $f'(x)$.
- The name comes from the fact that the linear approximation is a secant line, not a tangent line.
Secant method – algorithm
Suppose $f(x)$ and two initial guesses $x_0, x_1$ are given. Set $t = 1$.
1. Calculate $x_{t+1} = x_t - f(x_t)\,\frac{x_t - x_{t-1}}{f(x_t) - f(x_{t-1})}$.
2. If the convergence criteria are satisfied, then stop; else, set $t \leftarrow t + 1$ and return to Step 1.
- Two initial guesses are needed because the difference ratio in the first iteration requires two values.
- Can be unstable at early iterations because the difference ratio may be a poor approximation of $f'$; a reasonable sacrifice if $f'$ is not available.
- If the secant method converges, the order is almost quadratic (approximately 1.618).
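A minimal R sketch of the secant update above; the starting values in the usage line are arbitrary:

```r
# Secant method: Newton's update with the derivative replaced by the
# difference ratio through the two most recent iterates
secant <- function(f, x0, x1, eps = 1e-7, max_iter = 100) {
  for (t in 1:max_iter) {
    slope <- (f(x1) - f(x0)) / (x1 - x0)  # approximates f'(x1)
    x_new <- x1 - f(x1) / slope
    if (abs(x_new - x1) < eps) return(x_new)
    x0 <- x1
    x1 <- x_new
  }
  x1
}

secant(function(x) x^2 - 2, x0 = 1, x1 = 2)  # ~1.414214
```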
Fixed-point iteration – basic idea
- Some problems require finding a fixed point, i.e., a point $x^\star$ such that $F(x^\star) = x^\star$.
- A root-finding problem can be written as a fixed-point problem with $F(x) = f(x) + x$.
- If the function $F(x)$ is a contraction, i.e., $|F(x) - F(y)| \le \lambda |x - y|$ for some $\lambda \in (0, 1)$, then the point $F(x)$ will be closer to $x^\star = F(x^\star)$ than $x$ is.
- Banach’s fixed-point theorem says: a contraction mapping has a unique fixed point $x^\star$, and from any starting point $x_0$, the iterates $x_{t+1} = F(x_t)$ converge to $x^\star$.
Fixed-point iteration – algorithm
Suppose $F(x)$ and an initial guess $x_0$ are given. Set $t = 0$.
1. Calculate $x_{t+1} = F(x_t)$.
2. If the convergence criteria are met, then stop; else, set $t \leftarrow t + 1$ and return to Step 1.
- It can be shown that $|F(x_t) - x^\star| \le \frac{\lambda^t}{1 - \lambda}\,|x_0 - x^\star|$, so fixed-point iteration converges at a geometric rate.
- If using fixed-point methods for root-finding, $F(x) = f(x) + x$ may not be the best choice; e.g., maybe a scaled version, such as $F(x) = x + \alpha f(x)$ for some $\alpha \ne 0$, would be better.
Example – Kepler’s equation
- Kepler’s equation in orbital mechanics says $x = M + \varepsilon \sin x$, where $M$ and $\varepsilon \in (0, 1)$ are fixed quantities. (See the Wikipedia page on Kepler’s equation for more info.)
- Our goal is to solve for $x$, given $M$ and $\varepsilon$.
- Write $F(x) = M + \varepsilon \sin x$. Then $F'(x) = \varepsilon \cos x$, so $|F'(x)|$ is uniformly bounded by $\varepsilon < 1$.
- So $F$ is a contraction, and fixed-point iteration will converge to a solution of Kepler’s equation.
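A short R sketch of fixed-point iteration applied to Kepler’s equation; the values of M and ε in the usage line are arbitrary illustrative inputs:

```r
# Solve x = M + eps * sin(x) by iterating F(x) = M + eps * sin(x);
# since |F'| <= eps < 1, F is a contraction and the iteration converges
kepler <- function(M, eps, x0 = M, tol = 1e-10, max_iter = 1000) {
  x <- x0
  for (t in 1:max_iter) {
    x_new <- M + eps * sin(x)
    if (abs(x_new - x) < tol) return(x_new)
    x <- x_new
  }
  x
}

kepler(M = 1, eps = 0.5)  # ~1.49870; check: 1 + 0.5 * sin(1.49870)
```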