Non-linear Least Squares and Durbin's Problem
Asymptotic Theory — Part V

James J. Heckman
University of Chicago
Econ 312

This draft, April 18, 2006
This lecture consists of two parts:

1. Non-linear least squares: This looks at non-linear least squares estimation in detail; and

2. Durbin's problem: This examines the correction of asymptotic variances in the case of two-stage estimators.
1 Nonlinear Least Squares

In this section, we examine in detail the Non-linear Least Squares (NLLS) estimator. The section is organized as follows:

• Section 1.1: Recap the analog principle motivation for the NLLS estimator (using the extremum principle);
• Section 1.2: Consistency of the NLLS estimator;
• Section 1.3: Draw analogy with the OLS estimator;
• Section 1.4: Asymptotic normality of the NLLS estimator;
• Section 1.5: Discussion of asymptotic efficiency;
• Section 1.6: Estimation of $\hat\beta_{NLLS}$.
1.1 NLLS estimator as an application of the Extremum principle

Here we recap the derivation of the NLLS estimator as an application of the Extremum principle, from Section 3.2 of the notes Asymptotic Theory II, with slight modification in notation. As noted there, we could also motivate NLLS as a moment estimator (refer to Section 3.2 of Asymptotic Theory II).

1. The model: We assume that in the population the following model holds:
$$y_i = g(x_i \mid \beta_0) + \varepsilon_i \qquad (1)$$
$$\;\;\;\, = g(x_i; \beta) + \left[g(x_i; \beta_0) - g(x_i; \beta)\right] + \varepsilon_i,$$
where $x_i$ is a vector of exogenous variables. Unlike in the linear regression model, $\beta$ need not be of the same dimension as $x_i$. Since $g(x_i \mid \beta)$ is a nonlinear function of $x_i$ and $\beta$, (1) is called the nonlinear regression model. Assume $(y_i, x_i, \varepsilon_i)$ i.i.d. with $\varepsilon_i$ independent of $x_i$, so that $E\left[y_i - g(x_i; \beta_0) \mid x_i\right] = 0$. Then we can write out a least squares criterion function as below.

2. Criterion function: We choose the criterion function
$$Q(\beta) = E\left(y_i - g(x_i; \beta)\right)^2 = E\left[g(x_i; \beta_0) - g(x_i; \beta)\right]^2 + \sigma^2,$$
where the cross term $2E\big[\{g(x_i; \beta_0) - g(x_i; \beta)\}\,\varepsilon_i\big]$ vanishes because $E[\varepsilon_i \mid x_i] = 0$. Then $Q$ possesses the property that it is minimized at $\beta = \beta_0$ (the true parameter value). If $\beta = \beta_0$ is the only such value, the model is identified (with respect to the $Q$ criterion).
3. Analog in sample: Pick
$$Q_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - g(x_i; \beta)\right)^2$$
as the analog to $Q$ in the sample. As established in the OLS case in the notes Asymptotic Theory II (Section 3.2), we can show that $\operatorname{plim} Q_n(\beta) = Q(\beta)$.

4. The estimator: We construct the NLLS estimator as
$$\hat\beta_n = \arg\min_{\beta} Q_n(\beta).$$
Thus we choose $\hat\beta$ to minimize $Q_n(\beta)$; a numerical sketch follows below.

In the next few sections, we establish consistency and asymptotic normality for the NLLS estimator (under certain conditions), and discuss conditions for asymptotic efficiency.
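As a purely illustrative aside (not part of the original notes), here is a minimal numerical sketch of the analog principle: simulate data from a hypothetical mean function $g(x; \beta) = \beta_1(1 - e^{-\beta_2 x})$ and minimize the sample criterion $Q_n(\beta)$ directly. The choice of $g$, the simulated data, and the use of scipy.optimize.minimize are all assumptions made only for this example.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical nonlinear mean function: g(x; beta) = beta_1 * (1 - exp(-beta_2 * x)).
def g(x, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

def Q_n(beta, y, x):
    # Sample analog of Q: (1/n) * sum of squared residuals.
    return np.mean((y - g(x, beta)) ** 2)

# Simulate data from y_i = g(x_i; beta_0) + eps_i.
rng = np.random.default_rng(0)
beta_true = np.array([2.0, 0.5])
x = rng.uniform(0.1, 10.0, size=500)
y = g(x, beta_true) + rng.normal(scale=0.3, size=x.size)

# NLLS estimator: beta_hat = argmin_beta Q_n(beta).
fit = minimize(Q_n, x0=np.array([1.0, 1.0]), args=(y, x))
print("beta_hat =", fit.x)
```

Any numerical minimizer would do here; Section 1.6 returns to the two methods most commonly used in practice.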
1.2 Consistency of the NLLS estimator

Assume:

1. $\varepsilon_i$ i.i.d., $E(\varepsilon_i) = 0$, $E(\varepsilon_i^2) = \sigma^2$;

2. $\beta_0$ is a $K$-vector of unknown parameters;

3. $\beta_0$ is an interior point of the parameter space;

4. $\partial g_i / \partial \beta$ exists and is continuous in a neighborhood of $\beta_0$;

5. $g_i(\beta) \equiv g(x_i; \beta)$ is continuous in $\beta$ uniformly in $i$ (i.e., for every $\epsilon > 0$ there exists $\delta > 0$ such that $|g_i(\beta_1) - g_i(\beta_2)| < \epsilon$ whenever $\beta_1, \beta_2$ are closer than $\delta$ (i.e., $\|\beta_1 - \beta_2\| < \delta$), for all $\beta_1, \beta_2$ in a neighborhood of $\beta_0$ and for all $i$);

6. $\frac{1}{n} \sum_{i=1}^{n} g_i(\beta_1)\, g_i(\beta_2)$ converges uniformly in $\beta_1, \beta_2$ in a neighborhood of $\beta_0$;

7. $\lim \frac{1}{n} \sum_{i=1}^{n} \left(g_i(\beta_0) - g_i(\beta)\right)^2 \neq 0$ if $\beta \neq \beta_0$ (the identification condition; an example of how it can fail follows below).

Then there exists a unique root $\hat\beta_n$ such that
$$\hat\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - g(x_i \mid \beta)\right)^2,$$
and it is consistent, i.e.
$$\hat\beta_n \xrightarrow{\;p\;} \beta_0.$$

Proof: Amemiya, p. 129. The proof is an application of the Extremum Analogy Theorem for the class of estimators defined as $\hat\beta_n = \arg\min_{\beta} Q_n(\beta)$.
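As a quick illustration (not from the original notes) of how condition 7 can fail, consider the over-parameterized mean function $g(x_i; \beta) = \beta_1 \beta_2 x_i$. Any pair $(\beta_1, \beta_2)$ with $\beta_1 \beta_2 = \beta_{10} \beta_{20}$ gives $g_i(\beta) = g_i(\beta_0)$ for every $i$, so $\frac{1}{n} \sum_i \left(g_i(\beta_0) - g_i(\beta)\right)^2 = 0$ at points $\beta \neq \beta_0$: the minimizer of $Q$ is not unique, and only the product $\beta_1 \beta_2$ is identified.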
1.3 Analogy with the OLS estimator

Gallant (1975): Consider the NLLS model from (1) above:
$$y_i = g(x_i \mid \beta) + \varepsilon_i.$$
Now expand $g$ in a neighborhood of $\beta_*$ in a Taylor series to get:
$$y_i = g(x_i \mid \beta_*) + \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} (\beta - \beta_*) + \varepsilon_i.$$
Rewrite the equation as:
$$\tilde y_i \equiv y_i - g(x_i \mid \beta_*) + \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} \beta_* = \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} \beta + \varepsilon_i.$$

Now, by analogy with the classical linear regression model, we have:

• $\tilde y_i = y_i - g(x_i \mid \beta_*) + \left.\dfrac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} \beta_*$ is analogous to the dependent variable in OLS.

• $\left.\dfrac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*}$ is analogous to the independent variables matrix $X$ in OLS.
The NLLS estimator is then
$$\hat\beta = \left[\sum_{i=1}^{n} \left(\left.\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right|_{\beta_*}\right) \left(\left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*}\right)\right]^{-1} \left(\sum_{i=1}^{n} \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right|_{\beta_*} \tilde y_i\right), \qquad (2)$$
so that in comparison to the OLS estimator we have:

• $X'X$ replaced by $\sum_{i=1}^{n} \left(\dfrac{\partial g_i}{\partial \beta}\right)\left(\dfrac{\partial g_i}{\partial \beta'}\right)$; and

• $X'y$ replaced by $\sum_{i=1}^{n} \left(\dfrac{\partial g_i}{\partial \beta}\right) \tilde y_i$,

where $\tilde y = (\tilde y_1, \ldots, \tilde y_n)'$ and $g = (g_1, \ldots, g_n)'$.

Then the analogy with OLS goes through exactly. Now, as in the OLS case, we can do hypothesis testing, etc., using derivatives in a neighborhood of the optimum. Using the analogy, we also obtain the estimator of the asymptotic variance of $\hat\beta_{NLLS}$:
$$\widehat{\text{Asy. var}}\left(\hat\beta_{NLLS}\right) = \hat\sigma^2 \left(\tilde G' \tilde G\right)^{-1}, \qquad \text{where } \tilde G = \left.\frac{\partial g(x, \beta)}{\partial \beta'}\right|_{\hat\beta}.$$
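A minimal sketch of this plug-in variance estimator, continuing the illustrative model used earlier (not part of the original notes; the finite-difference Jacobian and the degrees-of-freedom correction in $\hat\sigma^2$ are choices made only for the example):

```python
import numpy as np

def jacobian_g(x, beta, g, h=1e-6):
    # Central-difference Jacobian of g(x; beta) with respect to beta (one column per parameter).
    cols = []
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        cols.append((g(x, beta + step) - g(x, beta - step)) / (2.0 * h))
    return np.column_stack(cols)

def nlls_avar(y, x, beta_hat, g):
    # Plug-in estimate of Asy. var(beta_hat): sigma2_hat * (G'G)^{-1}, G evaluated at beta_hat.
    resid = y - g(x, beta_hat)
    n, k = y.size, beta_hat.size
    sigma2_hat = resid @ resid / (n - k)   # degrees-of-freedom correction; 1/n would also do
    G = jacobian_g(x, beta_hat, g)
    return sigma2_hat * np.linalg.inv(G.T @ G)
```

In practice $\tilde G$ would often be available analytically; the numerical Jacobian is used here only to keep the sketch generic.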
1.4 Asymptotic normality

To justify large-sample normality, we need additional conditions on the model. The required conditions for asymptotic normality, assuming the conditions for consistency hold, are the following.

1. $\lim\limits_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \left(\left.\dfrac{\partial g_i}{\partial \beta}\right|_{\beta_0}\right) \left(\left.\dfrac{\partial g_i}{\partial \beta'}\right|_{\beta_0}\right) = A$, a positive definite matrix;

2. $\frac{1}{n} \sum_{i=1}^{n} \dfrac{\partial g_i}{\partial \beta} \dfrac{\partial g_i}{\partial \beta'}$ converges uniformly to a finite matrix in an open neighborhood of $\beta_0$;

3. $\dfrac{\partial^2 g_i}{\partial \beta \partial \beta'}$ is continuous in $\beta$ in an open neighborhood of $\beta_0$, uniformly in $i$ (i.e., we need uniform continuity of the first and second partials);

4. $\lim\limits_{n\to\infty} \frac{1}{n^2} \sum_{i=1}^{n} \left[\dfrac{\partial^2 g_i(\beta)}{\partial \beta_j \partial \beta_k}\right]^2 = 0$ for all $j, k$ and for all $\beta$ in an open neighborhood of $\beta_0$; and

5. $\frac{1}{n} \sum_{i=1}^{n} g_i(\beta_1) \left.\dfrac{\partial^2 g_i}{\partial \beta \partial \beta'}\right|_{\beta_2}$ converges to a finite matrix uniformly for all $\beta_1, \beta_2$ in an open neighborhood of $\beta_0$.

Then:
$$\sqrt{n}\left(\hat\beta_n - \beta_0\right) \xrightarrow{\;d\;} N\left(0, \sigma^2 A^{-1}\right),$$
where $\sigma^2 = \operatorname{var}(\varepsilon_i)$.

Sketch of proof (for a rigorous proof, see Amemiya, pp. 132–134): The intuition for this result is exactly as in Cramér's Theorem (refer to Section 2 of the notes Asymptotic Theory III).
Look at the first order condition:
$$\frac{\partial Q_n(\beta)}{\partial \beta} = -\frac{2}{n} \sum_{i=1}^{n} \left(y_i - g_i(\beta)\right) \frac{\partial g_i(\beta)}{\partial \beta} = 0.$$
Then, as in Cramér's theorem (Theorem 3 in handout III), evaluating the derivative at $\beta_0$ gives
$$\left.\frac{\partial Q_n(\beta)}{\partial \beta}\right|_{\beta_0} = -\frac{2}{n} \sum_{i=1}^{n} \varepsilon_i \left.\frac{\partial g_i(\beta)}{\partial \beta}\right|_{\beta_0}.$$

Suitably scaled, this is asymptotically normal (it is a sum of i.i.d. random variables) by the Lindeberg–Lévy Central Limit Theorem. Then, using equation (2), we obtain:
$$\sqrt{n}\left(\hat\beta - \beta_0\right) = \left[\frac{1}{n} \sum_{i=1}^{n} \left(\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right) \left(\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right)\right]^{-1} \times \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right) \varepsilon_i.$$
This is asymptotically normal in a neighborhood of $\beta_0$ if $\frac{1}{n} \sum \left(\frac{\partial g_i}{\partial \beta}\right)\left(\frac{\partial g_i}{\partial \beta'}\right)$ converges uniformly to a non-singular matrix $A$ (which is true by assumption). This completes the analogy with Cramér's theorem proved in the earlier lecture. (See Amemiya for a rigorous derivation. Also, see the result in Gallant.)
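As an illustrative check (not from the original notes), a small Monte Carlo sketch of the result: repeated draws of $\sqrt{n}(\hat\beta_n - \beta_0)$ should have a covariance matrix close to $\sigma^2 A^{-1}$. The mean function, sample size, and number of replications below are all assumptions made only for the example.

```python
import numpy as np
from scipy.optimize import minimize

def g(x, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

rng = np.random.default_rng(1)
beta0, sigma, n = np.array([2.0, 0.5]), 0.3, 400
x = rng.uniform(0.1, 10.0, size=n)        # regressors held fixed across replications

draws = []
for _ in range(500):
    y = g(x, beta0) + rng.normal(scale=sigma, size=n)
    fit = minimize(lambda b: np.mean((y - g(x, b)) ** 2), x0=beta0)
    draws.append(np.sqrt(n) * (fit.x - beta0))

# A = lim (1/n) sum (dg/dbeta)(dg/dbeta)', computed here from the analytic derivatives of g.
dg = np.column_stack([1.0 - np.exp(-beta0[1] * x),
                      beta0[0] * x * np.exp(-beta0[1] * x)])
A = dg.T @ dg / n

print("simulated cov of sqrt(n)(beta_hat - beta0):\n", np.cov(np.array(draws).T))
print("sigma^2 * A^{-1}:\n", sigma**2 * np.linalg.inv(A))
```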
1.5 Asymptotic efficiency of the NLLS estimator

The analogy of the NLLS estimator with MLE is complete if we assume $\varepsilon_i$ is normal. Then we get the log likelihood function (up to a constant):
$$\ln L = -\frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - g(x_i \mid \beta)\right)^2,$$
so that here we get $\hat\beta_{MLE} = \hat\beta_{NLLS}$ (the FOC and the asymptotic theory are as before).
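Spelling out why the two estimators coincide (a standard step, not written out in the original notes): the score with respect to $\beta$ is proportional to the NLLS first order condition,
$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left(y_i - g(x_i \mid \beta)\right) \frac{\partial g(x_i \mid \beta)}{\partial \beta} = 0,$$
which is exactly the condition for minimizing $\sum_i \left(y_i - g(x_i \mid \beta)\right)^2$, whatever the value of $\sigma^2$.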
Thus we obtain the general result that, in any nonlinear regression model, $\hat\beta_{NLLS} = \hat\beta_{MLE}$ if $\varepsilon_i$ is normal. Though nonlinear regression is picking another criterion, the estimator is identical to the MLE estimator. The NLLS estimator is therefore efficient in the normal case.

In general, Greene (pp. 305–308) shows that (unless $\varepsilon_i$ is normal) NLLS is not necessarily asymptotically efficient.
1.6 Estimation of $\hat\beta_{NLLS}$

Now, consider the problem of numerical estimation: how to obtain $\hat\beta$? The two commonly used methods are:

i. Newton–Raphson; and

ii. Gauss–Newton.

A minimal sketch of both update rules follows.
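The sketch below contrasts the two update rules for minimizing $S(\beta) = \sum_i \left(y_i - g(x_i; \beta)\right)^2$ (illustrative only; the helper functions `jac` and `hess_g` are hypothetical, and the details of both methods are developed later in the notes). Newton–Raphson uses the full Hessian of $S$, while Gauss–Newton drops the second-derivative term and simply iterates the linearized regression (2).

```python
import numpy as np

def gauss_newton_step(y, x, beta, g, jac):
    # Gauss-Newton: replace X'X and X'y in (2) by J'J and J'r, where J = dg/dbeta' at beta.
    J = jac(x, beta)                      # n x k Jacobian of g at the current beta
    r = y - g(x, beta)                    # current residuals
    return beta + np.linalg.solve(J.T @ J, J.T @ r)

def newton_raphson_step(y, x, beta, g, jac, hess_g):
    # Newton-Raphson on S(beta) = sum_i r_i^2 uses the full Hessian,
    # which keeps the extra term -sum_i r_i * d2 g_i / dbeta dbeta'.
    # hess_g(x, beta) is assumed to return an (n, k, k) array of second derivatives.
    J = jac(x, beta)
    r = y - g(x, beta)
    grad = -2.0 * J.T @ r
    H = 2.0 * (J.T @ J - np.einsum("i,ijk->jk", r, hess_g(x, beta)))
    return beta - np.linalg.solve(H, grad)
```

Near the optimum the residuals are small, so the dropped term is negligible and the two updates nearly coincide; away from it, Gauss–Newton avoids computing second derivatives at the cost of a cruder Hessian approximation.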