Non-linear Least Squares and Durbin's Problem
Asymptotic Theory — Part V

James J. Heckman
University of Chicago
Econ 312

This draft, April 18, 2006
This lecture consists of two parts:

1. Non-linear least squares: This looks at non-linear least squares estimation in detail; and

2. Durbin's problem: This examines the correction of asymptotic variances in the case of two-stage estimators.
1 Nonlinear Least Squares

In this section, we examine in detail the Non-linear Least Squares (NLLS) estimator. The section is organized as follows:

• Section 1.1: Recap the analog principle motivation for the NLLS estimator (using the extremum principle);
• Section 1.2: Consistency of the NLLS estimator;
• Section 1.3: Draw analogy with the OLS estimator;
• Section 1.4: Asymptotic normality of the NLLS estimator;
• Section 1.5: Discussion of asymptotic efficiency;
• Section 1.6: Estimation of $\hat\beta_{NLLS}$.
1.1 NLLS estimator as an application of the Extremum principle

Here we recap the derivation of the NLLS estimator as an application of the Extremum principle, from Section 3.2 of the notes Asymptotic Theory II, with slight modification in notation. As noted there, we could also motivate NLLS as a moment estimator (refer to Section 3.2 of Asymptotic Theory II).

1. The model: We assume that in the population the following model holds:
$$y_i = g(x_i \mid \beta_0) + \varepsilon_i \qquad (1)$$
$$\;\;\;\, = g(x_i; \beta) + \left[g(x_i; \beta_0) - g(x_i; \beta)\right] + \varepsilon_i,$$
where $x_i$ is a vector of exogenous variables. Unlike in the linear regression model, $\beta$ need not be of the same dimension as $x_i$. Since $g(x_i \mid \beta)$ is a nonlinear function of $x_i$ and $\beta$, (1) is called the nonlinear regression model. Assume $(y_i, x_i, \varepsilon_i)$ i.i.d. with $\varepsilon_i$ independent of $x_i$, so that $E\left[y_i - g(x_i; \beta_0) \mid x_i\right] = 0$. Then we can write out a least squares criterion function as below.

2. Criterion function: We choose the criterion function
$$Q(\beta) = E\left(y_i - g(x_i; \beta)\right)^2 = E\left[g(x_i; \beta_0) - g(x_i; \beta)\right]^2 + \sigma^2,$$
where the cross term $2E\big[\{g(x_i; \beta_0) - g(x_i; \beta)\}\,\varepsilon_i\big]$ vanishes because $E[\varepsilon_i \mid x_i] = 0$. Then $Q$ possesses the property that it is minimized at $\beta = \beta_0$ (the true parameter value). If $\beta = \beta_0$ is the only such value, the model is identified (with respect to the $Q$ criterion).
3. Analog in sample: Pick
$$Q_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - g(x_i; \beta)\right)^2$$
as the analog to $Q$ in the sample. As established in the OLS case in the notes Asymptotic Theory II (Section 3.2), we can show that $\operatorname{plim} Q_n(\beta) = Q(\beta)$.

4. The estimator: We construct the NLLS estimator as
$$\hat\beta_n = \arg\min_{\beta} Q_n(\beta).$$
Thus we choose $\hat\beta$ to minimize $Q_n(\beta)$; a numerical sketch follows below.

In the next few sections, we establish consistency and asymptotic normality for the NLLS estimator (under certain conditions), and discuss conditions for asymptotic efficiency.
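As a purely illustrative aside (not part of the original notes), here is a minimal numerical sketch of the analog principle: simulate data from a hypothetical mean function $g(x; \beta) = \beta_1(1 - e^{-\beta_2 x})$ and minimize the sample criterion $Q_n(\beta)$ directly. The choice of $g$, the simulated data, and the use of scipy.optimize.minimize are all assumptions made only for this example.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical nonlinear mean function: g(x; beta) = beta_1 * (1 - exp(-beta_2 * x)).
def g(x, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

def Q_n(beta, y, x):
    # Sample analog of Q: (1/n) * sum of squared residuals.
    return np.mean((y - g(x, beta)) ** 2)

# Simulate data from y_i = g(x_i; beta_0) + eps_i.
rng = np.random.default_rng(0)
beta_true = np.array([2.0, 0.5])
x = rng.uniform(0.1, 10.0, size=500)
y = g(x, beta_true) + rng.normal(scale=0.3, size=x.size)

# NLLS estimator: beta_hat = argmin_beta Q_n(beta).
fit = minimize(Q_n, x0=np.array([1.0, 1.0]), args=(y, x))
print("beta_hat =", fit.x)
```

Any numerical minimizer would do here; Section 1.6 returns to the two methods most commonly used in practice.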
1.2 Consistency of the NLLS estimator

Assume:

1. $\varepsilon_i$ i.i.d., $E(\varepsilon_i) = 0$, $E(\varepsilon_i^2) = \sigma^2$;

2. $\beta_0$ is a $K$-vector of unknown parameters;

3. $\beta_0$ is an interior point of the parameter space;

4. $\partial g_i / \partial \beta$ exists and is continuous in a neighborhood of $\beta_0$;

5. $g_i(\beta) \equiv g(x_i; \beta)$ is continuous in $\beta$ uniformly in $i$ (i.e., for every $\epsilon > 0$ there exists $\delta > 0$ such that $|g_i(\beta_1) - g_i(\beta_2)| < \epsilon$ whenever $\beta_1, \beta_2$ are closer than $\delta$ (i.e., $\|\beta_1 - \beta_2\| < \delta$), for all $\beta_1, \beta_2$ in a neighborhood of $\beta_0$ and for all $i$);

6. $\frac{1}{n} \sum_{i=1}^{n} g_i(\beta_1)\, g_i(\beta_2)$ converges uniformly in $\beta_1, \beta_2$ in a neighborhood of $\beta_0$;

7. $\lim \frac{1}{n} \sum_{i=1}^{n} \left(g_i(\beta_0) - g_i(\beta)\right)^2 \neq 0$ if $\beta \neq \beta_0$ (the identification condition; an example of how it can fail follows below).

Then there exists a unique root $\hat\beta_n$ such that
$$\hat\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - g(x_i \mid \beta)\right)^2,$$
and it is consistent, i.e.
$$\hat\beta_n \xrightarrow{\;p\;} \beta_0.$$

Proof: Amemiya, p. 129. The proof is an application of the Extremum Analogy Theorem for the class of estimators defined as $\hat\beta_n = \arg\min_{\beta} Q_n(\beta)$.
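As a quick illustration (not from the original notes) of how condition 7 can fail, consider the over-parameterized mean function $g(x_i; \beta) = \beta_1 \beta_2 x_i$. Any pair $(\beta_1, \beta_2)$ with $\beta_1 \beta_2 = \beta_{10} \beta_{20}$ gives $g_i(\beta) = g_i(\beta_0)$ for every $i$, so $\frac{1}{n} \sum_i \left(g_i(\beta_0) - g_i(\beta)\right)^2 = 0$ at points $\beta \neq \beta_0$: the minimizer of $Q$ is not unique, and only the product $\beta_1 \beta_2$ is identified.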
1.3 Analogy with the OLS estimator

Gallant (1975): Consider the NLLS model from (1) above:
$$y_i = g(x_i \mid \beta) + \varepsilon_i.$$
Now expand $g$ in a neighborhood of $\beta_*$ in a Taylor series to get:
$$y_i = g(x_i \mid \beta_*) + \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} (\beta - \beta_*) + \varepsilon_i.$$
Rewrite the equation as:
$$\tilde y_i \equiv y_i - g(x_i \mid \beta_*) + \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} \beta_* = \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} \beta + \varepsilon_i.$$

Now, by analogy with the classical linear regression model, we have:

• $\tilde y_i = y_i - g(x_i \mid \beta_*) + \left.\dfrac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*} \beta_*$ is analogous to the dependent variable in OLS.

• $\left.\dfrac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*}$ is analogous to the independent variables matrix $X$ in OLS.
The NLLS estimator is then
$$\hat\beta = \left[\sum_{i=1}^{n} \left(\left.\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right|_{\beta_*}\right) \left(\left.\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right|_{\beta_*}\right)\right]^{-1} \left(\sum_{i=1}^{n} \left.\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right|_{\beta_*} \tilde y_i\right), \qquad (2)$$
so that in comparison to the OLS estimator we have:

• $X'X$ replaced by $\sum_{i=1}^{n} \left(\dfrac{\partial g_i}{\partial \beta}\right)\left(\dfrac{\partial g_i}{\partial \beta'}\right)$; and

• $X'y$ replaced by $\sum_{i=1}^{n} \left(\dfrac{\partial g_i}{\partial \beta}\right) \tilde y_i$,

where $\tilde y = (\tilde y_1, \ldots, \tilde y_n)'$ and $g = (g_1, \ldots, g_n)'$.

Then the analogy with OLS goes through exactly. Now, as in the OLS case, we can do hypothesis testing, etc., using derivatives in a neighborhood of the optimum. Using the analogy, we also obtain the estimator of the asymptotic variance of $\hat\beta_{NLLS}$:
$$\widehat{\text{Asy. var}}\left(\hat\beta_{NLLS}\right) = \hat\sigma^2 \left(\tilde G' \tilde G\right)^{-1}, \qquad \text{where } \tilde G = \left.\frac{\partial g(x, \beta)}{\partial \beta'}\right|_{\hat\beta}.$$
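A minimal sketch of this plug-in variance estimator, continuing the illustrative model used earlier (not part of the original notes; the finite-difference Jacobian and the degrees-of-freedom correction in $\hat\sigma^2$ are choices made only for the example):

```python
import numpy as np

def jacobian_g(x, beta, g, h=1e-6):
    # Central-difference Jacobian of g(x; beta) with respect to beta (one column per parameter).
    cols = []
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        cols.append((g(x, beta + step) - g(x, beta - step)) / (2.0 * h))
    return np.column_stack(cols)

def nlls_avar(y, x, beta_hat, g):
    # Plug-in estimate of Asy. var(beta_hat): sigma2_hat * (G'G)^{-1}, G evaluated at beta_hat.
    resid = y - g(x, beta_hat)
    n, k = y.size, beta_hat.size
    sigma2_hat = resid @ resid / (n - k)   # degrees-of-freedom correction; 1/n would also do
    G = jacobian_g(x, beta_hat, g)
    return sigma2_hat * np.linalg.inv(G.T @ G)
```

In practice $\tilde G$ would often be available analytically; the numerical Jacobian is used here only to keep the sketch generic.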
1.4 Asymptotic normality

To justify large-sample normality, we need additional conditions on the model. The required conditions for asymptotic normality, assuming the conditions for consistency hold, are the following.

1. $\lim\limits_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \left(\left.\dfrac{\partial g_i}{\partial \beta}\right|_{\beta_0}\right) \left(\left.\dfrac{\partial g_i}{\partial \beta'}\right|_{\beta_0}\right) = A$, a positive definite matrix;

2. $\frac{1}{n} \sum_{i=1}^{n} \dfrac{\partial g_i}{\partial \beta} \dfrac{\partial g_i}{\partial \beta'}$ converges uniformly to a finite matrix in an open neighborhood of $\beta_0$;

3. $\dfrac{\partial^2 g_i}{\partial \beta \partial \beta'}$ is continuous in $\beta$ in an open neighborhood of $\beta_0$, uniformly in $i$ (i.e., we need uniform continuity of the first and second partials);

4. $\lim\limits_{n\to\infty} \frac{1}{n^2} \sum_{i=1}^{n} \left[\dfrac{\partial^2 g_i(\beta)}{\partial \beta_j \partial \beta_k}\right]^2 = 0$ for all $j, k$ and for all $\beta$ in an open neighborhood of $\beta_0$; and

5. $\frac{1}{n} \sum_{i=1}^{n} g_i(\beta_1) \left.\dfrac{\partial^2 g_i}{\partial \beta \partial \beta'}\right|_{\beta_2}$ converges to a finite matrix uniformly for all $\beta_1, \beta_2$ in an open neighborhood of $\beta_0$.

Then:
$$\sqrt{n}\left(\hat\beta_n - \beta_0\right) \xrightarrow{\;d\;} N\left(0, \sigma^2 A^{-1}\right),$$
where $\sigma^2 = \operatorname{var}(\varepsilon_i)$.

Sketch of proof (for a rigorous proof, see Amemiya, pp. 132–134): The intuition for this result is exactly as in Cramér's Theorem (refer to Section 2 of the notes Asymptotic Theory III).
Look at the first order condition:
$$\frac{\partial Q_n(\beta)}{\partial \beta} = -\frac{2}{n} \sum_{i=1}^{n} \left(y_i - g_i(\beta)\right) \frac{\partial g_i(\beta)}{\partial \beta} = 0.$$
Then, as in Cramér's theorem (Theorem 3 in handout III), evaluating the derivative at $\beta_0$ gives
$$\left.\frac{\partial Q_n(\beta)}{\partial \beta}\right|_{\beta_0} = -\frac{2}{n} \sum_{i=1}^{n} \varepsilon_i \left.\frac{\partial g_i(\beta)}{\partial \beta}\right|_{\beta_0}.$$

Suitably scaled, this is asymptotically normal (it is a sum of i.i.d. random variables) by the Lindeberg–Lévy Central Limit Theorem. Then, using equation (2), we obtain:
$$\sqrt{n}\left(\hat\beta - \beta_0\right) = \left[\frac{1}{n} \sum_{i=1}^{n} \left(\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right) \left(\frac{\partial g(x_i \mid \beta)}{\partial \beta'}\right)\right]^{-1} \times \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(\frac{\partial g(x_i \mid \beta)}{\partial \beta}\right) \varepsilon_i.$$
This is asymptotically normal in a neighborhood of $\beta_0$ if $\frac{1}{n} \sum \left(\frac{\partial g_i}{\partial \beta}\right)\left(\frac{\partial g_i}{\partial \beta'}\right)$ converges uniformly to a non-singular matrix $A$ (which is true by assumption). This completes the analogy with Cramér's theorem proved in the earlier lecture. (See Amemiya for a rigorous derivation. Also, see the result in Gallant.)
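As an illustrative check (not from the original notes), a small Monte Carlo sketch of the result: repeated draws of $\sqrt{n}(\hat\beta_n - \beta_0)$ should have a covariance matrix close to $\sigma^2 A^{-1}$. The mean function, sample size, and number of replications below are all assumptions made only for the example.

```python
import numpy as np
from scipy.optimize import minimize

def g(x, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

rng = np.random.default_rng(1)
beta0, sigma, n = np.array([2.0, 0.5]), 0.3, 400
x = rng.uniform(0.1, 10.0, size=n)        # regressors held fixed across replications

draws = []
for _ in range(500):
    y = g(x, beta0) + rng.normal(scale=sigma, size=n)
    fit = minimize(lambda b: np.mean((y - g(x, b)) ** 2), x0=beta0)
    draws.append(np.sqrt(n) * (fit.x - beta0))

# A = lim (1/n) sum (dg/dbeta)(dg/dbeta)', computed here from the analytic derivatives of g.
dg = np.column_stack([1.0 - np.exp(-beta0[1] * x),
                      beta0[0] * x * np.exp(-beta0[1] * x)])
A = dg.T @ dg / n

print("simulated cov of sqrt(n)(beta_hat - beta0):\n", np.cov(np.array(draws).T))
print("sigma^2 * A^{-1}:\n", sigma**2 * np.linalg.inv(A))
```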
1.5 Asymptotic efficiency of the NLLS estimator

The analogy of the NLLS estimator with MLE is complete if we assume $\varepsilon_i$ is normal. Then we get the log likelihood function (up to a constant):
$$\ln L = -\frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - g(x_i \mid \beta)\right)^2,$$
so that here we get $\hat\beta_{MLE} = \hat\beta_{NLLS}$ (the FOC and the asymptotic theory are as before).
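Spelling out why the two estimators coincide (a standard step, not written out in the original notes): the score with respect to $\beta$ is proportional to the NLLS first order condition,
$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left(y_i - g(x_i \mid \beta)\right) \frac{\partial g(x_i \mid \beta)}{\partial \beta} = 0,$$
which is exactly the condition for minimizing $\sum_i \left(y_i - g(x_i \mid \beta)\right)^2$, whatever the value of $\sigma^2$.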
Thus we obtain the general result that, in any nonlinear regression model, $\hat\beta_{NLLS} = \hat\beta_{MLE}$ if $\varepsilon_i$ is normal. Though nonlinear regression is picking another criterion, the estimator is identical to the MLE estimator. The NLLS estimator is therefore efficient in the normal case.

In general, Greene (pp. 305–308) shows that (unless $\varepsilon_i$ is normal) NLLS is not necessarily asymptotically efficient.
1.6 Estimation of $\hat\beta_{NLLS}$

Now, consider the problem of numerical estimation: how to obtain $\hat\beta$? The two commonly used methods are:

i. Newton–Raphson; and

ii. Gauss–Newton.

A minimal sketch of both update rules follows.
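The sketch below contrasts the two update rules for minimizing $S(\beta) = \sum_i \left(y_i - g(x_i; \beta)\right)^2$ (illustrative only; the helper functions `jac` and `hess_g` are hypothetical, and the details of both methods are developed later in the notes). Newton–Raphson uses the full Hessian of $S$, while Gauss–Newton drops the second-derivative term and simply iterates the linearized regression (2).

```python
import numpy as np

def gauss_newton_step(y, x, beta, g, jac):
    # Gauss-Newton: replace X'X and X'y in (2) by J'J and J'r, where J = dg/dbeta' at beta.
    J = jac(x, beta)                      # n x k Jacobian of g at the current beta
    r = y - g(x, beta)                    # current residuals
    return beta + np.linalg.solve(J.T @ J, J.T @ r)

def newton_raphson_step(y, x, beta, g, jac, hess_g):
    # Newton-Raphson on S(beta) = sum_i r_i^2 uses the full Hessian,
    # which keeps the extra term -sum_i r_i * d2 g_i / dbeta dbeta'.
    # hess_g(x, beta) is assumed to return an (n, k, k) array of second derivatives.
    J = jac(x, beta)
    r = y - g(x, beta)
    grad = -2.0 * J.T @ r
    H = 2.0 * (J.T @ J - np.einsum("i,ijk->jk", r, hess_g(x, beta)))
    return beta - np.linalg.solve(H, grad)
```

Near the optimum the residuals are small, so the dropped term is negligible and the two updates nearly coincide; away from it, Gauss–Newton avoids computing second derivatives at the cost of a cruder Hessian approximation.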