Properties of Extremum Estimators
Asymptotic Theory — Part III
James J. Heckman
University of Chicago
Econ 312
This draft, April 12, 2006
As we saw in an earlier lecture (Asymptotic Theory — Part II), the Maximum Likelihood Estimator, the Nonlinear Least Squares (NLS) estimator, and even the OLS estimator are all examples of "Extremum Estimators". In this lecture we examine theorems and proofs for the consistency and asymptotic normality of extremum estimators in a somewhat specialized, but easily generalized, form.
The following theorems lay out the conditions under which extremum estimators are consistent and asymptotically normal. They each treat estimators defined by the maximum principle, but they extend trivially to minimum-principle estimators by placing a negative sign in front of $Q_n(y, \theta)$, i.e. $\min_\theta Q = \max_\theta [-Q]$.¹

¹ In this lecture, $\{y\}$ denotes all the data, and hence includes both dependent and independent variables (corresponding to $\{y\}$ and $\{x\}$ in the earlier lecture).
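For concreteness, here is a minimal numerical sketch of this sign-flip trick (my illustration, not from the slides): the average Bernoulli log-likelihood is maximized by handing its negative to an off-the-shelf minimizer. The data vector `y` is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical Bernoulli draws (made up for this illustration)
y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def Q_n(p):
    # Average Bernoulli log-likelihood: the criterion to be MAXIMIZED
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# max_p Q_n(p) = min_p [-Q_n(p)]: flip the sign and use a minimizer
res = minimize_scalar(lambda p: -Q_n(p), bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ~0.7, the sample mean, which is the Bernoulli MLE
```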
1 Consistency of extremum estimators

The first theorem proves consistency when the criterion function has a globally unique maximum (or minimum, respectively) in the population, so that $\theta_0$ is uniquely identified. Differentiability of $Q_n(\theta)$ is not required.

The second theorem states the additional assumptions you have to make if $\theta_0$ is only locally identified, i.e. there are multiple solutions to $\max_\theta Q(\theta)$ but only one is in the neighborhood $N(\theta_0)$ of $\theta_0$. It assumes differentiability of $Q_n(\theta)$.
Theorem 1 (Global): Assume that

1. The parameter space $\Theta$ is a compact subset of $\mathbb{R}^K$;

2. $Q_n(y, \theta)$ is continuous in $\theta \in \Theta$ $\forall y$, and is a measurable function of $y$ $\forall \theta \in \Theta$;

3. $Q_n(y, \theta) \to Q(\theta)$, a nonstochastic function, in probability uniformly as $n \to \infty$; and

4. $\theta_0 = \arg\max_{\theta \in \Theta} Q(\theta)$ is globally identified (i.e. $Q(\theta)$ achieves a unique global maximum at $\theta_0$).

If we let $\hat{\theta}_n = \arg\max_{\theta \in \Theta} Q_n(y, \theta)$, then $\hat{\theta}_n \xrightarrow{p} \theta_0$.
Observe that continuity of $Q(\theta)$ follows from the fact that uniform limits of continuous functions are continuous, and continuity of $Q$ in $\theta$ together with compactness of $\Theta$ implies uniform continuity of $Q(\theta)$.
Proof. Let $N(\theta_0)$ be an open neighborhood in $\mathbb{R}^K$ containing $\theta_0$. Then $N^c(\theta_0)$, the complement of $N(\theta_0)$, is closed, so $\bar{\Theta} = N^c(\theta_0) \cap \Theta$ is compact. Hence $\max_{\theta \in \bar{\Theta}} Q(\theta)$ exists. Denote
$$\varepsilon = \left[ Q(\theta_0) - \max_{\theta \in \bar{\Theta}} Q(\theta) \right] > 0.$$
Let $A_n$ be the event
$$A_n = \left\{ |Q_n(\theta) - Q(\theta)| < \varepsilon/2 \;\; \forall \theta \in \Theta \right\} = \left\{ -\varepsilon/2 < Q_n(\theta) - Q(\theta) < \varepsilon/2 \;\; \forall \theta \in \Theta \right\}.$$
This event is "likely" for $n$ big due to assumption (3) (uniform convergence of $Q_n$ to $Q$), i.e.:
$$Q_n \xrightarrow{p} Q \text{ uniformly} \;\Longrightarrow\; \Pr\{A_n\} \to 1 \text{ as } n \to \infty. \qquad (*)$$
Then $A_n$ implies:

1. $Q(\hat{\theta}_n) > Q_n(\hat{\theta}_n) - \varepsilon/2$

2. $Q_n(\theta_0) > Q(\theta_0) - \varepsilon/2$

Also we have $Q_n(\hat{\theta}_n) \geq Q_n(\theta_0)$ by the definition of $\hat{\theta}_n$. Then from the above facts we get:
$$Q(\hat{\theta}_n) > Q_n(\hat{\theta}_n) - \varepsilon/2 \;\geq\; Q_n(\theta_0) - \varepsilon/2 \;>\; Q(\theta_0) - \varepsilon$$
$$\Rightarrow \; Q(\hat{\theta}_n) > Q(\theta_0) - \varepsilon.$$
Since we have a strict inequality, from the definition of $\varepsilon$ we get that
$$A_n \;\Rightarrow\; \{\hat{\theta}_n \in N(\theta_0)\}$$
for $n$ sufficiently large.
Then it must be that
$$\Pr\{A_n\} \leq \Pr\{\hat{\theta}_n \in N(\theta_0)\}.$$
Then, from equation (*), we have that
$$\lim_{n \to \infty} \Pr\{A_n\} = 1 \;\Rightarrow\; \lim_{n \to \infty} \Pr\{\hat{\theta}_n \in N(\theta_0)\} = 1,$$
and so $\hat{\theta}_n \xrightarrow{p} \theta_0$, because the choice of $N(\theta_0)$ is arbitrary. $\blacksquare$
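A minimal Monte Carlo sketch of what the theorem asserts (my illustration; the model and all numbers are assumptions, not from the slides): take $Q_n(\theta) = -n^{-1}\sum_i (y_i - \theta)^2$ with $y_i \sim N(\theta_0, 1)$, whose maximizer is the sample mean, and watch $\Pr\{\hat{\theta}_n \notin N(\theta_0)\}$ vanish as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0 = 2.0  # true parameter (chosen arbitrarily)

for n in (10, 100, 1000, 10000):
    # 500 Monte Carlo replications of the estimator at sample size n;
    # in this model the maximizer of Q_n is the sample mean.
    theta_hat = rng.normal(theta_0, 1.0, size=(500, n)).mean(axis=1)
    # Fraction of replications falling outside a fixed neighborhood of theta_0:
    # it should shrink toward 0 as n grows, as the theorem predicts.
    print(n, np.mean(np.abs(theta_hat - theta_0) > 0.1))
```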
Theorem 2 (Local): Assume that:

1. The parameter space $\Theta$ is an open subset of $\mathbb{R}^K$ that contains $\theta_0$;

2. $Q_n(y, \theta)$ is a measurable function of $y$ $\forall \theta \in \Theta$;

3. $\partial Q_n / \partial \theta$ exists and is continuous in an open neighborhood $N_1(\theta_0)$ of $\theta_0$ (this implies $Q_n$ is continuous $\forall \theta \in N_1(\theta_0)$);

4. There exists an open neighborhood $N_2(\theta_0)$ of $\theta_0$ such that $Q_n(y, \theta) \to Q(\theta)$, a nonstochastic function, in probability uniformly $\forall \theta \in N_2(\theta_0)$ as $n \to \infty$; and

5. $\theta_0 = \arg\max_{\theta \in N_2(\theta_0)} Q(\theta)$ is locally identified.
If we let $\hat{\Theta}_n$ denote the set of roots of $\partial Q_n / \partial \theta = 0$ corresponding to the local maxima, then, for any $\varepsilon > 0$,
$$\lim_{n \to \infty} \Pr\left\{ \inf_{\theta \in \hat{\Theta}_n} |\theta - \theta_0| > \varepsilon \right\} = 0.$$
Proof. See Amemiya (1985), chapter 4.
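In practice, Theorem 2's set of roots $\hat{\Theta}_n$ can contain several local maxima. A common heuristic response (my addition, not from the slides or Amemiya) is multi-start optimization: solve the first-order condition from many starting values and keep the root attaining the largest criterion value. A sketch with a made-up multimodal criterion:

```python
import numpy as np
from scipy.optimize import minimize

def Q_n(theta):
    # A deliberately multimodal criterion (purely hypothetical):
    # its first-order condition has several roots.
    return -theta**2 + 2.0 * np.cos(3.0 * theta)

# Multi-start: optimize from a grid of starting values, collect the
# local maxima found, and keep the one with the largest Q_n.
starts = np.linspace(-4.0, 4.0, 9)
roots = [minimize(lambda t: -Q_n(t[0]), x0=[s]).x[0] for s in starts]
best = max(roots, key=Q_n)
print(sorted(set(np.round(roots, 3))), best)  # several roots; best is near 0
```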
2 Asymptotic normality of extremum estimators

Now we show that, under certain conditions on the first and second derivatives of $Q_n$ (the criterion function for an estimator based on the extremum principle), the asymptotic distribution of the extremum estimator $\hat{\theta}_n$ (chosen as the maximizer of $Q_n$) is normal.
Theorem 3 (Cramér): Assume the conditions of Theorem 2 and, in addition:

1. $\partial^2 Q_n / \partial \theta \partial \theta'$ exists and is continuous in an open neighborhood of $\theta_0$;

2. There exists an open neighborhood $N(\theta_0)$ of $\theta_0$ such that $\partial^2 Q_n(y, \theta) / \partial \theta \partial \theta' \to A(\theta)$, a nonstochastic function, in probability uniformly $\forall \theta \in N(\theta_0)$ as $n \to \infty$;

3. $\left.\dfrac{\partial^2 Q_n(\theta)}{\partial \theta \partial \theta'}\right|_{\theta_n^*} \xrightarrow{p} A(\theta_0)$ if $\theta_n^* \xrightarrow{p} \theta_0$, where
$$A(\theta_0) = \operatorname{plim} \left.\left( \frac{\partial^2 Q_n(\theta)}{\partial \theta \partial \theta'} \right)\right|_{\theta_0}$$
is nonsingular; and
4. $\sqrt{n}\left.\dfrac{\partial Q_n(\theta)}{\partial \theta}\right|_{\theta_0} \xrightarrow{d} N\bigl(0, B(\theta_0)\bigr)$, where
$$B(\theta_0) = \lim_{n \to \infty} E\left[ n \left.\frac{\partial Q_n(\theta)}{\partial \theta} \frac{\partial Q_n(\theta)}{\partial \theta'}\right|_{\theta_0} \right].$$

If we let $\hat{\theta}_n$ denote the root of $\partial Q_n / \partial \theta = 0$, then:
$$\sqrt{n}\left( \hat{\theta}_n - \theta_0 \right) \xrightarrow{d} N\left( 0, \; A(\theta_0)^{-1} B(\theta_0) A(\theta_0)^{-1} \right).$$
Proof. By assumption we have
$$\left.\frac{\partial Q_n}{\partial \theta}\right|_{\hat{\theta}_n} = 0.$$
Then, taking a Taylor expansion of the l.h.s. around $\theta_0$, we have
$$0 = \left.\frac{\partial Q_n}{\partial \theta}\right|_{\hat{\theta}_n} = \left.\frac{\partial Q_n}{\partial \theta}\right|_{\theta_0} + \left.\frac{\partial^2 Q_n}{\partial \theta \partial \theta'}\right|_{\theta_n^*} \left( \hat{\theta}_n - \theta_0 \right) + o_p(1),$$
where $\theta_n^*$ lies between $\hat{\theta}_n$ and $\theta_0$. Multiplying by $\sqrt{n}$, we get:
$$\sqrt{n}\left.\frac{\partial Q_n}{\partial \theta}\right|_{\theta_0} + \left.\frac{\partial^2 Q_n}{\partial \theta \partial \theta'}\right|_{\theta_n^*} \sqrt{n}\left( \hat{\theta}_n - \theta_0 \right) + o_p(1) = 0.$$
Rearranging, we get:
$$\sqrt{n}\left( \hat{\theta}_n - \theta_0 \right) = -\left( \left.\frac{\partial^2 Q_n}{\partial \theta \partial \theta'}\right|_{\theta_n^*} \right)^{-1} \sqrt{n}\left.\frac{\partial Q_n}{\partial \theta}\right|_{\theta_0} + o_p(1).$$
Since $\hat{\theta}_n \xrightarrow{p} \theta_0$, we have $\theta_n^* \xrightarrow{p} \theta_0$, so the first object on the r.h.s. becomes
$$\left.\frac{\partial^2 Q_n}{\partial \theta \partial \theta'}\right|_{\theta_n^*} \xrightarrow{p} \left.\frac{\partial^2 Q_n}{\partial \theta \partial \theta'}\right|_{\theta_0} \xrightarrow{p} A(\theta_0),$$
where $A(\theta_0)$ is constant. As for the second object on the r.h.s., by assumption,
$$\sqrt{n}\left.\frac{\partial Q_n(\theta)}{\partial \theta}\right|_{\theta_0} \xrightarrow{d} N\bigl(0, B(\theta_0)\bigr).$$
Putting this all together we have, by Slutsky's Theorem,
$$\sqrt{n}\left( \hat{\theta}_n - \theta_0 \right) \xrightarrow{d} N\left( 0, \; A(\theta_0)^{-1} B(\theta_0) A(\theta_0)^{-1} \right). \;\blacksquare$$
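A minimal numerical sketch of the sandwich formula (my example; the Bernoulli model and all numbers are assumptions, not from the slides): estimate $A(\theta_0)$ and $B(\theta_0)$ by their sample analogues at $\hat{\theta}_n$ and compare the implied standard error with the known benchmark $\sqrt{p_0(1-p_0)/n}$ for the Bernoulli MLE.

```python
import numpy as np

rng = np.random.default_rng(1)
p0, n = 0.4, 5000
y = rng.binomial(1, p0, size=n)  # hypothetical data

p_hat = y.mean()  # the extremum estimator: root of dQ_n/dp = 0

# Per-observation score and Hessian of q(y, p) = y log p + (1 - y) log(1 - p),
# evaluated at p_hat
score = y / p_hat - (1 - y) / (1 - p_hat)
hess = -y / p_hat**2 - (1 - y) / (1 - p_hat)**2

A_hat = hess.mean()        # sample analogue of A(theta_0)
B_hat = np.mean(score**2)  # sample analogue of B(theta_0)

avar = B_hat / A_hat**2    # A^{-1} B A^{-1} in the scalar case
print(np.sqrt(avar / n))           # sandwich standard error
print(np.sqrt(p0 * (1 - p0) / n))  # theoretical benchmark; ~ equal
```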
Observe that assumption (4) is a consequence of a central limit theorem:
$$\sqrt{n}\left( \left.\frac{\partial Q_n(\theta)}{\partial \theta}\right|_{\theta_0} \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left.\frac{\partial q(y_i, \theta)}{\partial \theta}\right|_{\theta_0},$$
a sum of i.i.d. random variables with mean zero, normed by $\sqrt{n}$ (here $q(y_i, \theta)$ is observation $i$'s contribution to $Q_n(\theta) = n^{-1} \sum_{i=1}^{n} q(y_i, \theta)$). We get, by a CLT, that the asymptotic variance of this random variable is
$$E\left[ \left( \frac{\partial q(y, \theta)}{\partial \theta} \right) \left( \frac{\partial q(y, \theta)}{\partial \theta} \right)' \right]_{\theta_0} = B(\theta_0).$$
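A standard special case worth recording (my addition; the slides do not spell it out): for maximum likelihood, $q(y, \theta) = \log f(y; \theta)$, and the information matrix equality makes the sandwich collapse to the inverse information.

```latex
% For MLE, q(y, theta) = log f(y; theta). Under regularity conditions,
%   A(theta_0) = E[ \partial^2 \log f / \partial\theta \partial\theta' ] = -I(theta_0),
%   B(theta_0) = E[ (\partial \log f / \partial\theta)(\partial \log f / \partial\theta)' ] = I(theta_0),
% so the sandwich covariance collapses:
\[
  A(\theta_0)^{-1} B(\theta_0) A(\theta_0)^{-1}
    = \bigl(-I(\theta_0)\bigr)^{-1} I(\theta_0) \bigl(-I(\theta_0)\bigr)^{-1}
    = I(\theta_0)^{-1},
\]
\[
  \sqrt{n}\,\bigl(\hat{\theta}_n - \theta_0\bigr)
    \xrightarrow{d} N\bigl(0,\; I(\theta_0)^{-1}\bigr).
\]
```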
References

[1] Amemiya, T. (1985). Advanced Econometrics. Cambridge, MA: Harvard University Press, chapter 4.

[2] Newey, W. K. and D. McFadden (1994). "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Volume IV, chapter 36.