

  1. Least Squares Estimation: Large-Sample Properties. Ping Yu, School of Economics and Finance, The University of Hong Kong.

  2. Outline

1 Asymptotics for the LSE
2 Covariance Matrix Estimators
3 Functions of Parameters
4 The t Test
5 p-Value
6 Confidence Interval
7 The Wald Test
  Confidence Region
8 Problems with Tests of Nonlinear Hypotheses
9 Test Consistency
10 Asymptotic Local Power

  3. Introduction

If $u \mid x \sim N(0, \sigma^2)$, we have shown that $\hat\beta \mid X \sim N\!\left(\beta, \sigma^2 (X'X)^{-1}\right)$.

In general the distribution of $u \mid x$ is unknown. Even if it is known, the unconditional distribution of $\hat\beta$ is hard to derive since $\hat\beta = (X'X)^{-1}X'y$ is a complicated function of $\{x_i\}_{i=1}^n$.

The asymptotic (or large-sample) method approximates (unconditional) sampling distributions based on the limiting experiment in which the sample size $n$ tends to infinity. It does not require any assumption on the distribution of $u \mid x$; only some moment restrictions are imposed.

Three steps: consistency, asymptotic normality, and estimation of the covariance matrix.

  4. Asymptotics for the LSE

  5. Asymptotics for the LSE: Consistency

Express $\hat\beta$ as
$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u. \qquad (1)$$

To show $\hat\beta$ is consistent, we impose the following additional assumptions.

Assumption OLS.1': $\operatorname{rank}(E[xx']) = k$.

Assumption OLS.2': $y = x'\beta + u$ with $E[xu] = 0$.

Assumption OLS.1' implicitly assumes that $E\big[\|x\|^2\big] < \infty$. Assumption OLS.1' is the large-sample counterpart of Assumption OLS.1. Assumption OLS.2' is weaker than Assumption OLS.2.
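
As a minimal numerical sketch of decomposition (1) (not part of the original slides; the design, coefficient values, and NumPy usage below are purely illustrative), the LSE computed from the data coincides with $\beta + (X'X)^{-1}X'u$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
beta = np.array([1.0, 2.0, -0.5])          # illustrative true coefficients

X = rng.normal(size=(n, k))
u = rng.normal(size=n)
y = X @ beta + u

# LSE computed directly from the data...
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# ...and via the sampling-error form beta + (X'X)^{-1} X'u in (1)
beta_decomp = beta + np.linalg.solve(X.T @ X, X.T @ u)

print(np.allclose(beta_hat, beta_decomp))  # True (up to floating-point error)
```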

  6. Asymptotics for the LSE

Theorem. Under Assumptions OLS.0, OLS.1', OLS.2' and OLS.3, $\hat\beta \xrightarrow{p} \beta$.

Proof. From (1), to show $\hat\beta \xrightarrow{p} \beta$, we need only show that $(X'X)^{-1}X'u \xrightarrow{p} 0$. Note that
$$(X'X)^{-1}X'u = \left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n x_i u_i\right) = g\!\left(\frac{1}{n}\sum_{i=1}^n x_i x_i',\ \frac{1}{n}\sum_{i=1}^n x_i u_i\right) \xrightarrow{p} E[x_i x_i']^{-1} E[x_i u_i] = 0.$$
Here, the convergence in probability is from (I) the WLLN, which implies
$$\frac{1}{n}\sum_{i=1}^n x_i x_i' \xrightarrow{p} E[x_i x_i'] \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^n x_i u_i \xrightarrow{p} E[x_i u_i]; \qquad (2)$$
and (II) the fact that $g(A, b) = A^{-1}b$ is a continuous function at $\left(E[x_i x_i'], E[x_i u_i]\right)$. The last equality is from Assumption OLS.2'.
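
A small simulation sketch of this consistency argument, assuming an illustrative data-generating process with heavy-tailed (non-normal) errors: the sample moments in (2) converge, so $\hat\beta$ approaches $\beta$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0])               # illustrative true coefficients

for n in [100, 1_000, 10_000, 100_000]:
    X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
    u = rng.standard_t(df=5, size=n)       # non-normal errors; only moments matter
    y = X @ beta + u
    Qxx = X.T @ X / n                      # sample analog of E[x x'] (WLLN)
    Qxu = X.T @ u / n                      # sample analog of E[x u] = 0 (WLLN)
    beta_hat = beta + np.linalg.solve(Qxx, Qxu)
    print(n, np.abs(beta_hat - beta).max())   # shrinks toward 0 as n grows
```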

  7. Asymptotics for the LSE

Proof (continued). (I) To apply the WLLN, we require (i) $x_i x_i'$ and $x_i u_i$ are i.i.d., which is implied by Assumption OLS.0 and the fact that functions of i.i.d. data are also i.i.d.; and (ii) $E\big[\|x\|^2\big] < \infty$ (OLS.1') and $E[\|xu\|] < \infty$. $E[\|xu\|] < \infty$ is implied by the Cauchy-Schwarz inequality,[a]
$$E[\|xu\|] \le E\big[\|x\|^2\big]^{1/2} E\big[|u|^2\big]^{1/2},$$
which is finite by Assumptions OLS.1' and OLS.3. (II) To guarantee that $A^{-1}b$ is a continuous function at $\left(E[x_i x_i'], E[x_i u_i]\right)$, we must assume that $E[x_i x_i']^{-1}$ exists, which is implied by Assumption OLS.1'.[b]

[a] Cauchy-Schwarz inequality: for any random $m \times n$ matrices $X$ and $Y$, $E[\|X'Y\|] \le E\big[\|X\|^2\big]^{1/2} E\big[\|Y\|^2\big]^{1/2}$, where the inner product is defined as $\langle X, Y \rangle = E[\|X'Y\|]$, and for an $m \times n$ matrix $A$, $\|A\| = \big(\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2\big)^{1/2} = [\operatorname{trace}(A'A)]^{1/2}$.

[b] If $x_i \in \mathbb{R}$, $E[x_i x_i']^{-1} = E[x_i^2]^{-1}$ is the reciprocal of $E[x_i^2]$, which is a continuous function of $E[x_i^2]$ only if $E[x_i^2] \neq 0$.
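
For concreteness, a quick NumPy check of the Cauchy-Schwarz bound used here, on simulated data (the joint distribution of $(x, u)$ below is an arbitrary illustration); the sample analog of the inequality holds exactly as well.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 200_000, 3
x = rng.normal(size=(n, k))
u = rng.standard_t(df=6, size=n) * (1.0 + 0.5 * np.abs(x[:, 0]))  # u depends on x

lhs = np.mean(np.linalg.norm(x * u[:, None], axis=1))        # sample E[||x u||]
rhs = np.sqrt(np.mean(np.sum(x**2, axis=1))) * np.sqrt(np.mean(u**2))
print(lhs <= rhs, round(lhs, 3), round(rhs, 3))               # the bound holds
```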

  8. Asymptotics for the LSE: Consistency of $\hat\sigma^2$ and $s^2$

Theorem. Under the assumptions of Theorem 1, $\hat\sigma^2 \xrightarrow{p} \sigma^2$ and $s^2 \xrightarrow{p} \sigma^2$.

Proof. Note that
$$\hat u_i = y_i - x_i'\hat\beta = u_i + x_i'\beta - x_i'\hat\beta = u_i - x_i'\big(\hat\beta - \beta\big).$$
Thus
$$\hat u_i^2 = u_i^2 - 2 u_i x_i'\big(\hat\beta - \beta\big) + \big(\hat\beta - \beta\big)' x_i x_i' \big(\hat\beta - \beta\big) \qquad (3)$$
and
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat u_i^2.$$

  9. Asymptotics for the LSE

Proof (continued).
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n u_i^2 - 2\left(\frac{1}{n}\sum_{i=1}^n u_i x_i'\right)\big(\hat\beta - \beta\big) + \big(\hat\beta - \beta\big)'\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)\big(\hat\beta - \beta\big) \xrightarrow{p} \sigma^2,$$
where the last step uses the WLLN, (2), Theorem 1 and the CMT. Finally, since $n/(n-k) \to 1$, it follows that
$$s^2 = \frac{n}{n-k}\,\hat\sigma^2 \xrightarrow{p} \sigma^2$$
by the CMT.

One implication of this theorem is that multiple estimators can be consistent for the same population parameter. While $\hat\sigma^2$ and $s^2$ are unequal in any given application, they are close in value when $n$ is very large.
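
A brief simulation sketch (all parameter values illustrative) showing that both $\hat\sigma^2$ and $s^2$ approach $\sigma^2$, and that they differ only by the factor $n/(n-k) \to 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma2 = np.array([0.5, 1.5, -1.0]), 4.0   # illustrative true values

for n in [50, 500, 5_000, 50_000]:
    k = beta.size
    X = rng.normal(size=(n, k))
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    u_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals
    sigma2_hat = u_hat @ u_hat / n                      # divide by n
    s2 = u_hat @ u_hat / (n - k)                        # divide by n - k
    print(n, round(sigma2_hat, 4), round(s2, 4))        # both approach 4.0
```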

  10. Asymptotics for the LSE: Asymptotic Normality

To study the asymptotic normality of $\hat\beta$, we impose the following additional assumption.

Assumption OLS.5: $E[u^4] < \infty$ and $E\big[\|x\|^4\big] < \infty$.

Theorem. Under Assumptions OLS.0, OLS.1', OLS.2', OLS.3 and OLS.5,
$$\sqrt{n}\big(\hat\beta - \beta\big) \xrightarrow{d} N(0, V),$$
where $V = Q^{-1}\Omega Q^{-1}$ with $Q = E[x_i x_i']$ and $\Omega = E\big[x_i x_i' u_i^2\big]$.

Proof. From (1),
$$\sqrt{n}\big(\hat\beta - \beta\big) = \left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^n x_i u_i\right).$$

  11. Asymptotics for the LSE

Proof (continued). Note first that
$$E\big[\big\|x_i x_i' u_i^2\big\|\big] \le E\big[\big\|x_i x_i'\big\|^2\big]^{1/2} E\big[u_i^4\big]^{1/2} \le E\big[\|x_i\|^4\big]^{1/2} E\big[u_i^4\big]^{1/2} < \infty, \qquad (4)$$
where the first inequality is from the Cauchy-Schwarz inequality, the second inequality is from the Schwarz matrix inequality,[a] and the last inequality is from Assumption OLS.5. So by the CLT,
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n x_i u_i \xrightarrow{d} N(0, \Omega).$$
Given that $\frac{1}{n}\sum_{i=1}^n x_i x_i' \xrightarrow{p} Q$,
$$\sqrt{n}\big(\hat\beta - \beta\big) \xrightarrow{d} Q^{-1} N(0, \Omega) = N(0, V)$$
by Slutsky's theorem.

[a] Schwarz matrix inequality: for any random $m \times n$ matrices $X$ and $Y$, $\|X'Y\| \le \|X\|\,\|Y\|$. This is a special form of the Cauchy-Schwarz inequality, where the inner product is defined as $\langle X, Y \rangle = \|X'Y\|$.

In the homoskedastic model, $V$ reduces to $V^0 = \sigma^2 Q^{-1}$. We call $V^0$ the homoskedastic covariance matrix.
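
The sandwich form of $V$ can be seen in a small Monte Carlo sketch under an assumed heteroskedastic design (the data-generating process below is hypothetical): the sampling covariance of $\sqrt{n}(\hat\beta - \beta)$ across replications is close to $Q^{-1}\Omega Q^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 2.0])                    # illustrative true coefficients

def draw(n):
    x = rng.uniform(0.5, 2.0, size=n)
    X = np.column_stack([np.ones(n), x])
    u = rng.normal(scale=x, size=n)            # Var(u | x) = x^2: heteroskedastic
    return X, u, X @ beta + u

# Approximate Q = E[x x'] and Omega = E[x x' u^2] from one very large draw
Xb, ub, _ = draw(500_000)
Q = Xb.T @ Xb / len(ub)
Omega = (Xb * (ub**2)[:, None]).T @ Xb / len(ub)
V = np.linalg.inv(Q) @ Omega @ np.linalg.inv(Q)

# Monte Carlo distribution of sqrt(n)(beta_hat - beta)
n, reps = 1_000, 2_000
draws = np.empty((reps, 2))
for r in range(reps):
    X, u, y = draw(n)
    draws[r] = np.sqrt(n) * (np.linalg.solve(X.T @ X, X.T @ y) - beta)

print(np.cov(draws.T))   # close to the sandwich V
print(V)
```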

  12. Asymptotics for the LSE: Partitioned Formula of $V^0$

Sometimes, to state the asymptotic distribution of part of $\hat\beta$ as in the residual regression, we partition $Q$ and $\Omega$ as
$$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}, \qquad \Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.$$
Recall from the proof of the FWL theorem,
$$Q^{-1} = \begin{pmatrix} Q_{11.2}^{-1} & -Q_{11.2}^{-1} Q_{12} Q_{22}^{-1} \\ -Q_{22.1}^{-1} Q_{21} Q_{11}^{-1} & Q_{22.1}^{-1} \end{pmatrix},$$
where $Q_{11.2} = Q_{11} - Q_{12} Q_{22}^{-1} Q_{21}$ and $Q_{22.1} = Q_{22} - Q_{21} Q_{11}^{-1} Q_{12}$.

Thus when the error is homoskedastic, $n \cdot \mathrm{AVar}\big(\hat\beta_1\big) = \sigma^2 Q_{11.2}^{-1}$, and $n \cdot \mathrm{ACov}\big(\hat\beta_1, \hat\beta_2\big) = -\sigma^2 Q_{11.2}^{-1} Q_{12} Q_{22}^{-1}$.

We can also derive the general formulas in the heteroskedastic case, but these formulas are not easily interpretable and so less useful.
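
A quick numerical check of the partitioned-inverse formula above (a sketch with an arbitrary positive-definite $Q$; the block sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
k1, k2 = 2, 3
A = rng.normal(size=(k1 + k2, k1 + k2))
Q = A @ A.T + np.eye(k1 + k2)                  # arbitrary positive-definite Q

Q11, Q12 = Q[:k1, :k1], Q[:k1, k1:]
Q21, Q22 = Q[k1:, :k1], Q[k1:, k1:]
Q11_2 = Q11 - Q12 @ np.linalg.solve(Q22, Q21)  # Q_{11.2}
Q22_1 = Q22 - Q21 @ np.linalg.solve(Q11, Q12)  # Q_{22.1}

top = np.hstack([np.linalg.inv(Q11_2),
                 -np.linalg.inv(Q11_2) @ Q12 @ np.linalg.inv(Q22)])
bot = np.hstack([-np.linalg.inv(Q22_1) @ Q21 @ np.linalg.inv(Q11),
                 np.linalg.inv(Q22_1)])
print(np.allclose(np.vstack([top, bot]), np.linalg.inv(Q)))   # True
```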

  13. Asymptotics for the LSE: LSE as a MoM Estimator

The LSE is a MoM estimator, and the moment conditions are $E[xu] = 0$ with $u = y - x'\beta$. The sample analog is the normal equation
$$\frac{1}{n}\sum_{i=1}^n x_i \big(y_i - x_i'\beta\big) = 0,$$
the solution of which is exactly the LSE.

$M = -E[x_i x_i'] = -Q$, and $\Omega = E\big[x_i x_i' u_i^2\big]$, so
$$\sqrt{n}\big(\hat\beta - \beta\big) \xrightarrow{d} N\big(0, Q^{-1}\Omega Q^{-1}\big) = N(0, V).$$
Note that the asymptotic variance $V$ takes the sandwich form. The larger the $E[x_i x_i']$, the smaller the $V$.

Although the LSE is a MoM estimator, it is a special MoM estimator because it can be treated as a "projection" estimator.
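
A minimal sketch of the MoM view (simulated data, illustrative coefficients): solving the sample normal equation reproduces the LSE obtained from a standard least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta = 2_000, np.array([0.3, -1.2, 2.0])    # illustrative true coefficients
X = rng.normal(size=(n, beta.size))
y = X @ beta + rng.normal(size=n)

# Normal equation: (1/n) sum_i x_i (y_i - x_i' b) = 0, i.e. (X'X/n) b = X'y/n
b_mom = np.linalg.solve(X.T @ X / n, X.T @ y / n)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b_mom, b_ols))               # True: the MoM solution is the LSE
```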

  14. Asymptotics for the LSE: Intuition

Consider a simple linear regression model $y_i = \beta x_i + u_i$, where $E[x_i]$ is normalized to be 0. From introductory econometrics courses,
$$\hat\beta = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)},$$
and under homoskedasticity,
$$\mathrm{AVar}\big(\hat\beta\big) = \frac{\sigma^2}{n\,\mathrm{Var}(x)}.$$
So the larger the $\mathrm{Var}(x)$, the smaller the $\mathrm{AVar}\big(\hat\beta\big)$. Actually, $\mathrm{Var}(x) = \left|\frac{\partial E[xu]}{\partial \beta}\right|$.
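
A short simulation sketch of this intuition (all parameter values are illustrative): the Monte Carlo variance of $\hat\beta$ shrinks as $\mathrm{Var}(x)$ grows, in line with $\sigma^2 / (n\,\mathrm{Var}(x))$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta, sigma = 200, 5_000, 1.0, 1.0    # illustrative settings

for sd_x in [0.5, 1.0, 2.0]:
    b_hats = np.empty(reps)
    for r in range(reps):
        x = rng.normal(scale=sd_x, size=n)     # E[x] = 0, Var(x) = sd_x^2
        y = beta * x + rng.normal(scale=sigma, size=n)
        b_hats[r] = (x @ y) / (x @ x)
    # simulated variance of beta_hat vs. the formula sigma^2 / (n Var(x))
    print(sd_x, round(b_hats.var(), 5), round(sigma**2 / (n * sd_x**2), 5))
```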

  15. Asymptotics for the LSE: Asymptotics for the Weighted Least Squares (WLS) Estimator

The WLS estimator is a special GLS estimator with a diagonal weight matrix. Recall that $\hat\beta_{GLS} = (X'WX)^{-1}X'Wy$, which reduces to
$$\hat\beta_{WLS} = \left(\sum_{i=1}^n w_i x_i x_i'\right)^{-1}\left(\sum_{i=1}^n w_i x_i y_i\right)$$
when $W = \mathrm{diag}\{w_1, \cdots, w_n\}$.

Note that this estimator is a MoM estimator under the moment condition (check!) $E[w_i x_i u_i] = 0$, so
$$\sqrt{n}\big(\hat\beta_{WLS} - \beta\big) \xrightarrow{d} N(0, V_W),$$
where $V_W = E\big[w_i x_i x_i'\big]^{-1} E\big[w_i^2 x_i x_i' u_i^2\big] E\big[w_i x_i x_i'\big]^{-1}$.
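
A sketch of the WLS estimator and a plug-in estimate of $V_W$, assuming a hypothetical heteroskedastic design with weights $w_i = 1/x_i^2$ (inverse conditional variance, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta = 5_000, np.array([1.0, 0.5])           # illustrative true coefficients

x = rng.uniform(0.5, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
u = rng.normal(scale=x, size=n)                 # Var(u | x) = x^2
y = X @ beta + u
w = 1.0 / x**2                                  # diagonal of the weight matrix W

XtWX = (X * w[:, None]).T @ X                   # sum_i w_i x_i x_i'
beta_wls = np.linalg.solve(XtWX, (X * w[:, None]).T @ y)

# Plug-in sandwich: E[w x x']^{-1} E[w^2 x x' u^2] E[w x x']^{-1}
u_hat = y - X @ beta_wls
A = XtWX / n
B = (X * (w**2 * u_hat**2)[:, None]).T @ X / n
V_W = np.linalg.inv(A) @ B @ np.linalg.inv(A)

print(beta_wls)
print(V_W / n)        # approximate sampling variance of beta_wls itself
```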

  16. Covariance Matrix Estimators
