Least Squares Estimation: Large-Sample Properties
Ping Yu
School of Economics and Finance
The University of Hong Kong
1 Asymptotics for the LSE
2 Covariance Matrix Estimators
3 Functions of Parameters
4 The t Test
5 p-Value
6 Confidence Interval
7 The Wald Test
8 Confidence Region
  Problems with Tests of Nonlinear Hypotheses
9 Test Consistency
10 Asymptotic Local Power
Introduction

If $u|\mathbf{x} \sim N(0,\sigma^2)$, we have shown that $\hat{\beta}\,|\,\mathbf{X} \sim N\!\left(\beta,\ \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$.

In general the distribution of $u|\mathbf{x}$ is unknown. Even if it is known, the unconditional distribution of $\hat{\beta}$ is hard to derive since $\hat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is a complicated function of $\{x_i\}_{i=1}^{n}$.

The asymptotic (or large-sample) method approximates the (unconditional) sampling distribution based on the limiting experiment in which the sample size $n$ tends to infinity. It does not require any assumption on the distribution of $u|\mathbf{x}$; only some moment restrictions are imposed.

Three steps: consistency, asymptotic normality, and estimation of the covariance matrix.
Asymptotics for the LSE
Consistency

Express $\hat\beta$ as
$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u. \quad (1)$$

To show $\hat\beta$ is consistent, we impose the following additional assumptions.

Assumption OLS.1': $\mathrm{rank}\left(E[xx']\right) = k$.
Assumption OLS.2': $y = x'\beta + u$ with $E[xu] = 0$.

Assumption OLS.1' implicitly assumes that $E\!\left[\|x\|^2\right] < \infty$.
Assumption OLS.1' is the large-sample counterpart of Assumption OLS.1.
Assumption OLS.2' is weaker than Assumption OLS.2.
Theorem
Under Assumptions OLS.0, OLS.1', OLS.2' and OLS.3, $\hat\beta \xrightarrow{p} \beta$.

Proof.
From (1), to show $\hat\beta \xrightarrow{p} \beta$, we need only show that $(X'X)^{-1}X'u \xrightarrow{p} 0$. Note that
$$(X'X)^{-1}X'u = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i u_i\right) = g\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i',\ \frac{1}{n}\sum_{i=1}^{n} x_i u_i\right) \xrightarrow{p} E\left[x_i x_i'\right]^{-1} E\left[x_i u_i\right] = 0.$$
Here, the convergence in probability follows from (I) the WLLN, which implies
$$\frac{1}{n}\sum_{i=1}^{n} x_i x_i' \xrightarrow{p} E\left[x_i x_i'\right] \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^{n} x_i u_i \xrightarrow{p} E\left[x_i u_i\right]; \quad (2)$$
and (II) the fact that $g(A, b) = A^{-1}b$ is a continuous function at $\left(E\left[x_i x_i'\right], E\left[x_i u_i\right]\right)$. The last equality is from Assumption OLS.2'.
Proof (continued).
(I) To apply the WLLN, we require (i) $x_i x_i'$ and $x_i u_i$ are i.i.d., which is implied by Assumption OLS.0 and the fact that functions of i.i.d. data are also i.i.d.; and (ii) $E\!\left[\|x\|^2\right] < \infty$ (OLS.1') and $E\left[\|xu\|\right] < \infty$. $E\left[\|xu\|\right] < \infty$ is implied by the Cauchy-Schwarz inequality,[a]
$$E\left[\|xu\|\right] \le E\!\left[\|x\|^2\right]^{1/2} E\!\left[|u|^2\right]^{1/2},$$
which is finite by Assumptions OLS.1' and OLS.3.
(II) To guarantee that $A^{-1}b$ is a continuous function at $\left(E\left[x_i x_i'\right], E\left[x_i u_i\right]\right)$, we must assume that $E\left[x_i x_i'\right]^{-1}$ exists, which is implied by Assumption OLS.1'.[b]

[a] Cauchy-Schwarz inequality: For any random $m \times n$ matrices $X$ and $Y$, $E\left[\|X'Y\|\right] \le E\!\left[\|X\|^2\right]^{1/2} E\!\left[\|Y\|^2\right]^{1/2}$, where the inner product is defined as $\langle X, Y\rangle = E\left[\|X'Y\|\right]$, and for an $m \times n$ matrix $A$, $\|A\| = \left(\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2\right)^{1/2} = \left[\mathrm{trace}(A'A)\right]^{1/2}$.
[b] If $x_i \in \mathbb{R}$, $E\left[x_i x_i'\right]^{-1} = E\left[x_i^2\right]^{-1}$ is the reciprocal of $E\left[x_i^2\right]$, which is a continuous function of $E\left[x_i^2\right]$ only if $E\left[x_i^2\right] \neq 0$.
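The consistency result can be illustrated by simulation. The following is a minimal sketch, not part of the original slides; the design (true coefficients, a t-distributed error with $E[xu]=0$) is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])  # illustrative true coefficients (intercept, slope)

for n in [100, 1_000, 100_000]:
    x = np.column_stack([np.ones(n), rng.normal(size=n)])  # regressors with a constant
    u = rng.standard_t(df=5, size=n)                        # non-normal error with E[x u] = 0
    y = x @ beta + u
    # LSE: beta_hat = (X'X)^{-1} X'y
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    print(n, beta_hat)  # beta_hat approaches beta as n grows, as the theorem predicts
```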
Consistency of $\hat\sigma^2$ and $s^2$

Theorem
Under the assumptions of Theorem 1, $\hat\sigma^2 \xrightarrow{p} \sigma^2$ and $s^2 \xrightarrow{p} \sigma^2$.

Proof.
Note that
$$\hat u_i = y_i - x_i'\hat\beta = u_i + x_i'\beta - x_i'\hat\beta = u_i - x_i'\left(\hat\beta - \beta\right).$$
Thus
$$\hat u_i^2 = u_i^2 - 2 u_i x_i'\left(\hat\beta - \beta\right) + \left(\hat\beta - \beta\right)' x_i x_i' \left(\hat\beta - \beta\right) \quad (3)$$
and
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \hat u_i^2$$
Proof (continued).
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} u_i^2 - 2\left(\frac{1}{n}\sum_{i=1}^{n} u_i x_i'\right)\left(\hat\beta - \beta\right) + \left(\hat\beta - \beta\right)'\left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)\left(\hat\beta - \beta\right) \xrightarrow{p} \sigma^2,$$
where the convergence uses the WLLN, (2), Theorem 1 and the CMT. Finally, since $n/(n-k) \to 1$, it follows that
$$s^2 = \frac{n}{n-k}\,\hat\sigma^2 \xrightarrow{p} \sigma^2$$
by the CMT.

One implication of this theorem is that multiple estimators can be consistent for the same population parameter. While $\hat\sigma^2$ and $s^2$ are unequal in any given application, they are close in value when $n$ is very large.
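A short numerical sketch of this point (my own illustration, using an arbitrary design with $\sigma^2 = 2.25$): both $\hat\sigma^2$ and $s^2 = \frac{n}{n-k}\hat\sigma^2$ settle near $\sigma^2$, and the gap between them vanishes as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma = np.array([1.0, 2.0]), 1.5   # illustrative true parameters, sigma^2 = 2.25

for n in [50, 500, 50_000]:
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = x @ beta + sigma * rng.normal(size=n)
    k = x.shape[1]
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    u_hat = y - x @ beta_hat
    sigma2_hat = u_hat @ u_hat / n        # hat{sigma}^2 = (1/n) sum of squared residuals
    s2 = u_hat @ u_hat / (n - k)          # s^2 = n/(n-k) * hat{sigma}^2
    print(n, sigma2_hat, s2)              # both converge to sigma^2 = 2.25
```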
Asymptotic Normality

To study the asymptotic normality of $\hat\beta$, we impose the following additional assumption.

Assumption OLS.5: $E\left[u^4\right] < \infty$ and $E\!\left[\|x\|^4\right] < \infty$.

Theorem
Under Assumptions OLS.0, OLS.1', OLS.2', OLS.3 and OLS.5,
$$\sqrt{n}\left(\hat\beta - \beta\right) \xrightarrow{d} N(0, V),$$
where $V = Q^{-1}\Omega Q^{-1}$ with $Q = E\left[x_i x_i'\right]$ and $\Omega = E\left[x_i x_i' u_i^2\right]$.

Proof.
From (1),
$$\sqrt{n}\left(\hat\beta - \beta\right) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i u_i\right).$$
Proof (continued).
Note first that
$$E\left[\left\|x_i x_i' u_i^2\right\|\right] \le E\left[\left\|x_i x_i'\right\|^2\right]^{1/2} E\left[u_i^4\right]^{1/2} \le E\left[\|x_i\|^4\right]^{1/2} E\left[u_i^4\right]^{1/2} < \infty, \quad (4)$$
where the first inequality is from the Cauchy-Schwarz inequality, the second inequality is from the Schwarz matrix inequality,[a] and the last inequality is from Assumption OLS.5. So by the CLT,
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i u_i \xrightarrow{d} N(0, \Omega).$$
Given that $n^{-1}\sum_{i=1}^{n} x_i x_i' \xrightarrow{p} Q$,
$$\sqrt{n}\left(\hat\beta - \beta\right) \xrightarrow{d} Q^{-1} N(0, \Omega) = N(0, V)$$
by Slutsky's theorem.

[a] Schwarz matrix inequality: For any random $m \times n$ matrices $X$ and $Y$, $\|X'Y\| \le \|X\|\,\|Y\|$. This is a special form of the Cauchy-Schwarz inequality, where the inner product is defined as $\langle X, Y\rangle = \|X'Y\|$.

In the homoskedastic model, $V$ reduces to $V^0 = \sigma^2 Q^{-1}$. We call $V^0$ the homoskedastic covariance matrix.
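A Monte Carlo sketch of the theorem (my own illustration, not from the slides): under a heteroskedastic design, the simulated variance of $\sqrt{n}(\hat\beta - \beta)$ should be close to the sandwich $V = Q^{-1}\Omega Q^{-1}$ rather than to $\sigma^2 Q^{-1}$. The particular conditional variance $(0.5 + |z|)^2$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, n, reps = np.array([1.0, 2.0]), 500, 2_000

draws = []
for _ in range(reps):
    z = rng.normal(size=n)
    x = np.column_stack([np.ones(n), z])
    u = (0.5 + np.abs(z)) * rng.normal(size=n)   # heteroskedastic error: Var(u|x) = (0.5+|z|)^2
    y = x @ beta + u
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    draws.append(np.sqrt(n) * (beta_hat - beta))
draws = np.array(draws)
print("simulated Var of sqrt(n)(beta_hat - beta):\n", np.cov(draws.T))

# Approximate the population sandwich V = Q^{-1} Omega Q^{-1} with a large simulated sample.
z = rng.normal(size=1_000_000)
X = np.column_stack([np.ones(z.size), z])
u2 = (0.5 + np.abs(z)) ** 2                      # E[u^2 | x]
Q = X.T @ X / z.size
Omega = (X * u2[:, None]).T @ X / z.size
print("sandwich V:\n", np.linalg.inv(Q) @ Omega @ np.linalg.inv(Q))
```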
Partitioned Formula of $V^0$

Sometimes, to state the asymptotic distribution of part of $\hat\beta$ as in the residual regression, we partition $Q$ and $\Omega$ as
$$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}, \qquad \Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.$$
Recall from the proof of the FWL theorem,
$$Q^{-1} = \begin{pmatrix} Q_{11.2}^{-1} & -Q_{11.2}^{-1} Q_{12} Q_{22}^{-1} \\ -Q_{22.1}^{-1} Q_{21} Q_{11}^{-1} & Q_{22.1}^{-1} \end{pmatrix},$$
where $Q_{11.2} = Q_{11} - Q_{12} Q_{22}^{-1} Q_{21}$ and $Q_{22.1} = Q_{22} - Q_{21} Q_{11}^{-1} Q_{12}$.
Thus when the error is homoskedastic, $n \cdot \mathrm{AVar}\!\left(\hat\beta_1\right) = \sigma^2 Q_{11.2}^{-1}$, and
$$n \cdot \mathrm{ACov}\!\left(\hat\beta_1, \hat\beta_2\right) = -\sigma^2 Q_{11.2}^{-1} Q_{12} Q_{22}^{-1}.$$
We can also derive the general formulas in the heteroskedastic case, but those formulas are not easily interpretable and so are less useful.
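A quick numerical check of the partitioned-inverse formula (a sketch with an arbitrary positive definite $Q$ of my own choosing, partitioned into 2+2 blocks):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
Q = A @ A.T + 4 * np.eye(4)                      # an arbitrary positive definite Q
Q11, Q12, Q21, Q22 = Q[:2, :2], Q[:2, 2:], Q[2:, :2], Q[2:, 2:]

Q11_2 = Q11 - Q12 @ np.linalg.inv(Q22) @ Q21     # Q_{11.2}
Q22_1 = Q22 - Q21 @ np.linalg.inv(Q11) @ Q12     # Q_{22.1}

Q_inv_block = np.block([
    [np.linalg.inv(Q11_2), -np.linalg.inv(Q11_2) @ Q12 @ np.linalg.inv(Q22)],
    [-np.linalg.inv(Q22_1) @ Q21 @ np.linalg.inv(Q11), np.linalg.inv(Q22_1)],
])
print(np.allclose(Q_inv_block, np.linalg.inv(Q)))  # True: the block formula reproduces Q^{-1}
```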
LSE as a MoM Estimator

The LSE is a MoM estimator, and the moment conditions are $E[xu] = 0$ with $u = y - x'\beta$. The sample analog is the normal equation
$$\frac{1}{n}\sum_{i=1}^{n} x_i\left(y_i - x_i'\beta\right) = 0,$$
the solution of which is exactly the LSE.
Here $M = -E\left[x_i x_i'\right] = -Q$ and $\Omega = E\left[x_i x_i' u_i^2\right]$, so
$$\sqrt{n}\left(\hat\beta - \beta\right) \xrightarrow{d} N\!\left(0,\ Q^{-1}\Omega Q^{-1}\right) = N(0, V).$$
Note that the asymptotic variance $V$ takes the sandwich form. The larger the $E\left[x_i x_i'\right]$, the smaller the $V$.
Although the LSE is a MoM estimator, it is a special MoM estimator because it can be treated as a "projection" estimator.
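Because $V$ has the sandwich form, a natural plug-in estimate is $\hat V = \hat Q^{-1}\hat\Omega\hat Q^{-1}$ with $\hat Q = n^{-1}\sum x_i x_i'$ and $\hat\Omega = n^{-1}\sum x_i x_i' \hat u_i^2$ (covariance matrix estimators are treated in detail in the next section). The sketch below is my own illustration of this computation; the data-generating design is arbitrary.

```python
import numpy as np

def ols_sandwich(x, y):
    """LSE with the plug-in sandwich variance V_hat = Q_hat^{-1} Omega_hat Q_hat^{-1}."""
    n = x.shape[0]
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    u_hat = y - x @ beta_hat
    Q_hat = x.T @ x / n
    Omega_hat = (x * (u_hat ** 2)[:, None]).T @ x / n   # n^{-1} sum x_i x_i' u_hat_i^2
    V_hat = np.linalg.inv(Q_hat) @ Omega_hat @ np.linalg.inv(Q_hat)
    se = np.sqrt(np.diag(V_hat) / n)                    # standard errors for beta_hat
    return beta_hat, V_hat, se

rng = np.random.default_rng(4)
n = 1_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ np.array([1.0, 2.0]) + (1 + np.abs(x[:, 1])) * rng.normal(size=n)
print(ols_sandwich(x, y))
```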
Intuition

Consider a simple linear regression model $y_i = \beta x_i + u_i$, where $E[x_i]$ is normalized to be 0. From introductory econometrics courses,
$$\hat\beta = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)},$$
and under homoskedasticity,
$$\mathrm{AVar}\!\left(\hat\beta\right) = \frac{\sigma^2}{n\,\mathrm{Var}(x)}.$$
So the larger the $\mathrm{Var}(x)$, the smaller the $\mathrm{AVar}\!\left(\hat\beta\right)$. Actually, $\mathrm{Var}(x) = -\dfrac{\partial E[xu]}{\partial\beta}$.
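A tiny simulation sketch of this intuition (my own example): doubling the standard deviation of $x$ roughly quarters the sampling variance of $\hat\beta$, matching $\sigma^2/(n\,\mathrm{Var}(x))$.

```python
import numpy as np

rng = np.random.default_rng(5)
beta, sigma, n, reps = 2.0, 1.0, 200, 20_000

for sd_x in [1.0, 2.0]:                       # two designs with different Var(x)
    betas = []
    for _ in range(reps):
        x = rng.normal(scale=sd_x, size=n)    # E[x] = 0 by construction
        y = beta * x + sigma * rng.normal(size=n)
        betas.append(np.sum(x * y) / np.sum(x ** 2))
    # theory: AVar(beta_hat) = sigma^2 / (n * Var(x))
    print(sd_x, np.var(betas), sigma ** 2 / (n * sd_x ** 2))
```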
Asymptotics for the Weighted Least Squares (WLS) Estimator

The WLS estimator is a special GLS estimator with a diagonal weight matrix. Recall that $\hat\beta_{GLS} = (X'WX)^{-1}X'Wy$, which reduces to
$$\hat\beta_{WLS} = \left(\sum_{i=1}^{n} w_i x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n} w_i x_i y_i\right)$$
when $W = \mathrm{diag}\{w_1, \cdots, w_n\}$.
Note that this estimator is a MoM estimator under the moment condition (check!)
$$E[w_i x_i u_i] = 0,$$
so
$$\sqrt{n}\left(\hat\beta_{WLS} - \beta\right) \xrightarrow{d} N(0, V_W),$$
where $V_W = E\left[w_i x_i x_i'\right]^{-1} E\left[w_i^2 x_i x_i' u_i^2\right] E\left[w_i x_i x_i'\right]^{-1}$.
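A minimal sketch of the WLS computation and a plug-in estimate of $V_W$ (my own illustration; the weights are assumed known here and are taken proportional to $1/\mathrm{Var}(u_i|x_i)$, a common choice):

```python
import numpy as np

def wls(x, y, w):
    """WLS estimator with a plug-in sandwich estimate of V_W."""
    n = x.shape[0]
    xw = x * w[:, None]
    beta_hat = np.linalg.solve(xw.T @ x, xw.T @ y)     # (sum w_i x_i x_i')^{-1} sum w_i x_i y_i
    u_hat = y - x @ beta_hat
    A = xw.T @ x / n                                   # estimate of E[w_i x_i x_i']
    B = (x * ((w * u_hat) ** 2)[:, None]).T @ x / n    # estimate of E[w_i^2 x_i x_i' u_i^2]
    V_W = np.linalg.inv(A) @ B @ np.linalg.inv(A)
    return beta_hat, V_W

rng = np.random.default_rng(6)
n = 2_000
z = rng.uniform(0.5, 2.0, size=n)
x = np.column_stack([np.ones(n), z])
y = x @ np.array([1.0, 2.0]) + z * rng.normal(size=n)  # Var(u_i | x_i) = z_i^2
beta_hat, V_W = wls(x, y, w=1 / z ** 2)                # weights proportional to 1/Var(u|x)
print(beta_hat, np.sqrt(np.diag(V_W) / n))             # estimates and standard errors
```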
Covariance Matrix Estimators