[Figure: scatterplot of the example data, y against x (axes 0–7), with the means $\bar{x}$ and $\bar{y}$ marked.]
Calculations

| $x_i$ | $y_i$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$ | $(x_i - \bar{x})^2$ |
|-------|-------|-----------------|-----------------|----------------------------------|---------------------|
| 1     | 1     | ?               | ?               | ?                                | ?                   |
| 2     | 5     | ?               | ?               | ?                                | ?                   |
| 3     | 3     | ?               | ?               | ?                                | ?                   |
| 4     | 6     | ?               | ?               | ?                                | ?                   |
| 5     | 2     | ?               | ?               | ?                                | ?                   |
| 6     | 7     | ?               | ?               | ?                                | ?                   |
Intercept $\hat{\beta}_0$

- Simple formula: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
- Intuition: the OLS fit always runs through the point $(\bar{x}, \bar{y})$
- Ex.: $\hat{\beta}_0 = 4 - 0.6857 \times 3.5 = 1.6$
- Fitted line: $\hat{y} = 1.6 + 0.6857\,x$
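As a quick check of the arithmetic, here is a minimal sketch in Python, using only the six data points from the calculations table, that computes the slope and intercept with the deviation-from-means formulas:

```python
# Example data from the calculations table above
x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 6, 2, 7]

n = len(x)
x_bar = sum(x) / n          # 3.5
y_bar = sum(y) / n          # 4.0

# slope = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 12.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # 17.5
beta1 = sxy / sxx                                               # ~0.6857
beta0 = y_bar - beta1 * x_bar                                   # 1.6

print(beta1, beta0)
```

Running it reproduces $\hat{\beta}_1 \approx 0.6857$ and $\hat{\beta}_0 = 1.6$ from the example.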
[Figure: the example scatterplot with the means $\bar{x}$ and $\bar{y}$ marked and the fitted OLS line $\hat{y} = 1.6 + 0.6857\,x$.]
Ways of Thinking About OLS

1. Estimating a unit-level causal effect
2. The ratio of $\mathrm{Cov}(X, Y)$ to $\mathrm{Var}(X)$
3. Minimizing the residual sum of squares (SSR)
OLS Minimizes SSR

- Total Sum of Squares (SST): $\sum_{i=1}^{n} (y_i - \bar{y})^2$
- We can partition SST into two parts (ANOVA):
  - Explained Sum of Squares (SSE)
  - Residual Sum of Squares (SSR)
- $SST = SSE + SSR$
- OLS is the line with the lowest SSR
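A minimal sketch of the decomposition, reusing the example data and the fitted line from above; it confirms numerically that SST equals SSE plus SSR:

```python
# Example data and fit from the worked example above
x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 6, 2, 7]
beta0, beta1 = 1.6, 12 / 17.5             # intercept and slope from the example

y_bar = sum(y) / len(y)
y_hat = [beta0 + beta1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
sse = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained sum of squares
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares

print(sst, sse + ssr)  # the two match (up to floating-point error)
```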
[Figures: a sequence of scatterplots of the example data, y against x, with the means $\bar{x}$ and $\bar{y}$ marked.]
Questions about OLS calculations?
Are Our Estimates Any Good?

Yes, if:
1. Works mathematically
2. Causally valid theory
3. Linear relationship between X and Y
4. X is measured without error
5. No missing data (or MCAR; see Lecture 5)
6. No confounding
Linear Relationship

- If linear, no problems
- If non-linear, we need to transform:
  - Power terms (e.g., $x^2$, $x^3$)
  - Logs (e.g., $\log(x)$)
  - Other transformations
- If categorical: convert to a set of indicators
- Multivariate interactions (next week)
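A minimal sketch of the transformation idea, using simulated (hypothetical) data: the non-linearity is handled by adding transformed columns to the design matrix, while the estimator itself remains ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
y = 2 + 0.5 * x + 0.3 * x**2 + rng.normal(size=200)  # true relationship is quadratic

# Design matrix with an intercept, x, and a power term x^2
X = np.column_stack([np.ones_like(x), x, x**2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)  # close to the true values 2, 0.5, 0.3

# A log transform works the same way: swap in a np.log(x) column
X_log = np.column_stack([np.ones_like(x), np.log(x)])
coefs_log, *_ = np.linalg.lstsq(X_log, y, rcond=None)
```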
Coefficient Interpretation Activity

Four types of variables:
1. Indicator (0, 1)
2. Categorical
3. Ordinal
4. Interval

How do we interpret a coefficient on each of these types of variables?
Notes on Interpretation

- The effect $\beta_1$ is constant across values of x
  - That is not true when there are:
    - Interaction terms (next week)
    - Nonlinear transformations (e.g., $x^2$)
    - Nonlinear regression models (e.g., logit/probit)
- Interpretations are sample-level
  - Sample representativeness determines generalizability
- Remember uncertainty
  - These are estimates, not population parameters
Measurement Error in Regressor(s)

We want the effect of the true variable $x^*$, but we observe $x$, where $x = x^* + w$:

$$
\begin{aligned}
y &= \beta_0 + \beta_1 x^* + \epsilon \\
  &= \beta_0 + \beta_1 (x - w) + \epsilon \\
  &= \beta_0 + \beta_1 x + (\epsilon - \beta_1 w) \\
  &= \beta_0 + \beta_1 x + v
\end{aligned}
$$
Measurement Error in Regressor(s)

- Produces attenuation: as measurement error increases, $\beta_1 \to 0$
- Our coefficients fit the observed data
- But they are biased estimates of our population equation
- This applies to all $\hat{\beta}$ in a multivariate regression
  - Direction of bias is unknown
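A minimal simulation sketch of attenuation (the data-generating values here are arbitrary): as the noise added to the observed regressor grows, the estimated slope shrinks from its true value toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x_true = rng.normal(size=n)
y = 1 + 2 * x_true + rng.normal(size=n)   # true slope is 2

for noise_sd in [0.0, 0.5, 1.0, 2.0]:
    x_obs = x_true + rng.normal(scale=noise_sd, size=n)  # mismeasured regressor
    X = np.column_stack([np.ones(n), x_obs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(noise_sd, round(beta[1], 3))
# Roughly 2, 1.6, 1.0, 0.4: the slope is attenuated by Var(x*)/(Var(x*) + Var(w))
```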
Measurement Error in Y

- Not necessarily a problem
- If random (i.e., uncorrelated with x), it costs us precision
- If systematic, who knows?!
- If censored, see Lectures 11 and/or 12
Missing Data

- Missing data can be a big problem
- We will discuss it in Lecture 5
Confounding (Selection Bias)

- If x is not randomly assigned, potential outcomes are not independent of x
- Other factors explain why a unit i received their particular value $x_i$
- In matching, we obtain this conditional independence by comparing units that are identical on all confounding variables
Omitted Variables

$$
\underbrace{E[Y_i \mid X_i = 1] - E[Y_i \mid X_i = 0]}_{\text{Naive Effect}} =
\underbrace{E[Y_{1i} \mid X_i = 1] - E[Y_{0i} \mid X_i = 1]}_{\text{Treatment Effect on Treated (ATT)}} +
\underbrace{E[Y_{0i} \mid X_i = 1] - E[Y_{0i} \mid X_i = 0]}_{\text{Selection Bias}}
$$
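A minimal simulation sketch of this decomposition with hypothetical potential outcomes: when units select into treatment based on $Y_{0i}$, the naive difference in means equals the ATT plus the selection-bias term exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
y0 = rng.normal(size=n)                     # potential outcome under control
y1 = y0 + 1.0                               # constant unit-level effect of 1
# Selection: units with higher y0 are more likely to take the treatment
x = (y0 + rng.normal(size=n) > 0).astype(int)

naive = y1[x == 1].mean() - y0[x == 0].mean()
att = (y1[x == 1] - y0[x == 1]).mean()                 # effect on the treated
selection = y0[x == 1].mean() - y0[x == 0].mean()      # baseline difference

print(round(naive, 3), round(att + selection, 3))      # the two match
```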
[Figure: causal graph with nodes Z, A, B, X, D, Y, and C.]
Omitted Variable Bias

We want to estimate:
$$Y = \beta_0 + \beta_1 X + \beta_2 Z + \epsilon$$

We actually estimate:
$$
\begin{aligned}
y &= \tilde{\beta}_0 + \tilde{\beta}_1 x + \epsilon \\
  &= \tilde{\beta}_0 + \tilde{\beta}_1 x + (0 \cdot z) + \epsilon \\
  &= \tilde{\beta}_0 + \tilde{\beta}_1 x + \nu
\end{aligned}
$$

Bias: $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $z = \tilde{\delta}_0 + \tilde{\delta}_1 x$
Size and Direction of Bias

Bias: $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $z = \tilde{\delta}_0 + \tilde{\delta}_1 x$

|               | $\mathrm{Corr}(x, z) < 0$ | $\mathrm{Corr}(x, z) > 0$ |
|---------------|---------------------------|---------------------------|
| $\beta_2 < 0$ | Positive                  | Negative                  |
| $\beta_2 > 0$ | Negative                  | Positive                  |
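A minimal simulation sketch of the bias formula, with arbitrary coefficients chosen so that $\mathrm{Corr}(x, z) > 0$ and $\beta_2 > 0$ (the bottom-right cell of the table): the slope from the "short" regression of y on x alone equals $\hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$ and is biased upward:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)            # x and z positively correlated
y = 1 + 2 * x + 3 * z + rng.normal(size=n)  # beta1 = 2, beta2 = 3

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_long = ols(np.column_stack([np.ones(n), x, z]), y)  # regression with x and z
b_short = ols(np.column_stack([np.ones(n), x]), y)    # omitting z
d = ols(np.column_stack([np.ones(n), x]), z)          # auxiliary: z on x

# Short-regression slope (> 2, i.e., biased upward) equals beta1_hat + beta2_hat * delta1_tilde
print(round(b_short[1], 3), round(b_long[1] + b_long[2] * d[1], 3))
```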
Aside: Three Meanings of “Endogeneity”

Formally, endogeneity is when $\mathrm{Cov}(X, \epsilon) \neq 0$

1. Measurement error in regressors
2. Omitted variables associated with included regressors
   - “Specification error”
   - Confounding
3. Lack of temporal precedence
Example: Englebert

- What is his research question?
- What is his theory?
- What does the graph look like?
- What is his analysis?
Common Conditioning Strategies

1. Condition on nothing (“naive effect”)
2. Condition on some variables
3. Condition on all observables

Which of these are good strategies?
What goes in our regression?

- Use theory to build causal models
  - Often, a causal graph helps
- Some guidance:
  - Include confounding variables
[Figure: causal graph with nodes Z, A, B, X, D, Y, and C.]