

  1. PS 405 – Week 5 Section: OLS Regression and Its Assumptions. D.J. Flynn, February 11, 2014.

  2. Today's plan
     - Basic OLS set-up
     - Estimation/interpretation of OLS models in R
     - Gauss-Markov assumptions

  3. Basic set-up
     - Scalar: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \epsilon_i$
     - Matrix: $Y_i = X_i \beta + \epsilon_i$
     - Y is a (quasi-)continuous outcome, the Xs are independent variables, and $\epsilon$ is a residual.
     - We'll use matrix form and assume X could include $k = 1, 2, \dots$ variables.
     - Our goal: specify a model (pick Xs) and estimate parameters ($\beta_0, \beta_1, \dots, \beta_K$) such that error is minimized.
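For reference (not on the slide): the error-minimizing estimates have the standard closed-form solution $\hat{\beta} = (X'X)^{-1} X'Y$. A minimal R sketch of that computation on simulated data is below; lm() does the same thing, more carefully, under the hood.

    # Closed-form OLS on simulated data (illustrative only)
    set.seed(405)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)    # true betas: 2, 1.5, -0.5
    X  <- cbind(1, x1, x2)                      # design matrix with a leading intercept column
    beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y
    beta.hat                                    # compare with coef(lm(y ~ x1 + x2))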

  4. Re-cap from last week: $Y_i = X_i \beta + \epsilon_i$, where $i = 1, 2, \dots, N$ and $k = 1, 2, \dots, K$. For each term:
     - vector or matrix?
     - size?
     - why are some (not all) terms indexed by i?

  5. Estimating OLS
     - Collect data on Y and X.
     - Estimate the model and obtain parameters: $\beta_0, \beta_1, \dots, \beta_K$.
     - Make predictions for each observation's outcome ($\hat{Y}$) via linear combination.
     Suppose our model is $Turnout_i = \beta_0 + \beta_1 Competitiveness_i + \beta_2 AdSpending_i + \epsilon_i$, where Turnout is measured 0-100, Competitiveness is a dummy, and AdSpending is measured 1-5. We estimate the model in R and get these coefficients: $\beta_0 = 11$, $\beta_C = 25$, $\beta_{AS} = 6.25$.

  6. Now we can predict turnout in any election given competitiveness and ad spending data. For a competitive election with lots of spending (5/5), the predicted level of turnout is $\hat{Y}_i = 11 + 25(1) + 6.25(5) = 67.25\%$. Suppose true turnout in that election was 71%. Then the residual is $u_i = Y_i - \hat{Y}_i = 71 - 67.25 = 3.75\%$. Recall, OLS estimates parameters such that these errors are minimized over the whole dataset: $\min_{\beta} \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$.
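The same arithmetic in R, purely for illustration (the coefficient values are the hypothetical ones from the example above):

    # Hypothetical coefficients from the turnout example
    b0 <- 11; b.comp <- 25; b.ads <- 6.25
    # Prediction for a competitive election (dummy = 1) with maximum ad spending (5)
    y.hat <- b0 + b.comp * 1 + b.ads * 5        # 67.25
    # Residual if observed turnout was 71%
    71 - y.hat                                  # 3.75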

  7. Estimating/Interpreting OLS in R
     - Practice estimating a model using the USArrests dataset:
         library(datasets)
         summary(USArrests)
         murder.model <- lm(Murder ~ Assault + Rape + UrbanPop, data = USArrests)
         summary(murder.model)
     - Thanks to the linearity assumption, we can interpret coefficients as the effect of a one-unit increase in X on Y.
     - Thus, we MUST know the units of X and Y to interpret.
     - Check out the description of the variable codings here.

  8. R output:
         Call:
         lm(formula = Murder ~ Assault + Rape + UrbanPop, data = USArrests)

         Residuals:
             Min      1Q  Median      3Q     Max
         -4.3990 -1.9127 -0.3444  1.2557  7.4279

         Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
         (Intercept)  3.276639   1.737997   1.885   0.0657 .
         Assault      0.039777   0.005912   6.729 2.33e-08 ***
         Rape         0.061399   0.055740   1.102   0.2764
         UrbanPop    -0.054694   0.027880  -1.962   0.0559 .
         ---
         Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  9. Let's look at fitted values:
         fitted.values(murder.model)
         predict.lm(murder.model, interval = "confidence")
         plot(fitted.values(murder.model), USArrests$Murder)
     Residuals:
         resid(murder.model)
         plot(murder.model)
     Other helpful commands:
         coef(murder.model)
         murder.model$coef[1]
         confint(murder.model)
     Later: lots of diagnostics for checking assumptions.

  10. Key point on interpretation
     - ANOVA = does a factor (regardless of which category you're in) predict the outcome?
     - OLS = does some variable, X, affect the outcome relative to a baseline (omitted category)?
     - Helpful example: estimating treatment effects in experiments (a sketch follows the table below).

         DV: Policy Support (1-7)
         EE Treatment     −0.311    (0.263)
         J Treatment      −0.609**  (0.248)
         HA Treatment     −0.621**  (0.254)
         constant          5.508*** (0.186)
         Observations      272
         *p < 0.1; **p < 0.05; ***p < 0.01
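A minimal R sketch of that kind of set-up on simulated data (the condition names and effect sizes here are invented for illustration). lm() drops one level of the factor as the omitted baseline, so each coefficient is that condition's difference from the baseline:

    # Simulated experiment: a control group plus three treatment conditions
    set.seed(1)
    condition <- factor(sample(c("control", "EE", "J", "HA"), 272, replace = TRUE))
    support   <- 5.5 - 0.3 * (condition == "EE") - 0.6 * (condition == "J") -
                 0.6 * (condition == "HA") + rnorm(272)
    # "control" is the omitted category; coefficients are treatment-vs-control differences
    summary(lm(support ~ relevel(condition, ref = "control")))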

  11. Gauss-Markov
     Under certain assumptions, OLS is the Best Linear Unbiased Estimator (BLUE) of $\beta$. The assumptions:
       1. Linearity
       2. Homoskedasticity
       3. Error terms are i.i.d.
       4. Strict exogeneity
       5. Errors are normally distributed
       6. No (perfect) multicollinearity
     Note: every regression text you read will express/refer to these differently.

  12. Assumption 1: Linearity
     - Y is a linear function of the data: $\hat{Y}_i = X_i \beta$
     - Typically OK if the DV is continuous.
     - Categorical/limited DVs break linearity and require more advanced (non-linear) models, which you'll learn in 407.
     - A common example is a binary DV, which calls for something like a logit. Notice that the function is non-linear: $\hat{Y}_i = \frac{1}{1 + e^{-X_i \beta}}$
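As a preview (this is 407 material, not part of the slide), a logit is fit in R with glm(); a minimal sketch on simulated data:

    # Simulated binary outcome estimated with a logit link
    set.seed(2)
    x <- rnorm(200)
    y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))   # plogis(z) = 1 / (1 + exp(-z))
    logit.model <- glm(y ~ x, family = binomial(link = "logit"))
    summary(logit.model)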

  13. Assumption 2: Homoskedasticity
     - Homoskedasticity: constant error variance, i.e., errors are approximately the same size across subgroups of the data: $\mathrm{var}(\epsilon \mid X) = \sigma^2$, where $\sigma^2$ is some constant.
     - Heteroskedasticity: non-constant error variance, i.e., errors differ across subgroups of the data.
     - Easily testable/fixable (later this quarter); two common checks are sketched below.
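A minimal sketch of those checks on the murder.model from earlier: a residuals-vs-fitted plot and the Breusch-Pagan test (the latter assumes the lmtest package is installed):

    # Visual check: residual spread should look roughly constant across fitted values
    plot(fitted.values(murder.model), resid(murder.model))
    # Formal check: Breusch-Pagan test (requires the lmtest package)
    # install.packages("lmtest")
    library(lmtest)
    bptest(murder.model)   # a small p-value suggests heteroskedasticity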

  14. Assumption 3: Error terms are i.i.d.
     - No correlation between the error terms of different observations: $E(\epsilon_i \epsilon_j) = 0$ for $i \neq j$
     - Common violation: autocorrelation.
     - Easy fix: use time series models (not simple OLS).
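One standard check for autocorrelation (not on the slide) is the Durbin-Watson test, shown here on murder.model purely to illustrate the command (USArrests is cross-sectional, so the test is not very meaningful there); it assumes the lmtest package is installed:

    # Durbin-Watson test for first-order autocorrelation in the residuals
    library(lmtest)
    dwtest(murder.model)   # a statistic near 2 suggests little autocorrelation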

  15. Assumption 4: Strict Exogeneity
     - Many ways to express it. Usually: $E(\epsilon_i \mid X_i) = 0$
     - Jay will write it this way (same idea): $X \perp \epsilon$
     - The Xs are determined outside the model and are uncorrelated with the error term.
     - A challenging assumption for political scientists (e.g., democracy/GDP, media choice/political knowledge, etc.).
     - Possible solution: instrumental variables regression (next quarter).
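As a preview of what that looks like (again, next quarter's material, not the slide's): two-stage least squares via ivreg() in the AER package. The data here are simulated, with z as an instrument for an endogenous x:

    # Two-stage least squares with ivreg() (requires the AER package)
    # install.packages("AER")
    library(AER)
    set.seed(3)
    z <- rnorm(500)                  # instrument
    u <- rnorm(500)                  # unobserved confounder
    x <- z + u + rnorm(500)          # x is endogenous: correlated with u
    y <- 2 * x + u + rnorm(500)
    summary(ivreg(y ~ x | z))        # instruments x with z; compare with lm(y ~ x)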

  16. Assumption 5: Errors are Normally Distributed
     - Given your data and model, the errors are Normal: $\epsilon \sim N(0, \sigma^2)$, where $\sigma^2$ is some constant.
     - Depends on the distribution of your variables and on your model.
     - An easy problem to detect with normal probability plots: plot(murder.model, which = 2)
     - If violated, the coefficients are OK, but hypothesis testing is invalid.
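Alongside that Q-Q plot, one quick numerical check (not on the slide) is the Shapiro-Wilk test on the residuals:

    # Normal probability (Q-Q) plot of the residuals, as on the slide
    plot(murder.model, which = 2)
    # Shapiro-Wilk test: a small p-value suggests non-normal residuals
    shapiro.test(resid(murder.model))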

  17. Assumption 6: No (Perfect) Multicollinearity
     - Multicollinearity: correlation among the independent variables in a model (e.g., ideology, PID).
     - Perfect multicollinearity: two variables perfectly predict one another (e.g., dummies for male and female), so we can't estimate the effect of one relative to the other.
     - A challenging assumption for political scientists (especially behavioralists).
     - What does R do with perfectly multicollinear regressors?

  18.
         dep.var <- rnorm(100, 10, 2)
         female  <- rbinom(100, 1, .51)
         male    <- ifelse(female == 1, 0, 1)
         perf.collin.model <- lm(dep.var ~ female + male)
         summary(perf.collin.model)

         Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)
         (Intercept)  10.0323     0.2757  36.388   <2e-16 ***
         female       -0.2203     0.3861  -0.571     0.57
         male              NA         NA      NA       NA
         ---
         Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

         Residual standard error: 1.93 on 98 degrees of freedom
         Multiple R-squared:  0.003313,  Adjusted R-squared:  -0.006858
         F-statistic: 0.3257 on 1 and 98 DF,  p-value: 0.5695

     Answer: R drops one of the perfectly collinear regressors (here, male) and reports NA for its coefficient.
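For the more common case of high but not perfect multicollinearity, one standard diagnostic (not covered on the slide) is the variance inflation factor, available as vif() in the car package:

    # Variance inflation factors for the earlier murder.model (requires the car package)
    # install.packages("car")
    library(car)
    vif(murder.model)   # values well above roughly 5-10 are a common warning sign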

  19. Consequences of violating assumptions. Note: from Yanna’s lecture (2/6/14).
