Regression Diagnostics and Troubleshooting Jeffrey Arnold May 3, 2016
Question How do regression diagnostics fit into analysis?
Steps in Regression
◮ For any model:
  1. Run the regression
  2. Check for departures from the CLR assumptions
  3. Attempt to fix those problems
◮ Additionally, compare between models based on purpose, fit, and diagnostics
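As a point of reference (not from the slides), base R already bundles several of the standard checks in step 2; the model below uses the built-in mtcars data purely as an illustration.

    # Step 2 in practice: fit the regression and look at the standard diagnostics.
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    par(mfrow = c(2, 2))
    plot(fit)   # residuals vs fitted, QQ-plot, scale-location, residuals vs leverage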
OLS assumptions
1. Linearity: y = Xβ + ε
2. Iid sample: (y_i, x_i') an iid sample
3. No perfect collinearity: X has full rank
4. Zero conditional mean: E(ε | X) = 0
5. Homoskedasticity: Var(ε | X) = σ²I_N
6. Normality: ε | X ~ N(0, σ²I_N)
◮ 1-4: unbiased and consistent β̂
◮ 1-5: asymptotic inference, BLUE
◮ 1-6: small-sample inference
OLS Problems
1. Perfect collinearity: cannot estimate OLS
2. Non-linearity: biased β̂
3. Omitted variable bias: biased β̂
4. Correlated errors: wrong SEs
5. Heteroskedasticity: wrong SEs
6. Non-normality: wrong SEs and p-values in small samples
7. Outliers: depends on where they come from
Topics for Today 1. Omitted Variable Bias 2. Measurement Error 3. Non-Normal Errors 4. Missing data
Omitted Variable Bias: Description
◮ The population model is
    Y_i = β_0 + β_1 X_{1,i} + β_2 X_{2,i} + ε_i
◮ But we estimate a regression without X_2:
    y_i = β̂_0 + β̂_1^(omit) x_{1,i} + ε̂_i
Omitted Variable Bias: Problem
Coefficient bias:
    E[β̂_1^(omit)] = β_1 + β_2 · Cov(X_2, X_1) / Var(X_1)
Bias components:
◮ β_2: effect of the omitted variable X_2 on Y
◮ Cov(X_2, X_1) / Var(X_1): association between X_2 and X_1
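As a concrete illustration (not part of the original slides), the bias formula can be checked by simulation; the variable names and coefficient values below are invented for the example.

    # Simulate omitted variable bias and compare it to the analytic formula.
    set.seed(42)
    n  <- 10000
    x2 <- rnorm(n)
    x1 <- 0.6 * x2 + rnorm(n)             # x1 and x2 are correlated
    y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)  # true model: beta1 = 2, beta2 = 3

    coef(lm(y ~ x1 + x2))["x1"]           # approximately 2 (unbiased)
    coef(lm(y ~ x1))["x1"]                # biased: x2 omitted
    2 + 3 * cov(x2, x1) / var(x1)         # beta1 + beta2 * Cov(x2, x1) / Var(x1)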
Omitted Variable Bias: Heuristic Diagnostic
◮ Heuristic: sensitivity of the coefficient to the inclusion of controls
◮ If the coefficient is insensitive to the inclusion of controls, OVB is less plausible
◮ Note: sensitivity of the coefficient, not of the p-value
"These controls do not change the coefficient estimates meaningfully, and the stability of the estimates from columns 4 through 7 suggests that controlling for the model and age of the car accounts for most of the relevant selection." (Lacetera et al. 2012)
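In code, the heuristic amounts to refitting the model with progressively more controls and comparing the coefficient of interest; this is a rough sketch in which dat, y, treat, z1, and z2 are hypothetical names, not from the slides.

    # Compare the coefficient on the treatment across specifications.
    m1 <- lm(y ~ treat, data = dat)
    m2 <- lm(y ~ treat + z1, data = dat)
    m3 <- lm(y ~ treat + z1 + z2, data = dat)
    sapply(list(m1, m2, m3), function(m) coef(m)["treat"])  # stable => OVB less plausible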
Omitted Variable Bias: Diagnostic Statistic
◮ Suppose X and Z are observed, and W is unobserved, in
    Y = β_0 + β_1 X + β_2 Z + β_3 W + ε
◮ Statistic to assess the importance of OVB:
    δ̂ = Cov(X, β_3 W) / Cov(X, β_2 Z) = β̂_C / (β̂_NC − β̂_C)
  where β̂_NC is the estimate without the controls Z and β̂_C is the estimate with them
◮ If Z is representative of all controls, then a large δ̂ implies OVB is implausible
◮ Example in Nunn and Wantchekon (2011)
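One way to compute this ratio from two fitted regressions, as a rough sketch; dat, y, treat, z1, and z2 are placeholder names not taken from the slides.

    # Ratio of the controlled estimate to the change induced by adding controls.
    b_nc <- coef(lm(y ~ treat, data = dat))["treat"]            # no controls
    b_c  <- coef(lm(y ~ treat + z1 + z2, data = dat))["treat"]  # with controls
    delta_hat <- b_c / (b_nc - b_c)
    delta_hat   # large values suggest unobservables would have to matter far more
                # than the included controls to explain away the estimate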
Omitted Variable Bias: Reasoning about Bias
If you know the omitted variable, you may be able to reason about the direction of its effect:

    Cov(X_1, X_2)   Cov(X_2, Y) > 0   Cov(X_2, Y) = 0   Cov(X_2, Y) < 0
    > 0                   +                 0                 -
    = 0                   0                 0                 0
    < 0                   -                 0                 +
Omitted Variable Bias: Solutions by Design
◮ OVB is always a problem for methods relying on selection on observables
◮ Other methods (matching, propensity scores) may be less model dependent, but can still have OVB
◮ Prefer methods relying on identification in other ways:
  ◮ experiments
  ◮ instrumental variables
  ◮ regression discontinuity
  ◮ fixed effects / diff-in-diff
Measurement Error in X: Description
◮ We want to estimate
    Y_i = β_0 + β_1 X_{1,i} + β_2 X_{2,i} + ε_i
◮ But we estimate
    Y_i = β_0 + β_1 X*_{1,i} + β_2 X_{2,i} + ε_i
◮ where X*_1 is X_1 measured with error:
    X*_{1,i} = X_{1,i} + δ_i,  with E(δ) = 0 and Var(δ) = σ²_δ
Measurement Error in X: Problem
◮ Similar to OVB
◮ For the variable with measurement error:
  ◮ β̂_1 is biased towards zero (attenuation bias)
◮ For the other variables:
  ◮ β̂_2 is biased, towards the value it would take under OVB if X_1 were omitted
◮ When measurement error is high, it is as if that variable is not controlled for
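A small simulation sketch (assumed values, not from the slides) showing both effects: attenuation of the mismeasured coefficient and drift of the other coefficient toward its omitted-variable value.

    # Add noise to x1 and refit: beta1 shrinks toward zero, beta2 moves toward
    # what it would be if x1 were omitted entirely.
    set.seed(1)
    n  <- 10000
    x2 <- rnorm(n)
    x1 <- 0.6 * x2 + rnorm(n)
    y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
    x1_star <- x1 + rnorm(n, sd = 2)      # x1 measured with error

    coef(lm(y ~ x1 + x2))[-1]             # roughly (2, 3)
    coef(lm(y ~ x1_star + x2))[-1]        # x1 coefficient attenuated, x2 inflated
    coef(lm(y ~ x2))[-1]                  # the omitted-x1 benchmark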
Measurement error in Y
◮ The population model is
    Y_i = β_0 + β_1 X_{1,i} + ε_i
◮ But we observe Y*_i = Y_i + δ_i, and so estimate
    Y*_i = β_0 + β_1 X_{1,i} + (ε_i + δ_i)
  where E(ε_i + δ_i) = 0 and Var(ε_i + δ_i) = σ²_ε + σ²_δ
◮ β̂ is not biased, but standard errors are larger
◮ If the δ_i have different variances, then heteroskedasticity
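Another small simulation sketch (invented values) confirming that noise in Y leaves the slope roughly unchanged but inflates its standard error.

    # Measurement error in Y: similar coefficient, larger standard error.
    set.seed(2)
    n <- 1000
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)
    y_star <- y + rnorm(n, sd = 3)        # noisy measure of the outcome

    summary(lm(y ~ x))$coefficients["x", ]       # estimate ~ 2, small SE
    summary(lm(y_star ~ x))$coefficients["x", ]  # estimate ~ 2, larger SE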
Measurement Error: Solutions
◮ If in the treatment variable:
  ◮ get a better measure
◮ If in control variables:
  ◮ include multiple measures; multicollinearity is less problematic than measurement error
◮ Models for measurement error: instrumental variables, structural equation models, Bayesian models, multiple imputation
Non-Normal Errors
◮ Usually not problematic
  ◮ Does not bias coefficients
  ◮ Only affects standard errors, and only in small samples
◮ But may indicate that
  ◮ the model is mis-specified
  ◮ E(Y | X) is not a good summary
◮ Diagnose: QQ-plot of (studentized) residuals, as sketched below
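A minimal sketch of that diagnostic using base R and the built-in mtcars data; the model itself is arbitrary and only for illustration.

    # QQ-plot of studentized residuals against the normal distribution.
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    res <- rstudent(fit)      # studentized residuals
    qqnorm(res)               # points should fall near the reference line
    qqline(res)
    # car::qqPlot(fit) gives the same plot with pointwise confidence bands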
Missing Data in X
Listwise deletion
◮ Drop rows with any missing values in Y or X
◮ Problem: if missingness is correlated with X, coefficients are biased
Multiple imputation
◮ Predict missing values from the non-missing data
◮ Multiple imputation packages: Amelia, mice (see the sketch below)
◮ Almost always better than listwise deletion
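A rough sketch of the multiple-imputation workflow with mice; dat, y, x1, and x2 are placeholder names for a data frame with missing values.

    # Impute, fit the regression on each completed data set, and pool the
    # estimates across imputations (Rubin's rules).
    library(mice)

    imp  <- mice(dat, m = 5, printFlag = FALSE)  # 5 imputed data sets
    fits <- with(imp, lm(y ~ x1 + x2))           # refit the model m times
    summary(pool(fits))                          # pooled coefficients and SEs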
More complicated Missing Data Problems
◮ MNAR: missing not at random in X
  ◮ Missingness depends on the unobserved values themselves, so the observed data cannot predict it
  ◮ Need to model the selection process
◮ Truncated or censored dependent variable: specific MLE models