  1. PS 405 – Week 8 Section: Non-Linear Transformations, Outliers, and Heteroskedasticity D.J. Flynn March 4, 2014

  2. Announcements 1. Yanna reviewed everyone’s dataset for the final and they’re fine. Just make sure DV is (quasi-)continuous. 2. Today’s plan: briefly review transformations (Yanna is talking about them Thursday) and outliers (Jay does entire week on outliers/missing data). 3. Questions on the final problem set or anything else.

  3. What the linearity assumption does (and does not) mean
  ◮ first assumption of OLS: linearity.
  ◮ formally, we say Y is a linear function of the data: Ŷᵢ = Xᵢβ
  ◮ parameters/coefficients are linear
  ◮ we can transform the IVs and DV to improve our model (e.g., remove heteroskedasticity), but the parameters must remain linear in order to use OLS
  ◮ lots of models eschew linearity. An example is the logit model: Ŷᵢ = 1 / (1 + e^(−Xᵢβ))
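  A minimal sketch of this contrast, using made-up data and variable names (x, y_cont, and y_bin are hypothetical, not course data): the OLS model is linear in the parameters and fits with lm(), while the logit model is not and needs glm().
  set.seed(405)
  x <- rnorm(100)
  y_cont <- 1 + 2 * x + rnorm(100)             # continuous DV, linear in beta
  y_bin  <- rbinom(100, 1, plogis(1 + 2 * x))  # binary DV generated from a logit
  ols   <- lm(y_cont ~ x)                              # Y-hat_i = X_i * beta
  logit <- glm(y_bin ~ x, family = binomial("logit"))  # Y-hat_i = 1 / (1 + exp(-X_i * beta))
  summary(ols)
  summary(logit)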

  4. Acceptable transformations¹
  Y = α + βX²
  Y = α + β(ln(X))
  ln(Y) = α + βX
  ln(Y) = α + β ln(X)
  ¹ More on this in 407.
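  A quick sketch of these forms with made-up data (x, y, and the models m1–m4 are hypothetical): each is still linear in α and β, so each fits with an ordinary lm() call.
  set.seed(1)
  x <- runif(100, 1, 10)
  y <- exp(0.5 + 0.3 * log(x) + rnorm(100, sd = 0.2))  # strictly positive so logs are defined
  m1 <- lm(y ~ I(x^2))        # Y = alpha + beta * X^2
  m2 <- lm(y ~ log(x))        # Y = alpha + beta * ln(X)
  m3 <- lm(log(y) ~ x)        # ln(Y) = alpha + beta * X
  m4 <- lm(log(y) ~ log(x))   # ln(Y) = alpha + beta * ln(X)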

  5. Unacceptable transformation
  Y = ln(α + βX)

  6. Transforming data
  ◮ key point: linear transformations change units of measure (e.g., ounces to pounds) but don't change the distribution. Re-coding is a common example. So if right-skewed data are transformed linearly, the new data will still have right skew.
  ◮ same goes for relationships between 2+ variables: linear transformations won't change anything
  ◮ non-linear transformations will change the distribution. Sometimes we use logs to make linear regression more appropriate (see the sketch below).
  ◮ Example: Jacobson (1990): "...it is clear that linear models of campaign spending are inadequate because diminishing returns must apply to campaign spending. Green and Krasno recognize this and offer an alternative model which uses log transformations..."
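  A small sketch of the skew point with made-up data (the ounces/pounds objects oz and lbs are hypothetical): a linear rescaling leaves the shape alone, while a log changes it.
  set.seed(405)
  oz  <- rexp(1000, rate = 1) * 16  # right-skewed data measured in ounces
  lbs <- oz / 16                    # linear transformation: ounces to pounds
  hist(oz)                          # right-skewed
  hist(lbs)                         # same shape, different units
  hist(log(oz))                     # distribution changes; much more symmetric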

  7. Reasons for log transformations²
  ◮ make relationships more linear (Jacobson 1990)
  ◮ reduce heteroskedasticity or skew
  ◮ hard sciences do this for certain natural patterns (e.g., exponential processes)
  ◮ easier interpretation (in percentage terms)
  ◮ key point: transformations change the interpretation of coefficients (e.g., in linear-log models, divide the coefficient on the logged variable by 100; worked sketch below)
  ² Yanna will talk more about logs.
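  A worked sketch of the linear-log interpretation with made-up data (x, y, and m are hypothetical): in Y = α + β ln(X), a 1% increase in X is associated with roughly a β/100 change in Y.
  set.seed(2)
  x <- runif(200, 1, 100)
  y <- 3 + 5 * log(x) + rnorm(200)  # hypothetical linear-log relationship
  m <- lm(y ~ log(x))
  coef(m)["log(x)"] / 100           # approximate change in Y for a 1% increase in X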

  8. Outliers
  Determining whether an outlier is influential: Influence = Leverage × Discrepancy, where leverage is the distance of a given xᵢ from the center of the distribution (the mean or centroid) and discrepancy is the distance of Yᵢ from the regression line fitted without that observation. In the end, we care about influence: are there one (or two or three) observations that are changing our entire estimated effect?
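  Both pieces of that decomposition are easy to pull from a fitted model in R. A sketch, assuming a fitted lm object called model (such as the one fit on the DFBETA slide below):
  lev  <- hatvalues(model)  # leverage: distance of x_i from the center of the X's
  disc <- rstudent(model)   # discrepancy: studentized residual, with observation i left out of the fit
  plot(lev, disc)           # points far out on both dimensions are the influential ones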

  9. Quantifying influence
  1. DFBETA
  2. Cook’s Distance
  The cutoff is a subjective standard; most say that if either statistic is > 1, the observation is problematic.

  10. You’ll need these...
  library(car)    # install.packages("car") first if it isn't already installed
  install.packages("nnet")
  library(nnet)
  install.packages("MASS")
  library(MASS)
  library(stats)  # stats ships with base R, so no install.packages() call is needed
  install.packages("zoo")
  library(zoo)

  11. DFBETA
  A measure of how much a coefficient changes with the observation included vs. excluded, scaled by the standard error estimated with the observation deleted. From Yanna’s lecture:
  a <- c(4, 3, 2, 1, 5, 2, 3, 4, 5, 1, 3, 2, 1, 1500)  # note the extreme value, 1500
  b <- c(1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1)
  c <- c(10, 11, 25, 20, 18, 17, 10, 11, 12, 33, 38, 12, 14, 17)
  plot(a)
  model <- lm(c ~ a + b)
  dfbetasPlots(model)        # from the car package
  influence.measures(model)
  dfbetas(model)
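  A follow-up sketch applying the rough > 1 standard from the previous slide to the DFBETAS matrix (db is just a hypothetical name for the stored result):
  db <- dfbetas(model)                # one row per observation, one column per coefficient
  which(abs(db) > 1, arr.ind = TRUE)  # observation/coefficient pairs that cross the cutoff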

  12. [Figure: dfbetas plots for the coefficients on a and b, plotted against observation index]

  13. Cook’s Distance
  Similar idea to DFBETA: Cook’s Distance quantifies how much the estimated coefficients move, relative to the range of plausible values, when a given observation is excluded.
  cutoff <- 1  # the rough > 1 standard from the "Quantifying influence" slide
  plot(model, which = 4, cook.levels = cutoff)
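  The statistic can also be computed directly and checked against the same rough cutoff; a short sketch (cooksd is a hypothetical name):
  cooksd <- cooks.distance(model)  # one Cook's distance per observation
  which(cooksd > 1)                # observations past the > 1 rule of thumb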

  14. [Figure: Cook's distance by observation number for lm(c ~ a + b); observation 14 dwarfs the rest, with observations 10 and 11 the next largest]

  15. Testing for heteroskedasticity
  Recap: heteroskedasticity is non-constant error variance, which means a loss of efficiency.
  Tests:
  1. Breusch-Pagan/Cook-Weisberg (“BP test”)
  2. White’s test

  16. BP Test
  ◮ we assume that the error variances are equal (the null) and test the alternative that they are unequal
  ◮ idea: regress the squared residuals on the IVs and see whether they predict the size of the residuals
  ◮ the statistic is distributed χ², so the critical value depends on the degrees of freedom (R will report the significance level)
  ◮ some simulated heteroskedastic data are on BB if you want to practice
  ◮ the command is easy:
  library(lmtest)
  bptest(model)
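  A sketch with freshly simulated heteroskedastic data (x, y, and het_model are made up here, not the dataset on BB), just to show the mechanics:
  install.packages("lmtest")             # if not already installed
  library(lmtest)
  set.seed(405)
  x <- runif(200, 1, 10)
  y <- 2 + 3 * x + rnorm(200, sd = x)    # error standard deviation grows with x
  het_model <- lm(y ~ x)
  bptest(het_model)                      # small p-value: reject the null of constant variance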

  17. White’s test
  ◮ similar idea to the BP test, but it regresses the squared residuals on the IVs, the squared IVs, and the cross-products of the regressors
  ◮ again, the statistic is distributed χ², so the critical value depends on the degrees of freedom (R will report the significance level)
  ◮ there’s now a package for running White’s test:
  install.packages("bstats")
  library(bstats)
  white.test(model)
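  If bstats turns out to be unavailable, the auxiliary regression described above can also be handed to bptest() directly; a sketch reusing the simulated het_model and x from the BP-test sketch (this is a BP-style test run on White's auxiliary regressors, not the bstats implementation):
  library(lmtest)
  bptest(het_model, ~ x + I(x^2), data = data.frame(x = x))  # one regressor, so no cross-products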

  18. Questions?
