Robust Statistics
Part 3: Regression analysis

Peter Rousseeuw
LARS-IASC School, May 2019

Linear regression: Outline
1 Classical regression estimators
2 Classical outlier diagnostics
3 Regression M-estimators
4 The LTS estimator
5 Outlier detection
6 Regression S-estimators and MM-estimators
7 Regression with categorical predictors
8 Software
Linear regression: Classical estimators

The linear regression model

The linear regression model says:
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i = x_i'\beta + \varepsilon_i$$
with i.i.d. errors $\varepsilon_i \sim N(0, \sigma^2)$, $x_i = (1, x_{i1}, \ldots, x_{ip})'$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$.

Denote the $n \times (p+1)$ matrix containing the predictors $x_i$ as $X = (x_1, \ldots, x_n)'$, the vector of responses $y = (y_1, \ldots, y_n)'$ and the error vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$. Then:
$$y = X\beta + \varepsilon$$
Any regression estimate $\hat\beta$ yields fitted values $\hat y = X\hat\beta$ and residuals $r_i = r_i(\hat\beta) = y_i - \hat y_i$.

The least squares estimator

The least squares (LS) estimator is
$$\hat\beta_{LS} = \operatorname*{argmin}_{\beta} \sum_{i=1}^n r_i^2(\beta).$$
If $X$ has full rank, then the solution is unique and given by
$$\hat\beta_{LS} = (X'X)^{-1} X'y.$$
The usual unbiased estimator of the error variance is
$$\hat\sigma^2_{LS} = \frac{1}{n-p-1} \sum_{i=1}^n r_i^2(\hat\beta_{LS}).$$
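As a quick illustration (not part of the slides), here is a minimal numpy sketch of the LS fit via the normal equations, together with the unbiased error variance estimate. The function name `ls_fit` and the simulated data are purely illustrative.

```python
import numpy as np

def ls_fit(X, y):
    """Least squares fit; X is the n x (p+1) design matrix with a leading column of ones."""
    n, p1 = X.shape                               # p1 = p + 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # beta_hat = (X'X)^{-1} X'y
    r = y - X @ beta_hat                          # residuals r_i
    sigma2_hat = r @ r / (n - p1)                 # unbiased estimate of sigma^2
    return beta_hat, r, sigma2_hat

# small usage example on simulated data
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
beta_hat, r, sigma2_hat = ls_fit(X, y)
```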
Outliers in regression

Different types of outliers:

[Figure: scatter plot of y versus x showing regular data together with a vertical outlier, a good leverage point, and a bad leverage point.]

1 regular observations: internal $x_i$ and well-fitting $y_i$
2 vertical outliers: internal $x_i$ and non-fitting $y_i$
3 good leverage points: outlying $x_i$ and well-fitting $y_i$
4 bad leverage points: outlying $x_i$ and non-fitting $y_i$
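To make the four categories concrete, here is a hypothetical toy construction (not from the slides): most points follow the line, one point has an internal x but a deviating y (vertical outlier), one has an outlying x but lies on the line (good leverage point), and one has an outlying x and a deviating y (bad leverage point).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 5, size=n)                    # regular observations: internal x_i
y = 2 + 1.5 * x + rng.normal(scale=0.3, size=n)  # well-fitting y_i

x_vertical, y_vertical = 2.5, 12.0               # vertical outlier: internal x, non-fitting y
x_good, y_good = 9.0, 2 + 1.5 * 9.0              # good leverage point: outlying x, on the line
x_bad, y_bad = 9.0, 3.0                          # bad leverage point: outlying x, non-fitting y
```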
Effect of vertical outliers

Example: Telephone data set, which contains the number of international telephone calls (in tens of millions) from Belgium in the years 1950-1973.

[Figure: scatter plot of Calls versus Year; a stretch of consecutive years shows much higher values than the rest.]

LS fit with and without the outliers:

[Figure: Calls versus Year with two fitted lines, "LS (all)" and "LS (reduced)"; the outliers pull the LS fit on the full data away from the majority of the points.]
Effect of bad leverage points

Stars data set: Hertzsprung-Russell diagram of the star cluster CYG OB1 (47 stars). Here X is the logarithm of a star's surface temperature, and Y is the logarithm of its light intensity.

[Figure: scatter plot of log.light versus log.Te.]

LS fit with and without the giant stars:

[Figure: log.light versus log.Te with observations 7, 9, 11, 14, 20, 30 and 34 labeled, and two fitted lines, "LS (all)" and "LS (reduced)"; the giant stars act as bad leverage points and tilt the LS fit on the full data away from the main sequence.]
Linear regression: Classical outlier diagnostics

Standardized residuals

This residual plot shows the standardized LS residuals
$$\frac{r_i(\hat\beta_{LS})}{\hat\sigma_{LS}}$$

[Figure: standardized LS residuals versus observation index for the Telephone data (left) and the Stars data (right).]
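A minimal sketch of how these standardized residuals could be computed (assuming a full-rank design matrix X with an intercept column; the function name is illustrative):

```python
import numpy as np

def standardized_residuals(X, y):
    """LS residuals divided by the residual scale estimate sigma_hat_LS."""
    n, p1 = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ beta_hat
    sigma_hat = np.sqrt(r @ r / (n - p1))        # sigma_hat_LS
    return r / sigma_hat
```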
Studentized residuals

Residual plot of the studentized LS residuals, given by:
1 remove observation $(x_i, y_i)$ from the data set
2 compute $\hat\beta_{LS}^{(i)}$ on the remaining data
3 compute the fitted value of $y_i$ given by $\hat y_i^{(i)} = x_i'\hat\beta_{LS}^{(i)}$
4 compute the "deleted residual" $d_i = y_i - \hat y_i^{(i)}$
5 the studentized residuals are $r_i^* = d_i / s(d)$ where $s(d)$ is the standard deviation of all $d_j$.

The studentized residuals can be computed without refitting the model each time an observation is deleted (a sketch of one such shortcut follows the figure).

[Figure: studentized LS residuals versus observation index for the Telephone data (left) and the Stars data (right).]
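One common version of that shortcut is the externally studentized residual, computed from the ordinary residuals and the hat-matrix diagonal; its scaling differs slightly from the $s(d)$ definition above. The sketch below (plain numpy, illustrative function name) assumes a full-rank design matrix X with an intercept column.

```python
import numpy as np

def studentized_residuals(X, y):
    """Externally studentized residuals, computed without refitting the model n times."""
    n, p1 = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix H = X (X'X)^{-1} X'
    h = np.diag(H)
    r = y - H @ y                                # ordinary LS residuals
    sigma2 = r @ r / (n - p1)                    # full-sample variance estimate
    # leave-one-out variance estimate sigma2_(i), obtained without refitting
    sigma2_i = ((n - p1) * sigma2 - r**2 / (1 - h)) / (n - p1 - 1)
    return r / np.sqrt(sigma2_i * (1 - h))
```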
Hat matrix

The hat matrix $H = X(X'X)^{-1}X'$ transforms the observed response vector $y$ into its LS estimate:
$$\hat y = Hy \quad \text{or equivalently} \quad \hat y_i = h_{i1} y_1 + h_{i2} y_2 + \ldots + h_{in} y_n.$$
The element $h_{ij}$ of $H$ thus measures the effect of the $j$th observation on $\hat y_i$, and the diagonal element $h_{ii}$ the effect of the $i$th observation on its own prediction.

Since it holds that $\operatorname{average}(h_{ii}) = (p+1)/n$ and $0 \leqslant h_{ii} \leqslant 1$, it is sometimes suggested to call observation $i$ a leverage point iff
$$h_{ii} > \frac{2(p+1)}{n}.$$

[Figure: hat matrix diagonal $h_{ii}$ versus observation index for the Telephone data (left) and the Stars data (right).]
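As a sketch (illustrative function names, not from the slides), the diagonal $h_{ii}$ can be computed without forming the full n x n hat matrix, and observations can then be flagged with the $2(p+1)/n$ cutoff:

```python
import numpy as np

def hat_diagonal(X):
    """Diagonal of the hat matrix: h_ii = x_i' (X'X)^{-1} x_i."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

def flag_leverage_points(X):
    """Flag observation i as a leverage point iff h_ii > 2(p+1)/n."""
    n, p1 = X.shape
    return hat_diagonal(X) > 2 * p1 / n
```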
Hat matrix and Mahalanobis distance

It can be shown that there is a one-to-one correspondence between the squared Mahalanobis distance for object $i$ and its $h_{ii}$:
$$h_{ii} = \frac{MD_i^2}{n-1} + \frac{1}{n}$$
with
$$MD_i = MD(x_i) = \sqrt{(x_i - \bar x_n)' S_n^{-1} (x_i - \bar x_n)}.$$
From this expression we see that $h_{ii}$ measures the distance of $x_i$ to the center of the data points in the $x$-space. On the other hand, it shows that the $h_{ii}$ diagnostic is based on nonrobust estimates! Indeed, it often masks outlying $x_i$.

Cook's distance

Cook's distance $D_i$ measures the influence of the $i$th case on all $n$ fitted values:
$$D_i = \frac{(\hat y - \hat y^{(i)})'(\hat y - \hat y^{(i)})}{(p+1)\,\hat\sigma^2_{LS}}.$$
It is also equivalent to
$$D_i = \frac{(\hat\beta - \hat\beta^{(i)})'(X'X)(\hat\beta - \hat\beta^{(i)})}{(p+1)\,\hat\sigma^2_{LS}}.$$
In this sense $D_i$ measures the influence of the $i$th case on the regression coefficients. Often the cutoff values 1 or $4/n$ are suggested.
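A small sketch of Cook's distance using the standard closed form $D_i = r_i^2 h_{ii} / ((p+1)\,\hat\sigma^2_{LS}\,(1-h_{ii})^2)$, which is algebraically equivalent to the definitions above; function and variable names are illustrative.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i = r_i^2 * h_ii / ((p+1) * sigma2_hat * (1 - h_ii)^2)."""
    n, p1 = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # hat-matrix diagonal
    beta_hat = XtX_inv @ X.T @ y
    r = y - X @ beta_hat
    sigma2_hat = r @ r / (n - p1)
    return r**2 * h / (p1 * sigma2_hat * (1 - h)**2)

# common flagging rules: D_i > 1 or D_i > 4/n, e.g.
# flagged = cooks_distance(X, y) > 4 / len(y)
```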