

slide-1
SLIDE 1

Regularized Directions of Maximal Outlyingness

Michiel Debruyne

  • Dept. of Mathematics and Computer Science, Universiteit Antwerpen

COMPSTAT 2010, August 23, 2010

slide-2
SLIDE 2

Motivation

Nowadays many robust methods are available to detect outliers in a multivariate, possibly high-dimensional data set (e.g. robust covariance estimators, robust PCA methods, ...). Once an observation is flagged as an outlier, it is often interesting to know which variables contribute most to this outlyingness.


slide-3
SLIDE 3

Motivation

Nowadays many robust methods are available to detect outliers in a multivariate, possibly high-dimensional data set (e.g. robust covariance estimators, robust PCA methods, ...). Once an observation is flagged as an outlier, it is often interesting to know which variables contribute most to this outlyingness.

Given observations $x_1, \dots, x_n$ with $x_i \in \mathbb{R}^p$, and weights $w_i > 0$ determining the outlyingness of $x_i$ (e.g. based on robust Mahalanobis distances). Suppose $w_i$ is small (so $x_i$ is outlying), and let $k < p$.

Goal: select $k$ variables out of $p$ that contribute most to the outlyingness of $x_i$.

→ Variable selection for outliers.
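For illustration only, here is a minimal sketch (not from the talk) of one common way to obtain such weights: squared robust Mahalanobis distances from the MCD estimator, turned into weights that are small for outlying observations. The simulated data, the choice of MCD, the chi-square cutoff and the exact weight function are all assumptions.

```python
# Sketch: outlyingness weights from robust Mahalanobis distances (illustrative only).
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[50] += np.array([8.0, 0.0, 0.0, 0.0, 0.0])   # plant one clear outlier in variable 1

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                        # squared robust Mahalanobis distances
cutoff = chi2.ppf(0.975, df=X.shape[1])        # usual 97.5% chi-square cutoff
w = np.minimum(1.0, cutoff / d2)               # w_i is small when x_i is outlying

print(np.argmin(w), w[50])                     # observation 50 should get the smallest weight
```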


slide-4
SLIDE 4

Overview

  • 1. A simple idea: (a) outline; (b) problems.
  • 2. Main proposal.
  • 3. Two algorithms: (a) moderate dimension; (b) high dimension.
  • 4. Example.


slide-5
SLIDE 5
  • 1. A simple idea

Denote by $\bar{x}_w$ the weighted sample mean and by $S_w$ the weighted sample covariance matrix. A typical measure of the outlyingness of $x_i$ is its squared robust Mahalanobis distance:

$$(x_i - \bar{x}_w)^t S_w^{-1} (x_i - \bar{x}_w).$$

It is well known that this also equals the maximal standardized distance between the projection of $x_i$ and the projection of the weighted sample mean:

$$(x_i - \bar{x}_w)^t S_w^{-1} (x_i - \bar{x}_w) = \max_{a \in \mathbb{R}^p,\, \|a\|=1} \frac{(a^t x_i - a^t \bar{x}_w)^2}{a^t S_w a}.$$

A simple idea is to check the coefficients of the direction $a$ for which the maximum on the right hand side is attained.
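As a quick numerical check of this identity (a sketch with simulated data; the particular weighted mean and covariance formulas below are assumptions about the exact estimators meant on the slide), the standardized squared distance of any projection stays below the squared weighted Mahalanobis distance:

```python
# Sketch: projections onto random unit directions never exceed the squared
# weighted Mahalanobis distance (weighted estimators are assumptions).
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 4
X = rng.normal(size=(n, p))
w = np.ones(n); w[10] = 0.01; X[10] += 6            # observation 10 is outlying, small weight

xbar_w = (w[:, None] * X).sum(axis=0) / w.sum()     # weighted sample mean
Xc = X - xbar_w
S_w = (w[:, None] * Xc).T @ Xc / (w.sum() - 1)      # weighted sample covariance

i = 10
d2 = Xc[i] @ np.linalg.solve(S_w, Xc[i])            # (x_i - xbar_w)^t S_w^{-1} (x_i - xbar_w)

A = rng.normal(size=(5000, p))                      # many random unit directions a
A /= np.linalg.norm(A, axis=1, keepdims=True)
ratios = (A @ Xc[i]) ** 2 / np.einsum("ij,jk,ik->i", A, S_w, A)
print(d2, ratios.max())                             # ratios.max() approaches but never exceeds d2
```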


slide-6
SLIDE 6
  • 1. A simple idea: example

[Figure ("Data"): scatter plot of the data in (X1, X2); observation 51 and the direction a are marked.]

$a = (0.99, 0.14)$ ⇒ $X_1$ contributes most to the outlyingness of observation 51.


slide-7
SLIDE 7
  • 1. A simple idea: problems

Note that

$$\arg\max_{a \in \mathbb{R}^p,\, \|a\|=1} \frac{(a^t x_i - a^t \bar{x}_w)^2}{a^t S_w a} = \frac{S_w^{-1}(x_i - \bar{x}_w)}{\|S_w^{-1}(x_i - \bar{x}_w)\|}.$$

This direction of maximal outlyingness can be computed very easily, but:

  • it does not work in high dimensions ($p > n$);
  • even in moderate dimensions the curse of dimensionality causes trouble;
  • it is very dependent on the covariance structure.
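A small self-contained sketch (simulated data and assumed weighted estimators as above) confirming that this closed-form direction attains the maximum, and that its coefficients point at the variables driving the outlyingness:

```python
# Sketch: the normalized S_w^{-1}(x_i - xbar_w) attains the maximal standardized
# projected distance (illustrative data; weighted estimators are assumptions).
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 4
X = rng.normal(size=(n, p))
w = np.ones(n); w[10] = 0.01; X[10, 0] += 6         # outlier 10 driven by variable 0

xbar_w = (w[:, None] * X).sum(axis=0) / w.sum()
Xc = X - xbar_w
S_w = (w[:, None] * Xc).T @ Xc / (w.sum() - 1)

i = 10
a = np.linalg.solve(S_w, Xc[i])
a /= np.linalg.norm(a)                              # direction of maximal outlyingness

proj = (a @ Xc[i]) ** 2 / (a @ S_w @ a)             # standardized squared projected distance
d2 = Xc[i] @ np.linalg.solve(S_w, Xc[i])            # squared Mahalanobis distance
print(np.allclose(proj, d2))                        # True: the maximum is attained at a
print(np.round(a, 2))                               # typically the largest |coefficient| is on variable 0
```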


slide-8
SLIDE 8
  • 1. A simple idea: problems

[Figure ("Data"): scatter plot of the data in (X1, X2); observation 51 and the direction a are marked.]

slide-9
SLIDE 9
  • 2. Main proposal

Result. Let $X_w = \big(w_1(x_1^t - \bar{x}_w^t), \dots, w_n(x_n^t - \bar{x}_w^t)\big)^t$ and let $y_w = (n-1)\,e_i/w_i$ with $e_i$ the $i$th canonical basis vector. Then the direction of maximal outlyingness can be written as a normed LS solution:

$$\arg\max_{a \in \mathbb{R}^p,\, \|a\|=1} \frac{(a^t x_i - a^t \bar{x}_w)^2}{a^t S_w a} = \frac{\theta}{\|\theta\|} \quad \text{with} \quad \theta = \arg\min_{\beta \in \mathbb{R}^p} \|y_w - X_w\beta\|^2.$$

Proposal. Add an $L_1$ type penalty:

$$a(t) = \frac{\theta(t)}{\|\theta(t)\|} \quad \text{with} \quad \theta(t) = \arg\min_{\beta \in \mathbb{R}^p} \|y_w - X_w\beta\|^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t.$$

This yields a path of sparse directions of maximal outlyingness.
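A minimal sketch of this proposal, assuming scikit-learn's LARS solver for the LASSO path; only the construction of $X_w$ and $y_w$ follows the slide, while the simulated data, weights and solver choice are illustrative assumptions:

```python
# Sketch: path of sparse directions of maximal outlyingness via the LASSO (LARS) path.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
n, p = 60, 10
X = rng.normal(size=(n, p))
w = np.ones(n); w[10] = 0.01; X[10, :2] += 6        # outlier 10 driven by variables 0 and 1

xbar_w = (w[:, None] * X).sum(axis=0) / w.sum()
i = 10
X_w = w[:, None] * (X - xbar_w)                     # rows w_j (x_j - xbar_w)^t
y_w = np.zeros(n); y_w[i] = (n - 1) / w[i]          # (n - 1) e_i / w_i

# Each column of coefs is theta(t) for one value of the L1 budget t;
# normalizing each column gives a sparse direction a(t).
_, _, coefs = lars_path(X_w, y_w, method="lasso")
directions = coefs / np.maximum(np.linalg.norm(coefs, axis=0), 1e-12)
step = min(3, directions.shape[1] - 1)              # an early step on the path
print(np.nonzero(directions[:, step])[0])           # variables active early in the path
```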


slide-10
SLIDE 10
  • 2. Examples revisited

2 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the LASSO path.]


slide-11
SLIDE 11
  • 2. Examples revisited

10 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the LASSO path.]


slide-12
SLIDE 12
  • 2. Examples revisited

30 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the LASSO path.]


slide-13
SLIDE 13
  • 2. Examples revisited

2 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the LASSO path.]


slide-14
SLIDE 14
  • 2. Examples revisited

10 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the LASSO path.]

slide-15
SLIDE 15
  • 2. Examples revisited

30 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the LASSO path.]

slide-16
SLIDE 16
  • 2. Forward versus backward

LASSO is essentially a forward method: starting from scratch, variables are added to the model. This might lead to difficulties in situations where variables only contribute to the outlyingness in combination with other highly correlated variables.

In that case the simple backward approach might be better.


slide-17
SLIDE 17
  • 2. Forward versus backward

2 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the path.]


slide-18
SLIDE 18
  • 2. Forward versus backward

10 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the path.]


slide-19
SLIDE 19
  • 2. Forward versus backward

30 dimensions

[Figure: left panel "Data": scatter plot of the data in (X1, X2) with observation 51 and the direction a; right panel "LASSO": standardized coefficients versus |beta|/max|beta| along the path.]


slide-20
SLIDE 20
  • 3. An algorithm in moderate dimensions

If $p < n$ we can combine the forward and backward approach to select $k \ll p$ variables contributing most to the outlyingness of $x_i$.

  • 1. Compute the full LASSO path.
  • 2. For $j \in \{0, \dots, k\}$: let $\mathcal{S}_j$ be the set of the $j$ variables taken first into the model by LASSO together with the $k - j$ variables with the largest coefficients in the unregularized solution.
  • 3. Retain the set $\mathcal{S}_j$ for which the robust Mahalanobis distance of $x_i$ is largest.

This turns out to work very well.
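A rough sketch of this combined strategy on simulated data; the use of lars_path for the LASSO entry order, an ordinary least squares fit for the unregularized solution, and the plain Mahalanobis evaluation of each candidate subset are assumptions about the unstated details:

```python
# Sketch: combine the LASSO entry order (forward) with the largest unregularized
# coefficients (backward) and keep the k-subset with the largest Mahalanobis distance.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
n, p, k, i = 80, 15, 3, 5
X = rng.normal(size=(n, p))
w = np.ones(n); w[i] = 0.01; X[i, [0, 3, 7]] += 5   # outlier i driven by variables 0, 3, 7

xbar_w = (w[:, None] * X).sum(axis=0) / w.sum()
Xc = X - xbar_w
S_w = (w[:, None] * Xc).T @ Xc / (w.sum() - 1)
X_w = w[:, None] * Xc
y_w = np.zeros(n); y_w[i] = (n - 1) / w[i]

_, active, _ = lars_path(X_w, y_w, method="lasso")  # variables in (approximate) entry order
theta_full, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
order_full = np.argsort(-np.abs(theta_full))        # ranking in the unregularized solution

def maha(subset):
    """Squared Mahalanobis distance of x_i restricted to the selected variables."""
    sub = list(subset)
    return Xc[i, sub] @ np.linalg.solve(S_w[np.ix_(sub, sub)], Xc[i, sub])

best = None
for j in range(k + 1):
    forward = list(active[:j])                      # first j variables entered by LASSO
    backward = [v for v in order_full if v not in forward][: k - j]
    S_j = forward + backward
    if best is None or maha(S_j) > maha(best):
        best = S_j
print(sorted(int(v) for v in best))                 # ideally recovers {0, 3, 7}
```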


slide-21
SLIDE 21
  • 3. An algorithm in high dimensions

If $p > n$ a backward approach is impossible, so the previous algorithm cannot be used. An interesting extension of the LASSO is the elastic net (Zou & Hastie, 2005), which adds an additional $L_2$ type penalty. This can be useful e.g. in data with a lot of correlation between the variables.

  • 1. Compute the path
$$a(t) = \frac{\theta(t)}{\|\theta(t)\|} \quad \text{with} \quad \theta(t) = \arg\min_{\beta \in \mathbb{R}^p} \|y_w - X_w\beta\|^2 + \lambda_j\|\beta\|^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t,$$
and let $\mathcal{S}_j$ be the set of $k$ variables selected by this elastic net for $\lambda_j$, $j = 1, \dots, M$.
  • 2. Select the set $\mathcal{S}_j$ for which the outlyingness of $x_i$ is the largest.
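A hedged sketch of this high-dimensional variant using scikit-learn's ElasticNet; the grid of penalties, the mapping of the constrained formulation above onto scikit-learn's penalized form, and the rule for picking $k$ variables from each fit are all assumptions:

```python
# Sketch: p > n variant with an elastic net. For each penalty in an assumed grid,
# take the k variables with the largest |coefficients|, then keep the subset for
# which the outlyingness of x_i is largest.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
n, p, k, i = 30, 200, 5, 7
X = rng.normal(size=(n, p))
w = np.ones(n); w[i] = 0.01; X[i, :5] += 6          # outlier i driven by variables 0..4

xbar_w = (w[:, None] * X).sum(axis=0) / w.sum()
Xc = X - xbar_w
X_w = w[:, None] * Xc
y_w = np.zeros(n); y_w[i] = (n - 1) / w[i]

def outlyingness(subset):
    """Squared weighted Mahalanobis distance of x_i on the selected variables only."""
    sub = list(subset)
    S = (w[:, None] * Xc[:, sub]).T @ Xc[:, sub] / (w.sum() - 1)
    return Xc[i, sub] @ np.linalg.solve(S, Xc[i, sub])

best = None
for lam in [0.01, 0.1, 1.0, 10.0]:                  # assumed grid of penalty values
    enet = ElasticNet(alpha=lam, l1_ratio=0.5, fit_intercept=False, max_iter=5000)
    enet.fit(X_w, y_w)
    S_j = np.argsort(-np.abs(enet.coef_))[:k]       # k variables with largest |coefficients|
    if best is None or outlyingness(S_j) > outlyingness(best):
        best = S_j
print(sorted(int(v) for v in best))                 # ideally recovers {0, 1, 2, 3, 4}
```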


slide-22
SLIDE 22
  • 4. Example

The breast cancer data set by West et al. (2001) contains $p = 7129$ gene expression profiles for 49 breast cancer patients. There are 25 ER+ cases and 24 ER- cases. Here we only consider the ER+ cases. A robust PCA algorithm reveals 4 outliers.


slide-23
SLIDE 23
  • 4. Example

[Figure ("ROBPCA"): outlier map of orthogonal distance versus score distance; observations V11, V12, V17 and V19 are labelled.]


slide-24
SLIDE 24
  • 4. Example

The breast cancer data set by West et al. (2001) contains $p = 7129$ gene expression profiles for 49 breast cancer patients. There are 25 ER+ cases and 24 ER- cases. Here we only consider the ER+ cases. A robust PCA algorithm reveals 4 outliers. For each outlier we can search for the 10 genes that are contributing most to its outlyingness.


slide-25
SLIDE 25
  • 4. Example

The breast cancer data set by West et al. (2001) contains $p = 7129$ gene expression profiles for 49 breast cancer patients. There are 25 ER+ cases and 24 ER- cases. Here we only consider the ER+ cases. A robust PCA algorithm reveals 4 outliers. For each outlier we can search for the 10 genes that are contributing most to its outlyingness.

For 11, 12 and 19 we find genes that have no immediate biological interpretation. For 17 it turns out that 6 out of 10 selected variables also appear in the list of 20 genes by West et al. most relevant for differentiating between ER+ and ER-.


slide-26
SLIDE 26
  • 4. Example

The breast cancer data set by West et al. (2001) contains $p = 7129$ gene expression profiles for 49 breast cancer patients. There are 25 ER+ cases and 24 ER- cases. Here we only consider the ER+ cases. A robust PCA algorithm reveals 4 outliers. For each outlier we can search for the 10 genes that are contributing most to its outlyingness.

For 11, 12 and 19 we find genes that have no immediate biological interpretation. For 17 it turns out that 6 out of 10 selected variables also appear in the list of 20 genes by West et al. most relevant for differentiating between ER+ and ER-. This confirms West et al.: for 11, 12 and 19 array hybridization failed, whereas 17 is a mislabeled observation.


slide-27
SLIDE 27
  • 5. Conclusion

Summary:

  • Given a robust procedure that detects outliers: how to select the variables most relevant for the outlyingness of an outlier?
  • The direction of maximal outlyingness is a normed solution of a least squares problem.
  • By adding a LASSO type penalty, a regularized path of sparse directions can be defined.
  • In moderate dimensions: graphical display; an automatic algorithm combining forward and backward selection is proposed.
  • In high dimensions: elastic net; essentially forward.
