Regularized Directions of Maximal Outlyingness
Michiel Debruyne
Department of Mathematics and Computer Science, Universiteit Antwerpen
COMPSTAT 2010, August 23, 2010
Motivation

Nowadays many robust methods are available to detect outliers in a multivariate, possibly high-dimensional data set (e.g. robust covariance estimators, robust PCA methods, ...). Once an observation is flagged as an outlier, it is often interesting to know which variables contribute most to this outlyingness.

Given observations $\boldsymbol{x}_1, \dots, \boldsymbol{x}_n$ with $\boldsymbol{x}_i \in \mathbb{R}^p$, and weights $w_i > 0$ determining the outlyingness of $\boldsymbol{x}_i$ (e.g. based on robust Mahalanobis distances). Suppose $w_i$ is small, so $\boldsymbol{x}_i$ is outlying, and let $k < p$.

Goal: select the $k$ variables out of $p$ that contribute most to the outlyingness of $\boldsymbol{x}_i$. $\longrightarrow$ Variable selection for outliers. A sketch of how such weights might be obtained follows below.
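As an illustration, here is a minimal sketch of computing outlyingness weights from robust Mahalanobis distances using the MCD estimator in scikit-learn. The particular weight function (a capped chi-squared ratio) is an assumption made for illustration, not the one prescribed in the talk; any decreasing function of the robust distance would serve.

```python
# Sketch: outlyingness weights from robust Mahalanobis distances.
# The weight function (capped chi-squared ratio) is an illustrative
# assumption only.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[51, 0] += 8.0                          # plant one outlier (observation 51)

mcd = MinCovDet(random_state=0).fit(X)   # robust (MCD) location and scatter
d2 = mcd.mahalanobis(X)                  # squared robust Mahalanobis distances
cutoff = chi2.ppf(0.975, df=X.shape[1])  # usual 97.5% chi-squared cutoff
w = np.minimum(1.0, cutoff / d2)         # small w_i <=> outlying x_i
print(np.argsort(w)[:3])                 # indices of the most outlying points
```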
Overview

1. A simple idea.
   (a) Outline.
   (b) Problems.
2. Main proposal.
3. Two algorithms.
   (a) Moderate dimension.
   (b) High dimension.
4. Example.
1. A simple idea

Denote by $\bar{\boldsymbol{x}}_w$ the weighted sample mean and by $S_w$ the weighted sample covariance matrix. A typical measure of the outlyingness of $\boldsymbol{x}_i$ is its squared robust Mahalanobis distance:
$$(\boldsymbol{x}_i - \bar{\boldsymbol{x}}_w)^t S_w^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}}_w).$$
It is well known that this also equals the maximal standardized squared distance between the projection of $\boldsymbol{x}_i$ and the projection of the weighted sample mean:
$$(\boldsymbol{x}_i - \bar{\boldsymbol{x}}_w)^t S_w^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}}_w) = \max_{\boldsymbol{a} \in \mathbb{R}^p,\, \|\boldsymbol{a}\|=1} \frac{(\boldsymbol{a}^t \boldsymbol{x}_i - \boldsymbol{a}^t \bar{\boldsymbol{x}}_w)^2}{\boldsymbol{a}^t S_w \boldsymbol{a}}.$$
A simple idea is to check the coefficients of the direction $\boldsymbol{a}$ at which the maximum on the right-hand side is attained. The identity can be verified numerically, as sketched below.
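In the following check, the ordinary mean and covariance stand in for the weighted versions (an assumption made only to keep the example short); maximizing over many random unit directions approaches the squared Mahalanobis distance from below.

```python
# Numerical check of the identity: the squared Mahalanobis distance
# equals the maximal standardized squared projection distance.
# Plain mean/covariance stand in for the weighted versions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
xi = X[0]

lhs = (xi - xbar) @ np.linalg.solve(S, xi - xbar)   # squared Mahalanobis distance

A = rng.normal(size=(200000, 3))                    # many random unit directions
A /= np.linalg.norm(A, axis=1, keepdims=True)
num = (A @ (xi - xbar)) ** 2                        # (a^t x_i - a^t xbar)^2
den = np.einsum('ij,jk,ik->i', A, S, A)             # a^t S a for each direction
print(lhs, (num / den).max())                       # the maximum approaches lhs
```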
1. A simple idea: example

[Figure: scatter plot of the data in variables X1 and X2, with observation 51 outlying along X1 and the direction a drawn from the center.]

$\boldsymbol{a} = (0.99, 0.14)$ $\Rightarrow$ $X_1$ contributes most to the outlyingness of observation 51.
1. A simple idea: problems

Note that
$$\operatorname*{arg\,max}_{\boldsymbol{a} \in \mathbb{R}^p,\, \|\boldsymbol{a}\|=1} \frac{(\boldsymbol{a}^t \boldsymbol{x}_i - \boldsymbol{a}^t \bar{\boldsymbol{x}}_w)^2}{\boldsymbol{a}^t S_w \boldsymbol{a}} = \frac{S_w^{-1}(\boldsymbol{x}_i - \bar{\boldsymbol{x}}_w)}{\| S_w^{-1}(\boldsymbol{x}_i - \bar{\boldsymbol{x}}_w) \|}.$$
This direction of maximal outlyingness can be computed very easily (a minimal sketch follows below), but:
- it does not work in high dimensions ($p > n$), where $S_w$ is singular;
- even in moderate dimensions the curse of dimensionality causes trouble;
- it is very dependent on the covariance structure.
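A minimal sketch of this closed form, again with the plain mean and covariance standing in for the weighted versions (the robust, weighted ingredients are assumed precomputed in practice):

```python
# The "simple idea": the maximizing direction is S^{-1}(x_i - xbar),
# normalized. This requires S to be invertible, hence fails for p > n.
import numpy as np

def max_outlyingness_direction(X, i):
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    a = np.linalg.solve(S, X[i] - xbar)   # S^{-1}(x_i - xbar)
    return a / np.linalg.norm(a)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
X[51] = np.array([9.0, 0.5])              # outlier mainly in variable 1
print(max_outlyingness_direction(X, 51))  # first coefficient dominates
```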
1. A simple idea: problems

[Figure: scatter plot of the data in X1 and X2 with observation 51 marked; the computed direction a no longer points toward the variable driving the outlyingness.]
2. Main proposal

Result. Let $X_w = \left( w_1 (\boldsymbol{x}_1^t - \bar{\boldsymbol{x}}_w^t), \dots, w_n (\boldsymbol{x}_n^t - \bar{\boldsymbol{x}}_w^t) \right)^t$ and let $\boldsymbol{y}_w = (n-1)\, w_i^{-1}\, \boldsymbol{e}_i$ with $\boldsymbol{e}_i$ the $i$th canonical basis vector. Then the direction of maximal outlyingness can be written as a normed least squares solution:
$$\operatorname*{arg\,max}_{\boldsymbol{a} \in \mathbb{R}^p,\, \|\boldsymbol{a}\|=1} \frac{(\boldsymbol{a}^t \boldsymbol{x}_i - \boldsymbol{a}^t \bar{\boldsymbol{x}}_w)^2}{\boldsymbol{a}^t S_w \boldsymbol{a}} = \frac{\boldsymbol{\theta}}{\|\boldsymbol{\theta}\|} \quad \text{with} \quad \boldsymbol{\theta} = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \| \boldsymbol{y}_w - X_w \boldsymbol{\beta} \|^2.$$

Proposal. Add an $L_1$-type penalty:
$$\boldsymbol{a}(t) = \frac{\boldsymbol{\theta}(t)}{\|\boldsymbol{\theta}(t)\|} \quad \text{with} \quad \boldsymbol{\theta}(t) = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \| \boldsymbol{y}_w - X_w \boldsymbol{\beta} \|^2 \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| \le t.$$
This yields a path of sparse directions of maximal outlyingness; a sketch in code follows below.
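A hedged sketch of the proposal: the whole $L_1$-constrained least squares path can be computed with LARS, here via scikit-learn's lars_path. The construction of $X_w$ and $\boldsymbol{y}_w$ follows the reconstruction above; the weighted-mean formula and the response scaling are assumptions about the exact convention, but the scaling of $\boldsymbol{y}_w$ is immaterial for the normed directions since the LS solution scales linearly with the response.

```python
# Sketch of the proposal: sparse directions of maximal outlyingness
# as the (normed) LASSO path of the regression of y_w on X_w.
import numpy as np
from sklearn.linear_model import lars_path

def sparse_outlyingness_directions(X, w, i):
    n = X.shape[0]
    xbar_w = (w[:, None] * X).sum(axis=0) / w.sum()  # weighted mean (assumed form)
    Xw = w[:, None] * (X - xbar_w)                   # row j: w_j (x_j - xbar_w)^t
    yw = np.zeros(n)
    yw[i] = (n - 1) / w[i]                           # y_w = (n-1) w_i^{-1} e_i
    # LARS computes the entire L1-penalized least squares path at once.
    alphas, _, coefs = lars_path(Xw, yw, method='lasso')
    norms = np.linalg.norm(coefs, axis=0)
    norms[norms == 0] = 1.0                          # leave the all-zero start alone
    return alphas, coefs / norms                     # columns: directions a(t)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
X[51, 0] += 8.0                                      # outlier in variable 1
w = np.ones(100); w[51] = 0.1                        # small weight: outlying
alphas, dirs = sparse_outlyingness_directions(X, w, 51)
print(np.flatnonzero(dirs[:, 1]))                    # first variable to enter
```

Because LARS never inverts $S_w$, the same sketch runs unchanged for $p > n$.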
2. Examples revisited

[Figure: LASSO coefficient paths for the 2-dimensional example, shown next to the data plot with observation 51 and the direction a.]
2. Examples revisited

[Figure: LASSO coefficient paths for a 10-dimensional version of the example, shown next to the data plot with observation 51 and the direction a.]
2. Examples revisited

[Figure: LASSO coefficient paths for a 30-dimensional version of the example, shown next to the data plot with observation 51 and the direction a.]
2. Examples revisited

[Figure: LASSO coefficient paths for the 2-dimensional example from the problems slide, shown next to the data plot with observation 51 and the direction a.]
2. Examples revisited

[Figure: LASSO coefficient paths for a 10-dimensional version of the second example, shown next to the data plot with observation 51 and the direction a.]