Selecting Variables in Two-Group Robust Linear Discriminant Analysis
Stefan Van Aelst and Gert Willems
Department of Applied Mathematics and Computer Science, Ghent University, Belgium
COMPSTAT'2010

Linear discriminant analysis setting
$p$-dimensional data set
Group 1: $x_{11}, \ldots, x_{1n_1} \in \Pi_1 \sim F_1 = F_{\mu_1, \Sigma}$
Group 2: $x_{21}, \ldots, x_{2n_2} \in \Pi_2 \sim F_2 = F_{\mu_2, \Sigma}$
Common covariance matrix $\Sigma$
$P(X \in \Pi_1) = P(X \in \Pi_2)$
$$d^L_j(x) = \mu_j^t \Sigma^{-1} x - \tfrac{1}{2} \mu_j^t \Sigma^{-1} \mu_j; \quad j = 1, 2$$
Linear Bayes rule: classify $x \in \mathbb{R}^p$ into $\Pi_1$ if $d^L_1(x) > d^L_2(x)$, and into $\Pi_2$ otherwise.
Robust Variable Selection in Discriminant Analysis, Van Aelst & Willems

Linear discriminant analysis: Discriminant coordinate
Direction $a$ that best separates the two populations:
$$a = \Sigma^{-1} (\mu_1 - \mu_2)$$
The projection $a^t x$ is called the canonical variate or discriminant coordinate.

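
The link between the linear Bayes rule and the direction $a$ can be made explicit with a short worked step. This is a sketch that uses only the symbols defined on the slides and the symmetry of $\Sigma^{-1}$:

```latex
\begin{aligned}
d^L_1(x) - d^L_2(x)
  &= (\mu_1 - \mu_2)^t \Sigma^{-1} x
     - \tfrac{1}{2}\left( \mu_1^t \Sigma^{-1} \mu_1 - \mu_2^t \Sigma^{-1} \mu_2 \right) \\
  &= (\mu_1 - \mu_2)^t \Sigma^{-1} x
     - \tfrac{1}{2} (\mu_1 - \mu_2)^t \Sigma^{-1} (\mu_1 + \mu_2) \\
  &= a^t \left( x - \tfrac{1}{2}(\mu_1 + \mu_2) \right),
  \qquad a = \Sigma^{-1}(\mu_1 - \mu_2).
\end{aligned}
```

So $x$ is classified into $\Pi_1$ exactly when its discriminant coordinate $a^t x$ exceeds the coordinate of the midpoint $\tfrac{1}{2}(\mu_1 + \mu_2)$.
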
Linear discriminant analysis: Sample LDA
Estimate the centers $\mu_1$ and $\mu_2$ and the scatter $\Sigma$ from the data.
Standard LDA uses the sample means $\bar{x}_1$ and $\bar{x}_2$, and the pooled sample covariance matrix
$$S_n = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$

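
As an illustration of sample LDA, the following minimal NumPy sketch computes the pooled covariance, the discriminant direction and the linear Bayes rule. The function names (`pooled_covariance`, `lda_score`) and the simulated data are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def pooled_covariance(x1, x2):
    """Pooled sample covariance S_n; rows of x1 and x2 are observations."""
    n1, n2 = len(x1), len(x2)
    s1 = np.cov(x1, rowvar=False)  # np.cov divides by n1 - 1 by default
    s2 = np.cov(x2, rowvar=False)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

def lda_score(x, center, scatter):
    """Linear score d_j(x) = center' S^{-1} x - 0.5 * center' S^{-1} center."""
    w = np.linalg.solve(scatter, center)
    return x @ w - 0.5 * center @ w

# Simulated two-group data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
x2 = rng.normal(loc=[3.0, 1.0], size=(50, 2))

m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
s_n = pooled_covariance(x1, x2)
a = np.linalg.solve(s_n, m1 - m2)  # discriminant direction

new_points = np.array([[0.2, -0.1], [2.9, 1.3]])
d1 = lda_score(new_points, m1, s_n)
d2 = lda_score(new_points, m2, s_n)
group = np.where(d1 > d2, 1, 2)  # linear Bayes rule with equal priors
```

A robust version would keep the same rule and replace `m1`, `m2` and `s_n` with robust estimates of the centers and the common scatter.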
Robust LDA
Use robust estimators of the centers $\mu_1$ and $\mu_2$ and the common scatter $\Sigma$
→ S-estimators
→ MM-estimators

Robust LDA: One-sample S-estimators
Observations $\{x_1, \ldots, x_n\} \subset \mathbb{R}^p$
$\rho_0 : [0, \infty[ \to [0, \infty[$ is bounded, increasing and smooth
S-estimates of the location $\hat{\mu}_n$ and scatter $\hat{\Sigma}_n$ minimize $|C|$ subject to
$$\frac{1}{n} \sum_{i=1}^{n} \rho_0\left( \left[ (x_i - T)^t C^{-1} (x_i - T) \right]^{1/2} \right) = b$$
among all $T \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$
(Davies 1987, Rousseeuw and Leroy 1987, Lopuhaä 1989)

Robust LDA: ρ functions
A popular family of loss functions is the Tukey biweight (bisquare) family of ρ functions:
$$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c \\[1ex] \dfrac{c^2}{6} & \text{if } |t| \ge c. \end{cases}$$
The constant $c$ can be tuned for robustness (breakdown point).
The choice of $c$ also determines the efficiency of the S-estimator.
→ Trade-off robustness vs efficiency

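
The biweight family above is easy to evaluate directly. The sketch below is a plain NumPy transcription of the formula; the function name `tukey_biweight_rho` is chosen for this example.

```python
import numpy as np

def tukey_biweight_rho(t, c):
    """Tukey biweight rho_c(t): polynomial for |t| <= c, constant c^2/6 beyond."""
    t = np.asarray(t, dtype=float)
    inside = t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)
    return np.where(np.abs(t) <= c, inside, c**2 / 6)

# The function is bounded by c^2 / 6, so a smaller c caps the loss earlier.
values_c2 = tukey_biweight_rho([0.0, 1.0, 2.0, 5.0], c=2.0)
values_c3 = tukey_biweight_rho([0.0, 1.0, 3.0, 5.0], c=3.0)
```

The two pieces agree at $|t| = c$, where the polynomial equals $c^2/2 - c^2/2 + c^2/6 = c^2/6$, so $\rho_c$ is continuous and bounded by $c^2/6$.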
Robust LDA: Tukey biweight ρ functions
[Figure: $\rho(t)$ plotted against $t$ for $c = 2$, $c = 3$ and $c = \infty$.]

Robust LDA: One-sample MM-estimates
Put $\tilde{\sigma}_n = \det(\hat{\Sigma}_n)^{1/(2p)}$, the S-estimate of scale.
Then the MM-estimates of the location $\hat{\mu}_n$ and shape $\hat{\Gamma}_n$ minimize
$$\frac{1}{n} \sum_{i=1}^{n} \rho_1\left( \left[ (x_i - T)^t G^{-1} (x_i - T) \right]^{1/2} / \tilde{\sigma}_n \right)$$
among all $T \in \mathbb{R}^p$ and $G \in \mathrm{PDS}(p)$ for which $\det(G) = 1$
(Tatsuoka and Tyler 2000)

Robust LDA: ρ functions
Both $\rho_0$ and $\rho_1$ are taken from the same family.
The constant $c$ in $\rho_0$ can be tuned for robustness (breakdown point).
The MM-estimator inherits its robustness from the S-scale.
The constant $c$ in $\rho_1$ can be tuned for efficiency of locations.

Robust LDA: Tukey biweight ρ functions
[Figure: the functions $\rho_0$ and $\rho_1$ with their tuning constants $c_0$ and $c_1$; two panels, $p = 2$ and $p = 5$.]

Robust LDA: Robust two-sample estimates
Pool the scatter estimates $\hat{\Sigma}_{1n_1}$ and $\hat{\Sigma}_{2n_2}$ of both groups:
$$\hat{\Sigma}_n = \frac{n_1 \hat{\Sigma}_{1n_1} + n_2 \hat{\Sigma}_{2n_2}}{n_1 + n_2}$$
Calculate simultaneous S-estimates of the two locations and the common scatter matrix:
$\hat{\mu}_{1n}$, $\hat{\mu}_{2n}$ and $\hat{\Sigma}_n$ minimize $|C|$ subject to
$$\frac{1}{n_1 + n_2} \sum_{j=1}^{2} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - T_j)^t C^{-1} (x_{ji} - T_j) \right]^{1/2} \right) = b$$
among all $T_1, T_2 \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$ (He and Fung 2000)
Similarly, simultaneous MM-estimates can be calculated.

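
To make the two-sample formulas concrete, here is a small NumPy sketch of the pooling step and of the left-hand side of the simultaneous S-constraint for a candidate $(T_1, T_2, C)$. It does not carry out the minimization over $|C|$. The helper names (`pool_scatter`, `s_constraint_lhs`) and the toy numbers are assumptions for this example.

```python
import numpy as np

def rho_biweight(t, c):
    """Tukey biweight rho function with tuning constant c."""
    t = np.asarray(t, dtype=float)
    inside = t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)
    return np.where(np.abs(t) <= c, inside, c**2 / 6)

def pool_scatter(sigma1, n1, sigma2, n2):
    """Pooled scatter: per-group scatter estimates weighted by group size."""
    return (n1 * np.asarray(sigma1) + n2 * np.asarray(sigma2)) / (n1 + n2)

def s_constraint_lhs(groups, centers, scatter, c):
    """Average of rho_0 over the Mahalanobis distances of all observations,
    each measured from its own group center with the common scatter."""
    total, count = 0.0, 0
    for x, center in zip(groups, centers):
        diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
        solved = np.linalg.solve(scatter, diff.T).T        # rows: C^{-1} (x - T)
        d2 = np.maximum(np.einsum("ij,ij->i", diff, solved), 0.0)
        total += rho_biweight(np.sqrt(d2), c).sum()
        count += len(diff)
    return total / count

# Toy inputs (hypothetical): one far point in group 1, none in group 2.
pooled = pool_scatter(np.eye(2), 10, 3 * np.eye(2), 30)
lhs = s_constraint_lhs(
    groups=[np.array([[0.0, 0.0], [5.0, 0.0]]), np.array([[1.0, 1.0]])],
    centers=[np.array([0.0, 0.0]), np.array([1.0, 1.0])],
    scatter=np.eye(2),
    c=2.0,
)
```

An S-estimator would search over $(T_1, T_2, C)$ for the smallest $|C|$ at which this value equals $b$.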
Robust LDA: Bootstrap inference
Advantages of bootstrap
- Few assumptions
- Wide range of applications
Bootstrapping robust estimators
- High computational cost
- Robustness not guaranteed

Fast and robust bootstrap principle
For each bootstrap sample
- Calculate an approximation for the estimates
- Use the estimating equations
Fast to compute approximations
Inherit robustness of initial solution

Fast and robust bootstrap
Consider estimates that are the solution of a fixed point equation
$$\hat{\Theta}_n = g_n(\hat{\Theta}_n)$$
For a bootstrap sample
$$\hat{\Theta}^*_n = g^*_n(\hat{\Theta}^*_n)$$
consider the one-step approximation
$$\hat{\Theta}^{1*}_n = g^*_n(\hat{\Theta}_n)$$
Take a Taylor expansion about the estimands $\Theta$:
$$\hat{\Theta}_n = g_n(\Theta) + \nabla g_n(\Theta)(\hat{\Theta}_n - \Theta) + O_P(n^{-1})$$
which can be rewritten as:
$$\sqrt{n}(\hat{\Theta}_n - \Theta) = [I - \nabla g_n(\Theta)]^{-1} \sqrt{n}(g_n(\Theta) - \Theta) + O_P(n^{-1/2})$$
We then obtain
$$\sqrt{n}(\hat{\Theta}^*_n - \hat{\Theta}_n) = [I - \nabla g_n(\hat{\Theta}_n)]^{-1} \sqrt{n}(g^*_n(\hat{\Theta}_n) - \hat{\Theta}_n) + O_P(n^{-1/2})$$
which yields the FRB estimate
$$\hat{\Theta}^{R*}_n = \hat{\Theta}_n + [I - \nabla g_n(\hat{\Theta}_n)]^{-1} (\hat{\Theta}^{1*}_n - \hat{\Theta}_n)$$

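
The FRB recipe can be tried out on a deliberately simple case. The sketch below applies it to a univariate location M-estimator with biweight weights, where the fixed-point map $g_n$ is a weighted mean. Several choices are assumptions made for this illustration and not taken from the slides: the scale is a MAD held fixed (so $\Theta$ is a single number), the gradient $\nabla g_n$ is approximated by a central finite difference, and the tuning constant `c = 4.685` is the usual 95% efficiency value for the biweight.

```python
import numpy as np

def biweight_weights(r, c=4.685):
    """Biweight weights w(r) = (1 - (r/c)^2)^2 for |r| <= c, else 0."""
    u = r / c
    return np.where(np.abs(u) <= 1, (1 - u**2) ** 2, 0.0)

def g(theta, x, scale, c=4.685):
    """Fixed-point map: weighted mean with weights evaluated at theta."""
    w = biweight_weights((x - theta) / scale, c)
    return np.sum(w * x) / np.sum(w)

def m_location(x, scale, c=4.685, tol=1e-10, max_iter=200):
    """Iterate theta <- g(theta), starting from the median."""
    theta = np.median(x)
    for _ in range(max_iter):
        new = g(theta, x, scale, c)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta

def frb_location(x, n_boot=1000, seed=0, c=4.685, h=1e-5):
    """FRB replicates theta_hat + [1 - g'(theta_hat)]^{-1} (g*(theta_hat) - theta_hat)."""
    rng = np.random.default_rng(seed)
    scale = 1.4826 * np.median(np.abs(x - np.median(x)))  # MAD, held fixed (assumption)
    theta_hat = m_location(x, scale, c)
    # Derivative of g_n at theta_hat by central differences (assumption).
    grad = (g(theta_hat + h, x, scale, c) - g(theta_hat - h, x, scale, c)) / (2 * h)
    correction = 1.0 / (1.0 - grad)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        one_step = g(theta_hat, xb, scale, c)  # one-step approximation g*_n(theta_hat)
        reps[b] = theta_hat + correction * (one_step - theta_hat)
    return theta_hat, reps

# Simulated sample with about 10% gross outliers (hypothetical data).
data_rng = np.random.default_rng(1)
x = np.concatenate([data_rng.normal(10.0, 1.0, 100), np.full(10, 50.0)])
theta_hat, reps = frb_location(x, n_boot=500)
frb_standard_error = reps.std(ddof=1)
```

No iteration is run on the bootstrap samples: each replicate costs one weighted mean, and points that get zero weight at the original fit also get zero weight in every replicate.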
Fast and robust bootstrap: Properties of fast robust bootstrap
Computational efficiency: The FRB estimates are solutions of a system of linear equations.
Robustness: The FRB estimates use the weights of the MM-estimates at the original sample.
Consistency: Under regularity conditions, the FRB distribution of $\hat{\Theta}_n$ and the sample distribution of $\hat{\Theta}_n$ converge to the same limiting distribution.
Smooth mappings: FRB commutes with smooth functions, such as $a = \Sigma^{-1}(\mu_1 - \mu_2)$.

Fast and robust bootstrap: Variable selection in robust LDA
Two-group robust LDA
Selection criterion: test for significance of the discriminant coordinate coefficients
Use the FRB distribution to estimate p-values

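
The slides do not state how the p-values are computed from the FRB distribution. One common percentile-type construction, assumed here purely for illustration, takes twice the smaller of the two proportions of bootstrap replicates of a coefficient that fall on either side of zero. The function name `percentile_p_values` and the synthetic replicates are hypothetical stand-ins and are not FRB output.

```python
import numpy as np

def percentile_p_values(replicates):
    """Two-sided percentile-type p-values for H0: coefficient = 0.

    replicates: array of shape (B, p), one row per bootstrap replicate of
    the p discriminant coordinate coefficients.
    """
    reps = np.asarray(replicates, dtype=float)
    below = np.mean(reps <= 0, axis=0)
    above = np.mean(reps >= 0, axis=0)
    return np.minimum(1.0, 2 * np.minimum(below, above))

# Synthetic stand-in replicates: the first coefficient is centered away from
# zero, the second is centered at zero.
rng = np.random.default_rng(0)
fake_reps = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(2000, 2))
p_values = percentile_p_values(fake_reps)
```

Under this construction, a variable whose coefficient has a large p-value would be a candidate for removal from the discriminant rule.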
Examples: Biting Flies
Two groups of 35 flies (Leptoconops torrens and Leptoconops carteri)
Measurements of
- wing length
- wing width
- third palp length
- third palp width
- fourth palp length

Examples: Biting Flies: outliers
[Figure: wing width for Group 1 and Group 2; axis: wing width, from 20 to 50.]