Selecting Variables in Two-Group Robust Linear Discriminant Analysis


  1. Selecting Variables in Two-Group Robust Linear Discriminant Analysis
     Stefan Van Aelst and Gert Willems
     Department of Applied Mathematics and Computer Science, Ghent University, Belgium
     COMPSTAT'2010

  2. Linear discriminant analysis: setting
     - $p$-dimensional data set
     - Group 1: $x_{11}, \ldots, x_{1 n_1} \in \Pi_1 \sim F_1 = F_{\mu_1, \Sigma}$
     - Group 2: $x_{21}, \ldots, x_{2 n_2} \in \Pi_2 \sim F_2 = F_{\mu_2, \Sigma}$
     - Common covariance matrix $\Sigma$
     - $P(X \in \Pi_1) = P(X \in \Pi_2)$
     - Linear discriminant scores:
       $$d_j^L(x) = \mu_j^t \Sigma^{-1} x - \tfrac{1}{2} \mu_j^t \Sigma^{-1} \mu_j, \quad j = 1, 2$$
     - Linear Bayes rule: classify $x \in \mathbb{R}^p$ into $\Pi_1$ if $d_1^L(x) > d_2^L(x)$, and into $\Pi_2$ otherwise.
     Robust Variable Selection in Discriminant Analysis, Van Aelst & Willems

  4. Linear discriminant analysis: Discriminant coordinate
     - Direction $a$ that best separates the two populations:
       $$a = \Sigma^{-1}(\mu_1 - \mu_2)$$
     - The projection $a^t x$ is called the canonical variate or discriminant coordinate.

  5. Linear discriminant analysis: Sample LDA
     - Estimate the centers $\mu_1$ and $\mu_2$ and the scatter $\Sigma$ from the data.
     - Standard LDA uses the sample means $\bar{x}_1$ and $\bar{x}_2$, and the pooled sample covariance matrix
       $$S_n = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$
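
The classical (non-robust) plug-in rule from the slides above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function names (`fit_sample_lda`, `discriminant_score`, `classify`) and the simulated data are hypothetical and do not come from the presentation.

```python
import numpy as np

def fit_sample_lda(x1, x2):
    """Sample means, pooled covariance S_n, and direction a = S_n^{-1}(mean1 - mean2)."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)  # sample covariance of group 1
    s2 = np.cov(x2, rowvar=False)  # sample covariance of group 2
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    a = np.linalg.solve(pooled, mean1 - mean2)
    return mean1, mean2, pooled, a

def discriminant_score(x, mean, cov):
    """d_j(x) = mu_j^t Sigma^{-1} x - 0.5 * mu_j^t Sigma^{-1} mu_j, for each row of x."""
    w = np.linalg.solve(cov, mean)
    return x @ w - 0.5 * mean @ w

def classify(x, mean1, mean2, cov):
    """Linear Bayes rule with equal priors: group 1 if d_1(x) > d_2(x), else group 2."""
    d1 = discriminant_score(x, mean1, cov)
    d2 = discriminant_score(x, mean2, cov)
    return np.where(d1 > d2, 1, 2)

# Simulated two-group data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
x1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
x2 = rng.normal(loc=[3.0, 3.0], size=(50, 2))
mean1, mean2, pooled, a = fit_sample_lda(x1, x2)
labels = classify(np.array([[0.0, 0.0], [3.0, 3.0]]), mean1, mean2, pooled)
```

Robust LDA keeps the same rule but replaces the sample means and pooled covariance with robust estimates of the centers and the common scatter.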

  6. Robust LDA
     - Use robust estimators of the centers $\mu_1$ and $\mu_2$ and the common scatter $\Sigma$
       - S-estimators
       - MM-estimators

  7. Robust LDA: One-sample S-estimators
     - Observations $\{x_1, \ldots, x_n\} \subset \mathbb{R}^p$
     - $\rho_0 : [0, \infty[ \to [0, \infty[$ is bounded, increasing and smooth
     - S-estimates of the location $\hat\mu_n$ and scatter $\hat\Sigma_n$ minimize $|C|$ subject to
       $$\frac{1}{n} \sum_{i=1}^{n} \rho_0\left( \left[ (x_i - T)^t C^{-1} (x_i - T) \right]^{1/2} \right) = b$$
       among all $T \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$
     - (Davies 1987, Rousseeuw and Leroy 1987, Lopuhaä 1989)

  8. Robust LDA: ρ functions
     - A popular family of loss functions is the Tukey biweight (bisquare) family of ρ functions:
       $$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c \\[1ex] \dfrac{c^2}{6} & \text{if } |t| \ge c \end{cases}$$
     - The constant $c$ can be tuned for robustness (breakdown point)
     - The choice of $c$ also determines the efficiency of the S-estimator
     - Trade-off: robustness vs efficiency
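
As a small illustration, the biweight ρ function and the left-hand side of the S-estimator constraint (the average of ρ over the Mahalanobis-type distances) can be written in Python with NumPy. The names `rho_biweight` and `mean_rho` are hypothetical helpers, not code from the presentation, and no choice of $c$ or $b$ is implied.

```python
import numpy as np

def rho_biweight(t, c):
    """Tukey biweight rho: polynomial on |t| <= c, constant c^2 / 6 beyond c."""
    t = np.asarray(t, dtype=float)
    inside = t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)
    return np.where(np.abs(t) <= c, inside, c**2 / 6)

def mean_rho(x, center, scatter, c):
    """Average of rho_c over the distances [(x_i - T)^t C^{-1} (x_i - T)]^{1/2}."""
    diff = np.asarray(x, dtype=float) - center
    sq_dist = np.einsum("ij,ij->i", diff @ np.linalg.inv(scatter), diff)
    return rho_biweight(np.sqrt(sq_dist), c).mean()

# The two branches agree at |t| = c, so rho is continuous and bounded by c^2 / 6.
at_cutoff = rho_biweight(2.0, c=2.0)
far_out = rho_biweight(10.0, c=2.0)
```

An S-estimator searches for the pair $(T, C)$ with the smallest $|C|$ for which `mean_rho` equals the constant $b$; that optimization is not attempted here.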

  9. Robust LDA: Tukey biweight ρ functions
     [Figure: $\rho(t)$ plotted against $t$ for $c = 2$, $c = 3$ and $c = \infty$]

  10. Robust LDA: One-sample MM-estimates
     - Put $\tilde\sigma_n = \det(\hat\Sigma_n)^{1/(2p)}$, the S-estimate of scale.
     - Then the MM-estimates of the location $\hat\mu_n$ and shape $\hat\Gamma_n$ minimize
       $$\frac{1}{n} \sum_{i=1}^{n} \rho_1\left( \frac{\left[ (x_i - T)^t G^{-1} (x_i - T) \right]^{1/2}}{\tilde\sigma_n} \right)$$
       among all $T \in \mathbb{R}^p$ and $G \in \mathrm{PDS}(p)$ for which $\det(G) = 1$
     - (Tatsuoka and Tyler 2000)

  11. Robust LDA: ρ functions
     - Both $\rho_0$ and $\rho_1$ are taken from the same family
     - The constant $c$ in $\rho_0$ can be tuned for robustness (breakdown point)
     - The MM-estimator inherits its robustness from the S-scale
     - The constant $c$ in $\rho_1$ can be tuned for efficiency of the locations

  12. Robust LDA: Tukey biweight ρ functions
     [Figure: $\rho_0$ and $\rho_1$ with tuning constants $c_0$ and $c_1$ marked on the horizontal axis; one panel for $p = 2$ and one for $p = 5$]

  13. Robust LDA: Robust two-sample estimates
     - Pool the scatter estimates $\hat\Sigma_{1 n_1}$ and $\hat\Sigma_{2 n_2}$ of both groups:
       $$\hat\Sigma_n = \frac{n_1 \hat\Sigma_{1 n_1} + n_2 \hat\Sigma_{2 n_2}}{n_1 + n_2}$$
     - Calculate simultaneous S-estimates of the two locations and the common scatter matrix: $\hat\mu_{1n}$, $\hat\mu_{2n}$ and $\hat\Sigma_n$ minimize $|C|$ subject to
       $$\frac{1}{n_1 + n_2} \sum_{j=1}^{2} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - T_j)^t C^{-1} (x_{ji} - T_j) \right]^{1/2} \right) = b$$
       among all $T_1, T_2 \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$ (He and Fung 2000)
     - Similarly, simultaneous MM-estimates can be calculated.

  14. Robust LDA: Bootstrap inference
     - Advantages of bootstrap
       - Few assumptions
       - Wide range of applications
     - Bootstrapping robust estimators
       - High computational cost
       - Robustness not guaranteed

  16. Fast and robust bootstrap: principle
     - For each bootstrap sample, calculate an approximation for the estimates
     - Use the estimating equations
     - Fast to compute approximations
     - Inherit robustness of initial solution

  17. Fast and robust bootstrap
     - Consider estimates that are the solution of a fixed point equation:
       $$\hat\Theta_n = g_n(\hat\Theta_n)$$
     - For a bootstrap sample, $\hat\Theta_n^* = g_n^*(\hat\Theta_n^*)$; consider the one-step approximation
       $$\hat\Theta_n^{1\star} = g_n^*(\hat\Theta_n)$$
     - Take a Taylor expansion about the estimand $\Theta$:
       $$\hat\Theta_n = g_n(\Theta) + \nabla g_n(\Theta)(\hat\Theta_n - \Theta) + O_P(n^{-1})$$
       which can be rewritten as
       $$\sqrt{n}\,(\hat\Theta_n - \Theta) = [I - \nabla g_n(\Theta)]^{-1} \sqrt{n}\,(g_n(\Theta) - \Theta) + O_P(n^{-1/2})$$
     - We then obtain
       $$\sqrt{n}\,(\hat\Theta_n^* - \hat\Theta_n) = [I - \nabla g_n(\hat\Theta_n)]^{-1} \sqrt{n}\,(g_n^*(\hat\Theta_n) - \hat\Theta_n) + O_P(n^{-1/2})$$
       which yields the FRB estimate
       $$\hat\Theta_n^{R\star} = \hat\Theta_n + [I - \nabla g_n(\hat\Theta_n)]^{-1} (\hat\Theta_n^{1\star} - \hat\Theta_n)$$
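
The FRB recipe can be illustrated on a deliberately simplified problem: a univariate location M-estimator with biweight weights, written as the fixed point $\theta = g_n(\theta)$ of a weighted mean. This is a sketch under assumptions that are not in the slides: the scale is held fixed at the MAD of the original sample (in the multivariate MM setting the scale and scatter are part of $\Theta$), the gradient is taken by finite differences, and all function names and the tuning constant 4.685 are illustrative choices.

```python
import numpy as np

def weights(x, theta, scale, c=4.685):
    """Biweight weights w(r) = (1 - (r/c)^2)^2 for |r| <= c, else 0."""
    r = (x - theta) / scale
    return np.where(np.abs(r) <= c, (1 - (r / c) ** 2) ** 2, 0.0)

def g(x, theta, scale):
    """Fixed-point map: weighted mean with weights evaluated at theta."""
    w = weights(x, theta, scale)
    return np.sum(w * x) / np.sum(w)

def fit(x, scale, tol=1e-10, max_iter=200):
    """Iterate theta <- g(theta) from the median until convergence."""
    theta = np.median(x)
    for _ in range(max_iter):
        new = g(x, theta, scale)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta

def frb_replicates(x, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: scale fixed at the MAD of the original sample.
    scale = 1.4826 * np.median(np.abs(x - np.median(x)))
    theta_hat = fit(x, scale)
    # Finite-difference gradient of g_n at theta_hat (scalar case).
    h = 1e-5
    grad = (g(x, theta_hat + h, scale) - g(x, theta_hat - h, scale)) / (2 * h)
    correction = 1.0 / (1.0 - grad)  # [I - grad g_n(theta_hat)]^{-1}
    reps = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        one_step = g(xb, theta_hat, scale)  # g_n^* evaluated at theta_hat
        reps[b] = theta_hat + correction * (one_step - theta_hat)
    return theta_hat, reps

# Illustrative data: 95 clean points plus 5 gross outliers at 20.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(size=95), np.full(5, 20.0)])
theta_hat, reps = frb_replicates(x)
```

No replicate re-runs the iterative fit: each one needs a single weighted mean and a linear correction, and the outliers keep the zero weight they received at the original estimate.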

  19. Fast and robust bootstrap: Properties of fast robust bootstrap
     - Computational efficiency: the FRB estimates are solutions of a system of linear equations
     - Robustness: the FRB estimates use the weights of the MM-estimates at the original sample
     - Consistency: under regularity conditions, the FRB distribution of $\hat\Theta_n$ and the sample distribution of $\hat\Theta_n$ converge to the same limiting distribution
     - Smooth mappings: FRB commutes with smooth functions, such as $a = \Sigma^{-1}(\mu_1 - \mu_2)$

  21. Fast and robust bootstrap: Variable selection in robust LDA
     - Two-group robust LDA
     - Selection criterion: test for significance of the discriminant coordinate coefficients
     - Use the FRB distribution to estimate p-values
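
The slides do not state how the p-values are computed from the FRB distribution. One common convention, shown here only as an assumption, is a two-sided percentile-type p-value for each coefficient of the discriminant direction: twice the smaller of the fractions of bootstrap replicates on either side of zero. The function name `bootstrap_p_values` and the synthetic replicates are hypothetical.

```python
import numpy as np

def bootstrap_p_values(boot_coefs):
    """Two-sided percentile-type p-values for H0: coefficient = 0.

    boot_coefs: array of shape (B, p) holding bootstrap replicates of the
    discriminant coefficients (for example, FRB replicates of a).
    """
    boot_coefs = np.asarray(boot_coefs, dtype=float)
    below = np.mean(boot_coefs <= 0, axis=0)  # fraction of replicates <= 0
    above = np.mean(boot_coefs >= 0, axis=0)  # fraction of replicates >= 0
    return np.minimum(1.0, 2.0 * np.minimum(below, above))

# Synthetic replicates (an assumption): the first coefficient is clearly
# non-zero, the second is centered at zero.
rng = np.random.default_rng(0)
boot = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(1000, 2))
p_values = bootstrap_p_values(boot)
```

Under this convention, a variable whose coefficient has a large p-value would be a candidate for removal from the discriminant rule.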

  22. Examples: Biting Flies
     - Two groups of 35 flies (Leptoconops torrens and Leptoconops carteri)
     - Measurements of
       - wing length
       - wing width
       - third palp length
       - third palp width
       - fourth palp length

  23. Examples: Biting Flies: outliers
     [Figure: wing width (horizontal axis, about 20 to 50) shown separately for Group 1 and Group 2]
