

High-Dimensional Classification Methods for Sparse Signals and Their Applications in Gene Expression Data
Dawit Tadesse, Ph.D.
Department of Mathematical Sciences, University of Cincinnati
Biostatistics, Epidemiology & Research Design Monthly Seminar Series
Cincinnati Children's Hospital Medical Center
November 11, 2014

Contents
◮ 1. Introduction
◮ 2. Classification with Sparse Signals
◮ 3. Feature Selection
◮ 4. Simulation Results
◮ 5. Applications to Gene Expression Data
◮ 6. Conclusion
◮ 7. Selected Bibliography

1. Introduction
◮ High-dimensional classification arises in many contemporary statistical problems:
• Bioinformatics: disease classification using microarray, proteomics, and fMRI data.
• Document or text classification: e-mail spam filtering.
• Voice recognition, handwriting recognition, etc.

1. Introduction
Well-known classification methods include:
◮ ♠ Logistic regression
◮ ♠ Fisher discriminant analysis
◮ ♠ Naive Bayes classifier
For high-dimensional data (i.e., when p ≫ n), the above methods do not work well.
◮ Bickel and Levina (2004) showed that Fisher's rule breaks down in high dimensions and suggested the naive Bayes rule instead.
◮ Fan and Fan (2008) showed that, even for naive Bayes, using all the features increases the error rate, and suggested FAIR (Features Annealed Independence Rules).
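One elementary symptom of the breakdown of Fisher's rule when p ≫ n is that the pooled sample covariance matrix, which the rule has to invert, is singular: its rank is at most n₁ + n₂ − 2. The Python sketch below (NumPy assumed; the helper name, sample sizes and mean shift are hypothetical) checks this numerically. It illustrates only the singularity; Bickel and Levina's (2004) analysis goes further and is not reproduced here.

```python
import numpy as np

def pooled_covariance(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Pooled within-class sample covariance of two (n_k x p) samples."""
    n1, n2 = x1.shape[0], x2.shape[0]
    c1 = x1 - x1.mean(axis=0)  # center each class at its own sample mean
    c2 = x2 - x2.mean(axis=0)
    return (c1.T @ c1 + c2.T @ c2) / (n1 + n2 - 2)

rng = np.random.default_rng(0)
n1, n2, p = 20, 20, 500  # hypothetical sizes with p >> n
x1 = rng.standard_normal((n1, p))
x2 = rng.standard_normal((n2, p)) + 0.5
S = pooled_covariance(x1, x2)
rank = np.linalg.matrix_rank(S)
# The p x p matrix S has rank at most n1 + n2 - 2 < p, so S^{-1} does not exist.
print(f"p = {p}, rank of pooled covariance = {rank}")
```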

1. Introduction
◮ Fan and Fan (2008) showed that the two-sample t-statistic can identify the important features.
◮ Fan et al. (2012) showed that naive Bayes has an increased error rate when the features are correlated.
◮ My work:
• I will show that, even under high correlation, naive Bayes can perform better than Fisher's rule.
• I propose a generalized test statistic and give the condition under which it selects the important features.
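The t-statistic screening attributed above to Fan and Fan (2008) can be sketched by ranking features by the absolute two-sample statistic T_j = (X̄_1j − X̄_2j) / sqrt(S²_1j/n₁ + S²_2j/n₂) and keeping the m largest. The helper names, the simulated data and the choice m = s are hypothetical; this is a plain t-statistic screen, not the generalized statistic proposed in the talk.

```python
import numpy as np

def two_sample_t(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Per-feature two-sample t-statistics (unequal-variance form).

    x1: (n1, p) array for class 1; x2: (n2, p) array for class 2.
    """
    n1, n2 = x1.shape[0], x2.shape[0]
    mean_diff = x1.mean(axis=0) - x2.mean(axis=0)
    var1 = x1.var(axis=0, ddof=1)  # unbiased within-class variances
    var2 = x2.var(axis=0, ddof=1)
    return mean_diff / np.sqrt(var1 / n1 + var2 / n2)

def select_top_features(x1: np.ndarray, x2: np.ndarray, m: int) -> np.ndarray:
    """Indices of the m features with the largest |T_j| (hypothetical helper)."""
    t = two_sample_t(x1, x2)
    return np.argsort(-np.abs(t))[:m]

rng = np.random.default_rng(1)
n, p, s = 30, 1000, 10  # hypothetical: only the first s features carry signal
shift = np.zeros(p)
shift[:s] = 2.0
x1 = rng.standard_normal((n, p)) + shift
x2 = rng.standard_normal((n, p))
selected = select_top_features(x1, x2, m=s)
print(sorted(selected.tolist()))
```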

2. Classification with Sparse Signals
Fisher discriminant rule
$$\delta_F(X, \mu_d, \mu_a, \Sigma) = 1\{\mu_d^T \Sigma^{-1} (X - \mu_a) > 0\}, \tag{1}$$
with corresponding misclassification error rate
$$W(\delta_F, \theta) = \bar{\Phi}\left(\frac{(\mu_d^T \Sigma^{-1} \mu_d)^{1/2}}{2}\right). \tag{2}$$
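Rule (1) and error rate (2) can be checked numerically. This sketch assumes the usual two-class Gaussian setting: X ∼ N_p(µ₁, Σ) in class 1 and N_p(µ₂, Σ) in class 2 with equal priors, µ_d = µ₁ − µ₂, µ_a = (µ₁ + µ₂)/2, and Φ̄ = 1 − Φ. It compares the closed form (2) with a Monte Carlo estimate. The function names, the AR(1) covariance and the mean vectors are illustrative assumptions, not taken from the talk.

```python
import math
import numpy as np

def phi_bar(x: float) -> float:
    """Standard normal upper-tail probability, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fisher_rule(X, mu_d, mu_a, Sigma):
    """Rule (1): 1 (assign to class 1) if mu_d^T Sigma^{-1} (X - mu_a) > 0, else 0.
    X may be one p-vector or an (n, p) array of observations."""
    w = np.linalg.solve(Sigma, mu_d)  # Sigma^{-1} mu_d without forming the inverse
    return ((np.asarray(X) - mu_a) @ w > 0).astype(int)

def fisher_error(mu_d, Sigma):
    """Equation (2): Phi_bar( (mu_d^T Sigma^{-1} mu_d)^{1/2} / 2 )."""
    delta2 = float(mu_d @ np.linalg.solve(Sigma, mu_d))
    return phi_bar(math.sqrt(delta2) / 2.0)

# Illustrative parameters (assumptions, not from the talk).
p, rho = 5, 0.5
idx = np.arange(p)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))  # AR(1): Sigma_ij = rho^|i-j|
mu1, mu2 = np.array([1.0, 1.0, 0.0, 0.0, 0.0]), np.zeros(p)
mu_d, mu_a = mu1 - mu2, (mu1 + mu2) / 2

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.multivariate_normal(mu1, Sigma, size=n)  # class-1 draws
x2 = rng.multivariate_normal(mu2, Sigma, size=n)  # class-2 draws
# Equal priors: overall error is the average of the two conditional error rates.
simulated = 0.5 * (np.mean(fisher_rule(x1, mu_d, mu_a, Sigma) == 0)
                   + np.mean(fisher_rule(x2, mu_d, mu_a, Sigma) == 1))
theoretical = fisher_error(mu_d, Sigma)
print(f"W(delta_F) from (2): {theoretical:.4f}; Monte Carlo: {simulated:.4f}")
```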

2. Classification with Sparse Signals
Naive Bayes rule
$$\delta_{NB}(X, \mu_d, \mu_a, D) = 1\{\mu_d^T D^{-1} (X - \mu_a) > 0\}, \tag{3}$$
whose misclassification error rate is
$$W(\delta_{NB}, \theta) = \bar{\Phi}\left(\frac{\mu_d^T D^{-1} \mu_d}{2\,(\mu_d^T D^{-1} \Sigma D^{-1} \mu_d)^{1/2}}\right). \tag{4}$$
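A similar sketch for rule (3) and error rate (4), under the same assumed Gaussian, equal-prior setting (µ_d = µ₁ − µ₂, µ_a = (µ₁ + µ₂)/2), taking D = diag(Σ). It evaluates (2) and (4) side by side for increasing AR(1) correlation ρ and checks (4) by simulation. With the true parameters plugged in, Fisher's rule is the Bayes rule in this model, so W(δ_F, θ) ≤ W(δ_NB, θ), with equality when Σ is diagonal; an advantage for naive Bayes, as claimed in the introduction, can only appear once the parameters are estimated from data. All names and parameter values are illustrative assumptions.

```python
import math
import numpy as np

def phi_bar(x: float) -> float:
    """Standard normal upper-tail probability, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def naive_bayes_rule(X, mu_d, mu_a, d):
    """Rule (3): 1 if mu_d^T D^{-1} (X - mu_a) > 0, where d holds the diagonal of D."""
    w = mu_d / d  # D^{-1} mu_d, elementwise because D is diagonal
    return ((np.asarray(X) - mu_a) @ w > 0).astype(int)

def naive_bayes_error(mu_d, Sigma):
    """Equation (4) with D = diag(Sigma)."""
    w = mu_d / np.diag(Sigma)  # D^{-1} mu_d
    return phi_bar(float(mu_d @ w) / (2.0 * math.sqrt(float(w @ Sigma @ w))))

def fisher_error(mu_d, Sigma):
    """Equation (2), for comparison."""
    return phi_bar(math.sqrt(float(mu_d @ np.linalg.solve(Sigma, mu_d))) / 2.0)

def ar1(p, rho):
    """Illustrative AR(1) covariance: Sigma_ij = rho^|i-j|."""
    idx = np.arange(p)
    return rho ** np.abs(np.subtract.outer(idx, idx))

p = 5
mu1, mu2 = np.array([1.0, 1.0, 0.0, 0.0, 0.0]), np.zeros(p)
mu_d, mu_a = mu1 - mu2, (mu1 + mu2) / 2

results = {}
for rho in (0.0, 0.5, 0.9):
    Sigma = ar1(p, rho)
    results[rho] = (fisher_error(mu_d, Sigma), naive_bayes_error(mu_d, Sigma))
    print(f"rho={rho}: W_F={results[rho][0]:.4f}, W_NB={results[rho][1]:.4f}")

# Monte Carlo check of (4) at rho = 0.9 (equal priors: average the two class errors).
rng = np.random.default_rng(3)
Sigma = ar1(p, 0.9)
n = 100_000
x1 = rng.multivariate_normal(mu1, Sigma, size=n)
x2 = rng.multivariate_normal(mu2, Sigma, size=n)
d = np.diag(Sigma)
simulated_nb = 0.5 * (np.mean(naive_bayes_rule(x1, mu_d, mu_a, d) == 0)
                      + np.mean(naive_bayes_rule(x2, mu_d, mu_a, d) == 1))
print(f"simulated W_NB at rho=0.9: {simulated_nb:.4f}")
```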
