  1. Feature Selection ZHI LI Fenyö’s Lab October 3, 2019

  2. What is a feature? Also known as the X, the (independent) variable, the predictor, or the covariate. Examples: face; leg; tail; hair texture/style; impeachment; trade war; whatever you can think of.

  3. What methods are out there? They differ in how the selection algorithm is combined with model building. Filter methods: correlation. Embedded methods: regularization. Wrapper methods: forward selection.

  4. Filter methods
     • Numeric outcomes
       • numeric predictors: correlation (linear, non-linear); distance; mutual information
       • categorical predictors: t-statistics; ANOVA (multiple predictors); mutual information
     • Categorical outcomes
       • numeric predictors: ROC; Relief; mutual information
       • categorical predictors: Chi-square; Fisher's exact; odds ratio

  5. Minimizing the loss function L (the sum of squared errors) gives the closed-form estimates
     \hat{w}_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}
     \hat{w}_0 = \frac{1}{n} \sum y_i - \hat{w}_1 \frac{1}{n} \sum x_i
     Slide courtesy of Wilson with minor adaptation
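As a check on the closed-form estimates above, here is a minimal NumPy sketch; the data are made up for illustration and are not from the slides.

```python
import numpy as np

# Illustrative data, not from the slides.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Closed-form least-squares estimates from the formulas above.
w1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
w0 = np.mean(y) - w1 * np.mean(x)

# Equivalent covariance/variance form: w1 = cov(X, Y) / var(X).
w1_alt = np.cov(x, y, bias=True)[0, 1] / np.var(x)
print(w1, w0, w1_alt)  # w1 and w1_alt agree
```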

  6. Pearson’s correlation
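A quick way to score a numeric feature against a numeric outcome with Pearson's correlation, assuming SciPy is available; the data are simulated for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=200)                       # candidate feature
y = 2 * x + rng.normal(scale=0.5, size=200)    # outcome with a linear relationship

r, p_value = pearsonr(x, y)                    # r near 1 indicates strong linear association
print(f"Pearson r = {r:.3f}, p = {p_value:.1e}")
```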

  7. Feature Importance Measures: Correlation-Based Feature Selection

  8. Spearman’s Correlation

  9. Spearman's Correlation: Pearson's correlation computed on the ranks of the data; the numerator is the covariance of the ranks and the denominator is the product of the rank standard deviations. Courtesy of Glen_b from stackexchange
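A small SciPy sketch on simulated data, showing why Spearman's rank correlation is the filter of choice for monotone but non-linear relationships.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
x = rng.uniform(0, 3, size=300)
y = np.exp(x) + rng.normal(scale=0.1, size=300)   # monotone but strongly non-linear

rho, _ = spearmanr(x, y)   # rank correlation stays near 1
r, _ = pearsonr(x, y)      # linear correlation is noticeably lower
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```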

  10. Other non-linear Measures

  11. Other non-linear Measures • MIC
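MIC itself requires an extra package (e.g. minepy). As a related non-linear dependence score, here is a sketch using mutual information from scikit-learn on simulated data, where the relationship is non-monotone and correlation-based filters would miss it.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(500, 3))
# The outcome depends non-monotonically on feature 0 only.
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

mi = mutual_info_regression(X, y, random_state=0)
print(mi)  # feature 0 scores clearly higher than the two irrelevant features
```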

  12. Other non-linear Measures

  13. Filter methods

  14. Other non-linear Measures

  15. t-statistics (categorical predictor)
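A minimal SciPy sketch of the t-statistic filter for a numeric outcome and a binary (categorical) predictor, using simulated group data.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
# Numeric outcome measured in the two levels of a binary predictor.
group_a = rng.normal(loc=5.0, scale=1.0, size=50)
group_b = rng.normal(loc=6.0, scale=1.0, size=50)

t_stat, p_value = ttest_ind(group_a, group_b)
print(t_stat, p_value)  # a small p-value suggests the predictor separates the outcome
```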

  16. Categorical Outcome

  17. ROC

  18. Evaluation of Binary Classification Models
      Confusion matrix (rows = actual class, columns = predicted class):
                   Predicted 0        Predicted 1
      Actual 0     True Negative      False Positive
      Actual 1     False Negative     True Positive
      • False Positive Rate = FP/(FP+TN) – fraction of label 0 predicted to be label 1
      • Accuracy = (TP+TN)/total – fraction of correct predictions
      • Precision = TP/(TP+FP) – fraction of correct among positive predictions
      • Sensitivity = TP/(TP+FN) – fraction of correct predictions among label 1; also called true positive rate and recall
      • Specificity = TN/(TN+FP) – fraction of correct predictions among label 0
      Slide courtesy of David
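A scikit-learn sketch computing the metrics above, plus the area under the ROC curve, from toy labels and scores.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.6, 0.9])  # model scores (toy values)
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                 # recall / true positive rate
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
accuracy = (tp + tn) / len(y_true)
auc = roc_auc_score(y_true, y_score)         # area under the ROC curve
print(sensitivity, specificity, precision, accuracy, auc)
```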

  19. Relief algorithm

  20. Relief algorithm
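The slides do not include code, so here is a minimal NumPy sketch of the basic Relief idea for numeric features and a binary outcome: sample an instance, find its nearest hit and nearest miss, and reward features that separate the miss while staying close to the hit.

```python
import numpy as np

def relief(X, y, n_iter=100, rng=None):
    """Basic Relief feature weights (a sketch, not an optimized implementation)."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)            # normalize per-feature differences
    span[span == 0] = 1.0
    w = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        diffs = np.abs(X - X[i]) / span             # per-feature distance to every point
        dist = diffs.sum(axis=1)
        dist[i] = np.inf                            # exclude the sampled instance itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))    # nearest same-class point
        miss = np.argmin(np.where(y != y[i], dist, np.inf))   # nearest other-class point
        w += (diffs[miss] - diffs[hit]) / n_iter    # large weight = feature separates classes
    return w
```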

  21. More on categorical outcome with numeric predictors

  22. Categorical Outcome with Categorical Variable

  23. Categorical Outcome with Categorical Variable
      Chi-square test: suited to large counts; easy to calculate; an approximation.
      Fisher's exact test: for 2 x 2 contingency tables; suited to small counts; harder to calculate; exact.
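Both tests are available in SciPy; a sketch on a made-up 2 x 2 table with small counts, where Fisher's exact test is the safer choice.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# 2 x 2 contingency table (rows = predictor levels, columns = outcome classes); toy counts.
table = np.array([[8, 2],
                  [1, 9]])

chi2, p_chi2, dof, expected = chi2_contingency(table)  # approximate; fine for large counts
odds_ratio, p_fisher = fisher_exact(table)             # exact; preferred for small counts
print(p_chi2, p_fisher, odds_ratio)
```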

  24. Fisher’s exact

  25. Consequences of Using Non-informative Predictors

  26. Embedded methods
      Ridge regression: minimize RSS + \lambda \sum_{j=1}^{p} \beta_j^2
      Lasso: minimize RSS + \lambda \sum_{j=1}^{p} |\beta_j|
      Slide courtesy of Anna
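A scikit-learn sketch on simulated data showing how the L1 penalty performs selection inside model fitting: irrelevant coefficients are shrunk to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # only 2 informative features

X_std = StandardScaler().fit_transform(X)   # the penalty is sensitive to feature scale
lasso = Lasso(alpha=0.1).fit(X_std, y)
print(lasso.coef_)  # the 8 irrelevant coefficients are driven to (near) zero
```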

  27. Wrapper methods Stepwise Regression Forward selection Backward elimination Bidirectional elimination

  28. Forward selection Applied Predictive Modeling Chapter 19
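scikit-learn's SequentialFeatureSelector implements this wrapper; a sketch on the built-in diabetes data (direction="backward" gives the backward elimination of the next slide).

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=4,
                                direction="forward",   # "backward" for backward elimination
                                cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask over the original features
```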

  29. Backward Selection

  30. Filter Methods
      Akaike information criterion: AIC = n log(RSS/n) + 2p
      Bayesian information criterion: BIC = n log(RSS/n) + p log(n)
      CV (covered earlier by Anna)
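A NumPy sketch computing these criteria for an ordinary least-squares fit, using the forms above; constants that do not affect model comparison are dropped.

```python
import numpy as np

def aic_bic(X, y):
    """AIC and BIC for an OLS fit (up to additive constants)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    k = p + 1                                      # number of fitted coefficients
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)      # heavier penalty for large n
    return aic, bic
```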

  31. Feature Importance Measures Often model dependent Courtesy of Scott Lundberg

  32. A comparison of two methods for determining variable importance.

  33. Feature Importance Measures
      1. Tree SHAP: a newly proposed method.
      2. Saabas: an individualized heuristic feature attribution method.
      3. mean(|Tree SHAP|): a global attribution method based on the average magnitude of the individualized Tree SHAP attributions.
      4. Gain: the same method used above in XGBoost, and also equivalent to the Gini importance measure used in scikit-learn tree models.
      5. Split count: represents both the closely related "weight" and "cover" methods in XGBoost, but is computed using the "weight" method.
      6. Permutation: the drop in model accuracy when a single feature is randomly permuted in the test data set.
      Courtesy of Scott Lundberg
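Method 6, permutation importance, is the most model-agnostic of the list; a scikit-learn sketch on a built-in data set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)  # mean drop in accuracy when each feature is shuffled
```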

  34. Controlling Procedure (FWER and FDR)
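When many features are screened with univariate tests, the resulting p-values need multiplicity control; a statsmodels sketch contrasting FWER (Bonferroni) and FDR (Benjamini-Hochberg) corrections on illustrative p-values.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.22, 0.60])  # illustrative p-values

reject_fwer, p_bonf, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")  # FWER
reject_fdr, p_bh, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")         # FDR
print(reject_fwer)  # Bonferroni is conservative
print(reject_fdr)   # Benjamini-Hochberg typically rejects more hypotheses
```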

  35. Summary Various tools are available to dissect the relationship between predictors and the outcome, and choosing among them depends on the question at hand. Feature selection is mostly tied to the choice of model, except for filter methods. Most of the time, simple plotting (scatterplot, boxplot, PCA) can save you a ton of time in figuring out what is relevant/robust versus dubious/misleading. Low complexity is generally preferred when the primary goal is interpreting the contribution of predictors to the outcome. When multiple hypotheses are tested, multiplicity should be controlled.

  36. Thank You
