Feature Selection
Zhi Li, Fenyö Lab
October 3, 2019
What Is a Feature?
Also known as: the X (independent) variable, predictor, or covariate.
Examples: face, leg, tail; hair texture/style; impeachment; trade war; whatever you can think of.
What Methods Are Out There?
Methods differ in how the selection algorithm is combined with model building:
• Filter methods, e.g., correlation
• Embedded methods, e.g., regularization
• Wrapper methods, e.g., forward selection
Filter Methods
• Numeric outcomes:
  • Numeric predictors: correlation (linear, nonlinear); distance; mutual information
  • Categorical predictors: t-statistic; ANOVA (multiple predictors); mutual information
• Categorical outcomes:
  • Numeric predictors: ROC; Relief; mutual information
  • Categorical predictors: chi-square; Fisher's exact; odds ratio
Minimizing the loss function L (the sum of squared errors) gives the least-squares estimates:

$$\hat{w}_1 = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \left(\sum x_i^2\right) - \left(\sum x_i\right)^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$$

$$\hat{w}_0 = \frac{1}{n} \sum y_i - \hat{w}_1 \frac{1}{n} \sum x_i$$

Slide courtesy of Wilson, with minor adaptation
Pearson’s correlation
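For reference, Pearson's correlation is the covariance rescaled by the standard deviations, which makes it unit-free and bounded in [-1, 1]; note its relationship to the regression slope above, where the denominator is var(X) instead:

$$r_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2}}$$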
Feature Importance Measures: Correlation-Based Feature Selection
Spearman’s Correlation
Spearman's Correlation
Spearman's correlation is Pearson's correlation applied to the ranks of the data. Numerator: the covariance of the rank variables. Denominator: the product of the standard deviations of the rank variables.

$$\rho = \frac{\mathrm{cov}(\mathrm{rg}_X, \mathrm{rg}_Y)}{\sigma_{\mathrm{rg}_X} \, \sigma_{\mathrm{rg}_Y}}$$

Courtesy of Glen_b from Stack Exchange
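A minimal sketch contrasting the two coefficients with SciPy (the synthetic data and variable names are our own):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.exp(x) + rng.normal(scale=0.1, size=200)  # monotone but nonlinear relation

r, _ = pearsonr(x, y)      # penalized by the nonlinearity
rho, _ = spearmanr(x, y)   # near 1: ranks capture the monotone trend
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```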
Other Non-linear Measures
• MIC (maximal information coefficient)
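Computing MIC itself requires an external package (minepy is one option, not shown here); as a hedged stand-in, this sketch scores a non-monotone relationship with scikit-learn's mutual information estimator, another non-linear measure from the filter list:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.1, size=500)  # non-monotone signal

# Pearson correlation is ~0 here, but mutual information is clearly positive.
mi = mutual_info_regression(x, y, random_state=0)
print(f"estimated MI = {mi[0]:.2f} nats")
```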
t-statistic (Categorical Predictor)
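A minimal sketch of the t-statistic as a filter score for a binary (categorical) predictor against a numeric outcome, using SciPy (the simulated data are our own):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
group = rng.integers(0, 2, size=300)          # binary predictor
outcome = 0.5 * group + rng.normal(size=300)  # numeric outcome

# Two-sample t-test between the predictor's two groups.
t, p = ttest_ind(outcome[group == 1], outcome[group == 0])
print(f"t = {t:.2f}, p = {p:.3g}")  # rank features by |t| or by p-value
```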
Categorical Outcome
ROC
Evaluation of Binary Classification Models

              Predicted 0       Predicted 1
Actual 0      True Negative     False Positive
Actual 1      False Negative    True Positive

• False positive rate = FP / (FP + TN): fraction of label 0 predicted to be label 1
• Accuracy = (TP + TN) / total: fraction of correct predictions
• Precision = TP / (TP + FP): fraction of correct predictions among positive predictions
• Sensitivity = TP / (TP + FN): fraction of correct predictions among label 1; also called true positive rate and recall
• Specificity = TN / (TN + FP): fraction of correct predictions among label 0

Slide courtesy of David
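One way to use ROC as a filter is to score each numeric predictor by the AUC it achieves as a one-variable classifier; a minimal sketch with scikit-learn (synthetic data, our own setup):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=400)   # binary outcome
X = rng.normal(size=(400, 5))
X[:, 0] += y                       # only feature 0 is informative

# AUC of each raw feature used directly as a decision score.
aucs = [roc_auc_score(y, X[:, j]) for j in range(X.shape[1])]
print(np.round(aucs, 2))  # feature 0 stands out; the rest hover near 0.5
```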
Relief algorithm
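A minimal NumPy sketch of the original Relief idea for a binary outcome (the function name and simplifications are our own; practical variants add normalization and k nearest neighbors): each feature's weight grows when it separates a sample from its nearest miss and shrinks when it differs at the nearest hit.

```python
import numpy as np

def relief(X, y, n_iter=200, seed=0):
    """Original Relief (binary y, numeric X): one relevance weight per feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)  # L1 distance to sample i
        dist[i] = np.inf                     # never pick the sample itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))   # nearest same-class
        miss = np.argmin(np.where(y != y[i], dist, np.inf))  # nearest other-class
        # Good features differ at the miss and agree at the hit.
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter
```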
More on categorical outcome with numeric predictors
Categorical Outcome with Categorical Variable
• Chi-square test: suited to large counts; easy to calculate; gives an approximation.
• Fisher's exact test: a 2 x 2 alternative to the chi-squared test for small counts; hard to calculate; gives an exact p-value.
Fisher’s exact
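A short sketch running both tests on a made-up 2 x 2 contingency table with SciPy (the counts are illustrative only):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Rows: predictor category; columns: outcome category (hypothetical counts).
table = [[8, 2],
         [1, 5]]

chi2, p_chi2, dof, _ = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-square p = {p_chi2:.3f} (approximate)")
print(f"Fisher exact p = {p_fisher:.3f}, odds ratio = {odds_ratio:.2f}")
```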
Consequences of Using Non-informative Predictors
Embedded Methods

Ridge regression: $L = RSS + \lambda \sum_{j=1}^{p} \beta_j^2$

Lasso: $L = RSS + \lambda \sum_{j=1}^{p} |\beta_j|$

Slide courtesy of Anna
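A minimal sketch of embedded selection via the lasso in scikit-learn: coefficients shrunk exactly to zero drop out of the model (synthetic data; LassoCV picks lambda, called alpha there, by cross-validation):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)  # only features 0 and 3 matter

model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)  # features with non-zero coefficients
print("selected features:", selected)   # expected: [0 3]
```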
Wrapper Methods: Stepwise Regression
• Forward selection
• Backward elimination
• Bidirectional elimination
Forward Selection (Applied Predictive Modeling, Chapter 19)
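A hedged sketch of greedy forward selection: starting from the empty set, repeatedly add the feature that most improves cross-validated performance, stopping when nothing helps. The model and scoring here are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, cv=5):
    """Greedy forward selection scored by cross-validated R^2."""
    remaining, chosen, best = set(range(X.shape[1])), [], -np.inf
    while remaining:
        scores = {j: cross_val_score(LinearRegression(), X[:, chosen + [j]], y,
                                     cv=cv).mean() for j in remaining}
        j, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:   # no candidate improves the CV score: stop
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen
```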
Backward Selection
Model Comparison Criteria
• Akaike information criterion: $AIC = n \log(RSS/n) + 2k$
• Bayesian information criterion: $BIC = n \log(RSS/n) + k \log(n)$
• CV (covered earlier by Anna)
where n is the number of samples and k the number of model parameters.
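A small helper computing both criteria from a fitted model's residual sum of squares, under the Gaussian-likelihood form given above:

```python
import numpy as np

def aic_bic(rss, n, k):
    """AIC/BIC for a least-squares fit: n samples, k parameters, residual SS rss."""
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

# Example: compare a 3-parameter fit against a 5-parameter fit on 100 samples.
print(aic_bic(rss=42.0, n=100, k=3))
print(aic_bic(rss=40.5, n=100, k=5))  # lower is better; extra terms are penalized
```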
Feature Importance Measures
Often model dependent.
Courtesy of Scott Lundberg
A comparison of two methods for determining variable importance.
Feature Importance Measures
1. Tree SHAP: a newly proposed method.
2. Saabas: an individualized heuristic feature attribution method.
3. mean(|Tree SHAP|): a global attribution method based on the average magnitude of the individualized Tree SHAP attributions.
4. Gain: the same method used above in XGBoost, and also equivalent to the Gini importance measure used in scikit-learn tree models.
5. Split count: represents both the closely related "weight" and "cover" methods in XGBoost, but is computed using the "weight" method.
6. Permutation: the resulting drop in accuracy of the model when a single feature is randomly permuted in the test data set.
Courtesy of Scott Lundberg
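To make the first and third items concrete, a minimal sketch of computing Tree SHAP attributions with the shap package on an XGBoost model (both packages and the synthetic data are our assumptions, not from the slides):

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(n_estimators=100).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one attribution per sample and feature

# Global importance = mean(|Tree SHAP|) over samples, as in item 3 above.
print(np.abs(shap_values).mean(axis=0).round(2))
```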
Controlling Procedures (FWER and FDR)
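A sketch of the two control styles applied to a vector of filter p-values, using statsmodels (Bonferroni controls the FWER; Benjamini-Hochberg controls the FDR; the p-values are made up):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.22, 0.74])

reject_fwer, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
reject_fdr, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni keeps:", np.flatnonzero(reject_fwer))  # stricter
print("BH (FDR) keeps:  ", np.flatnonzero(reject_fdr))   # more permissive
```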
Summary
• Many tools exist for dissecting the relationship between predictors and outcome; which one to use depends on the question at hand.
• Except for filter methods, feature selection is tied to the choice of model.
• Most of the time, simple plotting (scatter plots, box plots, PCA) can save a ton of time in figuring out what is relevant/robust versus dubious/misleading.
• Low complexity is usually preferred when the primary goal is interpreting each predictor's contribution to the outcome.
• When many hypotheses are tested, multiplicity should be controlled.
Thank You