Feature Selection
Zhi Li, Fenyö Lab
October 3, 2019
What Is a Feature?
Also known as: the X (independent) variable, predictor, or covariate.
Examples: face, leg, tail; hair texture/style; impeachment; trade war; whatever you can think of.
What Methods Are Out There?
Methods differ in how the selection algorithm is combined with model building:
• Filter methods, e.g., correlation
• Embedded methods, e.g., regularization
• Wrapper methods, e.g., forward selection
Filter Methods
• Numeric outcomes:
  • Numeric predictors: correlation (linear, nonlinear); distance; mutual information
  • Categorical predictors: t-statistic; ANOVA (multiple predictors); mutual information
• Categorical outcomes:
  • Numeric predictors: ROC; Relief; mutual information
  • Categorical predictors: chi-square; Fisher's exact; odds ratio
Minimizing the loss function L (the sum of squared errors) gives the least-squares estimates:

$$\hat{w}_1 = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \left(\sum x_i^2\right) - \left(\sum x_i\right)^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$$

$$\hat{w}_0 = \frac{1}{n} \sum y_i - \hat{w}_1 \frac{1}{n} \sum x_i$$

Slide courtesy of Wilson, with minor adaptation
Pearson’s correlation
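For reference, Pearson's correlation is the covariance rescaled by the standard deviations, which makes it unit-free and bounded in [-1, 1]; note its relationship to the regression slope above, where the denominator is var(X) instead:

$$r_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2}}$$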
Feature Importance Measures: Correlation-Based Feature Selection
Spearman’s Correlation
Spearman's Correlation
Spearman's correlation is Pearson's correlation applied to the ranks of the data. Numerator: the covariance of the rank variables. Denominator: the product of the standard deviations of the rank variables.

$$\rho = \frac{\mathrm{cov}(\mathrm{rg}_X, \mathrm{rg}_Y)}{\sigma_{\mathrm{rg}_X} \, \sigma_{\mathrm{rg}_Y}}$$

Courtesy of Glen_b from Stack Exchange
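A minimal sketch contrasting the two coefficients with SciPy (the synthetic data and variable names are our own):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.exp(x) + rng.normal(scale=0.1, size=200)  # monotone but nonlinear relation

r, _ = pearsonr(x, y)      # penalized by the nonlinearity
rho, _ = spearmanr(x, y)   # near 1: ranks capture the monotone trend
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```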
Other Non-linear Measures
• MIC (maximal information coefficient)
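Computing MIC itself requires an external package (minepy is one option, not shown here); as a hedged stand-in, this sketch scores a non-monotone relationship with scikit-learn's mutual information estimator, another non-linear measure from the filter list:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.1, size=500)  # non-monotone signal

# Pearson correlation is ~0 here, but mutual information is clearly positive.
mi = mutual_info_regression(x, y, random_state=0)
print(f"estimated MI = {mi[0]:.2f} nats")
```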
t-statistic (Categorical Predictor)
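A minimal sketch of the t-statistic as a filter score for a binary (categorical) predictor against a numeric outcome, using SciPy (the simulated data are our own):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
group = rng.integers(0, 2, size=300)          # binary predictor
outcome = 0.5 * group + rng.normal(size=300)  # numeric outcome

# Two-sample t-test between the predictor's two groups.
t, p = ttest_ind(outcome[group == 1], outcome[group == 0])
print(f"t = {t:.2f}, p = {p:.3g}")  # rank features by |t| or by p-value
```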
Categorical Outcome
ROC
Evaluation of Binary Classification Models

              Predicted 0       Predicted 1
Actual 0      True Negative     False Positive
Actual 1      False Negative    True Positive

• False positive rate = FP / (FP + TN): fraction of label 0 predicted to be label 1
• Accuracy = (TP + TN) / total: fraction of correct predictions
• Precision = TP / (TP + FP): fraction of correct predictions among positive predictions
• Sensitivity = TP / (TP + FN): fraction of correct predictions among label 1; also called true positive rate and recall
• Specificity = TN / (TN + FP): fraction of correct predictions among label 0

Slide courtesy of David
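One way to use ROC as a filter is to score each numeric predictor by the AUC it achieves as a one-variable classifier; a minimal sketch with scikit-learn (synthetic data, our own setup):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=400)   # binary outcome
X = rng.normal(size=(400, 5))
X[:, 0] += y                       # only feature 0 is informative

# AUC of each raw feature used directly as a decision score.
aucs = [roc_auc_score(y, X[:, j]) for j in range(X.shape[1])]
print(np.round(aucs, 2))  # feature 0 stands out; the rest hover near 0.5
```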
Relief algorithm
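A minimal NumPy sketch of the original Relief idea for a binary outcome (the function name and simplifications are our own; practical variants add normalization and k nearest neighbors): each feature's weight grows when it separates a sample from its nearest miss and shrinks when it differs at the nearest hit.

```python
import numpy as np

def relief(X, y, n_iter=200, seed=0):
    """Original Relief (binary y, numeric X): one relevance weight per feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)  # L1 distance to sample i
        dist[i] = np.inf                     # never pick the sample itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))   # nearest same-class
        miss = np.argmin(np.where(y != y[i], dist, np.inf))  # nearest other-class
        # Good features differ at the miss and agree at the hit.
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter
```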
More on categorical outcome with numeric predictors
Categorical Outcome with Categorical Variable
• Chi-square test: suited to large counts; easy to calculate; gives an approximation.
• Fisher's exact test: a 2 x 2 alternative to the chi-squared test for small counts; hard to calculate; gives an exact p-value.
Fisher’s exact
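A short sketch running both tests on a made-up 2 x 2 contingency table with SciPy (the counts are illustrative only):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Rows: predictor category; columns: outcome category (hypothetical counts).
table = [[8, 2],
         [1, 5]]

chi2, p_chi2, dof, _ = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-square p = {p_chi2:.3f} (approximate)")
print(f"Fisher exact p = {p_fisher:.3f}, odds ratio = {odds_ratio:.2f}")
```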
Consequences of Using Non-informative Predictors
Embedded Methods

Ridge regression: $L = RSS + \lambda \sum_{j=1}^{p} \beta_j^2$

Lasso: $L = RSS + \lambda \sum_{j=1}^{p} |\beta_j|$

Slide courtesy of Anna
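A minimal sketch of embedded selection via the lasso in scikit-learn: coefficients shrunk exactly to zero drop out of the model (synthetic data; LassoCV picks lambda, called alpha there, by cross-validation):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)  # only features 0 and 3 matter

model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)  # features with non-zero coefficients
print("selected features:", selected)   # expected: [0 3]
```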
Wrapper Methods: Stepwise Regression
• Forward selection
• Backward elimination
• Bidirectional elimination
Forward Selection (Applied Predictive Modeling, Chapter 19)
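A hedged sketch of greedy forward selection: starting from the empty set, repeatedly add the feature that most improves cross-validated performance, stopping when nothing helps. The model and scoring here are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, cv=5):
    """Greedy forward selection scored by cross-validated R^2."""
    remaining, chosen, best = set(range(X.shape[1])), [], -np.inf
    while remaining:
        scores = {j: cross_val_score(LinearRegression(), X[:, chosen + [j]], y,
                                     cv=cv).mean() for j in remaining}
        j, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:   # no candidate improves the CV score: stop
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen
```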
Backward Selection
Model Comparison Criteria
• Akaike information criterion: $AIC = n \log(RSS/n) + 2k$
• Bayesian information criterion: $BIC = n \log(RSS/n) + k \log(n)$
• CV (covered earlier by Anna)
where n is the number of samples and k the number of model parameters.
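A small helper computing both criteria from a fitted model's residual sum of squares, under the Gaussian-likelihood form given above:

```python
import numpy as np

def aic_bic(rss, n, k):
    """AIC/BIC for a least-squares fit: n samples, k parameters, residual SS rss."""
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

# Example: compare a 3-parameter fit against a 5-parameter fit on 100 samples.
print(aic_bic(rss=42.0, n=100, k=3))
print(aic_bic(rss=40.5, n=100, k=5))  # lower is better; extra terms are penalized
```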
Feature Importance Measures
Often model dependent.
Courtesy of Scott Lundberg
A comparison of two methods for determining variable importance.
Feature Importance Measures
1. Tree SHAP: a newly proposed method.
2. Saabas: an individualized heuristic feature attribution method.
3. mean(|Tree SHAP|): a global attribution method based on the average magnitude of the individualized Tree SHAP attributions.
4. Gain: the same method used above in XGBoost, and also equivalent to the Gini importance measure used in scikit-learn tree models.
5. Split count: represents both the closely related "weight" and "cover" methods in XGBoost, but is computed using the "weight" method.
6. Permutation: the resulting drop in accuracy of the model when a single feature is randomly permuted in the test data set.
Courtesy of Scott Lundberg
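To make the first and third items concrete, a minimal sketch of computing Tree SHAP attributions with the shap package on an XGBoost model (both packages and the synthetic data are our assumptions, not from the slides):

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(n_estimators=100).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one attribution per sample and feature

# Global importance = mean(|Tree SHAP|) over samples, as in item 3 above.
print(np.abs(shap_values).mean(axis=0).round(2))
```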
Controlling Procedures (FWER and FDR)
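A sketch of the two control styles applied to a vector of filter p-values, using statsmodels (Bonferroni controls the FWER; Benjamini-Hochberg controls the FDR; the p-values are made up):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.22, 0.74])

reject_fwer, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
reject_fdr, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni keeps:", np.flatnonzero(reject_fwer))  # stricter
print("BH (FDR) keeps:  ", np.flatnonzero(reject_fdr))   # more permissive
```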
Summary
• Many tools exist for dissecting the relationship between predictors and outcome; which one to use depends on the question at hand.
• Except for filter methods, feature selection is tied to the choice of model.
• Most of the time, simple plotting (scatter plots, box plots, PCA) can save a ton of time in figuring out what is relevant/robust versus dubious/misleading.
• Low complexity is usually preferred when the primary goal is interpreting each predictor's contribution to the outcome.
• When many hypotheses are tested, multiplicity should be controlled.
Thank You