Regression: feature selection
PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN PYTHON
Lisa Stuart, Data Scientist

Selecting the correct features:
- Reduces overfitting
- Improves accuracy
- Increases interpretability
- Reduces training time
1 https://www.analyticsindiamag.com/what-are-feature-selection-techniques-in-machine-learning/

Feature selection methods
- Filter: rank features based on statistical performance
- Wrapper: use an ML method to evaluate performance
- Embedded: iterative model training to extract features
- Feature importance: tree-based ML models

Compare and contrast methods
Method             | Uses an ML model | Selects best subset | Can overfit
Filter             | No               | No                  | No
Wrapper            | Yes              | Yes                 | Sometimes
Embedded           | Yes              | Yes                 | Yes
Feature importance | Yes              | Yes                 | Yes

Correlation coefficient statistical tests
Feature \ Response | Continuous            | Categorical
Continuous         | Pearson's correlation | LDA
Categorical        | ANOVA                 | Chi-square

Filter functions
Function                 | Returns
df.corr()                | Pearson's correlation matrix
sns.heatmap(corr_object) | heatmap plot
abs()                    | absolute value

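The filter functions above can be combined into a simple correlation-based filter. This is a minimal sketch on synthetic data: the column names, the coefficients, and the 0.5 cutoff are illustrative assumptions, not values from the course.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): two informative features and one pure-noise feature.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "noise": rng.normal(size=n),
    "target": 3.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n),
})

# Pearson's correlation matrix between all columns.
corr = df.corr()

# Absolute correlation of each feature with the target.
target_corr = corr["target"].drop("target").abs()

# Keep features above the (assumed) cutoff of 0.5.
selected = target_corr[target_corr > 0.5].index.tolist()
print(target_corr.sort_values(ascending=False))
print(selected)
```

To inspect the matrix visually, `corr` can be passed to `sns.heatmap(corr)`, as listed in the table above.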
Wrapper methods
1. Forward selection (LARS: least angle regression): starts with no features, adds one at a time
2. Backward elimination: starts with all features, eliminates one at a time
3. Forward selection/backward elimination combination (bidirectional elimination)
4. Recursive feature elimination: RFECV

Embedded methods
1. Lasso regression
2. Ridge regression
3. ElasticNet

Tree-based feature importance methods
- Random Forest --> sklearn.ensemble.RandomForestRegressor
- Extra Trees --> sklearn.ensemble.ExtraTreesRegressor
- After model fit --> tree_mod.feature_importances_

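A short sketch of reading `feature_importances_` from both tree ensembles. The data is synthetic and built so that only the first feature drives the target; the `importances` dictionary and the hyperparameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic data (assumption): only feature 0 influences the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

importances = {}
for Model in (RandomForestRegressor, ExtraTreesRegressor):
    tree_mod = Model(n_estimators=50, random_state=0).fit(X, y)
    # After the model is fit, importances are available as an array that sums to 1.
    importances[Model.__name__] = tree_mod.feature_importances_
    print(Model.__name__, tree_mod.feature_importances_.round(3))
```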
Function                              | Returns
sklearn.svm.SVR                       | support vector regression estimator
sklearn.feature_selection.RFECV       | recursive feature elimination with cross-validation
rfe_mod.support_                      | boolean array of selected features
rfe_mod.ranking_                      | feature ranking, selected = 1
sklearn.linear_model.LinearRegression | linear model estimator
sklearn.linear_model.LarsCV           | least angle regression with cross-validation
LarsCV.score                          | R-squared score
LarsCV.alpha_                         | estimated regularization parameter

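The wrapper functions in the table can be exercised as follows. This is a sketch on synthetic data in which only the first two features matter; it uses `LinearRegression` as the RFECV estimator for simplicity (an `SVR` would need `kernel="linear"` so that it exposes coefficients), and the variable names are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LarsCV, LinearRegression

# Synthetic data (assumption): features 0 and 1 are informative, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Recursive feature elimination with 5-fold cross-validation.
rfe_mod = RFECV(estimator=LinearRegression(), step=1, cv=5)
rfe_mod.fit(X, y)
print(rfe_mod.support_)   # boolean array of selected features
print(rfe_mod.ranking_)   # selected features have rank 1

# Least angle regression with cross-validation.
lars_mod = LarsCV(cv=5).fit(X, y)
print(lars_mod.score(X, y))  # R-squared on the training data
print(lars_mod.alpha_)       # estimated regularization parameter
```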
Let's practice!

Regression: regularization

Regularization algorithms
- Ridge regression
- Lasso regression
- ElasticNet regression

Ordinary least squares
1 https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression

Ridge loss function
1 https://gerardnico.com/data_mining/ridge_regression#tuning_parameter_math_lambdamath

Lasso loss function
1 https://stats.stackexchange.com/questions/155192/why-discrepancy-between-lasso-and-randomforest

Ridge vs lasso
Regularization      | L1 (Lasso)                             | L2 (Ridge)
Penalizes           | sum of absolute values of coefficients | sum of squares of coefficients
Solutions           | sparse                                 | non-sparse
Number of solutions | multiple                               | one
Feature selection   | yes                                    | no
Robust to outliers? | yes                                    | no
Complex patterns?   | no                                     | yes

ElasticNet

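For reference, the four objectives named in this section can be written in one common textbook form. The notation here is an assumption (the formulas on the original slides did not survive extraction): n observations, p features, coefficients \beta, and penalty weights \lambda, \lambda_1, \lambda_2.

```latex
\begin{aligned}
\text{OLS:} \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \\
\text{Ridge:} \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso:} \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{ElasticNet:} \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\end{aligned}
```

ElasticNet combines the L1 and L2 penalties. scikit-learn parametrizes the same idea differently, through `alpha` and `l1_ratio`, and scales the squared-error term, so its `alpha` values are not directly comparable to the \lambda values above.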
Regularization with Boston housing data
Features                          | CHAS | NOX   | RM
Coefficient estimates             | 2.7  | -17.8 | 3.8
Regularized coefficient estimates | 0    | 0     | 0.95

Regularization functions
# Lasso estimator
sklearn.linear_model.Lasso
# Lasso estimator with cross-validation
sklearn.linear_model.LassoCV
# Ridge estimator
sklearn.linear_model.Ridge
# Ridge estimator with cross-validation
sklearn.linear_model.RidgeCV
# ElasticNet estimator
sklearn.linear_model.ElasticNet
# ElasticNet estimator with cross-validation
sklearn.linear_model.ElasticNetCV
# Train/test split
sklearn.model_selection.train_test_split
# Mean squared error
sklearn.metrics.mean_squared_error(y_test, mod.predict(X_test))
# Best regularization parameter
mod_cv.alpha_
# Array of log-spaced values
alphas=np.logspace(-6, 6, 13)

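The regularization functions listed above fit together roughly as follows. This is a sketch on synthetic data (the course uses other datasets); the `models` and `results` dictionaries are hypothetical names introduced here. Very small alphas may trigger convergence warnings for the lasso-type models, which is harmless for this illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 2 informative features out of 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate regularization strengths on a log scale.
alphas = np.logspace(-6, 6, 13)

models = {
    "ols": LinearRegression(),
    "ridge": RidgeCV(alphas=alphas),
    "lasso": LassoCV(alphas=alphas, cv=5),
    "elasticnet": ElasticNetCV(alphas=alphas, cv=5),
}

results = {}
for name, mod in models.items():
    mod.fit(X_train, y_train)
    mse = mean_squared_error(y_test, mod.predict(X_test))
    n_zero = int(np.sum(mod.coef_ == 0))  # coefficients shrunk exactly to zero
    results[name] = (mse, n_zero)
    # alpha_ is the best regularization parameter found by cross-validation
    # (plain LinearRegression has none).
    print(name, round(mse, 3), n_zero, getattr(mod, "alpha_", None))
```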
Let's practice!

Classification: feature engineering

Feature engineering... why?
- Extracts additional information from the data
- Creates additional relevant features
- One of the most effective ways to improve predictive models

Benefits of feature engineering
- Increased predictive power of the learning algorithm
- Makes your machine learning models perform even better!

Types of feature engineering
- Indicator variables
- Interaction features
- Feature representation

Indicator variables
- Threshold indicator: age (high school vs college)
- Multiple features: used as a flag
- Special events: Black Friday, Christmas
- Groups of classes: website traffic paid flag (Google AdWords, Facebook ads)

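A small pandas sketch of three of the indicator types above. The toy data, the column names, the age cutoff of 18, and the hard-coded 2019 dates are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy data (assumption).
df = pd.DataFrame({
    "age": [16, 19, 25, 17],
    "date": pd.to_datetime(["2019-11-29", "2019-12-25", "2019-07-04", "2019-11-28"]),
    "traffic_source": ["Google AdWords", "organic", "Facebook ads", "email"],
})

# Threshold indicator: assumed cutoff of 18 between high school and college age.
df["college_age"] = (df["age"] >= 18).astype(int)

# Special events: Black Friday 2019 and Christmas 2019, hard-coded for illustration.
special_days = pd.to_datetime(["2019-11-29", "2019-12-25"])
df["special_event"] = df["date"].isin(special_days).astype(int)

# Group of classes: a single flag covering every paid traffic source.
paid_sources = ["Google AdWords", "Facebook ads"]
df["paid_flag"] = df["traffic_source"].isin(paid_sources).astype(int)

print(df)
```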
Interaction features
- Sum
- Difference
- Product
- Quotient
- Other mathematical combos

Feature representation
- Datetime stamps: day of week, hour of day
- Grouping categorical levels into 'Other'
- Transform categorical to dummy variables: (k - 1) binary columns

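The three representation steps above might look like this in pandas. The toy data, the column names, and the rule "levels seen fewer than 2 times are rare" are assumptions for the sake of the example.

```python
import pandas as pd

# Toy data (assumption).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-06 09:30", "2020-01-11 22:15",
        "2020-01-08 13:00", "2020-01-06 08:00",
    ]),
    "color": ["red", "blue", "teal", "red"],
})

# Datetime stamps: extract day of week (Monday = 0) and hour of day.
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["hour_of_day"] = df["timestamp"].dt.hour

# Group rare categorical levels (assumed: fewer than 2 occurrences) into 'Other'.
counts = df["color"].value_counts()
rare = counts[counts < 2].index
df["color_grouped"] = df["color"].where(~df["color"].isin(rare), "Other")

# k levels become k - 1 binary columns when drop_first=True.
dummies = pd.get_dummies(df["color_grouped"], prefix="color", drop_first=True)
print(df)
print(dummies)
```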
Different categorical levels
- Training data: model trained with [red, blue, green]
- Test data: model tested with [red, green, yellow]
  - additional color not seen in training
  - one color missing
- Robust one-hot encoding
1 https://blog.cambridgespark.com/robust-one-hot-encoding-in-python-3e29bfcec77e

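One simple way to handle the mismatch described above is to encode the training data first and then align the test columns to it. This is a sketch of that idea with pandas, not necessarily the approach taken in the linked article; the frame and column names are assumptions.

```python
import pandas as pd

train = pd.DataFrame({"color": ["red", "blue", "green"]})
test = pd.DataFrame({"color": ["red", "green", "yellow"]})

# Encode the training levels; these columns define what the model expects.
train_enc = pd.get_dummies(train["color"], prefix="color").astype(int)

# Align the test encoding to the training columns:
# the missing level 'blue' is added as all zeros, the unseen level 'yellow' is dropped.
test_enc = (
    pd.get_dummies(test["color"], prefix="color")
    .reindex(columns=train_enc.columns, fill_value=0)
    .astype(int)
)
print(test_enc)
```

A row with an unseen level (here, yellow) ends up as all zeros, so the model treats it as "none of the known colors".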
Debt to income ratio
Debt to income ratio = Monthly Debt / (Annual Income / 12)

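The ratio is a quotient-type interaction feature and takes one line in pandas. The two loan records here are made-up values for illustration.

```python
import pandas as pd

# Made-up example records (assumption).
loans = pd.DataFrame({
    "Monthly Debt": [500.0, 1200.0],
    "Annual Income": [60000.0, 48000.0],
})

# Monthly debt divided by monthly income (annual income / 12).
loans["Debt to Income"] = loans["Monthly Debt"] / (loans["Annual Income"] / 12)
print(loans)
```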
Feature engineering functions
Function                                                    | Returns
sklearn.linear_model.LogisticRegression                     | logistic regression
sklearn.model_selection.train_test_split                    | train/test split function
sns.countplot(x='Loan Status', data=data)                   | bar plot
df.drop(['Feature 1', 'Feature 2'], axis=1)                 | drops list of features
df["Loan Status"].replace({'Paid': 0, 'Not Paid': 1})       | Loan Status as integers
pd.get_dummies()                                            | k - 1 binary features (when called with drop_first=True)
sklearn.metrics.accuracy_score(y_test, mod.predict(X_test)) | model accuracy

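An end-to-end sketch that strings these functions together. The loan data is synthetic and generated so that the debt-to-income ratio drives the outcome; the column names "Term" and "Customer ID" and all numeric ranges are assumptions, not the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic loan data (assumption).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "Monthly Debt": rng.uniform(200, 3000, size=n),
    "Annual Income": rng.uniform(20000, 120000, size=n),
    "Term": rng.choice(["Short Term", "Long Term"], size=n),
    "Customer ID": np.arange(n),
})
dti = data["Monthly Debt"] / (data["Annual Income"] / 12)
data["Loan Status"] = np.where(
    dti + rng.normal(scale=0.05, size=n) > 0.3, "Not Paid", "Paid"
)

# Drop an identifier column that carries no predictive information.
data = data.drop(["Customer ID"], axis=1)

# Encode the target as integers.
data["Loan Status"] = data["Loan Status"].replace({"Paid": 0, "Not Paid": 1}).astype(int)

# Engineered feature: debt to income ratio.
data["Debt to Income"] = data["Monthly Debt"] / (data["Annual Income"] / 12)

# Dummy-encode the categorical column into k - 1 binary features.
X = pd.get_dummies(data.drop(["Loan Status"], axis=1), drop_first=True).astype(float)
y = data["Loan Status"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

mod = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, mod.predict(X_test))
print(round(acc, 3))
```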
An excellent tutorial: DataCamp article: categorical data

Let's practice!

Ensemble methods

Ensemble learning techniques
- Bootstrap Aggregation (bagging)
- Boosting
- Model stacking

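Each of the three techniques has a ready-made scikit-learn estimator. The sketch below fits one of each on a synthetic classification problem; the choice of base learners, the hyperparameters, and the `ensembles` and `scores` names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensembles = {
    # Bagging: many trees, each fit on a bootstrap sample, predictions aggregated.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=25, random_state=0
    ),
    # Boosting: shallow trees fit sequentially, each correcting earlier errors.
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: a final model learns from the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
            ("logit", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {
    name: mod.fit(X_train, y_train).score(X_test, y_test)
    for name, mod in ensembles.items()
}
print(scores)
```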
Error measurement

Short trees

Tall trees

Fat trees

Linear model

Bias
- Linear relationship assumption (incorrect)
- High bias
- Underfitting
- Poor model generalization
- Increasing complexity decreases bias

Complex model

Variance
- High complexity models: high variance
- Overfitting
- Poor model generalization

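The bias and variance slides can be illustrated by fitting polynomials of increasing degree to noisy nonlinear data. This is a sketch with NumPy only; the sine-shaped target, the noise level, and the degrees 1, 3, and 12 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples from a nonlinear function (assumed for illustration)."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(coefs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

errors = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; a higher degree means a more complex model.
    coefs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))
    print(degree, errors[degree])
```

Training error can only go down as the degree grows, since each model contains the simpler ones. The degree-1 fit is the high-bias case (the linear assumption is wrong for this data), while the degree-12 fit is the high-variance case, where test error typically stops improving or gets worse.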