Practicing Machine Learning Interview Questions in Python. Lisa Stuart, Data Scientist.

  1. Regression: feature selection

  2. Selecting the correct features:
     - Reduces overfitting
     - Improves accuracy
     - Increases interpretability
     - Reduces training time

     Source: https://www.analyticsindiamag.com/what-are-feature-selection-techniques-in-machine-learning/

  3. Feature selection methods
     - Filter: rank features based on statistical performance
     - Wrapper: use an ML method to evaluate performance
     - Embedded: iterative model training to extract features
     - Feature importance: tree-based ML models

  4. Compare and contrast methods

     | Method | Uses an ML model | Selects best subset | Can overfit |
     |---|---|---|---|
     | Filter | No | No | No |
     | Wrapper | Yes | Yes | Sometimes |
     | Embedded | Yes | Yes | Yes |
     | Feature importance | Yes | Yes | Yes |

  5. Correlation coefficient statistical tests

     | Feature / Response | Continuous | Categorical |
     |---|---|---|
     | Continuous | Pearson's correlation | LDA |
     | Categorical | ANOVA | Chi-square |

  6. Filter functions

     | Function | Returns |
     |---|---|
     | `df.corr()` | Pearson's correlation matrix |
     | `sns.heatmap(corr_object)` | heatmap plot |
     | `abs()` | absolute value |
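The filter functions above can be combined into a simple correlation filter. The following is a minimal sketch on synthetic data; the column names and the 0.5 threshold are assumptions chosen for illustration and do not come from the slides.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): only x1 actually drives the target.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": rng.normal(size=n),
    "noise": rng.normal(size=n),
    "target": 3.0 * x1 + 0.1 * rng.normal(size=n),
})

# Pearson's correlation matrix for all numeric columns.
corr = df.corr()

# Absolute correlation of each feature with the response.
target_corr = corr["target"].drop("target").abs()

# Keep features whose absolute correlation exceeds an assumed threshold.
threshold = 0.5
selected = target_corr[target_corr > threshold].index.tolist()
print(selected)

# Optional visual check (requires seaborn and matplotlib):
# import seaborn as sns
# sns.heatmap(corr, annot=True)
```

Taking the absolute value matters because a strong negative correlation is as informative as a strong positive one.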

  7. Wrapper methods
     1. Forward selection (LARS: least angle regression). Starts with no features, adds one at a time.
     2. Backward elimination. Starts with all features, eliminates one at a time.
     3. Forward selection / backward elimination combination (bidirectional elimination)
     4. Recursive feature elimination: RFECV

  8. Embedded methods
     1. Lasso Regression
     2. Ridge Regression
     3. ElasticNet

  9. Tree-based feature importance methods
     - Random Forest --> `sklearn.ensemble.RandomForestRegressor`
     - Extra Trees --> `sklearn.ensemble.ExtraTreesRegressor`
     - After model fit --> `tree_mod.feature_importances_`
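As a rough illustration of the tree-based approach, the sketch below fits both estimators on synthetic data and reads `feature_importances_` after the fit. The data, the `n_estimators` value, and the variable names other than `tree_mod` are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic data (hypothetical): only the first of four features matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5.0 * X[:, 0] + 0.1 * rng.normal(size=300)

# Random forest: importances are available only after fitting.
tree_mod = RandomForestRegressor(n_estimators=50, random_state=0)
tree_mod.fit(X, y)
importances = tree_mod.feature_importances_

# Extra trees exposes the same attribute.
extra_mod = ExtraTreesRegressor(n_estimators=50, random_state=0)
extra_mod.fit(X, y)
extra_importances = extra_mod.feature_importances_

# Rank feature indices from most to least important.
ranked = np.argsort(importances)[::-1]
print(ranked, importances.round(3))
```

The importances are normalized to sum to 1, so they are best read as relative rankings rather than absolute effect sizes.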

  10. Functions:

      | Function | Returns |
      |---|---|
      | `sklearn.svm.SVR` | support vector regression estimator |
      | `sklearn.feature_selection.RFECV` | recursive feature elimination with cross-validation |
      | `rfe_mod.support_` | boolean array of selected features |
      | `rfe_mod.ranking_` | feature ranking, selected = 1 |
      | `sklearn.linear_model.LinearRegression` | linear model estimator |
      | `sklearn.linear_model.LarsCV` | least angle regression with cross-validation |
      | `LarsCV.score` | R-squared score |
      | `LarsCV.alpha_` | estimated regularization parameter |
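The wrapper functions listed above might be used together roughly as follows. This is a sketch under stated assumptions: the data comes from `make_regression`, the linear kernel is chosen because RFECV needs an estimator that exposes coefficients, and `cv=5` is an arbitrary choice.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LarsCV
from sklearn.svm import SVR

# Synthetic regression data (hypothetical): 8 features, 3 informative.
X, y = make_regression(n_samples=150, n_features=8, n_informative=3,
                       noise=1.0, random_state=0)

# Recursive feature elimination with cross-validation around a linear SVR.
rfe_mod = RFECV(SVR(kernel="linear"), step=1, cv=5)
rfe_mod.fit(X, y)
support = rfe_mod.support_   # boolean mask of selected features
ranking = rfe_mod.ranking_   # selected features have rank 1

# Least angle regression with cross-validation (forward selection style).
lars_mod = LarsCV(cv=5)
lars_mod.fit(X, y)
r2 = lars_mod.score(X, y)    # R-squared on the training data
alpha = lars_mod.alpha_      # estimated regularization parameter

print(support, ranking, round(r2, 3), alpha)
```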

  12. Regression: regularization

  13. Regularization algorithms
      - Ridge regression
      - Lasso regression
      - ElasticNet regression

  14. Ordinary least squares

      Source: https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression

  15. Ridge loss function

      Source: https://gerardnico.com/data_mining/ridge_regression#tuning_parameter_math_lambdamath

  16. Lasso loss function

      Source: https://stats.stackexchange.com/questions/155192/why-discrepancy-between-lasso-and-randomforest
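The formulas on the three loss-function slides did not survive in this transcript. For reference, the standard textbook forms are sketched below; the notation ($n$ observations, $p$ features, coefficients $\beta_j$, tuning parameter $\lambda \ge 0$) is an assumption and may differ from what the slides showed.

```latex
\begin{aligned}
\text{OLS:}   \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \\
\text{Ridge:} \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso:} \quad & \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\end{aligned}
```

Setting $\lambda = 0$ recovers ordinary least squares in both penalized forms; larger values of $\lambda$ shrink the coefficients more strongly.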

  17. Ridge vs lasso

      | Regularization | L1 (Lasso) | L2 (Ridge) |
      |---|---|---|
      | Penalizes | sum of absolute values of coefficients | sum of squares of coefficients |
      | Solutions | sparse | non-sparse |
      | Number of solutions | multiple | one |
      | Feature selection | yes | no |
      | Robust to outliers? | yes | no |
      | Complex patterns? | no | yes |


  19. Regularization with Boston housing data

      | Features | CHAS | NOX | RM |
      |---|---|---|---|
      | Coefficient estimates | 2.7 | -17.8 | 3.8 |
      | Regularized coefficient estimates | 0 | 0 | 0.95 |

  20. Regularization functions

      | Function | Description |
      |---|---|
      | `sklearn.linear_model.Lasso` | Lasso estimator |
      | `sklearn.linear_model.LassoCV` | Lasso estimator with cross-validation |
      | `sklearn.linear_model.Ridge` | Ridge estimator |
      | `sklearn.linear_model.RidgeCV` | Ridge estimator with cross-validation |
      | `sklearn.linear_model.ElasticNet` | ElasticNet estimator |
      | `sklearn.linear_model.ElasticNetCV` | ElasticNet estimator with cross-validation |
      | `sklearn.model_selection.train_test_split` | Train/test split |
      | `sklearn.metrics.mean_squared_error(y_test, predict(X_test))` | Mean squared error |
      | `mod_cv.alpha_` | Best regularization parameter |
      | `alphas=np.logspace(-6, 6, 13)` | Array of log values |
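A minimal sketch of how these regularization functions fit together is shown below. The synthetic data, the 25% test split, and the `cv=5` setting are assumptions; only the estimator names, `alpha_`, and the `np.logspace(-6, 6, 13)` grid come from the slide.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): 10 features, only 3 informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Candidate regularization strengths: 13 values from 1e-6 to 1e6.
alphas = np.logspace(-6, 6, 13)

# Cross-validated estimators pick the best alpha from the grid.
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)
ridge_cv = RidgeCV(alphas=alphas).fit(X_train, y_train)

# Held-out error for each model.
lasso_mse = mean_squared_error(y_test, lasso_cv.predict(X_test))
ridge_mse = mean_squared_error(y_test, ridge_cv.predict(X_test))

# Lasso can set coefficients exactly to zero; count them.
n_zero = int(np.sum(lasso_cv.coef_ == 0))

print(lasso_cv.alpha_, ridge_cv.alpha_, lasso_mse, ridge_mse, n_zero)
```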

  22. Classification: feature engineering

  23. Feature engineering... why?
      - Extracts additional information from the data
      - Creates additional relevant features
      - One of the most effective ways to improve predictive models

  24. Benefits of feature engineering
      - Increased predictive power of the learning algorithm
      - Makes your machine learning models perform even better!

  25. Types of feature engineering
      - Indicator variables
      - Interaction features
      - Feature representation

  26. Indicator variables
      - Threshold indicator: age (high school vs college)
      - Multiple features: used as a flag
      - Special events: Black Friday, Christmas
      - Groups of classes: website traffic paid flag (Google AdWords, Facebook ads)
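The indicator types above could be built in pandas along these lines. The column names, the age threshold of 18, and the sample rows are all hypothetical and serve only to illustrate the idea.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "age": [16, 19, 22, 17],
    "date": pd.to_datetime(
        ["2019-11-29", "2019-12-25", "2019-07-04", "2019-11-29"]),
    "traffic_source": ["Google AdWords", "organic", "Facebook ads", "email"],
})

# Threshold indicator: assumed cutoff of 18 for high school vs college age.
df["college_age"] = (df["age"] >= 18).astype(int)

# Special events: flag Black Friday 2019 and Christmas 2019.
special_days = pd.to_datetime(["2019-11-29", "2019-12-25"])
df["special_event"] = df["date"].isin(special_days).astype(int)

# Groups of classes: one flag for any paid traffic source.
paid_sources = ["Google AdWords", "Facebook ads"]
df["paid_flag"] = df["traffic_source"].isin(paid_sources).astype(int)

print(df[["college_age", "special_event", "paid_flag"]])
```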

  27. Interaction features
      - Sum
      - Difference
      - Product
      - Quotient
      - Other mathematical combos

  28. Feature representation
      - Datetime stamps: day of week, hour of day
      - Grouping categorical levels into 'Other'
      - Transform categorical to dummy variables: (k - 1) binary columns
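The three representation steps above might look like this in pandas. The data, the column names, and the rule that a level seen fewer than two times counts as rare are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-01 08:30", "2021-03-06 17:45",
        "2021-03-07 23:10", "2021-03-02 12:00"]),
    "color": ["red", "blue", "red", "teal"],
})

# Datetime stamps: extract day of week (Monday = 0) and hour of day.
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["hour_of_day"] = df["timestamp"].dt.hour

# Group rare categorical levels (assumed: seen fewer than 2 times) into 'Other'.
counts = df["color"].value_counts()
rare = counts[counts < 2].index
df["color_grouped"] = df["color"].where(~df["color"].isin(rare), "Other")

# Dummy variables: drop_first=True yields k - 1 binary columns for k levels.
dummies = pd.get_dummies(df["color_grouped"], drop_first=True, dtype=int)

print(df[["day_of_week", "hour_of_day", "color_grouped"]])
print(dummies)
```

Dropping one level avoids perfectly collinear dummy columns, which matters for linear models fitted with an intercept.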

  29. Different categorical levels
      - Training data: model trained with [red, blue, green]
      - Test data: model tested with [red, green, yellow]
        - Additional color not seen in training
        - One color missing
      - Robust one-hot encoding

      Source: https://blog.cambridgespark.com/robust-one-hot-encoding-in-python-3e29bfcec77e
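One common way to handle mismatched levels is to align the test columns to the training columns. The sketch below uses the colors from the slide; it is an illustrative approach and not necessarily the one described in the linked article.

```python
import pandas as pd

# Levels from the slide: 'blue' is missing from test, 'yellow' is new in test.
train = pd.DataFrame({"color": ["red", "blue", "green"]})
test = pd.DataFrame({"color": ["red", "green", "yellow"]})

# Encode the training data; its columns define the expected feature set.
train_dummies = pd.get_dummies(train["color"], dtype=int)

# Encode the test data, then align to the training columns:
# missing levels become all-zero columns, unseen levels are dropped.
test_dummies = (
    pd.get_dummies(test["color"], dtype=int)
    .reindex(columns=train_dummies.columns, fill_value=0)
)

print(test_dummies)
```

With this alignment the unseen level 'yellow' becomes a row of zeros. scikit-learn's `OneHotEncoder(handle_unknown="ignore")` gives the same behavior when the encoder is fitted on the training data only.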

  30. Debt to income ratio: $\dfrac{\text{Monthly Debt}}{\text{Annual Income} / 12}$
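The ratio is a quotient-type interaction feature and takes one line in pandas. The column names follow the slide; the new column name and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Monthly Debt": [500.0, 1200.0],
    "Annual Income": [60000.0, 48000.0],
})

# Monthly debt divided by monthly income (annual income / 12).
df["Debt to Income"] = df["Monthly Debt"] / (df["Annual Income"] / 12)

print(df)
```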

  31. Feature engineering functions

      | Function | Returns |
      |---|---|
      | `sklearn.linear_model.LogisticRegression` | logistic regression |
      | `sklearn.model_selection.train_test_split` | train/test split function |
      | `sns.countplot(x='Loan Status', data=data)` | bar plot |
      | `df.drop(['Feature 1', 'Feature 2'], axis=1)` | drops list of features |
      | `df["Loan Status"].replace({'Paid': 0, 'Not Paid': 1})` | Loan Status as integers |
      | `pd.get_dummies()` | binary dummy features (k - 1 per categorical column when `drop_first=True`) |
      | `sklearn.metrics.accuracy_score(y_test, predict(X_test))` | model accuracy |
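Put together, the functions above suggest a workflow like the following sketch. The loan data is synthetic and every column other than "Loan Status" is hypothetical. The label encoding uses `map`, which gives the same integers as the `replace` call on the slide for these two values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic loan data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Loan ID": np.arange(n),
    "Monthly Debt": rng.uniform(200, 2000, size=n),
    "Annual Income": rng.uniform(30000, 120000, size=n),
    "Purpose": rng.choice(["car", "home", "other"], size=n),
})
dti = data["Monthly Debt"] / (data["Annual Income"] / 12)
data["Loan Status"] = np.where(dti > dti.median(), "Not Paid", "Paid")

# Encode the target as integers.
y = data["Loan Status"].map({"Paid": 0, "Not Paid": 1})

# Drop the identifier and the target, then one-hot encode categoricals.
X = data.drop(["Loan ID", "Loan Status"], axis=1)
X = pd.get_dummies(X, drop_first=True, dtype=int)

# Hold out a test set, fit the classifier, and score it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

print(accuracy)
```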

  32. An excellent tutorial: DataCamp article on categorical data

  34. Ensemble methods

  35. Ensemble learning techniques
      - Bootstrap aggregation (bagging)
      - Boosting
      - Model stacking






  41. Bias
      - Linear relationship assumption (incorrect)
      - High bias
      - Underfitting
      - Poor model generalization
      - Increasing complexity decreases bias


  43. Variance
      - High complexity models: high variance
      - Overfitting
      - Poor model generalization
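The bias and variance points above can be explored with a small experiment that fits polynomials of increasing degree to noisy data. This is a sketch on synthetic data; the sine-shaped target, the noise level, and the degrees 1, 3, and 12 are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic nonlinear data (hypothetical).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=30)
y_train = np.sin(3 * x_train) + 0.2 * rng.normal(size=30)
x_test = rng.uniform(-1, 1, size=200)
y_test = np.sin(3 * x_test) + 0.2 * rng.normal(size=200)


def train_test_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse


# Degree 1 assumes a linear relationship; degree 12 is far more flexible.
results = {degree: train_test_mse(degree) for degree in (1, 3, 12)}
for degree, (train_mse, test_mse) in results.items():
    print(degree, round(train_mse, 4), round(test_mse, 4))
```

Training error can only go down as the degree rises, so the comparison that matters is between training and test error: a model whose test error is much larger than its training error is showing high variance.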


