Why generate features? Feature Engineering for Machine Learning in Python

Robert O'Callaghan, Director of Data Science, Ordergroove


  1. Why generate features? Feature Engineering for Machine Learning in Python. Robert O'Callaghan, Director of Data Science, Ordergroove

  2. Feature Engineering

  3. Different types of data
      - Continuous: either integers (or whole numbers) or floats (decimals)
      - Categorical: one of a limited set of values, e.g. gender, country of birth
      - Ordinal: ranked values, often with no detail of distance between them
      - Boolean: True/False values
      - Datetime: dates and times
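
The data types listed above map onto pandas dtypes roughly as follows. This is a minimal sketch on a made-up three-row table: the frame `df_types`, its column names, and its values are assumptions for illustration and are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df_types = pd.DataFrame({
    "age": [34, 27, 45],                               # continuous (integer)
    "salary": [52000.0, 61000.5, 48000.0],             # continuous (float)
    "country": ["USA", "UK", "India"],                 # categorical
    "education": ["Bachelor", "Master", "Bachelor"],   # ordinal
    "hobby": [True, False, True],                      # boolean
    "survey_date": [
        "2018-02-28 20:20:00",
        "2018-06-28 13:26:00",
        "2018-06-06 03:37:00",
    ],                                                 # datetime, still text here
})

# Dates stored as text stay strings until they are parsed explicitly.
df_types["survey_date"] = pd.to_datetime(df_types["survey_date"])

# An ordered categorical records the ranking of the levels,
# but says nothing about the distance between them.
df_types["education"] = pd.Categorical(
    df_types["education"], categories=["Bachelor", "Master"], ordered=True
)

print(df_types.dtypes)
```

Printing `dtypes` after the two conversions shows which columns pandas now treats as numeric, boolean, categorical, and datetime.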

  4. Course structure
      - Chapter 1: Feature creation and extraction
      - Chapter 2: Engineering messy data
      - Chapter 3: Feature normalization
      - Chapter 4: Working with text features

  5. Pandas
      import pandas as pd
      df = pd.read_csv(path_to_csv_file)
      print(df.head())

  6. Dataset
                  SurveyDate  \
      0  2018-02-28 20:20:00
      1  2018-06-28 13:26:00
      2  2018-06-06 03:37:00
      3  2018-05-09 01:06:00
      4  2018-04-12 22:41:00
                                  FormalEducation
      0  Bachelor's degree (BA. BS. B.Eng.. etc.)
      1  Bachelor's degree (BA. BS. B.Eng.. etc.)
      2  Bachelor's degree (BA. BS. B.Eng.. etc.)
      3         Some college/university study ...
      4  Bachelor's degree (BA. BS. B.Eng.. etc.)

  7. Column names
      print(df.columns)
      Index(['SurveyDate', 'FormalEducation', 'ConvertedSalary', 'Hobby', 'Country',
             'StackOverflowJobsRecommend', 'VersionControl', 'Age', 'Years Experience',
             'Gender', 'RawSalary'],
            dtype='object')

  8. Column types
      print(df.dtypes)
      SurveyDate           object
      FormalEducation      object
      ConvertedSalary     float64
      ...
      Years Experience      int64
      Gender               object
      RawSalary            object
      dtype: object

  9. Selecting specific data types
      only_ints = df.select_dtypes(include=['int'])
      print(only_ints.columns)
      Index(['Age', 'Years Experience'], dtype='object')
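
The loading and inspection steps above can be tried without the course file. The sketch below reads a small in-memory CSV instead. The three rows, the name `df_demo`, and the use of `np.integer` in place of the string `'int'` are assumptions made for this example, not part of the original slides.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows standing in for the survey file; values are illustrative only.
csv_text = """SurveyDate,ConvertedSalary,Age,Years Experience,Gender
2018-02-28 20:20:00,,21,13,Male
2018-06-28 13:26:00,70841.0,38,9,Male
2018-06-06 03:37:00,,45,11,Female
"""

df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo.dtypes)

# np.integer matches every integer width; the string 'int' resolves to
# NumPy's default integer type, which can differ by platform.
int_cols = df_demo.select_dtypes(include=[np.integer]).columns.tolist()

# 'number' selects integer and float columns together.
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()

print(int_cols)
print(numeric_cols)
```

Note that `ConvertedSalary` is read as a float column because its missing values are stored as NaN, so it shows up under `'number'` but not among the integer columns.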

  10. Let's get going!

  11. Dealing with Categorical Variables. Robert O'Callaghan, Director of Data Science, Ordergroove

  12. Encoding categorical features

  13. Encoding categorical features

  14. Encoding categorical features
      - One-hot encoding
      - Dummy encoding

  15. One-hot encoding
      pd.get_dummies(df, columns=['Country'], prefix='C')
         C_France  C_India  C_UK  C_USA
      0         0        1     0      0
      1         0        0     0      1
      2         0        0     1      0
      3         0        0     1      0
      4         1        0     0      0

  16. Dummy encoding
      pd.get_dummies(df, columns=['Country'], drop_first=True, prefix='C')
         C_India  C_UK  C_USA
      0        1     0      0
      1        0     0      1
      2        0     1      0
      3        0     1      0
      4        0     0      0

  17. One-hot vs. dummies
      - One-hot encoding: Explainable features
      - Dummy encoding: Necessary information without duplication

  18. Index  Sex         Index  Male  Female     Index  Male
      0      Male        0      1     0          0      1
      1      Female      1      0     1          1      0
      2      Male        2      1     0          2      1
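
The difference between the two encodings can be checked directly. The sketch below rebuilds a five-row `Country` column that matches the outputs shown on the one-hot and dummy encoding slides. The frame name `countries` is an assumption, and `dtype=int` is added here because recent pandas versions return boolean columns by default.

```python
import pandas as pd

# Five rows chosen to reproduce the encoded output shown on the slides.
countries = pd.DataFrame({"Country": ["India", "USA", "UK", "UK", "France"]})

# One-hot encoding: one column per category.
one_hot = pd.get_dummies(countries, columns=["Country"], prefix="C", dtype=int)

# Dummy encoding: the first category (France) is dropped
# and is implied by a row of all zeros.
dummy = pd.get_dummies(
    countries, columns=["Country"], prefix="C", drop_first=True, dtype=int
)

print(one_hot)
print(dummy)

# The dropped column carries no extra information:
# it can be rebuilt from the remaining dummy columns.
recovered_france = 1 - dummy.sum(axis=1)
print((recovered_france == one_hot["C_France"]).all())
```

The final check illustrates the point about duplication: with one-hot encoding, any one column is fully determined by the others, which is why dummy encoding can drop it.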

  19. Limiting your columns
      counts = df['Country'].value_counts()
      print(counts)
      USA       8
      UK        6
      India     2
      France    1
      Name: Country, dtype: int64

  20. Limiting your columns
      mask = df['Country'].isin(counts[counts < 5].index)
      df.loc[mask, 'Country'] = 'Other'
      print(df['Country'].value_counts())
      USA      8
      UK       6
      Other    3
      Name: Country, dtype: int64
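
The grouping of rare categories can be run on its own as follows. This is a sketch that builds a `Country` column with the same counts as the slides (8, 6, 2, 1). The names `df_c`, `rare`, and `new_counts` are assumptions for the example, and the threshold of 5 is the one used above.

```python
import pandas as pd

# Rebuild a column with the category counts shown on the slides.
df_c = pd.DataFrame(
    {"Country": ["USA"] * 8 + ["UK"] * 6 + ["India"] * 2 + ["France"]}
)

counts = df_c["Country"].value_counts()

# Categories that occur fewer than 5 times are treated as rare.
rare = counts[counts < 5].index
mask = df_c["Country"].isin(rare)

# .loc assigns in place and avoids chained-assignment pitfalls.
df_c.loc[mask, "Country"] = "Other"

new_counts = df_c["Country"].value_counts()
print(new_counts)
```

Grouping before encoding keeps the number of generated columns small: here four categories become three, and any later rare country would also fall into `Other` if the same rule were reapplied.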

  21. Now you deal with categorical variables

  22. Numeric variables. Robert O'Callaghan, Director of Data Science, Ordergroove

  23. Types of numeric features
      - Age
      - Price
      - Counts
      - Geospatial data

  24. Does size matter?

  25. Binarizing numeric variables
      df['Binary_Violation'] = 0
      df.loc[df['Number_of_Violations'] > 0, 'Binary_Violation'] = 1

  26. Binarizing numeric variables
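
Binarizing can be tried on a few made-up values. In the sketch below, the frame `df_v` and its five violation counts are assumptions for illustration. The first approach follows the slide, and the second is an equivalent one-line alternative rather than the course's own code.

```python
import pandas as pd

# Hypothetical violation counts; values are illustrative only.
df_v = pd.DataFrame({"Number_of_Violations": [0, 1, 3, 0, 2]})

# Approach from the slide: default to 0, then flag rows with any violation.
df_v["Binary_Violation"] = 0
df_v.loc[df_v["Number_of_Violations"] > 0, "Binary_Violation"] = 1

# Equivalent alternative: cast the boolean comparison to integers.
alt = (df_v["Number_of_Violations"] > 0).astype(int)

print(df_v)
```

Both approaches keep only whether a violation occurred and discard how many, which is the point of binarizing when the magnitude is not informative.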

  27. Binning numeric variables
      import numpy as np
      df['Binned_Group'] = pd.cut(
          df['Number_of_Violations'],
          bins=[-np.inf, 0, 2, np.inf],
          labels=[1, 2, 3]
      )

  28. Binning numeric variables
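
The binning call can be checked on a handful of values to see where the edges fall. This sketch reuses the bin edges and labels from the slide. The frame `df_b` and its sample counts are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical violation counts, chosen to land on and around the bin edges.
df_b = pd.DataFrame({"Number_of_Violations": [0, 1, 2, 3, 7]})

# pd.cut uses right-closed intervals by default, so the bins are
# (-inf, 0], (0, 2], and (2, inf).
df_b["Binned_Group"] = pd.cut(
    df_b["Number_of_Violations"],
    bins=[-np.inf, 0, 2, np.inf],
    labels=[1, 2, 3],
)

print(df_b)
```

Because the right edge is inclusive, a count of 0 falls in group 1 and a count of 2 falls in group 2. The result is an ordered categorical column, which can be converted with `.astype(int)` if plain integers are needed.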

  29. Let's start practicing!
