Dimensionality reduction: feature extraction

  1. Dimensionality reduction: feature extraction. PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN PYTHON. Lisa Stuart, Data Scientist

  2. Unsupervised learning methods: Principal component analysis (PCA) --> Lesson 3.1; Singular value decomposition (SVD) --> Lesson 3.1; Clustering/grouping --> Lesson 3.3; Exploratory data mining

  3. Dimensionality reduction != feature selection [1] https://slideplayer.com/slide/9699240/ [2] https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/

  4. Curse of dimensionality [1] https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

  5. 1-D search

  6. 2-D search

  7. 3-D search
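
The 1-D, 2-D and 3-D search slides can be illustrated with a small experiment. The sketch below is an assumption about what those figures convey, not a reproduction of them: it keeps a fixed budget of random points and counts how much of a regular grid they cover as the number of dimensions grows. The sample size and bin count are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000       # fixed sample budget (illustrative)
bins_per_axis = 10    # each axis is cut into 10 equal intervals

occupied_fraction = {}
for d in (1, 2, 3):
    points = rng.random((n_points, d))           # uniform in the unit cube
    cells = np.floor(points * bins_per_axis).astype(int)
    n_occupied = len(np.unique(cells, axis=0))   # distinct grid cells hit
    occupied_fraction[d] = n_occupied / bins_per_axis ** d
    print(f"{d}-D: {bins_per_axis ** d} cells, "
          f"{occupied_fraction[d]:.0%} occupied")
```

With the same number of observations, the share of occupied cells drops as dimensions are added, which is one way to read the curse of dimensionality.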

  8. Dimensionality reduction methods: PCA; SVD

  9. PCA: Relationship between X and y; Calculated by finding principal axes; Translates, rotates and scales; Lower-dimensional projection of the data [1] https://scikit-learn.org/stable/modules/decomposition.html
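
The PCA steps on this slide can be sketched by hand with NumPy. This is a minimal illustration on made-up correlated data (the data and variable names are assumptions, not from the course): center the data, find the principal axes as eigenvectors of the covariance matrix, then project onto the first axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data with correlated x and y columns (illustrative only).
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.3, size=200)
data = np.column_stack([x, y])

centered = data - data.mean(axis=0)        # translate to the origin
cov = np.cov(centered, rowvar=False)       # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # sort axes by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

projected = centered @ eigvecs[:, :1]      # rotate, keep the first principal axis
explained_ratio = eigvals / eigvals.sum()  # share of variance per axis
print(projected.shape, explained_ratio)
```

The variance of the projected column equals the largest eigenvalue, which is why the first axis is the best one-dimensional summary of this data.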

  10. SVD: Linear algebra and vector calculus; Decomposes data matrix into three matrices; Results in 'singular' values; Variance in data approximately equals sum of squares (SS) of singular values [1] https://galaxydatatech.com/2018/07/15/singular-value-decomposition/
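
A short NumPy sketch of the decomposition described on this slide, on random toy data (an assumption for illustration). It also checks the variance statement: for column-centered data, the sum of squared singular values divided by (n - 1) matches the total sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # toy data: 100 observations, 5 features
Xc = X - X.mean(axis=0)          # center each column

# Decompose into three matrices: U, the singular values s, and Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The three factors multiply back to the centered matrix.
reconstructed = U @ np.diag(s) @ Vt

# Total sample variance versus scaled sum of squared singular values.
total_variance = Xc.var(axis=0, ddof=1).sum()
variance_from_svd = (s ** 2).sum() / (Xc.shape[0] - 1)
print(total_variance, variance_from_svd)
```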

  11. Dimension reduction functions (function/method: returns). sklearn.decomposition.PCA: principal component analysis; sklearn.decomposition.TruncatedSVD: singular value decomposition; PCA/SVD.fit_transform(X): fits and transforms data; PCA/SVD.explained_variance_ratio_: variance explained by PCs. Other matrix decomposition algorithms
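
The functions in this table might be used as follows. The iris dataset and the choice of two components are assumptions made for the sketch; the course exercises may use different data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, TruncatedSVD

X = load_iris().data    # 150 observations, 4 features

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)    # fits and transforms in one call

svd = TruncatedSVD(n_components=2, random_state=0)
X_svd = svd.fit_transform(X)

print(X_pca.shape, pca.explained_variance_ratio_)
print(X_svd.shape, svd.explained_variance_ratio_)
```

PCA centers the data before decomposing it, while TruncatedSVD does not, so the two sets of ratios need not agree.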

  12. Let's practice!

  13. Dimensionality reduction: visualization techniques. Lisa Stuart, Data Scientist

  14. Why dimensionality reduction? 1. Speed up ML training 2. Visualization 3. Improves accuracy

  15. Visualization techniques: PCA; t-SNE

  16. Visualizing with PCA [1] https://districtdatalabs.silvrback.com/principal-component-analysis-with-python

  17. Scree plot [1] https://towardsdatascience.com/a-step-by-step-explanation-of-principal-component-analysis-b836fb9c97e2
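
The numbers behind a scree plot can be computed directly from a fitted PCA. The sketch below assumes the scikit-learn digits dataset and an arbitrary 95% cumulative-variance threshold; neither comes from the slide.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data    # 1797 observations, 64 pixel features
pca = PCA().fit(X)        # keep all components

ratios = pca.explained_variance_ratio_    # bar heights of a scree plot
cumulative = np.cumsum(ratios)

# Smallest number of components whose cumulative ratio reaches 95%
# (the threshold is an arbitrary choice for this sketch).
n_for_95 = int(np.argmax(cumulative >= 0.95)) + 1
print(n_for_95, cumulative[n_for_95 - 1])
```

Plotting `ratios` against the component index gives the scree plot itself; the point where the curve flattens suggests how many components to keep.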

  18. t-SNE: Probabilistic; Pairs of data points; Low-dimensional embedding; Plot embeddings

  19. Visualizing with t-SNE

      # t-sne with loan data
      import pandas as pd
      import matplotlib.pyplot as plt
      import seaborn as sns
      from sklearn.manifold import TSNE

      loans = pd.read_csv('loans_dataset.csv')

      # Feature matrix
      X = loans.drop('Loan Status', axis=1)

      tsne = TSNE(n_components=2, verbose=1, perplexity=40)
      tsne_results = tsne.fit_transform(X)
      loans['t-SNE-PC-one'] = tsne_results[:,0]
      loans['t-SNE-PC-two'] = tsne_results[:,1]

      # t-sne viz
      plt.figure(figsize=(16,10))
      sns.scatterplot(
          x="t-SNE-PC-one", y="t-SNE-PC-two",
          hue="Loan Status",
          palette=sns.color_palette(["grey","blue"]),
          data=loans,
          legend="full",
          alpha=0.3
      )

      [1] https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

  20. Visualizing with t-SNE

  21. PCA vs t-SNE: digits data [1] https://towardsdatascience.com/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b
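
A comparison like the one on this slide could be set up as below. This sketch assumes the scikit-learn digits dataset and, to keep t-SNE fast, only the first 300 observations; the perplexity value is also an illustrative choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

digits = load_digits()
# Small subset so the t-SNE fit stays quick (an assumption for this sketch).
X, y = digits.data[:300], digits.target[:300]

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)
```

Scatter-plotting each embedding colored by `y` shows how well each method separates the digit classes in two dimensions.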

  22. Let's practice!

  23. Clustering analysis: selecting the right clustering algorithm. Lisa Stuart, Data Scientist

  24. Clustering algorithms: Features >> Observations; Model training more challenging; Rely on distance calculations; Most commonly used unsupervised technique

  25. Practical applications of clustering: Customer segmentation; Document classification; Insurance/transaction fraud detection; Image segmentation; Anomaly detection; Many more...

  26. Distance metrics: Manhattan (taxicab) distance [1] https://en.wikipedia.org/wiki/Taxicab_geometry

  27. Distance metrics: Euclidean distance [1] http://rosalind.info/glossary/euclidean-distance/
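
Both distance metrics can be computed in a line of NumPy each. The two points below are made up for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Manhattan (taxicab): sum of absolute coordinate differences, |1-4| + |2-6|.
manhattan = np.abs(a - b).sum()

# Euclidean: straight-line distance, sqrt((1-4)^2 + (2-6)^2).
euclidean = np.sqrt(((a - b) ** 2).sum())

print(manhattan, euclidean)
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points.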

  28. K-means: 1. Initial centroids 2. Assign each observation to nearest centroid 3. Create new centroids 4. Repeat steps 2 and 3 [1] http://sherrytowers.com/2013/10/24/k-means-clustering/
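
The four steps on this slide can be sketched in plain NumPy. This is a teaching sketch, not how scikit-learn implements K-means: the function name `kmeans`, the random initialization, and the stopping rule are assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct observations as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each observation to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: repeat until the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy blobs (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```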

  29. Hierarchical agglomerative clustering [1] https://www.datanovia.com/en/lessons/agglomerative-hierarchical-clustering/

  30. Agglomerative clustering linkage: Ward linkage; Maximum/complete linkage; Average linkage; Single linkage
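
The four linkage options listed here map onto the `linkage` parameter of scikit-learn's `AgglomerativeClustering`. The synthetic blobs and the choice of three clusters are assumptions for this sketch.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

labels_by_linkage = {}
for linkage in ("ward", "complete", "average", "single"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels_by_linkage[linkage] = model.fit_predict(X)

for name, labels in labels_by_linkage.items():
    print(name, labels[:10])
```

Comparing the label arrays across linkages shows how much the merge criterion alone can change the resulting clusters.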

  31. Selecting a clustering algorithm: Cluster stability assessment; K-means and HC use Euclidean distance; Inter- and intra-cluster distances. "An appropriate dissimilarity measure is far more important in obtaining success with clustering than choice of clustering algorithm." - from Elements of Statistical Learning [1] https://slideplayer.com/slide/8363774/

  32. Clustering functions (function/method: returns). sklearn.cluster.KMeans: K-Means clustering algorithm; sklearn.cluster.AgglomerativeClustering: agglomerative clustering algorithm; kmeans.inertia_: sum of squared (SS) distances of observations to closest cluster center; scipy.cluster.hierarchy as sch: hierarchical clustering for dendrograms; sch.dendrogram(): dendrogram function
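
The SciPy entries in this table could be combined as in the sketch below. The toy blobs, the Ward method, and the cut into three clusters are assumptions for illustration; `no_plot=True` is used so the example runs without a display.

```python
import scipy.cluster.hierarchy as sch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

# Merge table: one row per merge, so (n - 1) rows and 4 columns.
Z = sch.linkage(X, method="ward")

# Compute the dendrogram layout; drop no_plot to draw it with matplotlib.
tree = sch.dendrogram(Z, no_plot=True)

# Cut the tree into at most 3 flat clusters.
flat_labels = sch.fcluster(Z, t=3, criterion="maxclust")
print(Z.shape, len(tree["leaves"]), sorted(set(flat_labels)))
```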

  33. Let's practice!

  34. Clustering analysis: choosing the optimal number of clusters. Lisa Stuart, Data Scientist

  35. Methods for optimal k: Silhouette method; Elbow method

  36. Silhouette coefficient: Composed of 2 scores. Mean distance between each observation and all others: in the same cluster; in the nearest cluster

  37. Silhouette coefficient values: Between -1 and 1. 1: near others in same cluster, very far from others in other clusters. -1: not near others in same cluster, close to others in other clusters. 0: denotes overlapping clusters
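
The two scores behind the silhouette coefficient can be computed by hand for a single observation. In the sketch below, `a` is the mean distance to the other members of the same cluster, `b` is the mean distance to the nearest other cluster, and the coefficient is (b - a) / max(a, b). The six points and their labels are made up; scikit-learn's `silhouette_samples` is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

i = 0                                        # observation to score
dists = np.linalg.norm(X - X[i], axis=1)     # distances from observation i
same = labels == labels[i]
same[i] = False                              # exclude the observation itself
a = dists[same].mean()                       # mean distance within own cluster
b = min(dists[labels == c].mean()            # mean distance to nearest other cluster
        for c in np.unique(labels) if c != labels[i])
s_i = (b - a) / max(a, b)

reference = silhouette_samples(X, labels)[i]
print(s_i, reference)
```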

  38. Silhouette score [1] https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

  39. Elbow method [1] https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/

  40. Optimal k selection functions (function/method: returns). sklearn.cluster.KMeans: K-Means clustering algorithm; sklearn.metrics.silhouette_score: score between -1 and 1 as measure of cluster stability; kmeans.inertia_: sum of squared (SS) distances of observations to closest cluster center; range(start, stop): sequence of values beginning with start, up to but not including stop; list.append(kmeans.inertia_): appends inertia value to list
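
The functions in this table fit together in a loop over candidate values of k. The synthetic blobs, the range of k, and the `n_init` setting are assumptions made for this sketch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

inertias, silhouettes = [], []
k_values = range(2, 9)    # 2 up to, but not including, 9
for k in k_values:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(kmeans.inertia_)                          # elbow method input
    silhouettes.append(silhouette_score(X, kmeans.labels_))   # silhouette method input

# Silhouette method: choose the k with the highest mean silhouette score.
best_k = k_values[silhouettes.index(max(silhouettes))]
print(best_k)
```

For the elbow method, plot `inertias` against `k_values` and look for the point where the curve stops dropping sharply.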

  41. Let's practice!
