
Visualizing hierarchies: Unsupervised Learning in Python (PowerPoint presentation)

  1. Visualizing hierarchies. Unsupervised Learning in Python. Benjamin Wilson, Director of Research at lateral.io

  2. Visualizations communicate insight: "t-SNE" creates a 2D map of a dataset (later); "Hierarchical clustering" (this video).

  3. A hierarchy of groups: Groups of living things can form a hierarchy. Clusters are contained in one another.

  4. Eurovision scoring dataset: Countries gave scores to songs performed at the Eurovision 2016¹. 2D array of scores: rows are countries, columns are songs. ¹ http://www.eurovision.tv/page/results

  5. Hierarchical clustering of voting countries

  6. Hierarchical clustering: Every country begins in a separate cluster. At each step, the two closest clusters are merged. Continue until all countries are in a single cluster. This is "agglomerative" hierarchical clustering.
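The merging loop on this slide can be sketched directly in Python. This is a minimal, deliberately naive sketch, not SciPy's implementation: `agglomerate` is a hypothetical helper, distances are assumed to be Euclidean, and the distance between two clusters is taken as the largest distance between their members ("complete" linkage, the method used with SciPy later in the deck).

```python
import numpy as np


def agglomerate(samples):
    """Naive agglomerative clustering with complete linkage (hypothetical helper).

    Returns a list of (cluster_a, cluster_b, distance) tuples, one per merge,
    where each cluster is a frozenset of sample indices.
    """
    samples = np.asarray(samples, dtype=float)
    # Pairwise Euclidean distances between all samples
    diffs = samples[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # Every sample begins in a separate cluster
    clusters = [frozenset([i]) for i in range(len(samples))]
    merges = []
    # Continue until all samples are in a single cluster
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: largest distance between members of the two clusters
                d = max(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the two closest clusters
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Four hypothetical 1D samples
for a, b, d in agglomerate([[0.0], [1.0], [5.0], [6.5]]):
    print(sorted(a), sorted(b), d)
# [0] [1] 1.0
# [2] [3] 1.5
# [0, 1] [2, 3] 6.5
```

The double loop over cluster pairs makes this far too slow for real data; it only shows the "merge the two closest clusters" step explicitly.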

  7. The dendrogram of a hierarchical clustering: Read from the bottom up. Vertical lines represent clusters.

  9. Dendrograms, step-by-step

  16. Hierarchical clustering with SciPy: Given samples (the array of scores) and country_names:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Perform the hierarchical clustering with complete linkage
mergings = linkage(samples, method='complete')
# Plot the dendrogram, labelling each leaf with its country name
dendrogram(mergings,
           labels=country_names,
           leaf_rotation=90,
           leaf_font_size=6)
plt.show()
```

  17. Let's practice!

  18. Cluster labels in hierarchical clustering

  19. Cluster labels in hierarchical clustering: Not only a visualization tool! Cluster labels at any intermediate stage can be recovered, for use in e.g. cross-tabulations.

  20. Intermediate clusterings & height on dendrogram: E.g. at height 15: Bulgaria, Cyprus and Greece are one cluster; Russia and Moldova are another; Armenia is in a cluster on its own.

  22. Dendrograms show cluster distances: Height on the dendrogram = distance between the merging clusters. E.g. the clusters containing only Cyprus and only Greece were at distance approx. 6 when they merged. This new cluster is at distance approx. 12 from the cluster containing only Bulgaria.
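In SciPy, the heights drawn on the dendrogram are stored in the third column of the array returned by `linkage()`. A minimal sketch on four made-up 1D samples (not the Eurovision data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four hypothetical 1D samples
samples = np.array([[0.0], [1.0], [5.0], [6.5]])
mergings = linkage(samples, method='complete')

# Each row describes one merge:
#   [cluster index, cluster index, distance between them, size of merged cluster]
# Indices 0..3 are the original samples; 4, 5, ... are clusters formed by earlier merges.
heights = mergings[:, 2]
print(heights)  # [1.  1.5 6.5]
```

Each value in `heights` is the height of one horizontal join in the dendrogram, read from the bottom up.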

  23. Intermediate clusterings & height on dendrogram: A chosen height specifies the maximum distance between merging clusters. Don't merge clusters further apart than this (e.g. 15).
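Choosing a different height gives a different intermediate clustering. A small sketch with SciPy's `fcluster()` on five hypothetical 1D samples, counting how many clusters remain at each height (the heights tried are arbitrary):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five hypothetical 1D samples; complete-linkage merges happen at heights 1, 1.5, 6.5 and 20
samples = np.array([[0.0], [1.0], [5.0], [6.5], [20.0]])
mergings = linkage(samples, method='complete')

n_clusters = {}
for height in [0.5, 2, 10, 25]:
    # Don't merge clusters that are further apart than `height`
    labels = fcluster(mergings, height, criterion='distance')
    n_clusters[height] = len(np.unique(labels))

print(n_clusters)  # {0.5: 5, 2: 3, 10: 2, 25: 1}
```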

  24. Distance between clusters: Defined by a "linkage method". In "complete" linkage, the distance between two clusters is the maximum distance between their samples. Specified via the method parameter, e.g. linkage(samples, method="complete"). Different linkage method, different hierarchical clustering!
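To make the definition concrete, here is a sketch comparing "complete" linkage with "single" linkage (another method SciPy's `linkage` accepts, where the cluster distance is the *minimum* distance between samples). The helpers `complete_distance` and `single_distance` are hypothetical, and the data is made up:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.hierarchy import linkage


def complete_distance(cluster_a, cluster_b):
    # "complete" linkage: maximum distance between samples of the two clusters
    return cdist(cluster_a, cluster_b).max()


def single_distance(cluster_a, cluster_b):
    # "single" linkage: minimum distance between samples of the two clusters
    return cdist(cluster_a, cluster_b).min()


a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[4.0, 0.0], [7.0, 0.0]])
print(complete_distance(a, b))  # 7.0
print(single_distance(a, b))    # 3.0

# Different linkage method, different hierarchy: compare the merge heights
samples = np.array([[0.0], [1.0], [5.0], [6.5]])
complete_heights = linkage(samples, method='complete')[:, 2]
single_heights = linkage(samples, method='single')[:, 2]
print(complete_heights)  # [1.  1.5 6.5]
print(single_heights)    # [1.  1.5 4. ]
```

The first two merges coincide here, but the final merge happens at a different height, so cutting both dendrograms at, say, height 5 would give different clusterings.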

  25. Extracting cluster labels: Use the fcluster() function. It returns a NumPy array of cluster labels.

  26. Extracting cluster labels using fcluster:

```python
from scipy.cluster.hierarchy import linkage, fcluster

mergings = linkage(samples, method='complete')
# Don't merge clusters further apart than height 15
labels = fcluster(mergings, 15, criterion='distance')
print(labels)
```

```
[ 9 8 11 20 2 1 17 14 ... ]
```

  27. Aligning cluster labels with country names: Given a list of strings country_names:

```python
import pandas as pd

# Pair each cluster label with its country name
pairs = pd.DataFrame({'countries': country_names, 'labels': labels})
print(pairs.sort_values('labels'))
```

```
   countries  labels
5    Belarus       1
40   Ukraine       1
...
36     Spain       5
8   Bulgaria       6
19    Greece       6
10    Cyprus       6
28   Moldova       7
...
```
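Slide 19 mentions cross-tabulations as one use for recovered labels. A minimal sketch with `pd.crosstab`, assuming a hypothetical second grouping (a made-up `regions` column) to compare the cluster labels against; all values below are invented for illustration:

```python
import pandas as pd

# Hypothetical cluster labels (e.g. from fcluster) and a made-up known grouping
labels = [1, 1, 2, 2, 2, 3]
regions = ['East', 'East', 'South', 'South', 'East', 'West']

df = pd.DataFrame({'labels': labels, 'regions': regions})
# Count how many samples from each region fall into each cluster
ct = pd.crosstab(df['labels'], df['regions'])
print(ct)
```

Rows are cluster labels and columns are regions, so a cluster that mixes regions shows up as a row with several non-zero counts.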

  28. Let's practice!

  29. t-SNE for 2-dimensional maps

  30. t-SNE for 2-dimensional maps: t-SNE = "t-distributed stochastic neighbor embedding". Maps samples to 2D space (or 3D). The map approximately preserves the nearness of samples. Great for inspecting datasets.
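One way to put a number on "approximately preserves nearness" (not something the deck itself uses) is scikit-learn's `trustworthiness` score, which is close to 1 when points that are neighbours in the map were also neighbours in the original space. A sketch, assuming the iris data bundled with scikit-learn; the exact score will vary with the scikit-learn version and the random seed:

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE, trustworthiness

samples, species = load_iris(return_X_y=True)
transformed = TSNE(random_state=0).fit_transform(samples)

# Score in [0, 1]; it drops when points among the 5 nearest neighbours
# in the 2D map were not close to each other in the original 4D space
score = trustworthiness(samples, transformed, n_neighbors=5)
print(round(score, 3))
```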

  31. t-SNE on the iris dataset: The iris dataset has 4 measurements, so samples are 4-dimensional. t-SNE maps samples to 2D space. t-SNE didn't know that there were different species... yet kept the species mostly separate.

  32. Interpreting t-SNE scatter plots: "versicolor" and "virginica" are harder to distinguish from one another. This is consistent with the k-means inertia plot: one could argue for 2 clusters, or for 3.

  33. t-SNE in sklearn: samples is a 2D NumPy array:

```
print(samples)
[[ 5.   3.3  1.4  0.2]
 [ 5.   3.5  1.3  0.3]
 [ 4.9  2.4  3.3  1. ]
 [ 6.3  2.8  5.1  1.5]
 ...
 [ 4.9  3.1  1.5  0.1]]
```

species is a list giving the species of each sample as a number (0, 1, or 2):

```
print(species)
[0, 0, 1, 2, ..., 0]
```
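If you want to follow along without the deck's data files, one possible stand-in (an assumption, not necessarily how the deck loaded its arrays) is the copy of the iris data that ships with scikit-learn. Its rows are ordered by species, so the printout will not match the slide row for row:

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled iris data stands in for the deck's arrays
iris = load_iris()
samples = iris.data          # 2D NumPy array, one row of 4 measurements per flower
species = list(iris.target)  # one species label (0, 1 or 2) per sample
print(samples.shape)         # (150, 4)
```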

  34. t-SNE in sklearn:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

model = TSNE(learning_rate=100)
# Fit the model and transform the samples in a single step
transformed = model.fit_transform(samples)
# First and second t-SNE features
xs = transformed[:, 0]
ys = transformed[:, 1]
# Colour each point by its species label
plt.scatter(xs, ys, c=species)
plt.show()
```

  35. t-SNE has only fit_transform(): It has a fit_transform() method, which simultaneously fits the model and transforms the data. It has no separate transform() method (scikit-learn's fit() only embeds the data it is given), so it can't extend the map to include new data samples. You must start over each time!
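A quick check of this in scikit-learn, on hypothetical random data: the estimator exposes `fit_transform()` but no `transform()`, so there is no call that places new samples on an existing map. The `perplexity=10` setting is an arbitrary choice that keeps it below the number of samples in this tiny dataset.

```python
import numpy as np
from sklearn.manifold import TSNE

model = TSNE(perplexity=10, random_state=0)
print(hasattr(model, 'fit_transform'))  # True
print(hasattr(model, 'transform'))      # False: no way to map new samples later

# Hypothetical data: 40 random 4-dimensional samples
rng = np.random.default_rng(0)
samples = rng.normal(size=(40, 4))
transformed = model.fit_transform(samples)
print(transformed.shape)  # (40, 2)
```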

  36. t-SNE learning rate: Choose the learning rate for the dataset. Wrong choice: points bunch together. Try values between 50 and 200.
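A sketch of trying several values in that range and comparing the scatter plots side by side. It assumes the iris data bundled with scikit-learn and fixes `random_state` so that only the learning rate changes between panels; the three values are arbitrary picks from the suggested range. (Recent scikit-learn versions also accept `learning_rate='auto'`, which is the default from version 1.2.)

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples, species = load_iris(return_X_y=True)

# Arbitrary picks from the suggested 50-200 range
learning_rates = [50, 100, 200]
fig, axes = plt.subplots(1, len(learning_rates), figsize=(12, 4))
for ax, lr in zip(axes, learning_rates):
    # Same seed for every run, so only the learning rate differs
    transformed = TSNE(learning_rate=lr, random_state=0).fit_transform(samples)
    ax.scatter(transformed[:, 0], transformed[:, 1], c=species, s=10)
    ax.set_title(f'learning_rate={lr}')
plt.show()
```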

  37. Different every time: t-SNE features are different every time. Piedmont wines, 3 runs, 3 different scatter plots! ... however: the wine varieties (= colors) have the same position relative to one another.
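A sketch of the same effect, assuming the scikit-learn iris data rather than the Piedmont wines. It sets `init='random'` explicitly, because scikit-learn 1.2 and later default to a PCA-based initialisation that can make runs look much more alike. Comparing distances between species centres is just one rough way to compare relative positions across runs:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples, species = load_iris(return_X_y=True)

# Three runs that differ only in their random seed
seeds = [0, 1, 2]
runs = [TSNE(init='random', random_state=seed).fit_transform(samples)
        for seed in seeds]

for seed, run in zip(seeds, runs):
    # Centre of each species in this run's 2D map
    centroids = np.array([run[species == k].mean(axis=0) for k in range(3)])
    # Distances between species centres: compare the pattern across runs
    centroid_dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    print(f'seed={seed}')
    print(centroid_dists.round(1))
```

The raw coordinates in `runs` differ from seed to seed, while the pattern of which species sit close together is what should stay comparable.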

  38. Let's practice!
