
Machine Learning 2, DS 4420 (Spring 2018): From clustering to EM



  1. Machine Learning 2 DS 4420 - Spring 2018 From clustering to EM Byron C. Wallace

  2. Clustering

  3. Four Types of Clustering 1. Centroid-based (K-means, K-medoids) Notion of Clusters: Voronoi tessellation

  4. Four Types of Clustering 2. Density-based (DBSCAN, OPTICS) Notion of Clusters: Connected regions of high density

  5. Four Types of Clustering 3. Connectivity-based (Hierarchical) Notion of Clusters: Cut off dendrogram at some depth

  6. Four Types of Clustering 4. Distribution-based (Mixture Models) Notion of Clusters: Distributions on features

  7. Hierarchical Clustering

  8. Dendrogram (a.k.a. a similarity tree). Anatomy: a root, internal branches, internal nodes, terminal branches, and leaves. The similarity of A and B is represented as the height of the lowest shared internal node.

  9. Dendrogram (a.k.a. a similarity tree). The similarity of A and B, D(A, B), is represented as the height of the lowest shared internal node. Example tree (Newick format): (Bovine: 0.69395, (Spider Monkey: 0.390, (Gibbon: 0.36079, (Orang: 0.33636, (Gorilla: 0.17147, (Chimp: 0.19268, Human: 0.11927): 0.08386): 0.06124): 0.15057): 0.54939);

  10. Dendrogram (a.k.a. a similarity tree). This representation is natural when measuring genetic similarity: D(A, B) corresponds to the distance to a common ancestor. (Same example tree as on the previous slide.)

  11. Example: Iris data. Iris setosa, Iris versicolor, Iris virginica. https://en.wikipedia.org/wiki/Iris_flower_data_set

  12. Hierarchical Clustering (Euclidean Distance) https://en.wikipedia.org/wiki/Iris_flower_data_set
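The slides show only the resulting dendrogram; a minimal sketch of how one might reproduce it, assuming scikit-learn's bundled Iris data and SciPy's hierarchical-clustering routines (neither tool is named in the slides):

```python
# Sketch (not from the slides): hierarchical clustering of Iris with
# Euclidean distance, then plotting the dendrogram.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris

X = load_iris().data                                  # 150 flowers x 4 measurements
Z = linkage(X, method="average", metric="euclidean")  # build the merge tree
dendrogram(Z, no_labels=True)                         # draw the similarity tree
plt.show()
```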

  13. Edit Distance. Distance between Patty and Selma: change dress color (1 point), change earring shape (1 point), change hair part (1 point), so D(Patty, Selma) = 3. Distance between Marge and Selma: change dress color (1 point), add earrings (1 point), decrease height (1 point), take up smoking (1 point), lose weight (1 point), so D(Marge, Selma) = 5. Edit distance can be defined for any set of discrete features.
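Since edit distance over discrete features is just a count of features that differ, a small sketch makes the idea concrete (the feature names and values below are invented for illustration):

```python
# Sketch: edit distance over discrete features = number of features that differ.
def feature_edit_distance(a: dict, b: dict) -> int:
    return sum(a.get(k) != b.get(k) for k in a.keys() | b.keys())

patty = {"dress_color": "red", "earrings": "round", "hair_part": "left"}
selma = {"dress_color": "blue", "earrings": "square", "hair_part": "right"}
print(feature_edit_distance(patty, selma))  # 3, matching D(Patty, Selma) = 3
```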

  14. Edit Distance for Strings. How similar are "Peter" and "Piotr"? Transform string Q into string C using only substitution, insertion, and deletion, and assume each of these operators has a cost associated with it (here, 1 unit each). The similarity between two strings can then be defined as the cost of the cheapest transformation from Q to C. For example, D(Peter, Piotr) = 3: Peter → (substitute i for e) Piter → (insert o) Pioter → (delete e) Piotr.
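The slide gives only the Peter → Piotr example; the distance it describes is the classic Levenshtein distance, computable by dynamic programming. A sketch with unit costs:

```python
def levenshtein(q: str, c: str) -> int:
    """Cheapest transformation of q into c using unit-cost
    substitution, insertion, and deletion."""
    m, n = len(q), len(c)
    # d[i][j] = edit distance between q[:i] and c[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of q[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of c[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (q[i - 1] != c[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[m][n]

print(levenshtein("Peter", "Piotr"))  # 3, as on the slide
```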

  15. Hierarchical Clustering (Edit Distance). Names clustered: Pedro (Portuguese); Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian); Cristovao (Portuguese); Christoph (German), Christophe (French), Cristobal (Spanish), Cristoforo (Italian), Kristoffer (Scandinavian), Krystof (Czech), Christopher (English); Miguel (Portuguese); Michalis (Greek), Michael (English), Mick (Irish). [Dendrogram of these names clustered by edit distance.]

  16. Meaningful Patterns. Edit distance yields a clustering according to geography: Pedro (Portuguese/Spanish); Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian). (Slide from Eamonn Keogh.)

  17. Spurious Patterns. In general, clusterings will only be as meaningful as your distance metric; some apparent patterns are spurious, with no real connection between the items grouped together. [Dendrogram of national flags: South Georgia & South Sandwich Islands, Serbia & Montenegro (Yugoslavia), St. Helena & Dependencies, U.K., Australia, Anguilla, France, Niger, India, Ireland, Brazil.]

  18. Spurious Patterns. The same dendrogram, annotated: one cluster genuinely corresponds to former UK colonies, while another grouping has no relation at all; there is no connection between those flags.

  19. “Correct” Number of Clusters. One potential use of a dendrogram is to determine the “correct” number of clusters.

  20. “Correct” Number of Clusters. Determine the number of clusters by looking at the distances at which clusters merge; a large jump in merge distance suggests a natural place to cut the dendrogram.
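The slides don't name a tool for this step; as one hedged illustration, SciPy's `fcluster` can cut the merge tree at the largest gap in merge distances (the three-blob synthetic data below is ours, not the slides'):

```python
# Sketch: cut the dendrogram where merge distances jump the most.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),   # three well-separated blobs
               rng.normal(3, 0.3, (20, 2)),
               rng.normal(6, 0.3, (20, 2))])
Z = linkage(X, method="complete")             # Z[:, 2] holds merge distances
gap = np.diff(Z[:, 2])                        # gaps between successive merges
i = int(np.argmax(gap))                       # biggest jump = natural cut point
threshold = (Z[i, 2] + Z[i + 1, 2]) / 2       # cut in the middle of that jump
labels = fcluster(Z, t=threshold, criterion="distance")
print(len(set(labels)))                       # 3 clusters for this synthetic data
```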

  21. Detecting Outliers. A single isolated branch is suggestive of a data point that is very different from all the others: an outlier.

  22. Bottom up vs. Top down. Bottom-up (agglomerative): each item starts as its own cluster; greedily merge.

  23. Bottom up vs. Top down. Bottom-up (agglomerative): each item starts as its own cluster; greedily merge. Top-down (divisive): start with one big cluster (all data); recursively split.

  24. Distance Matrix. We begin with a distance matrix which contains the distances between every pair of objects in our database. For the five pictured objects, the upper triangle is:

      0  8  8  7  7
         0  2  4  4
            0  3  3
               0  1
                  0

  The slide's examples, D(·, ·) = 8 and D(·, ·) = 1, read off entries of this matrix for two of the pictured pairs.
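A sketch of building such a matrix with SciPy's `pdist`/`squareform` (the five 2-D points are invented stand-ins for the slide's pictured objects):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Five invented 2-D points standing in for the slide's five objects.
X = np.array([[0.0, 0.0], [8.0, 0.0], [8.0, 2.0], [7.0, 4.0], [7.0, 5.0]])
D = squareform(pdist(X, metric="euclidean"))  # symmetric 5x5 distance matrix
print(np.round(D, 1))
```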

  25. Bottom-up (Agglomerative Clustering). Consider all possible merges…; choose the best.

  26. Bottom-up (Agglomerative Clustering). Consider all possible merges…; choose the best; repeat.

  27. Bottom-up (Agglomerative Clustering). Consider all possible merges…; choose the best; repeat.

  28. Bottom-up (Agglomerative Clustering). Consider all possible merges…; choose the best; repeat until finished.

  29. Bottom-up (Agglomerative Clustering). Can you now implement this? (A sketch follows below.)
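As a hedged answer to the slide's question, here is a naive O(n^3) sketch over a precomputed distance matrix, using single-link distance between clusters (the linkage options come on a later slide; the function and variable names are ours):

```python
import numpy as np

def agglomerate(D: np.ndarray, k: int) -> list:
    """Naive bottom-up clustering over a precomputed distance matrix D:
    repeatedly merge the closest pair of clusters until k clusters remain."""
    clusters = [{i} for i in range(len(D))]
    while len(clusters) > k:
        # Consider all possible merges... (every pair of current clusters)
        pairs = [(i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))]
        # ...and choose the best: the pair with the smallest single-link distance.
        i, j = min(pairs, key=lambda p: min(
            D[a, b] for a in clusters[p[0]] for b in clusters[p[1]]))
        clusters[i] |= clusters.pop(j)
    return clusters
```

Each pass through the loop is exactly the slide's "consider all possible merges…, choose the best" step.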

  30. Bottom-up (Agglomerative Clustering). Distances between examples can be calculated using the chosen metric.

  31. Bottom-up (Agglomerative Clustering). But how do we calculate the distance to a cluster (rather than to a single example)?

  32. Clustering Criteria
  Single link (closest point): $d(A, B) = \min_{a \in A,\, b \in B} d(a, b)$
  Complete link (furthest point): $d(A, B) = \max_{a \in A,\, b \in B} d(a, b)$
  Group average (average distance): $d(A, B) = \frac{1}{|A||B|} \sum_{a \in A,\, b \in B} d(a, b)$
  Centroid (distance of averages): $d(A, B) = d(\mu_A, \mu_B)$, where $\mu_X = \frac{1}{|X|} \sum_{x \in X} x$

  33. Hierarchical Clustering Summary
  + No need to specify the number of clusters.
  + Hierarchical structure maps nicely onto human intuition in some domains.
  - Scaling: time complexity is at least O(n^2) in the number of examples.
  - Heuristic search method: local optima are a problem.
  - Interpretation of results is (very) subjective.


  38. Evaluation? [Figure: four scatter plots of points in the unit square (axes x and y), titled Random Points, DBSCAN, K-means, and Complete Link, each showing a different clustering of the same data.]
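The slide poses evaluation as an open question. One common quantitative option (our suggestion, not the slides') is an internal index such as the silhouette score; note that on uniform noise, both algorithms below can still report respectable scores, which is exactly the slide's warning that algorithms "find" clusters whether or not any exist:

```python
# Sketch: scoring two clusterings of the same uniform-random points
# with the silhouette score (higher = tighter, better-separated clusters).
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).uniform(size=(300, 2))  # noise: no true clusters
for name, model in [
    ("K-means", KMeans(n_clusters=3, n_init=10)),
    ("Complete link", AgglomerativeClustering(n_clusters=3, linkage="complete")),
]:
    labels = model.fit_predict(X)
    print(name, round(silhouette_score(X, labels), 3))
```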
