Department of Computer Science CSCI 5622: Machine Learning Chenhao - PowerPoint PPT Presentation

  1. Department of Computer Science
     CSCI 5622: Machine Learning
     Chenhao Tan
     Lecture 18: Clustering
     Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

  2. Learning objectives
     • Learn about general clustering
     • Learn about the K-Means algorithm
     • Learn about Gaussian Mixture Models

  3. Supervised learning and unsupervised learning
     • Supervised learning: Data: X, Labels: Y
     • Unsupervised learning: Data: X, Latent structure: Z

  4. Clustering
     • One important unsupervised method is clustering
     • Goal: Organize data in classes

  5. Clustering applications – Microarray Gene Expression data
     From: “Skin layer-specific transcriptional profiles in normal and recessive yellow (Mc1re/Mc1re) mice” by April and Barsh in Pigment Cell Research (2006)

  6. Clustering applications – Medical Imaging

  7. Clustering applications – Community detection

  8. News Media

  9. Clustering
     • One important unsupervised method is clustering
     • Goal: Organize data in classes
     • Classes are hard to define
     • Different data representations may lead to different clusterings

  10. Clustering
      • One important unsupervised method is clustering
      • Goal: Organize data in classes
      • Data have high in-class similarity
      • Data have low out-of-class similarity

  11. Clustering - Similarity

  12. Clustering - Similarity

  13. K-Means
      • Simplest clustering method
      • Iterative in nature
      • Reasonably fast
      • Very popular in practice (though with more bells and whistles)
      • Requires real-valued data
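
The iterative procedure behind these bullets can be sketched in a few lines. This is a minimal sketch of the standard K-Means (Lloyd's) iteration, not code from the lecture; the function names, the initialization from randomly chosen data points, and the toy data are assumptions made for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two real-valued points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iters=100, seed=0):
    """Alternate assignment and centroid updates until assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct data points.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return centroids, assignments

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

The mean in the update step is why the method needs real-valued data: averaging is not defined for categorical attributes.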

  14. K-Means

  15. K-Means

  23. More K-means
      • Animations: http://shabal.in/visuals/kmeans/4.html

  24-35. K-Means in numbers

  36. K-Means

  37. K-Means

  38. K-Means
      • Weaknesses
        • Doesn't really work with categorical data
        • Usually only converges to a local minimum
        • Have to determine the number of clusters
        • Can be sensitive to outliers
        • Only generates convex clusters

  39. K-means - Weaknesses
      • Doesn't really work with categorical data

  40. K-means - Weaknesses
      • Doesn't really work with categorical data
      • Fix: Use K-Modes instead
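
To make the K-Modes fix concrete, here is a hedged sketch of the two ingredients it commonly swaps in: a mismatch count in place of Euclidean distance, and a per-attribute mode in place of the mean. The helper names and the toy records are hypothetical, and this is not the lecture's own code.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_record(records):
    """Most frequent value of each attribute; plays the role of the mean in K-Means."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

# Hypothetical cluster of categorical records: (color, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
]
center = mode_record(records)
distances = [hamming(r, center) for r in records]
print(center, distances)
```

Plugging these two functions into the usual assign-then-update loop gives a K-Modes-style procedure for categorical data.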

  41. K-means - Weaknesses
      • Usually only converges to a local minimum

  42. K-means - Weaknesses
      • Usually only converges to a local minimum
      • Fix: Do several runs with random initializations and choose the best
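
The restart strategy can be sketched as follows, assuming "best" means the run with the lowest sum of squared distances from points to their assigned centroids. The function names, the number of runs, and the toy data are illustrative assumptions.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iters=100):
    """One K-Means run from a random initialization; returns (loss, centroids, assignments)."""
    centroids = rng.sample(points, k)  # random init from the data points
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Loss: sum of squared distances from each point to its assigned centroid.
    loss = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return loss, centroids, assignments

def kmeans_restarts(points, k, n_runs=10, seed=0):
    """Run K-Means n_runs times and keep the run with the smallest loss."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_runs)]
    best = min(runs, key=lambda run: run[0])
    return best, [run[0] for run in runs]

# Hypothetical data: three loose groups, so some initializations may end in a worse optimum.
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 0.0), (5.3, 0.4), (2.5, 6.0), (2.8, 6.3), (2.2, 5.8)]
(best_loss, best_centroids, best_assignments), all_losses = kmeans_restarts(points, k=3)
print(best_loss, sorted(all_losses))
```

Comparing the sorted per-run losses shows how much the result depends on the starting centroids.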

  43. K-means - Weaknesses
      • Have to determine the number of clusters

  44. K-means - Weaknesses
      • Have to determine the number of clusters
      • Fix: Use the elbow method: run K-Means for different values of k and look at the loss function
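
A hedged sketch of the elbow method: run K-Means for a range of k and record the loss for each value. It assumes the loss is the within-cluster sum of squared distances and takes the best of a few random restarts per k. All names and data are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_loss(points, k, rng, max_iters=100):
    """Run K-Means once and return the final sum of squared distances."""
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))

def elbow_curve(points, k_values, n_runs=5, seed=0):
    """Map each k to the best loss found over n_runs random restarts."""
    rng = random.Random(seed)
    return {k: min(kmeans_loss(points, k, rng) for _ in range(n_runs)) for k in k_values}

# Hypothetical data: two well-separated groups of three distinct points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
losses = elbow_curve(points, range(1, 7))
for k, loss in losses.items():
    print(k, round(loss, 3))
```

The loss keeps shrinking as k grows and reaches zero when every point has its own centroid, so the method looks for the k after which the decrease levels off rather than for the minimum.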

  51-59. Gaussian Mixture Models
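
The slide text for this part was not captured, so the following is only a sketch of the standard generative story for a one-dimensional Gaussian mixture: first draw a latent component z from the mixing weights, then draw x from that component's Gaussian. The parameter values and the function name are hypothetical.

```python
import random

def sample_gmm(weights, means, stddevs, n, seed=0):
    """Draw n (z, x) pairs: z is the latent component, x the observed value."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: pick a component according to the mixing weights.
        z = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw the observation from that component's Gaussian.
        x = rng.gauss(means[z], stddevs[z])
        samples.append((z, x))
    return samples

# Hypothetical two-component mixture.
weights = [0.3, 0.7]
means = [-2.0, 3.0]
stddevs = [0.5, 1.0]
samples = sample_gmm(weights, means, stddevs, n=2000)
print(samples[:5])
```

In clustering, only the x values are observed; the component labels z are the latent structure to be recovered.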

  60. Recap
      • K-means is the most commonly used clustering algorithm
      • We learned the Gaussian Mixture Model’s generative story
      • We will learn the EM algorithm next week
