  1. Clustering Algorithms Dalya Baron (Tel Aviv University) XXX Winter School, November 2018

  2. Clustering. [Scatter plot: Feature 1 vs. Feature 2]

  3. Clustering. [Scatter plot: Feature 1 vs. Feature 2, with cluster #1 and cluster #2 marked]

  4. Clustering. Why should we look for clusters? [Scatter plot: Feature 1 vs. Feature 2, with cluster #1 and cluster #2 marked]

  5. Clustering

  6. K-means. Input: measured features, and the number of clusters, k. The algorithm will classify all the objects in the sample into k clusters. [Scatter plot: Feature 1 vs. Feature 2]

  7. K-means: (I) the algorithm randomly places k points that represent the cluster centroids. It then performs several iterations, in each of which: (II) each object is associated with a single cluster, according to its distance from the cluster centroids; (III) each cluster centroid is recalculated from the objects that are associated with it. [Scatter plot: Feature 1 vs. Feature 2]

  8. K-means (step I): two centroids are randomly placed.

  9. K-means (step II): the objects are associated with the closest cluster centroid (Euclidean distance).

  10. K-means (step III): new cluster centroids are computed using the average location of the cluster members.

  11. K-means (step II, next iteration): the objects are re-associated with the closest cluster centroid (Euclidean distance).

  12. K-means: the process stops when the objects that are associated with a given cluster do not change.
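
  To make the loop above concrete, here is a minimal NumPy sketch of the standard K-means iteration (Lloyd's algorithm); the function name, the convergence test, and the empty-cluster guard are illustrative choices, not part of the slides:

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    """Minimal K-means sketch. X has shape (n_objects, n_features)."""
    rng = np.random.default_rng(rng)
    # (I) pick k random objects as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # (II) associate each object with the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (III) recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # the assignments (and hence centroids) stopped changing
        centroids = new_centroids
    return labels, centroids
```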

  13. The anatomy of K-means: internal choices and/or internal cost function. (I) The initial centroids are randomly selected from the set of examples. (II) The global cost function that is minimized by K-means is $J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$, where $C_i$ holds the members of cluster $i$, $\mu_i$ is its centroid, and $\lVert x - \mu_i \rVert$ is the Euclidean distance.

  14. The anatomy of K-means (continued). [Figure: k=3, and two different random placements of the centroids]
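
  A short sketch of how the dependence on initialization shows up in practice; scikit-learn's KMeans exposes the minimized cost J as `inertia_`, and the toy dataset here is made up for the demo:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # toy stand-in dataset

# Two single-initialization runs with different seeds can converge to
# different local minima of the cost function J (sklearn's `inertia_`).
for seed in (0, 1):
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    print(seed, km.inertia_)

# The usual remedy: several random restarts, keeping the lowest-cost solution.
best = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
```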

  15. The anatomy of K-means. Input dataset: a list of objects with measured features. For which datasets should we use K-means? [Two scatter plots: Feature 1 vs. Feature 2]

  16.–17. The anatomy of K-means. Input dataset: a list of objects with measured features. What happens when we have an outlier in the dataset? [Scatter plots: Feature 1 vs. Feature 2, with an outlier marked]
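
  Because the centroids are means, a single extreme point can drag a centroid far from its cluster; a small illustrative sketch (the data are fabricated for the demo):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),    # cluster around (0, 0)
               rng.normal(5, 0.5, (50, 2))])   # cluster around (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)        # centroids near (0, 0) and (5, 5)

X_out = np.vstack([X, [[100.0, 100.0]]])       # add one extreme outlier
km_out = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_out)
print(km_out.cluster_centers_)    # one "cluster" may collapse onto the outlier
```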

  18. The anatomy of K-means. Input dataset: a list of objects with measured features. What happens when the features have different physical units? [Two panels: input dataset, K-means output]

  19. The anatomy of K-means. Input dataset: a list of objects with measured features. What happens when the features have different physical units? How can we avoid this? (See the sketch below.) [Two panels: input dataset, K-means output]
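
  The usual remedy (not spelled out on the slide, but standard practice) is to rescale each feature before clustering, e.g. to zero mean and unit variance; X here is assumed to be the reader's feature matrix:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Standardize each feature (zero mean, unit variance) so that a feature
# measured in, say, kilometers cannot dominate one measured in seconds.
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10))
labels = model.fit_predict(X)  # X: (n_objects, n_features), assumed given
```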

  20. The anatomy of K-means. Hyper-parameters: the number of clusters, k. Can we find the optimal k using the cost function? [Three panels: the resulting clusters for k=2, k=3, and k=5]

  21. The anatomy of K-means. Hyper-parameters: the number of clusters, k. Can we find the optimal k using the cost function? [Panels for k=2, k=3, k=5; plot of the minimal cost function vs. the number of clusters, with the "elbow" of the curve marked]
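
  A minimal sketch of the elbow heuristic the plot illustrates: run K-means over a range of k and look for the bend in the cost curve (the cost decreases monotonically with k, so its minimum alone cannot pick k). X is assumed given:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
costs = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
         for k in ks]  # X: (n_objects, n_features), assumed given

# The "elbow", where the marginal improvement drops sharply, is read
# off the plot of cost vs. number of clusters.
plt.plot(ks, costs, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Minimal cost function (inertia)")
plt.show()
```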

  22. Questions?

  23. Hierarchical Clustering, or: how to visualize complicated similarity measures (Correa-Gallego+ 2016)

  24. Hierarchical Clustering. Input: measured features, or a distance matrix that represents the pair-wise distances between the objects. Also, we must specify a linkage method. Initialization: each object is a cluster of size 1. [Scatter plot: Feature 1 vs. Feature 2]

  25. Hierarchical Clustering. Next: the algorithm merges the two closest clusters into a single cluster. Then, the algorithm re-calculates the distance of the newly-formed cluster to all the rest. [Scatter plot: Feature 1 vs. Feature 2]

  26.–32. Hierarchical Clustering (continued). The merge step repeats: at each iteration the two closest clusters are merged, and each merge adds a link to the dendrogram at the corresponding distance. [Animation frames: scatter plot of Feature 1 vs. Feature 2 alongside the growing dendrogram, with distance on the vertical axis]

  33. Hierarchical Clustering. The process stops when all the objects are merged into a single cluster. [Scatter plot: Feature 1 vs. Feature 2; the completed dendrogram]
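
  A minimal sketch of this agglomerative procedure with SciPy; the Ward linkage and the cut distance below are illustrative choices (the slides only say a linkage method must be specified), and X is assumed to be the reader's (n_objects, n_features) array:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Start from n singleton clusters and repeatedly merge the two closest;
# `linkage` returns the full merge history. It also accepts a condensed
# pair-wise distance matrix instead of the raw features.
Z = linkage(X, method="ward")

dendrogram(Z)              # the merge history, drawn as a dendrogram
plt.ylabel("distance")
plt.show()

# Cut the dendrogram at a chosen distance to recover flat clusters.
labels = fcluster(Z, t=5.0, criterion="distance")
```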
