Clustering and Dimensionality Reduction


  1. Clustering and Dimensionality Reduction (Stony Brook University, CSE545, Fall 2016)

  2. Goal: generalize to new data. A model is built from the original data; does the model accurately reflect new data?

  3. Supervised vs. Unsupervised
     Supervised:
     ● Predicting an outcome
     ● A loss function is used to characterize the quality of the prediction

  4. Supervised vs. Unsupervised
     Supervised:
     ● Predicting an outcome: the expected value of y (something we are trying to predict) based on X (our features, or "evidence" for what y should be)
     ● A loss function is used to characterize the quality of the prediction

  5. Supervised vs. Unsupervised
     Supervised:
     ● Predicting an outcome
     ● A loss function is used to characterize the quality of the prediction
     Unsupervised:
     ● No outcome to predict
     ● Goal: infer properties of the data without a supervised loss function
     ● Often larger data
     ● No need to worry about conditioning on another variable

  6. Concept, in matrix form: rows are N observations (o1, o2, o3, …, oN); columns are p features (f1, f2, f3, f4, …, fp).

  7. Concept, in matrix form: the N × p matrix of observations (rows) by features (columns).

  8. Dimensionality reduction (concept, in matrix form): try to best represent the data, but with only p' columns (features f1 … fp are mapped to components c1 … cp').

  9. Clustering (concept, in matrix form): group observations based on the features, i.e., like reducing the number of observations into K groups (rows o1 … oN are assigned to Cluster 1, Cluster 2, Cluster 3, …).

  10. Concept in 2-D (clustering): points plotted against Feature 1 and Feature 2; each point is an observation.

  11. Concept in 2-D (clustering): the same scatterplot of Feature 1 vs. Feature 2.

  12. Clustering, typical formalization.
      Given:
      ● a set of points
      ● a distance metric (Euclidean, cosine, etc.; see the sketch below)
      ● the number of clusters (not always provided)
      Do: group observations together that are similar. Ideally,
      ● members of the same cluster are the "same"
      ● members of different clusters are "different"
      Keep in mind: there are usually many more than 2 dimensions.
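As a quick illustration of the two distance metrics named above (not from the original slides), here is a small Python sketch using SciPy; the sample points are arbitrary.

    # Hypothetical example points; only the two metrics come from the slide.
    import numpy as np
    from scipy.spatial.distance import euclidean, cosine

    x = np.array([1.0, 2.0, 0.0])
    y = np.array([2.0, 0.0, 1.0])

    print(euclidean(x, y))   # square root of the sum of squared differences
    print(cosine(x, y))      # 1 minus the cosine of the angle between x and y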

  13. Clustering Often many dimensions and no clean separation.

  14. Clustering supposes observations have a "true" cluster. Often many dimensions and no clean separation.

  15. K-Means Clustering. Clustering: group similar observations, often over unlabeled data. K-means: a "prototype" method (i.e., not based on an algebraic model). Euclidean distance: d(x, y) = sqrt( Σ_j (x_j − y_j)² ).

  16. K-Means Clustering. Clustering: group similar observations, often over unlabeled data. K-means: a "prototype" method (i.e., not based on an algebraic model), using Euclidean distance.
      centers = a random selection of k cluster centers
      until centers converge:
        1. For all x_i, find the closest center (according to d)
        2. Recalculate each center as the mean of the points assigned to it

  17. K-Means Clustering (continued).
      centers = a random selection of k cluster centers
      until centers converge:
        1. For all x_i, find the closest center (according to d)
        2. Recalculate each center as the mean of the points assigned to it
      Example: http://shabal.in/visuals/kmeans/6.html
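The loop on slides 16 and 17 can be written in a few lines of NumPy. This is a minimal sketch, not the course's reference implementation; the function name and defaults (n_iters, seed) are illustrative, and it assumes no cluster ever becomes empty.

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Minimal k-means sketch: random initial centers, then alternate
        (1) assign each point to its closest center (Euclidean distance),
        (2) recompute each center as the mean of its assigned points."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # step 1: distance from every point to every center, then nearest center
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # step 2: each new center is the mean of the points assigned to it
            # (assumes every cluster keeps at least one point)
            new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centers, centers):   # centers converged
                break
            centers = new_centers
        return centers, labels

In practice, scikit-learn's KMeans (referenced on the next slide) adds smarter initialization (k-means++) and multiple restarts on top of this same loop.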

  18. K-Means Clustering Understanding K-Means (source: Scikit-Learn)

  19. The Curse of Dimensionality. Problems with high-dimensional spaces: 1. All points (i.e., observations) are nearly equally far apart. 2. The angle between vectors is almost always 90 degrees (i.e., they are orthogonal).
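Both claims are easy to see empirically. Below is a small, hypothetical simulation (not from the slides) on standard-normal data: as the number of dimensions p grows, the ratio of the nearest to the farthest pairwise distance approaches 1, and angles between random vectors concentrate around 90 degrees.

    import numpy as np

    rng = np.random.default_rng(0)
    for p in (2, 10, 1000):
        X = rng.standard_normal((500, p))
        # claim 1: distances from one reference point to all others become nearly equal
        d = np.linalg.norm(X[1:] - X[0], axis=1)
        # claim 2: angles between random vectors concentrate near 90 degrees
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        angles = np.degrees(np.arccos(np.clip(Xn[1:] @ Xn[0], -1.0, 1.0)))
        print(p, "min/max distance:", round(d.min() / d.max(), 3),
              "angle std (degrees):", round(angles.std(), 1))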

  20. Hierarchical Clustering: observations o1 … oN (with features f1 … fp) grouped into Cluster 1 through Cluster 4.

  21. Hierarchical Clustering: the same observations, with higher-level groups (Cluster 5, Cluster 6) combining the lower-level clusters.

  22. Hierarchical Clustering
      ● Agglomerative (bottom up):
        ○ Initially, each point is a cluster
        ○ Repeatedly combine the two "nearest" clusters into one
      ● Divisive (top down):
        ○ Start with one cluster and recursively split it

  23. Hierarchical Clustering
      ● Agglomerative (bottom up):
        ○ Initially, each point is a cluster
        ○ Repeatedly combine the two "nearest" clusters into one
      ● Divisive (top down):
        ○ Start with one cluster and recursively split it
      ● Regular K-Means is "point assignment clustering":
        ○ Maintain a set of clusters
        ○ Points belong to the "nearest" cluster

  24. Hierarchical Clustering
      ● Agglomerative (bottom up):
        ○ Initially, each point is a cluster
        ○ Repeatedly combine the two "nearest" clusters into one

  25. Hierarchical Clustering
      ● Agglomerative (bottom up):
        ○ Initially, each point is a cluster
        ○ Repeatedly combine the two "nearest" clusters into one
        ○ Stop when reaching a threshold in:
          ■ distance between points in a cluster, or
          ■ maximum distance of points from the "center", or
          ■ maximum number of points

  26. Hierarchical Clustering (in Euclidean space)
      ● Agglomerative (bottom up):
        ○ Initially, each point is a cluster
        ○ Repeatedly combine the two "nearest" clusters into one
        ○ Stop when reaching a threshold in:
          ■ distance between points in a cluster, or
          ■ maximum distance from the "center", or
          ■ maximum number of points

  27. Hierarchical Clustering
      ● Agglomerative (bottom up):
        ○ Initially, each point is a cluster
        ○ Repeatedly combine the two "nearest" clusters into one
      But what if we have no "centroid" (such as when using cosine distance)? A sketch follows below.
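One answer to the question above: agglomerative methods only need pairwise distances between points (or clusters), not a centroid. Here is a minimal sketch using SciPy's linkage with average linkage over cosine distance; the toy data and the choice of 3 clusters are assumptions made for illustration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.default_rng(0).random((20, 5))     # toy data: 20 observations, 5 features

    # average linkage repeatedly merges the two "nearest" clusters, using only
    # pairwise cosine distances; no centroid is ever computed
    Z = linkage(X, method="average", metric="cosine")

    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
    print(labels)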

  28.-32. Clustering: Applications (example figures; the visualizations on slides 31-32 are from musicmachinery.com).

  33. Concept: dimensionality reduction in 3-D, 2-D, and 1-D. Data (or, at least, what we want from the data) may be accurately represented with fewer dimensions.

  34. Dimensionality reduction (concept, in matrix form): try to best represent the data, but with only p' columns (features f1 … fp are mapped to components c1 … cp').

  35. Dimensionality Reduction. Rank: the number of linearly independent columns of A (i.e., columns that can't be derived as a combination of the other columns).
      Q: What is the rank of this matrix?
        1  -2   3
        2  -3   5
        1   1   0

  36. Dimensionality Reduction. Rank: the number of linearly independent columns of A (i.e., columns that can't be derived from the other columns).
      Q: What is the rank of this matrix?
        1  -2   3
        2  -3   5
        1   1   0
      A: 2. The 1st column is just the sum of the other two columns, so we can represent the matrix as linear combinations of 2 vectors:
        1  -2
        2  -3
        1   1

  37. Dimensionality Reduction (rank example repeated from slide 36).
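The rank claim on slides 35-37 can be checked directly with NumPy (a quick sanity check, not part of the original slides):

    import numpy as np

    A = np.array([[1, -2, 3],
                  [2, -3, 5],
                  [1,  1, 0]])

    print(np.linalg.matrix_rank(A))                 # 2
    print(np.allclose(A[:, 0], A[:, 1] + A[:, 2]))  # first column = sum of the other two -> True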

  38. Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition:
      X [n×p] = U [n×r] D [r×r] (V [p×r])^T
      X: original matrix; U: "left singular vectors"; D: "singular values" (diagonal); V: "right singular vectors".

  39. Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition:
      X [n×p] = U [n×r] D [r×r] (V [p×r])^T
      X: original matrix; U: "left singular vectors"; D: "singular values" (diagonal); V: "right singular vectors".
      (Pictured: the n × p matrix X approximated by the product of the three smaller factors.)
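As a sketch of how the decomposition above is computed in practice (the random data and r = 3 are illustrative assumptions): NumPy's SVD returns U, the singular values, and V^T, and a rank-r approximation keeps only the r largest singular values. Note the slides identify PCA with the SVD of X directly; a standard PCA would center the columns of X first, which is omitted here to match the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((100, 8))             # n = 100 observations, p = 8 features

    # thin SVD: X = U @ diag(d) @ Vt
    U, d, Vt = np.linalg.svd(X, full_matrices=False)

    # rank-r approximation: keep the r largest singular values
    r = 3
    X_approx = U[:, :r] @ np.diag(d[:r]) @ Vt[:r, :]
    print(np.linalg.norm(X - X_approx))  # approximation error shrinks as r grows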

  40. Dimensionality Reduction - PCA - Example. X [n×p] = U [n×r] D [r×r] (V [p×r])^T, applied to a users-to-movies matrix.

  41. Dimensionality Reduction - PCA - Example. X [n×p] = U [n×r] D [r×r] (V [p×r])^T.

  42. Dimensionality Reduction - PCA - Example. X [m×n] = U [m×r] D [r×r] (V [n×r])^T.

  43. Dimensionality Reduction - PCA - Example. X [m×n] = U [m×r] D [r×r] (V [n×r])^T; the matrix V is shown.

  44. Dimensionality Reduction - PCA - Example. X [m×n] = U [m×r] D [r×r] (V [n×r])^T; the matrix (UD)^T is shown.

  45. Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition:
      X [n×p] = U [n×r] D [r×r] (V [p×r])^T
      X: original matrix; U: "left singular vectors"; D: "singular values" (diagonal); V: "right singular vectors".
      Projection (dimensionality-reduced space) in 3 dimensions: U [n×3] D [3×3] (V [p×3])^T
      To reduce features in a new dataset: X_new V = X_new_small
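Continuing the sketch above, the two operations on this slide look like the following in NumPy (again on hypothetical data; X_new stands for a new dataset with the same p features):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.random((100, 8))
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt.T

    # projection in 3 dimensions, as written on the slide: U D V^T with r = 3
    X_proj = U[:, :3] @ np.diag(d[:3]) @ Vt[:3, :]

    # reducing features in a new dataset: X_new V = X_new_small
    X_new = rng.random((10, 8))
    X_new_small = X_new @ V[:, :3]
    print(X_proj.shape, X_new_small.shape)   # (100, 8) and (10, 3)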

  46. Dimensionality Reduction - PCA. Linear approximations of the data in r dimensions, found via Singular Value Decomposition:
      X [n×p] = U [n×r] D [r×r] (V [p×r])^T
      U, D, and V are unique; D (the diagonal matrix of singular values) is always positive.

  47. Dimensionality Reduction v. Clustering
      ● Clustering: group n observations into k clusters.
      ● Soft clustering: assign observations to k clusters with some weight or probability.
      ● Dimensionality reduction: assign m features to p components with some weight or probability.
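The slide does not name a particular soft-clustering method; as one common example (an assumption, not from the slides), a Gaussian mixture model assigns each observation a probability of membership in each of the k clusters:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.random.default_rng(0).random((200, 4))    # toy data

    gm = GaussianMixture(n_components=3, random_state=0).fit(X)
    probs = gm.predict_proba(X)      # one row per observation: membership weight in each of the 3 clusters
    print(probs[0], probs[0].sum())  # the weights for an observation sum to 1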
