Introduction to Machine Learning
Unsupervised learning: latent space analysis and clustering
Yifeng Tao, School of Computer Science, Carnegie Mellon University
Slides adapted from Tom Mitchell, David Sontag, Ziv Bar-Joseph
Outline
o Dimension reduction / latent space analysis
  o PCA
  o ICA
  o t-SNE
o Clustering
  o K-means
  o GMM
  o Hierarchical / agglomerative clustering
Unsupervised mapping to lower dimension
o Instead of choosing a subset of the features, create new features (dimensions) defined as functions over all features
o Don't consider class labels, just the data points
Principal Component Analysis
o Given data points in d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible
  o E.g., find the best planar approximation to 3-D data
  o E.g., find the best planar approximation to 10^4-dimensional data
o In particular, choose the projection that minimizes the squared error in reconstructing the original data
[Slide from Tom Mitchell]
PCA: Find Projections to Minimize Reconstruction Error
o Assume the data are a set of N d-dimensional vectors, where the n-th vector is x^(n) = <x_1^(n), ..., x_d^(n)>
o We can represent these exactly in terms of any d orthogonal basis vectors u_1, ..., u_d: x^(n) = Σ_{i=1}^{d} z_i^(n) u_i
[Slide from Tom Mitchell]
PCA
o If each vector is reconstructed from only the first M of its d components, note that we get zero error when M = d, so all of the reconstruction error is due to the discarded components.
[Slide from Tom Mitchell]
PCA
o A more rigorous derivation can be found in Bishop's Pattern Recognition and Machine Learning.
[Slide from Tom Mitchell]
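For reference, here is a compact statement of the result the derivation arrives at (following the standard treatment in Bishop's book; the notation u_i, z_i, λ_i is introduced here rather than taken from the slides, and the data are assumed to be centered):

```latex
% Keep only the first M of the d components:
\hat{\mathbf{x}}^{(n)} = \sum_{i=1}^{M} z_i^{(n)} \mathbf{u}_i,
\qquad z_i^{(n)} = \mathbf{u}_i^{\top} \mathbf{x}^{(n)}
% The mean squared reconstruction error
E_M = \frac{1}{N} \sum_{n=1}^{N} \big\lVert \mathbf{x}^{(n)} - \hat{\mathbf{x}}^{(n)} \big\rVert^2
% is minimized by taking u_1, ..., u_M to be the top-M eigenvectors of the sample
% covariance matrix, and the minimum equals the sum of the discarded eigenvalues:
E_M^{\min} = \sum_{i=M+1}^{d} \lambda_i
```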
PCA Example [figures from Tom Mitchell's slides, not reproduced here]
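To make the recipe concrete, here is a minimal NumPy sketch of PCA by eigendecomposition of the covariance matrix (an illustration only; the synthetic data and variable names are made up, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 3-D data that is "almost planar" (tiny variance along the third axis)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

X_mean = X.mean(axis=0)
Xc = X - X_mean                                    # PCA works on centered data
cov = np.cov(Xc, rowvar=False, bias=True)          # d x d sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2                                              # best planar approximation
U = eigvecs[:, :M]                                 # top-M principal directions
Z = Xc @ U                                         # M-dimensional codes
X_hat = Z @ U.T + X_mean                           # reconstruction from M components

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, eigvals[M:].sum())                      # reconstruction error equals the sum of discarded eigenvalues
```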
Independent Component Analysis
o PCA seeks directions <Y_1 ... Y_M> in feature space X that minimize reconstruction error
o ICA seeks directions <Y_1 ... Y_M> that are most statistically independent, i.e., that minimize the mutual information I(Y) between the Y_j:
  I(Y) = Σ_j H(Y_j) − H(Y), where H(Y) is the entropy of Y
o Widely used in signal processing
[Slide from Tom Mitchell]
ICA example
o Both PCA and ICA try to find a set of vectors, a basis, for the data, so that any point (vector) in the data can be written as a linear combination of the basis vectors.
o In PCA the basis you want to find is the one that best explains the variability of your data.
o In ICA the basis you want to find is the one in which each vector is an independent component of your data.
[Slide from https://www.quora.com/What-is-the-difference-between-PCA-and-ICA]
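A rough sketch of that contrast on a classic blind source separation setup (the signals and mixing matrix below are invented for illustration; scikit-learn's FastICA and PCA serve as stand-ins for the general algorithms):

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                                  # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))                         # source 2: square wave
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(2000, 2))

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                          # mixing matrix
X = S @ A.T                                         # two observed mixtures of the two sources

S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)  # recovers the sources (up to order/scale)
Y_pca = PCA(n_components=2).fit_transform(X)                      # orthogonal max-variance directions instead
```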
t-Distributed Stochastic Neighbor Embedding (t-SNE)
o Nonlinear dimensionality reduction technique
o Manifold learning
[Figure from https://scikit-learn.org/stable/auto_examples/manifold/plot_t_sne_perplexity.html#sphx-glr-auto-examples-manifold-plot-t-sne-perplexity-py]
t-SNE
o Two stages:
  o First, t-SNE constructs a probability distribution over pairs of high-dimensional objects such that similar objects have a high probability of being picked, while dissimilar points have an extremely small probability of being picked.
  o Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map.
o Minimized using gradient descent
[Slide from https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding]
t-SNE example
o Visualizing MNIST
[Figure from https://lvdmaaten.github.io/tsne/]
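A hedged sketch of a typical t-SNE visualization pipeline with scikit-learn (the digits dataset is used as a small stand-in for MNIST, and the perplexity value is just a common choice, not one prescribed by the slides):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                 # 1797 images, 64 features each
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)         # KL divergence minimized by gradient descent

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE embedding of the scikit-learn digits data")
plt.show()
```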
Clustering
o Unsupervised learning
o Requires data, but no labels
o Detects patterns, e.g., in
  o Groups of emails or search results
  o Customer shopping patterns
  o Regions of images
o Useful when you don't know what you're looking for
[Slide from David Sontag]
Clustering
o Basic idea: group together similar instances
o Example: 2-D point patterns
[Slide from David Sontag]
o The clustering result can be quite different depending on the grouping rule used.
[Slide from David Sontag]
Distance measure
o What could "similar" mean?
  o One option: small (squared) Euclidean distance
o Clustering results are crucially dependent on the measure of similarity (or distance) between the "points" to be clustered
o What properties should a distance measure have?
  o Symmetry: D(A, B) = D(B, A)
    o Otherwise, we could say A looks like B but B does not look like A
  o Positivity and self-similarity: D(A, B) >= 0, and D(A, B) = 0 iff A = B
    o Otherwise there will be different objects that we cannot tell apart
  o Triangle inequality: D(A, B) + D(B, C) >= D(A, C)
    o Otherwise one could say "A is like B, B is like C, but A is not like C at all"
[Slide from David Sontag]
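A tiny sanity check of these properties for the Euclidean distance (illustrative values only; as a side note, the squared Euclidean distance satisfies symmetry and positivity but not, in general, the triangle inequality, while the unsquared distance satisfies all three):

```python
import numpy as np

def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

A, B, C = [0.0, 0.0], [1.0, 1.0], [2.0, 0.0]                     # arbitrary toy points
assert sq_euclidean(A, B) == sq_euclidean(B, A)                  # symmetry
assert sq_euclidean(A, A) == 0.0 and sq_euclidean(A, B) > 0.0    # self-similarity, positivity

dist = lambda p, q: np.sqrt(sq_euclidean(p, q))
assert dist(A, B) + dist(B, C) >= dist(A, C)                     # triangle inequality (unsquared distance)
```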
Clustering algorithms
o Partition algorithms
  o K-means
  o Mixture of Gaussians
  o Spectral clustering (on graphs; not discussed in this lecture)
o Hierarchical algorithms
  o Bottom up: agglomerative
  o Top down: divisive (not discussed in this lecture)
[Slide from David Sontag]
Clustering examples
o Image segmentation
o Goal: break up the image into meaningful or perceptually similar regions
[Slide from David Sontag]
Clustering examples
o Clustering gene expression data
K-Means
o An iterative clustering algorithm
  o Initialize: pick K random points as cluster centers
  o Alternate:
    o Assign data points to the closest cluster center
    o Change each cluster center to the average of its assigned points
  o Stop when no point's assignment changes
o (A minimal code sketch follows the worked example below.)
[Slide from David Sontag]
K-means clustering: Example
o Pick K random points as cluster centers (means); shown here for K = 2
o Iterative Step 1: assign data points to the closest cluster center
o Iterative Step 2: change each cluster center to the average of its assigned points
o Repeat until convergence
[Slides from David Sontag]
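The code sketch referenced above: a minimal NumPy implementation of the steps just illustrated (for exposition only; not optimized, and the toy data are made up):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means: random init, assign to closest center, re-average, repeat."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()  # K random points as initial means
    assign = np.full(len(X), -1)
    for _ in range(n_iters):
        # Step 1: assign each point to its closest center (O(KN) distance evaluations)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break                                   # no assignment changed -> stop
        assign = new_assign
        # Step 2: move each center to the average of its assigned points (O(N))
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign

# Toy usage on two well-separated blobs (made-up data):
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=(0.0, 0.0), size=(100, 2)),
               rng.normal(loc=(6.0, 6.0), size=(100, 2))])
centers, labels = kmeans(X, K=2)
```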
Properties of K-means algorithm
o Guaranteed to converge in a finite number of iterations
o Running time per iteration:
  o Assigning data points to the closest cluster center: O(KN) time
  o Changing each cluster center to the average of its assigned points: O(N) time
[Slide from David Sontag]
K-means convergence [Slide from David Sontag]
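One standard way to see why it converges (this argument is not spelled out on the slide itself): both steps can only decrease the within-cluster sum of squares,

```latex
J(c_1, \dots, c_N, \boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_K)
  = \sum_{n=1}^{N} \big\lVert \mathbf{x}^{(n)} - \boldsymbol{\mu}_{c_n} \big\rVert^2
% The assignment step minimizes J over the labels c_n with the means fixed;
% the update step minimizes J over the means \mu_k with the labels fixed.
% J is bounded below by zero and there are finitely many possible assignments,
% so the alternation must stop changing after finitely many iterations.
```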
Example: K-Means for Segmentation [Slides from David Sontag; figures not reproduced]
Initialization
o The K-means algorithm is a heuristic
  o It requires initial means
  o It does matter what you pick!
o What can go wrong? A bad initialization can leave the algorithm stuck in a poor local optimum.
o Various schemes exist for preventing this kind of thing: variance-based split / merge, initialization heuristics
  o E.g., multiple initializations, k-means++
[Slide from David Sontag]
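For instance, a sketch using scikit-learn's built-in k-means++ seeding together with multiple restarts (the dataset and parameter values below are synthetic and purely illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)   # synthetic data with 4 true clusters
km = KMeans(n_clusters=4, init="k-means++", n_init=10,        # smart seeding + 10 random restarts
            random_state=0).fit(X)
print(km.inertia_)                                            # within-cluster sum of squares of the best run
```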
K-Means Getting Stuck
o A local optimum (figure not reproduced)
[Slide from David Sontag]
K-means unable to cluster properly
o Spectral clustering will help in this case.
[Slide from David Sontag]
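A sketch of one classic instance of this failure mode, two concentric rings (the figure on the original slide is not reproduced here, so the ring-shaped data below are only a stand-in; spectral clustering itself is not covered in this lecture):

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)  # two concentric rings
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # cuts straight across the rings
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               random_state=0).fit_predict(X)               # recovers the two rings
```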
Changing the features (distance function) can help
[Slide from David Sontag]
Reconsidering "hard assignments"?
o Clusters may overlap
o Some clusters may be "wider" than others
o Distances can be deceiving
[Slide from Ziv Bar-Joseph]
Gaussian Mixture Models
o Model the data as a mixture of K Gaussians, so each point gets a soft (probabilistic) cluster assignment; the parameters are typically fit with the EM algorithm.
[Slide from Ziv Bar-Joseph]
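A minimal sketch of soft assignments with a Gaussian mixture fit by EM, using scikit-learn (the data, cluster widths, and overlap below are invented to echo the points on the previous slide):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(200, 2)),    # tight cluster
               rng.normal(loc=2.0, scale=1.5, size=(200, 2))])   # wider, overlapping cluster

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
resp = gmm.predict_proba(X)      # soft responsibilities: one probability per cluster per point
hard = gmm.predict(X)            # hard labels are still available if needed
print(resp[:3].round(2))
```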