Clustering and Dimensionality Reduction
Preview
• Clustering
  – K-means clustering
  – Mixture models
  – Hierarchical clustering
• Dimensionality reduction
  – Principal component analysis
  – Multidimensional scaling
  – Isomap
Unsupervised Learning
• Problem: Too much data!
• Solution: Reduce it
• Clustering: Reduce number of examples
• Dimensionality reduction: Reduce number of dimensions
Clustering
• Given a set of examples
• Divide them into subsets of “similar” examples
• How to measure similarity?
• How to evaluate quality of results?
K-Means Clustering
• Pick random examples as initial means
• Repeat until convergence:
  – Assign each example to its nearest mean
  – New mean = average of the examples assigned to it
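A minimal NumPy sketch of this loop (function and variable names are illustrative): `X` is an (n, d) array of examples and `k` is the number of clusters.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """K-means: X is an (n, d) array of examples, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Pick k random examples as the initial means
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each example to its nearest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New mean = average of the examples assigned to it
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(k)])
        if np.allclose(new_means, means):
            break  # converged
        means = new_means
    return means, labels
```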
K-Means Works If...
• Clusters are spherical
• Clusters are well separated
• Clusters are of similar volumes
• Clusters have similar numbers of points
Mixture Models

P(x) = \sum_{i=1}^{n_c} P(c_i) P(x | c_i)

Objective function: log likelihood of the data
Naive Bayes: P(x | c_i) = \prod_{j=1}^{n_d} P(x_j | c_i)
AutoClass: Naive Bayes with various x_j models
Mixture of Gaussians: P(x | c_i) = multivariate Gaussian
In general: P(x | c_i) can be any distribution
Mixtures of Gaussians

[Figure: density p(x) of a mixture of Gaussians plotted against x]

P(x | \mu_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu_i}{\sigma} \right)^2 \right)
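A small sketch combining the two slides above: evaluating P(x) = \sum_i P(c_i) P(x | c_i) with 1-D Gaussian components. All names are illustrative; any other component density could be substituted.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """P(x | mu_i) for a 1-D Gaussian component."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

def mixture_density(x, priors, mus, sigmas):
    """P(x) = sum_i P(c_i) P(x | c_i) for a mixture of 1-D Gaussians."""
    return sum(p * gaussian_pdf(x, mu, s) for p, mu, s in zip(priors, mus, sigmas))

# Example: two-component mixture evaluated at x = 0.5
print(mixture_density(0.5, priors=[0.3, 0.7], mus=[0.0, 2.0], sigmas=[1.0, 0.5]))
```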
The EM Algorithm
Initialize parameters ignoring missing information
Repeat until convergence:
  E step: Compute expected values of unobserved variables, assuming current parameter values
  M step: Compute new parameter values to maximize probability of data (observed & estimated)
(Also: Initialize expected values ignoring missing info)
EM for Mixtures of Gaussians
Initialization: Choose means at random, etc.

E step: For all examples x_k:
  P(\mu_i | x_k) = \frac{P(\mu_i) P(x_k | \mu_i)}{P(x_k)} = \frac{P(\mu_i) P(x_k | \mu_i)}{\sum_{i'} P(\mu_{i'}) P(x_k | \mu_{i'})}

M step: For all components c_i:
  P(c_i) = \frac{1}{n_e} \sum_{k=1}^{n_e} P(\mu_i | x_k)
  \mu_i = \frac{\sum_{k=1}^{n_e} x_k P(\mu_i | x_k)}{\sum_{k=1}^{n_e} P(\mu_i | x_k)}
  \sigma_i^2 = \frac{\sum_{k=1}^{n_e} (x_k - \mu_i)^2 P(\mu_i | x_k)}{\sum_{k=1}^{n_e} P(\mu_i | x_k)}
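A minimal sketch of these updates for 1-D data (names are illustrative): `X` is a length-n_e array of examples and `k` the number of components; the E step computes the responsibilities P(\mu_i | x_k) and the M step applies the three re-estimation formulas above.

```python
import numpy as np

def em_gaussian_mixture(X, k, n_iters=100, seed=0):
    """EM for a 1-D mixture of Gaussians; X is a length-n_e array."""
    rng = np.random.default_rng(seed)
    mus = rng.choice(X, size=k, replace=False)   # choose means at random
    sigmas = np.full(k, X.std())
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E step: responsibilities P(mu_i | x_k) for every example
        lik = np.array([priors[i] / np.sqrt(2 * np.pi * sigmas[i] ** 2) *
                        np.exp(-0.5 * ((X - mus[i]) / sigmas[i]) ** 2)
                        for i in range(k)])      # shape (k, n_e)
        resp = lik / lik.sum(axis=0, keepdims=True)
        # M step: re-estimate priors, means, and variances
        weights = resp.sum(axis=1)               # sum_k P(mu_i | x_k)
        priors = weights / len(X)
        mus = (resp @ X) / weights
        sigmas = np.sqrt((resp * (X - mus[:, None]) ** 2).sum(axis=1) / weights)
    return priors, mus, sigmas
```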
Mixtures of Gaussians (cont.)
• K-means clustering is a special case of EM for mixtures of Gaussians
• Mixtures of Gaussians are a special case of Bayesian networks
• Also good for estimating joint distributions of continuous variables
Hierarchical Clustering
• Agglomerative clustering
  – Start with one cluster per example
  – Merge the two nearest clusters (criteria: min, max, avg, mean distance)
  – Repeat until all examples are in one cluster
  – Output: dendrogram
• Divisive clustering
  – Start with all examples in one cluster
  – Split it into two (e.g., by min-cut)
  – Etc.
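A short sketch of agglomerative clustering using SciPy; the `'single'`, `'complete'`, and `'average'` linkage methods correspond to the min / max / avg merge criteria above, and `dendrogram` draws the resulting merge tree. The random data is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# X: an (n, d) array of examples (illustrative random data)
X = np.random.default_rng(0).normal(size=(20, 2))

# Agglomerative clustering with single-link (minimum) distance as the merge criterion
Z = linkage(X, method='single', metric='euclidean')

# Z encodes the full sequence of merges, i.e. the dendrogram
dendrogram(Z)
plt.show()
```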
Dimensionality Reduction
• Given data points in d dimensions
• Convert them to data points in r < d dimensions
• With minimal loss of information
Principal Component Analysis
Goal: Find the r-dimensional projection that best preserves variance
1. Compute the mean vector µ and covariance matrix Σ of the original points
2. Compute the eigenvectors and eigenvalues of Σ
3. Select the top r eigenvectors
4. Project the points onto the subspace they span: y = A(x − µ), where y is the new point, x is the old one, and the rows of A are the eigenvectors
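A minimal NumPy sketch of the four steps (names are illustrative): `X` is an (n, d) array of points and `r` the target dimension.

```python
import numpy as np

def pca(X, r):
    """Project the rows of X onto the top-r principal components: y = A (x - mu)."""
    mu = X.mean(axis=0)                    # 1. mean vector
    Sigma = np.cov(X, rowvar=False)        #    and covariance matrix
    evals, evecs = np.linalg.eigh(Sigma)   # 2. eigenvalues/eigenvectors (ascending order)
    order = np.argsort(evals)[::-1][:r]    # 3. indices of the top-r eigenvectors
    A = evecs[:, order].T                  #    rows of A are the chosen eigenvectors
    return (X - mu) @ A.T                  # 4. project: y = A (x - mu)
```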
Multidimensional Scaling
Goal: Find projection that best preserves inter-point distances
  x_i    Point in d dimensions
  y_i    Corresponding point in r < d dimensions
  δ_ij   Distance between x_i and x_j
  d_ij   Distance between y_i and y_j
• Define (e.g.) E(y) = \sum_{i,j} \left( \frac{d_{ij} - \delta_{ij}}{\delta_{ij}} \right)^2
• Find the y_i's that minimize E by gradient descent
• Invariant to translations, rotations, and scalings
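A small sketch of minimizing this E(y) by gradient descent; the gradient is worked out by hand from the stress function above, and the function name, step size, and iteration count are assumptions, not part of the slides.

```python
import numpy as np

def mds(delta, r=2, n_iters=500, lr=0.01, seed=0):
    """delta: (n, n) matrix of original distances delta_ij; returns (n, r) points y_i."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    Y = rng.normal(size=(n, r))                       # random initial configuration
    for _ in range(n_iters):
        diff = Y[:, None, :] - Y[None, :, :]          # y_i - y_j
        d = np.linalg.norm(diff, axis=2)              # d_ij
        np.fill_diagonal(d, 1.0)                      # avoid division by zero on the diagonal
        # Gradient of E(y) = sum_ij ((d_ij - delta_ij) / delta_ij)^2 with respect to y_i
        coeff = 2.0 * (d - delta) / (np.where(delta == 0, 1.0, delta) ** 2 * d)
        np.fill_diagonal(coeff, 0.0)
        grad = (coeff[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                                # gradient-descent step
    return Y
```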
Isomap
Goal: Find projection onto a nonlinear manifold
1. Construct neighborhood graph G: for all x_i, x_j, if distance(x_i, x_j) < ε, add edge (x_i, x_j) to G
2. Compute shortest distances along the graph, δ_G(x_i, x_j) (e.g., by Floyd's algorithm)
3. Apply multidimensional scaling to the δ_G(x_i, x_j)
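A minimal sketch of the three steps, reusing the `mds` function sketched after the previous slide; `scipy.sparse.csgraph.shortest_path` with `method='FW'` performs the all-pairs shortest-path (Floyd–Warshall) computation. The function name and ε handling are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, eps, r=2):
    """X: (n, d) array of points; eps: neighborhood radius; returns (n, r) embedded points."""
    # 1. Neighborhood graph: edge (x_i, x_j) iff distance(x_i, x_j) < eps
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    G = np.where(D < eps, D, 0.0)             # 0 entries are treated as "no edge"
    # 2. Shortest distances along the graph (infinite if the graph is disconnected)
    delta_G = shortest_path(G, method='FW', directed=False)
    # 3. Multidimensional scaling on the graph distances
    return mds(delta_G, r=r)                  # mds() from the previous sketch
```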
Summary
• Clustering
  – K-means clustering
  – Mixture models
  – Hierarchical clustering
• Dimensionality reduction
  – Principal component analysis
  – Multidimensional scaling
  – Isomap