Clustering and Dimensionality Reduction Stony Brook University CSE545, Fall 2016
Goal: Generalize to new data
[Diagram: Original Data → Model → New Data?] Does the model accurately reflect new data?
Supervised vs. Unsupervised
Supervised
● Predicting an outcome: the expected value of y (something we are trying to predict) given X (our features or "evidence" for what y should be)
● Loss function used to characterize quality of prediction
Supervised vs. Unsupervised
Supervised
● Predicting an outcome
● Loss function used to characterize quality of prediction
Unsupervised
● No outcome to predict
● Goal: Infer properties of the data without a supervised loss function.
● Often larger data.
● Don't need to worry about conditioning on another variable.
Concept, In Matrix Form:
[An N × p data matrix: rows o1, o2, o3, …, oN are the N observations; columns f1, f2, f3, f4, …, fp are the p features]
Concept, In Matrix Form:
Dimensionality reduction: try to best represent X, but with only p' columns.
[The N × p matrix with features f1, f2, …, fp is mapped to an N × p' matrix with components c1, c2, …, cp']
Concept, In Matrix Form:
Clustering: group observations based on the features (i.e., like reducing the N observations into K groups).
[Rows o1 … oN of the N × p matrix are grouped into Cluster 1, Cluster 2, Cluster 3]
Concept: in 2-D (clustering)
[Scatter plot over Feature 1 (x-axis) and Feature 2 (y-axis); each point is an observation]
Clustering
Typical formalization. Given:
● a set of points
● a distance metric (Euclidean, cosine, etc.; a sketch of both follows below)
● the number of clusters (not always provided)
Do: Group observations together that are similar. Ideally,
● members of the same cluster are the "same";
● members of different clusters are "different".
Keep in mind: usually many more than 2 dimensions.
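As a concrete reference, here is a minimal sketch of the two distance metrics named above in plain NumPy; the function names and sample vectors are illustrative, not from the slides.

```python
import numpy as np

def euclidean_distance(x, c):
    # square root of the sum of squared coordinate-wise differences
    return np.sqrt(np.sum((x - c) ** 2))

def cosine_distance(x, c):
    # 1 - cosine similarity; small when the vectors point in the same direction
    return 1.0 - np.dot(x, c) / (np.linalg.norm(x) * np.linalg.norm(c))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 4.0])
print(euclidean_distance(a, b))  # ~1.414
print(cosine_distance(a, b))     # ~0.018
```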
Clustering Often many dimensions and no clean separation.
Clustering
Supposes observations have a "true" cluster. Often many dimensions and no clean separation.
K-Means Clustering
Clustering: Group similar observations, often over unlabeled data.
K-means: A "prototype" method (i.e. not based on an algebraic model).
Euclidean Distance: d(x, c) = √( Σⱼ (xⱼ − cⱼ)² )
centers = a random selection of k cluster centers
until centers converge:
  1. For all xᵢ, find the closest center (according to d)
  2. Recalculate each center as the mean of the points assigned to it
(A code sketch of this loop appears below.)
Example: http://shabal.in/visuals/kmeans/6.html
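A minimal NumPy sketch of the loop above, assuming Euclidean distance; variable and function names are illustrative, and in practice a library implementation such as scikit-learn's KMeans would normally be used instead.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # centers = a random selection of k points from X
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 1. For all x_i, find the closest center (Euclidean distance d)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. Recalculate each center as the mean of the points assigned to it
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # until centers converge
            break
        centers = new_centers
    return centers, labels

# Usage on random 2-D data:
points = np.random.default_rng(1).standard_normal((200, 2))
centers, labels = kmeans(points, k=3)
```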
K-Means Clustering
[Figure: "Understanding K-Means" demonstration plots (source: scikit-learn)]
The Curse of Dimensionality
Problems with high-dimensional spaces:
1. All points (i.e. observations) are nearly equally far apart.
2. The angle between vectors is almost always close to 90 degrees (i.e. they are nearly orthogonal).
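Both effects can be checked with a small simulation (illustrative, not from the slides): as the number of dimensions p grows, pairwise distances between random points concentrate around the same value, and pairwise angles approach 90 degrees.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for p in (2, 100, 10000):
    X = rng.standard_normal((200, p))
    d = pdist(X)                    # pairwise Euclidean distances
    cos = 1.0 - pdist(X, "cosine")  # pairwise cosine similarities
    # spread of distances shrinks, and |cos| -> 0 (angles -> 90 degrees), as p grows
    print(f"p={p:6d}  distance spread (std/mean): {d.std() / d.mean():.3f}"
          f"  mean |cos(angle)|: {np.abs(cos).mean():.3f}")
```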
Hierarchical Clustering
[Rows o1 … oN of the N × p matrix are grouped into Cluster 1 through Cluster 4, and those clusters are themselves merged into larger clusters (Cluster 5, Cluster 6)]
Hierarchical Clustering
● Agglomerative (bottom up):
  ○ Initially, each point is a cluster
  ○ Repeatedly combine the two "nearest" clusters into one
● Divisive (top down):
  ○ Start with one cluster and recursively split it
● By contrast, regular K-Means is "point assignment clustering":
  ○ Maintain a set of clusters
  ○ Points belong to the "nearest" cluster
Hierarchical Clustering
● Agglomerative (bottom up):
  ○ Initially, each point is a cluster
  ○ Repeatedly combine the two "nearest" clusters into one
  ○ Stop when reaching a threshold in:
    ■ distance between points in a cluster, or
    ■ maximum distance of points from the "center", or
    ■ maximum number of points
  (In Euclidean space, the "center" is the centroid of the cluster's points.)
Hierarchical Clustering
But what if we have no "centroid"? (such as when using cosine distance)
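One common answer is to use a linkage rule that only needs pairwise distances (e.g., average linkage), so no centroid is ever computed. Below is a minimal sketch using SciPy with cosine distance; the random data and the cut threshold are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 20))

# Average linkage works directly on pairwise distances, so cosine distance is
# fine even though there is no meaningful "centroid" in that geometry.
D = pdist(X, metric="cosine")
Z = linkage(D, method="average")

# Cut the tree at a distance threshold (one possible stopping rule).
labels = fcluster(Z, t=0.9, criterion="distance")
print(len(np.unique(labels)), "clusters")
```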
Clustering: Applications
[Example application figures, from musicmachinery.com]
Concept: Dimensionality Reduction in 3-D, 2-D, and 1-D
Data (or, at least, what we want from the data) may be accurately represented with fewer dimensions.
Concept, In Matrix Form:
Dimensionality reduction: try to best represent X, but with only p' columns.
[The N × p matrix with features f1, f2, …, fp is mapped to an N × p' matrix with components c1, c2, …, cp']
Dimensionality Reduction
Rank: the number of linearly independent columns of A (i.e. columns that can't be derived from the other columns through linear combination).
Q: What is the rank of this matrix?
    [ 1  -2   3 ]
    [ 2  -3   5 ]
    [ 1   1   0 ]
Dimensionality Reduction
Rank: the number of linearly independent columns of A (i.e. columns that can't be derived from the other columns).
Q: What is the rank of this matrix?
    [ 1  -2   3 ]
    [ 2  -3   5 ]
    [ 1   1   0 ]
A: 2. The 1st column is just the sum of the other two, so every column can be represented as a linear combination of 2 vectors:
    [ 1 ]   [ -2 ]
    [ 2 ]   [ -3 ]
    [ 1 ]   [  1 ]
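The worked example can be checked numerically; a quick NumPy verification (not from the slides):

```python
import numpy as np

A = np.array([[1, -2, 3],
              [2, -3, 5],
              [1,  1, 0]])

print(np.linalg.matrix_rank(A))                  # 2
# The first column equals the sum of the other two:
print(np.allclose(A[:, 0], A[:, 1] + A[:, 2]))   # True
```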
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions. Found via Singular Value Decomposition:
    X[n×p] = U[n×r] D[r×r] Vᵀ, where V is [p×r]
X: original matrix, U: "left singular vectors", D: "singular values" (diagonal), V: "right singular vectors"
Dimensionality Reduction - PCA
[Diagram: the n × p matrix X is approximated (≈) by the product of the three smaller matrices U, D, and Vᵀ]
Dimensionality Reduction - PCA - Example
    X[n×p] = U[n×r] D[r×r] Vᵀ
[Users-to-movies ratings matrix]
Dimensionality Reduction - PCA - Example
    X[m×n] = U[m×r] D[r×r] Vᵀ, where V is [n×r]  (here X has m users and n movies)
Dimensionality Reduction - PCA - Example
[Figures: the matrices V and (UD)ᵀ computed for the example users-to-movies matrix]
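The slides' actual ratings matrix is not reproduced in this text, so the matrix below is a made-up stand-in; the sketch only shows how U, D, and Vᵀ come out of NumPy's SVD and what V and (UD)ᵀ look like.

```python
import numpy as np

# Hypothetical users-to-movies ratings matrix (m users x n movies);
# the values are illustrative, not the matrix from the slides.
X = np.array([[5, 5, 0, 0],
              [4, 5, 0, 1],
              [0, 0, 4, 5],
              [1, 0, 5, 4]], dtype=float)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(d)

print(Vt.T)                         # V: movies described in the latent dimensions
print((U @ D).T)                    # (UD)^T: users described in the same dimensions
print(np.allclose(X, U @ D @ Vt))   # the factorization reproduces X
```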
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions. Found via Singular Value Decomposition:
    X[n×p] = U[n×r] D[r×r] Vᵀ, where V is [p×r]
X: original matrix, U: "left singular vectors", D: "singular values" (diagonal), V: "right singular vectors"
Projection (dimensionality-reduced space) in 3 dimensions:
    U[n×3] D[3×3] (V[p×3])ᵀ
To reduce features in a new dataset: X_new V = X_new_small
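A short NumPy sketch of keeping r = 3 dimensions and reducing a new dataset with V, following the formulas above; the random data and variable names are illustrative, and for PCA proper the columns of X are usually mean-centered first.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))   # n=100 observations, p=10 features

U, d, Vt = np.linalg.svd(X, full_matrices=False)
U3, D3, V3 = U[:, :3], np.diag(d[:3]), Vt[:3].T   # keep r = 3 dimensions

X_approx = U3 @ D3 @ V3.T   # rank-3 approximation of X      (n x p)
X_small = X @ V3            # coordinates in the 3-D reduced space (n x 3)

# Reducing features of a new dataset: X_new V = X_new_small
X_new = rng.standard_normal((5, 10))
X_new_small = X_new @ V3    # 5 x 3
print(X_approx.shape, X_small.shape, X_new_small.shape)
```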
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions. Found via Singular Value Decomposition:
    X[n×p] = U[n×r] D[r×r] Vᵀ
U, D, and V are unique (up to sign and ordering conventions); the diagonal entries of D (the singular values) are always non-negative.
Dimensionality Reduction v. Clustering
Clustering: group the N observations into k clusters.
Soft Clustering: assign observations to k clusters with some weight or probability.
Dimensionality Reduction: assign the p features to p' components with some weight or probability.
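A minimal sketch of soft clustering using a Gaussian mixture model from scikit-learn (the specific method is an assumption for illustration; the slides do not name one): each observation receives a probability of belonging to each of the k clusters.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two illustrative blobs of 2-D points
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
weights = gm.predict_proba(X)   # each row: probability of membership in each cluster
print(weights[:3].round(3))
```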