Unsupervised Learning: Principal Component Analysis
CMSC 422
Marine Carpuat
marine@cs.umd.edu
Slides credit: Maria-Florina Balcan
Unsupervised Learning • Discovering hidden structure in data • Last time: K-Means Clustering – What objective is being optimized? – How can we improve initialization? – What is the right value of K? • Today: how can we learn better representations of our data points?
Dimensionality Reduction • Goal: extract hidden lower-dimensional structure from high dimensional datasets • Why? – To visualize data more easily – To remove noise in data – To lower resource requirements for storing/processing data – To improve classification/clustering
Examples of data points in D-dimensional space that can be effectively represented in a d-dimensional subspace (d < D)
Principal Component Analysis • Goal: Find a projection of the data onto directions that maximize variance of the original data set – Intuition: those are directions in which most information is encoded • Definition: Principal Components are orthogonal directions that capture most of the variance in the data
PCA: finding principal components • 1st PC – Projection of data points along the 1st PC discriminates the data most along any one direction • 2nd PC – next orthogonal direction of greatest variability • And so on…
PCA: notation • Data points – Represented by matrix X of size D x N – Let's assume the data is centered • Principal components are d vectors: v_1, v_2, …, v_d – v_i · v_j = 0 for i ≠ j, and v_i · v_i = 1 • The sample variance of the data projected onto a vector v is (1/n) Σ_{i=1}^{n} (v^T x_i)^2 = (1/n) v^T X X^T v
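A minimal NumPy sketch of this setup (the toy data, dimensions, and variable names here are illustrative assumptions, not from the slides): center a D x N data matrix and check that the sample variance of the projection onto a unit vector v equals (1/n) v^T X X^T v.

import numpy as np

# Toy data (assumed for illustration): N points in D dimensions, one column per point
rng = np.random.default_rng(0)
D, N = 5, 200
X = rng.normal(size=(D, N))

# Center the data: subtract the mean of each dimension (each row)
X = X - X.mean(axis=1, keepdims=True)

# Sample variance of the data projected onto a unit vector v
v = rng.normal(size=D)
v = v / np.linalg.norm(v)
proj_var = np.mean((v @ X) ** 2)        # (1/n) * sum_i (v^T x_i)^2
quad_form = v @ (X @ X.T) @ v / N       # (1/n) * v^T X X^T v
print(np.isclose(proj_var, quad_form))  # True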
PCA formally • Finding the vector that maximizes the sample variance of the projected data: argmax_v v^T X X^T v such that v^T v = 1 • A constrained optimization problem – The Lagrangian folds the constraint into the objective: argmax_v v^T X X^T v - λ (v^T v - 1) – Setting the gradient with respect to v to zero gives X X^T v = λ v, i.e. the solutions are eigenvectors of X X^T (proportional to the sample covariance matrix)
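As a sketch of how this is solved in practice (assuming the centered D x N toy matrix from the previous example), the constrained maximization reduces to an eigendecomposition of the symmetric matrix X X^T:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
X = X - X.mean(axis=1, keepdims=True)   # centered D x N data (assumed setup)

# X X^T is symmetric, so np.linalg.eigh applies (eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(X @ X.T)

# The maximizer of v^T X X^T v subject to v^T v = 1 is the eigenvector
# with the largest eigenvalue, and the maximum value is that eigenvalue
v1 = eigvecs[:, -1]
print(np.isclose(v1 @ (X @ X.T) @ v1, eigvals[-1]))  # True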
PCA formally • The eigenvalue λ denotes the amount of variability captured along the direction of its eigenvector v – Sample variance of the projection is proportional to v^T X X^T v = λ • If we rank eigenvalues from large to small – The 1st PC is the eigenvector of X X^T associated with the largest eigenvalue – The 2nd PC is the eigenvector of X X^T associated with the 2nd largest eigenvalue – …
Alternative interpretation of PCA • PCA finds vectors v such that projecting the data onto these vectors minimizes the reconstruction error
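A small numerical check of this equivalence (the toy data and the choice d = 2 are assumptions for illustration): projecting onto the top-d eigenvectors of X X^T and mapping back gives a squared reconstruction error equal to the sum of the discarded eigenvalues.

import numpy as np

rng = np.random.default_rng(1)
D, N, d = 6, 300, 2
X = rng.normal(size=(D, N))
X = X - X.mean(axis=1, keepdims=True)

eigvals, eigvecs = np.linalg.eigh(X @ X.T)
W = eigvecs[:, -d:]                     # top-d principal components (D x d)

X_hat = W @ (W.T @ X)                   # project onto span(W), then map back
recon_error = np.sum((X - X_hat) ** 2)

# The squared reconstruction error equals the sum of the discarded eigenvalues
print(np.isclose(recon_error, eigvals[:-d].sum()))  # True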
Resulting PCA algorithm • Center the data by subtracting the mean of each dimension • Compute the matrix X X^T (proportional to the sample covariance) • Compute its eigenvectors and eigenvalues • Keep the d eigenvectors with the largest eigenvalues as the principal components • Project the centered data onto these components (see the sketch below)
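Putting the steps together, a minimal NumPy sketch of the algorithm (the function name pca and the D x N data layout are assumptions, not from the slides):

import numpy as np

def pca(X, d):
    """Return the top-d principal components (D x d) and the projected data (d x N).

    Assumes X is a D x N matrix with one data point per column.
    """
    # 1. Center the data
    Xc = X - X.mean(axis=1, keepdims=True)
    # 2. Eigendecomposition of the symmetric matrix X X^T
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T)
    # 3. Keep the eigenvectors with the d largest eigenvalues
    W = eigvecs[:, ::-1][:, :d]
    # 4. Project the centered data onto the principal components
    Z = W.T @ Xc
    return W, Z

# Example usage on toy data
X = np.random.default_rng(0).normal(size=(5, 100))
W, Z = pca(X, d=2)
print(W.shape, Z.shape)   # (5, 2) (2, 100)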
How to choose the hyperparameter K? • i.e. the number of principal components (dimensions) to keep • We can ignore the components of smaller significance, i.e. those associated with the smallest eigenvalues
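One common heuristic, sketched below (the 95% threshold and the toy data are assumptions for illustration): keep the smallest number of components that explain a chosen fraction of the total variance, read off the cumulative eigenvalue spectrum.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 500))
X = X - X.mean(axis=1, keepdims=True)

# Eigenvalues of X X^T, sorted from largest to smallest
eigvals = np.linalg.eigvalsh(X @ X.T)[::-1]

# Keep the smallest number of components explaining at least 95% of the variance
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95) + 1)
print(f"keep {k} components to explain 95% of the variance")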
An example: Eigenfaces
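A possible way to reproduce an eigenfaces-style example (assumes scikit-learn and its Olivetti faces dataset are available; note that scikit-learn stores data as N x D, one image per row, unlike the D x N convention above):

import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces()            # 400 grayscale images of 64 x 64 pixels
X = faces.data                            # shape (400, 4096), one image per row

pca = PCA(n_components=50).fit(X)
eigenfaces = pca.components_.reshape(-1, 64, 64)   # each component is a "face"

# Each face is approximated as the mean face plus a weighted sum of eigenfaces
codes = pca.transform(X)                  # 50-dimensional code per image
approx = pca.inverse_transform(codes)     # reconstructions in pixel space
print(eigenfaces.shape, approx.shape)     # (50, 64, 64) (400, 4096)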
PCA pros and cons • Pros – Eigenvector method – No parameters to tune – No local optima • Cons – Only based on covariance (2nd-order statistics) – Limited to linear projections
What you should know • Formulate K-Means clustering as an optimization problem • Choose initialization strategies for K-Means • Understand the impact of K on the optimization objective • Why and how to perform Principal Components Analysis