  1. Unsupervised Learning: Principal Component Analysis – CMSC 422 – Marine Carpuat (marine@cs.umd.edu). Slides credit: Maria-Florina Balcan

  2. Unsupervised Learning • Discovering hidden structure in data • Last time: K-Means Clustering – What objective does it optimize? – How can we improve initialization? – What is the right value of K? • Today: how can we learn better representations of our data points?

  3. Dimensionality Reduction • Goal: extract hidden lower-dimensional structure from high dimensional datasets • Why? – To visualize data more easily – To remove noise in data – To lower resource requirements for storing/processing data – To improve classification/clustering

  4. Examples of data points in D-dimensional space that can be effectively represented in a d-dimensional subspace (d < D)

  5. Principal Component Analysis • Goal: find a projection of the data onto directions that maximize the variance of the projected data – Intuition: those are the directions in which most of the information is encoded • Definition: principal components are orthogonal directions that capture most of the variance in the data
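
To make the "directions that maximize the variance of the projected data" intuition concrete, here is a minimal NumPy sketch (the toy correlated 2-D data and the two candidate directions are my own choices, not from the slides) comparing the variance of the projections onto two different unit directions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: most variability lies roughly along the diagonal.
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 2.0], [2.0, 2.0]], size=500)
X = X - X.mean(axis=0)            # center the data

def projected_variance(X, v):
    """Sample variance of the data projected onto unit vector v."""
    v = v / np.linalg.norm(v)
    return np.var(X @ v)

diag = np.array([1.0, 1.0])       # roughly the high-variance direction
axis = np.array([1.0, 0.0])       # an arbitrary coordinate axis
print(projected_variance(X, diag), projected_variance(X, axis))
# The diagonal direction captures noticeably more variance than the axis.
```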

  6. PCA: finding principal components • 1st PC – the direction along which the projected data points vary more than along any other single direction • 2nd PC – the next orthogonal direction of greatest variability • And so on…
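
One way to make the sequential "greatest variance, then the next orthogonal direction" description concrete is a greedy sketch using power iteration with deflation. This is my own illustration (the function name, the row-wise n x D data layout, and the iteration count are assumptions), not the method the course implements:

```python
import numpy as np

def top_components(X, k, iters=500):
    """Greedy PCA on centered data X (rows are points, shape n x D):
    repeatedly find the direction of greatest remaining variance,
    then remove ("deflate") it and look for the next orthogonal one."""
    rng = np.random.default_rng(0)
    C = X.T @ X / len(X)                  # sample covariance (D x D)
    components = []
    for _ in range(k):
        v = rng.normal(size=C.shape[0])
        for _ in range(iters):            # power iteration -> top eigenvector of C
            v = C @ v
            v /= np.linalg.norm(v)
        lam = v @ C @ v                   # variance captured along v
        components.append(v)
        C -= lam * np.outer(v, v)         # deflate so the next direction is orthogonal
    return np.array(components)           # k x D, rows are the PCs in order
```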

  7. PCA: notation • Data points – represented by a matrix X of size D x N – let's assume the data is centered • Principal components are d vectors: w_1, w_2, …, w_d – w_j · w_k = 0 for j ≠ k, and w_j · w_j = 1 • The sample variance of the data projected on a vector v is (1/n) Σᵢ₌₁ⁿ (vᵀ xᵢ)² = (1/n) vᵀ X Xᵀ v
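
The identity on this slide can be checked numerically. A minimal sketch, keeping the slide's D x N layout (columns are data points); the toy data and writing the (1/n) normalization out explicitly are my own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 5, 200
X = rng.normal(size=(D, N))
X = X - X.mean(axis=1, keepdims=True)      # center: subtract the mean of each row

v = rng.normal(size=D)
v = v / np.linalg.norm(v)                  # unit-length direction

lhs = np.mean((v @ X) ** 2)                # (1/n) * sum_i (v^T x_i)^2
rhs = v @ (X @ X.T) @ v / N                # (1/n) * v^T X X^T v
assert np.allclose(lhs, rhs)
```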

  8. PCA formally • Finding the vector that maximizes the sample variance of the projected data: argmax_v vᵀ X Xᵀ v such that vᵀ v = 1 • A constrained optimization problem – the Lagrangian folds the constraint into the objective: argmax_v vᵀ X Xᵀ v − λ vᵀ v – solutions are vectors v such that X Xᵀ v = λ v – i.e. eigenvectors of X Xᵀ (the sample covariance matrix, up to a factor of 1/n)
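
Because the maximizers are eigenvectors of X Xᵀ, the constrained problem can be solved with a standard symmetric eigendecomposition. A hedged sketch on toy data (np.linalg.eigh is used because the covariance matrix is symmetric):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 200))
X = X - X.mean(axis=1, keepdims=True)        # centered D x N data

C = X @ X.T / X.shape[1]                     # (1/n) X X^T, the sample covariance
eigvals, eigvecs = np.linalg.eigh(C)         # eigh: for symmetric matrices,
                                             # eigenvalues come back in ascending order

v = eigvecs[:, -1]                           # eigenvector with the largest eigenvalue
lam = eigvals[-1]
assert np.allclose(C @ v, lam * v)           # v solves  C v = lambda v
assert np.isclose(v @ v, 1.0)                # and satisfies the unit-norm constraint
```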

  9. PCA formally • The eigenvalue λ denotes the amount of variability captured along the direction v – sample variance of the projection: vᵀ X Xᵀ v = λ • If we rank the eigenvalues from largest to smallest – the 1st PC is the eigenvector of X Xᵀ associated with the largest eigenvalue – the 2nd PC is the eigenvector of X Xᵀ associated with the 2nd largest eigenvalue – …
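
A quick numerical check of the ranking claim, again on toy data of my own: after sorting the eigenvalues from largest to smallest, the sample variance of the data projected on the k-th ranked eigenvector equals the k-th eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 300))
X = X - X.mean(axis=1, keepdims=True)          # centered D x N data

C = X @ X.T / X.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]              # rank eigenvalues large -> small
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

for k in range(3):                             # k-th PC = k-th ranked eigenvector
    w = eigvecs[:, k]
    proj_var = np.mean((w @ X) ** 2)           # sample variance of the projection ...
    assert np.allclose(proj_var, eigvals[k])   # ... equals the k-th eigenvalue
```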

  10. Alternative interpretation of PCA • PCA finds vectors v such that projection onto these vectors minimizes reconstruction error
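
The reconstruction view can also be verified numerically: projecting onto the top d components and mapping back, the average squared reconstruction error equals the total variance in the discarded directions. A sketch with arbitrary toy data and d = 3 (my choices, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 500))
X = X - X.mean(axis=1, keepdims=True)            # centered D x N data

C = X @ X.T / X.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)             # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :3]                      # top d = 3 principal components (D x d)

Z = W.T @ X                                      # d x N: low-dimensional codes
X_hat = W @ Z                                    # reconstruction back in D dimensions
err = np.mean(np.sum((X - X_hat) ** 2, axis=0))  # average squared reconstruction error
assert np.allclose(err, eigvals[:-3].sum())      # = total variance in discarded directions
```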

  11. Resulting PCA algorithm
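
The algorithm box from this slide is not reproduced in the transcript. The following is a sketch of a standard PCA routine consistent with the preceding slides (center, form the covariance, eigendecompose, keep the top d eigenvectors, project); the function name, the row-wise n x D layout, and the return values are my own choices, not the course's:

```python
import numpy as np

def pca(X, d):
    """PCA on an n x D data matrix X (rows are data points).

    Returns (Z, W, mean): the n x d projected data, the D x d matrix whose
    columns are the top-d principal components, and the mean used for centering.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                # 1. center the data
    C = Xc.T @ Xc / len(Xc)                      # 2. sample covariance (D x D)
    eigvals, eigvecs = np.linalg.eigh(C)         # 3. eigendecomposition (ascending order)
    W = eigvecs[:, ::-1][:, :d]                  # 4. top-d eigenvectors = the PCs
    Z = Xc @ W                                   # 5. project onto the PCs
    return Z, W, mean
```

For visualization, Z, W, mu = pca(data, d=2) gives 2-D coordinates for each data point; a new point x can be projected with (x - mu) @ W.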

  12. How to choose the hyperparameter K? • i.e. the number of dimensions (principal components) to keep • We can ignore the components of smaller significance, i.e. those with the smallest eigenvalues
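
One common way to decide how many components are "significant" is to keep the smallest number that explains a chosen fraction of the total variance. A sketch; the 95% threshold is an illustrative default, not a value from the slides:

```python
import numpy as np

def choose_num_components(X, threshold=0.95):
    """Smallest d whose top-d eigenvalues explain >= threshold of total variance."""
    Xc = X - X.mean(axis=0)                             # rows of X are data points
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc / len(Xc))   # ascending eigenvalues
    eigvals = eigvals[::-1]                             # large -> small
    explained = np.cumsum(eigvals) / eigvals.sum()      # cumulative fraction of variance
    return int(np.searchsorted(explained, threshold) + 1)
```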

  13. An example: Eigenfaces
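
The eigenface images themselves are not in this transcript. As a hedged illustration of the idea (running PCA on face images and viewing each principal component as an image), here is a sketch that assumes scikit-learn's Olivetti faces dataset is available; it is not the dataset or code used in the lecture. It uses an SVD of the centered data, which yields the same components as eigendecomposing the covariance but avoids forming the 4096 x 4096 matrix explicitly:

```python
import numpy as np
from sklearn.datasets import fetch_olivetti_faces    # 400 images of 64 x 64 pixels

faces = fetch_olivetti_faces()
X = faces.data                                       # shape (400, 4096): one row per image
Xc = X - X.mean(axis=0)                              # center the images

# Right singular vectors of the centered data = principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenfaces = Vt[:10].reshape(10, 64, 64)             # top 10 PCs, viewable as images
codes = Xc @ Vt[:10].T                               # 10-dimensional code for each face
```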

  14. PCA pros and cons • Pros – Eigenvector method – No tuning of the parameters – No local optima • Cons – Only based on covariance (2nd-order statistics) – Limited to linear projections

  15. What you should know • Formulate K-Means clustering as an optimization problem • Choose initialization strategies for K-Means • Understand the impact of K on the optimization objective • Why and how to perform Principal Component Analysis
