CSCI 5622: Machine Learning
Lecture 16: Dimensionality Reduction
Chenhao Tan, Department of Computer Science
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen
Midterm
A. Review session
B. Flipped classroom
C. Go over the example midterm
D. Clustering!
Learning objectives
• Understand what unsupervised learning is for
• Learn principal component analysis
• Learn singular value decomposition
Supervised learning
Data: X
Labels: Y

Unsupervised learning
Data: X
Latent structure: Z
When do we need unsupervised learning?
• Acquiring labels is expensive
• You may not even know what labels to acquire
• Exploratory data analysis
• Learn patterns/representations that can be useful for supervised learning (representation learning)
• Generate data
• …
When do we need unsupervised learning?
https://qz.com/1090267/artificial-intelligence-can-now-show-you-how-those-pants-will-fit/
Unsupervised learning
• Dimensionality reduction
• Clustering
• Topic modeling
Principal Component Analysis - Motivation
A dataset's features are almost certainly correlated, which makes it hard to see hidden structure. To make this easier, let's try to reduce the data to one dimension.
We need to shift our perspective: change the definition of up-down-left-right. That is, choose new features as linear combinations of the old features, a change of feature basis.
Important: center and normalize the data before performing PCA. We will assume this has already been done throughout this lecture.
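As a concrete illustration of that preprocessing step, a minimal numpy sketch (function and variable names are ours, not from the slides):

```python
import numpy as np

def standardize(X):
    """Center each feature to mean 0 and scale to unit variance.

    X is an (n_samples, n_features) array; PCA is applied to the result.
    """
    mu = X.mean(axis=0)       # per-feature mean
    sigma = X.std(axis=0)     # per-feature standard deviation
    return (X - mu) / sigma

# Example: 100 points with two correlated features
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.5, size=100)])
X_std = standardize(X)
```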
Proceed incrementally:
• If we could choose only one combination to describe the data, which would it be?
• Which combination leads to the least loss of information?
• Once we've found that one, look for another, perpendicular to the first, that retains the next largest amount of information
• Repeat until done (or good enough)
The best vector to project onto is called the 1st principal component. What properties should it have?
• It should capture the largest variance in the data
• It should probably be a unit vector
After we've found the first, look for a second vector which:
• Captures the largest amount of leftover variance
• Should probably be a unit vector
• Should be orthogonal to the one that came before it
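In symbols, with $\Sigma$ the covariance matrix of the centered data, these properties pin down a standard objective (notation ours):

$$
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\|=1} \mathbf{w}^\top \Sigma\, \mathbf{w},
\qquad
\mathbf{w}_k = \arg\max_{\substack{\|\mathbf{w}\|=1 \\ \mathbf{w} \perp \mathbf{w}_1, \dots, \mathbf{w}_{k-1}}} \mathbf{w}^\top \Sigma\, \mathbf{w}.
$$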
Main idea: the principal components give a new orthogonal coordinate system for viewing the data, in which each principal component describes successively less information.
So far, all we've done is a change of basis on the feature space. But when do we reduce the dimension?
Picture data points in a 3D feature space. What if the points lay mostly along a single vector?
The other two principal components are still there, but they do not carry much information. Throw them away and work with the low-dimensional representation: reduce the 3D data to 1D!
Principal Component Analysis – The How
But how do we find w?
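One standard route is a Lagrange multiplier argument (notation ours): maximize $\mathbf{w}^\top \Sigma\, \mathbf{w}$ subject to $\mathbf{w}^\top \mathbf{w} = 1$,

$$
L(\mathbf{w}, \lambda) = \mathbf{w}^\top \Sigma\, \mathbf{w} - \lambda\,(\mathbf{w}^\top \mathbf{w} - 1),
\qquad
\nabla_{\mathbf{w}} L = 2\Sigma \mathbf{w} - 2\lambda \mathbf{w} = 0
\;\Longrightarrow\;
\Sigma \mathbf{w} = \lambda \mathbf{w}.
$$

So w must be an eigenvector of the covariance matrix, and the variance it captures is $\mathbf{w}^\top \Sigma\, \mathbf{w} = \lambda$. Take the eigenvector with the largest eigenvalue as the 1st principal component; the remaining components are the other eigenvectors, ordered by decreasing eigenvalue.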
PCA – Dimensionality reduction
Questions:
• How do we reduce dimensionality?
• How much stuff should we keep?
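A minimal numpy sketch of one common answer to both questions: keep the smallest k whose components explain a chosen fraction of the variance (the 90% threshold and the function name are illustrative choices, not from the slides):

```python
import numpy as np

def pca_reduce(X, var_to_keep=0.90):
    """Project standardized data X (n_samples, n_features) onto the
    fewest principal components retaining `var_to_keep` of the variance."""
    cov = np.cov(X, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, var_to_keep)) + 1
    return X @ eigvecs[:, :k]                  # (n_samples, k) representation
```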
Quiz
PCA - applications
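One typical application is visualization: project high-dimensional data onto the top two principal components and plot. A minimal sketch (the synthetic two-cluster dataset is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 64-dimensional points from two clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 64)),
               rng.normal(3, 1, (100, 64))])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # center and normalize first

# Project onto the top two principal components via SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:2].T                            # (200, 2) low-dimensional view

plt.scatter(Z[:, 0], Z[:, 1])
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```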
Connecting PCA and SVD
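The standard connection (notation ours): if the centered data matrix has the singular value decomposition $X = U S V^\top$, then

$$
\Sigma = \frac{1}{n-1} X^\top X = \frac{1}{n-1} V S^2 V^\top,
$$

so the columns of V are the principal components (the eigenvectors of $\Sigma$), the eigenvalues are $\lambda_i = s_i^2/(n-1)$, and the projected data are $XV = US$. In practice, computing the SVD of X directly is numerically preferable to forming and eigendecomposing $\Sigma$.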
SVD Applications
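For example, a minimal numpy sketch of low-rank approximation with a truncated SVD, a common SVD application (the rank-2 test matrix is illustrative):

```python
import numpy as np

def low_rank_approx(A, r):
    """Best rank-r approximation of A in the least-squares sense
    (Eckart-Young), built from the top r singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]   # (m, r) * (r,) scales columns

# Example: a noisy rank-2 matrix is recovered well at r = 2
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
A_noisy = A + 0.01 * rng.normal(size=A.shape)
print(np.linalg.norm(A - low_rank_approx(A_noisy, 2)))  # small residual
```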
Wrap up
Dimensionality reduction can be a useful way to
• explore data
• visualize data
• represent data