Introduction to Machine Learning CMU-10701 Principal Component - PowerPoint PPT Presentation

Introduction to Machine Learning CMU-10701 Principal Component Analysis Barnabás Póczos & Aarti Singh

Contents
• Motivation
• PCA algorithms
• Applications
Some of these slides are taken from
• Karl Booksh Research group
• Tom Mitchell
• Ron Parr

Motivation

PCA Applications
• Data Visualization
• Data Compression
• Noise Reduction

Data Visualization
Example:
• Given 53 blood and urine measurements (features) from 65 people.
• How can we visualize the measurements?

Data Visualization
• Matrix format (65 × 53): rows are instances (people), columns are features; excerpt:

     H-WBC   H-RBC   H-Hgb    H-Hct    H-MCV     H-MCH    H-MCHC
A1   8.0000  4.8200  14.1000  41.0000  85.0000   29.0000  34.0000
A2   7.3000  5.0200  14.7000  43.0000  86.0000   29.0000  34.0000
A3   4.3000  4.4800  14.1000  41.0000  91.0000   32.0000  35.0000
A4   7.5000  4.4700  14.9000  45.0000  101.0000  33.0000  33.0000
A5   7.3000  5.5200  15.4000  46.0000  84.0000   28.0000  33.0000
A6   6.9000  4.8600  16.0000  47.0000  97.0000   33.0000  34.0000
A7   7.8000  4.6800  14.7000  43.0000  92.0000   31.0000  34.0000
A8   8.6000  4.8200  15.8000  42.0000  88.0000   33.0000  37.0000
A9   5.1000  4.7100  14.0000  43.0000  92.0000   30.0000  32.0000

Difficult to see the correlations between the features...

Data Visualization
• Spectral format (65 curves, one for each person)
[Figure: value vs. measurement, one curve per person]
Difficult to compare the different patients...

Data Visualization
• Spectral format (53 pictures, one for each feature)
[Figure: H-Bands value vs. person]
Difficult to see the correlations between the features...

Data Visualization
[Figure: bi-variate scatter plot of C-LDH vs. C-Triglycerides; tri-variate scatter plot of M-EPI vs. C-LDH vs. C-Triglycerides]
How can we visualize the other variables? … it is difficult to see anything in 4- or higher-dimensional spaces...

Data Visualization
• Is there a representation better than the coordinate axes?
• Is it really necessary to show all 53 dimensions?
  • … what if there are strong correlations between the features?
• How could we find the smallest subspace of the 53-D space that keeps the most information about the original data?
• A solution: Principal Component Analysis

PCA Algorithms

Principal Component Analysis
PCA: orthogonal projection of the data onto a lower-dimensional linear space that...
• maximizes the variance of the projected data (purple line)
• minimizes the mean squared distance between
  • the data points and
  • their projections (sum of blue lines)
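The two criteria above pick the same direction. For a unit vector w and a centered point x, Pythagoras gives $\|x\|^2 = (w^T x)^2 + \|x - w w^T x\|^2$, so projected variance plus mean squared residual is the same for every w, and maximizing one minimizes the other. A minimal NumPy sketch checking this on toy 2-D data (the data and function names are illustrative assumptions, not from the slides):

```python
import numpy as np

# Toy data (assumption): m = 200 centered points in 2-D, one point per column.
rng = np.random.default_rng(0)
X = np.array([[3.0, 0.0], [1.0, 0.5]]) @ rng.normal(size=(2, 200))
X -= X.mean(axis=1, keepdims=True)


def projected_variance(X, w):
    """Mean of (w^T x_i)^2: variance of the data projected onto unit vector w."""
    return np.mean((w @ X) ** 2)


def mean_squared_residual(X, w):
    """Mean of ||x_i - w w^T x_i||^2: squared distance from points to projections."""
    residual = X - np.outer(w, w @ X)
    return np.mean(np.sum(residual ** 2, axis=0))


total = np.mean(np.sum(X ** 2, axis=0))
for angle in np.linspace(0.0, np.pi, 7):
    w = np.array([np.cos(angle), np.sin(angle)])
    # Direction-independent sum, hence max variance <=> min residual.
    assert np.isclose(projected_variance(X, w) + mean_squared_residual(X, w), total)
```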

Principal Component Analysis
Idea:
• Given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
  • e.g., find the best planar approximation of 3-D data
  • e.g., find the best 12-D approximation of $10^4$-D data
• In particular, choose the projection that minimizes the squared error in reconstructing the original data.

Principal Component Analysis
Properties:
• PCA vectors originate from the center of mass.
• Principal component #1 points in the direction of the largest variance.
• Each subsequent principal component
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace.

2D Gaussian dataset

1st PCA axis

2nd PCA axis

PCA algorithm I (sequential)
Given the centered data {x_1, …, x_m}, compute the principal vectors:
• 1st PCA vector: $w_1 = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^{m} (w^T x_i)^2$
  To find w_1, maximize the variance of the projection of x.
• 2nd PCA vector: $w_2 = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^{m} \left[w^T (x_i - w_1 w_1^T x_i)\right]^2$
  To find w_2, maximize the variance of the projection in the residual subspace.
[Figure: a point x, its PCA reconstruction $x' = w_1 (w_1^T x)$ along w_1, and the residual $x - x'$ along w_2]
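In 2-D the two argmax definitions can be applied literally by brute force: scan unit vectors on a fine grid of angles, keep the one with the largest mean squared projection, then repeat on the residuals $x_i - w_1 w_1^T x_i$. A minimal NumPy sketch, assuming toy 2-D Gaussian data; the grid search and the name `best_direction` are illustrative only and do not scale beyond low dimensions:

```python
import numpy as np

# Toy 2-D Gaussian data (assumption), one centered point per column.
rng = np.random.default_rng(1)
X = np.array([[3.0, 0.0], [1.0, 0.5]]) @ rng.normal(size=(2, 500))
X -= X.mean(axis=1, keepdims=True)

# Candidate unit vectors on a grid of angles in [0, pi); w and -w are equivalent.
angles = np.linspace(0.0, np.pi, 3600, endpoint=False)
candidates = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 3600)


def best_direction(data):
    """Grid argmax of (1/m) * sum_i (w^T x_i)^2 over the candidate unit vectors."""
    scores = np.mean((candidates.T @ data) ** 2, axis=1)
    return candidates[:, np.argmax(scores)]


w1 = best_direction(X)
X_rec = np.outer(w1, w1 @ X)   # x' = w1 (w1^T x) for every point
X_res = X - X_rec              # residual x - x'
w2 = best_direction(X_res)     # maximize variance in the residual subspace
```

As expected from the slide, w2 comes out orthogonal to w1, and w1 matches the top eigenvector of the covariance up to the grid resolution.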

PCA algorithm I (sequential)
Given w_1, …, w_{k-1}, calculate the k-th principal vector as before, maximizing the variance of the projection of x in the residual subspace:
• k-th PCA vector: $w_k = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^{m} \left[w^T \left(x_i - \sum_{j=1}^{k-1} w_j w_j^T x_i\right)\right]^2$
[Figure: PCA reconstruction of x from two components, $x' = w_1 (w_1^T x) + w_2 (w_2^T x)$]
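For more than two dimensions a grid search is impractical, but each argmax has a closed form: for a fixed residual matrix R, the unit vector maximizing $\frac{1}{m}\sum_i (w^T r_i)^2 = w^T (\frac{1}{m} R R^T) w$ is the top eigenvector of $\frac{1}{m} R R^T$ (a Rayleigh quotient). A minimal NumPy sketch of the sequential scheme under that substitution, assuming toy 5-D data and a hypothetical function name `sequential_pca`:

```python
import numpy as np


def sequential_pca(X, k):
    """Sequential PCA on centered data X (N x m, one point per column).

    Each step takes the top eigenvector of the residual covariance (the
    maximizer of the projected residual variance) and then removes its
    component from every residual.
    """
    m = X.shape[1]
    R = X.copy()
    W = []
    for _ in range(k):
        _, evecs = np.linalg.eigh(R @ R.T / m)   # eigenvalues in ascending order
        w = evecs[:, -1]                          # direction of largest residual variance
        W.append(w)
        R = R - np.outer(w, w @ R)                # residual: x_i - sum_j w_j w_j^T x_i
    return np.column_stack(W)


# Toy data (assumption): 5-D points with unequal variances, one per column.
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 1000)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])[:, None]
X -= X.mean(axis=1, keepdims=True)

W = sequential_pca(X, k=3)
X_rec = W @ (W.T @ X)   # x' = sum_{j=1..3} w_j (w_j^T x) for every point
```

The resulting vectors are orthonormal and coincide (up to sign) with the leading eigenvectors of the full covariance, which is the link to algorithm II.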

PCA algorithm II (sample covariance matrix)
• Given data {x_1, …, x_m}, compute the covariance matrix
  $\Sigma = \frac{1}{m}\sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^T$, where $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$
• PCA basis vectors = the eigenvectors of Σ
• Larger eigenvalue ⇒ more important eigenvector
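Why a larger eigenvalue means a more important direction: the variance of the data projected onto a unit eigenvector $u_j$ is $u_j^T \Sigma u_j = \lambda_j$. A minimal NumPy sketch on toy data (an assumption) that builds Σ from the sum formula, checks it against the vectorized form and `np.cov` (whose `bias=True` option selects the 1/m normalization used here instead of the default 1/(m−1)), and confirms the projected variances:

```python
import numpy as np

# Toy data (assumption): N = 3 features, m = 100 points, one point per column.
rng = np.random.default_rng(3)
X = rng.normal(size=(3, 100)) * np.array([3.0, 1.0, 0.3])[:, None]
m = X.shape[1]

x_bar = X.mean(axis=1)  # (1/m) * sum_i x_i

# Sigma = (1/m) * sum_i (x_i - x_bar)(x_i - x_bar)^T, written as an explicit sum.
Sigma_loop = sum(np.outer(X[:, i] - x_bar, X[:, i] - x_bar) for i in range(m)) / m

# The same matrix in vectorized form.
Xc = X - x_bar[:, None]
Sigma = Xc @ Xc.T / m

evals, evecs = np.linalg.eigh(Sigma)  # symmetric solver, ascending eigenvalues

# Variance (1/m normalization) of the data projected on each eigenvector.
proj_var = np.var(evecs.T @ X, axis=1)
```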

PCA algorithm II (sample covariance matrix)
PCA algorithm(X, k): top k eigenvalues/eigenvectors
% X = N × m data matrix,
% … each data point x_i = column vector, i = 1..m
• $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$
• X ← subtract the mean $\bar{x}$ from each column vector x_i in X
• Σ ← X X^T … covariance matrix of X (up to the factor 1/m, which does not change the eigenvectors)
• {λ_i, u_i}_{i=1..N} = eigenvalues/eigenvectors of Σ, sorted so that λ_1 ≥ λ_2 ≥ … ≥ λ_N
• Return {λ_i, u_i}_{i=1..k} % top k PCA components
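The pseudocode above translates almost line for line into NumPy. A minimal sketch, assuming NumPy and a hypothetical function name `pca`; `np.linalg.eigh` is used because Σ is symmetric, and its ascending eigenvalues are re-sorted so the top k come first. The mean is also returned, since new points must be centered the same way before they are projected:

```python
import numpy as np


def pca(X, k):
    """Top-k eigenvalues/eigenvectors of the covariance of X (N x m, one point per column)."""
    x_bar = X.mean(axis=1, keepdims=True)   # mean vector, shape (N, 1)
    Xc = X - x_bar                           # subtract the mean from each column
    Sigma = Xc @ Xc.T / X.shape[1]           # N x N sample covariance
    evals, evecs = np.linalg.eigh(Sigma)     # ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]      # indices of the k largest
    return evals[order], evecs[:, order], x_bar


# Usage on toy data (assumption): 10 features with decreasing scales, 300 points.
rng = np.random.default_rng(4)
X = rng.normal(size=(10, 300)) * np.arange(10, 0, -1)[:, None]
lams, U, x_bar = pca(X, k=2)
Z = U.T @ (X - x_bar)   # 2 x m coordinates of each point in the PCA basis
```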

Animation: power iteration
Power iteration 1 (repeat):
    v = Sigma*v;
    v = v/sqrt(v'*v);
  → v_PCA1
Deflation: Sigma2 = Sigma - λ*v_PCA1*v_PCA1', where λ = v_PCA1'*Sigma*v_PCA1
Power iteration 2 (repeat):
    v = Sigma2*v;
    v = v/sqrt(v'*v);
  → v_PCA2
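The animation's MATLAB-style steps can be sketched in NumPy as follows. This is a minimal sketch, assuming a toy covariance matrix with well-separated eigenvalues and a fixed iteration count (power iteration converges at a rate set by the ratio of the two largest eigenvalues, so a real implementation would test for convergence instead); `power_iteration` is a hypothetical name:

```python
import numpy as np


def power_iteration(S, n_iter=500, seed=0):
    """Dominant eigenvector of a symmetric PSD matrix S by repeated multiplication."""
    v = np.random.default_rng(seed).normal(size=S.shape[0])
    v /= np.sqrt(v @ v)
    for _ in range(n_iter):
        v = S @ v              # v = Sigma*v
        v /= np.sqrt(v @ v)    # v = v/sqrt(v'*v)
    return v


# Toy covariance matrix (assumption) with well-separated eigenvalues.
rng = np.random.default_rng(5)
X = rng.normal(size=(4, 400)) * np.array([4.0, 2.0, 1.0, 0.5])[:, None]
X -= X.mean(axis=1, keepdims=True)
Sigma = X @ X.T / X.shape[1]

v_pca1 = power_iteration(Sigma)
lam1 = v_pca1 @ Sigma @ v_pca1                     # lambda = v_PCA1' * Sigma * v_PCA1
Sigma2 = Sigma - lam1 * np.outer(v_pca1, v_pca1)   # deflation removes the first component
v_pca2 = power_iteration(Sigma2)
```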

PCA algorithm III (SVD of the data matrix)
Singular Value Decomposition of the centered data matrix X (features × samples):
$X = U S V^T$
[Figure: block diagram of X = U S V^T, with the significant components separated from the noise components]

PCA algorithm III
• Columns of U
  • the principal vectors, {u^(1), …, u^(k)}
  • orthogonal with unit norm, so $U^T U = I$
  • the data can be reconstructed as linear combinations of {u^(1), …, u^(k)}
• Matrix S
  • diagonal
  • shows the importance of each principal vector (its squared entries divided by m are the eigenvalues of the covariance matrix)
• Columns of V^T
  • the coefficients (scaled by S) for reconstructing the samples
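The link to algorithm II follows from $X X^T = U S^2 U^T$: the columns of U are eigenvectors of $\Sigma = X X^T / m$ with eigenvalues $S^2/m$, and no covariance matrix has to be formed. A minimal NumPy sketch on toy centered data (sizes and scales are assumptions), including a rank-k reconstruction from the first k principal vectors:

```python
import numpy as np

# Toy data (assumption): N = 6 features x m = 50 samples, centered per feature.
rng = np.random.default_rng(6)
X = rng.normal(size=(6, 50)) * np.array([6.0, 4.0, 3.0, 2.0, 1.0, 0.5])[:, None]
X -= X.mean(axis=1, keepdims=True)
m = X.shape[1]

# Thin SVD: X = U @ diag(S) @ Vt, with singular values S in decreasing order.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values / m are the covariance eigenvalues, largest first.
cov_evals = S ** 2 / m

# Rank-k reconstruction: principal vectors U[:, :k], coefficients diag(S[:k]) @ Vt[:k].
k = 2
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
```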

Applications

Face Recognition
• Want to identify a specific person, based on a facial image
• Robust to glasses, lighting, …
• Can't just use the given 256 × 256 pixels

Applying PCA: Eigenfaces
Method A: Build a PCA subspace for each person and check which subspace reconstructs the test image best.
Method B: Build one PCA database for the whole dataset and then classify based on the weights.
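Both methods can be sketched on synthetic data. In this minimal NumPy sketch the image generator (a mean image plus a random 3-D subspace of variation plus noise per "person"), the sizes, and all function names are hypothetical stand-ins for real face images; for Method B it assumes a nearest-neighbour rule on the weights, one simple choice the slide does not specify:

```python
import numpy as np

rng = np.random.default_rng(7)
D, n_train, n_test, k = 50, 20, 5, 3   # hypothetical sizes; D plays the role of pixels


def make_person():
    """Hypothetical 'person': a mean image, a random 3-D subspace of variation, and noise."""
    mean = 3.0 * rng.normal(size=(D, 1))
    basis = 0.3 * rng.normal(size=(D, 3))
    return lambda n: mean + basis @ rng.normal(size=(3, n)) + 0.1 * rng.normal(size=(D, n))


def fit_pca(X, k):
    """Mean and top-k principal vectors of X (one image per column), via SVD."""
    mu = X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X - mu, full_matrices=False)
    return mu, U[:, :k]


people = [make_person() for _ in range(3)]
train_sets = [gen(n_train) for gen in people]
test_sets = [gen(n_test) for gen in people]

# Method A: one PCA subspace per person; choose the best reconstruction.
models = [fit_pca(Xp, k) for Xp in train_sets]


def classify_a(x):
    """x is a D x 1 column; returns the index of the best-reconstructing person."""
    errors = [np.linalg.norm((x - mu) - U @ (U.T @ (x - mu))) for mu, U in models]
    return int(np.argmin(errors))


# Method B: one PCA for all images; compare weights (coordinates in the PCA basis).
X_all = np.hstack(train_sets)
labels = np.repeat(np.arange(3), n_train)
mu_all, U_all = fit_pca(X_all, k=5)
W_train = U_all.T @ (X_all - mu_all)   # 5 x (3 * n_train) training weights


def classify_b(x):
    """Nearest-neighbour rule on the weights (an assumed choice of classifier)."""
    w = U_all.T @ (x - mu_all)
    return int(labels[np.argmin(np.linalg.norm(W_train - w, axis=0))])
```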

Applying PCA: Eigenfaces
• Example data set: images of faces
  • Eigenface approach [Turk & Pentland], [Sirovich & Kirby]
• Each face x is …
  • 256 × 256 values (luminance at each location)
  • $x \in \mathbb{R}^{256 \times 256}$ (viewed as a 64K-dimensional vector)
• Form X = [x_1, …, x_m], the centered data matrix (one face per column, m faces)
• Compute Σ = XX^T
• Problem: Σ is 64K × 64K … HUGE!

Computational Complexity
• Suppose m instances, each of size N
  • Eigenfaces: m = 500 faces, each of size N = 64K
• Given the N × N covariance matrix Σ, one can compute
  • all N eigenvectors/eigenvalues in $O(N^3)$
  • the first k eigenvectors/eigenvalues in $O(k N^2)$
• But if N = 64K, this is EXPENSIVE!
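Plain arithmetic makes "EXPENSIVE" concrete for the slide's sizes. This sketch assumes float64 storage (8 bytes per entry) and ignores the constants hidden in the O(·) bounds; the m × m size is the one used by the workaround on the next slide:

```python
N = 256 * 256          # pixels per face image = 65,536
m = 500                # number of training faces on the slide
bytes_per_entry = 8    # assumption: float64 storage

# Memory for the N x N covariance matrix versus an m x m matrix.
cov_gib = N * N * bytes_per_entry / 2**30
small_mib = m * m * bytes_per_entry / 2**20
print(f"N x N covariance: {cov_gib:.1f} GiB")    # 32.0 GiB
print(f"m x m matrix:     {small_mib:.2f} MiB")  # 1.91 MiB

# Ratio of the O(N^3) and O(m^3) eigendecomposition costs (constants ignored).
print(f"cost ratio N^3 / m^3 ≈ {(N / m) ** 3:.3g}")  # ≈ 2.25e+06
```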

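A Clever Workaround
• Note that m ≪ 64K
• Use $L = X^T X$ (m × m) instead of $\Sigma = X X^T$ (64K × 64K)
• If v is an eigenvector of L with eigenvalue λ ≠ 0, then Xv is an eigenvector of Σ (λ ≠ 0 guarantees $Xv \neq 0$, since $\|Xv\|^2 = v^T L v = \lambda \|v\|^2$)
Proof:
$L v = \lambda v$
$\Rightarrow X^T X v = \lambda v$
$\Rightarrow X (X^T X v) = X (\lambda v) = \lambda X v$
$\Rightarrow (X X^T)(X v) = \lambda (X v)$
$\Rightarrow \Sigma (X v) = \lambda (X v)$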
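The trick can be checked numerically without ever forming the N × N matrix, because $\Sigma u = X (X^T u)$ can be evaluated right to left. A minimal NumPy sketch, assuming hypothetical sizes (N = 4096 instead of 64K to keep it fast) and random data in place of faces; only eigenvectors with nonzero eigenvalue are kept, since centering makes the smallest eigenvalue of L zero:

```python
import numpy as np

# Hypothetical sizes: N = 4096 pixels (64 x 64 images), m = 20 faces, so m << N.
rng = np.random.default_rng(8)
N, m, k = 4096, 20, 5
X = rng.normal(size=(N, m))
X -= X.mean(axis=1, keepdims=True)    # centered data, one face per column

L = X.T @ X                           # m x m instead of the N x N matrix X X^T
evals, V = np.linalg.eigh(L)          # ascending order
evals, V = evals[::-1][:k], V[:, ::-1][:, :k]   # k largest (all nonzero here)

# Map each eigenvector v of L to Xv, an eigenvector of X X^T with the same eigenvalue.
U = X @ V
U /= np.linalg.norm(U, axis=0)        # normalize each eigenface to unit length

# Check (X X^T) u = lambda u, computed as X (X^T u) without the N x N matrix.
residual = X @ (X.T @ U) - U * evals
```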

Principal Components (Method B)

Reconstructing… (Method B)
• … faster if trained with…
  • only people without glasses
  • the same lighting conditions

Shortcomings
• Requires carefully controlled data:
  • All faces centered in frame
  • Same size
  • Some sensitivity to angle
• Method is completely knowledge-free
  • (sometimes this is good!)
  • Doesn't know that faces are wrapped around 3D objects (heads)
  • Makes no effort to preserve class distinctions

Happiness subspace (method A)

Disgust subspace (method A)

Facial Expression Recognition Movies

Image Compression

Original Image
• Divide the original 372 × 492 image into patches:
  • Each patch is an instance that contains 12 × 12 pixels on a grid
• Consider each patch as a 144-D vector
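Turning the image into a PCA data matrix is a reshape. A minimal NumPy sketch, assuming non-overlapping 12 × 12 patches, a random placeholder in place of the real image, and hypothetical function names; since 372 = 31 · 12 and 492 = 41 · 12, the grid yields 31 · 41 = 1271 patches:

```python
import numpy as np


def image_to_patches(img, p=12):
    """Cut an H x W image into non-overlapping p x p patches, one p*p vector per column."""
    H, W = img.shape
    assert H % p == 0 and W % p == 0, "image size must be a multiple of the patch size"
    blocks = img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    return blocks.reshape(-1, p * p).T          # shape (p*p, number of patches)


def patches_to_image(P, H, W, p=12):
    """Inverse of image_to_patches."""
    blocks = P.T.reshape(H // p, W // p, p, p).transpose(0, 2, 1, 3)
    return blocks.reshape(H, W)


# Placeholder grayscale image (assumption) with the slide's 372 x 492 size.
img = np.random.default_rng(9).random((372, 492))
P = image_to_patches(img)
print(P.shape)   # (144, 1271): each column is one 144-D patch vector
```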

L2 error vs. PCA dimension
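A curve like this one can be computed from the eigenvalues alone: projecting the centered data onto the top k eigenvectors leaves a mean squared error equal to the sum of the discarded eigenvalues. A minimal NumPy sketch, assuming the plotted "L2 error" is the mean squared reconstruction error, random placeholder data instead of real patches, and a hypothetical function name:

```python
import numpy as np


def reconstruction_errors(X):
    """Mean squared L2 reconstruction error for every PCA dimension k = 0..N.

    X is N x m with one vector (e.g. a 144-D patch) per column.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    evals = np.sort(np.linalg.eigvalsh(Xc @ Xc.T / X.shape[1]))[::-1]
    evals = np.clip(evals, 0.0, None)           # drop tiny negative round-off
    tail = np.cumsum(evals[::-1])[::-1]          # tail[k] = sum of evals[k:]
    return np.append(tail, 0.0)                  # errors for k = 0, 1, ..., N


# Placeholder data (assumption): 2000 random 144-D vectors with decaying variances.
rng = np.random.default_rng(10)
X = rng.normal(size=(144, 2000)) * np.linspace(3.0, 0.1, 144)[:, None]
errs = reconstruction_errors(X)   # errs[60] is the error of a 60-D PCA code
```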

PCA compression: 144-D → 60-D
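A minimal sketch of the 144-D → 60-D scheme, assuming the compressed representation stores the mean patch, the 60 principal vectors, and 60 coefficients per patch (no quantization), with 1271 non-overlapping patches from the 372 × 492 image; `compress` and `decompress` are hypothetical names:

```python
import numpy as np


def compress(P, k):
    """Keep the mean, the top-k principal vectors, and k coefficients per patch."""
    mu = P.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(P - mu, full_matrices=False)
    U_k = U[:, :k]
    return mu, U_k, U_k.T @ (P - mu)            # coefficients: k x n_patches


def decompress(mu, U_k, coeffs):
    """Approximate patches as mean + linear combination of the k principal vectors."""
    return mu + U_k @ coeffs


# Storage arithmetic for the assumed setting: 1271 patches of 144 pixels, k = 60.
n_patches, d, k = 1271, 144, 60
original = n_patches * d                  # 183,024 numbers
compressed = n_patches * k + d * k + d    # coefficients + basis + mean = 85,044
print(compressed / original)              # ≈ 0.465
```

Under these assumptions the compressed form needs a little under half of the original numbers; the reconstruction error for a given k is what the L2 error curve above measures.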

16 most important eigenvectors
[Figure: the 16 leading eigenvectors, each displayed as a 12 × 12 image, in a 4 × 4 grid]
