Dimension Reduction using PCA and SVD
Plan of Class
• Starting the machine learning part of the course.
• Based on linear algebra.
• If your linear algebra is rusty, check out the pages under "Resources/Linear Algebra".
• This class will be all theory.
• The next class will be on doing PCA in Spark.
• HW3 will open on Friday and be due the following Friday.
Dimensionality reduction
Why reduce the number of features in a data set?
1. It reduces storage and computation time.
2. High-dimensional data often has a lot of redundancy.
3. It removes noisy or irrelevant features.
Example: are all the pixels in an image equally informative?
A 28 × 28 image has 784 pixels, so each image is a vector x ∈ R^784.
If we were to choose a few pixels to discard, which would be the prime candidates? Those with the lowest variance...
Eliminating low-variance coordinates
Example: MNIST. What fraction of the total variance is contained in the 100 (or 200, or 300) coordinates with the lowest variance?
We can easily drop 300-400 pixels... Can we eliminate more?
Yes! By using features that are combinations of pixels instead of single pixels.
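A minimal numpy sketch of this check, assuming X is an n × 784 array of flattened MNIST images (the data loading is not shown here):

```python
import numpy as np

def low_variance_fraction(X, num_coords):
    """Fraction of the total variance held by the num_coords lowest-variance pixels."""
    pixel_var = X.var(axis=0)                 # variance of each of the 784 pixels
    lowest = np.sort(pixel_var)[:num_coords]  # the num_coords smallest variances
    return lowest.sum() / pixel_var.sum()

# Example usage (with hypothetical data in place of MNIST):
# X = np.random.rand(1000, 784)
# for k in (100, 200, 300):
#     print(k, low_variance_fraction(X, k))
```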
Covariance (a quick review)
Suppose X has mean µ_X and Y has mean µ_Y.
• Covariance: cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − µ_X µ_Y.
• It is maximized when X = Y, in which case it equals var(X). In general, its magnitude is at most std(X) std(Y).
Covariance: example 1
cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − µ_X µ_Y

  x    y   Pr(x, y)
 −1   −1     1/3
 −1    1     1/6
  1   −1     1/3
  1    1     1/6

µ_X = 0, µ_Y = −1/3, var(X) = 1, var(Y) = 8/9, cov(X, Y) = 0.
In this case, X and Y are independent. Independent variables always have zero covariance.
Covariance: example 2
cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − µ_X µ_Y

  x     y   Pr(x, y)
 −1   −10     1/6
 −1    10     1/3
  1   −10     1/3
  1    10     1/6

µ_X = 0, µ_Y = 0, var(X) = 1, var(Y) = 100, cov(X, Y) = −10/3.
In this case, X and Y are negatively correlated.
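A small sketch that verifies both examples directly from their joint distribution tables (just a numerical check of the formula above; the helper name is my own):

```python
import numpy as np

def moments(table):
    """table: list of (x, y, p) triples describing a joint distribution."""
    xs, ys, ps = (np.array(col, dtype=float) for col in zip(*table))
    mu_x, mu_y = (ps * xs).sum(), (ps * ys).sum()
    var_x = (ps * xs**2).sum() - mu_x**2
    var_y = (ps * ys**2).sum() - mu_y**2
    cov = (ps * xs * ys).sum() - mu_x * mu_y
    return mu_x, mu_y, var_x, var_y, cov

example1 = [(-1, -1, 1/3), (-1, 1, 1/6), (1, -1, 1/3), (1, 1, 1/6)]
example2 = [(-1, -10, 1/6), (-1, 10, 1/3), (1, -10, 1/3), (1, 10, 1/6)]
print(moments(example1))  # (0.0, -0.333..., 1.0, 0.888..., 0.0)
print(moments(example2))  # (0.0, 0.0, 1.0, 100.0, -3.333...)
```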
Example: MNIST
Approximate a digit from class j as the class average plus k corrections:
x ≈ µ_j + Σ_{i=1}^{k} a_i v_{j,i}
• µ_j ∈ R^784 is the class mean vector.
• v_{j,1}, . . . , v_{j,k} are the principal directions.
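A hedged sketch of this approximation, assuming X_j is an n × 784 array of images from class j; here the principal directions are taken to be the top eigenvectors of the class covariance matrix, which the later slides develop in detail:

```python
import numpy as np

def class_pca_approximation(X_j, x, k):
    """Approximate image x as the class mean of X_j plus k principal corrections."""
    mu_j = X_j.mean(axis=0)                   # class mean vector in R^784
    Sigma = np.cov(X_j, rowvar=False)         # class covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending eigenvalues
    V = eigvecs[:, ::-1][:, :k]               # top-k principal directions as columns
    a = V.T @ (x - mu_j)                      # correction coefficients a_1, ..., a_k
    return mu_j + V @ a                       # mu_j + sum_i a_i v_{j,i}
```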
The effect of correlation
Suppose we wanted just one feature for the following data.
(Figure: a 2-d scatter of correlated data, with an arrow marking the chosen feature.) This is the direction of maximum variance.
Two types of projection
• Projection onto a 1-d line in R^2 (the result is still a point in R^2).
• Projection onto R (the result is a single number).
Projection: formally
What is the projection of x ∈ R^p onto a direction u ∈ R^p (where ‖u‖ = 1)?
• As a one-dimensional value: x · u = u · x = u^T x = Σ_{i=1}^{p} u_i x_i.
• As a p-dimensional vector: (x · u) u = u u^T x, i.e. "move x · u units in direction u".
What is the projection of x = (2, 3) onto the following directions?
• The coordinate direction e_1? Answer: 2.
• The direction (1, −1)/√2? Answer: −1/√2.
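A short numpy sketch of both views of the projection, reproducing the two answers above (the function names are my own):

```python
import numpy as np

def project_scalar(x, u):
    """Projection of x onto the unit direction u, as a single number x . u."""
    return float(x @ u)

def project_vector(x, u):
    """Projection of x onto the unit direction u, as the vector (x . u) u."""
    return (x @ u) * u

x = np.array([2.0, 3.0])
e1 = np.array([1.0, 0.0])
u = np.array([1.0, -1.0]) / np.sqrt(2)

print(project_scalar(x, e1))  # 2.0
print(project_scalar(x, u))   # -0.7071...  (i.e. -1/sqrt(2))
print(project_vector(x, u))   # [-0.5  0.5]
```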
Matrix notation I
A notation that allows a simple representation of multiple projections.
A vector v ∈ R^d can be represented, in matrix notation, as
• a column vector: v = (v_1, v_2, . . . , v_d)^T, or
• a row vector: v^T = (v_1  v_2  · · ·  v_d).
Matrix notation II
By convention, an inner product is represented by a row vector followed by a column vector:
u^T v = (u_1  u_2  · · ·  u_d) (v_1, v_2, . . . , v_d)^T = Σ_{i=1}^{d} u_i v_i.
A column vector followed by a row vector represents an outer product, which is a matrix:
v u^T, with v ∈ R^n and u ∈ R^m, is the n × m matrix whose (i, j) entry is v_i u_j.
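A tiny numpy illustration of the two conventions (hypothetical vectors):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Inner product: row vector times column vector gives a single number.
print(u @ v)           # 32.0  (= 1*4 + 2*5 + 3*6)

# Outer product: column vector times row vector gives a matrix.
print(np.outer(v, u))  # 3 x 3 matrix whose (i, j) entry is v[i] * u[j]
```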
Projection onto multiple directions
We want to project x ∈ R^p into the k-dimensional subspace defined by vectors u_1, . . . , u_k ∈ R^p. This is easiest when the u_i are orthonormal:
• They each have length one.
• They are at right angles to each other: u_i · u_j = 0 whenever i ≠ j.
Let U^T be the k × p matrix whose rows are u_1, . . . , u_k. Then the projection, as a k-dimensional vector, is
(x · u_1, x · u_2, . . . , x · u_k) = U^T x.
As a p-dimensional vector, the projection is
(x · u_1) u_1 + (x · u_2) u_2 + · · · + (x · u_k) u_k = U U^T x.
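A brief numpy sketch of both forms, using two hypothetical orthonormal directions in R^3:

```python
import numpy as np

u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])
U = np.column_stack([u1, u2])   # p x k matrix whose columns are the u_i

x = np.array([3.0, 1.0, 2.0])

proj_k = U.T @ x                # k-dimensional projection: (x.u1, x.u2)
proj_p = U @ U.T @ x            # p-dimensional projection: (x.u1) u1 + (x.u2) u2

print(proj_k)  # [2.828... 2.0]
print(proj_p)  # [2. 2. 2.]
```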
Projection onto multiple directions: example
Suppose data are in R^4 and we want to project onto the first two coordinates.
Take vectors u_1 = (1, 0, 0, 0)^T and u_2 = (0, 1, 0, 0)^T (notice: orthonormal).
Then U^T is the 2 × 4 matrix with rows (1 0 0 0) and (0 1 0 0).
The projection of x ∈ R^4, as a 2-d vector, is U^T x = (x_1, x_2).
The projection of x as a 4-d vector is U U^T x = (x_1, x_2, 0, 0)^T.
But we'll generally project along non-coordinate directions.
The best single direction
Suppose we need to map our data x ∈ R^p into just one dimension:
x ↦ u · x  for some unit direction u ∈ R^p.
What is the direction u of maximum variance?
Theorem: Let Σ be the p × p covariance matrix of X. The variance of X in direction u is given by u^T Σ u.
• Suppose the mean of X is µ ∈ R^p. The projection u^T X has mean E(u^T X) = u^T E(X) = u^T µ.
• The variance of u^T X is
var(u^T X) = E(u^T X − u^T µ)^2 = E(u^T (X − µ)(X − µ)^T u) = u^T E[(X − µ)(X − µ)^T] u = u^T Σ u.
Another theorem: u^T Σ u is maximized by setting u to the first eigenvector of Σ. The maximum value is the corresponding eigenvalue.
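A quick numerical check of both statements on synthetic data (the covariance values below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.0], [1.0, 3.0]], size=50_000)

Sigma = np.cov(X, rowvar=False)
u = np.array([1.0, -1.0]) / np.sqrt(2)    # some unit direction

# Variance of the projected data agrees with the quadratic form u^T Sigma u.
print(np.var(X @ u), u @ Sigma @ u)

# The first eigenvector of Sigma attains the maximum variance, equal to the top eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order
top = eigvecs[:, -1]
print(top @ Sigma @ top, eigvals[-1])
```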
Best single direction: example
The direction of maximum variance shown in the figure is the first eigenvector of the 2 × 2 covariance matrix of the data.
The best k-dimensional projection
Let Σ be the p × p covariance matrix of X. Its eigendecomposition can be computed in O(p^3) time and consists of:
• real eigenvalues λ_1 ≥ λ_2 ≥ · · · ≥ λ_p
• corresponding eigenvectors u_1, . . . , u_p ∈ R^p that are orthonormal: each u_i has unit length and u_i · u_j = 0 whenever i ≠ j.
Theorem: Suppose we want to map data X ∈ R^p to just k dimensions, while capturing as much of the variance of X as possible. The best choice of projection is
x ↦ (u_1 · x, u_2 · x, . . . , u_k · x),
where the u_i are the eigenvectors described above.
Projecting the data in this way is principal component analysis (PCA).
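A minimal from-scratch PCA sketch along these lines: eigendecomposition of the covariance matrix, then projection onto the top-k eigenvectors. The data matrix X is assumed, and the rows are centered at the mean before projecting, which is standard practice even though the slide writes the projection simply as u_i · x:

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k eigenvectors of the covariance matrix."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # real eigenvalues (ascending), orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]         # reorder so that lambda_1 >= lambda_2 >= ...
    U = eigvecs[:, order[:k]]                 # p x k matrix of the top-k eigenvectors
    return (X - mu) @ U, U, mu                # k-dim coordinates, directions, mean

# Example usage (hypothetical data in place of MNIST):
# X = np.random.rand(500, 784)
# Z, U, mu = pca_project(X, k=50)
```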
Example: MNIST
Contrast coordinate projections with PCA (figure not reproduced here).
MNIST: image reconstruction
Reconstruct the original image from its PCA projection to k dimensions (shown for k = 200, 150, 100, 50).
Q: What are these reconstructions exactly?
A: Image x is reconstructed as U U^T x, where U is a p × k matrix whose columns are the top k eigenvectors of Σ.
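A hedged sketch of the reconstruction step, reusing the hypothetical pca_project helper from the earlier sketch. The slide writes the reconstruction as U U^T x; with mean-centering it becomes µ + U U^T (x − µ):

```python
import numpy as np

def reconstruct(x, U, mu):
    """Reconstruct image x from its projection onto the columns of U (plus the mean)."""
    return mu + U @ (U.T @ (x - mu))

# Example usage for several values of k (hypothetical data and helper):
# for k in (200, 150, 100, 50):
#     Z, U, mu = pca_project(X, k)
#     x_hat = reconstruct(X[0], U, mu)
```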
What are eigenvalues and eigenvectors?
There are several steps to understanding these.
1. Any matrix M defines a function (or transformation) x ↦ Mx.
2. If M is a p × q matrix, then this transformation maps a vector x ∈ R^q to the vector Mx ∈ R^p.
3. We call it a linear transformation because M(x + x') = Mx + Mx'.
4. We'd like to understand the nature of these transformations. The easiest case is when M is diagonal, for example M = diag(2, −1, 10), which maps x = (x_1, x_2, x_3) to Mx = (2x_1, −x_2, 10x_3). In this case, M simply scales each coordinate separately.
5. What about more general matrices that are symmetric but not necessarily diagonal? They also just scale coordinates separately, but in a different coordinate system.
Eigenvalue and eigenvector: definition
Let M be a p × p matrix. We say u ∈ R^p is an eigenvector if M maps u onto the same direction, that is, Mu = λu for some scaling constant λ. This λ is the eigenvalue associated with u.
Question: What are the eigenvectors and eigenvalues of M = diag(2, −1, 10)?
Answer: Eigenvectors e_1, e_2, e_3, with corresponding eigenvalues 2, −1, 10. Notice that these eigenvectors form an orthonormal basis.
Eigenvectors of a real symmetric matrix
Theorem. Let M be any real symmetric p × p matrix. Then M has
• p real eigenvalues λ_1, . . . , λ_p
• corresponding eigenvectors u_1, . . . , u_p ∈ R^p that are orthonormal.
We can think of u_1, . . . , u_p as the axes of the natural coordinate system for understanding M.
Example: consider the 2 × 2 matrix M with rows (3, 1) and (1, 3).
It has eigenvectors u_1 = (1, 1)/√2 and u_2 = (−1, 1)/√2, with corresponding eigenvalues λ_1 = 4 and λ_2 = 2. (Check.)
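A quick numpy check of this example, carrying out the "(Check)" on the slide:

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(M)  # for symmetric matrices; eigenvalues in ascending order
print(eigvals)                        # [2. 4.]
print(eigvecs)                        # columns are orthonormal eigenvectors (up to sign)

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
print(M @ u1, 4 * u1)                 # equal: M u1 = 4 u1
print(M @ u2, 2 * u2)                 # equal: M u2 = 2 u2
```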