High Dimensional Data

• So far we've considered scalar data values $f_i$ (or interpolated/approximated each component of vector values individually)
• In many applications, the data is itself in a high dimensional space
  - Or there's no real distinction between dependent (f) and independent (x) variables -- we just have data points
• Assumption: the data is actually organized along a smaller-dimensional manifold
  - i.e. generated from a smaller set of parameters than the number of output variables
• Huge topic: machine learning
• Simplest approach: Principal Components Analysis (PCA)

PCA

• We have n data points from m dimensions: store them as the columns of an m×n matrix A
• We're looking for linear correlations between dimensions
• Roughly speaking, fitting lines or planes or hyperplanes through the origin to the data
  - May want to subtract off the mean value along each dimension for this to make sense

Reduction to 1D

• Assume the data points fit a line through the origin (a 1D subspace)
• In this case, say the line is along the unit vector $u$ (an m-dimensional vector)
• Each data point should be a multiple of $u$ (call the scalar multiples $w_i$): $A_{*i} = u w_i$
• That is, A would be rank-1: $A = u w^T$
• Problem in general: find the rank-1 matrix that best approximates A

The rank-1 problem

• Use the least-squares formulation again:
  $$\min_{u \in \mathbb{R}^m,\ \|u\|=1,\ w \in \mathbb{R}^n} \|A - u w^T\|_F^2$$
• Clean it up: take $w = \sigma v$ with $\sigma \ge 0$ and $\|v\|=1$:
  $$\min_{u \in \mathbb{R}^m,\ \|u\|=1,\ v \in \mathbb{R}^n,\ \|v\|=1,\ \sigma \ge 0} \|A - u \sigma v^T\|_F^2$$
• $u$ and $v$ are the first principal components of A

Solving the rank-1 problem

• Remember the trace version of the Frobenius norm:
  $$\|A - u\sigma v^T\|_F^2 = \operatorname{tr}\big((A - u\sigma v^T)^T (A - u\sigma v^T)\big)$$
  $$= \operatorname{tr}(A^T A) - \operatorname{tr}(A^T u\sigma v^T) - \operatorname{tr}(v\sigma u^T A) + \operatorname{tr}(v\sigma u^T u\sigma v^T)$$
  $$= \operatorname{tr}(A^T A) - 2\sigma\, u^T A v + \sigma^2$$
• Minimize with respect to $\sigma$ first:
  $$\frac{\partial}{\partial \sigma}\|A - u\sigma v^T\|_F^2 = 0 \;\Rightarrow\; -2 u^T A v + 2\sigma = 0 \;\Rightarrow\; \sigma = u^T A v$$
• Then plug in to get a problem for $u$ and $v$:
  $$\min\ -(u^T A v)^2 \quad\Longleftrightarrow\quad \max\ (u^T A v)^2$$

Finding u

• First look at $u$:
  $$(u^T A v)^2 = u^T A v v^T A^T u \le u^T A A^T u$$
  (since $v v^T \preceq I$ for a unit vector $v$)
• $AA^T$ is symmetric, thus has a complete set of orthonormal eigenvectors $X_i$ with eigenvalues $\mu_i$
• Write $u$ in this basis: $u = \sum_{i=1}^m \hat{u}_i X_i$
• Then the quantity we're maximizing is
  $$u^T A A^T u = \Big(\sum_{i=1}^m \hat{u}_i X_i\Big)^T \Big(\sum_{i=1}^m \mu_i \hat{u}_i X_i\Big) = \sum_{i=1}^m \mu_i \hat{u}_i^2$$
• Obviously pick $u$ to be the eigenvector with the largest eigenvalue
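As a sanity check on the rank-1 result, here is a minimal NumPy sketch (the synthetic data, noise level, and variable names are illustrative assumptions, not from the slides): it builds a nearly rank-1 data matrix, takes $u$ as the top eigenvector of $AA^T$, recovers $\sigma$ and $v$ from the least-squares condition $w = A^T u$, and compares $\sigma$ against the largest singular value of A.

```python
import numpy as np

# Synthetic example: n points in m dimensions, stored as columns of an m x n
# matrix A, roughly aligned with a single direction plus noise.
rng = np.random.default_rng(0)
m, n = 5, 200
direction = rng.standard_normal(m)
direction /= np.linalg.norm(direction)
weights = rng.standard_normal(n)
A = np.outer(direction, weights) + 0.05 * rng.standard_normal((m, n))
A -= A.mean(axis=1, keepdims=True)          # subtract the per-dimension mean first

# Best rank-1 fit: u is the eigenvector of A A^T with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(A @ A.T)  # eigh: symmetric eigenproblem
u = eigvecs[:, np.argmax(eigvals)]

# Given u, the optimal coefficients are w = A^T u (same condition as W = A^T U
# later), and sigma = u^T A v is just the norm of that coefficient vector.
w = A.T @ u
sigma = np.linalg.norm(w)
v = w / sigma

residual = np.linalg.norm(A - sigma * np.outer(u, v), 'fro')
print("sigma =", sigma)
print("largest singular value =", np.linalg.svd(A, compute_uv=False)[0])
print("rank-1 residual (Frobenius):", residual)
```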
Finding v

• Write the thing we're maximizing as:
  $$(u^T A v)^2 = v^T A^T u u^T A v \le v^T A^T A v$$
• The same argument gives $v$ as the eigenvector corresponding to the maximum eigenvalue of $A^T A$
• Note we also have
  $$\sigma^2 = (u^T A v)^2 = \lambda_{\max}(A A^T) = \lambda_{\max}(A^T A) = \|A\|_2^2$$

Generalizing

• In general, if we expect the problem to have subspace dimension k, we want the closest rank-k matrix to A
  - That is, express the data points as linear combinations of a set of k basis vectors (plus error)
  - We want the optimal set of basis vectors and the optimal linear combinations (see the sketch after these slides):
  $$\min_{U \in \mathbb{R}^{m\times k},\ U^T U = I,\ W \in \mathbb{R}^{n\times k}} \|A - U W^T\|_F^2$$

Finding W

• Take the same approach as before:
  $$\|A - U W^T\|_F^2 = \operatorname{tr}\big((A - U W^T)^T(A - U W^T)\big) = \operatorname{tr}(A^T A) - 2\operatorname{tr}(W U^T A) + \operatorname{tr}(W U^T U W^T)$$
  $$= \|A\|_F^2 - 2\operatorname{tr}(W U^T A) + \|W\|_F^2$$
• Set the gradient w.r.t. W equal to zero:
  $$-2 A^T U + 2 W = 0 \;\Rightarrow\; W = A^T U$$

Finding U

• Plugging in $W = A^T U$ we get
  $$\|A - U W^T\|_F^2 = \|A\|_F^2 - 2\operatorname{tr}(A^T U U^T A) + \operatorname{tr}(A^T U U^T U U^T A) = \|A\|_F^2 - \operatorname{tr}(U^T A A^T U)$$
  (using $U^T U = I$), so minimizing is equivalent to
  $$\max_U\ \operatorname{tr}(U^T A A^T U)$$
• $AA^T$ is symmetric, hence has a complete set of orthonormal eigenvectors, say the columns of X, and eigenvalues along the diagonal of M (sorted in decreasing order): $AA^T = X M X^T$

Finding U cont'd

• Our problem is now:
  $$\max_{U^T U = I}\ \operatorname{tr}(U^T X M X^T U)$$
• Note X is orthogonal and U has orthonormal columns, so $X^T U$ does too; call it Z:
  $$\max_{Z^T Z = I}\ \operatorname{tr}(Z^T M Z) = \max_{Z^T Z = I}\ \sum_{i=1}^k \sum_{j=1}^m \mu_j Z_{ji}^2$$
• Simplest solution: set $Z = (I \;\; 0)^T$, which means that U is the first k columns of X (the first k eigenvectors of $AA^T$)

Back to W

• We can write $W = V \Sigma^T$ for a V with orthonormal columns and a square k×k matrix $\Sigma$
• The same argument as for U gives that V should be the first k eigenvectors of $A^T A$
• What is $\Sigma$?
• From the earlier rank-1 case we know $\Sigma_{11} = \sigma = \|A\|_2 = \|A^T\|_2$
• Since $U_{*1}$ and $V_{*1}$ are unit vectors that achieve the 2-norm of $A^T$ and $A$, we can derive that the first row and column of $\Sigma$ are zero except for the diagonal entry
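To connect the rank-k construction above with the best-approximation claim, here is a small NumPy sketch (the synthetic data, dimensions, and names are illustrative assumptions): it forms U from the first k eigenvectors of $AA^T$, sets $W = A^T U$, and checks that the residual matches the truncated-SVD residual.

```python
import numpy as np

# Illustrative check of the rank-k result: U = first k eigenvectors of A A^T,
# W = A^T U, and A ~ U W^T should match the best rank-k approximation from the SVD.
rng = np.random.default_rng(1)
m, n, k = 6, 50, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 0.01 * rng.standard_normal((m, n))           # nearly rank-k data

eigvals, X = np.linalg.eigh(A @ A.T)               # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]                  # sort descending
U = X[:, order[:k]]                                # first k eigenvectors of A A^T
W = A.T @ U                                        # optimal coefficients for this U

# Compare against the truncated SVD.
Us, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (Us[:, :k] * s[:k]) @ Vt[:k, :]

print(np.linalg.norm(A - U @ W.T, 'fro'))          # error of the eigenvector construction
print(np.linalg.norm(A - A_k, 'fro'))              # error of the truncated SVD (should agree)
```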
What is Σ

• Subtract the rank-1 matrix $U_{*1} \Sigma_{11} V_{*1}^T$ from A
  - this zeros out the matching eigenvalue of $A^T A$ and $AA^T$
• Then we can understand the next part of $\Sigma$ in the same way
• End up with $\Sigma$ a diagonal matrix containing the square roots of the first k eigenvalues of $AA^T$ or $A^T A$ (they're equal)
• Gives a formula for A as a sum of rank-1 matrices:
  $$A = \sum_i \sigma_i u_i v_i^T$$

The Singular Value Decomposition

• Going all the way to k = m (or n) we get the Singular Value Decomposition (SVD) of A
• $A = U \Sigma V^T$
• The diagonal entries of $\Sigma$ are called the singular values
• The columns of U (eigenvectors of $AA^T$) are the left singular vectors
• The columns of V (eigenvectors of $A^T A$) are the right singular vectors

Cool things about the SVD

• 2-norm: $\|A\|_2 = \sigma_1$
• Frobenius norm: $\|A\|_F^2 = \sigma_1^2 + \cdots + \sigma_n^2$
• Rank(A) = number of nonzero singular values
  - Can make a sensible numerical estimate of rank by deciding which tiny singular values count as zero
• Null(A) is spanned by the columns of V for the zero singular values
• Range(A) is spanned by the columns of U for the nonzero singular values
• For invertible A:
  $$A^{-1} = V \Sigma^{-1} U^T = \sum_{i=1}^n \frac{1}{\sigma_i} v_i u_i^T$$

Least Squares with SVD

• Define the pseudo-inverse for a general A:
  $$A^+ = V \Sigma^+ U^T = \sum_{\sigma_i > 0} \frac{1}{\sigma_i} v_i u_i^T$$
• Note if $A^T A$ is invertible, $A^+ = (A^T A)^{-1} A^T$
  - i.e. it solves the least squares problem
• If $A^T A$ is singular, the pseudo-inverse is still defined: $A^+ b$ is the x that minimizes $\|b - Ax\|_2$ and, of all those that do so, has the smallest $\|x\|_2$

Solving Eigenproblems

• Computing the SVD is another matter!
• We can get U and V by solving the symmetric eigenproblem for $AA^T$ or $A^T A$, but more specialized methods are more accurate
• The unsymmetric eigenproblem is another related computation, with complications:
  - May involve complex numbers even if A is real
  - If A is not normal ($AA^T \ne A^T A$), it doesn't have a full basis of eigenvectors
  - Eigenvectors may not be orthogonal… Schur decomposition
• Also: finding the eigenvalues of an n×n matrix is equivalent to solving a degree-n polynomial
  - No "analytic" solution in general for $n \ge 5$
  - Thus general algorithms are iterative
• We'll examine the symmetric problem in more detail

The Symmetric Eigenproblem

• Assume A is symmetric and real
• Find an orthogonal matrix V and a diagonal matrix D s.t. $AV = VD$
  - Diagonal entries of D are the eigenvalues; the corresponding columns of V are the eigenvectors
• Also put: $A = V D V^T$ or $V^T A V = D$
• There are a few strategies
  - More if you only care about a few eigenpairs, not the complete set…
• Generalized problem: $Ax = \lambda B x$
• LAPACK provides routines for all these
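The symmetric eigenproblem maps directly onto LAPACK-backed library calls; below is a minimal sketch using numpy.linalg.eigh (the example matrix is an arbitrary illustration, not from the slides), verifying the factorizations stated above.

```python
import numpy as np

# Sketch of the symmetric eigenproblem A = V D V^T via numpy.linalg.eigh.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B + B.T                                  # symmetrize to get a real symmetric A

eigvals, V = np.linalg.eigh(A)               # eigenvalues ascending, V orthogonal
D = np.diag(eigvals)

print(np.allclose(A @ V, V @ D))             # A V = V D
print(np.allclose(V.T @ A @ V, D))           # V^T A V = D
print(np.allclose(V.T @ V, np.eye(4)))       # eigenvectors are orthonormal
```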
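Going back to the least-squares-with-SVD slide, here is a minimal sketch of the pseudo-inverse solve (the function name pinv_solve, the example system, and the tolerance heuristic are assumptions for illustration): it inverts the nonzero singular values, zeros the rest, and returns the minimum-norm least-squares solution.

```python
import numpy as np

# A^+ = V Sigma^+ U^T, where Sigma^+ inverts nonzero singular values and zeros
# the rest. The relative tolerance is one common heuristic for "numerically zero".
def pinv_solve(A, b, rtol=1e-12):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    tol = rtol * s[0]                        # treat tiny singular values as zero
    s_inv = np.where(s > tol, 1.0 / s, 0.0)  # diagonal of Sigma^+
    return Vt.T @ (s_inv * (U.T @ b))        # x = V Sigma^+ U^T b

# Example: a rank-deficient 4x3 system (third column = sum of the first two).
# pinv_solve returns the least-squares solution of minimum 2-norm, matching
# numpy's built-in pseudo-inverse.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
b = np.array([1., 2., 3., 4.])
print(pinv_solve(A, b))
print(np.linalg.pinv(A) @ b)   # should agree
```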