Lecture 2. Random Matrix Theory and Phase Transitions of PCA
Yuan Yao
Hong Kong University of Science and Technology
February 26, 2020
Outline
Recall: Horn’s Parallel Analysis of PCA
Random Matrix Theory
Phase Transitions of PCA
How many components of PCA?
◮ Data matrix: X = [x_1 | x_2 | · · · | x_n] ∈ R^{p×n}
◮ Centered data matrix: Y = XH, where H = I − (1/n) 1·1^T
◮ PCA is given by the top left singular vectors u_j of Y = USV^T (called loading vectors), with component scores given by the projections z_j = u_j^T Y
◮ MDS is given by the top right singular vectors of Y = USV^T, used as Euclidean embedding coordinates of the n sample points
◮ But how many components shall we keep?
Recall: Horn’s Parallel Analysis
◮ Data matrix: X = [x_1 | x_2 | · · · | x_n] ∈ R^{p×n},

    X = [ X_{1,1}  X_{1,2}  · · ·  X_{1,n}
          X_{2,1}  X_{2,2}  · · ·  X_{2,n}
            ...      ...    ...      ...
          X_{p,1}  X_{p,2}  · · ·  X_{p,n} ]

◮ Compute its principal eigenvalues {λ̂_i}_{i=1,...,p}
Recall: Horn’s Parallel Analysis
◮ Randomly take p permutations of the n numbers, π_1, ..., π_p ∈ S_n (usually π_1 is set to the identity); note that the sample means are permutation invariant. Form the row-wise permuted matrix

    X^1 = [ X_{1,π_1(1)}  X_{1,π_1(2)}  · · ·  X_{1,π_1(n)}
            X_{2,π_2(1)}  X_{2,π_2(2)}  · · ·  X_{2,π_2(n)}
              ...            ...        ...       ...
            X_{p,π_p(1)}  X_{p,π_p(2)}  · · ·  X_{p,π_p(n)} ]

◮ Compute its principal eigenvalues {λ̂^1_i}_{i=1,...,p}.
◮ Repeating this procedure r times yields r sets of principal eigenvalues {λ̂^k_i}_{i=1,...,p}, k = 1, ..., r.
Recall: Horn’s Parallel Analysis (continued)
◮ For each i = 1, ..., p, define the i-th p-value as the fraction of random eigenvalues {λ̂^k_i}_{k=1,...,r} that exceed the i-th principal eigenvalue λ̂_i of the original data X:

    pval_i = (1/r) #{ λ̂^k_i > λ̂_i : k = 1, ..., r }.

◮ Set a threshold q, e.g. q = 0.05, and keep only those principal eigenvalues λ̂_i such that pval_i < q.
Example
◮ Let’s look at an example of Parallel Analysis
  – R: https://github.com/yuany-pku/2017_CSIC5011/blob/master/slides/paran.R
  – Matlab: papca.m
  – Python:
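The Python entry above is left blank; the permutation procedure of the previous slides can be sketched in NumPy as follows (a minimal sketch: the function name `parallel_analysis` and its defaults are illustrative, not the course's `paran.R`/`papca.m` code).

```python
import numpy as np

def parallel_analysis(X, r=100, q=0.05, seed=0):
    """Horn's parallel analysis: keep the i-th principal eigenvalue if
    fewer than a fraction q of the permutation-null i-th eigenvalues
    exceed it.  X is a p-by-n data matrix (rows = variables)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    Y = X - X.mean(axis=1, keepdims=True)              # center each row
    lam = np.sort(np.linalg.eigvalsh(Y @ Y.T / n))[::-1]
    counts = np.zeros(p)
    for _ in range(r):
        # permute each row independently: marginals kept, correlations destroyed
        Xp = np.stack([row[rng.permutation(n)] for row in X])
        Yp = Xp - Xp.mean(axis=1, keepdims=True)
        lam_null = np.sort(np.linalg.eigvalsh(Yp @ Yp.T / n))[::-1]
        counts += lam_null > lam                       # i-th null vs i-th observed
    pvals = counts / r
    return lam, pvals, int(np.sum(pvals < q))
```

For a strong rank-one spike this keeps a single component, since the permuted matrices preserve only the per-variable variances and not the cross-variable correlations carrying the signal.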
How does it work?
◮ We are going to introduce an analysis based on Random Matrix Theory for the rank-one spike model.
◮ There is a phase transition in principal component analysis:
  – If the signal is strong, the principal eigenvalue goes beyond the random spectrum and the principal component is correlated with the signal.
  – If the signal is weak, all eigenvalues in PCA are due to random noise.
Outline
Recall: Horn’s Parallel Analysis of PCA
Random Matrix Theory
Phase Transitions of PCA
Marčenko-Pastur Distribution of Noise Eigenvalues
◮ Let x_i ∼ N(0, I_p) (i = 1, ..., n) and X = [x_1, x_2, ..., x_n] ∈ R^{p×n}.
◮ The sample covariance matrix Σ̂_n = (1/n) XX^T is called a Wishart (random) matrix.
◮ When both n and p grow with p/n → γ ≠ 0, the distribution of the eigenvalues of Σ̂_n follows the Marčenko-Pastur (MP) law

    μ_MP(dt) = (1 − 1/γ) δ(t) I(γ > 1) + [√((b − t)(t − a)) / (2πγt)] I(t ∈ [a, b]) dt,

where a = (1 − √γ)^2 and b = (1 + √γ)^2.
Illustration of MP Law
◮ If γ ≤ 1, the MP distribution is supported on [a, b];
◮ if γ > 1, it has an additional point mass of weight 1 − 1/γ at the origin.

Figure (plotted in Matlab): (a) Marčenko-Pastur distribution with γ = 2. (b) Marčenko-Pastur distribution with γ = 0.5.
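The MP support can be checked by a small simulation (a sketch; the sizes p = 200, n = 400 giving γ = 0.5, as in panel (b), are illustrative):

```python
import numpy as np

# Empirical check of the MP law: for X a p-by-n standard Gaussian matrix,
# the eigenvalues of the Wishart matrix (1/n) X X^T fill the interval
# [a, b] = [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2] when gamma = p/n <= 1.
rng = np.random.default_rng(0)
p, n = 200, 400
gamma = p / n
X = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(X @ X.T / n)     # ascending eigenvalues
a = (1 - np.sqrt(gamma)) ** 2
b = (1 + np.sqrt(gamma)) ** 2
print(eigs.min(), eigs.max())              # edges approach a and b as p, n grow
```

A histogram of `eigs` against the MP density √((b−t)(t−a))/(2πγt) reproduces panel (b) of the figure.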
Outline
Recall: Horn’s Parallel Analysis of PCA
Random Matrix Theory
Phase Transitions of PCA
Rank-one Spike Model
Consider the following rank-1 signal-plus-noise model:

    Y = X + ε,

where
◮ the signal lies in a one-dimensional subspace, X = αu with α ∼ N(0, σ_X^2);
◮ the noise ε ∼ N(0, σ_ε^2 I_p) is i.i.d. Gaussian.

Therefore Y ∼ N(0, Σ), where the covariance matrix Σ is a rank-one matrix plus a scaled identity:

    Σ = σ_X^2 uu^T + σ_ε^2 I_p.
When does PCA work?
◮ Can we recover the signal direction u by principal component analysis on the noisy measurements Y?
◮ It depends on the signal-to-noise ratio, defined as

    SNR = R := σ_X^2 / σ_ε^2.

For simplicity we assume σ_ε^2 = 1 without loss of generality.
Phase Transition of PCA
◮ Consider the scenario

    γ = lim_{p,n→∞} p/n,   (1)

since in applications one never has an infinite number of samples relative to the dimensionality.
◮ A fundamental result by I. Johnstone in 2006 shows a phase transition of PCA:
Phase Transitions
◮ The primary (largest) eigenvalue of the sample covariance matrix satisfies

    λ_max(Σ̂_n) → (1 + √γ)^2 = b,             if σ_X^2 ≤ √γ,
                  (1 + σ_X^2)(1 + γ/σ_X^2),   if σ_X^2 > √γ.   (2)

◮ The primary eigenvector (principal component) associated with the largest eigenvalue satisfies

    |⟨u, v_max⟩|^2 → 0,                                if σ_X^2 ≤ √γ,
                     (1 − γ/σ_X^4) / (1 + γ/σ_X^2),    if σ_X^2 > √γ.   (3)
Phase Transitions (continued)
In other words,
◮ If the signal is strong, SNR = σ_X^2 > √γ, the primary eigenvalue goes beyond the random spectrum (the upper bound b of the MP distribution), and the primary eigenvector is correlated with the signal (it lies in a cone around the signal direction whose deviation angle goes to 0 as σ_X^2/γ → ∞);
◮ If the signal is weak, SNR = σ_X^2 ≤ √γ, the primary eigenvalue is buried in the random spectrum, and the primary eigenvector is random, with no correlation with the signal.
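Both regimes can be checked by simulation (a sketch: the helper `top_eig` and all parameter values are illustrative, and finite p, n only approximate the limits in (2) and (3)):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 400, 1600                 # gamma = 0.25, threshold sqrt(gamma) = 0.5
gamma = p / n
u = np.ones(p) / np.sqrt(p)      # unit signal direction

def top_eig(sigma2_X):
    """Top eigenvalue of the sample covariance and its squared overlap
    with u, for the rank-one spike model Y = alpha u + noise."""
    alpha = np.sqrt(sigma2_X) * rng.standard_normal(n)
    Y = np.outer(u, alpha) + rng.standard_normal((p, n))
    w, V = np.linalg.eigh(Y @ Y.T / n)      # ascending order
    return w[-1], (u @ V[:, -1]) ** 2

b = (1 + np.sqrt(gamma)) ** 2               # bulk edge = 2.25
lam_s, cos2_s = top_eig(2.0)    # strong: sigma_X^2 = 2   > sqrt(gamma)
lam_w, cos2_w = top_eig(0.2)    # weak:   sigma_X^2 = 0.2 < sqrt(gamma)
# predictions from (2) and (3) in the strong case:
pred_lam = (1 + 2.0) * (1 + gamma / 2.0)               # = 3.375
pred_cos2 = (1 - gamma / 2.0 ** 2) / (1 + gamma / 2.0) # ~ 0.833
```

In the strong case the top eigenvalue pops out of the bulk near 3.375 with a large overlap with u; in the weak case it sticks to the bulk edge b = 2.25 and the overlap is negligible.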
Proof in Sketch
◮ Following the rank-1 model, consider random vectors y_i ∼ N(0, Σ) (i = 1, ..., n), where Σ = σ_X^2 uu^T + σ_ε^2 I_p and u is an arbitrarily chosen unit vector (‖u‖_2 = 1) indicating the signal direction.
◮ The sample covariance matrix is Σ̂_n = (1/n) Σ_{i=1}^n y_i y_i^T = (1/n) YY^T, where Y = [y_1, ..., y_n] ∈ R^{p×n}. Suppose one of its eigenvalues is λ̂ with corresponding unit eigenvector v̂, so that Σ̂_n v̂ = λ̂ v̂.
◮ First of all, we relate λ̂ to the MP distribution by the whitening trick:

    z_i = Σ^{−1/2} y_i ∼ N(0, I_p).   (4)

Then S_n = (1/n) Σ_{i=1}^n z_i z_i^T = (1/n) ZZ^T (with Z = [z_1, ..., z_n]) is a Wishart random matrix whose eigenvalues follow the Marčenko-Pastur distribution.
Proof in Sketch
◮ Notice that

    Σ̂_n = (1/n) YY^T = Σ^{1/2} ((1/n) ZZ^T) Σ^{1/2} = Σ^{1/2} S_n Σ^{1/2},

and (λ̂, v̂) is an eigenvalue-eigenvector pair of Σ̂_n. Therefore

    Σ^{1/2} S_n Σ^{1/2} v̂ = λ̂ v̂  ⇒  S_n Σ (Σ^{−1/2} v̂) = λ̂ (Σ^{−1/2} v̂).   (5)

In other words, λ̂ and Σ^{−1/2} v̂ are an eigenvalue and eigenvector of the matrix S_n Σ.
◮ Let v = c Σ^{−1/2} v̂, where the constant c makes v a unit vector; then c satisfies

    c^2 = c^2 v̂^T v̂ = v^T Σ v = v^T (σ_X^2 uu^T + σ_ε^2 I_p) v = σ_X^2 (u^T v)^2 + σ_ε^2 = R (u^T v)^2 + 1.   (6)
Proof in Sketch
Now we have

    S_n Σ v = λ̂ v.   (7)

Plugging in the expression for Σ gives

    S_n (σ_X^2 uu^T + σ_ε^2 I_p) v = λ̂ v.

Rearranging the term with u to one side, we get

    (λ̂ I_p − σ_ε^2 S_n) v = σ_X^2 S_n u (u^T v).

Assuming that λ̂ I_p − σ_ε^2 S_n is invertible, multiplying both sides by its inverse gives

    v = σ_X^2 (λ̂ I_p − σ_ε^2 S_n)^{−1} S_n u (u^T v).   (8)
Primary Eigenvalue λ̂
◮ Multiplying (8) by u^T on both sides,

    u^T v = σ_X^2 · u^T (λ̂ I_p − σ_ε^2 S_n)^{−1} S_n u · (u^T v),

that is, if u^T v ≠ 0,

    1 = σ_X^2 · u^T (λ̂ I_p − σ_ε^2 S_n)^{−1} S_n u.   (9)
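Note that (9) is exact algebra for any eigenpair with u^T v ≠ 0, not just an asymptotic statement, so it can be verified to machine precision on simulated data (a sketch with σ_ε^2 = 1 and illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, sigma2_X = 100, 400, 2.0
u = np.ones(p) / np.sqrt(p)                    # unit signal direction
Sigma = sigma2_X * np.outer(u, u) + np.eye(p)  # spiked covariance, sigma_eps^2 = 1

# Draw Z ~ N(0, I) and set Y = Sigma^{1/2} Z, so S_n and Sigma_hat_n share Z.
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T
Z = rng.standard_normal((p, n))
Y = Sigma_half @ Z

S_n = Z @ Z.T / n                              # whitened Wishart matrix
lam_hat = np.linalg.eigvalsh(Y @ Y.T / n)[-1]  # top eigenvalue of Sigma_hat_n

# secular equation (9): sigma_X^2 u^T (lam I_p - S_n)^{-1} S_n u should equal 1
val = sigma2_X * (u @ np.linalg.solve(lam_hat * np.eye(p) - S_n, S_n @ u))
print(val)                                     # ~ 1.0 up to round-off
```

The choice σ_X^2 = 2 > √γ = 0.5 guarantees λ̂ lies above the spectrum of S_n, so λ̂ I_p − σ_ε^2 S_n is invertible and u^T v ≠ 0 almost surely.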