Spectral distributions of high-dimensional sample correlation matrices under infinite variance
Johannes Heiny, Ruhr-University Bochum
Joint work with Jianfeng Yao (HKU), Thomas Mikosch and Jorge Yslas (Copenhagen).
Random Matrices and Complex Data Analysis Workshop, December 10-12, 2019, Shanghai
J. Heiny Sample correlation & off-diagonal 1 / 30
Normalized histogram of eigenvalues and MP density
[Figure: histogram of eigenvalues with the MP density $y = f_\gamma(x)$ overlaid; x-axis from 0 to 9, y-axis from 0 to 0.9.]
Figure: These are NOT spikes!
Setup for the picture
Data matrix $X = X_n$: a $p \times n$ matrix with iid centered entries and generic variable $X \stackrel{d}{=} X_{11}$,
\[ X = (X_{it})_{i=1,\dots,p;\ t=1,\dots,n}. \]
Sample covariance matrix:
\[ S = \frac{1}{n} X X'. \]
Ordered eigenvalues of $S$:
\[ \lambda_1(S) \ge \lambda_2(S) \ge \cdots \ge \lambda_p(S). \]
Sample correlation matrix:
\[ R = (\operatorname{diag}(S))^{-1/2} \, S \, (\operatorname{diag}(S))^{-1/2}. \]
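The setup above can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and Gaussian entries; the function names `sample_covariance`, `sample_correlation` and `ordered_eigenvalues` and the sizes `p`, `n` are hypothetical choices, not part of the talk.

```python
import numpy as np

def sample_covariance(X):
    """S = X X' / n for a p x n data matrix X (entries assumed centered)."""
    n = X.shape[1]
    return X @ X.T / n

def sample_correlation(X):
    """R = diag(S)^{-1/2} S diag(S)^{-1/2}."""
    S = sample_covariance(X)
    d = 1.0 / np.sqrt(np.diag(S))
    # Scale row i by d[i] and column j by d[j].
    return d[:, None] * S * d[None, :]

def ordered_eigenvalues(M):
    """Eigenvalues of a symmetric matrix in decreasing order."""
    return np.linalg.eigvalsh(M)[::-1]

rng = np.random.default_rng(0)
p, n = 50, 200                       # illustrative sizes (assumption)
X = rng.standard_normal((p, n))      # iid centered entries
S = sample_covariance(X)
R = sample_correlation(X)
lam_S = ordered_eigenvalues(S)
lam_R = ordered_eigenvalues(R)
```

By construction $R$ has unit diagonal, so its eigenvalues sum to $p$ regardless of the scale of the entries.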
Regular variation
Regular variation with index $\alpha > 0$:
\[ P(|X| > x) = x^{-\alpha} L(x), \]
where $L$ is a slowly varying function. This implies $E[|X|^{\alpha + \varepsilon}] = \infty$ for any $\varepsilon > 0$.
Normalizing sequence $(a_{np}^2)$ such that
\[ np \, P(X^2 > a_{np}^2 x) \to x^{-\alpha/2}, \quad n \to \infty, \quad \text{for } x > 0. \]
Then $a_{np} = (np)^{1/\alpha} \ell(np)$ for a slowly varying function $\ell$.
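As a worked special case, assume an exact Pareto tail $P(|X| > x) = x^{-\alpha}$ for $x \ge 1$ (so $L \equiv 1$; this choice is an assumption made for illustration). Then $a_{np} = (np)^{1/\alpha}$ satisfies the normalizing condition exactly, not only in the limit. The helper names below are hypothetical.

```python
import math

def pareto_tail(x, alpha):
    """P(|X| > x) for an exact Pareto tail: x^{-alpha} for x >= 1, else 1."""
    return x ** (-alpha) if x >= 1.0 else 1.0

def a_np(n, p, alpha):
    """Normalizing constant a_np = (np)^{1/alpha} (slowly varying part equal to 1)."""
    return (n * p) ** (1.0 / alpha)

def normalized_tail(n, p, alpha, x):
    """np * P(X^2 > a_np^2 x) = np * P(|X| > a_np * sqrt(x))."""
    a = a_np(n, p, alpha)
    return n * p * pareto_tail(a * math.sqrt(x), alpha)

# Compare with the limit x^{-alpha/2} at one point.
value = normalized_tail(n=1000, p=200, alpha=1.6, x=2.0)
limit = 2.0 ** (-1.6 / 2.0)
```

For other regularly varying laws the slowly varying factor $\ell$ is generally not constant and the relation holds only asymptotically.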
Reduction to Diagonal
Diagonal: $X$ with iid regularly varying entries, $\alpha \in (0, 4)$, and $p = n^\beta$ with $\beta \in [0, 1]$. We have
\[ a_{np}^{-2} \, \| XX' - \operatorname{diag}(XX') \| \stackrel{P}{\to} 0, \]
where $\| \cdot \|$ denotes the spectral norm.
\[ (XX')_{ij} = \sum_{t=1}^{n} X_{it} X_{jt}. \]
Eigenvalues
Weyl's inequality:
\[ \max_{i=1,\dots,p} \bigl| \lambda_i(A + B) - \lambda_i(A) \bigr| \le \| B \|. \]
Choose $A + B = XX'$ and $A = \operatorname{diag}(XX')$ to obtain
\[ a_{np}^{-2} \max_{i=1,\dots,p} \bigl| \lambda_i(XX') - \lambda_i(\operatorname{diag}(XX')) \bigr| \stackrel{P}{\to} 0, \quad n \to \infty. \]
Note: Limit theory for $(\lambda_i(S))$ reduced to $(S_{ii})$.
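Weyl's inequality with $A = \operatorname{diag}(XX')$ and $B = XX' - \operatorname{diag}(XX')$ can be checked numerically. The sketch below assumes NumPy and uses symmetrized Pareto entries (a random sign times a Pareto($\alpha$) variable), which is one convenient regularly varying model and not necessarily the one used in the talk; sizes and seed are arbitrary.

```python
import numpy as np

def eig_desc(M):
    """Eigenvalues of a symmetric matrix in decreasing order."""
    return np.linalg.eigvalsh(M)[::-1]

rng = np.random.default_rng(1)
p, n, alpha = 30, 300, 1.5            # illustrative values (assumption)

# Symmetrized Pareto entries: P(|X| > x) = x^{-alpha} for x >= 1.
signs = rng.choice([-1.0, 1.0], size=(p, n))
X = signs * (1.0 - rng.random((p, n))) ** (-1.0 / alpha)

G = X @ X.T                           # XX'
A = np.diag(np.diag(G))               # diag(XX')
B = G - A                             # off-diagonal part

max_gap = np.max(np.abs(eig_desc(G) - eig_desc(A)))
spec_norm_B = np.linalg.norm(B, 2)    # spectral norm of B

# Normalized off-diagonal norm, with a_np = (np)^{1/alpha} for exact Pareto tails.
a2 = (n * p) ** (2.0 / alpha)
normalized_offdiag = spec_norm_B / a2
```

Here `max_gap` is bounded by `spec_norm_B` for every realization; the probabilistic statement on the slide is that `normalized_offdiag` tends to zero in probability as $n \to \infty$.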
Heavy-tailed case
Theorem (Heiny and Mikosch, 2016). $X$ with iid regularly varying entries, $\alpha \in (0, 4)$, and $p_n = n^\beta \ell(n)$ with $\beta \in [0, 1]$.
1. If $\beta \in [0, 1]$, then
\[ a_{np}^{-2} \max_{i=1,\dots,p} \bigl| \lambda_i(XX') - \lambda_i(\operatorname{diag}(XX')) \bigr| \stackrel{P}{\to} 0. \]
2. If $\beta \in ((\alpha/2 - 1)_+, 1]$, then
\[ a_{np}^{-2} \max_{i=1,\dots,p} \bigl| \lambda_i(XX') - X^2_{(i),np} \bigr| \stackrel{P}{\to} 0, \]
where $X^2_{(1),np} \ge X^2_{(2),np} \ge \cdots$ denote the order statistics of the $np$ squared entries $X_{it}^2$.
Example: Eigenvalues
Figure: Smoothed histogram based on 20000 simulations of the approximation error for the normalized eigenvalue $a_{np}^{-2} \lambda_1(S)$ for entries $X_{it}$ with $\alpha = 1.6$, $\beta = 1$, $n = 1000$ and $p = 200$.
Eigenvectors
$v_k$: unit eigenvector of $S$ associated with $\lambda_k(S)$.
Unit eigenvectors of $\operatorname{diag}(S)$ are the canonical basis vectors $e_j$.
Eigenvectors: $X$ with iid regularly varying entries with index $\alpha \in (0, 4)$ and $p_n = n^\beta \ell(n)$ with $\beta \in [0, 1]$. Then for any fixed $k \ge 1$,
\[ \| v_k - e_{L_k} \|_{\ell^2} \stackrel{P}{\to} 0, \quad n \to \infty. \]
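The localization statement can be explored in simulation. The sketch below assumes NumPy, symmetrized Pareto entries, and takes $L_1$ to be the index of the largest diagonal entry of $S$ (an assumption, since the slide does not define $L_k$). Eigenvectors are only determined up to sign, so the distance is minimized over both signs.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, alpha = 200, 1000, 0.8          # illustrative values (assumption)

# Symmetrized Pareto entries: P(|X| > x) = x^{-alpha} for x >= 1.
signs = rng.choice([-1.0, 1.0], size=(p, n))
X = signs * (1.0 - rng.random((p, n))) ** (-1.0 / alpha)
S = X @ X.T / n

# eigh returns eigenvalues in ascending order; the last column of the
# eigenvector matrix belongs to the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)
v1 = eigvecs[:, -1]

# Assumed definition: L_1 is the index of the largest diagonal entry of S.
L1 = int(np.argmax(np.diag(S)))
e_L1 = np.zeros(p)
e_L1[L1] = 1.0

# Distance between v_1 and e_{L_1}, up to the sign of the eigenvector.
dist = min(np.linalg.norm(v1 - e_L1), np.linalg.norm(v1 + e_L1))
```

A small value of `dist` indicates that the leading eigenvector is concentrated on a single coordinate; repeating the experiment with `rng.standard_normal((p, n))` in place of `X` gives a delocalized eigenvector for comparison.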
Localization vs. Delocalization
[Figures: components of the eigenvector $v_1$, $p = 200$, $n = 1000$. Left panel (Pareto data): $X \sim \mathrm{Pareto}(0.8)$. Right panel (Normal data): $X \sim N(0, 1)$. Axes: indices of components (0 to 200) against size of components.]
Point Process of Normalized Eigenvalues
Point process convergence:
\[ N_n = \sum_{i=1}^{p} \delta_{a_{np}^{-2} \lambda_i(XX')} \stackrel{d}{\to} \sum_{i=1}^{\infty} \delta_{\Gamma_i^{-2/\alpha}} = N. \]
The limit is a PRM on $(0, \infty)$ with mean measure $\mu(x, \infty) = x^{-\alpha/2}$, $x > 0$, and
\[ \Gamma_i = E_1 + \cdots + E_i, \quad (E_i) \text{ iid standard exponential}. \]
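The limiting point process is easy to sample from its representation through the partial sums $\Gamma_i$. A minimal sketch, assuming NumPy; the function names and the truncation to the first `m` points are illustrative choices.

```python
import numpy as np

def limit_points(alpha, m, rng):
    """First m points Gamma_i^{-2/alpha} of the limit process, in decreasing order."""
    gammas = np.cumsum(rng.exponential(1.0, size=m))   # Gamma_i = E_1 + ... + E_i
    return gammas ** (-2.0 / alpha)

def count_above(points, x):
    """N(x, infinity) restricted to the simulated points."""
    return int(np.sum(points > x))

rng = np.random.default_rng(3)
pts = limit_points(alpha=1.6, m=10, rng=rng)
n_above_one = count_above(pts, 1.0)
```

Because the $\Gamma_i$ are increasing, the points come out already ordered, which mirrors the ordering of the normalized eigenvalues.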
Point Process of Normalized Eigenvalues
Limiting distribution: For $k \ge 1$,
\[ \lim_{n \to \infty} P(a_{np}^{-2} \lambda_k \le x) = \lim_{n \to \infty} P(N_n(x, \infty) < k) = P(N(x, \infty) < k) = e^{-x^{-\alpha/2}} \sum_{s=0}^{k-1} \frac{(x^{-\alpha/2})^s}{s!}, \quad x > 0. \]
Largest eigenvalue:
\[ \frac{n}{a_{np}^2} \lambda_1(S) \stackrel{d}{\to} \Gamma_1^{-2/\alpha}, \]
where the limit has a Fréchet distribution with parameter $\alpha/2$.
Soshnikov (2006), Auffinger et al. (2009), Auffinger and Tang (2016), Davis et al. (2014, 2016$^2$), JH and Mikosch (2016)
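The limiting distribution function is a Poisson probability with mean $\mu(x, \infty) = x^{-\alpha/2}$ and can be evaluated directly. A short sketch using only the standard library; the name `limit_cdf` is hypothetical.

```python
import math

def limit_cdf(x, k, alpha):
    """P(N(x, inf) < k) for a Poisson count with mean mu = x^{-alpha/2}, x > 0."""
    mu = x ** (-alpha / 2.0)
    return math.exp(-mu) * sum(mu ** s / math.factorial(s) for s in range(k))

def frechet_cdf(x, shape):
    """Frechet distribution function exp(-x^{-shape}), x > 0."""
    return math.exp(-x ** (-shape))

alpha = 1.6
largest = limit_cdf(2.0, 1, alpha)     # k = 1: largest eigenvalue
second = limit_cdf(2.0, 2, alpha)      # k = 2: second largest eigenvalue
```

For $k = 1$ the sum has a single term and the formula reduces to the Fréchet distribution function with parameter $\alpha/2$.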
$\alpha = 3.99$
[Figure: $\alpha = 3.99$, $n = 2000$, $p = 1000$]
$\alpha = 3$
[Figure: $\alpha = 3$, $n = 2000$, $p = 1000$]
$\alpha = 2.1$
[Figure: $\alpha = 2.1$, $n = 10000$, $p = 1000$]
Infinite variance, $\alpha < 2$
Limiting spectral distribution of $(XX')$ under $E[X^2] = \infty$:
Regular variation with $\alpha < 2$:
\[ F_{a_{n+p}^{-2} XX'} \to G_\alpha^\gamma \quad \text{weakly}, \]
whose density $g_\alpha^\gamma$ satisfies
\[ g_\alpha^\gamma(x) \sim c \, x^{-1 - \alpha/2}, \quad x \to \infty. \]
Ben Arous and Guionnet (2008), Belinschi et al. (2009)
Moments of LSD
Assumption: $X$ symmetric and regularly varying with index $\alpha \in (0, 2)$.
Goal: For $k \ge 1$, find the limit of
\[ E\Bigl[ \int x^k \, F_R(dx) \Bigr] = \frac{1}{p} E[\operatorname{tr}(R^k)]. \]
Moments of LSD
One has
\[ E[\operatorname{tr}(R^k)] = \sum_{i_1,\dots,i_k=1}^{p} \underbrace{\sum_{t_1,\dots,t_k=1}^{n} E[ Y_{i_1 t_1} Y_{i_2 t_1} \cdots Y_{i_k t_k} Y_{i_1 t_k} ]}_{:= F(i_1,\dots,i_k)}. \]
Assumption: $X$ symmetric $\Rightarrow$ $Y_{ij}$ symmetric, where
\[ Y_{ij} = \frac{X_{ij}}{\sqrt{\sum_{t=1}^{n} X_{it}^2}}. \]
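The self-normalized entries $Y_{ij}$ give $R = YY'$, which is what makes the trace expansion above possible. The sketch below checks this identity numerically and computes the empirical moment $\frac{1}{p} \operatorname{tr}(R^k)$. It assumes NumPy and uses Student-$t$ entries with 1.5 degrees of freedom as one example of a symmetric, regularly varying law with infinite variance; that choice and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 40, 120                                   # illustrative sizes (assumption)
X = rng.standard_t(df=1.5, size=(p, n))          # symmetric, heavy-tailed entries

# Self-normalized entries: each row of X divided by its Euclidean norm.
Y = X / np.sqrt(np.sum(X ** 2, axis=1, keepdims=True))
R_from_Y = Y @ Y.T

# Sample correlation matrix from its definition via S = XX'/n.
S = X @ X.T / n
d = 1.0 / np.sqrt(np.diag(S))
R = d[:, None] * S * d[None, :]

def normalized_trace_moment(R, k):
    """(1/p) tr(R^k), the k-th moment of the empirical spectral distribution of R."""
    return np.trace(np.linalg.matrix_power(R, k)) / R.shape[0]

m1 = normalized_trace_moment(R, 1)
m2 = normalized_trace_moment(R, 2)
```

The first moment is always 1 because $R$ has unit diagonal, and the second moment is at least 1 since it adds the squared off-diagonal entries.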
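Moments of LSD
\[ \frac{1}{p} E[\operatorname{tr}(R^k)] \to \beta_k(\gamma) + (\text{explicit correction term}). \]
The correction term is a sum over $r \ge 2$ and $q = 0, \dots, r-2$ with factors $\gamma^{r-1} (\Gamma(1 - \alpha/2))^{-r+q+1}$, followed by sums over index configurations $\tilde I \in \mathcal{C}^{(q)}_{r,k}$, over $s = 1, \dots, t^\star(\tilde I)$ and over $T$, of products of the Gamma-function factors $\Gamma(d_i(\tilde I, T))$, $\Gamma(N_i(\tilde I))$ and
\[ \prod_{(i,t) \in \Delta(\tilde I, T)} \Gamma\Bigl( m_{it}(\tilde I, T) - \frac{\alpha}{2} \Bigr). \]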
Motivation
Random walk $S_n = X_1 + \cdots + X_n$, $n \ge 1$.
$(X_i)$ are iid random variables with generic element $X$.
1. $E[X] = 0$ and $E[X^2] = 1$.
2. Dimension $p = p_n \to \infty$.
Consider iid copies $(S_n^{(i)})_{i \le p}$ of $S_n$ and define the point process
\[ N_n = \sum_{i=1}^{p} \delta_{d_p (S_n^{(i)}/\sqrt{n} - d_p)}. \]
We want to prove:
\[ N_n = \sum_{i=1}^{p} \delta_{d_p (S_n^{(i)}/\sqrt{n} - d_p)} \stackrel{d}{\to} N, \quad n \to \infty, \]
where $N$ is a Poisson random measure with mean measure $\mu(x, \infty) = e^{-x}$, $x \in \mathbb{R}$, and
\[ d_p = \sqrt{2 \log p} - \frac{\log\log p + \log 4\pi}{2 (2 \log p)^{1/2}}. \]
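The centering sequence $d_p$ can be sanity-checked in the Gaussian case. If $S_n/\sqrt{n}$ were exactly standard normal (an assumption made only for this check), the expected number of points of $N_n$ in $(x, \infty)$ would be $p \, P(Z > d_p + x/d_p)$, which should be close to $e^{-x}$ for large $p$. The sketch uses only the standard library; function names are hypothetical.

```python
import math

def d_p(p):
    """d_p = sqrt(2 log p) - (log log p + log 4 pi) / (2 sqrt(2 log p))."""
    root = math.sqrt(2.0 * math.log(p))
    return root - (math.log(math.log(p)) + math.log(4.0 * math.pi)) / (2.0 * root)

def gaussian_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def expected_exceedances(p, x):
    """p * P(Z > d_p + x / d_p), the Gaussian analogue of E[N_n(x, inf)]."""
    d = d_p(p)
    return p * gaussian_tail(d + x / d)

p = 10 ** 6
d = d_p(p)
at_zero = expected_exceedances(p, 0.0)   # compare with exp(-0) = 1
at_one = expected_exceedances(p, 1.0)    # compare with exp(-1)
```

The agreement with $e^{-x}$ improves only slowly in $p$, because the error in the Gaussian extreme-value approximation decays at a logarithmic rate.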