Statistical inference in a spiked population model
Jian-feng Yao
Joint work with Weiming Li (Beijing) and Damien Passemier (Rennes)
Overview
1. Spiked eigenvalues: an example
2. Inference on spikes: determination of their number $q_0$
   ◮ Known results on spiked population models
   ◮ Estimator of $q_0$
   ◮ Discussions on the estimator $\hat q_0$
   ◮ Application to S&P stocks data
3. Inference of the bulk spectrum
   ◮ The problem and existing methods
   ◮ A generalized expectation based method
   ◮ Asymptotic properties of the GEE estimator
   ◮ Application to S&P 500 stocks data
1) Spiked eigenvalues: an example
◮ S&P 500 daily stock prices; $p = 488$ stocks;
◮ $n = 1000$ daily returns $r_t(i) = \log\big(p_t(i)/p_{t-1}(i)\big)$, from 2007-09-24 to 2011-09-12.
The sample correlation matrix
◮ Let the sample covariance matrix (SCM, $488 \times 488$) be
$$ S_n = \frac{1}{n} \sum_{t=1}^{n} (r_t - \bar r)(r_t - \bar r)^T . $$
◮ We consider the sample correlation matrix $R_n$ with
$$ R_n(i,j) = \frac{S_n(i,j)}{\left[ S_n(i,i)\, S_n(j,j) \right]^{1/2}} . $$
◮ The 10 largest and 10 smallest eigenvalues of $R_n$ are:
  largest:  237.95801, 17.762811, 14.002838, 8.7633113, 5.2995321, 4.8568703, 4.394394, 3.4999069, 3.0880089, 2.7146658
  smallest: 0.0212137, 0.0205001, 0.0198287, 0.0194216, 0.0190959, 0.0178129, 0.0173591, 0.0164425, 0.0154849, 0.0147696
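Below is a minimal sketch (not the authors' code) of how the sample correlation matrix and its spectrum can be computed; the returns matrix is simulated here as a stand-in for the S&P data, and all names are illustrative.

```python
# Sketch: sample correlation matrix R_n of a returns matrix and its eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
p, n = 488, 1000
returns = rng.standard_normal((n, p))        # stand-in for the n x p matrix of daily log-returns

# Sample covariance matrix S_n = (1/n) sum_t (r_t - rbar)(r_t - rbar)^T
centered = returns - returns.mean(axis=0)
S = centered.T @ centered / n

# Sample correlation matrix R_n(i,j) = S_n(i,j) / sqrt(S_n(i,i) S_n(j,j))
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)

# Eigenvalues in decreasing order; with real data the few largest ones
# separate from the bulk, as in the table above.
eigvals = np.linalg.eigvalsh(R)[::-1]
print(eigvals[:10], eigvals[-10:])
```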
Plots of sample eigenvalues
[Figure: sample eigenvalues of $R_n$. Left: the $488 - 1 = 487$ eigenvalues remaining after removing the largest; right: the $488 - 10 = 478$ eigenvalues remaining after removing the ten largest.]
⇒ The point: sample eigenvalues = bulk + spikes
⇒ Analysis and estimation of spikes + bulk
A generic model
Random factor model:
$$ x_t = \sum_{k=1}^{q_0} a_k\, s_t(k) + \varepsilon_t = A s_t + \varepsilon_t, $$
◮ $s_t = (s_t(1), \ldots, s_t(q_0)) \in \mathbb{R}^{q_0}$ are $q_0 < p$ standardised random signals/factors;
◮ $A = (a_1, \ldots, a_{q_0})$ is a $p \times q_0$ deterministic matrix of factor loadings;
◮ $(\varepsilon_t)$ is an independent $p$-dimensional noise sequence with diagonal covariance matrix $\Psi = \operatorname{cov}(\varepsilon_t) = \operatorname{diag}\{\sigma_1^2, \ldots, \sigma_p^2\}$. Therefore
$$ \Sigma = \operatorname{cov}(x_t) = AA^* + \Psi . $$
◮ This model has a long history and a wide range of application fields: psychology, chemometrics, signal processing, economics, etc.
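The following sketch simulates data from this factor model; the dimensions, loadings and noise variances are illustrative assumptions, not values used in the talk.

```python
# Sketch: simulate x_t = A s_t + eps_t with q0 factors and diagonal noise covariance.
import numpy as np

rng = np.random.default_rng(1)
p, n, q0 = 100, 400, 3

A = rng.standard_normal((p, q0))             # factor loadings (p x q0)
sigma2 = rng.uniform(0.5, 1.5, size=p)       # noise variances sigma_1^2, ..., sigma_p^2

factors = rng.standard_normal((n, q0))       # standardised factors s_t (rows)
noise = rng.standard_normal((n, p)) * np.sqrt(sigma2)
X = factors @ A.T + noise                    # rows are the observations x_t

# Population covariance Sigma = A A^T + Psi
Sigma = A @ A.T + np.diag(sigma2)

# Sample covariance: its q0 largest eigenvalues separate from the bulk
Xc = X - X.mean(axis=0)
S_n = Xc.T @ Xc / n
print(np.linalg.eigvalsh(S_n)[::-1][:q0 + 2])
```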
2) Inference on spikes
a) Known results
Spiked population model
Population covariance matrix:
$$ \Sigma = \operatorname{Cov}[x_t] = AA^* + \sigma^2 I_p, $$
with eigenvalues
$$ \operatorname{spec}(\Sigma) = \big( \sigma^2 + \alpha'_1, \ldots, \sigma^2 + \alpha'_{q_0}, \underbrace{\sigma^2, \ldots, \sigma^2}_{p - q_0} \big), $$
where
◮ $\alpha'_1 \ge \alpha'_2 \ge \cdots \ge \alpha'_{q_0} > 0$ are the non-null eigenvalues of $AA^*$;
or equivalently
$$ \operatorname{spec}(\Sigma) = \sigma^2 \times \big( \alpha_1, \ldots, \alpha_{q_0}, \underbrace{1, \ldots, 1}_{p - q_0} \big), \qquad \text{with } \alpha_i = 1 + \alpha'_i / \sigma^2 . $$
Asymptotic framework and assumptions
1. $p, n \to +\infty$ such that $p/n \to c$;
2. the population covariance matrix has $K$ spikes $\alpha_1 > \cdots > \alpha_K$ with respective multiplicities $n_i$, i.e.
$$ \operatorname{spec}(\Sigma) = \sigma^2 \big( \underbrace{\alpha_1, \ldots, \alpha_1}_{n_1}, \underbrace{\alpha_2, \ldots, \alpha_2}_{n_2}, \ldots, \underbrace{\alpha_K, \ldots, \alpha_K}_{n_K}, \underbrace{1, \ldots, 1}_{p - q_0} \big), $$
with $n_1 + \cdots + n_K = q_0$;
3. $\alpha_K > 1 + \sqrt{c}$ (detection level);
4. $\mathbb{E}\big( |x_{ij}|^4 \big) < +\infty$.
Convergence of spike eigenvalues
Consider the sample covariance matrix $S_n = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^*$, with sample eigenvalues $\lambda_{n,1} \ge \lambda_{n,2} \ge \cdots \ge \lambda_{n,p}$.

Proposition (Baik and Silverstein, 2006)
Let $s_i = n_1 + \cdots + n_i$ for $1 \le i \le K$. Then
◮ for each $k \in \{1, \ldots, K\}$ and $s_{k-1} < j \le s_k$, almost surely
$$ \lambda_{n,j} \longrightarrow \psi(\alpha_k) = \alpha_k + \frac{c\, \alpha_k}{\alpha_k - 1}; $$
◮ for all $1 \le i \le L$ with a prefixed range $L$, almost surely
$$ \lambda_{n, q_0 + i} \longrightarrow b = (1 + \sqrt{c})^2 . $$
Note. This result has been extended to more general spikes by Bai & Y. and by Benaych-Georges & Nadakuditi.
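As a small numerical illustration of the proposition (not part of the talk), the limits $\psi(\alpha)$ and $b$ can be computed directly; the values of $c$ and $\alpha$ below are arbitrary.

```python
# Sketch: Baik-Silverstein limits. A spike alpha above the detection level
# 1 + sqrt(c) sends the corresponding sample eigenvalue to psi(alpha);
# otherwise the sample eigenvalue sticks to the bulk edge b.
import numpy as np

def psi(alpha, c):
    """Limit of a sample spike eigenvalue when alpha > 1 + sqrt(c)."""
    return alpha + c * alpha / (alpha - 1.0)

def bulk_edge(c):
    """Right edge b = (1 + sqrt(c))^2 of the Marcenko-Pastur bulk."""
    return (1.0 + np.sqrt(c)) ** 2

c = 0.5
b = bulk_edge(c)
for alpha in (1.5, 2.0, 5.0):
    if alpha > 1.0 + np.sqrt(c):
        print(f"alpha={alpha}: detectable, sample eigenvalue -> psi = {psi(alpha, c):.3f}")
    else:
        print(f"alpha={alpha}: below the detection level, sample eigenvalue -> b = {b:.3f}")
```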
b) Estimator of $q_0$ (number of spikes)
◮ Based on these results, we observe that when all the spikes are simple, i.e. $n_j \equiv 1$, the spacings satisfy
$$ \delta_{n,j} = \lambda_{n,j} - \lambda_{n,j+1} \longrightarrow \begin{cases} r > 0, & \forall j \le q_0, \\ 0, & \forall j > q_0 . \end{cases} $$
◮ It is thus possible to detect $q_0$ from the index $j$ where $\delta_{n,j}$ becomes small (case of simple spikes). Our estimator is defined by
$$ \hat q_n = \min\{ j \in \{1, \ldots, s\} : \delta_{n, j+1} < d_n \}, \tag{1} $$
where $(d_n)_n$ is a threshold sequence to be specified and $s > q_0$ is a fixed number (a code sketch is given below).
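A minimal sketch of the estimator (1), assuming the eigenvalues are sorted in decreasing order and that the threshold $d_n$ and the bound $s$ are supplied by the user.

```python
# Sketch: spacing-based estimator \hat q_n of the number of spikes.
import numpy as np

def estimate_q(eigvals, d_n, s_max):
    """eigvals: sample eigenvalues sorted in decreasing order (length >= s_max + 2)."""
    spacings = -np.diff(eigvals)          # spacings[j] = delta_{n, j+1}  (0-based array index j)
    for j in range(1, s_max + 1):
        if spacings[j] < d_n:             # first j with delta_{n, j+1} < d_n
            return j
    return s_max

# Toy usage: a spectrum with 2 well-separated spikes and a tight bulk.
lam = np.array([8.0, 5.0, 3.2, 3.1, 3.05, 3.02, 3.0])
print(estimate_q(lam, d_n=0.2, s_max=4))  # -> 2
```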
Consistency of $\hat q_n$: case of simple spikes
Assume
◮ all spikes are different (simple spikes case);
◮ $\sigma^2 = 1$ (if not, work with $\delta_{n,j} / \sigma^2$); and
5. the entries have sub-Gaussian tails: for some positive constants $D, D'$ and all $t \ge D'$,
$$ P\big( |x_{ij}| \ge t^D \big) \le e^{-t} . $$

Theorem (Passemier & Y., 2011)
Under Assumptions (1)-(5) and in the simple spikes case, if $d_n \to 0$ such that $n^{2/3} d_n \to +\infty$, then
$$ P(\hat q_n = q_0) \to 1 . $$
Proof (idea)
$$ P(\hat q_n = q_0) = 1 - P\Big( \bigcup_{1 \le j \le q_0} \{ \delta_{n,j} < d_n \} \cup \{ \delta_{n, q_0+1} \ge d_n \} \Big) \ge 1 - \sum_{j=1}^{q_0} P(\delta_{n,j} < d_n) - P(\delta_{n, q_0+1} \ge d_n). \qquad (*) $$
The terms in the sum converge to zero since $d_n \to 0$ while $\delta_{n,j} \to r > 0$. For the last term,
$$ P(\delta_{n, q_0+1} < d_n) = P\big( n^{2/3} (\lambda_{n, q_0+1} - \lambda_{n, q_0+2}) \le n^{2/3} d_n \big) \ge P\Big( \Big\{ |Y_{n,1}| \le \frac{n^{2/3} d_n}{2\beta} \Big\} \cap \Big\{ |Y_{n,2}| \le \frac{n^{2/3} d_n}{2\beta} \Big\} \Big), $$
where $(Y_{n,i})$ is a tight sequence by the next proposition and $n^{2/3} d_n / (2\beta) \to +\infty$, so $P(\delta_{n, q_0+1} < d_n) \to 1$ and the last term in $(*)$ vanishes.
Proof (an additional important ingredient)
A (partial) extension of the Tracy-Widom law in the presence of spikes:

Theorem (Benaych-Georges, Guionnet, Maïda, 2010)
Under the above assumptions, for all $1 \le i \le L$ with a prefixed range $L$,
$$ Y_{n,i} = \frac{n^{2/3}}{\beta} \big( \lambda_{n, q_0 + i} - b \big) = O_P(1), $$
where $\beta = (1 + \sqrt{c}) \big( 1 + \sqrt{c^{-1}} \big)^{1/3}$.
Case of multiple spikes
◮ Spacings $\delta_{n,j}$ coming from a same spike also tend to 0;
◮ confusion is then possible between these spacings and those coming from the bulk eigenvalues;
◮ fortunately, the fluctuations of the two types of spacings have different rates:
$$ n^{-1/2} \quad \text{vs.} \quad n^{-2/3} . $$

Theorem (Bai and Y., 2008)
Under Assumptions (1)-(4), the $n_k$-dimensional real vector
$$ \sqrt{n}\, \big\{ \lambda_{n,j} - \psi(\alpha_k),\ j \in \{ s_{k-1} + 1, \ldots, s_k \} \big\} $$
converges weakly to the distribution of the $n_k$ eigenvalues of a Gaussian random matrix whose covariance depends on $\alpha_k$ and $c$.
[Related works by Baik-Ben Arous-Péché and Paul.]
Consistency of $\hat q_n$: case of multiple spikes
The previous theorem of Bai and Y. implies:
◮ if $\alpha_j = \alpha_{j+1}$ (eigenvalues from a same spike), the corresponding spacing tends to 0 at the rate $O_P(n^{-1/2})$;
◮ for the unit (bulk) eigenvalues, the convergence is faster, at the rate $O_P(n^{-2/3})$.
This allows us to use the same estimator, provided a new threshold $d_n$ is used.

Theorem (Passemier & Y., 2011)
Under the above assumptions, if $d_n = o(n^{-1/2})$ and $n^{2/3} d_n \to +\infty$, then
$$ P(\hat q_n = q_0) \to 1 . $$
Simulation experiments
We use a slightly modified version of the estimator, which performs better in practice (see the sketch below):
$$ \hat q_n^* = \min\{ j \in \{1, \ldots, s\} : \delta_{n, j+1} < d_n \text{ and } \delta_{n, j+2} < d_n \} . $$
Threshold sequence:
$$ d_n = C\, n^{-2/3} \sqrt{2 \log \log n}, $$
where $C$ is a constant to be adjusted in each case. (Idea: law of the iterated logarithm for $\lambda_{n,j}$, $j \le q_0$.)
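A sketch of the modified estimator $\hat q_n^*$ together with the threshold $d_n$ above; the constant $C$ must still be supplied (manually tuned, or calibrated as discussed later in the talk).

```python
# Sketch: modified estimator requiring two consecutive small spacings.
import numpy as np

def threshold(n, C):
    """d_n = C * n^{-2/3} * sqrt(2 log log n)."""
    return C * n ** (-2.0 / 3.0) * np.sqrt(2.0 * np.log(np.log(n)))

def estimate_q_star(eigvals, n, C, s_max):
    """eigvals sorted in decreasing order (length >= s_max + 3); returns \\hat q_n^*."""
    d_n = threshold(n, C)
    spacings = -np.diff(eigvals)          # spacings[j] = delta_{n, j+1}
    for j in range(1, s_max + 1):
        if spacings[j] < d_n and spacings[j + 1] < d_n:
            return j
    return s_max
```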
Simulation experiments
◮ Performance measure: empirical false-detection rate $P(\hat q_n^* \ne q_0)$ over 500 independent replications;
◮ simulation design (a compact Monte Carlo sketch follows):
  • $q_0$: number of spikes;
  • $(\alpha_i)_{1 \le i \le q_0}$: spikes;
  • $p$: dimension of the vectors;
  • $n$: sample size;
  • $c = p/n$;
  • $\sigma^2 = 1$, given or to be estimated;
  • $C$: constant in $d_n$.
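A compact Monte Carlo sketch of this kind of experiment; the dimensions, spike values, replication count and the constant $C$ below are arbitrary illustrative choices, not the talk's design.

```python
# Sketch: empirical false-detection rate of \hat q_n^* for one illustrative setting.
import numpy as np

rng = np.random.default_rng(2)
p, n, q0, C = 200, 800, 3, 5.5               # illustrative values only
alphas = np.array([10.0, 6.0, 3.0])          # simple spikes, sigma^2 = 1, c = p/n = 0.25

def estimate_q_star(eigvals, n, C, s_max=15):
    d_n = C * n ** (-2.0 / 3.0) * np.sqrt(2.0 * np.log(np.log(n)))
    spacings = -np.diff(eigvals)
    for j in range(1, s_max + 1):
        if spacings[j] < d_n and spacings[j + 1] < d_n:
            return j
    return s_max

reps, errors = 200, 0
sd = np.ones(p)
sd[:q0] = np.sqrt(alphas)                    # spec(Sigma) = (alpha_1, ..., alpha_q0, 1, ..., 1)
for _ in range(reps):
    X = rng.standard_normal((n, p)) * sd     # rows x_t with Cov(x_t) = diag(spec)
    eigvals = np.linalg.eigvalsh(X.T @ X / n)[::-1]
    errors += (estimate_q_star(eigvals, n, C) != q0)
print("empirical false-detection rate:", errors / reps)
```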
Experimental design
c) Discussions - comparison with an estimator by Kritchman and Nadler
In the no-spike case ($q_0 = 0$), $n S_n \sim W_p(I, n)$. In this case:

Proposition (Johnstone, 2001)
$$ P\Big( \lambda_{n,1} < \sigma^2 \big( \beta_{n,p}\, n^{-2/3} s + b \big) \Big) \to F_1(s), $$
where $F_1$ is the Tracy-Widom distribution of order 1 and
$$ \beta_{n,p} = \big( 1 + \sqrt{p/n} \big) \big( 1 + \sqrt{n/p} \big)^{1/3} . $$

To distinguish a spike eigenvalue $\lambda_{n,k}$ from a non-spike one at an asymptotic significance level $\gamma$, their idea is to check whether
$$ \lambda_{n,k} > \sigma^2 \big( \beta_{n, p-k}\, n^{-2/3} s(\gamma) + b \big), $$
where $s(\gamma)$ satisfies $F_1(s(\gamma)) = 1 - \gamma$. Their estimator is
$$ \tilde q_n = \operatorname*{argmin}_k \big\{ \lambda_{n,k} < \hat\sigma^2 \big( \beta_{n, p-k}\, n^{-2/3} s(\gamma) + b \big) \big\} - 1 . $$
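A hedged sketch of the Kritchman-Nadler procedure as described above, assuming $\sigma^2$ is known and hard-coding an approximate 95% quantile of the Tracy-Widom(1) law, $s(0.05) \approx 0.98$; in practice $\sigma^2$ is estimated and the quantile is taken from a Tracy-Widom table or a dedicated package.

```python
# Sketch: sequential test of Kritchman and Nadler for the number of spikes.
import numpy as np

def kn_estimate(eigvals, n, sigma2=1.0, s_gamma=0.98, k_max=15):
    """eigvals sorted in decreasing order; p = len(eigvals); s_gamma ~ TW(1) quantile."""
    p = len(eigvals)
    b = (1.0 + np.sqrt(p / n)) ** 2           # bulk edge b = (1 + sqrt(c))^2
    for k in range(1, k_max + 1):
        pk = p - k
        beta = (1.0 + np.sqrt(pk / n)) * (1.0 + np.sqrt(n / pk)) ** (1.0 / 3.0)
        if eigvals[k - 1] < sigma2 * (beta * n ** (-2.0 / 3.0) * s_gamma + b):
            return k - 1                      # stop at the first eigenvalue that looks like a non-spike
    return k_max
```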
c) Discussions - on the tuning parameter $C$
◮ $C$ has been tuned manually in each case;
◮ for real applications, a procedure to choose this constant is needed;
◮ idea: use Wishart distributions as a benchmark to calibrate $C$;
◮ consider the gap between the two largest eigenvalues, $\tilde\lambda_1 - \tilde\lambda_2$, of such a benchmark matrix.
Cont'd
◮ By simulation, obtain the empirical distribution of $\tilde\lambda_1 - \tilde\lambda_2$ (500 independent replications);
◮ compute the upper 5% quantile $s$: $P(\tilde\lambda_1 - \tilde\lambda_2 \le s) \simeq 0.95$;
◮ define the value (see the sketch below)
$$ \tilde C = s\, n^{2/3} \big/ \sqrt{2 \log \log n} . $$
Results:
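A sketch of this calibration step: simulate the gap $\tilde\lambda_1 - \tilde\lambda_2$ for a white Wishart benchmark, take its upper 5% quantile $s$ and set $\tilde C$ accordingly; the dimensions and replication count below are illustrative.

```python
# Sketch: calibration of the constant C via a white Wishart benchmark.
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 200, 800, 500                    # illustrative dimensions

gaps = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))
    ev = np.linalg.eigvalsh(X.T @ X / n)      # spectrum of S_n with n*S_n ~ W_p(I, n)
    gaps[r] = ev[-1] - ev[-2]                 # gap lambda_1 - lambda_2

s = np.quantile(gaps, 0.95)                   # P(gap <= s) ~ 0.95
C_tilde = s * n ** (2.0 / 3.0) / np.sqrt(2.0 * np.log(np.log(n)))
print("calibrated constant C:", C_tilde)
```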
Assessment of the automated value $\tilde C$ with $c = 10$
◮ $\tilde C$ is slightly larger than the manually tuned $C$;
◮ using $\tilde C$ leads to only a small drop in performance;
◮ error rates are higher in the case of equal factors for moderate sample sizes.
Application to S&P stocks data
◮ Estimated number of factors: $\hat q_0 = 17$;
◮ residual variance: $\hat\sigma^2 = 0.3616$.