
Statistical inference in a spiked population model, Jian-feng Yao



  1. Statistical inference in a spiked population model
     Jian-feng Yao
     Joint work with Weiming Li (Beijing), Damien Passemier (Rennes)

  2. Overview
     1 Spiked eigenvalues: an example
     2 Inference on spikes: determination of their number $q_0$
       ◮ Known results on spiked population
       ◮ Estimator of $q_0$
       ◮ Discussions on the estimator $\hat q_0$
       ◮ Application to S&P stocks data
     3 Inference of the bulk spectrum
       ◮ The problem and existing methods
       ◮ A generalized expectation based method
       ◮ Asymptotic properties of the GEE estimator
       ◮ Application to S&P 500 stocks data

  3. 1) Spiked eigenvalues: an example
     ◮ S&P 500 daily stock prices; $p = 488$ stocks;
     ◮ $n = 1000$ daily returns $r_t(i) = \log\bigl(p_t(i)/p_{t-1}(i)\bigr)$ from 2007-09-24 to 2011-09-12.

  4. The sample correlation matrix
     ◮ Let the sample covariance matrix (SCM, 488 × 488) be
       $$S_n = \frac{1}{n} \sum_{t=1}^{n} (r_t - \bar r)(r_t - \bar r)^T.$$
     ◮ We consider the sample correlation matrix $R_n$ with
       $$R_n(i,j) = \frac{S_n(i,j)}{[S_n(i,i)\, S_n(j,j)]^{1/2}}.$$
     ◮ The 10 largest and 10 smallest eigenvalues of $R_n$ are:
       10 largest: 237.95801, 17.762811, 14.002838, 8.7633113, 5.2995321, 4.8568703, 4.394394, 3.4999069, 3.0880089, 2.7146658
       10 smallest: 0.0212137, 0.0205001, 0.0198287, 0.0194216, 0.0190959, 0.0178129, 0.0173591, 0.0164425, 0.0154849, 0.0147696
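The construction of $R_n$ and its spectrum can be sketched in a few lines of NumPy. This is only an illustration: the stock data are not available here, so the `returns` array below is a synthetic stand-in (one hypothetical common factor plus noise), and the function name `correlation_eigenvalues` is ours.

```python
import numpy as np

def correlation_eigenvalues(returns):
    """Eigenvalues (in decreasing order) of the sample correlation matrix R_n.

    returns: array of shape (n, p), one row per day, one column per stock.
    """
    centered = returns - returns.mean(axis=0)        # r_t - r_bar
    S = centered.T @ centered / returns.shape[0]     # S_n with the 1/n normalisation
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                           # R_n(i,j) = S_n(i,j) / sqrt(S_n(i,i) S_n(j,j))
    return np.linalg.eigvalsh(R)[::-1]               # eigvalsh returns ascending order

# Synthetic stand-in for the stock returns (assumption, not the S&P data):
# a single common "market" factor plus independent noise.
rng = np.random.default_rng(0)
n, p = 1000, 50
market = rng.standard_normal((n, 1))
returns = 0.8 * market + rng.standard_normal((n, p))

eig = correlation_eigenvalues(returns)
print("three largest:", eig[:3])
print("three smallest:", eig[-3:])
```

Even in this toy setting one eigenvalue separates clearly from the rest, which is the "bulk + spikes" picture the slides describe.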

  5. Plots of sample eigenvalues
     Left: 488 − 1 = 487 eigenvalues. Right: 488 − 10 = 478 eigenvalues.
     ⟹ The point: sample eigenvalues = bulk + spikes
     ⟹ Analysis and estimation of spikes + bulk

  6. A generic model
     Random factor model
       $$x_t = \sum_{k=1}^{q_0} a_k s_t(k) + \varepsilon_t = A s_t + \varepsilon_t,$$
     ◮ $s_t = (s_t(1), \dots, s_t(q_0)) \in \mathbb{R}^{q_0}$ are $q_0 < p$ standardised random signals/factors;
     ◮ $A = (a_1, \dots, a_{q_0})$ is a $p \times q_0$ deterministic matrix of factor loadings;
     ◮ $\varepsilon_t$ is an independent $p$-dimensional noise sequence, with a diagonal covariance matrix
       $$\Psi = \mathrm{cov}(\varepsilon_t) = \mathrm{diag}\{\sigma_1^2, \dots, \sigma_p^2\}.$$
     Therefore,
       $$\Sigma = \mathrm{cov}(x_t) = AA^* + \Psi.$$
     ◮ This model is very old and has a wide range of application fields: psychology, chemometrics, signal processing, economics, etc.
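The identity $\Sigma = AA^* + \Psi$ can be checked numerically. The sketch below simulates the factor model with Gaussian factors and noise (the Gaussian choice, the dimensions, and all variable names are assumptions made for illustration) and compares the empirical covariance with $AA^T + \Psi$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q0, n = 6, 2, 200_000          # small p so the covariance is estimated accurately

A = rng.standard_normal((p, q0))                 # factor loadings (deterministic once drawn)
sigma2 = rng.uniform(0.5, 1.5, size=p)           # noise variances sigma_1^2, ..., sigma_p^2
Psi = np.diag(sigma2)

s = rng.standard_normal((n, q0))                 # standardised factors s_t, one row per t
eps = rng.standard_normal((n, p)) * np.sqrt(sigma2)   # noise with covariance Psi
x = s @ A.T + eps                                # x_t = A s_t + eps_t, one row per t

Sigma = A @ A.T + Psi                            # population covariance
S = x.T @ x / n                                  # empirical covariance (the mean is zero)
max_err = np.abs(S - Sigma).max()
print("largest entrywise error:", max_err)
```

With $p$ fixed and $n$ large the error is small; the rest of the slides concern the regime where $p$ grows with $n$ and this naive agreement breaks down.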

  7. 2) Inference on spikes
     a) Known results
     Spiked population model
     Population covariance matrix:
       $$\Sigma = \mathrm{Cov}[x_t] = AA^* + \sigma^2 I_p,$$
     with eigenvalues
       $$\mathrm{spec}(\Sigma) = (\sigma^2 + \alpha'_1, \dots, \sigma^2 + \alpha'_{q_0}, \underbrace{\sigma^2, \dots, \sigma^2}_{p - q_0}),$$
     where
     ◮ $\alpha'_1 \ge \alpha'_2 \ge \dots \ge \alpha'_{q_0} > 0$ are the non-null eigenvalues of $AA^*$,
     or equivalently
       $$\mathrm{spec}(\Sigma) = \sigma^2 \times (\alpha_1, \dots, \alpha_{q_0}, \underbrace{1, \dots, 1}_{p - q_0}),$$
     with $\alpha_i = 1 + \alpha'_i / \sigma^2$.

  8. Asymptotic framework and assumptions
     1 $p, n \to +\infty$ such that $p/n \to c$;
     2 The population covariance matrix has $K$ spikes $\alpha_1 > \dots > \alpha_K$ with respective multiplicity numbers $n_i$, i.e.
       $$\mathrm{spec}(\Sigma) = \sigma^2 (\underbrace{\alpha_1, \dots, \alpha_1}_{n_1}, \underbrace{\alpha_2, \dots, \alpha_2}_{n_2}, \dots, \underbrace{\alpha_K, \dots, \alpha_K}_{n_K}, \underbrace{1, \dots, 1}_{p - q_0})$$
       [$n_1 + \dots + n_K = q_0$];
     3 $\alpha_K > 1 + \sqrt{c}$ (detection level);
     4 $E(|x_{ij}|^4) < +\infty$.

  9. Convergence of spike eigenvalues
     Consider the sample covariance matrix $S_n = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^*$, with sample eigenvalues
       $$\lambda_{n,1} \ge \lambda_{n,2} \ge \dots \ge \lambda_{n,p}.$$
     Proposition (Baik and Silverstein - 2006)
     Let $s_i = n_1 + \dots + n_i$ for $1 \le i \le K$. Then
     ◮ For each $k \in \{1, \dots, K\}$ and $s_{k-1} < j \le s_k$, almost surely,
       $$\lambda_{n,j} \to \psi(\alpha_k) = \alpha_k + \frac{c\,\alpha_k}{\alpha_k - 1};$$
     ◮ For all $1 \le i \le L$ with a prefixed range $L$, almost surely,
       $$\lambda_{n,q_0+i} \to b = (1 + \sqrt{c})^2.$$
     Note. This result has been extended for more general spikes by Bai & Y., Benaych-Georges & Nadakuditi.
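The two limits in the proposition can be illustrated by a small simulation. This is a sketch under assumptions that are ours: Gaussian entries, $\sigma^2 = 1$, a diagonal population covariance with two simple spikes, and the particular values of $p$, $n$ and $\alpha$.

```python
import numpy as np

def psi(alpha, c):
    """Almost-sure limit of a sample eigenvalue generated by the spike alpha."""
    return alpha + c * alpha / (alpha - 1.0)

def bulk_edge(c):
    """Right edge b = (1 + sqrt(c))^2 of the bulk when sigma^2 = 1."""
    return (1.0 + np.sqrt(c)) ** 2

rng = np.random.default_rng(2)
p, n = 1000, 2000
c = p / n
alphas = np.array([6.0, 4.0])            # both above the detection level 1 + sqrt(c)

pop_var = np.ones(p)
pop_var[: len(alphas)] = alphas          # spec(Sigma) = (6, 4, 1, ..., 1)
x = rng.standard_normal((n, p)) * np.sqrt(pop_var)

S = x.T @ x / n                          # S_n = (1/n) sum x_i x_i^T
lam = np.linalg.eigvalsh(S)[::-1]        # decreasing order

print("two largest sample eigenvalues:", lam[:2])
print("predicted limits psi(alpha):   ", psi(alphas, c))
print("third eigenvalue:", lam[2], " bulk edge b:", bulk_edge(c))
```

Note that the sample spike eigenvalues sit above the population values 6 and 4: the limit $\psi(\alpha)$ is always larger than $\alpha$ when $c > 0$.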

  10. b) Estimator of $q_0$ (number of spikes)
     ◮ Based on these results, we observe that when all the spikes are simple, i.e. $n_j \equiv 1$, the spacings satisfy
       $$\delta_{n,j} = \lambda_{n,j} - \lambda_{n,j+1} \to \begin{cases} r > 0 & \forall j \le q_0, \\ 0 & \forall j > q_0. \end{cases}$$
     ◮ It is therefore possible to detect $q_0$ from the index $j$ at which $\delta_{n,j}$ becomes small (case of simple spikes).
     Our estimator is defined by
       $$\hat q_n = \min\{ j \in \{1, \dots, s\} : \delta_{n,j+1} < d_n \}, \quad (1)$$
     where $(d_n)_n$ is a sequence to be defined and $s > q_0$ is a fixed number.
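Definition (1) translates directly into code. The sketch below is a minimal illustration; the function name is ours, and returning `None` when no index in $\{1, \dots, s\}$ satisfies the condition is an assumption, since the slides do not say what happens in that case.

```python
import numpy as np

def estimate_num_spikes(eigenvalues, d_n, s):
    """Estimator (1): smallest j in {1, ..., s} with delta_{n,j+1} < d_n.

    eigenvalues: sample eigenvalues (any order), at least s + 2 of them.
    Returns None if no such j exists (assumption: this case is not covered in the slides).
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # lambda_{n,1} >= lambda_{n,2} >= ...
    delta = lam[:-1] - lam[1:]           # delta[j-1] holds delta_{n,j} = lambda_{n,j} - lambda_{n,j+1}
    for j in range(1, s + 1):
        if delta[j] < d_n:               # delta[j] is delta_{n,j+1}
            return j
    return None

# Toy spectrum: two well-separated spikes (10 and 6) followed by tightly packed bulk values.
toy = [10.0, 6.0, 3.0, 2.9, 2.85, 2.8]
print(estimate_num_spikes(toy, d_n=0.5, s=4))
```

In the toy spectrum the spacings are 4, 3, 0.1, 0.05, 0.05, so the first index $j$ with $\delta_{n,j+1} < 0.5$ is $j = 2$.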

  11. Consistency of $\hat q_n$: case of simple spikes
     Assume
     ◮ All spikes are different (simple spike case);
     ◮ $\sigma^2 = 1$ (if not, take $\delta_{n,j}/\sigma^2$); and
     5 Entries have sub-Gaussian tails: for some positive $D$, $D'$ we have, for all $t \ge D'$,
       $$P(|x_{ij}| \ge t^D) \le e^{-t}.$$
     Theorem [Passemier & Y. 2011]
     Under Assumptions (1)-(5) and in the simple spikes case, if $d_n \to 0$ such that $n^{2/3} d_n \to +\infty$, then
       $$P(\hat q_n = q_0) \to 1.$$

  12. Proof (idea)
       $$P(\hat q_n = q_0) = 1 - P\Bigl( \bigcup_{1 \le j \le q_0} \{\delta_{n,j} < d_n\} \cup \{\delta_{n,q_0+1} \ge d_n\} \Bigr) \ge 1 - \sum_{j=1}^{q_0} P(\delta_{n,j} < d_n) - \underbrace{P(\delta_{n,q_0+1} \ge d_n)}_{(*)}.$$
     The terms in the sum converge to zero as $d_n \to 0$ and $\delta_{n,j} \to r > 0$. For the last term,
       $$1 - (*) = P\bigl( n^{2/3} (\lambda_{n,q_0+1} - \lambda_{n,q_0+2}) \le n^{2/3} d_n \bigr) \ge P\Bigl( \Bigl\{ |Y_{n,1}| \le \frac{n^{2/3} d_n}{2\beta} \Bigr\} \cap \Bigl\{ |Y_{n,2}| \le \frac{n^{2/3} d_n}{2\beta} \Bigr\} \Bigr),$$
     where $(Y_{n,i})$ is a tight sequence by the next proposition, and $n^{2/3} d_n / (2\beta) \to +\infty$, so $1 - (*) \to 1$.

  13. Proof (an additional important ingredient)
     A (partial) extension of the Tracy-Widom law in the presence of spikes:
     Theorem (Benaych-Georges, Guionnet, Maida - 2010)
     Under the above assumptions, for all $1 \le i \le L$ with a prefixed range $L$,
       $$Y_{n,i} = \frac{n^{2/3}}{\beta} (\lambda_{n,q_0+i} - b) = O_P(1),$$
     where $\beta = (1 + \sqrt{c}) \bigl(1 + \sqrt{c^{-1}}\bigr)^{1/3}$.
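The lower bound used for the last term in the proof idea follows from the definition of $Y_{n,i}$ and the triangle inequality. A short derivation, using only the quantities defined in the slides:

```latex
\begin{align*}
\delta_{n,q_0+1}
  &= \lambda_{n,q_0+1} - \lambda_{n,q_0+2}
   = (\lambda_{n,q_0+1} - b) - (\lambda_{n,q_0+2} - b)
   = \frac{\beta}{n^{2/3}} \, (Y_{n,1} - Y_{n,2}), \\
n^{2/3} \, \delta_{n,q_0+1}
  &\le \beta \, \bigl( |Y_{n,1}| + |Y_{n,2}| \bigr)
   \le n^{2/3} d_n
   \quad \text{whenever } |Y_{n,1}| \le \frac{n^{2/3} d_n}{2\beta}
   \text{ and } |Y_{n,2}| \le \frac{n^{2/3} d_n}{2\beta}.
\end{align*}
```

Since $Y_{n,1}$ and $Y_{n,2}$ are $O_P(1)$ while $n^{2/3} d_n / (2\beta) \to +\infty$, the probability of the intersection tends to 1.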

  14. Case of multiple spikes
     ◮ Spacings $\delta_{n,j}$ between sample eigenvalues coming from the same spike can also tend to 0;
     ◮ Confusion may be possible between these spacings and those from the bulk eigenvalues;
     ◮ Fortunately, the fluctuations of the two types of spacings have different rates: $n^{-1/2}$ vs. $\simeq n^{-2/3}$.
     Theorem (Bai and Y. (2008))
     Under Assumptions (1)-(4), the $n_k$-dimensional real vector
       $$\sqrt{n} \, \{ \lambda_{n,j} - \psi(\alpha_k), \; j \in \{ s_{k-1} + 1, \dots, s_k \} \}$$
     converges weakly to the distribution of the $n_k$ eigenvalues of a Gaussian random matrix whose covariance depends on $\alpha_k$ and $c$.
     [Related works are from Baik-Ben Arous-Péché, Paul.]

  15. Consistency of $\hat q_n$: case of multiple spikes
     The previous theorem of Bai and Y. implies:
     ◮ If $\alpha_j = \alpha_{j+1}$, convergence in $O_P(n^{-1/2})$;
     ◮ For unit eigenvalues, faster convergence in $O_P(n^{-2/3})$.
     This allows us to use the same estimator provided we use a new threshold $d_n$.
     Theorem (Passemier & Y. (2011))
     Under the above assumptions, if $d_n = o(n^{-1/2})$ and $n^{2/3} d_n \to +\infty$, then
       $$P(\hat q_n = q_0) \to 1.$$

  16. Simulation experiments
     We decided to use another version of our estimator, which performs better:
       $$\hat q_n^* = \min\{ j \in \{1, \dots, s\} : \delta_{n,j+1} < d_n \text{ and } \delta_{n,j+2} < d_n \}.$$
     Threshold sequence:
       $$d_n = C n^{-2/3} \sqrt{2 \log\log n},$$
     where $C$ is a constant to be adjusted for each case. (Idea: law of the iterated logarithm for $\lambda_{n,j}$, $j \le q_0$.)
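The threshold sequence and the two-gap variant of the estimator can be sketched as follows. Function names are ours, and returning `None` when no index qualifies is again an assumption not covered by the slides.

```python
import numpy as np

def threshold(n, C):
    """Threshold d_n = C * n^(-2/3) * sqrt(2 log log n)."""
    return C * n ** (-2.0 / 3.0) * np.sqrt(2.0 * np.log(np.log(n)))

def estimate_num_spikes_two_gaps(eigenvalues, d_n, s):
    """Smallest j in {1, ..., s} with delta_{n,j+1} < d_n and delta_{n,j+2} < d_n.

    eigenvalues: sample eigenvalues (any order), at least s + 3 of them.
    Returns None if no such j exists (assumption).
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    delta = lam[:-1] - lam[1:]           # delta[j-1] holds delta_{n,j}
    for j in range(1, s + 1):
        if delta[j] < d_n and delta[j + 1] < d_n:
            return j
    return None

# Toy spectrum with two nearly equal spikes (6 and 5.95): a single small gap
# inside the spikes must not be mistaken for the start of the bulk.
toy = [10.0, 6.0, 5.95, 3.0, 2.9, 2.85, 2.8]
print(estimate_num_spikes_two_gaps(toy, d_n=0.5, s=4))
print(threshold(1000, C=1.0))
```

Requiring two consecutive small spacings makes the estimator less sensitive to a single small gap between close spikes: on the toy spectrum the one-gap rule (1) would stop at $j = 1$, whereas the two-gap rule returns 3.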

  17. Simulation experiments
     ◮ Performance measure: empirical false detection rates $P(\tilde q_n \ne q_0)$ over 500 independent replications
     ◮ Simulation design:
       • $q_0$: number of spikes;
       • $(\alpha_i)_{1 \le i \le q_0}$: spikes;
       • $p$: dimension of the vectors;
       • $n$: sample size;
       • $c = p/n$;
       • $\sigma^2 = 1$ given or to be estimated;
       • $C$: constant in $d_n$.

  18. Experimental design

  19. c) Discussions - Comparison with an estimator by Kritchman and Nadler
     In the non-spikes case ($q_0 = 0$), $n S_n \sim W_p(I, n)$. In this case
     Proposition (Johnstone - 2001)
       $$P\Bigl( \lambda_{n,1} < \sigma^2 \Bigl( \frac{\beta_{n,p}}{n^{2/3}} s + b \Bigr) \Bigr) \to F_1(s),$$
     where $F_1$ is the Tracy-Widom distribution of order 1 and
       $$\beta_{n,p} = \bigl(1 + \sqrt{p/n}\bigr) \bigl(1 + \sqrt{n/p}\bigr)^{1/3}.$$
     To distinguish a spike eigenvalue $\lambda_{n,k}$ from a non-spike one at an asymptotic significance level $\gamma$, their idea is to check whether
       $$\lambda_{n,k} > \sigma^2 \Bigl( \frac{\beta_{n,p-k}}{n^{2/3}} s(\gamma) + b \Bigr),$$
     where $s(\gamma)$ verifies $F_1(s(\gamma)) = 1 - \gamma$. Their estimator is
       $$\tilde q_n = \operatorname*{argmin}_k \Bigl\{ \lambda_{n,k} < \hat\sigma^2 \Bigl( \frac{\beta_{n,p-k}}{n^{2/3}} s(\gamma) + b \Bigr) \Bigr\} - 1.$$

  20. c) Discussions - on the tuning parameter $C$
     ◮ $C$ has been tuned manually in each case;
     ◮ For real applications, we need a procedure to choose this constant;
     ◮ Idea: use Wishart distributions as a benchmark to calibrate $C$;
     ◮ Consider the gap between the two largest eigenvalues: $\tilde\lambda_1 - \tilde\lambda_2$.

  21. Cont'd
     ◮ Obtain the empirical distribution of $\tilde\lambda_1 - \tilde\lambda_2$ by simulation, with 500 independent replications;
     ◮ Compute the upper 5% quantile $s$: $P(\tilde\lambda_1 - \tilde\lambda_2 \le s) \simeq 0.95$;
     ◮ Define the value
       $$\tilde C = \frac{s \, n^{2/3}}{\sqrt{2 \log\log(n)}}.$$
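The calibration procedure described on the last two slides can be sketched as below. Assumptions made for the illustration: the Wishart benchmark is simulated from standard Gaussian data with identity covariance, the gap is computed from the eigenvalues of $S_n = X^T X / n$, and the function name and default arguments are ours.

```python
import numpy as np

def calibrate_C(p, n, n_rep=500, level=0.95, seed=0):
    """Automated constant C-tilde from the gap between the two largest
    eigenvalues of a pure-noise (no spike) sample covariance matrix."""
    rng = np.random.default_rng(seed)
    gaps = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.standard_normal((n, p))          # no spikes: n * S_n ~ W_p(I, n)
        lam = np.linalg.eigvalsh(x.T @ x / n)    # ascending order
        gaps[r] = lam[-1] - lam[-2]              # lambda_1 - lambda_2
    s = np.quantile(gaps, level)                 # P(gap <= s) ~ level
    return s * n ** (2.0 / 3.0) / np.sqrt(2.0 * np.log(np.log(n)))

# Small illustrative run (fewer replications than the 500 used in the slides).
C_tilde = calibrate_C(p=50, n=200, n_rep=100)
print("calibrated constant:", C_tilde)
```

By construction, plugging $\tilde C$ into $d_n = C n^{-2/3} \sqrt{2 \log\log n}$ gives $d_n = s$, i.e. a threshold that a pure-noise gap exceeds only about 5% of the time.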

  22. Assessment of the automated value $\tilde C$ with $c = 10$
     ◮ $\tilde C$ is slightly larger than the manually tuned $C$;
     ◮ Using $\tilde C$ leads to only a small drop in performance;
     ◮ Error rates are higher in the case of equal factors for moderate sample sizes.

  23. Application to S&P stocks data
     ◮ Estimated number of factors: $\hat q_0 = 17$;
     ◮ Residual variance: $\hat\sigma^2 = 0.3616$.
