

1. Log Covariance Matrix Estimation
Xinwei Deng, Department of Statistics, University of Wisconsin-Madison
Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madison)

2. Outline
• Background and Motivation
• The Proposed Log-ME Method
• Simulation and Real Example
• Summary and Discussion

3. Background
• Covariance matrix estimation is important in multivariate analysis and many statistical applications.
• Suppose $x_1, \ldots, x_n$ are i.i.d. $p$-dimensional random vectors $\sim N(0, \Sigma)$. Let $S = \sum_{i=1}^n x_i x_i' / n$ be the sample covariance matrix. The negative log-likelihood function is proportional to
  $L_n(\Sigma) = -\log|\Sigma^{-1}| + \operatorname{tr}[\Sigma^{-1} S]$.  (1)
• Recent interest centers on settings where p is large or p ≈ n. There S is not a stable estimate:
  – The largest eigenvalues of S overestimate the true eigenvalues.
  – When p > n, S is singular and its smallest eigenvalue is zero.
How to estimate Σ⁻¹?
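As a quick illustration (my sketch, not part of the slides), the following numpy snippet forms S and evaluates the negative log-likelihood (1); the dimensions and seed are arbitrary choices:

```python
# Minimal sketch: sample covariance S and the negative log-likelihood (1).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)  # rows are x_i

S = X.T @ X / n  # sample covariance (mean known to be 0)

def neg_loglik(Sigma, S):
    """L_n(Sigma) = -log|Sigma^{-1}| + tr(Sigma^{-1} S), as in (1)."""
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_inv = np.linalg.slogdet(Sigma_inv)
    return -logdet_inv + np.trace(Sigma_inv @ S)

print(neg_loglik(np.eye(p), S))  # loss at the true Sigma = I
print(neg_loglik(S, S))          # loss at S itself: log|S| + p
```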

4. Recent Estimation Methods for Σ or Σ⁻¹
• Reduce the number of nonzero entries in the estimate of Σ or Σ⁻¹:
  – Σ: Bickel and Levina (2008), using thresholding.
  – Σ⁻¹: Yuan and Lin (2007), $\ell_1$ penalty on Σ⁻¹; Friedman et al. (2008), Graphical Lasso; Meinshausen and Bühlmann (2006), reformulated as neighborhood regression.
• Shrinkage estimates of the covariance matrix:
  – Ledoit and Wolf (2006), a shrinkage combination $\rho S + (1 - \rho)\mu I$ of the sample covariance and a scaled identity.
  – Won et al. (2009), control the condition number (largest eigenvalue / smallest eigenvalue).

5. Motivation
• An estimate of Σ or Σ⁻¹ needs to be positive definite.
  – This mathematical restriction makes the covariance matrix estimation problem challenging.
• Any positive definite Σ can be expressed as the matrix exponential of a real symmetric matrix A:
  $\Sigma = \exp(A) = I + A + \frac{A^2}{2!} + \cdots$
  – Expressing the likelihood function in terms of A ≡ log(Σ) removes the mathematical restriction.
• Consider the spectral decomposition Σ = TDT′ with D = diag(d₁, ..., d_p). Then A = TMT′ with M = diag(log(d₁), ..., log(d_p)).
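A minimal sketch (my illustration, not the authors' code) of this log/exp correspondence computed through the spectral decomposition:

```python
# Matrix log and exp of a symmetric positive definite matrix via eigh:
# Sigma = T D T'  =>  A = log(Sigma) = T diag(log d_i) T'.
import numpy as np

def mat_log(Sigma):
    d, T = np.linalg.eigh(Sigma)          # spectral decomposition, d_i > 0
    return T @ np.diag(np.log(d)) @ T.T   # A = T M T', M = diag(log d_i)

def mat_exp(A):
    m, T = np.linalg.eigh(A)              # A is real symmetric
    return T @ np.diag(np.exp(m)) @ T.T

rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
Sigma = G @ G.T + 5 * np.eye(5)           # a positive definite matrix
A = mat_log(Sigma)
print(np.allclose(mat_exp(A), Sigma))     # True: exp(log(Sigma)) = Sigma
print(np.allclose(A, A.T))                # A is symmetric, but need not be PD
```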

6. Idea of the Proposed Method
• Leonard and Hsu (1992) used this log-transformation to estimate Σ, approximating the likelihood via the Volterra integral equation.
  – Their approximation is based on S being nonsingular ⇒ not applicable when p ≥ n.
• We extend the likelihood approximation to the case of a singular S.
• We regularize the largest and smallest eigenvalues of Σ simultaneously.
• An efficient iterative quadratic programming algorithm estimates A (= log Σ).
• We call the resulting estimate "Log-ME": Logarithm-transformed Matrix Estimate.

7. A Simple Example
• Experiment: simulate x_i ~ N(0, I), i = 1, ..., n, with n = 50.
• For each p varying from 5 to 100, consider the largest and smallest eigenvalues of the covariance matrix estimate.
• For each p, repeat the experiment 100 times and compute the average of the largest eigenvalues and the average of the smallest eigenvalues for:
  – the sample covariance matrix;
  – the Log-ME covariance matrix estimate.
(A sketch reproducing the sample-covariance part of this experiment follows the figure below.)

8. [Figure: The averages of the largest and smallest eigenvalues of the covariance matrix estimates as the dimension p varies. The true eigenvalues are all equal to 1.]
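A sketch of the sample-covariance half of this experiment (the Log-ME estimate is only defined later, so it is omitted here); the grid of p values is my choice:

```python
# Average extreme eigenvalues of S over 100 replications, x_i ~ N(0, I), n = 50.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100
for p in (5, 25, 50, 75, 100):
    lo, hi = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        eig = np.linalg.eigvalsh(X.T @ X / n)   # ascending order
        lo.append(eig[0])
        hi.append(eig[-1])
    print(f"p={p:3d}  smallest={np.mean(lo):.3f}  largest={np.mean(hi):.3f}")
# The largest eigenvalue drifts well above 1 and the smallest toward 0
# (numerically zero once p > n), although every true eigenvalue is 1.
```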

9. The Transformed Log-Likelihood
• In terms of the covariance matrix logarithm A, the negative log-likelihood function in (1) becomes
  $L_n(A) = \operatorname{tr}(A) + \operatorname{tr}[\exp(-A) S]$.  (2)
• The problem of estimating a positive definite matrix Σ now becomes one of estimating a real symmetric matrix A.
• Because of the matrix exponential term exp(−A)S, estimating A by directly minimizing L_n(A) is nontrivial.
• Our approach: approximate exp(−A)S using the Volterra integral equation (valid even when S is singular).
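A quick numerical check (not in the slides) that (2) agrees with (1): with A = log(Σ) we have −log|Σ⁻¹| = tr(A) and Σ⁻¹ = exp(−A). SciPy's `expm`/`logm` stand in for the matrix exponential and logarithm:

```python
import numpy as np
from scipy.linalg import expm, logm, inv

rng = np.random.default_rng(3)
G = rng.standard_normal((4, 4))
Sigma = G @ G.T + 4 * np.eye(4)    # positive definite
S = np.eye(4)                      # any symmetric PSD matrix works here

A = logm(Sigma).real               # real symmetric log of a PD matrix
L1 = -np.log(np.linalg.det(inv(Sigma))) + np.trace(inv(Sigma) @ S)  # (1)
L2 = np.trace(A) + np.trace(expm(-A) @ S)                           # (2)
print(np.allclose(L1, L2))         # True
```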

10. The Volterra Integral Equation
• The Volterra integral equation (Bellman, 1970, page 175) is
  $\exp(At) = \exp(A_0 t) + \int_0^t \exp(A_0(t-s))\,(A - A_0)\,\exp(As)\, ds$.  (3)
• Repeatedly applying (3) leads to
  $\exp(At) = \exp(A_0 t) + \int_0^t \exp(A_0(t-s))\,(A - A_0)\,\exp(A_0 s)\, ds$
  $\qquad + \int_0^t \int_0^s \exp(A_0(t-s))\,(A - A_0)\,\exp(A_0(s-u))\,(A - A_0)\,\exp(A_0 u)\, du\, ds$
  $\qquad + \text{cubic and higher order terms},$  (4)
  where A₀ = log(Σ₀) and Σ₀ is an initial estimate of Σ.
• The expansion of exp(−A) is obtained by setting t = 1 in (4) and replacing A, A₀ with −A, −A₀.
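A numerical sanity check (my sketch) of the exact identity (3) at t = 1, with the integral approximated by Simpson's rule; the matrices and grid size are arbitrary choices:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import simpson

rng = np.random.default_rng(4)
def sym(M):
    return (M + M.T) / 2

A  = sym(rng.standard_normal((4, 4)))
A0 = sym(rng.standard_normal((4, 4)))   # plays the role of log(Sigma_0)

s = np.linspace(0.0, 1.0, 401)
vals = np.stack([expm(A0 * (1 - si)) @ (A - A0) @ expm(A * si) for si in s])
integral = simpson(vals, x=s, axis=0)   # integrate each matrix entry over s

lhs = expm(A)
rhs = expm(A0) + integral
print(np.max(np.abs(lhs - rhs)))        # agrees up to quadrature error
```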

11. Approximation to the Log-Likelihood
• The term tr[exp(−A)S] can be written as
  $\operatorname{tr}[\exp(-A) S] = \operatorname{tr}(S \Sigma_0^{-1}) - \int_0^1 \operatorname{tr}[(A - A_0)\, \Sigma_0^{-s} S \Sigma_0^{s-1}]\, ds$
  $\qquad + \int_0^1 \int_0^s \operatorname{tr}[(A - A_0)\, \Sigma_0^{u-s} (A - A_0)\, \Sigma_0^{-u} S \Sigma_0^{s-1}]\, du\, ds$
  $\qquad + \text{cubic and higher order terms}.$  (5)
• Leaving out the higher order terms in (5), we approximate L_n(A) by
  $l_n(A) = \operatorname{tr}(A) + \operatorname{tr}(S \Sigma_0^{-1}) - \int_0^1 \operatorname{tr}[(A - A_0)\, \Sigma_0^{-s} S \Sigma_0^{s-1}]\, ds$
  $\qquad + \int_0^1 \int_0^s \operatorname{tr}[(A - A_0)\, \Sigma_0^{u-s} (A - A_0)\, \Sigma_0^{-u} S \Sigma_0^{s-1}]\, du\, ds.$  (6)
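A sketch (mine, not the authors') checking (5) numerically: with A close to A₀, the first two integral terms should reproduce tr[exp(−A)S] up to an error that is cubic in A − A₀. The quadrature grids and perturbation size are my choices:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import simpson

rng = np.random.default_rng(5)
def sym(M):
    return (M + M.T) / 2

p = 4
G = rng.standard_normal((p, p))
Sigma0 = G @ G.T + p * np.eye(p)                 # initial estimate Sigma_0
d, T = np.linalg.eigh(Sigma0)
def S0pow(t):                                    # Sigma_0^t via eigendecomposition
    return T @ np.diag(d ** t) @ T.T

A0 = T @ np.diag(np.log(d)) @ T.T                # A_0 = log(Sigma_0)
Delta = 0.05 * sym(rng.standard_normal((p, p)))  # small A - A_0
A = A0 + Delta
S = np.eye(p)

ss = np.linspace(0, 1, 101)
lin = np.array([np.trace(Delta @ S0pow(-s) @ S @ S0pow(s - 1)) for s in ss])
quad = np.empty_like(ss)
for k, s in enumerate(ss):                       # inner integral over u in [0, s]
    us = np.linspace(0, max(s, 1e-12), 101)
    inner = np.array([np.trace(Delta @ S0pow(u - s) @ Delta @ S0pow(-u)
                               @ S @ S0pow(s - 1)) for u in us])
    quad[k] = simpson(inner, x=us)

approx = (np.trace(S @ np.linalg.inv(Sigma0))
          - simpson(lin, x=ss) + simpson(quad, x=ss))
exact = np.trace(expm(-A) @ S)
print(abs(approx - exact))                       # small: cubic-order error
```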

12. Explicit Form of l_n(A)
• The integrals in l_n(A) can be solved analytically through the spectral decomposition Σ₀ = T₀D₀T₀′.
• Some notation:
  – D₀ = diag(d₁⁽⁰⁾, ..., d_p⁽⁰⁾), with the d_i⁽⁰⁾ the eigenvalues of Σ₀.
  – T₀ = (t₁⁽⁰⁾, ..., t_p⁽⁰⁾), with t_i⁽⁰⁾ the eigenvector corresponding to d_i⁽⁰⁾.
  – Let B = T₀′(A − A₀)T₀ = (b_ij)_{p×p} and S̃ = T₀′ S T₀ = (s̃_ij)_{p×p}.
• Then l_n(A) can be written as a quadratic function of the b_ij:
  $l_n(A) = \frac{1}{2}\sum_{i=1}^p \xi_{ii} b_{ii}^2 + \sum_{i<j} \xi_{ij} b_{ij}^2 + \sum_{i=1}^p \sum_{j \neq i} 2\,\tau_{ij}\, b_{ii} b_{ij} + \sum_{k=1}^p \sum_{i<j,\, i \neq k,\, j \neq k} \eta_{kij}\, b_{ik} b_{kj} - \Big( \sum_{i=1}^p \beta_{ii} b_{ii} + 2 \sum_{i<j} \beta_{ij} b_{ij} \Big),$  (7)
  up to some constant. Getting B ⇔ getting A.

13. Some Details (writing d_i for d_i⁽⁰⁾)
• For the linear term,
  $\beta_{ii} = \frac{\tilde s_{ii}}{d_i} - 1, \qquad \beta_{ij} = \frac{\tilde s_{ij}\,(d_i - d_j)}{d_i d_j\,(\log d_i - \log d_j)}.$
• For the quadratic term,
  $\xi_{ii} = \frac{\tilde s_{ii}}{d_i},$
  $\xi_{ij} = \frac{\tilde s_{ii}/d_i - \tilde s_{jj}/d_j}{\log d_j - \log d_i} + \frac{(d_i/d_j - 1)\,\tilde s_{ii}/d_i + (d_j/d_i - 1)\,\tilde s_{jj}/d_j}{(\log d_j - \log d_i)^2},$
  $\tau_{ij} = \left[ \frac{1/d_j - 1/d_i}{(\log d_j - \log d_i)^2} + \frac{1/d_i}{\log d_j - \log d_i} \right] \tilde s_{ij},$
  $\eta_{kij} = \left[ \frac{1/d_i - 1/d_j}{\log(d_k/d_j)\,\log(d_j/d_i)} + \frac{1/d_j - 1/d_i}{\log(d_k/d_i)\,\log(d_i/d_j)} + \frac{2/d_k - 1/d_i - 1/d_j}{\log(d_k/d_i)\,\log(d_k/d_j)} \right] \tilde s_{ij}.$
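For concreteness, a direct transcription of these coefficient formulas as reconstructed above (a hedged sketch: the slide text is garbled in places, and the code assumes all d_i⁽⁰⁾ are distinct so that no log-ratio denominator vanishes):

```python
# Coefficients of the quadratic form (7), given the eigenvalues d of Sigma_0
# and S_tilde = T0' S T0. Assumes distinct eigenvalues.
import numpy as np

def ln_coefficients(d, St):
    p = len(d)
    ld = np.log(d)
    beta = np.zeros((p, p)); xi = np.zeros((p, p))
    tau = np.zeros((p, p)); eta = np.zeros((p, p, p))
    for i in range(p):
        beta[i, i] = St[i, i] / d[i] - 1.0
        xi[i, i] = St[i, i] / d[i]
        for j in range(p):
            if j == i:
                continue
            dl = ld[j] - ld[i]                   # log d_j - log d_i
            beta[i, j] = (St[i, j] * (d[i] - d[j])
                          / (d[i] * d[j] * (ld[i] - ld[j])))
            xi[i, j] = ((St[i, i] / d[i] - St[j, j] / d[j]) / dl
                        + ((d[i] / d[j] - 1) * St[i, i] / d[i]
                           + (d[j] / d[i] - 1) * St[j, j] / d[j]) / dl ** 2)
            tau[i, j] = ((1 / d[j] - 1 / d[i]) / dl ** 2
                         + (1 / d[i]) / dl) * St[i, j]
            for k in range(p):
                if k in (i, j):
                    continue
                eta[k, i, j] = ((1/d[i] - 1/d[j])
                                / (np.log(d[k]/d[j]) * np.log(d[j]/d[i]))
                                + (1/d[j] - 1/d[i])
                                / (np.log(d[k]/d[i]) * np.log(d[i]/d[j]))
                                + (2/d[k] - 1/d[i] - 1/d[j])
                                / (np.log(d[k]/d[i]) * np.log(d[k]/d[j]))
                                ) * St[i, j]
    return beta, xi, tau, eta
```

With these arrays, evaluating (7) for a candidate B is a matter of summing the indicated monomials.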

14. The Log-ME Method
• We propose a regularized method to estimate Σ using the approximate log-likelihood function l_n(A).
• Consider the penalty function $\|A\|_F^2 = \operatorname{tr}(A^2) = \sum_{i=1}^p (\log(d_i))^2$, where the d_i are the eigenvalues of the covariance matrix Σ.
  – If d_i goes to zero or diverges to infinity, the value of (log(d_i))² goes to infinity in either case.
  – Such a penalty can therefore simultaneously regularize the largest and smallest eigenvalues of the covariance matrix estimate (a small numerical illustration follows below).
• Estimate Σ, or equivalently A, by minimizing
  $l_{n,\lambda}(B) \equiv l_{n,\lambda}(A) = l_n(A) + \lambda \operatorname{tr}(A^2),$  (8)
  where λ is a tuning parameter.
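A tiny illustration (my example) of why tr(A²) = Σ(log d_i)² controls both eigenvalue extremes:

```python
import numpy as np

def log_penalty(Sigma):
    d = np.linalg.eigvalsh(Sigma)
    return np.sum(np.log(d) ** 2)     # = tr(A^2) = ||A||_F^2 with A = log(Sigma)

print(log_penalty(np.diag([0.9, 1.0, 1.1])))   # near 0: eigenvalues near 1
print(log_penalty(np.diag([1e-4, 1.0, 1e4])))  # huge: both extremes penalized
```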

15. An Iterative Algorithm
• l_{n,λ}(B) depends on an initial estimate Σ₀, or equivalently A₀.
• We propose to use l_{n,λ}(B) iteratively to obtain its minimizer B̂:
  Algorithm:
  Step 1: Set an initial covariance matrix estimate Σ₀, a positive definite matrix.
  Step 2: Use the spectral decomposition Σ₀ = T₀D₀T₀′, and set A₀ = log(Σ₀).
  Step 3: Compute B̂ by minimizing l_{n,λ} in (8). Then obtain Â = T₀B̂T₀′ + A₀, and update the estimate of Σ by
    Σ̂ = exp(Â) = exp(T₀B̂T₀′ + A₀).
  Step 4: Check whether ‖Σ̂ − Σ₀‖²_F is less than a pre-specified positive tolerance value. Otherwise, set Σ₀ = Σ̂ and go back to Step 2.
• Set the initial Σ₀ in Step 1 to S + εI.
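A sketch of the Step 1-4 loop. Caveat: for brevity, the authors' inner quadratic program on l_{n,λ}(B) is replaced here by a generic BFGS minimization of the exact penalized objective tr(A) + tr(exp(−A)S) + λ tr(A²); this stand-in shows the structure of the iteration, not the actual QP step:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def log_me(S, lam=0.1, eps=1e-3, tol=1e-8, max_iter=50):
    p = S.shape[0]
    Sigma0 = S + eps * np.eye(p)                 # Step 1: initial estimate
    iu = np.triu_indices(p)                      # free parameters of symmetric A

    def unpack(v):                               # vector -> symmetric matrix
        A = np.zeros((p, p))
        A[iu] = v
        return A + A.T - np.diag(np.diag(A))

    def objective(v):                            # exact penalized (2) + lam*tr(A^2)
        A = unpack(v)
        return np.trace(A) + np.trace(expm(-A) @ S) + lam * np.trace(A @ A)

    for _ in range(max_iter):
        d, T = np.linalg.eigh(Sigma0)            # Step 2: Sigma_0 = T0 D0 T0'
        A0 = T @ np.diag(np.log(d)) @ T.T        # A_0 = log(Sigma_0)
        res = minimize(objective, A0[iu], method="BFGS")  # Step 3 (stand-in)
        Sigma_hat = expm(unpack(res.x))
        if np.sum((Sigma_hat - Sigma0) ** 2) < tol:       # Step 4
            return Sigma_hat
        Sigma0 = Sigma_hat
    return Sigma0
```

Usage: `Sigma_hat = log_me(S, lam=0.1)` with S from the earlier snippets; the estimate is positive definite by construction, since it is the exponential of a symmetric matrix.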

16. Simulation Study
• Six different covariance models Σ = (σ_ij)_{p×p} are used for comparison:
  – Model 1: homogeneous model with Σ = I.
  – Model 2: MA(1) model with σ_ii = 1, σ_{i,i−1} = σ_{i−1,i} = 0.45.
  – Model 3: circle model with σ_ii = 1, σ_{i,i−1} = σ_{i−1,i} = 0.3, σ_{1,p} = σ_{p,1} = 0.3.
• We compare four estimation methods: the banding estimate (Bickel and Levina, 2008), the LW estimate (Ledoit and Wolf, 2006), the Glasso estimate (Yuan and Lin, 2007), and the CN estimate (Won et al., 2009).
• Two loss functions evaluate the performance of each method:
  $\mathrm{KL} = -\log|\hat{\Sigma}^{-1}| + \operatorname{tr}(\hat{\Sigma}^{-1} \Sigma) - (-\log|\Sigma^{-1}| + p),$
  $\Delta_1 = |\hat{d}_1/\hat{d}_p - d_1/d_p|,$
  where d₁ and d_p are the largest and smallest eigenvalues of Σ, and d̂₁ and d̂_p are their estimates.
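The two losses as I read them from the slide, in numpy (a sketch, not the authors' evaluation code):

```python
import numpy as np

def kl_loss(Sigma_hat, Sigma):
    """KL = -log|Sigma_hat^{-1}| + tr(Sigma_hat^{-1} Sigma) - (-log|Sigma^{-1}| + p)."""
    p = Sigma.shape[0]
    H = np.linalg.inv(Sigma_hat)
    _, ld_hat = np.linalg.slogdet(H)                     # log|Sigma_hat^{-1}|
    _, ld_true = np.linalg.slogdet(np.linalg.inv(Sigma)) # log|Sigma^{-1}|
    return (-ld_hat + np.trace(H @ Sigma)) - (-ld_true + p)

def delta1_loss(Sigma_hat, Sigma):
    """Delta_1 = |d1_hat/dp_hat - d1/dp|, a condition-number discrepancy."""
    e_hat = np.linalg.eigvalsh(Sigma_hat)                # ascending order
    e = np.linalg.eigvalsh(Sigma)
    return abs(e_hat[-1] / e_hat[0] - e[-1] / e[0])
```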
