
High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains - PowerPoint PPT Presentation



  1. High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains. Majid Janzamin and Anima Anandkumar, U.C. Irvine.

  2. High-Dimensional Covariance Estimation. $n$ i.i.d. samples, $p$ variables $X := [X_1, \ldots, X_p]^T$. Covariance estimation: $\Sigma^* := \mathbb{E}[XX^T]$. High-dimensional regime: both $n, p \to \infty$ and $n \ll p$. Challenge: the empirical (sample) covariance $\widehat{\Sigma}^n := \frac{1}{n} \sum_{k=1}^{n} x^{(k)} x^{(k)T}$ is ill-posed when $n \ll p$. Solution: imposing sparsity for tractable high-dimensional estimation.
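
The following NumPy sketch (a minimal illustration, with zero-mean data assumed) shows why the sample covariance is ill-posed in this regime: with $n \ll p$ its rank is at most $n$, so it is singular and cannot be inverted.

```python
# Minimal sketch: with n << p the sample covariance is rank-deficient.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.standard_normal((n, p))        # n i.i.d. zero-mean samples of p variables

Sigma_hat = (X.T @ X) / n              # Sigma_hat^n = (1/n) sum_k x^(k) x^(k)^T
print(np.linalg.matrix_rank(Sigma_hat))      # 20 (= n), far below p = 100
print(np.linalg.eigvalsh(Sigma_hat).min())   # ~0: singular, not invertible
```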

  3. Incorporating Sparsity in High Dimensions. Sparse covariance: $\Sigma^* = \Sigma_R^*$. Sparse inverse covariance: $\Sigma^* = J_M^{*-1}$.

  4. Incorporating Sparsity in High Dimensions. Sparse covariance: $\Sigma^* = \Sigma_R^*$. Relationship with statistical properties (Gaussian): sparse covariance (independence model) encodes marginal independence.
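
As a concrete illustration of the independence model, the short NumPy example below (a hypothetical 3-variable Gaussian, not from the talk) shows how zeros in the covariance matrix correspond to marginally independent pairs.

```python
# Zeros in a Gaussian covariance <=> marginally independent pairs.
import numpy as np

# Sparse covariance: X_1 correlates with X_2 only; X_3 is isolated.
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=100_000)

# Empirical correlations recover the zero pattern: X_3 is independent of the rest.
print(np.round(np.corrcoef(X.T), 2))
```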

  5. Incorporating Sparsity in High Dimensions. Sparse inverse covariance: $\Sigma^* = J_M^{*-1}$. Relationship with statistical properties (Gaussian): sparse inverse covariance (Markov model) encodes conditional independence. Local Markov property: $X_i \perp X_{V \setminus \{\mathrm{nbd}(i) \cup i\}} \mid X_{\mathrm{nbd}(i)}$. For Gaussian: $J_{ij} = 0 \Leftrightarrow (i,j) \notin E$.
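
As a concrete illustration of the Markov model, the sketch below (a hypothetical chain-graph example) builds a sparse precision matrix $J$ and shows that its zeros encode conditional independence even though the covariance $\Sigma = J^{-1}$ is dense.

```python
# Gaussian Markov property on a chain graph: J_ij = 0 exactly when (i, j)
# is not an edge, even though Sigma = inv(J) is dense.
import numpy as np

p = 5
# Precision matrix of the chain 1-2-3-4-5: only adjacent pairs share an edge.
J = np.eye(p) + np.diag([-0.4] * (p - 1), 1) + np.diag([-0.4] * (p - 1), -1)
Sigma = np.linalg.inv(J)

print(np.round(Sigma, 2))   # dense: all variables are marginally correlated
print(J[0, 2])              # 0.0: X_1 and X_3 are independent given the rest
```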

  6. Incorporating Sparsity in High Dimensions. Sparse covariance: $\Sigma^* = \Sigma_R^*$; sparse inverse covariance: $\Sigma^* = J_M^{*-1}$. Relationship with statistical properties (Gaussian): sparse covariance (independence model) encodes marginal independence; sparse inverse covariance (Markov model) encodes conditional independence.

  7. Incorporating Sparsity in High Dimensions. Sparse covariance: $\Sigma^* = \Sigma_R^*$; sparse inverse covariance: $\Sigma^* = J_M^{*-1}$. Relationship with statistical properties (Gaussian): sparse covariance (independence model) encodes marginal independence; sparse inverse covariance (Markov model) encodes conditional independence. Guarantees under sparsity constraints in high dimensions: consistent estimation when $n = \Omega(\log p)$, so even $n \ll p$ suffices; consistent means sparsistent (correct support recovery) and satisfying reasonable norm guarantees.

  8. Incorporating Sparsity in High Dimensions. Sparse covariance: $\Sigma^* = \Sigma_R^*$; sparse inverse covariance: $\Sigma^* = J_M^{*-1}$. Relationship with statistical properties (Gaussian): sparse covariance (independence model) encodes marginal independence; sparse inverse covariance (Markov model) encodes conditional independence. Guarantees under sparsity constraints in high dimensions: consistent estimation when $n = \Omega(\log p)$, so even $n \ll p$ suffices. Going beyond sparsity in high dimensions?

  9. Going Beyond Sparse Models. Motivation: sparsity constraints are too restrictive for a faithful representation, and data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously.

  10. Going Beyond Sparse Models. Motivation: sparsity constraints are too restrictive for a faithful representation, and data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously. One possibility (this work): propose a sparse Markov model plus a sparse residual perturbation, $\Sigma^* = J_M^{*-1} + \Sigma_R^*$.

  11. Going Beyond Sparse Models. Motivation: sparsity constraints are too restrictive for a faithful representation, and data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously. One possibility (this work): propose a sparse Markov model plus a sparse residual perturbation, $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Efficient decomposition and estimation in high dimensions? Unique decomposition? Good sample requirements?

  12. Summary of Results. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$.
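
To make the decomposition concrete, here is a small synthetic instance (a hypothetical NumPy sketch, not the authors' code): a sparse chain-graph Markov component plus a single sparse residual entry, where neither the resulting covariance nor its inverse is sparse on its own.

```python
# Hypothetical instance of Sigma* = inv(J_M*) + Sigma_R*: sparse Markov
# precision plus a sparse residual perturbation in the covariance domain.
import numpy as np

p = 6
# Markov component: chain-graph precision matrix (sparse, tridiagonal).
J_M = np.eye(p) + np.diag([-0.3] * (p - 1), 1) + np.diag([-0.3] * (p - 1), -1)

# Residual component: a single sparse off-diagonal perturbation.
Sigma_R = np.zeros((p, p))
Sigma_R[0, 5] = Sigma_R[5, 0] = 0.2

Sigma_star = np.linalg.inv(J_M) + Sigma_R

# Neither the covariance nor its inverse is sparse; only the components are.
print(np.count_nonzero(np.abs(Sigma_star) > 1e-8))                  # dense
print(np.count_nonzero(np.abs(np.linalg.inv(Sigma_star)) > 1e-8))   # dense
```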

  13. Summary of Results. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Contribution 1 (a novel model for decomposition): decomposition into Markov and residual domains; a statistically meaningful model; unification of sparse covariance and sparse inverse covariance estimation.

  14. Summary of Results. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Contribution 1 (a novel model for decomposition): decomposition into Markov and residual domains; a statistically meaningful model; unification of sparse covariance and sparse inverse covariance estimation. Contribution 2 (methods and guarantees): conditions for unique decomposition (exact statistics); sparsistency and norm guarantees in both the Markov and independence domains (sample analysis); sample requirement $n = \Omega(\log p)$ for $p$ variables; an efficient method for covariance decomposition and estimation in high dimensions.

  15. Related Works: Sparse Covariance/Inverse Covariance Estimation. Sparse covariance estimation: covariance thresholding (Bickel & Levina) (Wagaman & Levina) (Cai et al.). Sparse inverse covariance estimation: $\ell_1$ penalization (Meinshausen & Bühlmann) (Ravikumar et al.); non-convex methods (Anandkumar et al.) (Zhang).

  16. Related Works: Sparse Covariance/Inverse Covariance Estimation. Sparse covariance estimation: covariance thresholding (Bickel & Levina) (Wagaman & Levina) (Cai et al.). Sparse inverse covariance estimation: $\ell_1$ penalization (Meinshausen & Bühlmann) (Ravikumar et al.); non-convex methods (Anandkumar et al.) (Zhang). Beyond sparse models (decomposition issues): sparse + low rank (Chandrasekaran et al.) (Candès et al.); decomposable regularizers (Negahban et al.).

  17. Related Works: Sparse Covariance/Inverse Covariance Estimation. Sparse covariance estimation: covariance thresholding (Bickel & Levina) (Wagaman & Levina) (Cai et al.). Sparse inverse covariance estimation: $\ell_1$ penalization (Meinshausen & Bühlmann) (Ravikumar et al.); non-convex methods (Anandkumar et al.) (Zhang). Beyond sparse models (decomposition issues): sparse + low rank (Chandrasekaran et al.) (Candès et al.); decomposable regularizers (Negahban et al.). Multi-resolution Markov+independence models (Choi et al.): decomposition in the inverse covariance domain, but lacking theoretical guarantees. Our contribution: guaranteed decomposition and estimation.

  18. Outline: 1. Introduction 2. Algorithm 3. Guarantees 4. Experiments 5. Proof Techniques 6. Conclusion.

  19. Some Intuitions and Ideas Review Ideas for Special Cases: Sparse Covariance/Inverse Covariance

  20. Some Intuitions and Ideas. Review ideas for special cases: sparse covariance/inverse covariance. Sparse covariance estimation (independence model): $\Sigma^* = \Sigma_I^*$. $\widehat{\Sigma}^n$: sample covariance using $n$ samples; $p$ variables with $p \gg n$.

  21. Some Intuitions and Ideas. Review ideas for special cases: sparse covariance/inverse covariance. Sparse covariance estimation (independence model): $\Sigma^* = \Sigma_I^*$. $\widehat{\Sigma}^n$: sample covariance using $n$ samples; $p$ variables with $p \gg n$. (Bickel & Levina): hard-threshold the off-diagonal entries of $\widehat{\Sigma}^n$, with threshold chosen as $\sqrt{\frac{\log p}{n}}$. Sparsistency (support recovery) and norm guarantees when $n = \Omega(\log p)$, so even $n \ll p$ suffices.
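
A minimal sketch of this thresholding estimator is given below (NumPy, zero-mean data assumed; the constant `c` multiplying the $\sqrt{\log p / n}$ rate is a tuning parameter left implicit on the slide).

```python
# Minimal sketch of Bickel-Levina hard thresholding: zero out small
# off-diagonal entries of the sample covariance.
import numpy as np

def hard_threshold_covariance(X, c=1.0):
    """Hard-threshold the off-diagonal of the sample covariance of X (n x p)."""
    n, p = X.shape
    S = (X.T @ X) / n                  # sample covariance (zero-mean data)
    t = c * np.sqrt(np.log(p) / n)     # threshold of order sqrt(log p / n)
    S_thr = S.copy()
    mask = (~np.eye(p, dtype=bool)) & (np.abs(S) < t)
    S_thr[mask] = 0.0                  # the diagonal is never thresholded
    return S_thr
```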

  22. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\widehat{\Sigma}^n$: sample covariance using $n$ i.i.d. samples.

  23. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\widehat{\Sigma}^n$: sample covariance using $n$ i.i.d. samples. $\ell_1$-MLE for sparse inverse covariance (Ravikumar et al. '08): $\widehat{J}_M := \operatorname{argmin}_{J_M \succ 0} \; \langle \widehat{\Sigma}^n, J_M \rangle - \log\det J_M + \gamma \|J_M\|_{1,\mathrm{off}}$, where $\|J_M\|_{1,\mathrm{off}} := \sum_{i \neq j} |(J_M)_{ij}|$.
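
For reference, scikit-learn's GraphicalLasso solves this kind of $\ell_1$-penalized MLE off the shelf; the sketch below is one way to run it on hypothetical chain-graph data, with its `alpha` parameter playing the role of $\gamma$ (the exact scaling conventions differ, so treat this as illustrative).

```python
# Running an l1-penalized inverse-covariance MLE with scikit-learn.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples from a 5-variable chain-graph Gaussian.
J_true = np.eye(5) + np.diag([-0.4] * 4, 1) + np.diag([-0.4] * 4, -1)
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(J_true), size=200)

model = GraphicalLasso(alpha=0.1).fit(X)
print(np.round(model.precision_, 2))   # sparse estimate of J_M
```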

  24. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\widehat{\Sigma}^n$: sample covariance using $n$ i.i.d. samples. $\ell_1$-MLE for sparse inverse covariance (Ravikumar et al. '08): $\widehat{J}_M := \operatorname{argmin}_{J_M \succ 0} \; \langle \widehat{\Sigma}^n, J_M \rangle - \log\det J_M + \gamma \|J_M\|_{1,\mathrm{off}}$, where $\|J_M\|_{1,\mathrm{off}} := \sum_{i \neq j} |(J_M)_{ij}|$. Max-entropy formulation (Lagrangian dual): $\widehat{\Sigma}_M := \operatorname{argmax}_{\Sigma_M \succ 0,\, \Sigma_R} \; \log\det \Sigma_M - \lambda \|\Sigma_R\|_{1,\mathrm{off}}$ subject to $\|\widehat{\Sigma}^n - \Sigma_M - \Sigma_R\|_{\infty,\mathrm{off}} \le \gamma$, $[\Sigma_M]_d = [\widehat{\Sigma}^n]_d$, $[\Sigma_R]_d = 0$.

  25. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\widehat{\Sigma}^n$: sample covariance using $n$ i.i.d. samples. $\ell_1$-MLE for sparse inverse covariance (Ravikumar et al. '08): $\widehat{J}_M := \operatorname{argmin}_{J_M \succ 0} \; \langle \widehat{\Sigma}^n, J_M \rangle - \log\det J_M + \gamma \|J_M\|_{1,\mathrm{off}}$, where $\|J_M\|_{1,\mathrm{off}} := \sum_{i \neq j} |(J_M)_{ij}|$. Max-entropy formulation (Lagrangian dual): $\widehat{\Sigma}_M := \operatorname{argmax}_{\Sigma_M \succ 0,\, \Sigma_R} \; \log\det \Sigma_M - \lambda \|\Sigma_R\|_{1,\mathrm{off}}$ subject to $\|\widehat{\Sigma}^n - \Sigma_M - \Sigma_R\|_{\infty,\mathrm{off}} \le \gamma$, $[\Sigma_M]_d = [\widehat{\Sigma}^n]_d$, $[\Sigma_R]_d = 0$. Consistent estimation under certain conditions with $n = \Omega(\log p)$.
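
For small $p$, the max-entropy program can be transcribed almost directly into CVXPY. The sketch below is an illustrative transcription under that reading of the constraints, not the authors' implementation; the function name and solver choice are assumptions.

```python
# CVXPY transcription of the max-entropy program above (illustrative only).
import cvxpy as cp
import numpy as np

def covariance_decomposition(Sigma_n, gamma, lam):
    """Split a sample covariance into Markov (Sigma_M) and residual (Sigma_R) parts."""
    p = Sigma_n.shape[0]
    Sigma_M = cp.Variable((p, p), PSD=True)         # Markov-domain covariance
    Sigma_R = cp.Variable((p, p), symmetric=True)   # sparse residual perturbation
    off = 1.0 - np.eye(p)                           # mask for off-diagonal entries

    objective = cp.Maximize(cp.log_det(Sigma_M)
                            - lam * cp.sum(cp.abs(cp.multiply(off, Sigma_R))))
    constraints = [
        # off-diagonal sup-norm fit: |Sigma_n - Sigma_M - Sigma_R|_ij <= gamma, i != j
        cp.abs(cp.multiply(off, Sigma_n - Sigma_M - Sigma_R)) <= gamma,
        cp.diag(Sigma_M) == np.diag(Sigma_n),       # diagonal pinned to sample covariance
        cp.diag(Sigma_R) == 0,                      # residual has zero diagonal
    ]
    cp.Problem(objective, constraints).solve(solver=cp.SCS)
    return Sigma_M.value, Sigma_R.value
```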

  26. Extension to Markov+Independence Models? $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Sparse covariance estimation: hard-threshold the off-diagonal entries of $\widehat{\Sigma}^n$. Sparse inverse covariance estimation: add an $\ell_1$ penalty to the maximum-likelihood program (involving inverse covariance matrix estimation).

  27. Extension to Markov+Independence Models? $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Sparse covariance estimation: hard-threshold the off-diagonal entries of $\widehat{\Sigma}^n$. Sparse inverse covariance estimation: add an $\ell_1$ penalty to the maximum-likelihood program (involving inverse covariance matrix estimation). Is it possible to unify the above methods and guarantees?
