Methods for Robust High Dimensional Graphical Model Selection

Bala Rajaratnam
Department of Statistics, Stanford University
Fields Institute

Joint work with S. Oh (UC Berkeley) and K. Khare (U. Florida)
Motivation

• Availability of high-throughput data from various applications
• Need for methodology/tools for analyzing high-dimensional data
• Examples:
  ◮ Biology: gene expression data
  ◮ Environmental science: climate data on spatial grids
  ◮ Finance: returns on thousands of stocks
  ◮ Retail: consumer behavior
• Common goals:
  ◮ Understand complex relationships & multivariate dependencies
  ◮ Formulate correct models & develop inferential procedures
Modeling relationships

• Correlation: basic measure of linear pairwise relationships
• Covariance matrix Σ: the collection of all such relationships
• Estimates of Σ are required in procedures such as PCA, CCA, MANOVA, etc.
• Estimating (functions of) Σ and Ω = Σ⁻¹ is of statistical interest
• Estimating Σ is difficult in high dimensions
Sparse estimates

• A matrix Σ or Ω of size p-by-p has O(p²) elements
• Estimating O(p²) parameters with classical estimators is not viable, especially when n ≪ p
• Goal: reliably estimate a small number of parameters in Σ
• Model selection: recovery of the zero/non-zero structure
• This gives rise to sparse estimates of Σ or Ω
• The sparsity pattern can be represented by graphs/networks
Gaussian Graphical Models (GGM)

• Assume Y = (Y₁, ..., Y_p)′ has distribution N_p(0, Σ)
• Denote V = {1, 2, ..., p}
• The covariance matrix cov(Y) = Σ encodes marginal dependencies:
  Y_i ⊥⊥ Y_j ⟺ cov(Y_i, Y_j) = [Σ]_ij = 0
• The inverse covariance matrix Ω = Σ⁻¹ encodes conditional dependencies given the rest:
  Y_i ⊥⊥ Y_j | Y_{V∖{i,j}} ⟺ [Ω]_ij = 0
  (conditional independence ⟺ zero matrix element)
• Also known as Markov Random Fields (MRF)
Gaussian Graphical Models (GGM)

• A graph summarizes the relationships, with nodes V = {1, ..., p} and edge set E:
  [Ω]_ij = 0 ⟺ i ≁ j
  (zero matrix element ⟺ no edge in the network/graph)
• Build a graph from a sparse Ω:

         A     B     C
  A  [  1.0   0.2   0.3 ]
  B  [  0.2   2.0   0.0 ]     graph: edges A–B and A–C, no edge B–C
  C  [  0.3   0.0   1.2 ]
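For illustration (not part of the original slides), a minimal Python sketch that reads the edge set off the 3 × 3 matrix above; the variable names are mine.

```python
import numpy as np

# The 3 x 3 precision matrix from the slide above (variables A, B, C).
omega = np.array([[1.0, 0.2, 0.3],
                  [0.2, 2.0, 0.0],
                  [0.3, 0.0, 1.2]])
labels = ["A", "B", "C"]

# Edges of the graph are the non-zero off-diagonal entries of Omega.
edges = [(labels[i], labels[j])
         for i in range(len(labels)) for j in range(i + 1, len(labels))
         if omega[i, j] != 0.0]
print(edges)  # [('A', 'B'), ('A', 'C')] -- no edge between B and C
```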
GGM estimation algorithms

• Regularized Gaussian likelihood methods
  ◮ Block coordinate descent (COVSEL) [Banerjee et al., 2008]
  ◮ Graphical lasso (GLASSO) [Friedman, Hastie, & Tibshirani, 2008]
  ◮ Large-scale GLASSO [Mazumder & Hastie, 2012]
  ◮ QUIC [Hsieh et al., 2011]
  ◮ G-ISTA [Guillot, Rajaratnam et al., 2012]
  ◮ Graphical dual proximal gradient methods [Dalal & Rajaratnam, 2013]
  ◮ Others
• Bayesian methods
  ◮ Dawid & Lauritzen, 1993, Annals of Statistics
  ◮ Letac & Massam, 2007, Annals of Statistics
  ◮ Rajaratnam, Massam & Carvalho, 2008, Annals of Statistics
  ◮ Khare & Rajaratnam, 2011, Annals of Statistics
  ◮ Others
• Testing-based methods
  ◮ Hero & Rajaratnam, 2011, JASA
  ◮ Hero & Rajaratnam, 2012, IEEE Information Theory
Regularized Gaussian likelihood graphical model selection

• All ℓ₁-regularized Gaussian-likelihood methods solve
  Ω̂ = arg max_{Ω ≻ 0} { log det(Ω) − tr(ΩS) − λ‖Ω‖₁ }
• S: sample covariance matrix
• Graphical lasso [Friedman, Hastie, & Tibshirani, 2008]
• Ω̂ can be computed by solving this optimization problem
• Adding the ℓ₁-regularization term λ‖Ω‖₁ introduces sparsity
• The penalty parameter λ controls the level of sparsity
• Dependence on Gaussianity:
  ◮ Parametric model
  ◮ Sensitivity to outliers
  ◮ Log-concave objective function
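For illustration, a minimal sketch using scikit-learn's GraphicalLasso, which solves an ℓ₁-penalized Gaussian likelihood of this form (its penalty convention, e.g. the treatment of the diagonal, may differ slightly from the display above); the data and the value of alpha are arbitrary.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Toy data: n = 200 observations of a p = 10 dimensional Gaussian.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))

# alpha plays the role of the penalty parameter lambda on this slide:
# larger alpha gives a sparser estimate of Omega.
model = GraphicalLasso(alpha=0.1).fit(X)
omega_hat = model.precision_                       # estimate of Omega = Sigma^{-1}
print(np.count_nonzero(np.abs(omega_hat) > 1e-8))  # sparsity level
```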
Regularized pseudo-likelihood graphical model selection

• Two main approaches:
  1. ℓ₁-regularized likelihood methods
  2. ℓ₁-regularized regression-based/pseudo-likelihood methods
• A series of linear regressions forms a pseudo-likelihood function (written out below)
• The objective function is the ℓ₁-penalized pseudo-likelihood
• The pseudo-likelihood assumes less about the distribution of the data
• Applicable to a wider range of data
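In symbols (standard definition, not from the slide), the pseudo-likelihood replaces the joint density by the product of full conditionals, each of which is a linear regression in the Gaussian case:

  L_pseudo(Ω) = ∏_{i=1}^p f(Y_i | Y_{−i}),   with  E[Y_i | Y_{−i}] = −Σ_{j≠i} (ω_ij/ω_ii) Y_j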
Partial covariance and correlation

• The rows of Y ∈ ℝ^{n×p} are iid observations of a random vector with mean zero and covariance Σ = Ω⁻¹
• Goal: estimate the partial correlation graph
• Partial correlation in terms of Ω = [ω_ij]:
  ρ_ij = −ω_ij / √(ω_ii ω_jj)
• Called "partial" because it is the correlation of the residuals r_i and r_j, where
  r_k = Y_k − Y β̂^{(k)},   β̂^{(k)} = arg min_{β : β_k = 0} ‖Y_k − Yβ‖²₂,
  so that ρ_ij = cor(r_i, r_j)
• It can be shown that β_ij = ρ_ij √(ω_jj / ω_ii)
• The zero/non-zero pattern of [ρ_ij] is identical to that of Ω
• The partial correlation graph is given by the sparsity pattern of Ω
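A small numerical check (my own construction, not from the talk): compute ρ_ij from Ω and verify that it matches the empirical correlation of the regression residuals, as claimed above.

```python
import numpy as np

def partial_correlations(omega):
    """Partial correlation matrix: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def residual(Y, k):
    """Residual of regressing column k of Y on all remaining columns."""
    others = [j for j in range(Y.shape[1]) if j != k]
    beta, *_ = np.linalg.lstsq(Y[:, others], Y[:, k], rcond=None)
    return Y[:, k] - Y[:, others] @ beta

# Simulate from a 3-dimensional Gaussian with a known precision matrix.
rng = np.random.default_rng(1)
omega_true = np.array([[2.0, 0.5, 0.0],
                       [0.5, 2.0, 0.3],
                       [0.0, 0.3, 2.0]])
Y = rng.multivariate_normal(np.zeros(3), np.linalg.inv(omega_true), size=100_000)

r0, r1 = residual(Y, 0), residual(Y, 1)
print(partial_correlations(omega_true)[0, 1])  # exact value: -0.125
print(np.corrcoef(r0, r1)[0, 1])               # close to -0.125 for large n
```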
Regularized regression-based graphical model selection

• Neighborhood selection (NS) [Meinshausen and Bühlmann, 2006] (a sketch with a symmetrizing rule follows after this slide):
  ω̂^{(i)} = arg min_{β : β_i = 0} ‖Y_i − Yβ‖²₂ + λ‖β‖₁
• The neighborhood of i is defined as ne(i) = { k : ω̂^{(i)}_k ≠ 0 }
• MB does not take into account the symmetry of Ω:
  j ∈ ne(i) does not imply i ∈ ne(j)
• Current state-of-the-art methods address this issue:
  ◮ SPACE [Peng et al., 2009]
  ◮ SYMLASSO [Friedman, Hastie, & Tibshirani, 2010]
  ◮ SPLICE [Rocha et al., 2008]
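The sketch referred to above: one lasso regression per node, with the estimated neighborhoods combined by an AND or OR rule (a common fix for the asymmetry; the function and variable names are mine).

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(Y, lam, rule="and"):
    """Sketch of Meinshausen-Buhlmann neighborhood selection.

    Runs one lasso regression per node and combines the p neighborhoods
    into a symmetric edge set with an AND or OR rule.
    """
    n, p = Y.shape
    support = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        # sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
        # so alpha = lam/(2n) matches the objective displayed above.
        fit = Lasso(alpha=lam / (2 * n), fit_intercept=False).fit(Y[:, others], Y[:, i])
        support[i, others] = fit.coef_ != 0
    sym = support & support.T if rule == "and" else support | support.T
    return [(i, j) for i in range(p) for j in range(i + 1, p) if sym[i, j]]
```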
Sparse PArtial Correlation Estimation (SPACE) [Peng et al., 2009]

SPACE objective function (with weights w_i = 1 or w_i = ω_ii):

  Q_spc(Ω) = −(n/2) Σ_{i=1}^p log ω_ii + (1/2) Σ_{i=1}^p w_i ‖Y_i − Σ_{j≠i} ρ_ij √(ω_jj/ω_ii) Y_j‖²₂ + λ Σ_{1≤i<j≤p} |ρ_ij|,

  where β_ij = ρ_ij √(ω_jj/ω_ii) is the coefficient of Y_j in the regression for Y_i.

The algorithm alternates between two updates (a sketch follows below):

1. Update [ρ_ij] coordinate-wise (using the current estimates [ω̂_ii]):

   [ρ_ij] ← arg min_{[ρ_ij]} (1/2) Σ_{i=1}^p w_i ‖Y_i − Σ_{j≠i} ρ_ij √(ω̂_jj/ω̂_ii) Y_j‖²₂ + λ Σ_{1≤i<j≤p} |ρ_ij|

2. Update [ω_ii] (using the current estimates [ρ̂_ij] and [ω̂_ii]):

   ω_ii ← ( (1/n) ‖Y_i − Σ_{j≠i} ρ̂_ij √(ω̂_jj/ω̂_ii) Y_j‖²₂ )⁻¹
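The sketch referred to above: an illustrative reimplementation of the SPACE alternating scheme, not the authors' code. The coordinate-wise soft-thresholding update for ρ_kl is derived from the objective with the weights held fixed; the initialization and stopping rule are my own choices.

```python
import numpy as np

def soft_threshold(x, t):
    """S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def space_sketch(Y, lam, weights="uniform", max_iter=1500, tol=1e-6):
    """Illustrative sketch of the SPACE alternating updates (Peng et al., 2009).

    Step 1 updates each partial correlation rho[k, l] by coordinate descent
    (soft-thresholding) with the partial variances omega held fixed;
    step 2 updates each omega[i] from the current residual variance.
    """
    n, p = Y.shape
    gram = Y.T @ Y                       # reused for the denominators
    omega = 1.0 / Y.var(axis=0)          # heuristic initial partial variances
    rho = np.zeros((p, p))

    for _ in range(max_iter):
        w = np.ones(p) if weights == "uniform" else omega.copy()
        rho_old, omega_old = rho.copy(), omega.copy()

        # Step 1: coordinate-wise lasso update of rho[k, l], k < l.
        for k in range(p):
            for l in range(k + 1, p):
                ck = np.sqrt(omega[l] / omega[k])   # coefficient of Y_l in row k
                cl = np.sqrt(omega[k] / omega[l])   # coefficient of Y_k in row l
                coef_k = rho[k] * np.sqrt(omega / omega[k]); coef_k[[k, l]] = 0.0
                coef_l = rho[l] * np.sqrt(omega / omega[l]); coef_l[[k, l]] = 0.0
                r_k = Y[:, k] - Y @ coef_k          # partial residuals excluding (k, l)
                r_l = Y[:, l] - Y @ coef_l
                num = w[k] * ck * (Y[:, l] @ r_k) + w[l] * cl * (Y[:, k] @ r_l)
                den = w[k] * ck**2 * gram[l, l] + w[l] * cl**2 * gram[k, k]
                rho[k, l] = rho[l, k] = soft_threshold(num, lam) / den

        # Step 2: omega_ii <- ((1/n) ||Y_i - sum_j rho_ij sqrt(omega_jj/omega_ii) Y_j||^2)^(-1)
        for i in range(p):
            coef_i = rho[i] * np.sqrt(omega / omega[i]); coef_i[i] = 0.0
            resid = Y[:, i] - Y @ coef_i
            omega[i] = n / (resid @ resid)

        # Successive differences: the quantity plotted on the next slide.
        diff = max(np.abs(rho - rho_old).max(), np.abs(omega - omega_old).max())
        if diff < tol:
            return rho, omega, True
    return rho, omega, False
```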
Non-converging example: p = 3 case

[Figure: absolute difference between successive updates (log scale) versus iteration, for the partial variance and partial correlation estimates, with the convergence threshold marked. Panels: (a) SPACE (w_i = ω_ii), (b) SPACE (w_i = 1).]

Figure: Y^{(i)} ∼ N₃(0, Ω⁻¹), (left) n = 4, (right) n = 100
Non-convergence of SPACE

• Investigate the nature and extent of the convergence issues:
  1. Are such examples pathological? How widespread are they?
  2. When do they occur?
• Consider a sparse 100 × 100 matrix Ω with edge density 4% and condition number 100
• Generate 100 multivariate Gaussian datasets (with n = 100), µ = 0 and Σ = Ω⁻¹
• Record the number of times (out of 100) that SPACE1 (uniform weights) and SPACE2 (partial variance weights) do not converge within 1500 iterations (a sketch of this simulation is given below)
• The original implementation of SPACE by [Peng et al., 2009] claims that only 3 iterations are sufficient
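A sketch of how such a simulation could be set up, reusing the space_sketch function above; the construction of the sparse, ill-conditioned Ω is a heuristic of my own, not the authors' recipe.

```python
import numpy as np

def make_sparse_precision(p=100, density=0.04, cond=100.0, seed=0):
    """Heuristic: sparse symmetric matrix with roughly `density` non-zero
    off-diagonal entries, diagonal shifted so the condition number equals `cond`."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    mask = np.triu(rng.random((p, p)) < density, k=1)
    A[mask] = rng.uniform(-1.0, 1.0, size=mask.sum())
    A = A + A.T
    eig = np.linalg.eigvalsh(A)
    delta = (eig.max() - cond * eig.min()) / (cond - 1.0)  # shift keeps A + delta*I positive definite
    return A + delta * np.eye(p)

def count_non_convergence(lam, reps=100, n=100, max_iter=1500, weights="uniform"):
    """Number of replications where SPACE fails to converge within max_iter."""
    omega_true = make_sparse_precision()
    sigma = np.linalg.inv(omega_true)
    failures = 0
    for r in range(reps):
        rng = np.random.default_rng(r)
        Y = rng.multivariate_normal(np.zeros(sigma.shape[0]), sigma, size=n)
        *_, converged = space_sketch(Y, lam, weights=weights, max_iter=max_iter)
        failures += int(not converged)
    return failures
```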
Non-convergence of SPACE

  SPACE1 (w_i = 1)              SPACE2 (w_i = ω_ii)
  λ*      NZ       NC           λ*      NZ       NC
  0.026   60.9%    92           0.085   79.8%    100
  0.099   19.7%    100          0.160   28.3%    0
  0.163   7.6%     100          0.220   10.7%    0
  0.228   2.9%     100          0.280   4.8%     0
  0.614   0.4%     0            0.730   0.5%     97

Table: Number of simulations (out of 100) that do not converge within 1500 iterations (NC), for selected values of the penalty parameter (λ* = λ/n). The average percentage of non-zeros (NZ) in Ω̂ is also shown.

• SPACE exhibits extensive non-convergence behavior
• The problem is exacerbated when the condition number is high
• This is typical of high-dimensional, sample-starved settings
Symmetric Lasso and SPLICE

SYMLASSO [Friedman, Hastie, & Tibshirani, 2010]:

  Q_sym(α, Ω̆) = Σ_{i=1}^p [ (n/2) log α_ii + (1/(2α_ii)) ‖Y_i + Σ_{j≠i} ω_ij α_ii Y_j‖²₂ ] + λ Σ_{1≤i<j≤p} |ω_ij|,

  where α_ii = 1/ω_ii.

SPLICE [Rocha et al., 2008]:

  Q_spl(B, D) = (n/2) Σ_{i=1}^p log(d²_ii) + (1/2) Σ_{i=1}^p (1/d²_ii) ‖Y_i − Σ_{j≠i} β_ij Y_j‖²₂ + λ Σ_{i<j} |β_ij|,

  where d²_ii = ω_ii.

• Both are alternating (off-diagonal vs. diagonal) iterative algorithms
• No convergence guarantees
Regularized regression-based graphical model selection

• Advantage: regression-based methods perform better model selection than likelihood-based methods in finite samples
• Advantage: regression-based methods are less restrictive than Gaussian likelihood-based methods
• Disadvantage: Ω̂ may not be positive definite (this can be fixed)
• Disadvantage: the solution may not be computable
• Cause: the iterative algorithms SPACE, SYMLASSO and SPLICE are not guaranteed to converge
Regression-based methods: summary

  Property                               SYMLASSO   SPLICE   SPACE   NS
  Symmetry                               +          +        +
  Convergence guarantee                                               N/A
  Asymptotic consistency (n, p → ∞)                          +       +

How can we obtain all of the good properties simultaneously?