Beyond the graphical Lasso: Structure learning via inverse covariance estimation
Po-Ling Loh, UC Berkeley Department of Statistics
ICML Workshop on Covariance Selection and Graphical Model Structure Learning, June 26, 2014
Joint work with Martin Wainwright (UC Berkeley) & Peter Bühlmann (ETH Zürich)

Outline
1. Introduction
2. Generalized inverse covariances
3. Linear structural equation models
4. Corrupted data

Undirected graphical models
Undirected graph $G = (V, E)$
Joint distribution of $(X_1, \dots, X_p)$, where $|V| = p$
[Figure: undirected graph on nodes $X_1, X_2, X_3, \dots, X_p$, with node subsets $A$, $B$ and separating set $S$ marked]
Markov property: $(s, t) \notin E \implies X_s \perp\!\!\!\perp X_t \mid X_{\setminus \{s, t\}}$
More generally, $X_A \perp\!\!\!\perp X_B \mid X_S$ when $S \subseteq V$ separates $A$ from $B$

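The separation condition in the global Markov property is purely graph-theoretic, so it can be checked with a graph search. The following is a minimal sketch, assuming the graph is stored as an adjacency dictionary; the function name `separates` and the example chain are illustrative and not taken from the slides.

```python
from collections import deque

def separates(adj, A, B, S):
    """Return True if every path from A to B in the undirected graph meets S."""
    A, B, S = set(A), set(B), set(S)
    # Breadth-first search from A in the graph with the nodes of S removed.
    start = A - S
    seen = set(start)
    queue = deque(start)
    while queue:
        u = queue.popleft()
        if u in B:
            return False  # reached B without passing through S
        for v in adj.get(u, ()):
            if v not in S and v not in seen:
                seen.add(v)
                queue.append(v)
    return True

# Hypothetical 4-node chain 1 - 2 - 3 - 4, stored as an adjacency dict.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(separates(chain, A={1}, B={4}, S={2}))    # node 2 blocks the only path
print(separates(chain, A={1}, B={4}, S=set()))  # nothing blocks the path
```

When `separates` returns True for a distribution that is Markov with respect to the graph, the property above gives $X_A \perp\!\!\!\perp X_B \mid X_S$.
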
Directed graphical models
Directed acyclic graph $G = (V, E)$
[Figure: directed acyclic graph on nodes $X_1, X_2, X_3, \dots, X_p$]
Markov property: $X_j \perp\!\!\!\perp X_{\mathrm{Nondesc}(j)} \mid X_{\mathrm{Pa}(j)}, \ \forall j$

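The sets $\mathrm{Pa}(j)$ and $\mathrm{Nondesc}(j)$ in the directed Markov property can be read off the DAG directly. Below is a small sketch, assuming the DAG is given as a dictionary mapping each node to its parent set, and assuming the convention that $\mathrm{Nondesc}(j)$ excludes $j$ itself; the function name and the example DAG are hypothetical.

```python
def nondescendants(parents, j):
    """Nodes that are neither j nor descendants of j.

    `parents` maps every node of the DAG to the set of its parents.
    """
    # Invert the parent map to get children.
    children = {v: set() for v in parents}
    for v, pa in parents.items():
        for u in pa:
            children[u].add(v)
    # Depth-first search along child edges collects all descendants of j.
    desc, stack = set(), [j]
    while stack:
        u = stack.pop()
        for v in children[u]:
            if v not in desc:
                desc.add(v)
                stack.append(v)
    return set(parents) - desc - {j}

# Hypothetical DAG: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4.
dag = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}}
print(dag[2], nondescendants(dag, 2))
```

For node 2 in this example, the property says that $X_2$ is independent of its non-descendants given its parent $X_1$.
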
Structure learning
Goal: Edge recovery from $n$ samples: $\{(X_1^{(i)}, X_2^{(i)}, \dots, X_p^{(i)})\}_{i=1}^n$
High-dimensional setting: $p \gg n$, assume $\deg(G) \le d$
Sources of corruption: non-i.i.d. observations, contamination by noise/missing data
Note: Structure learning generally harder for directed graphs (topological order unknown)

Graphical Lasso
When $(X_1, \dots, X_p) \sim N(0, \Sigma)$, well-known fact: $(\Sigma^{-1})_{st} = 0 \iff (s, t) \notin E$
Establishes statistical consistency of graphical Lasso (Yuan & Lin '07):
$$\widehat{\Theta} \in \arg\min_{\Theta \succeq 0} \Big\{ \operatorname{trace}(\widehat{\Sigma} \Theta) - \log\det(\Theta) + \lambda \sum_{s \neq t} |\Theta_{st}| \Big\}$$

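To make the three terms of the graphical Lasso objective concrete, here is a minimal NumPy sketch that only evaluates the objective for a given candidate $\Theta$; it is not a solver. The function name `glasso_objective` is an assumption for illustration, and candidates that are not positive definite are assigned the value infinity.

```python
import numpy as np

def glasso_objective(sigma_hat, theta, lam):
    """trace(sigma_hat @ theta) - log det(theta) + lam * sum_{s != t} |theta_st|."""
    eigs = np.linalg.eigvalsh(theta)  # theta is assumed symmetric
    if eigs.min() <= 0:
        return np.inf  # log det is undefined outside the positive-definite cone
    log_det = np.log(eigs).sum()
    # l1 penalty on off-diagonal entries only, as in the sum over s != t.
    off_diag_l1 = np.abs(theta).sum() - np.abs(np.diag(theta)).sum()
    return float(np.trace(sigma_hat @ theta) - log_det + lam * off_diag_l1)

sigma_hat = np.eye(2)
theta = np.array([[1.0, 0.5], [0.5, 1.0]])
print(glasso_objective(sigma_hat, theta, lam=0.1))
```

Note that the data enter only through `sigma_hat`, so any covariance estimate can be plugged in.
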
Some observations
Only sample-based quantity is $\widehat{\Sigma}$:
$$\widehat{\Theta} \in \arg\min_{\Theta \succeq 0} \Big\{ \operatorname{trace}(\widehat{\Sigma} \Theta) - \log\det(\Theta) + \lambda \sum_{s \neq t} |\Theta_{st}| \Big\}$$
Although graphical Lasso is penalized Gaussian MLE, can always be used to estimate $\widehat{\Theta}$ from $\widehat{\Sigma}$:
$$(\Sigma^*)^{-1} = \arg\min_{\Theta} \Big\{ \operatorname{trace}(\Sigma^* \Theta) - \log\det(\Theta) \Big\}$$
We extend graphical Lasso to discrete-valued data (undirected case) and linear structural equation models (directed case)

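The population identity for $(\Sigma^*)^{-1}$ follows from a first-order condition. The short derivation below is a sketch that assumes $\Sigma^* \succ 0$ and uses the standard matrix-calculus identities $\nabla_\Theta \operatorname{trace}(\Sigma^* \Theta) = \Sigma^*$ and $\nabla_\Theta \log\det(\Theta) = \Theta^{-1}$ for symmetric $\Theta \succ 0$; these steps are not spelled out on the slide.

```latex
\begin{align*}
f(\Theta) &= \operatorname{trace}(\Sigma^* \Theta) - \log\det(\Theta), \qquad \Theta \succ 0, \\
\nabla f(\Theta) &= \Sigma^* - \Theta^{-1}, \\
\nabla f(\Theta) = 0 &\iff \Theta = (\Sigma^*)^{-1}.
\end{align*}
```

Since $f$ is strictly convex on the positive-definite cone, this stationary point is the unique minimizer, and the argument never uses Gaussianity of the data.
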
Theory for graphical Lasso
If
$$\|\widehat{\Sigma} - \Sigma^*\|_{\max} \lesssim \sqrt{\frac{\log p}{n}} \quad \text{and} \quad \lambda \gtrsim \sqrt{\frac{\log p}{n}},$$
then
$$\|\widehat{\Theta} - \Theta^*\|_{\max} \lesssim \sqrt{\frac{\log p}{n}} + \lambda$$
Deviation condition holds w.h.p. for various ensembles (e.g., sub-Gaussian)
Thresholding $\widehat{\Theta}$ at level $\sqrt{\frac{\log p}{n}}$ yields correct support

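The thresholding step can be sketched as follows. The slide gives the level only up to constants, so the multiplier `c` below is a hypothetical tuning parameter, and the function name `threshold_support` and the example matrix are illustrative.

```python
import numpy as np

def threshold_support(theta_hat, n, c=1.0):
    """Estimated edge set: pairs (s, t), s < t, with |theta_hat[s, t]| > c * sqrt(log p / n)."""
    p = theta_hat.shape[0]
    tau = c * np.sqrt(np.log(p) / n)  # threshold level; c is an assumed constant
    return {(s, t) for s in range(p) for t in range(s + 1, p)
            if abs(theta_hat[s, t]) > tau}

# Hypothetical estimate with one clear edge and one entry below the threshold.
theta_hat = np.array([[2.0, 0.8, 0.0],
                      [0.8, 2.0, 0.01],
                      [0.0, 0.01, 2.0]])
print(threshold_support(theta_hat, n=100))
```

Only off-diagonal entries are examined, since the diagonal of $\widehat{\Theta}$ carries no edge information.
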
Non-Gaussian distributions
(Liu et al. '09, '12): $(X_1, \dots, X_p)$ follows nonparanormal distribution if $(f_1(X_1), \dots, f_p(X_p)) \sim N(0, \Sigma)$, and $f_j$'s monotone and differentiable
Then $(i, j) \notin E$ iff $\Theta_{ij} = 0$
In general non-Gaussian setting, relationship between entries of $\Theta = \Sigma^{-1}$ and edges of $G$ unknown

Discrete graphical models
Assume $X_i$'s take values in a discrete set: $\{0, 1, \dots, m - 1\}$
Our results:
Establish relationship between augmented inverse covariance matrices and edge structure
New algorithms for structure learning in discrete graphs

An illustrative example
Binary Ising model:
$$P_\theta(x_1, \dots, x_p) \propto \exp\Big\{ \sum_{s \in V} \theta_s x_s + \sum_{(s, t) \in E} \theta_{st} x_s x_t \Big\},$$
$\theta \in \mathbb{R}^{p + \binom{p}{2}}$, $(x_1, \dots, x_p) \in \{0, 1\}^p$

An illustrative example
Ising models with $\theta_s = 0.1$, $\theta_{st} = 2$
[Figure: four-node chain on $X_1, X_2, X_3, X_4$]
$$\Theta_{\text{chain}} = \begin{bmatrix} 9.80 & -3.59 & 0 & 0 \\ -3.59 & 34.30 & -4.77 & 0 \\ 0 & -4.77 & 34.30 & -3.59 \\ 0 & 0 & -3.59 & 9.80 \end{bmatrix}$$
[Figure: four-node loop on $X_1, X_2, X_3, X_4$]
$$\Theta_{\text{loop}} = \begin{bmatrix} 51.37 & -5.37 & -0.17 & -5.37 \\ -5.37 & 51.37 & -5.37 & -0.17 \\ -0.17 & -5.37 & 51.37 & -5.37 \\ -5.37 & -0.17 & -5.37 & 51.37 \end{bmatrix}$$
$\Theta$ is graph-structured for chain, but not loop

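The chain-versus-loop comparison can be reproduced by brute-force enumeration of the $2^4$ states. The sketch below assumes the chain is $X_1 - X_2 - X_3 - X_4$ and the loop adds the edge $X_4 - X_1$ (read off from the sparsity pattern of the two matrices), with nodes indexed from 0 in the code; the function name `ising_inverse_covariance` is illustrative.

```python
import itertools
import numpy as np

def ising_inverse_covariance(p, edges, theta_node, theta_edge):
    """Exact inverse covariance of a binary Ising model on {0,1}^p, by enumeration."""
    states = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
    # Unnormalized log-probability: sum_s theta_s x_s + sum_{(s,t) in E} theta_st x_s x_t.
    log_w = theta_node * states.sum(axis=1)
    for s, t in edges:
        log_w += theta_edge * states[:, s] * states[:, t]
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    # Exact mean and covariance under the model, then invert.
    mean = probs @ states
    centered = states - mean
    cov = centered.T @ (centered * probs[:, None])
    return np.linalg.inv(cov)

chain = [(0, 1), (1, 2), (2, 3)]  # assumed chain X1 - X2 - X3 - X4
loop = chain + [(3, 0)]           # assumed loop: chain plus edge X4 - X1
theta_chain = ising_inverse_covariance(4, chain, 0.1, 2.0)
theta_loop = ising_inverse_covariance(4, loop, 0.1, 2.0)
np.set_printoptions(precision=2, suppress=True)
print(theta_chain)
print(theta_loop)
```

Enumeration is feasible only for small $p$, but it makes the point of the example directly checkable: entries of $\Theta$ for non-adjacent pairs vanish on the chain and not on the loop.
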