Beyond the graphical Lasso: Structure learning via inverse covariance estimation
Po-Ling Loh, UC Berkeley Department of Statistics
ICML Workshop on Covariance Selection and Graphical Model Structure Learning, June 26, 2014
Joint work with Martin Wainwright (UC Berkeley) & Peter Bühlmann (ETH Zürich)
P. Loh (UC Berkeley) Beyond the graphical Lasso June 26, 2014 1 / 40

Outline
1. Introduction
2. Generalized inverse covariances
3. Linear structural equation models
4. Corrupted data

Undirected graphical models
Undirected graph $G = (V, E)$
Joint distribution of $(X_1, \dots, X_p)$, where $|V| = p$
[Figure: undirected graph on nodes $X_1, X_2, X_3, \dots, X_p$, with node subsets $A$ and $B$ and a separating set $S$]
Markov property: $(s, t) \notin E \implies X_s \perp\!\!\!\perp X_t \mid X_{\setminus \{s, t\}}$
More generally, $X_A \perp\!\!\!\perp X_B \mid X_S$ when $S \subseteq V$ separates $A$ from $B$
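The separation condition can be checked mechanically: $S$ separates $A$ from $B$ when no node of $B$ can be reached from $A$ once the nodes in $S$ are deleted. The following is a minimal sketch of that check, assuming the graph is given as an edge list; the function name `separates` and the example chain are illustrative and do not come from the talk.

```python
from collections import deque

def separates(edges, A, B, S):
    """Return True if S separates A from B in the undirected graph `edges`."""
    A, B, S = set(A), set(B), set(S)
    # Build adjacency lists, skipping every edge that touches the separator S.
    adj = {}
    for u, v in edges:
        if u in S or v in S:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Breadth-first search from A in the graph with S removed.
    start = A - S
    seen = set(start)
    queue = deque(start)
    while queue:
        u = queue.popleft()
        if u in B:
            return False  # found a path from A to B that avoids S
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Hypothetical chain 1 - 2 - 3 - 4: node 2 separates {1} from {3, 4}.
chain = [(1, 2), (2, 3), (3, 4)]
print(separates(chain, {1}, {3, 4}, {2}))    # True
print(separates(chain, {1}, {3, 4}, set()))  # False
```

Under the Markov property, a `True` result corresponds to the conditional independence $X_A \perp\!\!\!\perp X_B \mid X_S$.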

Directed graphical models
Directed acyclic graph $G = (V, E)$
[Figure: directed acyclic graph on nodes $X_1, X_2, X_3, \dots, X_p$]
Markov property: $X_j \perp\!\!\!\perp X_{\mathrm{Nondesc}(j)} \mid X_{\mathrm{Pa}(j)}, \ \forall j$

Structure learning
Goal: Edge recovery from $n$ samples: $\{(X_1^{(i)}, X_2^{(i)}, \dots, X_p^{(i)})\}_{i=1}^n$
High-dimensional setting: $p \gg n$, assume $\deg(G) \le d$
Sources of corruption: non-i.i.d. observations, contamination by noise/missing data
Note: Structure learning generally harder for directed graphs (topological order unknown)

Graphical Lasso
When $(X_1, \dots, X_p) \sim N(0, \Sigma)$, well-known fact: $(\Sigma^{-1})_{st} = 0 \iff (s, t) \notin E$
Establishes statistical consistency of graphical Lasso (Yuan & Lin '07):
$$\hat\Theta \in \arg\min_{\Theta \succeq 0} \Big\{ \operatorname{trace}(\hat\Sigma \Theta) - \log\det(\Theta) + \lambda \sum_{s \ne t} |\Theta_{st}| \Big\}$$
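To make the three terms of the objective concrete, the sketch below evaluates it for a given candidate $\Theta$. It is an illustration under stated assumptions, not the solver used in the talk: matrices are plain nested lists, the helper names `cholesky_logdet` and `glasso_objective` are hypothetical, and $\Theta$ is assumed strictly positive definite so that $\log\det(\Theta)$ is finite.

```python
import math

def cholesky_logdet(M):
    """log det of a symmetric positive definite matrix via Cholesky factorization."""
    p = len(M)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # det(M) = prod(L_ii)^2, so log det(M) = 2 * sum(log L_ii).
    return 2.0 * sum(math.log(L[i][i]) for i in range(p))

def glasso_objective(sigma_hat, theta, lam):
    """trace(sigma_hat @ theta) - log det(theta) + lam * sum_{s != t} |theta_st|."""
    p = len(theta)
    trace = sum(sigma_hat[s][t] * theta[t][s] for s in range(p) for t in range(p))
    penalty = sum(abs(theta[s][t]) for s in range(p) for t in range(p) if s != t)
    return trace - cholesky_logdet(theta) + lam * penalty

# Illustrative inputs: identity sample covariance, identity candidate precision.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(glasso_objective(identity, identity, lam=0.5))  # 3.0
```

Only the diagonal is left unpenalized, which matches the sum over $s \ne t$ in the objective.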

Some observations
Only sample-based quantity is $\hat\Sigma$:
$$\hat\Theta \in \arg\min_{\Theta \succeq 0} \Big\{ \operatorname{trace}(\hat\Sigma \Theta) - \log\det(\Theta) + \lambda \sum_{s \ne t} |\Theta_{st}| \Big\}$$
Although graphical Lasso is penalized Gaussian MLE, can always be used to estimate $\hat\Theta$ from $\hat\Sigma$:
$$(\Sigma^*)^{-1} = \arg\min_{\Theta} \big\{ \operatorname{trace}(\Sigma^* \Theta) - \log\det(\Theta) \big\}$$
We extend graphical Lasso to discrete-valued data (undirected case) and linear structural equation models (directed case)
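The population identity follows from a first-order condition. As a short sketch, assuming $\Sigma^*$ is positive definite and using the standard matrix derivatives $\nabla_\Theta \operatorname{trace}(\Sigma^* \Theta) = \Sigma^*$ and $\nabla_\Theta \log\det(\Theta) = \Theta^{-1}$ for symmetric $\Theta \succ 0$:

```latex
\begin{align*}
f(\Theta) &= \operatorname{trace}(\Sigma^* \Theta) - \log\det(\Theta), \\
\nabla f(\Theta) &= \Sigma^* - \Theta^{-1} = 0
  \quad\Longrightarrow\quad \Theta = (\Sigma^*)^{-1}.
\end{align*}
```

Because $f$ is strictly convex on the positive definite cone, this stationary point is the unique minimizer. No Gaussian assumption enters the argument, which is why the program can be applied to any covariance matrix.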

Theory for graphical Lasso
If $\|\hat\Sigma - \Sigma^*\|_{\max} \lesssim \sqrt{\frac{\log p}{n}}$ and $\lambda \gtrsim \sqrt{\frac{\log p}{n}}$, then
$$\|\hat\Theta - \Theta^*\|_{\max} \lesssim \sqrt{\frac{\log p}{n}} + \lambda$$
Deviation condition holds w.h.p. for various ensembles (e.g., sub-Gaussian)
Thresholding $\hat\Theta$ at level $\sqrt{\frac{\log p}{n}}$ yields correct support
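The thresholding step can be sketched in a few lines: keep a pair $(s, t)$ as an edge when the corresponding off-diagonal entry of $\hat\Theta$ exceeds the threshold in absolute value. The function name `threshold_support`, the example matrix, and the value of `tau` below are illustrative assumptions; in the result above the threshold scales as $\sqrt{\log p / n}$ up to constants the slide does not specify.

```python
def threshold_support(theta_hat, tau):
    """Return the set of pairs (s, t), s < t, with |theta_hat[s][t]| > tau."""
    p = len(theta_hat)
    return {(s, t) for s in range(p) for t in range(s + 1, p)
            if abs(theta_hat[s][t]) > tau}

# Hypothetical 4 x 4 estimate: large entries along a chain, small noise elsewhere.
theta_hat = [
    [ 2.00, -0.80,  0.03,  0.01],
    [-0.80,  2.00, -0.70, -0.02],
    [ 0.03, -0.70,  2.00, -0.90],
    [ 0.01, -0.02, -0.90,  2.00],
]
print(sorted(threshold_support(theta_hat, tau=0.1)))  # [(0, 1), (1, 2), (2, 3)]
```

Recovery of this kind relies on the true nonzero entries being large relative to the estimation error, so that they survive the threshold while noise entries do not.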

Non-Gaussian distributions
(Liu et al. '09, '12): $(X_1, \dots, X_p)$ follows a nonparanormal distribution if $(f_1(X_1), \dots, f_p(X_p)) \sim N(0, \Sigma)$, and the $f_j$'s are monotone and differentiable
Then $(i, j) \notin E$ iff $\Theta_{ij} = 0$
In general non-Gaussian setting, relationship between entries of $\Theta = \Sigma^{-1}$ and edges of $G$ unknown

Discrete graphical models
Assume $X_i$'s take values in a discrete set: $\{0, 1, \dots, m - 1\}$
Our results:
Establish relationship between augmented inverse covariance matrices and edge structure
New algorithms for structure learning in discrete graphs

An illustrative example
Binary Ising model:
$$P_\theta(x_1, \dots, x_p) \propto \exp\Big\{ \sum_{s \in V} \theta_s x_s + \sum_{(s, t) \in E} \theta_{st} x_s x_t \Big\},$$
$\theta \in \mathbb{R}^{p + \binom{p}{2}}$, $(x_1, \dots, x_p) \in \{0, 1\}^p$

An illustrative example
Ising models with $\theta_s = 0.1$, $\theta_{st} = 2$
[Figure: chain graph on $X_1, X_2, X_3, X_4$]
$$\Theta_{\text{chain}} = \begin{pmatrix} 9.80 & -3.59 & 0 & 0 \\ -3.59 & 34.30 & -4.77 & 0 \\ 0 & -4.77 & 34.30 & -3.59 \\ 0 & 0 & -3.59 & 9.80 \end{pmatrix}$$
[Figure: loop (4-cycle) graph on $X_1, X_2, X_3, X_4$]
$$\Theta_{\text{loop}} = \begin{pmatrix} 51.37 & -5.37 & -0.17 & -5.37 \\ -5.37 & 51.37 & -5.37 & -0.17 \\ -0.17 & -5.37 & 51.37 & -5.37 \\ -5.37 & -0.17 & -5.37 & 51.37 \end{pmatrix}$$
$\Theta$ is graph-structured for chain, but not loop
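For $p = 4$ the example can be reproduced by brute force: enumerate all $2^4$ states, normalize the Ising weights, form the covariance matrix of $(X_1, \dots, X_4)$, and invert it. The sketch below does this in plain Python. It assumes the $\{0, 1\}$ encoding and the parameters $\theta_s = 0.1$, $\theta_{st} = 2$ stated above; the function names `ising_precision` and `invert` are illustrative, and nodes are indexed from 0.

```python
import itertools
import math

def ising_precision(p, edges, theta_s=0.1, theta_st=2.0):
    """Inverse covariance of (X_1, ..., X_p) under a binary Ising model on {0,1}^p."""
    states = list(itertools.product([0, 1], repeat=p))
    # Unnormalized weights exp(sum_s theta_s x_s + sum_{(s,t) in E} theta_st x_s x_t).
    weights = [math.exp(theta_s * sum(x)
                        + theta_st * sum(x[s] * x[t] for s, t in edges))
               for x in states]
    Z = sum(weights)
    probs = [w / Z for w in weights]
    mean = [sum(pr * x[s] for pr, x in zip(probs, states)) for s in range(p)]
    cov = [[sum(pr * x[s] * x[t] for pr, x in zip(probs, states)) - mean[s] * mean[t]
            for t in range(p)] for s in range(p)]
    return invert(cov)

def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

chain = [(0, 1), (1, 2), (2, 3)]   # X1 - X2 - X3 - X4
loop = chain + [(3, 0)]            # adds the edge X4 - X1

for name, edges in [("chain", chain), ("loop", loop)]:
    theta = ising_precision(4, edges)
    print(name)
    for row in theta:
        print("  ".join(f"{v:8.2f}" for v in row))
```

For the chain, the entries for the non-edges $(1,3)$, $(1,4)$ and $(2,4)$ come out as zero up to floating-point error, while the loop's non-edge entries stay nonzero, which is the contrast the slide draws.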
