High-Dimensional Graphical Model Selection
Anima Anandkumar, U.C. Irvine
Joint work with Vincent Tan (U. Wisc.) and Alan Willsky (MIT).
Graphical Models: Definition

Conditional Independence: X_A ⊥⊥ X_B | X_S.
[Figure: a graph over sports teams (Dodgers, Red Sox, Yankees, Mets, Phillies; Everton, Chelsea, Arsenal, Manchester United) in which the set S = {Baseball, Soccer} separates the groups A and B.]

Factorization:
P(x) ∝ exp( Σ_{(i,j)∈G} Ψ_{i,j}(x_i, x_j) ).

Tree-Structured Graphical Models:
P(x) = Π_{i∈V} P_i(x_i) · Π_{(i,j)∈E} P_{i,j}(x_i, x_j) / (P_i(x_i) P_j(x_j)).
For the star tree with center 1 and leaves 2, 3, 4:
P(x) = P_1(x_1) P_{2|1}(x_2 | x_1) P_{3|1}(x_3 | x_1) P_{4|1}(x_4 | x_1).
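The tree factorization above can be checked numerically. The following sketch (illustrative, not from the talk; the star-tree parameters are arbitrary assumptions) builds a distribution on four binary variables that is Markov on the star tree and verifies that the pairwise-marginal factorization reproduces the joint:

```python
import itertools
import numpy as np

# Joint over 4 binary variables, Markov on the star tree 1-2, 1-3, 1-4
# (0-indexed below): P(x) = P1(x1) * P(x2|x1) * P(x3|x1) * P(x4|x1).
rng = np.random.default_rng(0)
p1 = np.array([0.6, 0.4])                  # P1(x1)
cond = rng.uniform(0.1, 0.9, size=(3, 2))  # P(x_k = 1 | x1) for leaves

def joint(x):
    p = p1[x[0]]
    for k in range(3):
        q = cond[k, x[0]]
        p *= q if x[k + 1] == 1 else 1 - q
    return p

states = list(itertools.product([0, 1], repeat=4))
P = {x: joint(x) for x in states}
assert abs(sum(P.values()) - 1.0) < 1e-12   # a valid distribution

# Marginals and pairwise marginals computed from the joint
def marg(i, xi):
    return sum(p for x, p in P.items() if x[i] == xi)

def pair(i, j, xi, xj):
    return sum(p for x, p in P.items() if x[i] == xi and x[j] == xj)

# Tree factorization: P(x) = prod_i P_i * prod_{(i,j) in E} P_ij / (P_i P_j)
edges = [(0, 1), (0, 2), (0, 3)]
for x in states:
    val = np.prod([marg(i, x[i]) for i in range(4)])
    for (i, j) in edges:
        val *= pair(i, j, x[i], x[j]) / (marg(i, x[i]) * marg(j, x[j]))
    assert abs(val - P[x]) < 1e-12
print("tree factorization verified")
```

The check passes for any distribution Markov on the tree, since each edge ratio P_ij/(P_i P_j) converts the marginal product into the chain of conditionals.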
Structure Learning of Graphical Models

Given a graphical model on p nodes and n i.i.d. samples from the multivariate distribution, output an estimated structure Ĝ_n.

Structural Consistency: lim_{n→∞} P(Ĝ_n ≠ G) = 0.

Challenge: High Dimensionality ("Data-Poor" Regime)
Large p, small n regime (p ≫ n).
Sample complexity: the number of samples required to achieve consistency.

Challenge: Computational Complexity

Goal: Address the above challenges and provide provable guarantees.
Tree Graphical Models: Tractable Learning

Maximum-likelihood learning of tree structure, proposed by Chow and Liu (1968), reduces to a maximum-weight spanning tree problem:

T̂_ML = argmax_T Σ_{k=1}^n log P_T(x_V^(k))
     = argmax_T Σ_{(i,j)∈T} I_n(X_i; X_j),

where I_n denotes empirical mutual information. Pairwise statistics suffice for ML.

With n samples and p nodes, sample complexity: n = O(log p).

What other classes of graphical models are tractable for learning?
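The Chow-Liu procedure above can be sketched in a few lines. This is an illustrative implementation (the chain-structured data generator, sample size, and noise level are assumptions, not from the talk): estimate pairwise mutual informations, then run Kruskal's algorithm for the maximum-weight spanning tree.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Sample binary data from a chain 0-1-2-3: each variable copies its
# neighbor with probability 0.9 (a simple tree-structured model).
n, p = 5000, 4
X = np.empty((n, p), dtype=int)
X[:, 0] = rng.integers(0, 2, n)
for k in range(1, p):
    flip = rng.random(n) < 0.1
    X[:, k] = np.where(flip, 1 - X[:, k - 1], X[:, k - 1])

def empirical_mi(a, b):
    """Empirical mutual information I_n(X_a; X_b) from binary samples."""
    mi = 0.0
    for u, v in itertools.product([0, 1], repeat=2):
        p_uv = np.mean((a == u) & (b == v))
        p_u, p_v = np.mean(a == u), np.mean(b == v)
        if p_uv > 0:
            mi += p_uv * np.log(p_uv / (p_u * p_v))
    return mi

# Chow-Liu: max-weight spanning tree under MI edge weights (Kruskal)
weights = sorted(((empirical_mi(X[:, i], X[:, j]), i, j)
                  for i in range(p) for j in range(i + 1, p)), reverse=True)
parent = list(range(p))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i
tree = []
for w, i, j in weights:
    ri, rj = find(i), find(j)
    if ri != rj:            # greedily add the heaviest non-cycle edge
        parent[ri] = rj
        tree.append((i, j))
print(sorted(tree))   # with this sample size, recovers the chain edges
```

Adjacent pairs have higher mutual information than pairs linked through intermediate nodes, so with enough samples the spanning tree matches the true chain [(0, 1), (1, 2), (2, 3)].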
Learning Graphical Models Beyond Trees

Challenges
Presence of cycles:
◮ Pairwise statistics no longer suffice.
◮ Likelihood function not tractable, since the partition function Z is hard to compute:
P(x) = (1/Z) exp( Σ_{(i,j)∈G} Ψ_{i,j}(x_i, x_j) ).
Presence of high-degree nodes:
◮ Brute-force search not tractable.

Can we provide learning guarantees under the above conditions?

Our Perspective: Tractable Graph Families
Characterize the class of tractable families, incorporating all the above challenges. Relevant for real datasets, e.g., social-network data.
Related Work in Structure Learning

Algorithms for Structure Learning: Chow and Liu (1968); Meinshausen and Bühlmann (2006); Bresler, Mossel and Sly (2009); Ravikumar, Wainwright and Lafferty (2010); ...

Approaches Employed: EM/search approaches; combinatorial/greedy approaches; convex relaxation; ...
Outline
1. Introduction
2. Tractable Graph Families
3. Structure Estimation in Graphical Models
4. Method and Guarantees
5. Conclusion
Intuitions: Conditional Mutual Information Test

Separators in Graphical Models: if S is a vertex separator of i and j in G, then
X_i ⊥⊥ X_j | X_S ⟺ I(X_i; X_j | X_S) = 0.

Observations
A Δ-separator exists for graphs with maximum degree Δ:
◮ Brute-force search for the separator: argmin_{|S| ≤ Δ} I(X_i; X_j | X_S).
◮ Computational complexity scales as O(p^Δ).

Approximate separators in general graphs? Even when S only approximately separates i and j, so that X_i ⊥̸⊥ X_j | X_S, the conditional mutual information can still satisfy I(X_i; X_j | X_S) ≈ 0.
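The brute-force search argmin_{|S| ≤ Δ} I(X_i; X_j | X_S) can be illustrated on a tiny chain; the data-generating model and all parameter values below are assumptions for the sketch, not from the talk.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Samples from chain 0-1-2: X1 depends on X0, X2 depends on X1 only,
# so S = {1} exactly separates nodes 0 and 2.
n = 20000
X = np.empty((n, 3), dtype=int)
X[:, 0] = rng.integers(0, 2, n)
for k in (1, 2):
    flip = rng.random(n) < 0.15
    X[:, k] = np.where(flip, 1 - X[:, k - 1], X[:, k - 1])

def cond_mi(i, j, S):
    """Empirical conditional mutual information I_n(X_i; X_j | X_S)."""
    cmi = 0.0
    for s in itertools.product([0, 1], repeat=len(S)):
        mask = np.all(X[:, list(S)] == s, axis=1) if S else np.ones(n, bool)
        ps = mask.mean()
        if ps == 0:
            continue
        for u, v in itertools.product([0, 1], repeat=2):
            p_uv = np.mean(mask & (X[:, i] == u) & (X[:, j] == v)) / ps
            p_u = np.mean(mask & (X[:, i] == u)) / ps
            p_v = np.mean(mask & (X[:, j] == v)) / ps
            if p_uv > 0:
                cmi += ps * p_uv * np.log(p_uv / (p_u * p_v))
    return cmi

# Brute-force search over candidate separators of size <= Delta
delta, i, j = 1, 0, 2
candidates = [S for r in range(delta + 1)
              for S in itertools.combinations({0, 1, 2} - {i, j}, r)]
best = min(candidates, key=lambda S: cond_mi(i, j, S))
print(best)   # the exact separator {1} minimizes the conditional MI
```

The empty set gives a clearly positive empirical mutual information between nodes 0 and 2, while conditioning on node 1 drives it to (approximately) zero, so the search returns (1,).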
Tractable Graph Families: Local Separation

γ-Local Separator S_γ(i, j): a minimal vertex separator with respect to paths between i and j of length less than γ.

(η, γ)-Local Separation Property for Graph G: |S_γ(i, j)| ≤ η for all (i, j) ∉ G.

Examples:
Locally tree-like graphs: Erdős-Rényi graphs, power-law/scale-free graphs.
Small-world graphs: Watts-Strogatz model, hybrid/augmented graphs.
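On a small graph, γ-local separators can be computed by brute force. The sketch below (illustrative; the 4-cycle example and the exact-enumeration approach are assumptions, not the talk's method) enumerates all i-j paths with fewer than γ edges and finds the smallest vertex set hitting every one of them:

```python
import itertools

def short_paths(adj, i, j, gamma):
    """All simple i-j paths with fewer than gamma edges."""
    out = []
    def dfs(v, path):
        if v == j:
            out.append(tuple(path))
            return
        if len(path) - 1 >= gamma - 1:   # gamma - 1 edges already used
            return
        for w in adj[v]:
            if w not in path:
                dfs(w, path + [w])
    dfs(i, [i])
    return out

def local_separator_size(adj, i, j, gamma):
    """Size of a minimal vertex set cutting all short i-j paths."""
    paths = short_paths(adj, i, j, gamma)
    interior = {v for q in paths for v in q} - {i, j}
    for r in range(len(interior) + 1):
        for S in itertools.combinations(interior, r):
            # S is a gamma-local separator if it hits every short path
            if all(set(S) & set(q[1:-1]) for q in paths):
                return r
    return 0

# 4-cycle 0-1-2-3-0; for the non-edge (0, 2) the short paths with
# gamma = 3 are 0-1-2 and 0-3-2, so both of {1, 3} must be removed.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(local_separator_size(adj, 0, 2, 3))   # 2
```

So the 4-cycle satisfies (η, γ)-local separation with η = 2 at γ = 3; shrinking γ to 2 leaves no short paths at all, and the local separator is empty.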
Setup: Ising and Gaussian Graphical Models

n i.i.d. samples available for structure estimation.

Ising model: P(x) ∝ exp( ½ x^T J_G x + h^T x ), x ∈ {−1, 1}^p.
Gaussian model: f(x) ∝ exp( −½ x^T J_G x + h^T x ), x ∈ R^p.

For (i, j) ∈ G: J_min ≤ |J_{i,j}| ≤ J_max.
Graph G satisfies the (η, γ)-local separation property.
Tradeoff between η, γ, J_min, J_max for tractable learning.
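For the Gaussian case, this setup can be illustrated end to end: sample from a chain-structured Gaussian graphical model with known precision matrix J_G, then read the edges off the inverse empirical covariance. The chain structure, ρ = 0.4, and the threshold are assumptions for this sketch, not the talk's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

# Gaussian graphical model on a 5-node chain: the precision matrix J_G
# is supported on the graph's edges, f(x) ∝ exp(-x^T J_G x / 2), h = 0.
p, rho = 5, 0.4
J = np.eye(p)
for i in range(p - 1):
    J[i, i + 1] = J[i + 1, i] = rho   # here J_min = J_max = rho

# Draw n i.i.d. samples: the covariance is the inverse precision matrix
n = 20000
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(J), size=n)

# Estimate the precision matrix and recover edges by thresholding
J_hat = np.linalg.inv(np.cov(X, rowvar=False))
edges = {(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(J_hat[i, j]) > rho / 2}
print(sorted(edges))   # recovers the chain edges at this sample size
```

Non-edge entries of the estimated precision matrix concentrate around zero at rate O(1/√n), while edge entries concentrate around ±ρ, so a threshold between 0 and J_min separates them once n is large enough.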
Regime of Tractable Learning

Efficient Learning Under Approximate Separation: the maximum edge potential J_max of the Ising model satisfies J_max < J*, where J* is the threshold for the phase transition of conditional uniqueness.