High-Dimensional Graphical Model Selection, Anima Anandkumar, U.C. Irvine. PowerPoint PPT Presentation


  1. High-Dimensional Graphical Model Selection. Anima Anandkumar, U.C. Irvine. Joint work with Vincent Tan (U. Wisc.) and Alan Willsky (MIT).

  2–5. Graphical Models: Definition

  Conditional Independence: $X_A \perp\!\!\!\perp X_B \mid X_S$.
  (Example graph: baseball teams A (Dodgers, Red Sox, Yankees, Mets, Phillies) and soccer teams B (Everton, Chelsea, Arsenal, Manchester United), separated by S.)

  Factorization:
  $$P(x) \propto \exp\Big(\sum_{(i,j) \in G} \Psi_{i,j}(x_i, x_j)\Big).$$

  Tree-Structured Graphical Models (example: star tree on nodes 1–4 with root 1):
  $$P(x) = \prod_{i \in V} P_i(x_i) \prod_{(i,j) \in E} \frac{P_{i,j}(x_i, x_j)}{P_i(x_i)\,P_j(x_j)} = P_1(x_1)\,P_{2|1}(x_2 \mid x_1)\,P_{3|1}(x_3 \mid x_1)\,P_{4|1}(x_4 \mid x_1).$$
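The tree factorization above can be checked numerically. Below is a minimal sketch, assuming a hypothetical star tree (root 1, leaves 2, 3, 4) over binary variables; the probability tables are made up for illustration and are not from the talk:

```python
import itertools

# A hypothetical star tree (root 1, leaves 2, 3, 4) over binary variables,
# matching the slide's factorization:
#   P(x) = P1(x1) P(x2|x1) P(x3|x1) P(x4|x1).
p1 = {0: 0.6, 1: 0.4}                  # marginal of the root
cond = {0: {0: 0.7, 1: 0.3},           # P(x_leaf = . | x1 = 0)
        1: {0: 0.2, 1: 0.8}}           # P(x_leaf = . | x1 = 1)

def joint(x1, x2, x3, x4):
    return p1[x1] * cond[x1][x2] * cond[x1][x3] * cond[x1][x4]

# The factorization defines a valid distribution: it sums to 1.
total = sum(joint(*x) for x in itertools.product((0, 1), repeat=4))
print(round(total, 10))  # → 1.0

# Leaves are conditionally independent given the root:
# P(x2 = 0, x3 = 0 | x1 = 0) = P(x2 = 0 | x1 = 0) P(x3 = 0 | x1 = 0).
p23_given_0 = sum(joint(0, 0, 0, x4) for x4 in (0, 1)) / p1[0]
print(abs(p23_given_0 - cond[0][0] ** 2) < 1e-9)  # → True
```

The second check is exactly the conditional-independence statement encoded by the star tree: conditioning on the root decouples the leaves.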

  6–7. Structure Learning of Graphical Models

  Given a graphical model on p nodes and n i.i.d. samples from the multivariate distribution, output an estimated structure $\hat{G}^n$.

  Structural Consistency: $\lim_{n \to \infty} P(\hat{G}^n \neq G) = 0$.

  Challenge: High Dimensionality ("Data-Poor" Regime). Large p, small n regime ($p \gg n$). Sample Complexity: the number of samples required to achieve consistency.

  Challenge: Computational Complexity.

  Goal: Address the above challenges and provide provable guarantees.

  8–12. Tree Graphical Models: Tractable Learning

  Maximum likelihood learning of tree structure, proposed by Chow and Liu (1968), reduces to a maximum-weight spanning tree problem:
  $$\hat{T}_{ML} = \arg\max_{T} \sum_{k=1}^{n} \log P(x_V^{(k)}) = \arg\max_{T} \sum_{(i,j) \in T} I^n(X_i; X_j).$$

  Pairwise statistics suffice for ML.

  With n samples and p nodes, the sample complexity is $n = O(\log p)$, i.e., $\log p / n = O(1)$.

  What other classes of graphical models are tractable for learning?
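The Chow-Liu procedure can be sketched directly: estimate empirical pairwise mutual informations and take a maximum-weight spanning tree. Below is a minimal sketch, assuming a hypothetical 3-node binary chain as ground truth; the sampling model and constants are illustrative, not from the talk:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Sample n observations from a hypothetical binary chain 0 - 1 - 2:
# X0 ~ Bernoulli(1/2); each child copies its parent with probability 0.9.
n = 5000
x = np.empty((n, 3), dtype=int)
x[:, 0] = rng.integers(0, 2, n)
for child, par in [(1, 0), (2, 1)]:
    flip = rng.random(n) < 0.1
    x[:, child] = np.where(flip, 1 - x[:, par], x[:, par])

def empirical_mi(a, b):
    """Empirical mutual information I^n(A; B) for binary samples."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Chow-Liu: maximum-weight spanning tree under empirical MI edge weights
# (Kruskal's algorithm with a tiny union-find).
edges = sorted(combinations(range(3), 2),
               key=lambda e: -empirical_mi(x[:, e[0]], x[:, e[1]]))
root = list(range(3))
def find(i):
    while root[i] != i:
        i = root[i]
    return i
tree = []
for i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        root[ri] = rj
        tree.append((i, j))
print(sorted(tree))  # → [(0, 1), (1, 2)]
```

Only pairwise empirical statistics enter the procedure, which is the point of the slide: the shortcut edge (0, 2) has strictly smaller mutual information than the true chain edges, so the spanning tree recovers the chain.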

  13–16. Learning Graphical Models Beyond Trees

  Challenges:
  Presence of cycles: pairwise statistics no longer suffice, and the likelihood function is not tractable because of the partition function Z:
  $$P(x) = \frac{1}{Z} \exp\Big(\sum_{(i,j) \in G} \Psi_{i,j}(x_i, x_j)\Big).$$
  Presence of high-degree nodes: brute-force search is not tractable.

  Can we provide learning guarantees under the above conditions?

  Our Perspective: Tractable Graph Families. Characterize the class of tractable families, incorporating all the above challenges. Relevant for real datasets, e.g., social-network data.

  17. Related Work in Structure Learning

  Algorithms for Structure Learning: Chow and Liu (1968); Meinshausen and Buehlmann (2006); Bresler, Mossel and Sly (2009); Ravikumar, Wainwright and Lafferty (2010); and others.

  Approaches Employed: EM/search approaches; combinatorial/greedy approaches; convex relaxation; and others.

  18. Outline

  1. Introduction
  2. Tractable Graph Families
  3. Structure Estimation in Graphical Models
  4. Method and Guarantees
  5. Conclusion

  19–23. Intuitions: Conditional Mutual Information Test

  Separators in Graphical Models: for a separator S of nodes i and j,
  $$X_i \perp\!\!\!\perp X_j \mid X_S \iff I(X_i; X_j \mid X_S) = 0.$$

  Observations: a ∆-separator exists for graphs with maximum degree ∆. Brute-force search for the separator:
  $$\arg\min_{|S| \le \Delta} I(X_i; X_j \mid X_S),$$
  with computational complexity scaling as $O(p^\Delta)$.

  Approximate separators in general graphs? With an approximate separator S, even when $X_i \not\perp\!\!\!\perp X_j \mid X_S$, we can have $I(X_i; X_j \mid X_S) \approx 0$.
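The brute-force search $\arg\min_{|S| \le \Delta} I(X_i; X_j \mid X_S)$ can be sketched as follows. The sketch assumes a hypothetical binary chain in which node 1 separates nodes 0 and 2; the sampling model, constants, and the 0.01 threshold are illustrative, not from the talk:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Hypothetical binary chain 0 - 1 - 2: node 1 separates nodes 0 and 2.
n = 20000
x = np.empty((n, 3), dtype=int)
x[:, 0] = rng.integers(0, 2, n)
for child, par in [(1, 0), (2, 1)]:
    flip = rng.random(n) < 0.1
    x[:, child] = np.where(flip, 1 - x[:, par], x[:, par])

def cond_mi(data, i, j, S):
    """Empirical conditional mutual information I^n(X_i; X_j | X_S)."""
    mi = 0.0
    for s_vals in np.ndindex(*([2] * len(S))):
        mask = np.ones(len(data), dtype=bool)
        for s, v in zip(S, s_vals):
            mask &= data[:, s] == v
        p_s = mask.mean()
        if p_s == 0:
            continue
        sub = data[mask]
        for a in (0, 1):
            for b in (0, 1):
                p_ab = np.mean((sub[:, i] == a) & (sub[:, j] == b))
                p_a, p_b = np.mean(sub[:, i] == a), np.mean(sub[:, j] == b)
                if p_ab > 0:
                    mi += p_s * p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def min_cond_mi(data, i, j, max_sep):
    """Brute-force separator search: min over |S| <= max_sep."""
    others = [v for v in range(data.shape[1]) if v not in (i, j)]
    return min(cond_mi(data, i, j, S)
               for k in range(max_sep + 1)
               for S in combinations(others, k))

# The minimized statistic is near zero for the non-edge (0, 2), since
# S = {1} separates them, but stays bounded away from zero for edge (0, 1).
print(min_cond_mi(x, 0, 2, 1) < 0.01 < min_cond_mi(x, 0, 1, 1))  # → True
```

Enumerating all candidate sets S of size at most ∆ is what gives the $O(p^\Delta)$ complexity on the slide: there are $O(p^\Delta)$ candidate separators per node pair.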

  24. Tractable Graph Families: Local Separation

  γ-Local Separator $S_\gamma(i, j)$: the minimal vertex separator of i and j with respect to paths of length less than γ.

  (η, γ)-Local Separation Property for Graph G: $|S_\gamma(i, j)| \le \eta$ for all $(i, j) \notin G$.

  Examples: locally tree-like graphs (Erdős-Rényi graphs, power-law/scale-free graphs); small-world graphs (Watts-Strogatz model); hybrid/augmented graphs.
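The definition of a γ-local separator can be made concrete with a brute-force sketch: find the smallest vertex set whose removal cuts every path of length less than γ between i and j. The cycle-graph example and all parameters below are illustrative, not from the talk:

```python
from itertools import combinations

def short_path_exists(adj, i, j, banned, gamma):
    """BFS: is there a path from i to j of length < gamma avoiding `banned`?"""
    frontier, seen = {i}, {i} | set(banned)
    for _ in range(gamma - 1):
        frontier = {w for v in frontier for w in adj[v]} - seen
        if j in frontier:
            return True
        seen |= frontier
    return False

def local_separator(adj, i, j, gamma):
    """Smallest vertex set S (excluding i, j) cutting all paths of length < gamma."""
    others = [v for v in adj if v not in (i, j)]
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            if not short_path_exists(adj, i, j, S, gamma):
                return set(S)

# An 8-node cycle: an exact (0, 2) separator needs two vertices (one on
# each side of the cycle), but for gamma = 4 only the short side matters,
# so the gamma-local separator is just {1}.
adj = {v: [(v - 1) % 8, (v + 1) % 8] for v in range(8)}
print(local_separator(adj, 0, 2, 4))  # → {1}
```

This illustrates the point of local separation: even when exact separators are large, the local separator can stay small (here η = 1), which is what makes the graph family tractable.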

  25. Outline

  1. Introduction
  2. Tractable Graph Families
  3. Structure Estimation in Graphical Models
  4. Method and Guarantees
  5. Conclusion

  26–30. Setup: Ising and Gaussian Graphical Models

  n i.i.d. samples available for structure estimation.

  Ising and Gaussian Graphical Models:
  $$P(x) \propto \exp\Big(\tfrac{1}{2} x^T J_G x + h^T x\Big), \quad x \in \{-1, 1\}^p,$$
  $$f(x) \propto \exp\Big(-\tfrac{1}{2} x^T J_G x + h^T x\Big), \quad x \in \mathbb{R}^p.$$

  For $(i, j) \in G$, $J_{\min} \le |J_{i,j}| \le J_{\max}$.

  Graph G satisfies the (η, γ) local-separation property.

  There is a tradeoff between η, γ, $J_{\min}$, $J_{\max}$ for tractable learning.
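In the Gaussian case, the edge set of G is exactly the nonzero off-diagonal pattern of $J_G$ (the precision matrix), since $J_{i,j} = 0$ is equivalent to $X_i \perp X_j$ given the rest. A minimal sketch, assuming a hypothetical 4-node chain with $J_{\min} = J_{\max} = 0.4$ and a naive inverse-sample-covariance estimator (illustrative only, not the method proposed in the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical Gaussian graphical model on a 4-node chain. The graph G is
# the nonzero off-diagonal pattern of the precision matrix J_G.
p = 4
J = np.eye(p)
for i in range(p - 1):
    J[i, i + 1] = J[i + 1, i] = -0.4   # here J_min = J_max = 0.4

cov = np.linalg.inv(J)                 # J is diagonally dominant, hence PD
x = rng.multivariate_normal(np.zeros(p), cov, size=20000)

# Estimate the precision matrix from samples and read off the structure
# by thresholding between 0 and J_min.
J_hat = np.linalg.inv(np.cov(x, rowvar=False))
edges = {(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(J_hat[i, j]) > 0.2}
print(sorted(edges))  # → [(0, 1), (1, 2), (2, 3)]
```

The gap between $J_{\min}$ and zero is what makes thresholding meaningful, which is one reason the bounds $J_{\min} \le |J_{i,j}| \le J_{\max}$ appear in the setup.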

  31. Regime of Tractable Learning

  Efficient Learning Under Approximate Separation: the maximum edge potential $J_{\max}$ of the Ising model satisfies $J_{\max} < J^*$, where $J^*$ is the threshold for the phase transition of conditional uniqueness.
