Learning chordal Markov networks by dynamic programming


  1. Learning chordal Markov networks by dynamic programming. Kustaa Kangas, Teppo Niinimäki, Mikko Koivisto. NIPS 2014 (to appear). November 27, 2014.

  2. Probabilistic graphical models
Graphical model
◮ Graph structure $G$ on the vertex set $V = \{1, \dots, n\}$
◮ Represents conditional independencies in a joint distribution $p(X) = p(X_1, \dots, X_n)$
Advantages
◮ Easy to read
◮ Compact way to store a distribution
◮ Efficient inference

  3. Probabilistic graphical models
◮ Directed models: Bayesian networks, ...
◮ Undirected models: Markov networks, ...
Structure learning problem: given samples from $p(X_1, \dots, X_n)$, find a model that best fits the sampled data.

  4. Probabilistic graphical models
Structure learning in chordal Markov networks: find a chordal Markov network that maximizes a given decomposable score.
Prior work:
◮ Constraint satisfaction (Corander et al.)
◮ Integer linear programming (Bartlett and Cussens)
Our result: dynamic programming in $O(4^n)$ time and $O(3^n)$ space for $n$ variables.
◮ First non-trivial bound
◮ Competitive in practice

  5. Markov networks
◮ Joint distribution $p(X) = p(X_1, \dots, X_n)$
◮ Undirected graph $G$ on $V = \{1, \dots, n\}$ with the global Markov property: for $A, B, S \subseteq V$, it holds that $X_A \perp\!\!\!\perp X_B \mid X_S$ if $S$ separates $A$ and $B$ in $G$.

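For example, in the path graph $1 - 2 - 3$, the set $\{2\}$ separates $\{1\}$ from $\{3\}$, so the global Markov property gives $X_1 \perp\!\!\!\perp X_3 \mid X_2$.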

  9. Markov networks
If $p$ is strictly positive, it factorizes as
$$p(X_1, \dots, X_n) = \prod_{C \in \mathcal{C}} \psi_C(X_C),$$
where
◮ $\mathcal{C}$ is the set of (maximal) cliques of $G$
◮ $\psi_C$ are mappings to positive reals
◮ $X_C = \{X_v : v \in C\}$
(Hammersley–Clifford theorem)
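For example, in the path graph $1 - 2 - 3$ the maximal cliques are $\{1, 2\}$ and $\{2, 3\}$, so a strictly positive Markov distribution factorizes as
$$p(X_1, X_2, X_3) = \psi_{12}(X_1, X_2)\,\psi_{23}(X_2, X_3).$$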

  10. Bayesian networks
◮ Directed acyclic graph
◮ Conditional independencies by d-separation
◮ Factorizes as
$$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{parents}(X_i))$$
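For example, the chain DAG $1 \to 2 \to 3$ factorizes as $p(X_1, X_2, X_3) = p(X_1)\,p(X_2 \mid X_1)\,p(X_3 \mid X_2)$.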

  11. Bayesian and Markov networks
◮ Bayesian and Markov networks are not equivalent
◮ Chordal Markov networks are exactly the intersection of the two model classes

  12. Chordal graphs
◮ A chord is an edge between two non-consecutive vertices in a cycle.
◮ A graph is chordal (or triangulated) if every cycle of at least 4 vertices has a chord.

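The definition is easy to experiment with. A minimal sketch, assuming the networkx library, which provides an is_chordal test:

    import networkx as nx

    C4 = nx.cycle_graph(4)       # the 4-cycle 0-1-2-3-0 has no chord
    print(nx.is_chordal(C4))     # False

    C4.add_edge(0, 2)            # adding the chord {0, 2} triangulates it
    print(nx.is_chordal(C4))     # True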

  14. Clique tree decomposition
[Figure: an example chordal graph on vertices 1–9.]

  16. Clique tree decomposition
[Figure: the example graph together with a clique tree of its maximal cliques.]
Running intersection property: for all $C_1, C_2 \in \mathcal{C}$, every clique on the path between $C_1$ and $C_2$ contains $C_1 \cap C_2$.


  19. Clique tree decomposition
[Figure: the clique tree with separators marked on its edges.]
Separator: the intersection of two adjacent cliques in a clique tree. Every clique tree of a graph has the same multiset of separators.

  20. Clique tree decomposition
[Figure: the example chordal graph side by side with its clique tree.]
Theorem: a graph is chordal if and only if it has a clique tree.
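A standard way to obtain a clique tree, sketched here under the assumption that networkx is available: weight each pair of maximal cliques by the size of their intersection and take a maximum-weight spanning tree; a classical result guarantees this is a clique tree.

    from itertools import combinations
    import networkx as nx

    def clique_tree(G):
        # Maximal cliques of a chordal graph (raises if G is not chordal).
        cliques = [frozenset(c) for c in nx.chordal_graph_cliques(G)]
        CG = nx.Graph()
        CG.add_nodes_from(cliques)
        # Clique graph: edges weighted by separator size |C1 ∩ C2|.
        for C1, C2 in combinations(cliques, 2):
            if C1 & C2:
                CG.add_edge(C1, C2, weight=len(C1 & C2))
        # A maximum-weight spanning tree of the clique graph is a clique
        # tree (a forest, if G is disconnected).
        return nx.maximum_spanning_tree(CG)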

  21. Chordal Markov networks
[Figure: the example chordal graph.]
◮ $\psi_{C_i}(X_{C_i}) = p(X_{C_i}) / p(X_{S_i})$
◮ The factorization becomes
$$p(X_1, \dots, X_n) = \prod_{C \in \mathcal{C}} \psi_C(X_C) = \frac{\prod_{C \in \mathcal{C}} p(X_C)}{\prod_{S \in \mathcal{S}} p(X_S)},$$
where $\mathcal{C}$ and $\mathcal{S}$ are the sets of cliques and separators.
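Continuing the path-graph example: with cliques $\{1, 2\}$, $\{2, 3\}$ and separator $\{2\}$,
$$p(X_1, X_2, X_3) = \frac{p(X_1, X_2)\,p(X_2, X_3)}{p(X_2)}.$$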

  22. Structure learning
Given data $D$ sampled from $p(X_1, \dots, X_n)$, how well does a graph structure $G$ fit the data? Common scoring criteria decompose as
$$\mathrm{score}(G) = \frac{\prod_{C \in \mathcal{C}} \mathrm{score}(C)}{\prod_{S \in \mathcal{S}} \mathrm{score}(S)}.$$
Each $\mathrm{score}(C)$ is the probability of the data projected onto $C$, possibly extended with a prior or penalization term (e.g. maximum likelihood, Bayesian Dirichlet, ...).

  23. Structure learning
Structure learning problem in chordal Markov networks: given $\mathrm{score}(C)$ for each $C \subseteq V$, find a chordal graph $G$ that maximizes
$$\mathrm{score}(G) = \frac{\prod_{C \in \mathcal{C}} \mathrm{score}(C)}{\prod_{S \in \mathcal{S}} \mathrm{score}(S)}.$$
We assume each $\mathrm{score}(C)$ can be computed efficiently and focus on the combinatorial problem.
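Given a clique tree, this objective is cheap to evaluate. A minimal sketch, assuming the clique_tree helper above and a hypothetical score dict keyed by frozensets (illustrative code, not the authors' implementation):

    import math

    def score_graph(T, score):
        # score(G): product of clique scores over the nodes of the clique
        # tree, divided by the product of separator scores over its edges.
        # Computed in log-space to avoid overflow/underflow.
        log_s = sum(math.log(score[C]) for C in T.nodes)
        log_s -= sum(math.log(score[C1 & C2]) for C1, C2 in T.edges)
        return math.exp(log_s)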

  24. Structure learning
Brute-force solution:
◮ Enumerate all undirected graphs
◮ Determine which are chordal
◮ For each chordal $G$, find a clique tree to evaluate $\mathrm{score}(G)$
◮ Takes $O^*(2^{\binom{n}{2}})$ time
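A direct transcription of this brute force, combining the sketches above; the score dict must cover every subset that can appear as a clique or separator, and the search is feasible only for very small $n$:

    from itertools import combinations
    import networkx as nx

    def brute_force(n, score):
        # Try all 2^(n choose 2) graphs on vertices 1..n; keep the best
        # chordal one. Uses clique_tree and score_graph from above.
        vertices = list(range(1, n + 1))
        all_edges = list(combinations(vertices, 2))
        best, best_G = float('-inf'), None
        for r in range(len(all_edges) + 1):
            for edges in combinations(all_edges, r):
                G = nx.Graph()
                G.add_nodes_from(vertices)
                G.add_edges_from(edges)
                if not nx.is_chordal(G):
                    continue
                s = score_graph(clique_tree(G), score)
                if s > best:
                    best, best_G = s, G
        return best_G, best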

  25. Structure learning
We write $\mathrm{score}(T) = \mathrm{score}(G)$ when $T$ is a clique tree of $G$.
◮ Every clique tree $T$ uniquely specifies a chordal graph $G$.
◮ We can therefore search the space of clique trees instead.

  26. Recursive characterization
[Figure: the example clique tree, rooted at a clique $C$.]
Let $T$ be rooted at $C$ with subtrees $T_1, \dots, T_k$ rooted at $C_1, \dots, C_k$. Then
$$\mathrm{score}(T) = \mathrm{score}(C) \prod_{i=1}^{k} \frac{\mathrm{score}(T_i)}{\mathrm{score}(C \cap C_i)}.$$
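This characterization translates directly into a recursive evaluator; a minimal sketch over a networkx clique tree as built above:

    def score_tree(T, C, score, parent=None):
        # score(T) for the subtree of the clique tree T rooted at clique C:
        # score(C) times, for each child, its subtree score divided by the
        # separator score score(C ∩ child).
        s = score[C]
        for child in T.neighbors(C):
            if child == parent:
                continue
            s *= score_tree(T, child, score, parent=C) / score[C & child]
        return s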

  27. Recurrence
For $S \subset V$ and $\emptyset \subset R \subseteq V \setminus S$, let $f(S, R)$ be the maximum of $\mathrm{score}(G)$ over chordal graphs $G$ on $S \cup R$ such that $S$ is a proper subset of a clique. The solution is then given by $f(\emptyset, V)$.

  28. Recurrence
For $S \subset V$ and $\emptyset \subset R \subseteq V \setminus S$, let $f(S, R)$ be the maximum of $\mathrm{score}(G)$ over chordal graphs $G$ on $S \cup R$ such that $S$ is a proper subset of a clique. Then
$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \text{ partitions } R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}.$$
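The recurrence can be transcribed directly into memoized code. Below is a minimal sketch, assuming a score dict over frozensets with score[frozenset()] == 1; partitions of $R \setminus C$ are enumerated by always forcing the minimum remaining element into the current part, so each partition is counted once. This naive version illustrates the recurrence but does not attain the paper's $O(4^n)$ bound, which needs the more careful evaluation order developed by the authors.

    from functools import lru_cache
    from itertools import combinations

    def subsets(s, proper=False, nonempty=False):
        # All subsets of frozenset s, optionally proper and/or nonempty.
        items = list(s)
        lo = 1 if nonempty else 0
        hi = len(items) - 1 if proper else len(items)
        for r in range(lo, hi + 1):
            for c in combinations(items, r):
                yield frozenset(c)

    def solve(V, score):
        # Maximum score(G) over chordal graphs on V; score maps frozensets
        # to positive reals, with score[frozenset()] == 1 by convention.
        V = frozenset(V)

        @lru_cache(maxsize=None)
        def f(S, R):
            # Best score over chordal graphs on S ∪ R in which S is a
            # proper subset of a clique; C ranges over root cliques.
            return max(score[C] * parts(C, R - C)
                       for C in subsets(S | R) if S < C)

        @lru_cache(maxsize=None)
        def parts(C, U):
            # Max over partitions {R_1, ..., R_k} of U of the product of
            # max_{S_i ⊂ C} f(S_i, R_i) / score[S_i]; the minimum element
            # of U is forced into the current part R_1 so that each
            # partition is enumerated exactly once.
            if not U:
                return 1.0
            m = min(U)
            best = float('-inf')
            for R1 in subsets(U - {m}):
                R1 = R1 | {m}
                factor = max(f(S1, R1) / score[S1]
                             for S1 in subsets(C, proper=True))
                best = max(best, factor * parts(C, U - R1))
            return best

        return f(frozenset(), V)

For instance, solve(range(1, 5), score) returns the optimal score over chordal graphs on four vertices; recovering the optimal graph itself requires additionally storing the maximizing choices.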

  29. Recurrence
$$\mathrm{score}(T) = \mathrm{score}(C) \prod_{i=1}^{k} \frac{\mathrm{score}(T_i)}{\mathrm{score}(C \cap C_i)}$$
$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \text{ partitions } R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}$$
[Figure: the vertex set $R$, the root clique $C$ above the separator $S$, and the remainder of $R$ split into parts $R_1, R_2, R_3$.]

  30. Recurrence
$$\mathrm{score}(T) = \mathrm{score}(C) \prod_{i=1}^{k} \frac{\mathrm{score}(T_i)}{\mathrm{score}(C \cap C_i)}$$
Single-argument variant:
$$f(R) = \max_{\substack{\emptyset \subset C \subseteq R \\ \{R_1, \dots, R_k\} \text{ partitions } R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i \cup R_i)}{\mathrm{score}(S_i)}$$
[Figure: as on the previous slide, for the single-argument variant $f(R)$.]

  31. Recurrence
$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \text{ partitions } R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}$$

  32. Recurrence
The maximization over each $S_i$ can be moved inside the product:
$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \text{ partitions } R \setminus C}} \mathrm{score}(C) \prod_{i=1}^{k} \max_{S_i \subset C} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}$$
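A sketch of where an $O(4^n)$ bound can come from (a counting argument reconstructed here, not stated on the slides): the dominant cost is ranging over compatible triples of sets, and
$$\#\bigl\{(S, C, R) : S \subseteq C \subseteq V,\ R \subseteq V \setminus C\bigr\} = 4^n,$$
since each vertex lies in exactly one of $S$, $C \setminus S$, $R$, or $V \setminus (C \cup R)$.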
