Learning chordal Markov networks by dynamic programming
Kustaa Kangas, Teppo Niinimäki, Mikko Koivisto
NIPS 2014 (to appear) · November 27, 2014
Probabilistic graphical models

Graphical model
◮ Graph structure $G$ on the vertex set $V = \{1, \dots, n\}$
◮ Represents conditional independencies in a joint distribution $p(X) = p(X_1, \dots, X_n)$

Advantages
◮ Easy to read
◮ Compact way to store a distribution
◮ Efficient inference
Probabilistic graphical models

Directed models: Bayesian networks, ...
Undirected models: Markov networks, ...

Structure learning problem: Given samples from $p(X_1, \dots, X_n)$, find a model that best fits the sampled data.
Probabilistic graphical models

Structure learning in chordal Markov networks: Find a chordal Markov network that maximizes a given decomposable score.

Prior work:
◮ Constraint satisfaction, Corander et al.
◮ Integer linear programming, Bartlett and Cussens

Our result: Dynamic programming in $O(4^n)$ time and $O(3^n)$ space for $n$ variables.
◮ First non-trivial bound
◮ Competitive in practice
Markov networks

◮ Joint distribution $p(X) = p(X_1, \dots, X_n)$
◮ Undirected graph $G$ on $V = \{1, \dots, n\}$ with the global Markov property: for $A, B, S \subseteq V$ it holds that $X_A \perp\!\!\!\perp X_B \mid X_S$ if $S$ separates $A$ and $B$ in $G$.
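To make the separation condition concrete, here is a minimal Python sketch (not from the paper) that tests whether $S$ separates $A$ and $B$ by breadth-first search in the graph with $S$ removed; the dict-of-sets representation `adj` is an assumption for illustration.

```python
from collections import deque

def separates(adj, S, A, B):
    """Check whether vertex set S separates A from B in an undirected
    graph given as a dict mapping each vertex to its set of neighbours."""
    S = set(S)
    A, B = set(A) - S, set(B) - S
    seen, queue = set(A), deque(A)
    while queue:  # BFS that never enters S
        u = queue.popleft()
        for v in adj[u]:
            if v not in S and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen.isdisjoint(B)

# Example: in the path 1 - 2 - 3, vertex {2} separates {1} from {3}.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
print(separates(adj, S={2}, A={1}, B={3}))  # True
```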
Markov networks

If $p$ is strictly positive, it factorizes as
$$p(X_1, \dots, X_n) = \prod_{C \in \mathcal{C}} \psi_C(X_C),$$
where
◮ $\mathcal{C}$ is the set of (maximal) cliques of $G$
◮ $\psi_C$ are mappings to positive reals
◮ $X_C = \{X_v : v \in C\}$

(Hammersley–Clifford theorem)
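As a toy illustration of this factorization (an assumed example, not from the paper), the sketch below evaluates $\prod_C \psi_C(X_C)$ for given clique potentials and normalizes by brute-force summation.

```python
import itertools

def factorized_prob(x, potentials):
    """Evaluate the clique factorization prod_C psi_C(x_C) at assignment x.
    `potentials` maps a tuple of variable indices (a clique) to a function
    of the corresponding sub-assignment."""
    p = 1.0
    for clique, psi in potentials.items():
        p *= psi(tuple(x[v] for v in clique))
    return p

# Two binary variables, one clique {0, 1} whose potential rewards agreement:
potentials = {(0, 1): lambda xc: 2.0 if xc[0] == xc[1] else 1.0}
Z = sum(factorized_prob(x, potentials)
        for x in itertools.product([0, 1], repeat=2))
print(factorized_prob((1, 1), potentials) / Z)  # normalized probability 1/3
```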
Bayesian networks

◮ Directed acyclic graph
◮ Conditional independencies by d-separation
◮ Factorizes:
$$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{parents}(X_i))$$
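A corresponding sketch for the Bayesian network factorization, with `cpds` and `parents` as assumed data structures for illustration:

```python
def bn_prob(x, cpds, parents):
    """Joint probability of assignment x under a Bayesian network.
    `parents[i]` is the tuple of parents of variable i; `cpds[i]` maps
    (x_i, parent values) to p(x_i | parents)."""
    p = 1.0
    for i in range(len(x)):
        pa = tuple(x[j] for j in parents[i])
        p *= cpds[i][(x[i], pa)]
    return p

# Chain X0 -> X1 over binary variables:
parents = {0: (), 1: (0,)}
cpds = {
    0: {(0, ()): 0.6, (1, ()): 0.4},
    1: {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.2, (1, (1,)): 0.8},
}
print(bn_prob((1, 1), cpds, parents))  # 0.4 * 0.8 = 0.32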
Bayesian and Markov networks

◮ Bayesian and Markov networks are not equivalent: neither class contains the other
◮ Chordal Markov networks are exactly the intersection of the two
Chordal graphs

◮ A chord is an edge between two non-consecutive vertices of a cycle.
◮ A graph is chordal (or triangulated) if every cycle of at least 4 vertices has a chord.
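Chordality can be tested with maximum cardinality search (MCS): a graph is chordal iff, for every vertex, the neighbours that MCS visited earlier form a clique (Tarjan–Yannakakis). A minimal sketch, not code from the paper:

```python
from itertools import combinations

def is_chordal(adj):
    """MCS-based chordality test. `adj` maps each vertex to its set of
    neighbours in an undirected graph."""
    order, weight = [], {v: 0 for v in adj}
    unnumbered = set(adj)
    while unnumbered:
        # Pick the unnumbered vertex with the most numbered neighbours.
        v = max(unnumbered, key=lambda u: weight[u])
        order.append(v)
        unnumbered.remove(v)
        for u in adj[v]:
            if u in unnumbered:
                weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    # The reverse of the MCS order is a perfect elimination ordering iff
    # each vertex's earlier-visited neighbours are pairwise adjacent.
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        for a, b in combinations(earlier, 2):
            if b not in adj[a]:
                return False
    return True

# A 4-cycle is not chordal; adding a chord makes it chordal.
c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_chordal(c4))            # False
c4[1].add(3); c4[3].add(1)
print(is_chordal(c4))            # True
```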
Clique tree decomposition

[Figure: an example chordal graph on vertices 1–9]
Clique tree decomposition

[Figure: the example graph and a tree whose nodes are its maximal cliques]

Running intersection property: For all $C_1, C_2 \in \mathcal{C}$, every clique on the path between $C_1$ and $C_2$ contains $C_1 \cap C_2$.
Clique tree decomposition

[Figure: the clique tree with the separators marked on its edges]

Separator: Intersection of adjacent cliques in a clique tree. Every clique tree of a graph has the same multiset of separators.
Clique tree decomposition

[Figure: the example chordal graph alongside one of its clique trees]

Theorem: A graph is chordal if and only if it has a clique tree.
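A clique tree can be built by the standard construction: take the maximal cliques as nodes and compute a maximum-weight spanning tree, weighting each pair of cliques by the size of its intersection. A minimal sketch (the list of maximal `cliques` is assumed given, e.g. from a chordality test):

```python
def clique_tree(cliques):
    """Kruskal's algorithm on clique pairs weighted by intersection size.
    Returns the tree edges; each edge's separator is the intersection of
    its two endpoint cliques."""
    edges = sorted(((len(cliques[i] & cliques[j]), i, j)
                    for i in range(len(cliques))
                    for j in range(i + 1, len(cliques))),
                   reverse=True)
    parent = list(range(len(cliques)))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((cliques[i], cliques[j]))
    return tree

# Maximal cliques of the chordal graph on {1,2,3,4} with triangle 1-2-3
# and pendant edge 3-4:
cliques = [frozenset({1, 2, 3}), frozenset({3, 4})]
print(clique_tree(cliques))  # one edge, separator {3}
```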
Chordal Markov networks

[Figure: the example chordal graph]

◮ $\psi_{C_i}(X_{C_i}) = p(X_{C_i}) / p(X_{S_i})$, where $S_i$ is the separator between $C_i$ and its parent in the clique tree
◮ The factorization becomes
$$p(X_1, \dots, X_n) = \prod_{C \in \mathcal{C}} \psi_C(X_C) = \frac{\prod_{C \in \mathcal{C}} p(X_C)}{\prod_{S \in \mathcal{S}} p(X_S)},$$
where $\mathcal{C}$ and $\mathcal{S}$ are the sets of cliques and separators.
Structure learning

Given sampled data $D$ from $p(X_1, \dots, X_n)$, how well does a graph structure $G$ fit the data?

Common scoring criteria decompose as
$$\mathrm{score}(G) = \frac{\prod_{C \in \mathcal{C}} \mathrm{score}(C)}{\prod_{S \in \mathcal{S}} \mathrm{score}(S)}$$

Each $\mathrm{score}(C)$ is the probability of the data projected to $C$, possibly extended with a prior or penalization term, e.g. maximum likelihood, Bayesian Dirichlet, ...
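In practice such scores are evaluated in log space, turning the ratio of products into sums and differences. A minimal sketch, where `local_score` is an assumed function returning the log local score of a vertex set:

```python
def graph_score(cliques, separators, local_score):
    """Log of a decomposable score:
    log score(G) = sum_C log score(C) - sum_S log score(S).
    `separators` is the multiset of separators, given as a list."""
    return (sum(local_score(C) for C in cliques)
            - sum(local_score(S) for S in separators))
```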
Structure learning

Structure learning problem in chordal Markov networks: Given $\mathrm{score}(C)$ for each $C \subseteq V$, find a chordal graph $G$ that maximizes
$$\mathrm{score}(G) = \frac{\prod_{C \in \mathcal{C}} \mathrm{score}(C)}{\prod_{S \in \mathcal{S}} \mathrm{score}(S)}.$$

We assume each $\mathrm{score}(C)$ can be efficiently computed and focus on the combinatorial problem.
Structure learning

Brute-force solution:
◮ Enumerate undirected graphs
◮ Determine which are chordal
◮ For each chordal $G$, find a clique tree to evaluate $\mathrm{score}(G)$
◮ $O^*\!\big(2^{\binom{n}{2}}\big)$ time
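A hedged sketch of this brute force for very small $n$, with `is_chordal` and `clique_tree_score` passed in as assumed helpers (e.g. built from the earlier sketches):

```python
from itertools import combinations

def brute_force(n, log_score, is_chordal, clique_tree_score):
    """Enumerate all 2^(n choose 2) undirected graphs on vertices 1..n
    and keep the best-scoring chordal one. Feasible only for tiny n."""
    pairs = list(combinations(range(1, n + 1), 2))
    best, best_adj = float('-inf'), None
    for mask in range(1 << len(pairs)):
        adj = {v: set() for v in range(1, n + 1)}
        for k, (u, v) in enumerate(pairs):
            if mask >> k & 1:
                adj[u].add(v)
                adj[v].add(u)
        if is_chordal(adj):
            s = clique_tree_score(adj, log_score)  # assumed helper
            if s > best:
                best, best_adj = s, adj
    return best, best_adj
```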
Structure learning

We denote $\mathrm{score}(T) = \mathrm{score}(G)$ when $T$ is a clique tree of $G$.
◮ Every clique tree $T$ uniquely specifies a chordal graph $G$.
◮ We can therefore search the space of clique trees instead.
Recursive characterization

[Figure: the example clique tree, rooted at one of its cliques]

Let $T$ be rooted at $C$ with subtrees $T_1, \dots, T_k$ rooted at $C_1, \dots, C_k$. Then
$$\mathrm{score}(T) = \mathrm{score}(C) \prod_{i=1}^{k} \frac{\mathrm{score}(T_i)}{\mathrm{score}(C \cap C_i)}$$
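This characterization translates directly into a recursive evaluation. A minimal log-space sketch, assuming cliques are frozensets and `children` is a function giving the rooted tree structure:

```python
def tree_score(C, children, log_score):
    """Log-score of a clique tree rooted at clique C, via
    score(T) = score(C) * prod_i score(T_i) / score(C ∩ C_i)."""
    s = log_score(C)
    for Ci in children(C):
        s += tree_score(Ci, children, log_score) - log_score(C & Ci)
    return s
```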
Recurrence

For $S \subset V$ and $\emptyset \subset R \subseteq V \setminus S$, let $f(S, R)$ be the maximum of $\mathrm{score}(G)$ over chordal graphs $G$ on $S \cup R$ such that $S$ is a proper subset of a clique. Then the solution is given by $f(\emptyset, V)$.
Recurrence

For $S \subset V$ and $\emptyset \subset R \subseteq V \setminus S$, let $f(S, R)$ be the maximum of $\mathrm{score}(G)$ over chordal graphs $G$ on $S \cup R$ such that $S$ is a proper subset of a clique. Then the solution is given by $f(\emptyset, V)$.

$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \lhd R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)},$$

where $\{R_1, \dots, R_k\} \lhd R \setminus C$ denotes that $R_1, \dots, R_k$ partition $R \setminus C$.
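Below is a direct, memoized transcription of this recurrence as a hedged sketch: it is exponential and only usable on tiny instances, whereas the paper organizes the computation to run in $O(4^n)$ time. Partitions of $R \setminus C$ are enumerated without duplicates by always peeling off the part containing the smallest remaining element. All names are illustrative, and `score` maps frozensets (including the empty set) to positive reals; use log scores in practice to avoid underflow.

```python
from itertools import combinations

def all_subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def solve(V, score):
    """Evaluate f(empty set, V) by naive memoized recursion."""
    memo_f, memo_split = {}, {}

    def f(S, R):
        # Choose the clique C with S ⊂ C ⊆ S ∪ R, then split the rest.
        if (S, R) not in memo_f:
            memo_f[(S, R)] = max(
                score[S | X] * split(S | X, R - X)
                for X in all_subsets(R) if X)
        return memo_f[(S, R)]

    def split(C, U):
        # Max over partitions {R_1,...,R_k} of U, attaching each part R_i
        # to a proper subset S_i ⊂ C. Peel off the part containing min(U).
        if not U:
            return 1.0
        if (C, U) not in memo_split:
            v = min(U)
            memo_split[(C, U)] = max(
                f(S1, Y | {v}) / score[S1] * split(C, U - Y - {v})
                for Y in all_subsets(U - {v})
                for S1 in all_subsets(C) if S1 != C)
        return memo_split[(C, U)]

    return f(frozenset(), frozenset(V))

# Tiny example: 3 variables with arbitrary positive local scores.
V = {1, 2, 3}
score = {S: 1.0 + 0.1 * sum(S) for S in all_subsets(V)}
print(solve(V, score))
```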
Recurrence

$$\mathrm{score}(T) = \mathrm{score}(C) \prod_{i=1}^{k} \frac{\mathrm{score}(T_i)}{\mathrm{score}(C \cap C_i)}$$

$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \lhd R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}$$

[Figure: the chosen clique $C$ inside $S \cup R$, with the parts $R_1, R_2, R_3$ of $R \setminus C$ attached to subsets of $C$]
Recurrence

$$\mathrm{score}(T) = \mathrm{score}(C) \prod_{i=1}^{k} \frac{\mathrm{score}(T_i)}{\mathrm{score}(C \cap C_i)}$$

$$f(R) = \max_{\substack{\emptyset \subset C \subseteq R \\ \{R_1, \dots, R_k\} \lhd R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i \cup R_i)}{\mathrm{score}(S_i)}$$

[Figure: as above, the clique $C$ and the parts $R_1, R_2, R_3$ of the remainder]
Recurrence

$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \lhd R \setminus C \\ S_1, \dots, S_k \subset C}} \mathrm{score}(C) \prod_{i=1}^{k} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}$$
Recurrence

$$f(S, R) = \max_{\substack{S \subset C \subseteq S \cup R \\ \{R_1, \dots, R_k\} \lhd R \setminus C}} \mathrm{score}(C) \prod_{i=1}^{k} \max_{S_i \subset C} \frac{f(S_i, R_i)}{\mathrm{score}(S_i)}$$
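Since each $S_i$ occurs in exactly one factor, the maximization over $S_1, \dots, S_k$ decomposes and can be pushed inside the product. The inner maximum can then be tabulated once per pair $(C, R_i)$, a step toward the paper's $O(4^n)$ bound. A hedged one-liner with assumed helpers (`f`, `score`, `all_subsets` as in the earlier sketch):

```python
def best_attachment(C, Ri, f, score, all_subsets):
    """Inner maximization of the rearranged recurrence:
    max over proper subsets S_i of C of f(S_i, R_i) / score(S_i).
    Once tabulated, each factor of the product is a single lookup."""
    return max(f(S, Ri) / score[S] for S in all_subsets(C) if S != C)
```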