Probabilistic Graphical Models
David Sontag, New York University
Lecture 5, Feb. 28, 2013
Today's lecture

1. Using VE for conditional queries
2. Running time of variable elimination
   - Elimination as graph transformation
   - Fill edges, width, treewidth
3. Sum-product belief propagation (BP) [done on blackboard]
4. Max-product belief propagation
How to introduce evidence?

Recall that our original goal was to answer conditional probability queries,
$$p(Y \mid E = e) = \frac{p(Y, e)}{p(e)}$$

- Apply the variable elimination algorithm to the task of computing $p(Y, e)$
- Replace each factor $\phi \in \Phi$ that has $E \cap \mathrm{Scope}[\phi] \neq \emptyset$ with
  $$\phi'(x_{\mathrm{Scope}[\phi] - E}) = \phi(x_{\mathrm{Scope}[\phi] - E},\; e_{E \cap \mathrm{Scope}[\phi]})$$
- Then, eliminate the variables in $X - Y - E$. The returned factor $\phi^*(Y)$ is $p(Y, e)$
- To obtain the conditional $p(Y \mid e)$, normalize the resulting product of factors; the normalization constant is $p(e)$
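As an illustration, here is a minimal sketch (not from the slides) of the factor-reduction step, representing a factor as a (scope, table) pair; the function name `reduce_factor` and the example numbers are mine:

```python
from itertools import product

# A factor is (scope, table): scope is a tuple of variable names, and table
# maps each assignment (a tuple of values, one per scope variable) to a
# nonnegative number. All variables are assumed binary here for brevity.

def reduce_factor(scope, table, evidence):
    """Restrict a factor to the rows consistent with the evidence,
    dropping the evidence variables from its scope."""
    kept = [i for i, v in enumerate(scope) if v not in evidence]
    new_scope = tuple(scope[i] for i in kept)
    new_table = {}
    for assignment, value in table.items():
        if all(assignment[i] == evidence[v]
               for i, v in enumerate(scope) if v in evidence):
            new_table[tuple(assignment[i] for i in kept)] = value
    return new_scope, new_table

# Example: phi(A, B) with evidence B = 1 becomes phi'(A) = phi(A, 1).
phi = (("A", "B"), {(a, b): 0.1 + a + 2 * b for a, b in product([0, 1], repeat=2)})
print(reduce_factor(*phi, evidence={"B": 1}))
```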
Sum-product VE for conditional distributions

[Slide shows the sum-product VE procedure for conditional distributions; algorithm box not reproduced here]
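To make the procedure concrete, here is a minimal sum-product VE sketch (my own illustration, reusing the (scope, table) factor representation above; `multiply`, `sum_out`, and `sum_product_ve` are hypothetical names, and all variables are binary). For $p(Y \mid e)$: first reduce every factor by the evidence, then eliminate the variables in $X - Y - E$, then normalize the result.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))  # ordered union of the two scopes
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize var out of factor f."""
    scope, table = f
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, value in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + value
    return new_scope, new_table

def sum_product_ve(factors, order):
    """Eliminate the variables in `order` (each is assumed to appear in
    at least one factor); return the product of the remaining factors."""
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        psi = involved[0]
        for f in involved[1:]:
            psi = multiply(psi, f)
        factors.append(sum_out(psi, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result  # unnormalized p(Y, e); normalize to get p(Y | e)
```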
Running time of variable elimination

- Let $n$ be the number of variables, and $m$ the number of initial factors
- At each step, we pick a variable $X_i$ and multiply all factors involving $X_i$, resulting in a single factor $\psi_i$
- Let $N_i$ be the number of variables in the factor $\psi_i$, and let $N_{\max} = \max_i N_i$
- The running time of VE is then $O(m k^{N_{\max}})$, where $k = |\mathrm{Val}(X)|$. Why? Each factor ever created (at most $m + n$ of them) is multiplied into some $\psi_i$ at most once, and every intermediate table has at most $k^{N_{\max}}$ entries
- The primary concern is that $N_{\max}$ can potentially be as large as $n$
Running time in graph-theoretic terms

- Let's try to analyze the complexity in terms of the graph structure
- $G_\Phi$ is the undirected graph with one node per variable, where there is an edge $(X_i, X_j)$ if these variables appear together in the scope of some factor $\phi$
- Ignoring evidence, this is either the original MRF (for sum-product VE on MRFs) or the moralized Bayesian network

[Figure: a Bayesian network and its moralized graph]
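A sketch (my own, not from the slides) of constructing $G_\Phi$ from factor scopes; the example scopes are loosely modeled on the student-network running example:

```python
from itertools import combinations

def interaction_graph(scopes):
    """Build G_Phi: one node per variable, an edge between every pair of
    variables that share a factor scope."""
    nodes, edges = set(), set()
    for scope in scopes:
        nodes.update(scope)
        for u, v in combinations(sorted(scope), 2):
            edges.add((u, v))
    return nodes, edges

# For a Bayesian network, each CPD p(X | Pa(X)) contributes the scope
# {X} ∪ Pa(X), so connecting all pairs within each scope is exactly
# moralization.
scopes = [("C",), ("C", "D"), ("G", "I", "D"), ("I",), ("S", "I"),
          ("L", "G"), ("J", "L", "S"), ("H", "G", "J")]
print(interaction_graph(scopes))
```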
Elimination as graph transformation

When a variable $X$ is eliminated:

- We create a single factor $\psi$ that contains $X$ and all of the variables $\mathbf{Y}$ with which it appears in factors
- We eliminate $X$ from $\psi$, replacing it with a new factor $\tau$ that contains all of the variables $\mathbf{Y}$, but not $X$. Let's call the new set of factors $\Phi_X$

How does this modify the graph, going from $G_\Phi$ to $G_{\Phi_X}$?

- Constructing $\psi$ generates edges between all of the variables $Y \in \mathbf{Y}$
- Some of these edges were already in $G_\Phi$, some are new
- The new edges are called fill edges
- The step of removing $X$ from $\Phi$ to construct $\Phi_X$ removes $X$ and all its incident edges from the graph
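This per-step transformation is easy to state in code; a small sketch (mine, with a made-up example) where edges are frozensets of two node names:

```python
from itertools import combinations

def eliminate_node(edges, x):
    """One VE step on the graph: connect all neighbors of x (fill edges),
    then remove x and its incident edges."""
    neighbors = {v for e in edges if x in e for v in e if v != x}
    fill = {frozenset(p) for p in combinations(neighbors, 2)} - edges
    remaining = {e for e in edges if x not in e} | fill
    return remaining, fill

# Example: eliminating "D" from the path C - D - G adds the fill edge C - G.
edges = {frozenset(p) for p in [("C", "D"), ("D", "G")]}
print(eliminate_node(edges, "D"))
```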
Example

[Figures: the initial graph, then the graph after eliminating C, after eliminating D, and after eliminating I]
Induced graph

- We can summarize the computation cost using a single graph that is the union of all the graphs resulting from each step of the elimination
- We call this the induced graph $\mathcal{I}_{\Phi,\prec}$, where $\prec$ is the elimination ordering
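Concretely, the induced graph just accumulates every edge (original or fill) encountered during elimination; a self-contained sketch (my own illustration):

```python
from itertools import combinations

def induced_graph(edges, order):
    """Union of the graphs produced at every elimination step.
    Edges are frozensets {u, v}; `order` lists every variable once."""
    edges = set(edges)
    induced = set(edges)
    for x in order:
        neighbors = {v for e in edges if x in e for v in e if v != x}
        fill = {frozenset(p) for p in combinations(neighbors, 2)}
        induced |= fill  # fill edges become part of the induced graph
        edges = {e for e in edges if x not in e} | fill
    return induced
```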
Example

[Figures: the induced graph for the elimination ordering, with its maximal cliques highlighted]
Properties of the induced graph

Theorem: Let $\mathcal{I}_{\Phi,\prec}$ be the induced graph for a set of factors $\Phi$ and ordering $\prec$. Then:

1. Every factor generated during VE has a scope that is a clique in $\mathcal{I}_{\Phi,\prec}$
2. Every maximal clique in $\mathcal{I}_{\Phi,\prec}$ is the scope of some intermediate factor in the computation

(see book for proof)

- Thus, $N_{\max}$ is equal to the size of the largest clique in $\mathcal{I}_{\Phi,\prec}$
- The running time, $O(m k^{N_{\max}})$, is exponential in the size of the largest clique of the induced graph
Example

[Figures: the induced graph with its maximal cliques, and the corresponding VE run]

The maximal cliques in $\mathcal{I}_{G,\prec}$ are:

- $C_1 = \{C, D\}$
- $C_2 = \{D, I, G\}$
- $C_3 = \{G, L, S, J\}$
- $C_4 = \{G, J, H\}$
Induced width

- The width of an induced graph is (#nodes in its largest clique) - 1
- We define the induced width $w_{G,\prec}$ to be the width of the graph $\mathcal{I}_{G,\prec}$ induced by applying VE to $G$ using ordering $\prec$
- The treewidth, or "minimal induced width", of graph $G$ is
  $$w^*_G = \min_{\prec} w_{G,\prec}$$
- The treewidth provides a bound on the best running time achievable by VE on a distribution that factorizes over $G$: $O(m k^{w^*_G + 1})$, since the largest clique has $w^*_G + 1$ nodes
- Unfortunately, finding the best elimination ordering (equivalently, computing the treewidth) of a graph is NP-hard
- In practice, heuristics (e.g., min-fill) are used to find a good elimination ordering, as sketched below
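A minimal sketch (my own, not from the slides) of the min-fill heuristic: repeatedly eliminate the variable whose elimination would create the fewest fill edges, and report the induced width of the resulting ordering. Edges are frozensets, as in the earlier sketches.

```python
from itertools import combinations

def fill_count(edges, x):
    """Number of fill edges that eliminating x would create."""
    nbrs = {v for e in edges if x in e for v in e if v != x}
    return sum(1 for p in combinations(nbrs, 2) if frozenset(p) not in edges)

def min_fill_ordering(nodes, edges):
    """Greedy min-fill ordering; also returns its induced width."""
    nodes, edges = set(nodes), set(edges)
    order, width = [], 0
    while nodes:
        x = min(nodes, key=lambda v: fill_count(edges, v))
        nbrs = {v for e in edges if x in e for v in e if v != x}
        width = max(width, len(nbrs))  # size of the factor created, minus 1
        edges = ({e for e in edges if x not in e}
                 | {frozenset(p) for p in combinations(nbrs, 2)})
        nodes.remove(x)
        order.append(x)
    return order, width
```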
Chordal graphs

- A graph is chordal, or triangulated, if every cycle of length $\geq 4$ has a shortcut (called a "chord")
- Theorem: Every induced graph is chordal
- Proof (by contradiction):
  - Assume we have a chordless cycle $X_1 - X_2 - X_3 - X_4 - X_1$ in the induced graph
  - Suppose $X_1$ was the first variable (of these 4) that we eliminated
  - After a node is eliminated, no fill edges can be added to it. Thus, $X_1 - X_2$ and $X_1 - X_4$ must have pre-existed
  - Eliminating $X_1$ introduces the edge $X_2 - X_4$, contradicting our assumption
Chordal graphs

- Thm: Every induced graph is chordal
- Thm: Any chordal graph has an elimination ordering that does not introduce any fill edges (one such ordering eliminates variables in the reverse of a maximum cardinality search order)
- Conclusion: finding a good elimination ordering is equivalent to making the graph chordal with minimal width
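A sketch of maximum cardinality search (MCS); by the standard result of Tarjan and Yannakakis, on a chordal graph the reverse of the MCS visit order is a zero-fill elimination ordering. This is my illustration, not from the slides:

```python
def mcs_elimination_order(nodes, edges):
    """Visit nodes in maximum cardinality search order (always pick the
    unvisited node with the most visited neighbors); on a chordal graph,
    eliminating in the *reverse* of the visit order creates no fill edges."""
    nodes = set(nodes)
    neighbors = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)
    visited, visit_order = set(), []
    while len(visited) < len(nodes):
        x = max(nodes - visited, key=lambda v: len(neighbors[v] & visited))
        visited.add(x)
        visit_order.append(x)
    return list(reversed(visit_order))  # the elimination ordering
```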
Today's lecture

1. Using VE for conditional queries
2. Running time of variable elimination
   - Elimination as graph transformation
   - Fill edges, width, treewidth
3. Sum-product belief propagation (BP) [done on blackboard]
4. Max-product belief propagation
MAP inference

Recall the MAP inference task,
$$\arg\max_x p(x), \qquad p(x) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c)$$
(we assume any evidence has been subsumed into the potentials, as discussed in the last lecture)

- Since the normalization term is simply a constant, this is equivalent to
  $$\arg\max_x \prod_{c \in C} \phi_c(x_c)$$
  (called the max-product inference task)
- Furthermore, since log is monotonic, letting $\theta_c(x_c) = \lg \phi_c(x_c)$, we have that this is equivalent to
  $$\arg\max_x \sum_{c \in C} \theta_c(x_c)$$
  (called max-sum)
Semi-rings

Compare the sum-product problem with the max-product (equivalently, max-sum in log space):

- sum-product: $\sum_x \prod_{c \in C} \phi_c(x_c)$
- max-sum: $\max_x \sum_{c \in C} \theta_c(x_c)$

We can exchange the operators $(+, *)$ for $(\max, +)$ and, because both pairs form semirings satisfying associativity, commutativity, and distributivity, everything works! We get "max-product variable elimination" and "max-product belief propagation".
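A tiny numeric check (my own illustration) of the distributive law that makes the operator swap work: in sum-product, $a \cdot (b + c) = a \cdot b + a \cdot c$; in max-sum, $a + \max(b, c) = \max(a + b, a + c)$:

```python
import random

random.seed(0)
for _ in range(1000):
    a, b, c = (random.uniform(-5, 5) for _ in range(3))
    # (max, +) distributivity, which lets us push max inside sums
    assert abs((a + max(b, c)) - max(a + b, a + c)) < 1e-12
    # (+, *) distributivity, which lets us push sums inside products
    assert abs(a * (b + c) - (a * b + a * c)) < 1e-9
```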
Simple example

Suppose we have a simple chain, $A - B - C - D$, and we want to find the MAP assignment,
$$\max_{a,b,c,d} \phi_{AB}(a, b)\, \phi_{BC}(b, c)\, \phi_{CD}(c, d)$$

- Just as we did before, we can push the maximizations inside to obtain:
  $$\max_{a,b} \phi_{AB}(a, b) \Big[ \max_c \phi_{BC}(b, c) \max_d \phi_{CD}(c, d) \Big]$$
  or, equivalently,
  $$\max_{a,b} \theta_{AB}(a, b) + \Big[ \max_c \theta_{BC}(b, c) + \max_d \theta_{CD}(c, d) \Big]$$
- To find the actual maximizing assignment, we do a traceback (or keep back pointers)
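A runnable sketch of this chain computation with back-pointers (my own illustration; the pairwise potentials are arbitrary made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # number of states per variable
# Pairwise potentials for the chain A - B - C - D (made-up values)
phi_AB, phi_BC, phi_CD = (rng.uniform(0.1, 1.0, size=(k, k)) for _ in range(3))

# Backward pass: push the maximizations inside, keeping back-pointers.
m_D = phi_CD.max(axis=1)             # m_D[c] = max_d phi_CD(c, d)
bp_D = phi_CD.argmax(axis=1)         # best d for each c
m_C = (phi_BC * m_D).max(axis=1)     # m_C[b] = max_c phi_BC(b, c) * m_D[c]
bp_C = (phi_BC * m_D).argmax(axis=1) # best c for each b
scores = phi_AB * m_C                # scores[a, b] = phi_AB(a, b) * m_C[b]

# Traceback: recover the maximizing assignment from the back-pointers.
a, b = np.unravel_index(scores.argmax(), scores.shape)
c = bp_C[b]
d = bp_D[c]
print("MAP value:", scores[a, b], "assignment:", (a, b, c, d))
```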
Max-product variable elimination

```
Procedure Max-Product-VE (
    Φ,  // Set of factors over X
    ≺   // Ordering on X
)
1.  Let X_1, ..., X_k be an ordering of X such that X_i ≺ X_j iff i < j
2.  for i = 1, ..., k
3.      (Φ, φ_{X_i}) ← Max-Product-Eliminate-Var(Φ, X_i)
4.  x* ← Traceback-MAP({φ_{X_i} : i = 1, ..., k})
5.  return x*, Φ   // Φ contains the probability of the MAP

Procedure Max-Product-Eliminate-Var (
    Φ,  // Set of factors
    Z   // Variable to be eliminated
)
1.  Φ' ← {φ ∈ Φ : Z ∈ Scope[φ]}
2.  Φ'' ← Φ − Φ'
3.  ψ ← ∏_{φ ∈ Φ'} φ
4.  τ ← max_Z ψ
5.  return (Φ'' ∪ {τ}, ψ)

Procedure Traceback-MAP ( {φ_{X_i} : i = 1, ..., k} )
1.  for i = k, ..., 1
2.      u_i ← (x*_{i+1}, ..., x*_k)⟨Scope[φ_{X_i}] − {X_i}⟩
        // The maximizing assignment to the variables eliminated after X_i
3.      x*_i ← arg max_{x_i} φ_{X_i}(x_i, u_i)
        // x*_i is chosen so as to maximize the corresponding entry in the
        // factor, relative to the previous choices u_i
4.  return x*
```
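A compact runnable version of this procedure (my own sketch, reusing the (scope, table) factor representation from the sum-product sketch; binary variables, and all helper names are mine):

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def max_out(f, var):
    """tau(y) = max_z psi(z, y)."""
    scope, table = f
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, value in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = max(new_table.get(key, float("-inf")), value)
    return new_scope, new_table

def max_product_ve(factors, order):
    """MAP assignment and its (unnormalized) value; `order` must cover
    every variable, and each variable must appear in some factor."""
    saved = []  # (variable, psi) pairs kept for the traceback
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        psi = involved[0]
        for f in involved[1:]:
            psi = multiply(psi, f)
        saved.append((var, psi))
        factors.append(max_out(psi, var))
    map_value = 1.0
    for scope, table in factors:  # only empty-scope (scalar) factors remain
        map_value *= table[()]
    # Traceback in reverse elimination order: every other variable in psi's
    # scope was eliminated later, so it is already assigned when we reach var.
    assignment = {}
    for var, (scope, table) in reversed(saved):
        def entry(val):
            key = tuple(val if s == var else assignment[s] for s in scope)
            return table[key]
        assignment[var] = max([0, 1], key=entry)
    return assignment, map_value

# Example: MAP over the chain A - B - C - D with made-up potentials.
def demo_factor(scope):
    return scope, {vals: 1.0 + sum(vals)
                   for vals in product([0, 1], repeat=len(scope))}

factors = [demo_factor(("A", "B")), demo_factor(("B", "C")), demo_factor(("C", "D"))]
print(max_product_ve(factors, ["D", "C", "B", "A"]))  # all-ones MAP, value 27.0
```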