Inference and Representation
David Sontag, New York University
Lecture 5, Sept. 30, 2014
Today's lecture

1. Running time of variable elimination
   - Elimination as graph transformation
   - Fill edges, width, treewidth
2. Sum-product belief propagation (BP) (done on blackboard)
3. Max-product belief propagation
4. Loopy belief propagation
Running time of VE in graph-theoretic terms

- Let's analyze the complexity of variable elimination in terms of the graph structure.
- G_Φ is the undirected graph with one node per variable and an edge (X_i, X_j) whenever X_i and X_j appear together in the scope of some factor φ.
- Ignoring evidence, this is either the original MRF (for sum-product VE on MRFs) or the moralized Bayesian network.
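As a concrete aside (not from the slides), here is a minimal Python sketch that builds G_Φ from a list of factor scopes; the `factor_scopes` example is hypothetical.

```python
from itertools import combinations

def build_graph(factor_scopes):
    """Build G_Phi: one node per variable, with an edge between every pair of
    variables that appear together in the scope of some factor."""
    nodes, edges = set(), set()
    for scope in factor_scopes:
        nodes.update(scope)
        for u, v in combinations(scope, 2):
            edges.add(frozenset((u, v)))
    return nodes, edges

# Hypothetical example: pairwise factors of a chain MRF A - B - C - D
nodes, edges = build_graph([("A", "B"), ("B", "C"), ("C", "D")])
print(sorted(nodes))                              # ['A', 'B', 'C', 'D']
print(sorted(tuple(sorted(e)) for e in edges))    # [('A','B'), ('B','C'), ('C','D')]
```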
Elimination as graph transformation

- When a variable X is eliminated:
  - We create a single factor ψ that contains X and all of the variables Y with which it appears in factors.
  - We eliminate X from ψ, replacing it with a new factor τ that contains all of the variables Y but not X. Call the new set of factors Φ_X.
- How does this modify the graph, going from G_Φ to G_{Φ_X}?
  - Constructing ψ generates edges between all of the variables Y ∈ Y. Some of these edges were already in G_Φ, some are new.
  - The new edges are called fill edges.
  - The step of removing X from Φ to construct Φ_X removes X and all its incident edges from the graph.
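To make the graph operation concrete, here is a small sketch (my own illustration, reusing the edge-set representation above) of a single elimination step, returning the fill edges it introduces.

```python
from itertools import combinations

def eliminate_node(edges, x):
    """One elimination step on the graph: connect all neighbors of x to each
    other (the new edges are fill edges), then drop x and its incident edges."""
    neighbors = {u for e in edges if x in e for u in e if u != x}
    fill = {frozenset(p) for p in combinations(sorted(neighbors), 2)} - edges
    remaining = {e for e in edges if x not in e} | fill
    return remaining, fill

# Hypothetical 4-cycle A - B - C - D - A: eliminating A adds the fill edge B - D
edges = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]}
edges, fill = eliminate_node(edges, "A")
print([tuple(sorted(e)) for e in fill])   # [('B', 'D')]
```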
Induced graph

- We can summarize the computational cost using a single graph that is the union of all the graphs resulting from each step of the elimination.
- We call this the induced graph I_{Φ,≺}, where ≺ is the elimination ordering.
- The width of the induced graph is the size of its largest clique minus one; the treewidth of G_Φ is the minimum width over all elimination orderings.
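For illustration (not part of the slides), the cost summary can be computed directly: run the elimination along an ordering and record the largest neighborhood encountered. The sketch below assumes the edge-set representation used above; the example shows that the ordering matters even on a chain.

```python
def induced_width(edges, ordering):
    """Width of the induced graph for a given elimination ordering: the largest
    number of neighbors any variable has at the moment it is eliminated."""
    edges = set(edges)
    width = 0
    for x in ordering:
        neighbors = {u for e in edges if x in e for u in e if u != x}
        width = max(width, len(neighbors))
        fill = {frozenset((u, v)) for u in neighbors for v in neighbors if u < v}
        edges = {e for e in edges if x not in e} | fill
    return width

# Chain A - B - C - D: a good ordering gives width 1, a bad one gives width 2
chain = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D")]}
print(induced_width(chain, ["A", "B", "C", "D"]))   # 1
print(induced_width(chain, ["B", "C", "A", "D"]))   # 2
```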
Chordal graphs

- A graph is chordal, or triangulated, if every cycle of length ≥ 4 has a shortcut (called a "chord").
- Theorem: Every induced graph is chordal.
- Proof (by contradiction):
  - Assume the induced graph contains a chordless cycle X_1 − X_2 − X_3 − X_4 − X_1.
  - Suppose X_1 was the first of these four variables to be eliminated.
  - After a node is eliminated, no fill edges can be added to it. Thus the edges X_1 − X_2 and X_1 − X_4 must have existed before X_1 was eliminated.
  - Eliminating X_1 therefore introduces the edge X_2 − X_4, contradicting the assumption that the cycle is chordless.
Chordal graphs

- Theorem: Every induced graph is chordal.
- Theorem: Any chordal graph has an elimination ordering that does not introduce any fill edges (for example, eliminate in the reverse of a maximum cardinality search visit order).
- Conclusion: Finding a good elimination ordering is equivalent to triangulating the graph (making it chordal) with minimal width.
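The slide only states the theorem; one standard way to obtain such an ordering is maximum cardinality search. The sketch below (my own illustration, reusing the edge-set representation from earlier) returns the reverse of the visit order, which for a chordal graph is a fill-free elimination ordering.

```python
def max_cardinality_search(nodes, edges):
    """Maximum cardinality search: repeatedly visit the unvisited node with the
    most already-visited neighbors.  Eliminating in the REVERSE of the visit
    order introduces no fill edges when the graph is chordal."""
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    visited, order = set(), []
    while len(order) < len(nodes):
        nxt = max((n for n in nodes if n not in visited),
                  key=lambda n: len(adj[n] & visited))
        visited.add(nxt)
        order.append(nxt)
    return list(reversed(order))   # use this as the elimination ordering

# Hypothetical chordal graph: a triangle A-B-C with a pendant edge C-D
tri = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
print(max_cardinality_search({"A", "B", "C", "D"}, tri))
```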
Today's lecture

1. Running time of variable elimination
   - Elimination as graph transformation
   - Fill edges, width, treewidth
2. Sum-product belief propagation (BP) (done on blackboard)
3. Max-product belief propagation
4. Loopy belief propagation
MAP inference

- Recall the MAP inference task,

    arg max_x p(x),    where p(x) = (1/Z) ∏_{c∈C} φ_c(x_c)

  (we assume any evidence has been subsumed into the potentials, as discussed in the last lecture).
- Since the normalization term Z is simply a constant, this is equivalent to

    arg max_x ∏_{c∈C} φ_c(x_c)

  (called the max-product inference task).
- Furthermore, since log is monotonic, letting θ_c(x_c) = log φ_c(x_c), this is equivalent to

    arg max_x ∑_{c∈C} θ_c(x_c)

  (called max-sum).
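A tiny numeric check (made-up potentials, not from the slides) that the three problems share the same maximizer: dropping 1/Z and taking logs never changes the arg max.

```python
import math
from itertools import product

# Hypothetical factors over binary x1, x2: phi_1(x1, x2) and phi_2(x2)
phi_1 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 0.1}
phi_2 = {0: 0.3, 1: 4.0}

def max_product_score(x):
    return phi_1[x] * phi_2[x[1]]

def max_sum_score(x):
    return math.log(phi_1[x]) + math.log(phi_2[x[1]])

assignments = list(product([0, 1], repeat=2))
print(max(assignments, key=max_product_score))   # (0, 1)
print(max(assignments, key=max_sum_score))       # (0, 1) -- same arg max
```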
Semi-rings

- Compare the sum-product problem with max-product (equivalently, max-sum in log space):

    sum-product:   ∑_x ∏_{c∈C} φ_c(x_c)
    max-sum:       max_x ∑_{c∈C} θ_c(x_c)

- We can exchange the operators (+, ×) for (max, +): both pairs form commutative semirings (associativity, commutativity, and distributivity hold), so everything goes through.
- We get "max-product variable elimination" and "max-product belief propagation".
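A minimal sketch (made-up potentials, my own illustration) of the semiring swap: the same elimination loop computes the partition function with (+, ×) and the MAP value with (max, +).

```python
import math

# Chain A - B - C with hypothetical pairwise potentials over binary variables
phi_AB = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
phi_BC = {(0, 0): 2.0, (0, 1): 0.1, (1, 0): 1.0, (1, 1): 4.0}

def eliminate_chain(combine, marginalize, f_ab, f_bc):
    """Eliminate C, then A and B.  `combine` is the semiring product and
    `marginalize` is the semiring sum (sum or max over a variable)."""
    tau_B = {b: marginalize(f_bc[(b, c)] for c in (0, 1)) for b in (0, 1)}
    return marginalize(combine(f_ab[(a, b)], tau_B[b])
                       for a in (0, 1) for b in (0, 1))

# (+, *): partition function Z = sum_{a,b,c} phi_AB(a,b) phi_BC(b,c)
Z = eliminate_chain(lambda x, y: x * y, sum, phi_AB, phi_BC)

# (max, +) on the logs: value of the MAP assignment
theta_AB = {k: math.log(v) for k, v in phi_AB.items()}
theta_BC = {k: math.log(v) for k, v in phi_BC.items()}
map_log = eliminate_chain(lambda x, y: x + y, max, theta_AB, theta_BC)

print(Z, math.exp(map_log))   # ~28.15 and ~12.0: same loop, different semiring
```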
Simple example

- Suppose we have a simple chain, A − B − C − D, and we want to find the MAP assignment,

    max_{a,b,c,d} φ_AB(a, b) φ_BC(b, c) φ_CD(c, d)

- Just as we did before, we can push the maximizations inside to obtain

    max_{a,b} φ_AB(a, b) max_c φ_BC(b, c) max_d φ_CD(c, d)

  or, equivalently,

    max_{a,b} [ θ_AB(a, b) + max_c [ θ_BC(b, c) + max_d θ_CD(c, d) ] ]

- [Illustrate factor max-marginalization on board.]
- To find the actual maximizing assignment, we do a traceback (or keep back pointers); see the sketch below.
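Here is a small sketch of the traceback on a chain like this one (my own illustration with made-up numeric tables): maximize from the end of the chain inward, store back pointers, then read off the MAP assignment.

```python
import numpy as np

# Chain of 4 variables with 3 states each; theta[k] is the hypothetical
# log-potential theta(x_k, x_{k+1}) between consecutive variables.
rng = np.random.default_rng(0)
theta = [rng.standard_normal((3, 3)) for _ in range(3)]

n = len(theta) + 1
m = [np.zeros(3) for _ in range(n)]    # m[k][x_k] = best score of the tail of the chain
back = [None] * (n - 1)                # back[k][x_k] = best x_{k+1} given x_k
for k in range(n - 2, -1, -1):
    scores = theta[k] + m[k + 1]       # scores[x_k, x_{k+1}]
    back[k] = scores.argmax(axis=1)
    m[k] = scores.max(axis=1)

# Traceback: choose the best first variable, then follow the back pointers
x = [int(m[0].argmax())]
for k in range(n - 1):
    x.append(int(back[k][x[-1]]))
print(x, float(m[0].max()))            # MAP assignment and its max-sum value
```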
Max-product variable elimination

Procedure Max-Product-VE(
    Φ,   // set of factors over X
    ≺    // ordering on X
)
1.  Let X_1, ..., X_k be an ordering of X such that X_i ≺ X_j iff i < j
2.  for i = 1, ..., k
3.      (Φ, φ_{X_i}) ← Max-Product-Eliminate-Var(Φ, X_i)
4.  x* ← Traceback-MAP({φ_{X_i} : i = 1, ..., k})
5.  return x*, Φ   // Φ contains the probability of the MAP assignment

Procedure Max-Product-Eliminate-Var(
    Φ,   // set of factors
    Z    // variable to be eliminated
)
1.  Φ' ← {φ ∈ Φ : Z ∈ Scope[φ]}
2.  Φ'' ← Φ − Φ'
3.  ψ ← ∏_{φ ∈ Φ'} φ
4.  τ ← max_Z ψ
5.  return (Φ'' ∪ {τ}, ψ)

Procedure Traceback-MAP({φ_{X_i} : i = 1, ..., k})
1.  for i = k, ..., 1
2.      u_i ← (x*_{i+1}, ..., x*_k) ⟨ Scope[φ_{X_i}] − {X_i} ⟩
        // the maximizing assignment to the variables eliminated after X_i
3.      x*_i ← arg max_{x_i} φ_{X_i}(x_i, u_i)
        // x*_i maximizes the corresponding entry in the factor, relative to the previous choices u_i
4.  return x*
Max-product belief propagation (for tree-structured MRFs)

- Same as sum-product BP except that the messages are now:

    m_{j→i}(x_i) = max_{x_j} φ_j(x_j) φ_{ij}(x_i, x_j) ∏_{k∈N(j)\i} m_{k→j}(x_j)

- After passing all messages, we can compute single-node max-marginals,

    m_i(x_i) = φ_i(x_i) ∏_{j∈N(i)} m_{j→i}(x_i) ∝ max_{x_{V∖i}} p(x_{V∖i}, x_i)

- If the MAP assignment x* is unique, we can find it by locally decoding each of the single-node max-marginals, i.e.

    x*_i = arg max_{x_i} m_i(x_i)
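A minimal sketch of these updates on a tiny tree (my own illustration; the tree, potentials, and 2-state variables are made up). Messages are computed recursively from the leaves inward, and each node is then decoded from its max-marginal.

```python
import numpy as np

rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (1, 3)]                    # a star-shaped tree centered at node 1
phi = {i: rng.random(2) + 0.1 for i in range(4)}    # node potentials phi_i(x_i)
psi = {e: rng.random((2, 2)) + 0.1 for e in edges}  # edge potentials phi_ij(x_i, x_j)

nbrs = {i: [] for i in range(4)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def pair(i, j, xi, xj):
    return psi[(i, j)][xi, xj] if (i, j) in psi else psi[(j, i)][xj, xi]

def message(j, i):
    """m_{j->i}(x_i) = max_{x_j} phi_j(x_j) phi_ij(x_i, x_j) prod_k m_{k->j}(x_j)."""
    incoming = np.ones(2)
    for k in nbrs[j]:
        if k != i:
            incoming = incoming * message(k, j)
    table = np.array([[phi[j][xj] * pair(i, j, xi, xj) * incoming[xj]
                       for xj in range(2)] for xi in range(2)])
    return table.max(axis=1)

# Max-marginals and local decoding (valid when the MAP assignment is unique)
for i in range(4):
    belief = phi[i].copy()
    for j in nbrs[i]:
        belief = belief * message(j, i)
    print(i, int(belief.argmax()))
```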
Max-sum belief propagation (for tree-structured MRFs)

- Same as sum-product BP except that the messages are now:

    m_{j→i}(x_i) = max_{x_j} [ θ_j(x_j) + θ_{ij}(x_i, x_j) + ∑_{k∈N(j)\i} m_{k→j}(x_j) ]

- After passing all messages, we can compute single-node max-marginals,

    m_i(x_i) = θ_i(x_i) + ∑_{j∈N(i)} m_{j→i}(x_i) = max_{x_{V∖i}} log p(x_{V∖i}, x_i) + C

- If the MAP assignment x* is unique, we can find it by locally decoding each of the single-node max-marginals, i.e.

    x*_i = arg max_{x_i} m_i(x_i)

- Working in log space prevents numerical underflow/overflow.
Implementing sum-product in log-space

- Recall the sum-product messages:

    m_{j→i}(x_i) = ∑_{x_j} φ_j(x_j) φ_{ij}(x_i, x_j) ∏_{k∈N(j)\i} m_{k→j}(x_j)

- Computing the messages in log-space corresponds to the update:

    m_{j→i}(x_i) = log ∑_{x_j} exp( θ_j(x_j) + θ_{ij}(x_i, x_j) + ∑_{k∈N(j)\i} m_{k→j}(x_j) )
                 = log ∑_{x_j} exp( T(x_i, x_j) ),

  where T(x_i, x_j) = θ_j(x_j) + θ_{ij}(x_i, x_j) + ∑_{k∈N(j)\i} m_{k→j}(x_j).
- Letting c_{x_i} = max_{x_j} T(x_i, x_j), this is equivalent to

    m_{j→i}(x_i) = c_{x_i} + log ∑_{x_j} exp( T(x_i, x_j) − c_{x_i} ),

  which avoids overflow because every exponent is at most zero.
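The last identity is the log-sum-exp trick. A minimal sketch (my own, not from the slides):

```python
import numpy as np

def logsumexp(scores):
    """Numerically stable log(sum(exp(scores))): shift by the max so that
    every exponent is <= 0 before summing."""
    c = np.max(scores)
    return c + np.log(np.sum(np.exp(scores - c)))

scores = np.array([1000.0, 1001.0, 999.0])
print(logsumexp(scores))               # ~1001.41
print(np.log(np.sum(np.exp(scores))))  # overflows: exp(1000) is inf in float64
```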
Exactly solving MAP, beyond trees

- MAP as a discrete optimization problem is

    arg max_x [ ∑_{i∈V} θ_i(x_i) + ∑_{ij∈E} θ_{ij}(x_i, x_j) ]

- This is a very general discrete optimization problem: many hard combinatorial optimization problems can be written in this form (e.g., 3-SAT).
- It is studied in operations research, theoretical computer science, and AI (constraint satisfaction, weighted SAT), among other communities.
- It is a very fast-moving field, both for theory and for heuristics.