Probabilistic Graphical Models
David Sontag, New York University
Lecture 5, Feb. 28, 2013
Today's lecture

1. Using VE for conditional queries
2. Running time of variable elimination
   - Elimination as graph transformation
   - Fill edges, width, treewidth
3. Sum-product belief propagation (BP) [done on blackboard]
4. Max-product belief propagation
How to introduce evidence?

Recall that our original goal was to answer conditional probability queries,
$$p(Y \mid E = e) = \frac{p(Y, e)}{p(e)}$$

- Apply the variable elimination algorithm to the task of computing $p(Y, e)$
- Replace each factor $\phi \in \Phi$ that has $E \cap \mathrm{Scope}[\phi] \neq \emptyset$ with
  $$\phi'(x_{\mathrm{Scope}[\phi] - E}) = \phi(x_{\mathrm{Scope}[\phi] - E},\; e_{E \cap \mathrm{Scope}[\phi]})$$
- Then, eliminate the variables in $X - Y - E$. The returned factor $\phi^*(Y)$ is $p(Y, e)$
- To obtain the conditional $p(Y \mid e)$, normalize the resulting product of factors; the normalization constant is $p(e)$
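As an illustration, here is a minimal sketch (not from the slides) of the factor-reduction step, representing a factor as a (scope, table) pair; the function name `reduce_factor` and the example numbers are mine:

```python
from itertools import product

# A factor is (scope, table): scope is a tuple of variable names, and table
# maps each assignment (a tuple of values, one per scope variable) to a
# nonnegative number. All variables are assumed binary here for brevity.

def reduce_factor(scope, table, evidence):
    """Restrict a factor to the rows consistent with the evidence,
    dropping the evidence variables from its scope."""
    kept = [i for i, v in enumerate(scope) if v not in evidence]
    new_scope = tuple(scope[i] for i in kept)
    new_table = {}
    for assignment, value in table.items():
        if all(assignment[i] == evidence[v]
               for i, v in enumerate(scope) if v in evidence):
            new_table[tuple(assignment[i] for i in kept)] = value
    return new_scope, new_table

# Example: phi(A, B) with evidence B = 1 becomes phi'(A) = phi(A, 1).
phi = (("A", "B"), {(a, b): 0.1 + a + 2 * b for a, b in product([0, 1], repeat=2)})
print(reduce_factor(*phi, evidence={"B": 1}))
```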
Sum-product VE for conditional distributions

[Slide shows the sum-product VE procedure for conditional distributions; algorithm box not reproduced here]
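To make the procedure concrete, here is a minimal sum-product VE sketch (my own illustration, reusing the (scope, table) factor representation above; `multiply`, `sum_out`, and `sum_product_ve` are hypothetical names, and all variables are binary). For $p(Y \mid e)$: first reduce every factor by the evidence, then eliminate the variables in $X - Y - E$, then normalize the result.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))  # ordered union of the two scopes
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize var out of factor f."""
    scope, table = f
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, value in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + value
    return new_scope, new_table

def sum_product_ve(factors, order):
    """Eliminate the variables in `order` (each is assumed to appear in
    at least one factor); return the product of the remaining factors."""
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        psi = involved[0]
        for f in involved[1:]:
            psi = multiply(psi, f)
        factors.append(sum_out(psi, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result  # unnormalized p(Y, e); normalize to get p(Y | e)
```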
Running time of variable elimination

- Let $n$ be the number of variables, and $m$ the number of initial factors
- At each step, we pick a variable $X_i$ and multiply all factors involving $X_i$, resulting in a single factor $\psi_i$
- Let $N_i$ be the number of variables in the factor $\psi_i$, and let $N_{\max} = \max_i N_i$
- The running time of VE is then $O(m k^{N_{\max}})$, where $k = |\mathrm{Val}(X)|$. Why? Each factor ever created (at most $m + n$ of them) is multiplied into some $\psi_i$ at most once, and every intermediate table has at most $k^{N_{\max}}$ entries
- The primary concern is that $N_{\max}$ can potentially be as large as $n$
Running time in graph-theoretic terms

- Let's try to analyze the complexity in terms of the graph structure
- $G_\Phi$ is the undirected graph with one node per variable, where there is an edge $(X_i, X_j)$ if these variables appear together in the scope of some factor $\phi$
- Ignoring evidence, this is either the original MRF (for sum-product VE on MRFs) or the moralized Bayesian network

[Figure: a Bayesian network and its moralized graph]
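A sketch (my own, not from the slides) of constructing $G_\Phi$ from factor scopes; the example scopes are loosely modeled on the student-network running example:

```python
from itertools import combinations

def interaction_graph(scopes):
    """Build G_Phi: one node per variable, an edge between every pair of
    variables that share a factor scope."""
    nodes, edges = set(), set()
    for scope in scopes:
        nodes.update(scope)
        for u, v in combinations(sorted(scope), 2):
            edges.add((u, v))
    return nodes, edges

# For a Bayesian network, each CPD p(X | Pa(X)) contributes the scope
# {X} ∪ Pa(X), so connecting all pairs within each scope is exactly
# moralization.
scopes = [("C",), ("C", "D"), ("G", "I", "D"), ("I",), ("S", "I"),
          ("L", "G"), ("J", "L", "S"), ("H", "G", "J")]
print(interaction_graph(scopes))
```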
Elimination as graph transformation

When a variable $X$ is eliminated:

- We create a single factor $\psi$ that contains $X$ and all of the variables $\mathbf{Y}$ with which it appears in factors
- We eliminate $X$ from $\psi$, replacing it with a new factor $\tau$ that contains all of the variables $\mathbf{Y}$, but not $X$. Let's call the new set of factors $\Phi_X$

How does this modify the graph, going from $G_\Phi$ to $G_{\Phi_X}$?

- Constructing $\psi$ generates edges between all of the variables $Y \in \mathbf{Y}$
- Some of these edges were already in $G_\Phi$, some are new
- The new edges are called fill edges
- The step of removing $X$ from $\Phi$ to construct $\Phi_X$ removes $X$ and all its incident edges from the graph
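This per-step transformation is easy to state in code; a small sketch (mine, with a made-up example) where edges are frozensets of two node names:

```python
from itertools import combinations

def eliminate_node(edges, x):
    """One VE step on the graph: connect all neighbors of x (fill edges),
    then remove x and its incident edges."""
    neighbors = {v for e in edges if x in e for v in e if v != x}
    fill = {frozenset(p) for p in combinations(neighbors, 2)} - edges
    remaining = {e for e in edges if x not in e} | fill
    return remaining, fill

# Example: eliminating "D" from the path C - D - G adds the fill edge C - G.
edges = {frozenset(p) for p in [("C", "D"), ("D", "G")]}
print(eliminate_node(edges, "D"))
```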
Example

[Figures: the initial graph, then the graph after eliminating C, after eliminating D, and after eliminating I]
Induced graph

- We can summarize the computation cost using a single graph that is the union of all the graphs resulting from each step of the elimination
- We call this the induced graph $\mathcal{I}_{\Phi,\prec}$, where $\prec$ is the elimination ordering
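Concretely, the induced graph just accumulates every edge (original or fill) encountered during elimination; a self-contained sketch (my own illustration):

```python
from itertools import combinations

def induced_graph(edges, order):
    """Union of the graphs produced at every elimination step.
    Edges are frozensets {u, v}; `order` lists every variable once."""
    edges = set(edges)
    induced = set(edges)
    for x in order:
        neighbors = {v for e in edges if x in e for v in e if v != x}
        fill = {frozenset(p) for p in combinations(neighbors, 2)}
        induced |= fill  # fill edges become part of the induced graph
        edges = {e for e in edges if x not in e} | fill
    return induced
```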
Example

[Figures: the induced graph for the elimination ordering, with its maximal cliques highlighted]
Properties of the induced graph

Theorem: Let $\mathcal{I}_{\Phi,\prec}$ be the induced graph for a set of factors $\Phi$ and ordering $\prec$. Then:

1. Every factor generated during VE has a scope that is a clique in $\mathcal{I}_{\Phi,\prec}$
2. Every maximal clique in $\mathcal{I}_{\Phi,\prec}$ is the scope of some intermediate factor in the computation

(see book for proof)

- Thus, $N_{\max}$ is equal to the size of the largest clique in $\mathcal{I}_{\Phi,\prec}$
- The running time, $O(m k^{N_{\max}})$, is exponential in the size of the largest clique of the induced graph
Example

[Figures: the induced graph with its maximal cliques, and the corresponding VE run]

The maximal cliques in $\mathcal{I}_{G,\prec}$ are:

- $C_1 = \{C, D\}$
- $C_2 = \{D, I, G\}$
- $C_3 = \{G, L, S, J\}$
- $C_4 = \{G, J, H\}$
Induced width

- The width of an induced graph is (#nodes in its largest clique) - 1
- We define the induced width $w_{G,\prec}$ to be the width of the graph $\mathcal{I}_{G,\prec}$ induced by applying VE to $G$ using ordering $\prec$
- The treewidth, or "minimal induced width", of graph $G$ is
  $$w^*_G = \min_{\prec} w_{G,\prec}$$
- The treewidth provides a bound on the best running time achievable by VE on a distribution that factorizes over $G$: $O(m k^{w^*_G + 1})$, since the largest clique has $w^*_G + 1$ nodes
- Unfortunately, finding the best elimination ordering (equivalently, computing the treewidth) of a graph is NP-hard
- In practice, heuristics (e.g., min-fill) are used to find a good elimination ordering, as sketched below
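A minimal sketch (my own, not from the slides) of the min-fill heuristic: repeatedly eliminate the variable whose elimination would create the fewest fill edges, and report the induced width of the resulting ordering. Edges are frozensets, as in the earlier sketches.

```python
from itertools import combinations

def fill_count(edges, x):
    """Number of fill edges that eliminating x would create."""
    nbrs = {v for e in edges if x in e for v in e if v != x}
    return sum(1 for p in combinations(nbrs, 2) if frozenset(p) not in edges)

def min_fill_ordering(nodes, edges):
    """Greedy min-fill ordering; also returns its induced width."""
    nodes, edges = set(nodes), set(edges)
    order, width = [], 0
    while nodes:
        x = min(nodes, key=lambda v: fill_count(edges, v))
        nbrs = {v for e in edges if x in e for v in e if v != x}
        width = max(width, len(nbrs))  # size of the factor created, minus 1
        edges = ({e for e in edges if x not in e}
                 | {frozenset(p) for p in combinations(nbrs, 2)})
        nodes.remove(x)
        order.append(x)
    return order, width
```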
Chordal graphs

- A graph is chordal, or triangulated, if every cycle of length $\geq 4$ has a shortcut (called a "chord")
- Theorem: Every induced graph is chordal
- Proof (by contradiction):
  - Assume we have a chordless cycle $X_1 - X_2 - X_3 - X_4 - X_1$ in the induced graph
  - Suppose $X_1$ was the first variable (of these 4) that we eliminated
  - After a node is eliminated, no fill edges can be added to it. Thus, $X_1 - X_2$ and $X_1 - X_4$ must have pre-existed
  - Eliminating $X_1$ introduces the edge $X_2 - X_4$, contradicting our assumption
Chordal graphs

- Thm: Every induced graph is chordal
- Thm: Any chordal graph has an elimination ordering that does not introduce any fill edges (one such ordering eliminates variables in the reverse of a maximum cardinality search order)
- Conclusion: finding a good elimination ordering is equivalent to making the graph chordal with minimal width
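A sketch of maximum cardinality search (MCS); by the standard result of Tarjan and Yannakakis, on a chordal graph the reverse of the MCS visit order is a zero-fill elimination ordering. This is my illustration, not from the slides:

```python
def mcs_elimination_order(nodes, edges):
    """Visit nodes in maximum cardinality search order (always pick the
    unvisited node with the most visited neighbors); on a chordal graph,
    eliminating in the *reverse* of the visit order creates no fill edges."""
    nodes = set(nodes)
    neighbors = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)
    visited, visit_order = set(), []
    while len(visited) < len(nodes):
        x = max(nodes - visited, key=lambda v: len(neighbors[v] & visited))
        visited.add(x)
        visit_order.append(x)
    return list(reversed(visit_order))  # the elimination ordering
```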
Today's lecture

1. Using VE for conditional queries
2. Running time of variable elimination
   - Elimination as graph transformation
   - Fill edges, width, treewidth
3. Sum-product belief propagation (BP) [done on blackboard]
4. Max-product belief propagation
MAP inference

Recall the MAP inference task,
$$\arg\max_x p(x), \qquad p(x) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c)$$
(we assume any evidence has been subsumed into the potentials, as discussed in the last lecture)

- Since the normalization term is simply a constant, this is equivalent to
  $$\arg\max_x \prod_{c \in C} \phi_c(x_c)$$
  (called the max-product inference task)
- Furthermore, since log is monotonic, letting $\theta_c(x_c) = \lg \phi_c(x_c)$, we have that this is equivalent to
  $$\arg\max_x \sum_{c \in C} \theta_c(x_c)$$
  (called max-sum)
Semi-rings

Compare the sum-product problem with the max-product (equivalently, max-sum in log space):

- sum-product: $\sum_x \prod_{c \in C} \phi_c(x_c)$
- max-sum: $\max_x \sum_{c \in C} \theta_c(x_c)$

We can exchange the operators $(+, *)$ for $(\max, +)$ and, because both pairs form semirings satisfying associativity, commutativity, and distributivity, everything works! We get "max-product variable elimination" and "max-product belief propagation".
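A tiny numeric check (my own illustration) of the distributive law that makes the operator swap work: in sum-product, $a \cdot (b + c) = a \cdot b + a \cdot c$; in max-sum, $a + \max(b, c) = \max(a + b, a + c)$:

```python
import random

random.seed(0)
for _ in range(1000):
    a, b, c = (random.uniform(-5, 5) for _ in range(3))
    # (max, +) distributivity, which lets us push max inside sums
    assert abs((a + max(b, c)) - max(a + b, a + c)) < 1e-12
    # (+, *) distributivity, which lets us push sums inside products
    assert abs(a * (b + c) - (a * b + a * c)) < 1e-9
```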
Simple example

Suppose we have a simple chain, $A - B - C - D$, and we want to find the MAP assignment,
$$\max_{a,b,c,d} \phi_{AB}(a, b)\, \phi_{BC}(b, c)\, \phi_{CD}(c, d)$$

- Just as we did before, we can push the maximizations inside to obtain:
  $$\max_{a,b} \phi_{AB}(a, b) \Big[ \max_c \phi_{BC}(b, c) \max_d \phi_{CD}(c, d) \Big]$$
  or, equivalently,
  $$\max_{a,b} \theta_{AB}(a, b) + \Big[ \max_c \theta_{BC}(b, c) + \max_d \theta_{CD}(c, d) \Big]$$
- To find the actual maximizing assignment, we do a traceback (or keep back pointers)
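A runnable sketch of this chain computation with back-pointers (my own illustration; the pairwise potentials are arbitrary made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # number of states per variable
# Pairwise potentials for the chain A - B - C - D (made-up values)
phi_AB, phi_BC, phi_CD = (rng.uniform(0.1, 1.0, size=(k, k)) for _ in range(3))

# Backward pass: push the maximizations inside, keeping back-pointers.
m_D = phi_CD.max(axis=1)             # m_D[c] = max_d phi_CD(c, d)
bp_D = phi_CD.argmax(axis=1)         # best d for each c
m_C = (phi_BC * m_D).max(axis=1)     # m_C[b] = max_c phi_BC(b, c) * m_D[c]
bp_C = (phi_BC * m_D).argmax(axis=1) # best c for each b
scores = phi_AB * m_C                # scores[a, b] = phi_AB(a, b) * m_C[b]

# Traceback: recover the maximizing assignment from the back-pointers.
a, b = np.unravel_index(scores.argmax(), scores.shape)
c = bp_C[b]
d = bp_D[c]
print("MAP value:", scores[a, b], "assignment:", (a, b, c, d))
```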
Max-product variable elimination

```
Procedure Max-Product-VE (
    Φ,  // Set of factors over X
    ≺   // Ordering on X
)
1.  Let X_1, ..., X_k be an ordering of X such that X_i ≺ X_j iff i < j
2.  for i = 1, ..., k
3.      (Φ, φ_{X_i}) ← Max-Product-Eliminate-Var(Φ, X_i)
4.  x* ← Traceback-MAP({φ_{X_i} : i = 1, ..., k})
5.  return x*, Φ   // Φ contains the probability of the MAP

Procedure Max-Product-Eliminate-Var (
    Φ,  // Set of factors
    Z   // Variable to be eliminated
)
1.  Φ' ← {φ ∈ Φ : Z ∈ Scope[φ]}
2.  Φ'' ← Φ − Φ'
3.  ψ ← ∏_{φ ∈ Φ'} φ
4.  τ ← max_Z ψ
5.  return (Φ'' ∪ {τ}, ψ)

Procedure Traceback-MAP ( {φ_{X_i} : i = 1, ..., k} )
1.  for i = k, ..., 1
2.      u_i ← (x*_{i+1}, ..., x*_k)⟨Scope[φ_{X_i}] − {X_i}⟩
        // The maximizing assignment to the variables eliminated after X_i
3.      x*_i ← arg max_{x_i} φ_{X_i}(x_i, u_i)
        // x*_i is chosen so as to maximize the corresponding entry in the
        // factor, relative to the previous choices u_i
4.  return x*
```
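A compact runnable version of this procedure (my own sketch, reusing the (scope, table) factor representation from the sum-product sketch; binary variables, and all helper names are mine):

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def max_out(f, var):
    """tau(y) = max_z psi(z, y)."""
    scope, table = f
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, value in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = max(new_table.get(key, float("-inf")), value)
    return new_scope, new_table

def max_product_ve(factors, order):
    """MAP assignment and its (unnormalized) value; `order` must cover
    every variable, and each variable must appear in some factor."""
    saved = []  # (variable, psi) pairs kept for the traceback
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        psi = involved[0]
        for f in involved[1:]:
            psi = multiply(psi, f)
        saved.append((var, psi))
        factors.append(max_out(psi, var))
    map_value = 1.0
    for scope, table in factors:  # only empty-scope (scalar) factors remain
        map_value *= table[()]
    # Traceback in reverse elimination order: every other variable in psi's
    # scope was eliminated later, so it is already assigned when we reach var.
    assignment = {}
    for var, (scope, table) in reversed(saved):
        def entry(val):
            key = tuple(val if s == var else assignment[s] for s in scope)
            return table[key]
        assignment[var] = max([0, 1], key=entry)
    return assignment, map_value

# Example: MAP over the chain A - B - C - D with made-up potentials.
def demo_factor(scope):
    return scope, {vals: 1.0 + sum(vals)
                   for vals in product([0, 1], repeat=len(scope))}

factors = [demo_factor(("A", "B")), demo_factor(("B", "C")), demo_factor(("C", "D"))]
print(max_product_ve(factors, ["D", "C", "B", "A"]))  # all-ones MAP, value 27.0
```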