Probabilistic Graphical Models: Variable Elimination
Siamak Ravanbakhsh, Fall 2019
Learning objectives
- an intuition for inference in graphical models
- why is it difficult?
- exact inference by variable elimination
Probability query

marginalization:
$P(X_1) = \sum_{x_2, \ldots, x_n} P(X_1, X_2 = x_2, \ldots, X_n = x_n)$

introducing evidence leads to a similar problem:
$P(X_1 = x_1 \mid X_m = x_m) = \frac{P(X_1 = x_1, X_m = x_m)}{P(X_m = x_m)}$

MAP inference changes the sum to a max:
$x^* = \arg\max_x P(X = x)$ (maximum a posteriori)
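As a concrete illustration (not from the slides), here is a minimal numpy sketch of all three queries on a small joint table over two binary variables; the table values are made up for the example:

```python
import numpy as np

# hypothetical joint table P(X1, X2) over two binary variables
P = np.array([[0.30, 0.10],
              [0.20, 0.40]])  # P[x1, x2], entries sum to 1

# marginalization: P(X1) = sum_{x2} P(X1, x2)
P_x1 = P.sum(axis=1)                     # -> [0.4, 0.6]

# conditioning: P(X1 | X2 = 1) = P(X1, X2 = 1) / P(X2 = 1)
P_x1_given_x2 = P[:, 1] / P[:, 1].sum()  # -> [0.2, 0.8]

# MAP: x* = argmax_{x1, x2} P(x1, x2)
x_star = np.unravel_index(P.argmax(), P.shape)  # -> (1, 1)
```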
Probability query: example

$P(X_1) = \sum_{x_2, \ldots, x_n} P(X_1, X_2 = x_2, \ldots, X_n = x_n)$

n = 2 (variables $X_1, X_2$):
representation: $O(|Val(X_1)| \times |Val(X_2)|)$
inference for $P(X_1)$: $O(|Val(X_1)| \times |Val(X_2)|)$

n = 3 (variables $X_1, X_2, X_3$):
representation: $O(|Val(X_1)| \times |Val(X_2)| \times |Val(X_3)|)$
inference for $P(X_1)$: $O(|Val(X_1)| \times |Val(X_2)| \times |Val(X_3)|)$
Probability query: general case

complexity of representation & inference: $O(\prod_i |Val(X_i)|)$; for binary variables, $O(2^n)$

P can still have a compact representation as a Bayes-net or Markov net, e.g. the chain
$p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$
has an $O(n)$ representation.

efficient inference?
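To make the gap concrete, a quick sketch (my own illustration, with hypothetical sizes) comparing parameter counts for the full joint versus the chain factorization:

```python
# parameter counts for n variables, each with d states (sizes are illustrative)
n, d = 20, 10
full_joint = d ** n        # 10^20 entries for the explicit joint table
chain = (n - 1) * d ** 2   # 1,900 entries for the pairwise chain factors
```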
Complexity of inference

can we always avoid the exponential cost of inference? No!
can we at least guarantee a good approximation? No!
proof idea: reduce 3-SAT to inference in a graphical model
despite this, graphical models are used for combinatorial optimization (why?)
Complexity of inference: proof

given a BN, deciding whether $P(X = x) > 0$ is NP-complete:
- it belongs to NP
- NP-hardness: answering this query is at least as hard as solving 3-SAT
  (a layer of nodes for the SAT variables, a layer for the SAT clauses, and $X = 1$ iff the formula is satisfiable)

given a BN, calculating $P(X = x)$ is #P-complete
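The reduction in the slide builds a Bayes-net; the same idea is easiest to see with 0/1 clause factors, where the product has positive mass iff the formula is satisfiable. A minimal sketch of that encoding (the formula and the brute-force check are my own illustration):

```python
import itertools

# hypothetical 3-SAT formula over variables 0..2:
# (x0 or not x1 or x2) and (not x0 or x1 or not x2)
clauses = [[(0, True), (1, False), (2, True)],
           [(0, False), (1, True), (2, False)]]

def clause_factor(clause, assignment):
    """0/1 factor: 1 iff the clause is satisfied by the assignment."""
    return int(any(assignment[v] == val for v, val in clause))

# P(X = 1) > 0  iff  some assignment makes every clause-factor equal 1,
# i.e. iff the unnormalized measure has positive total mass
satisfiable = any(
    all(clause_factor(c, a) for c in clauses)
    for a in itertools.product([False, True], repeat=3)
)
```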
Complexity of approximate inference

given a BN, approximating $P(X = x)$ with relative error $\epsilon$ is NP-hard

Proof: if $\rho$ is our approximation, relative error means
$\frac{\rho}{1 + \epsilon} \le P(X = x) \le \rho (1 + \epsilon)$,
so $\rho > 0 \Leftrightarrow P(X = 1) > 0$, which is already NP-hard to decide.
Complexity of approximate inference

given a BN, approximating $P(X = x \mid E = e)$ with absolute error $\epsilon$ is NP-hard for any $0 < \epsilon < \frac{1}{2}$

Proof idea: using the 3-SAT network above, sequentially fix
$q_i^* = \arg\max_{q_i} P(Q_i = q_i \mid (Q_1, \ldots, Q_{i-1}) = (q_1^*, \ldots, q_{i-1}^*), X = 1)$
at each step either $P(Q_i = 0 \mid \cdot) > \frac{1}{2}$ or $P(Q_i = 1 \mid \cdot) > \frac{1}{2}$; since $\epsilon < \frac{1}{2}$, the approximate query identifies the larger one, so this leads to a satisfying assignment.
so far...

- reduce the representation cost using a graph structure
- inference cost is in the worst case exponential
- can we reduce it using the graph structure?
Probability query: example

$p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$ (a chain $x_1 - \cdots - x_n$), with $Val(X_i) = \{1, \ldots, d\}\ \forall i$

objective: $p(x_n)$?

Take 1:
- calculate the n-dimensional array $p(x)$: $O(d^n)$
- marginalize it: $p(x_n) = \sum_{x_{-n}} p(x)$
Inference: example

Take 2:
- calculate $\tilde{p}(x_n) = \sum_{x_{n-1}} \cdots \sum_{x_1} \phi_1(x_1, x_2) \cdots \phi_{n-1}(x_{n-1}, x_n)$ without building $p(x)$
- normalize it: $p(x_n) = \tilde{p}(x_n) / \sum_{x_n} \tilde{p}(x_n)$

idea: use the distributive law $ab + ac = a(b + c)$ (3 operations vs. 2 operations)
Inference and the distributive law

distributive law: $ab + ac = a(b + c)$ (3 operations vs. 2 operations)
save computation by factoring; the same law in disguise:
$\sum_{x, y} f(x, y)\, g(y, z) = \sum_y g(y, z) \sum_x f(x, y)$
assuming $|Val(X)| = |Val(Y)| = |Val(Z)| = d$, the complexity drops from $O(d^3)$ to $O(d^2)$
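A quick numerical check of this identity, as a sketch (the tables are random and the sizes arbitrary):

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
f = rng.random((d, d))  # f(x, y)
g = rng.random((d, d))  # g(y, z)

# naive: materialize the full (x, y, z) grid of products, O(d^3) work
naive = (f[:, :, None] * g[None, :, :]).sum(axis=(0, 1))

# factored: push the sum over x inward first, O(d^2) work
factored = g.T @ f.sum(axis=0)  # sum_y g(y, z) * (sum_x f(x, y))

assert np.allclose(naive, factored)
```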
Inference: back to the example

$p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$

Take 2: objective $\tilde{p}(x_n) = \sum_{x_{n-1}} \cdots \sum_{x_1} \phi_1(x_1, x_2) \cdots \phi_{n-1}(x_{n-1}, x_n)$

systematically apply the factorization:
$\tilde{p}(x_n) = \sum_{x_{n-1}} \phi_{n-1}(x_{n-1}, x_n) \sum_{x_{n-2}} \phi_{n-2}(x_{n-2}, x_{n-1}) \cdots \sum_{x_1} \phi_1(x_1, x_2)$

complexity is $O(nd^2)$ instead of $O(d^n)$
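A runnable sketch of this chain elimination (my own illustration; n, d, and the random nonnegative potentials are arbitrary choices):

```python
import numpy as np

n, d = 6, 3
rng = np.random.default_rng(1)
# phis[i] is the table phi_{i+1}(x_{i+1}, x_{i+2}) on the chain
phis = [rng.random((d, d)) for _ in range(n - 1)]

# sweep left to right: msg(x_{i+1}) = sum_{x_i} msg(x_i) * phi_i(x_i, x_{i+1})
msg = np.ones(d)
for phi in phis:
    msg = msg @ phi        # one O(d^2) step; n - 1 steps total

p_xn = msg / msg.sum()     # normalize to get p(x_n)

# brute-force check, O(d^n): build the full joint and marginalize
joint = np.einsum('ab,bc,cd,de,ef->abcdef', *phis)
assert np.allclose(p_xn, joint.sum(axis=(0, 1, 2, 3, 4)) / joint.sum())
```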
Inference: example 2

objective: $p(x_1 \mid \bar{x}_6) = \frac{p(x_1, \bar{x}_6)}{p(\bar{x}_6)}$, i.e. $P(X_1 \mid X_6 = \bar{x}_6)$
($\bar{x}_6$ is another way to write an observed value, used in Jordan's textbook)

calculate the numerator; the denominator is then easy: $p(\bar{x}_6) = \sum_{x_1} p(x_1, \bar{x}_6)$

source: Michael Jordan's book
Inference: example 2 (continued)

[figures: step-by-step elimination for $p(x_1, \bar{x}_6)$ on the six-node network; individual elimination steps cost $O(d^3)$ or $O(d^2)$ depending on the scope of the intermediate factor; source: Michael Jordan's book]
Inference: example 2 (continued)

overall complexity is $O(d^3)$, instead of the $O(d^5)$ we would have paid to build the 5-dimensional array $p(x_1, x_2, x_3, x_4, x_5 \mid \bar{x}_6)$; in the general case that is $O(d^n)$
Inference: example 2 (undirected version)

$p(x_1, \bar{x}_6) = \frac{1}{Z} \sum_{x_2, \ldots, x_6} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, \phi(x_2, x_5, x_6)\, \delta(x_6, \bar{x}_6)$

using a delta function for conditioning:
$\delta(x_6, \bar{x}_6) \triangleq \begin{cases} 1, & \text{if } x_6 = \bar{x}_6 \\ 0, & \text{otherwise} \end{cases}$
add it as a local potential
Inference: example 2 (undirected version)

every step remains the same:
$p(x_1, \bar{x}_6) = \frac{1}{Z} \sum_{x_2, \ldots, x_6} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, \phi(x_2, x_5, x_6)\, \delta(x_6, \bar{x}_6)$
$= \frac{1}{Z} \sum_{x_2, \ldots, x_5} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, m_6(x_2, x_5)$
$\ldots$
$= \frac{1}{Z} \sum_{x_2} \phi(x_1, x_2)\, m_4(x_2) \sum_{x_3} \phi(x_1, x_3)\, m_5(x_2, x_3)$
$= \frac{1}{Z} \sum_{x_2} \phi(x_1, x_2)\, m_4(x_2)\, m_3(x_1, x_2)$
$= \frac{1}{Z} m_2(x_1)$

except: in Bayes-nets $Z = 1$; at this point normalization is easy!
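Here is a minimal numpy sketch of exactly this message sequence on the six-node example (the potential tables are random and `xbar6 = 0` is an arbitrary observed value):

```python
import numpy as np

d, xbar6 = 3, 0
rng = np.random.default_rng(2)
phi12 = rng.random((d, d))      # phi(x1, x2)
phi13 = rng.random((d, d))      # phi(x1, x3)
phi24 = rng.random((d, d))      # phi(x2, x4)
phi35 = rng.random((d, d))      # phi(x3, x5)
phi256 = rng.random((d, d, d))  # phi(x2, x5, x6)

delta6 = np.zeros(d); delta6[xbar6] = 1.0    # conditioning potential

m6 = np.einsum('abc,c->ab', phi256, delta6)  # m6(x2,x5) = sum_{x6} phi(x2,x5,x6) delta(x6)
m5 = np.einsum('ce,be->bc', phi35, m6)       # m5(x2,x3) = sum_{x5} phi(x3,x5) m6(x2,x5)
m4 = phi24.sum(axis=1)                       # m4(x2)    = sum_{x4} phi(x2,x4)
m3 = np.einsum('ac,bc->ab', phi13, m5)       # m3(x1,x2) = sum_{x3} phi(x1,x3) m5(x2,x3)
m2 = np.einsum('ab,b,ab->a', phi12, m4, m3)  # m2(x1)    = sum_{x2} phi(x1,x2) m4(x2) m3(x1,x2)

# m2 is proportional to p(x1, xbar6); normalizing gives p(x1 | xbar6)
p_x1_given = m2 / m2.sum()
```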
Variable elimination

input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \ldots, \phi_K\}$
output: $\sum_{x_{i_1}, \ldots, x_{i_m}} \prod_k \phi_k(D_k)$

go over $x_{i_1}, \ldots, x_{i_m}$ in some order; at step t (see the sketch below):
- collect all the relevant factors: $\Psi_t = \{\phi \in \Phi^{t-1} \mid x_{i_t} \in Scope[\phi]\}$
- calculate their product: $\psi_t = \prod_{\phi \in \Psi_t} \phi$
- marginalize out $x_{i_t}$: $\psi'_t = \sum_{x_{i_t}} \psi_t$
- update the set of factors: $\Phi^t = \Phi^{t-1} - \Psi_t + \{\psi'_t\}$

return the product of the factors in $\Phi^{t=m}$
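A compact, general implementation of this loop, as a sketch of my own (each factor is stored as a (variables, numpy table) pair; products and marginalizations use einsum):

```python
import numpy as np
from functools import reduce

def multiply(f, g):
    """Product of two factors; a factor is (vars_tuple, table)."""
    fv, ft = f; gv, gt = g
    out_vars = tuple(dict.fromkeys(fv + gv))  # union, order-preserving
    fi = ''.join(chr(97 + out_vars.index(v)) for v in fv)
    gi = ''.join(chr(97 + out_vars.index(v)) for v in gv)
    oi = ''.join(chr(97 + i) for i in range(len(out_vars)))
    return out_vars, np.einsum(f'{fi},{gi}->{oi}', ft, gt)

def marginalize(f, var):
    """Sum a variable out of a factor."""
    fv, ft = f
    return tuple(v for v in fv if v != var), ft.sum(axis=fv.index(var))

def variable_elimination(factors, order):
    """factors: list of (vars, table); order: variables to eliminate."""
    for var in order:
        relevant = [f for f in factors if var in f[0]]     # Psi_t
        psi = reduce(multiply, relevant)                   # product of Psi_t
        factors = [f for f in factors if var not in f[0]]  # Phi^{t-1} - Psi_t
        factors.append(marginalize(psi, var))              # ... + {psi'_t}
    return reduce(multiply, factors)
```

With the chain factors from the earlier example and the order $x_1, \ldots, x_{n-1}$, every intermediate $\psi_t$ has at most two variables, recovering the $O(nd^2)$ computation.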
Variable elimination: example

input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \ldots, \phi_K\}$
$\Phi^0 = \{p(x_2 \mid x_1),\ p(x_3 \mid x_1),\ p(\bar{x}_6 \mid x_2, x_5),\ p(x_4 \mid x_2),\ p(x_5 \mid x_3)\}$
output: $\sum_{x_{i_1}, \ldots, x_{i_m}} \prod_k \phi_k(D_k)$