Probabilistic Graphical Models: Variable Elimination
Siamak Ravanbakhsh, Fall 2019
Learning objectives
- an intuition for inference in graphical models
- why is it difficult?
- exact inference by variable elimination
Probability query

marginalization:
$P(X_1) = \sum_{x_2, \ldots, x_n} P(X_1, X_2 = x_2, \ldots, X_n = x_n)$

introducing evidence leads to a similar problem:
$P(X_1 = x_1 \mid X_m = x_m) = \frac{P(X_1 = x_1, X_m = x_m)}{P(X_m = x_m)}$

MAP inference changes the sum to a max:
$x^* = \arg\max_x P(X = x)$ (maximum a posteriori)
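As a concrete illustration (not from the slides), here is a minimal numpy sketch of all three queries on a small joint table over two binary variables; the table values are made up for the example:

```python
import numpy as np

# hypothetical joint table P(X1, X2) over two binary variables
P = np.array([[0.30, 0.10],
              [0.20, 0.40]])  # P[x1, x2], entries sum to 1

# marginalization: P(X1) = sum_{x2} P(X1, x2)
P_x1 = P.sum(axis=1)                     # -> [0.4, 0.6]

# conditioning: P(X1 | X2 = 1) = P(X1, X2 = 1) / P(X2 = 1)
P_x1_given_x2 = P[:, 1] / P[:, 1].sum()  # -> [0.2, 0.8]

# MAP: x* = argmax_{x1, x2} P(x1, x2)
x_star = np.unravel_index(P.argmax(), P.shape)  # -> (1, 1)
```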
Probability query: example

$P(X_1) = \sum_{x_2, \ldots, x_n} P(X_1, X_2 = x_2, \ldots, X_n = x_n)$

n = 2 (variables $X_1, X_2$):
representation: $O(|Val(X_1)| \times |Val(X_2)|)$
inference for $P(X_1)$: $O(|Val(X_1)| \times |Val(X_2)|)$

n = 3 (variables $X_1, X_2, X_3$):
representation: $O(|Val(X_1)| \times |Val(X_2)| \times |Val(X_3)|)$
inference for $P(X_1)$: $O(|Val(X_1)| \times |Val(X_2)| \times |Val(X_3)|)$
Probability query: general case

complexity of representation & inference: $O(\prod_i |Val(X_i)|)$; for binary variables, $O(2^n)$

P can still have a compact representation as a Bayes-net or Markov net, e.g. the chain
$p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$
has an $O(n)$ representation.

efficient inference?
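To make the gap concrete, a quick sketch (my own illustration, with hypothetical sizes) comparing parameter counts for the full joint versus the chain factorization:

```python
# parameter counts for n variables, each with d states (sizes are illustrative)
n, d = 20, 10
full_joint = d ** n        # 10^20 entries for the explicit joint table
chain = (n - 1) * d ** 2   # 1,900 entries for the pairwise chain factors
```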
Complexity of inference

can we always avoid the exponential cost of inference? No!
can we at least guarantee a good approximation? No!
proof idea: reduce 3-SAT to inference in a graphical model
despite this, graphical models are used for combinatorial optimization (why?)
Complexity of inference: proof

given a BN, deciding whether $P(X = x) > 0$ is NP-complete:
- it belongs to NP
- NP-hardness: answering this query is at least as hard as solving 3-SAT
  (a layer of nodes for the SAT variables, a layer for the SAT clauses, and $X = 1$ iff the formula is satisfiable)

given a BN, calculating $P(X = x)$ is #P-complete
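The reduction in the slide builds a Bayes-net; the same idea is easiest to see with 0/1 clause factors, where the product has positive mass iff the formula is satisfiable. A minimal sketch of that encoding (the formula and the brute-force check are my own illustration):

```python
import itertools

# hypothetical 3-SAT formula over variables 0..2:
# (x0 or not x1 or x2) and (not x0 or x1 or not x2)
clauses = [[(0, True), (1, False), (2, True)],
           [(0, False), (1, True), (2, False)]]

def clause_factor(clause, assignment):
    """0/1 factor: 1 iff the clause is satisfied by the assignment."""
    return int(any(assignment[v] == val for v, val in clause))

# P(X = 1) > 0  iff  some assignment makes every clause-factor equal 1,
# i.e. iff the unnormalized measure has positive total mass
satisfiable = any(
    all(clause_factor(c, a) for c in clauses)
    for a in itertools.product([False, True], repeat=3)
)
```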
Complexity of approximate inference

given a BN, approximating $P(X = x)$ with relative error $\epsilon$ is NP-hard

Proof: if $\rho$ is our approximation, relative error means
$\frac{\rho}{1 + \epsilon} \le P(X = x) \le \rho (1 + \epsilon)$,
so $\rho > 0 \Leftrightarrow P(X = 1) > 0$, which is already NP-hard to decide.
Complexity of approximate inference

given a BN, approximating $P(X = x \mid E = e)$ with absolute error $\epsilon$ is NP-hard for any $0 < \epsilon < \frac{1}{2}$

Proof idea: using the 3-SAT network above, sequentially fix
$q_i^* = \arg\max_{q_i} P(Q_i = q_i \mid (Q_1, \ldots, Q_{i-1}) = (q_1^*, \ldots, q_{i-1}^*), X = 1)$
at each step either $P(Q_i = 0 \mid \cdot) > \frac{1}{2}$ or $P(Q_i = 1 \mid \cdot) > \frac{1}{2}$; since $\epsilon < \frac{1}{2}$, the approximate query identifies the larger one, so this leads to a satisfying assignment.
so far...

- reduce the representation cost using a graph structure
- inference cost is in the worst case exponential
- can we reduce it using the graph structure?
Probability query: example

$p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$ (a chain $x_1 - \cdots - x_n$), with $Val(X_i) = \{1, \ldots, d\}\ \forall i$

objective: $p(x_n)$?

Take 1:
- calculate the n-dimensional array $p(x)$: $O(d^n)$
- marginalize it: $p(x_n) = \sum_{x_{-n}} p(x)$
Inference: example

Take 2:
- calculate $\tilde{p}(x_n) = \sum_{x_{n-1}} \cdots \sum_{x_1} \phi_1(x_1, x_2) \cdots \phi_{n-1}(x_{n-1}, x_n)$ without building $p(x)$
- normalize it: $p(x_n) = \tilde{p}(x_n) / \sum_{x_n} \tilde{p}(x_n)$

idea: use the distributive law $ab + ac = a(b + c)$ (3 operations vs. 2 operations)
Inference and the distributive law

distributive law: $ab + ac = a(b + c)$ (3 operations vs. 2 operations)
save computation by factoring; the same law in disguise:
$\sum_{x, y} f(x, y)\, g(y, z) = \sum_y g(y, z) \sum_x f(x, y)$
assuming $|Val(X)| = |Val(Y)| = |Val(Z)| = d$, the complexity drops from $O(d^3)$ to $O(d^2)$
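A quick numerical check of this identity, as a sketch (the tables are random and the sizes arbitrary):

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
f = rng.random((d, d))  # f(x, y)
g = rng.random((d, d))  # g(y, z)

# naive: materialize the full (x, y, z) grid of products, O(d^3) work
naive = (f[:, :, None] * g[None, :, :]).sum(axis=(0, 1))

# factored: push the sum over x inward first, O(d^2) work
factored = g.T @ f.sum(axis=0)  # sum_y g(y, z) * (sum_x f(x, y))

assert np.allclose(naive, factored)
```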
Inference: back to the example

$p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$

Take 2: objective $\tilde{p}(x_n) = \sum_{x_{n-1}} \cdots \sum_{x_1} \phi_1(x_1, x_2) \cdots \phi_{n-1}(x_{n-1}, x_n)$

systematically apply the factorization:
$\tilde{p}(x_n) = \sum_{x_{n-1}} \phi_{n-1}(x_{n-1}, x_n) \sum_{x_{n-2}} \phi_{n-2}(x_{n-2}, x_{n-1}) \cdots \sum_{x_1} \phi_1(x_1, x_2)$

complexity is $O(nd^2)$ instead of $O(d^n)$
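A runnable sketch of this chain elimination (my own illustration; n, d, and the random nonnegative potentials are arbitrary choices):

```python
import numpy as np

n, d = 6, 3
rng = np.random.default_rng(1)
# phis[i] is the table phi_{i+1}(x_{i+1}, x_{i+2}) on the chain
phis = [rng.random((d, d)) for _ in range(n - 1)]

# sweep left to right: msg(x_{i+1}) = sum_{x_i} msg(x_i) * phi_i(x_i, x_{i+1})
msg = np.ones(d)
for phi in phis:
    msg = msg @ phi        # one O(d^2) step; n - 1 steps total

p_xn = msg / msg.sum()     # normalize to get p(x_n)

# brute-force check, O(d^n): build the full joint and marginalize
joint = np.einsum('ab,bc,cd,de,ef->abcdef', *phis)
assert np.allclose(p_xn, joint.sum(axis=(0, 1, 2, 3, 4)) / joint.sum())
```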
Inference: example 2

objective: $p(x_1 \mid \bar{x}_6) = \frac{p(x_1, \bar{x}_6)}{p(\bar{x}_6)}$, i.e. $P(X_1 \mid X_6 = \bar{x}_6)$
($\bar{x}_6$ is another way to write an observed value, used in Jordan's textbook)

calculate the numerator; the denominator is then easy: $p(\bar{x}_6) = \sum_{x_1} p(x_1, \bar{x}_6)$

source: Michael Jordan's book
Inference: example 2 (continued)

[figures: step-by-step elimination for $p(x_1, \bar{x}_6)$ on the six-node network; individual elimination steps cost $O(d^3)$ or $O(d^2)$ depending on the scope of the intermediate factor; source: Michael Jordan's book]
Inference: example 2 (continued)

overall complexity is $O(d^3)$, instead of the $O(d^5)$ we would have paid to build the 5-dimensional array $p(x_1, x_2, x_3, x_4, x_5 \mid \bar{x}_6)$; in the general case that is $O(d^n)$
Inference: example 2 (undirected version)

$p(x_1, \bar{x}_6) = \frac{1}{Z} \sum_{x_2, \ldots, x_6} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, \phi(x_2, x_5, x_6)\, \delta(x_6, \bar{x}_6)$

using a delta function for conditioning:
$\delta(x_6, \bar{x}_6) \triangleq \begin{cases} 1, & \text{if } x_6 = \bar{x}_6 \\ 0, & \text{otherwise} \end{cases}$
add it as a local potential
Inference: example 2 (undirected version)

every step remains the same:
$p(x_1, \bar{x}_6) = \frac{1}{Z} \sum_{x_2, \ldots, x_6} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, \phi(x_2, x_5, x_6)\, \delta(x_6, \bar{x}_6)$
$= \frac{1}{Z} \sum_{x_2, \ldots, x_5} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, m_6(x_2, x_5)$
$\ldots$
$= \frac{1}{Z} \sum_{x_2} \phi(x_1, x_2)\, m_4(x_2) \sum_{x_3} \phi(x_1, x_3)\, m_5(x_2, x_3)$
$= \frac{1}{Z} \sum_{x_2} \phi(x_1, x_2)\, m_4(x_2)\, m_3(x_1, x_2)$
$= \frac{1}{Z} m_2(x_1)$

except: in Bayes-nets $Z = 1$; at this point normalization is easy!
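Here is a minimal numpy sketch of exactly this message sequence on the six-node example (the potential tables are random and `xbar6 = 0` is an arbitrary observed value):

```python
import numpy as np

d, xbar6 = 3, 0
rng = np.random.default_rng(2)
phi12 = rng.random((d, d))      # phi(x1, x2)
phi13 = rng.random((d, d))      # phi(x1, x3)
phi24 = rng.random((d, d))      # phi(x2, x4)
phi35 = rng.random((d, d))      # phi(x3, x5)
phi256 = rng.random((d, d, d))  # phi(x2, x5, x6)

delta6 = np.zeros(d); delta6[xbar6] = 1.0    # conditioning potential

m6 = np.einsum('abc,c->ab', phi256, delta6)  # m6(x2,x5) = sum_{x6} phi(x2,x5,x6) delta(x6)
m5 = np.einsum('ce,be->bc', phi35, m6)       # m5(x2,x3) = sum_{x5} phi(x3,x5) m6(x2,x5)
m4 = phi24.sum(axis=1)                       # m4(x2)    = sum_{x4} phi(x2,x4)
m3 = np.einsum('ac,bc->ab', phi13, m5)       # m3(x1,x2) = sum_{x3} phi(x1,x3) m5(x2,x3)
m2 = np.einsum('ab,b,ab->a', phi12, m4, m3)  # m2(x1)    = sum_{x2} phi(x1,x2) m4(x2) m3(x1,x2)

# m2 is proportional to p(x1, xbar6); normalizing gives p(x1 | xbar6)
p_x1_given = m2 / m2.sum()
```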
Variable elimination

input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \ldots, \phi_K\}$
output: $\sum_{x_{i_1}, \ldots, x_{i_m}} \prod_k \phi_k(D_k)$

go over $x_{i_1}, \ldots, x_{i_m}$ in some order; at step t (see the sketch below):
- collect all the relevant factors: $\Psi_t = \{\phi \in \Phi^{t-1} \mid x_{i_t} \in Scope[\phi]\}$
- calculate their product: $\psi_t = \prod_{\phi \in \Psi_t} \phi$
- marginalize out $x_{i_t}$: $\psi'_t = \sum_{x_{i_t}} \psi_t$
- update the set of factors: $\Phi^t = \Phi^{t-1} - \Psi_t + \{\psi'_t\}$

return the product of the factors in $\Phi^{t=m}$
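A compact, general implementation of this loop, as a sketch of my own (each factor is stored as a (variables, numpy table) pair; products and marginalizations use einsum):

```python
import numpy as np
from functools import reduce

def multiply(f, g):
    """Product of two factors; a factor is (vars_tuple, table)."""
    fv, ft = f; gv, gt = g
    out_vars = tuple(dict.fromkeys(fv + gv))  # union, order-preserving
    fi = ''.join(chr(97 + out_vars.index(v)) for v in fv)
    gi = ''.join(chr(97 + out_vars.index(v)) for v in gv)
    oi = ''.join(chr(97 + i) for i in range(len(out_vars)))
    return out_vars, np.einsum(f'{fi},{gi}->{oi}', ft, gt)

def marginalize(f, var):
    """Sum a variable out of a factor."""
    fv, ft = f
    return tuple(v for v in fv if v != var), ft.sum(axis=fv.index(var))

def variable_elimination(factors, order):
    """factors: list of (vars, table); order: variables to eliminate."""
    for var in order:
        relevant = [f for f in factors if var in f[0]]     # Psi_t
        psi = reduce(multiply, relevant)                   # product of Psi_t
        factors = [f for f in factors if var not in f[0]]  # Phi^{t-1} - Psi_t
        factors.append(marginalize(psi, var))              # ... + {psi'_t}
    return reduce(multiply, factors)
```

With the chain factors from the earlier example and the order $x_1, \ldots, x_{n-1}$, every intermediate $\psi_t$ has at most two variables, recovering the $O(nd^2)$ computation.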
Variable elimination: example

input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \ldots, \phi_K\}$
$\Phi^0 = \{p(x_2 \mid x_1),\ p(x_3 \mid x_1),\ p(\bar{x}_6 \mid x_2, x_5),\ p(x_4 \mid x_2),\ p(x_5 \mid x_3)\}$
output: $\sum_{x_{i_1}, \ldots, x_{i_m}} \prod_k \phi_k(D_k)$