  1. Probabilistic Graphical Models: Variable Elimination. Siamak Ravanbakhsh, Fall 2019.

  2. Learning objectives: an intuition for inference in graphical models; why is it difficult?; exact inference by variable elimination.

  3. Probability query: marginalization
  $P(X_1) = \sum_{x_2,\dots,x_n} P(X_1, X_2 = x_2, \dots, X_n = x_n)$
  Introducing evidence leads to a similar problem:
  $P(X_1 = x_1 \mid X_m = x_m) = \frac{P(X_1 = x_1, X_m = x_m)}{P(X_m = x_m)}$

  4. Probability query: MAP inference changes the sum to a max:
  $x^* = \arg\max_x P(X = x)$ (maximum a posteriori)

  5. Probability query: marginalization for $n = 2$ (variables $X_1, X_2$):
  representation: $O(|Val(X_1) \times Val(X_2)|)$; inference of $P(X_1)$: $O(|Val(X_1) \times Val(X_2)|)$

  6. Probability query: marginalization for $n = 3$ (adding $X_3$):
  representation: $O(|Val(X_1) \times Val(X_2) \times Val(X_3)|)$; inference of $P(X_1)$: $O(|Val(X_1) \times Val(X_2) \times Val(X_3)|)$

  7. Probability query: in general, the complexity of representation & inference is $O(\prod_i |Val(X_i)|)$; for $n$ binary variables, $O(2^n)$.

  8. Probability query: $P$ can have a compact representation as a Bayes-net or Markov net; e.g. the chain $p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$ has an $O(n)$ representation.

  9. Probability query: given such a compact $O(n)$ representation, can we also do efficient inference? (A size comparison follows below.)
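To make the gap concrete, here is a minimal sketch in Python (NumPy), assuming random pairwise potentials; the names `phis`, `n`, `d` are illustrative:

```python
import numpy as np

n, d = 10, 2  # 10 variables, each with d states

# Compact chain representation: n-1 pairwise potentials phi_i(x_i, x_{i+1}),
# each a d-by-d table -- O(n * d^2) numbers in total.
phis = [np.random.rand(d, d) for _ in range(n - 1)]
print(sum(phi.size for phi in phis))  # 36 entries

# The explicit joint p(x) would be a single d^n table -- O(d^n) numbers.
print(d ** n)  # 1024 entries already; 2^100 for n = 100
```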

  10. Complexity of inference: can we always avoid the exponential cost of inference? No! Can we at least guarantee a good approximation? No! Proof idea: reduce 3-SAT to inference in a graphical model. Despite this, graphical models are used for combinatorial optimization (why?)

  11. Complexity of inference: proof. Given a BN, deciding whether $P(X = x) > 0$ is NP-complete. The problem belongs to NP; for NP-hardness, answering this query is at least as hard as solving 3-SAT: build a BN with one node per SAT variable and one per SAT clause, wired so that $X = 1$ iff the formula is satisfiable.

  12. Complexity of inference: proof (cont.). In the same construction, the query $P(X = x)$ reads off the probability mass of satisfying assignments.

  13. Complexity of inference: proof (cont.). Given a BN, calculating $P(X = x)$ is #P-complete: it counts the satisfying assignments (toy illustration below).
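A toy illustration of the quantity this reduction computes, assuming uniform priors over the SAT variables (so $P(X = 1)$ equals the fraction of satisfying assignments); the clause encoding is illustrative, not the construction from any specific text:

```python
import itertools

# A 3-SAT instance: (q1 OR NOT q2 OR q3) AND (NOT q1 OR q2 OR NOT q3).
# A literal k > 0 means q_k; k < 0 means NOT q_|k|.
clauses = [(1, -2, 3), (-1, 2, -3)]

def satisfied(assignment, clause):
    # assignment is a tuple of booleans for q1, q2, q3
    return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)

# In the reduction's BN, P(X = 1) = (# satisfying assignments) / 2^n,
# so deciding P(X = 1) > 0 decides satisfiability, and computing
# P(X = 1) exactly counts the satisfying assignments.
count = sum(all(satisfied(a, c) for c in clauses)
            for a in itertools.product([False, True], repeat=3))
print(count / 2**3, count > 0)
```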

  14. Complexity of approximate inference: given a BN, approximating $P(X = x)$ with relative error $\epsilon$ is NP-hard. Proof: a relative-error approximation $\rho$ satisfies $\frac{\rho}{1+\epsilon} \le P(X = x) \le \rho(1+\epsilon)$, so $\rho > 0 \Leftrightarrow P(X = 1) > 0$, which decides 3-SAT.

  15. Complexity of approximate inference: given a BN, approximating $P(X = x \mid E = e)$ with absolute error $\epsilon$, i.e. producing $\rho$ with $|\rho - P(X = x \mid E = e)| \le \epsilon$, is NP-hard for any $0 < \epsilon < \frac{1}{2}$.

  16. Complexity of approximate inference: proof. Sequentially fix $q_i^* = \arg\max_{q_i} P(Q_i = q_i \mid (Q_1, \dots, Q_{i-1}) = (q_1^*, \dots, q_{i-1}^*), X = 1)$. For binary $Q_i$, either $P(Q_i = 0 \mid \cdot) \ge \frac{1}{2}$ or $P(Q_i = 1 \mid \cdot) \ge \frac{1}{2}$, so an approximation with $\epsilon < \frac{1}{2}$ identifies the maximizer; since conditioning on $X = 1$ keeps a satisfying completion available at every step, this leads to a solution.

  17. So far... we reduce the representation cost using a graph structure, but the inference cost is exponential in the worst case. Can we reduce it using the graph structure too?

  18. Probability query: example 1. For the chain $x_1 - x_2 - \cdots - x_n$ with $p(x) = \frac{1}{Z} \prod_{i=1}^{n-1} \phi_i(x_i, x_{i+1})$ and $Val(X_i) = \{1, \dots, d\}\ \forall i$: what is $p(x_n)$? Take 1: calculate the $n$-dimensional array $p(x)$ at cost $O(d^n)$, then marginalize it: $p(x_n) = \sum_{x_{-n}} p(x)$ (sketch below).
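A sketch of Take 1 for a short chain, assuming unnormalized NumPy potentials; `einsum` materializes the full $d^n$ table, which is exactly the cost being criticized:

```python
import numpy as np

n, d = 6, 3
phis = [np.random.rand(d, d) for _ in range(n - 1)]

# Take 1: build the full n-dimensional table -- O(d^n) time and memory.
joint = np.einsum('ab,bc,cd,de,ef->abcdef', *phis)
joint /= joint.sum()  # divide by Z

# ...then sum out x_1, ..., x_{n-1} to get p(x_n).
p_xn = joint.sum(axis=tuple(range(n - 1)))
print(p_xn)
```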

  19. Inference: example 1. Take 2: calculate $\tilde p(x_n) = \sum_{x_{n-1}} \cdots \sum_{x_1} \phi_1(x_1, x_2) \cdots \phi_{n-1}(x_{n-1}, x_n)$ without building $p(x)$, then normalize: $p(x_n) = \tilde p(x_n) / \sum_{x_n} \tilde p(x_n)$. Idea: use the distributive law $ab + ac = a(b + c)$: 3 operations on the left, 2 on the right.

  20. Inference and the distributive law. $ab + ac = a(b + c)$: 3 operations become 2; we save computation by factoring the operations. The same law in disguise: $\sum_{x,y} f(x, y)\, g(y, z) = \sum_y g(y, z) \sum_x f(x, y)$. Assuming $|Val(X)| = |Val(Y)| = |Val(Z)| = d$, the complexity drops from $O(d^3)$ to $O(d^2)$ (numerical check below).
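Checking the identity numerically (illustrative tables `f`, `g`): the left side forms the full $d^3$ product before summing; the right side sums out $x$ first, paying only $O(d^2)$:

```python
import numpy as np

d = 4
f = np.random.rand(d, d)  # f(x, y)
g = np.random.rand(d, d)  # g(y, z)

# sum_{x,y} f(x,y) g(y,z): explicit d x d x d product, O(d^3).
lhs = (f[:, :, None] * g[None, :, :]).sum(axis=(0, 1))

# sum_y g(y,z) * (sum_x f(x,y)): factor the x-sum out, O(d^2).
rhs = f.sum(axis=0) @ g

print(np.allclose(lhs, rhs))  # True
```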

  21. Inference: back to example 1. Take 2 objective: $\tilde p(x_n) = \sum_{x_{n-1}} \cdots \sum_{x_1} \phi_1(x_1, x_2) \cdots \phi_{n-1}(x_{n-1}, x_n)$. Systematically apply the factorization: $\tilde p(x_n) = \sum_{x_{n-1}} \phi_{n-1}(x_{n-1}, x_n) \sum_{x_{n-2}} \phi_{n-2}(x_{n-2}, x_{n-1}) \cdots \sum_{x_1} \phi_1(x_1, x_2)$. The complexity is $O(n d^2)$ instead of $O(d^n)$ (sketch below).
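The same chain as in Take 1, now with the sums pushed inside (innermost sum over $x_1$ first); each step is a length-$d$ vector times a $d \times d$ table, so the whole pass is $O(n d^2)$:

```python
import numpy as np

n, d = 6, 3
phis = [np.random.rand(d, d) for _ in range(n - 1)]

# m(x_{i+1}) = sum_{x_i} m(x_i) * phi_i(x_i, x_{i+1}), starting from m = 1.
m = np.ones(d)
for phi in phis:      # eliminate x_1, then x_2, ..., then x_{n-1}
    m = m @ phi       # O(d^2) per elimination step

p_xn = m / m.sum()    # normalize p~(x_n)
print(p_xn)           # matches Take 1's result when run on the same phis
```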

  22. Inference: example 2. Objective: $p(x_1 \mid \bar x_6) = \frac{p(x_1, \bar x_6)}{p(\bar x_6)}$; $P(X_1 \mid X_6 = \bar x_6)$ is another way to write it (the bar notation is used in Jordan's textbook). Calculate the numerator; the denominator is then easy: $p(\bar x_6) = \sum_{x_1} p(x_1, \bar x_6)$. (figure: the six-node network; source: Michael Jordan's book)

  23. Inference: example 2. (figure: an elimination step costing $O(d^3)$; source: Michael Jordan's book)

  24. Inference: example 2, computing $p(x_1, \bar x_6)$. (figure: an elimination step costing $O(d^2)$; source: Michael Jordan's book)

  25. Inference: example 2, computing $p(x_1, \bar x_6)$: the message from eliminating $x_4$, $\sum_{x_4} p(x_4 \mid x_2) = 1$, is constant; the steps shown cost $O(d^3)$ and $O(d^2)$.

  26. Inference: example 2. The overall complexity is $O(d^3)$, instead of the $O(d^5)$ we would pay if we had built the 5-dimensional array for $p(x_1, x_2, x_3, x_4, x_5 \mid \bar x_6)$; in the general case, $O(d^n)$.

  27. Inference: example 2 (undirected version).
  $p(x_1, \bar x_6) = \frac{1}{Z} \sum_{x_2, \dots, x_6} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, \phi(x_2, x_5, x_6)\, \delta(x_6, \bar x_6)$
  using a delta function for conditioning:
  $\delta(x_6, \bar x_6) \triangleq \begin{cases} 1, & \text{if } x_6 = \bar x_6 \\ 0, & \text{otherwise} \end{cases}$
  add it as a local potential (code sketch below).
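The delta trick in code, assuming the observed value has index `x6_bar` (illustrative); multiplying the local potential by this 0/1 vector and summing reproduces the elimination step $m_6(x_2, x_5)$ on the next slide:

```python
import numpy as np

d, x6_bar = 3, 1  # x6 observed to take its second state (illustrative)

# delta(x6, x6_bar): 1 where x6 equals the evidence, 0 elsewhere.
delta = np.zeros(d)
delta[x6_bar] = 1.0

phi_256 = np.random.rand(d, d, d)     # phi(x2, x5, x6)

# Multiply in the delta and eliminate x6: m6(x2, x5).
m6 = (phi_256 * delta).sum(axis=2)    # equals phi_256[:, :, x6_bar]
```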

  28. Inference: example 2 (undirected version). Every step remains the same:
  $p(x_1, \bar x_6) = \frac{1}{Z} \sum_{x_2, \dots, x_6} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, \phi(x_2, x_5, x_6)\, \delta(x_6, \bar x_6)$
  $= \frac{1}{Z} \sum_{x_2, \dots, x_5} \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_5)\, m_6(x_2, x_5)$
  $\;\vdots$
  $= \frac{1}{Z} \sum_{x_2} \phi(x_1, x_2)\, m_4(x_2) \sum_{x_3} \phi(x_1, x_3)\, m_5(x_2, x_3)$
  $= \frac{1}{Z} \sum_{x_2} \phi(x_1, x_2)\, m_4(x_2)\, m_3(x_1, x_2)$
  $= \frac{1}{Z} m_2(x_1)$
  except: in Bayes-nets $Z = 1$; at this point normalization is easy!

  29. Variable elimination
  input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \dots, \phi_K\}$
  output: $\sum_{x_{i_1}, \dots, x_{i_m}} \prod_k \phi_k(D_k)$
  go over $x_{i_1}, \dots, x_{i_m}$ in some order; at step $t$:
  collect all the relevant factors: $\Psi_t = \{\phi \in \Phi^{t-1} \mid x_{i_t} \in \mathrm{Scope}[\phi]\}$
  calculate their product: $\psi_t = \prod_{\phi \in \Psi_t} \phi$
  marginalize out $x_{i_t}$: $\psi'_t = \sum_{x_{i_t}} \psi_t$
  update the set of factors: $\Phi^t = \Phi^{t-1} - \Psi_t + \{\psi'_t\}$
  return the product of the factors in $\Phi^{t=m}$ (sketch below)
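A compact sketch of this loop in Python, assuming each factor is a `(scope, table)` pair with `scope` a tuple of variable names; `eliminate` and the factor encoding are illustrative, not a fixed library API:

```python
import numpy as np
from string import ascii_letters

def eliminate(factors, order):
    """Sum-product variable elimination over (scope, table) factors."""
    factors = list(factors)
    for var in order:                                  # x_{i_1}, ..., x_{i_m}
        psi = [f for f in factors if var in f[0]]      # collect relevant factors
        rest = [f for f in factors if var not in f[0]]
        if not psi:
            continue
        # union of the scopes, first-seen order; drop `var` from the output
        scope = tuple(dict.fromkeys(v for s, _ in psi for v in s))
        out = tuple(v for v in scope if v != var)
        lab = {v: ascii_letters[i] for i, v in enumerate(scope)}
        spec = (','.join(''.join(lab[v] for v in s) for s, _ in psi)
                + '->' + ''.join(lab[v] for v in out))
        # product of the collected factors, with `var` summed out
        psi_prime = np.einsum(spec, *[t for _, t in psi])
        factors = rest + [(out, psi_prime)]            # update the factor set
    return factors
```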

  30. Variable elimination: example. Input: a set of factors (e.g. CPDs) $\Phi^0 = \{p(x_2 \mid x_1),\, p(x_3 \mid x_1),\, p(\bar x_6 \mid x_2, x_5),\, p(x_4 \mid x_2),\, p(x_5 \mid x_3)\}$; output: $\sum_{x_{i_1}, \dots, x_{i_m}} \prod_k \phi_k(D_k)$ (run below).
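Running the sketch from the previous slide on this factor set, with random tables standing in for the CPDs, $p(x_1)$ added so the final product is $\tilde p(x_1, \bar x_6)$, and the evidence $\bar x_6$ already folded into a $d \times d$ table as on the slide (all values illustrative):

```python
import numpy as np

d = 3
rnd = np.random.rand
factors = [
    (('x1',), rnd(d)),           # p(x1)
    (('x1', 'x2'), rnd(d, d)),   # p(x2 | x1)
    (('x1', 'x3'), rnd(d, d)),   # p(x3 | x1)
    (('x2', 'x5'), rnd(d, d)),   # p(x6_bar | x2, x5), x6 fixed to the evidence
    (('x2', 'x4'), rnd(d, d)),   # p(x4 | x2)
    (('x3', 'x5'), rnd(d, d)),   # p(x5 | x3)
]

out = eliminate(factors, order=['x5', 'x4', 'x3', 'x2'])
p_x1_e = np.ones(d)
for _, table in out:             # remaining factors all have scope ('x1',)
    p_x1_e = p_x1_e * table      # product = p~(x1, x6_bar)
print(p_x1_e / p_x1_e.sum())     # p(x1 | x6_bar)
```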
