Probabilistic Graphical Models

Probabilistic Graphical Models: Undirected Models. Fall 2019, Siamak Ravanbakhsh. Learning objectives: Markov networks: how they represent a probability distribution; independence assumptions


  1. Probabilistic Graphical Models: Undirected Models. Fall 2019, Siamak Ravanbakhsh

  2. Learning objectives. Markov networks: how they represent a probability distribution; independence assumptions; Hammersley-Clifford theorem; factorization. Representations: factor graphs; log-linear models.

  3. Challenge. Given the following set of CIs, draw their DAG: $I(P) = \{(A \perp C \mid B, D),\ (D \perp B \mid A, C)\}$. [Figure: two candidate DAGs over A, B, C, D.]

  4. Challenge. Given the following set of CIs, draw their DAG: $I(P) = \{(A \perp C \mid B, D),\ (D \perp B \mid A, C)\}$. [Figure: three candidate DAGs over A, B, C, D.]

  5. Challenge. Given the following set of CIs, draw their DAG: $I(P) = \{(A \perp C \mid B, D),\ (D \perp B \mid A, C)\}$. A DAG cannot be a P-map for $P$; an undirected model can!

  6. Challenge. Given the following set of CIs, draw their DAG: $I(P) = \{(A \perp C \mid B, D),\ (D \perp B \mid A, C)\}$. A DAG cannot be a P-map for $P$; an undirected model can! [Figure: undirected graph over A, B, C, D.]

  7. Motivation. Statistical physics: the Ising model of ferromagnetism. CIs are naturally expressed using an undirected model. Image: https://web.stanford.edu/~peastman/statmech/phasetransitions.html

  8. Motivation. Social sciences. CIs are naturally expressed using an undirected model.

  9. Motivation. Combinatorial problems, e.g. graph coloring. CIs are naturally expressed using an undirected model.

  10. Factorization in Markov networks. $P(A,B,C,D) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,D)\,\phi_4(A,D)$, where $Z = \sum_{a,b,c,d} \phi_1(a,b)\,\phi_2(b,c)\,\phi_3(c,d)\,\phi_4(a,d)$ is a normalization constant (the partition function) and each $\phi_i$ is called a factor (potential), e.g. $\phi_1 : \mathrm{Val}(A,B) \to [0, +\infty)$. [Figure: undirected 4-cycle over A, B, C, D with $\phi_1, \dots, \phi_4$ on its edges.]
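
A minimal sketch of this factorization, assuming binary variables and made-up potential values (the slide gives no numbers). The names `unnormalized`, `Z` and `P` are illustrative only; the partition function is computed by brute-force enumeration, which is feasible only for tiny models.

```python
from itertools import product

# Hypothetical pairwise potentials on binary variables (illustrative values only).
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}    # phi1(A, B)
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # phi2(B, C)
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}  # phi3(C, D)
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # phi4(A, D)


def unnormalized(a, b, c, d):
    """Product of the four edge potentials for one joint assignment."""
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[a, d]


states = list(product((0, 1), repeat=4))

# Partition function: sum of the unnormalized measure over all assignments.
Z = sum(unnormalized(*x) for x in states)

# Normalized joint distribution P(A, B, C, D).
P = {x: unnormalized(*x) / Z for x in states}

print("Z =", Z)
print("sum of P =", sum(P.values()))
```

Rescaling any single potential changes `Z` but, after normalization, not `P`.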

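  11. MRF: Conditional Independencies. Grouping the factors of $P(A,B,C,D) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,D)\,\phi_4(A,D)$ as $f(B,A,C) = \phi_1(A,B)\,\phi_2(B,C)$ and $g(D,A,C) = \phi_3(C,D)\,\phi_4(A,D)$ gives $P \models (B \perp D \mid A, C)$. Regrouping the same product as $P(A,B,C,D) = \frac{1}{Z}\,\big(\phi_1(A,B)\,\phi_4(A,D)\big)\,\big(\phi_3(C,D)\,\phi_2(B,C)\big)$ gives $P \models (A \perp C \mid B, D)$. [Figure: the 4-cycle over A, B, C, D with $\phi_1, \dots, \phi_4$ on its edges; annotation "assignment (?)".]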
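
The two independence statements can be checked numerically. The sketch below assumes binary variables and hypothetical potential values; `marginal` and `cond_indep` are helper names introduced here, and the check tests $P(x, y, z)\,P(z) = P(x, z)\,P(y, z)$ for every assignment.

```python
from itertools import product

VARS = ("A", "B", "C", "D")

# Hypothetical potentials on the 4-cycle (illustrative values only).
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}    # phi1(A, B)
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # phi2(B, C)
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}  # phi3(C, D)
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # phi4(A, D)

states = list(product((0, 1), repeat=4))
score = {(a, b, c, d): phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[a, d]
         for a, b, c, d in states}
Z = sum(score.values())
P = {x: s / Z for x, s in score.items()}


def marginal(keep):
    """Marginal of P over the variables named in `keep` (a tuple of names)."""
    idx = [VARS.index(v) for v in keep]
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(x, y, given, tol=1e-12):
    """True if P(x, y, z) * P(z) == P(x, z) * P(y, z) for all assignments."""
    pxyz = marginal((x, y) + given)
    pxz = marginal((x,) + given)
    pyz = marginal((y,) + given)
    pz = marginal(given)
    return all(
        abs(p * pz[k[2:]] - pxz[(k[0],) + k[2:]] * pyz[(k[1],) + k[2:]]) < tol
        for k, p in pxyz.items()
    )


b_d_given_ac = cond_indep("B", "D", ("A", "C"))
a_c_given_bd = cond_indep("A", "C", ("B", "D"))
print(b_d_given_ac, a_c_given_bd)
```

Both checks hold for any choice of non-negative potentials on this graph, because they follow from the grouping argument alone rather than from the particular numbers.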

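  12. Product of factors. In $P(A,B,C,D) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,D)\,\phi_4(A,D)$, the product of $\phi_1 : \mathrm{Val}(A,B) \to \mathbb{R}^+$ and $\phi_2 : \mathrm{Val}(B,C) \to \mathbb{R}^+$ is a new factor $\psi(A,B,C) : \mathrm{Val}(A,B,C) \to \mathbb{R}^+$, similar to a 3D tensor indexed by $\mathrm{Val}(A) \times \mathrm{Val}(B) \times \mathrm{Val}(C)$. [Figure: the 4-cycle over A, B, C, D with $\phi_1, \dots, \phi_4$ on its edges.]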
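
As a small illustration of the factor product, the sketch below stores each factor as a dictionary from assignments to values. The table entries are hypothetical and the variables are assumed binary.

```python
from itertools import product

# Hypothetical factor tables over binary variables.
phi1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # phi1(A, B)
phi2 = {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 2.5, (1, 1): 3.5}  # phi2(B, C)

# Factor product: psi(a, b, c) = phi1(a, b) * phi2(b, c).
# The shared variable B must take the same value in both factors.
psi = {(a, b, c): phi1[a, b] * phi2[b, c]
       for a, b, c in product((0, 1), repeat=3)}

for assignment in sorted(psi):
    print(assignment, psi[assignment])
```

The result has one entry per joint assignment of (A, B, C), which is why it behaves like a 3D tensor.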

  13. Q: Do factors represent marginals? Simplified example: $P(A,B,C) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)$. [Table: values of $P(A,B,C) \times Z$.] $Z = .25 + .35 + \dots = 1.55$. Marginal probabilities: $P(a^1, b^1) = (.25 + .35)/Z \approx .39$ and $P(a^1, b^2) = (.08 + .16)/Z \approx .15$. Compare to $\phi_1(a^1, b^1) = .5$ and $\phi_1(a^1, b^2) = .8$: the factor values are not the marginals.
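
The same comparison can be run on a small chain model. In this sketch the first row of `phi1` reuses the two factor values quoted on the slide; every other entry is made up, so `Z` and the marginals differ from the slide's numbers.

```python
from itertools import product

# phi1(A, B): index 0 stands for a^1 / b^1, index 1 for a^2 / b^2.
phi1 = {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.3}  # second row is hypothetical
# phi2(B, C): all entries hypothetical.
phi2 = {(0, 0): 0.5, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.2}

Z = sum(phi1[a, b] * phi2[b, c] for a, b, c in product((0, 1), repeat=3))

# Marginal P(A, B): sum out C, then normalize.
marg_ab = {
    (a, b): sum(phi1[a, b] * phi2[b, c] for c in (0, 1)) / Z
    for a, b in product((0, 1), repeat=2)
}

for ab in sorted(marg_ab):
    print(ab, "factor:", phi1[ab], "marginal:", round(marg_ab[ab], 3))
```

The marginal of (A, B) depends on `phi2` and on `Z` as well as on `phi1`, so there is no reason for it to coincide with `phi1`.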

  14. Factorization: general form. $P$ factorizes over the cliques: $P(X) = \frac{1}{Z} \prod_k \phi_k(D_k)$ (a Gibbs distribution). This can always be converted to a factorization over maximal cliques.

  15. Factorization: general form. $P$ factorizes over cliques: $P(X) = \frac{1}{Z} \prod_k \phi_k(D_k)$. Rewrite as a factorization over maximal cliques. Original form of $P$: $P(A,B,C,D) \propto \phi_1(A,B)\,\phi_2(A,D)\,\phi_3(B,D)\,\phi_4(C,D)\,\phi_5(B,C)$. Factorized over maximal cliques: $P(A,B,C,D) \propto \psi_1(A,B,D)\,\psi_2(B,C,D)$, since the edge $A$-$D$ must be covered and $\{A,B,D\}$, $\{B,C,D\}$ are the maximal cliques of this graph.
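
A minimal sketch of the conversion, assuming binary variables and random positive pairwise factors. It takes the graph implied by the five pairwise scopes, whose maximal cliques are {A, B, D} and {B, C, D}, and absorbs each pairwise factor into one maximal clique that contains its scope. The assignment of factors to cliques is one valid choice among several (for instance, `phi3(B, D)` could go into either clique).

```python
import random
from itertools import product

rng = random.Random(0)


def random_pairwise():
    """Hypothetical positive pairwise factor over two binary variables."""
    return {xy: rng.uniform(0.1, 2.0) for xy in product((0, 1), repeat=2)}


# Scopes: phi1(A, B), phi2(A, D), phi3(B, D), phi4(C, D), phi5(B, C).
phi1, phi2, phi3, phi4, phi5 = (random_pairwise() for _ in range(5))

# Absorb each pairwise factor into a maximal clique containing its scope.
psi1 = {(a, b, d): phi1[a, b] * phi2[a, d] * phi3[b, d]
        for a, b, d in product((0, 1), repeat=3)}
psi2 = {(b, c, d): phi4[c, d] * phi5[b, c]
        for b, c, d in product((0, 1), repeat=3)}


def pairwise_score(a, b, c, d):
    return phi1[a, b] * phi2[a, d] * phi3[b, d] * phi4[c, d] * phi5[b, c]


def clique_score(a, b, c, d):
    return psi1[a, b, d] * psi2[b, c, d]


# Both parametrizations define the same unnormalized measure.
max_gap = max(abs(pairwise_score(*x) - clique_score(*x))
              for x in product((0, 1), repeat=4))
print("largest difference:", max_gap)
```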

  16. Factorized form: directed vs undirected. Bayesian networks: $P(X) = \prod_i P(X_i \mid \mathrm{Pa}_{X_i})$; no partition function; each factor is a conditional distribution; one factor per variable. Markov networks: $P(X) = \frac{1}{Z} \prod_k \phi_k(D_k)$.

  17. Conditioning on the evidence. Given $P(X) \propto \prod_k \phi_k(D_k)$, how do we obtain $P(X \mid U = u)$? Fix the evidence in the relevant factors: $P(X \mid U = u) \propto \prod_k \phi_k[U = u]$. Example: $\phi_k(A,B,C)$ conditioned on $C = c^1$ becomes the reduced factor $\phi_k[C = c^1]$.
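
Factor reduction can be sketched as follows. The function name `reduce_factor`, the dictionary representation, and the table values are assumptions made for illustration: the reduced factor keeps only the rows consistent with the evidence and drops the observed variable from its scope.

```python
from itertools import product


def reduce_factor(scope, table, evidence):
    """Return (new_scope, new_table) with the evidence variables fixed and dropped.

    scope    -- tuple of variable names, e.g. ("A", "B", "C")
    table    -- dict mapping assignments (tuples in scope order) to values
    evidence -- dict mapping observed variable names to their observed values
    """
    keep = [i for i, v in enumerate(scope) if v not in evidence]
    new_scope = tuple(scope[i] for i in keep)
    new_table = {}
    for assignment, value in table.items():
        consistent = all(assignment[i] == evidence[v]
                         for i, v in enumerate(scope) if v in evidence)
        if consistent:
            new_table[tuple(assignment[i] for i in keep)] = value
    return new_scope, new_table


# Hypothetical factor phi(A, B, C) over binary variables.
scope = ("A", "B", "C")
phi = {x: 1.0 + x[0] + 2 * x[1] + 4 * x[2] for x in product((0, 1), repeat=3)}

# Condition on C = 1: the result is a factor over (A, B) only.
reduced_scope, reduced = reduce_factor(scope, phi, {"C": 1})
print(reduced_scope, reduced)
```

Factors that do not mention any evidence variable pass through unchanged, and the product of the reduced factors still has to be renormalized to obtain $P(X \mid U = u)$.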

  18. Conditioning on the evidence: effect on the graphical model. Conditioning cannot create new dependencies; compare this to colliders in Bayes-nets. [Figure: graph with evidence $S = s$ and $G = g$.]

  19. Pairwise conditional independencies. Non-adjacent nodes are independent given everything else: $X \perp Y \mid \mathcal{X} - \{X, Y\}$. [Figure: two non-adjacent nodes X and Y.]

  20. Local conditional independencies. $MB^H(X)$: Markov blanket of node $X$ in graph $H$. $X \perp \mathcal{X} - X - MB^H(X) \mid MB^H(X)$: given its Markov blanket, $X$ is independent of every other variable.

  21. Local conditional independencies. $MB^H(X)$: Markov blanket of $X$ in graph $H$, with $X \perp \mathcal{X} - X - MB^H(X) \mid MB^H(X)$. $MB^G(X)$: Markov blanket of $X$ in DAG $G$ (parents, children, and parents of children), with $X \perp \mathcal{X} - X - MB^G(X) \mid MB^G(X)$.
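
The two notions of Markov blanket can be sketched side by side. The graphs below are hypothetical examples, and `mb_undirected` and `mb_dag` are names introduced here for illustration.

```python
def mb_undirected(adj, x):
    """Markov blanket in an undirected graph: the neighbors of x."""
    return set(adj[x])


def mb_dag(parents, x):
    """Markov blanket in a DAG: parents, children, and parents of children."""
    children = {v for v, ps in parents.items() if x in ps}
    coparents = {p for c in children for p in parents[c]}
    return (set(parents[x]) | children | coparents) - {x}


# Hypothetical undirected 4-cycle A - B - C - D - A (adjacency lists).
cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}

# Hypothetical DAG given as parent lists: A -> C <- B, C -> D.
dag_parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}

print(mb_undirected(cycle, "A"))  # neighbors of A in the cycle
print(mb_dag(dag_parents, "A"))   # child C and co-parent B
print(mb_dag(dag_parents, "C"))   # parents A, B and child D
```

In the DAG, the co-parent B enters the blanket of A because conditioning on the common child C couples the two parents.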

  22. Global conditional independencies. $X \perp Y \mid Z$ iff every path between $X$ and $Y$ is blocked by $Z$. This is much simpler than D-separation. [Figure: node sets X and Y separated by Z.]
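
Graph separation reduces to a reachability test: remove the nodes in $Z$ and ask whether any node of $Y$ can still be reached from $X$. The sketch below uses breadth-first search on a hypothetical 4-cycle; the function name `separated` is an assumption.

```python
from collections import deque


def separated(adj, xs, ys, zs):
    """True if every path from xs to ys in the undirected graph passes through zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    frontier = deque(x for x in xs if x not in zs)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in ys:
            return False  # found a path that avoids zs
        for v in adj[u]:
            if v not in zs and v not in seen:
                seen.add(v)
                frontier.append(v)
    return True


# Hypothetical undirected 4-cycle A - B - C - D - A.
cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}

print(separated(cycle, {"A"}, {"C"}, {"B", "D"}))  # both paths are blocked
print(separated(cycle, {"A"}, {"C"}, {"B"}))       # the path through D stays open
```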

  23. Relationship between the three. pairwise $I_p$ $\Leftarrow$ local $I_\ell$ $\Leftarrow$ global $I$. [Figure: three example graphs over X and Y, with statements of the form $(X \perp Y \mid Z)$, $(X \perp Y \mid Z')$, $(X \perp Y \mid Z'')$.]

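  24. Relationship between the three. pairwise $I_p$ $\Leftarrow$ local $I_\ell$ $\Leftarrow$ global $I$. For positive distributions ($P > 0$) the converse also holds: pairwise $I_p$ $\Rightarrow$ local $I_\ell$ $\Rightarrow$ global $I$. [Figure: three example graphs over X and Y, with statements of the form $(X \perp Y \mid Z)$, $(X \perp Y \mid Z')$, $(X \perp Y \mid Z'')$.]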

  25. Factorization & independence. Recall this relationship in Bayesian networks: factorization according to a DAG is equivalent to the local & global CIs (same family of distributions). Is it similar for Markov networks? Is factorization according to an undirected graph equivalent to the pairwise, local & global CIs?

  26. Factorization & independence. Is it similar for Markov networks (factorization according to an undirected graph vs. pairwise, local & global CIs)? Short answer: for positive distributions they are equivalent.

  27. Factorization $\Rightarrow$ CI. Given $P(X) \propto \prod_k \phi_k(C_k)$, does the local CI hold? [Figure: graph highlighting node $X_i$.]

  28. Factorization $\Rightarrow$ CI. Given $P(X) \propto \prod_k \phi_k(C_k)$, does the local CI hold? Proof: [Figure: graph highlighting node $X_i$.]

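  29. Factorization $\Rightarrow$ CI. Given $P(X) \propto \prod_k \phi_k(C_k)$, does the local CI hold? Proof: split the product by whether the clique contains $X_i$: $P(X) \propto \prod_k \phi_k(C_k) = \Big(\prod_{k : X_i \in C_k} \phi_k(C_k)\Big)\Big(\prod_{k : X_i \notin C_k} \phi_k(C_k)\Big) = f\big(X_i, MB^H(X_i)\big)\, g\big(\mathcal{X} - X_i\big)$, because every clique containing $X_i$ lies inside $\{X_i\} \cup MB^H(X_i)$. Hence $X_i \perp \mathcal{X} - X_i - MB^H(X_i) \mid MB^H(X_i)$.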
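
The last implication can be spelled out in one line. Writing $\mathbf{x}_{-i}$ for an assignment to all variables except $X_i$ and $\mathbf{mb}$ for its restriction to $MB^H(X_i)$ (notation introduced here), and assuming the conditional is defined, the factor $g$ cancels:

```latex
P(x_i \mid \mathbf{x}_{-i})
  = \frac{f(x_i, \mathbf{mb})\, g(\mathbf{x}_{-i})}
         {\sum_{x_i'} f(x_i', \mathbf{mb})\, g(\mathbf{x}_{-i})}
  = \frac{f(x_i, \mathbf{mb})}{\sum_{x_i'} f(x_i', \mathbf{mb})}
```

The right-hand side depends on $\mathbf{x}_{-i}$ only through $\mathbf{mb}$, which is exactly the local independence statement.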

  30. CI $\Rightarrow$ factorization. Hammersley-Clifford theorem: if $P$ is strictly positive and satisfies the CIs $I(H)$, then $P$ factorizes over $H$. The proof needs the canonical parametrization.

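  31. Parametrization: redundancy. Is this representation of $P$ unique? $P(A,B,C) \propto \phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,A)$ on the triangle A, B, C, with tables: $\phi_1(A,B)$ (columns $b^0, b^1$): $a^0$: 1, 2; $a^1$: 2, 1. $\phi_2(B,C)$ (columns $b^0, b^1$): $c^0$: 1, 1; $c^1$: 2, 2. $\phi_3(C,A)$ (columns $c^0, c^1$): $a^0$: 4, 8; $a^1$: 16, 1.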

  32. Parametrization: redundancy. Is this representation of $P$ unique? $P(A,B,C) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,A)$. Multiplying all factors by a constant only affects $Z$: $\phi_1(A,B)$ (columns $b^0, b^1$): $a^0$: 10, 20; $a^1$: 20, 10. $\phi_2(B,C)$ (columns $b^0, b^1$): $c^0$: 10, 10; $c^1$: 20, 20. $\phi_3(C,A)$ (columns $c^0, c^1$): $a^0$: .4, .8; $a^1$: 1.6, .1.
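
The claim can be checked numerically with the slide's tables, as read from the extracted text (the row and column layout is an assumption). In the sketch, `phi1` and `phi2` are scaled by 10 and `phi3` by 0.1, matching the rescaled tables; `normalize` is a helper name introduced here.

```python
from itertools import product


def normalize(phi1, phi2, phi3):
    """Return (Z, P) for P(a, b, c) proportional to phi1(a, b) phi2(b, c) phi3(c, a)."""
    score = {(a, b, c): phi1[a, b] * phi2[b, c] * phi3[c, a]
             for a, b, c in product((0, 1), repeat=3)}
    Z = sum(score.values())
    return Z, {x: s / Z for x, s in score.items()}


# Original tables, keyed in the argument order of each factor.
phi1 = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 1}   # phi1(a, b)
phi2 = {(0, 0): 1, (1, 0): 1, (0, 1): 2, (1, 1): 2}   # phi2(b, c)
phi3 = {(0, 0): 4, (1, 0): 8, (0, 1): 16, (1, 1): 1}  # phi3(c, a)

# Rescaled tables: each factor multiplied by its own constant.
scaled1 = {k: 10 * v for k, v in phi1.items()}
scaled2 = {k: 10 * v for k, v in phi2.items()}
scaled3 = {k: 0.1 * v for k, v in phi3.items()}

Z, P = normalize(phi1, phi2, phi3)
Z_scaled, P_scaled = normalize(scaled1, scaled2, scaled3)

max_gap = max(abs(P[x] - P_scaled[x]) for x in P)
print("Z:", Z, "Z after rescaling:", Z_scaled)
print("largest change in P:", max_gap)
```

The constants multiply into `Z` (here by a net factor of 10) and cancel in the normalized distribution.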

  33. Parametrization: redundancy. Is this representation of $P$ unique? $P(A,B,C) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,A)$. Use the logarithmic form: $P(A,B,C) = \frac{1}{Z}\, 2^{\psi_1(A,B) + \psi_2(B,C) + \psi_3(C,A)}$, with log-values: $\psi_1(A,B)$ (columns $b^0, b^1$): $a^0$: 0, 1; $a^1$: 1, 0. $\psi_2(B,C)$ (columns $b^0, b^1$): $c^0$: 0, 0; $c^1$: 1, 1. $\psi_3(C,A)$ (columns $c^0, c^1$): $a^0$: 2, 3; $a^1$: 4, 0.

  34. Parametrization: redundancy. Is this representation of $P$ unique? $P(A,B,C) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,A)\,\phi_4(B)\,\phi_5(A)\,\phi_6(C)$. Use the logarithmic form $P(A,B,C) = \frac{1}{Z}\, 2^{\psi_1(A,B) + \psi_2(B,C) + \psi_3(C,A)}$ and simplify using local potentials: $P(A,B,C) = \frac{1}{Z}\, 2^{\psi_1(A,B) + \psi_2(B,C) + \psi_3(C,A) + \psi_4(B) + \psi_5(A) + \psi_6(C)}$, with log-values: $\psi_1(A,B)$ (columns $b^0, b^1$): $a^0$: 0, 1; $a^1$: 1, 0. $\psi_2(B,C)$ (columns $b^0, b^1$): $c^0$: 0, 0; $c^1$: 0, 0. $\psi_3(C,A)$ (columns $c^0, c^1$): $a^0$: 0, 1; $a^1$: 4, 0. $\psi_4(B)$: $b^0$: 0; $b^1$: 0. $\psi_5(A)$: $a^0$: 2; $a^1$: 0. $\psi_6(C)$: $c^0$: 0; $c^1$: 1.

  35. Parametrization: redundancy. Is this representation of $P$ unique? $P(A,B,C) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,A)\,\phi_5(A)\,\phi_6(C)$. Use the logarithmic form $P(A,B,C) = \frac{1}{Z}\, 2^{\psi_1(A,B) + \psi_2(B,C) + \psi_3(C,A)}$ and simplify using local potentials: $P(A,B,C) = \frac{1}{Z}\, 2^{\psi_1(A,B) + \psi_2(B,C) + \psi_3(C,A) + \psi_4(B) + \psi_5(A) + \psi_6(C)}$. Log-value tables shown: $\psi_1(A,B)$ (columns $b^0, b^1$): $a^0$: 0, 1; $a^1$: 1, 0. $\psi_3(C,A)$ (columns $c^0, c^1$): $a^0$: 0, 1; $a^1$: 4, 0. $\psi_5(A)$: $a^0$: 2; $a^1$: 0. $\psi_6(C)$: $c^0$: 0; $c^1$: 1.
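
A short check that moving mass into local potentials leaves the distribution unchanged. The tables below are a reading of the garbled slide text, so their layout is an assumption: the base-2 log tables of the pairwise-only form, and a reparametrized version in which the constant row of $\psi_2$ moves into a local potential on C and an offset of 2 moves from the $a^0$ row of $\psi_3$ into a local potential on A.

```python
from itertools import product

# Pairwise-only log2 potentials, keyed in the argument order of each factor.
psi1 = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # psi1(a, b)
psi2 = {(0, 0): 0, (1, 0): 0, (0, 1): 1, (1, 1): 1}  # psi2(b, c)
psi3 = {(0, 0): 2, (1, 0): 3, (0, 1): 4, (1, 1): 0}  # psi3(c, a)

# Reparametrized version with local potentials (assumed reading of the slide).
psi2_new = {(b, c): 0 for b, c in product((0, 1), repeat=2)}
psi3_new = {(0, 0): 0, (1, 0): 1, (0, 1): 4, (1, 1): 0}
psi_b = {0: 0, 1: 0}  # local potential on B
psi_a = {0: 2, 1: 0}  # local potential on A
psi_c = {0: 0, 1: 1}  # local potential on C


def score_pairwise(a, b, c):
    return 2 ** (psi1[a, b] + psi2[b, c] + psi3[c, a])


def score_with_local(a, b, c):
    return 2 ** (psi1[a, b] + psi2_new[b, c] + psi3_new[c, a]
                 + psi_b[b] + psi_a[a] + psi_c[c])


states = list(product((0, 1), repeat=3))
same = all(score_pairwise(*x) == score_with_local(*x) for x in states)
print("same unnormalized measure:", same)
```

Because both parametrizations give identical unnormalized scores, they share the same `Z` and the same distribution, which is the redundancy the slides point at.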
