
Graphical Models: Bayesian Networks. Siamak Ravanbakhsh, Fall 2019. Previously on Probabilistic Graphical Models: probability distributions and density functions; random variables.


  1. Bayesian networks: conditional independencies. The quality of the letter (L) only depends on the grade (G): L ⊥ D, I, S ∣ G. How about the following assertions? D ⊥ S? D ⊥ S ∣ I?

  2. Bayesian networks: conditional independencies. The quality of the letter (L) only depends on the grade (G): L ⊥ D, I, S ∣ G. How about the following assertions? D ⊥ S? D ⊥ S ∣ I?

  3. Bayesian networks: conditional independencies. The quality of the letter (L) only depends on the grade (G): L ⊥ D, I, S ∣ G. How about the following assertions? D ⊥ S? D ⊥ S ∣ I? D ⊥ S ∣ L?

  4. Bayesian networks: conditional independencies. The quality of the letter (L) only depends on the grade (G): L ⊥ D, I, S ∣ G. How about the following assertions? D ⊥ S? D ⊥ S ∣ I? D ⊥ S ∣ L? Why?

  5. Bayesian networks: conditional independencies. The quality of the letter (L) only depends on the grade (G): L ⊥ D, I, S ∣ G. How about the following assertions? D ⊥ S? D ⊥ S ∣ I? D ⊥ S ∣ L? Why? Can we read these from the graph?

  6. Conditional independencies (CIs): notation. 1. I(P): the set of all CIs of the distribution P. 2. I_ℓ(G): the set of local CIs from the graph (DAG) G. 3. I(G): the set of all (global) CIs from the graph.

  7. Local conditional independencies (CIs): for any node X_i, X_i ⊥ NonDescendants_{X_i} ∣ Parents_{X_i}. For graph G: I_ℓ(G) = { D ⊥ I, S;  I ⊥ D;  G ⊥ S ∣ I, D;  S ⊥ G, L, D ∣ I;  L ⊥ D, I, S ∣ G }.
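
A minimal sketch (not from the slides) of reading these local CIs off a DAG programmatically. The `parents` dictionary encodes the student network (D, I → G; I → S; G → L) and the helper names are illustrative:

```python
# Enumerate the local CIs  X_i ⊥ NonDescendants(X_i) | Parents(X_i)  for a DAG
# given as {node: list of parents}.
parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    children = {v: [c for c, ps in parents.items() if v in ps] for v in parents}
    stack, seen = list(children[node]), set()
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children[c])
    return seen

for x in parents:
    nondesc = set(parents) - {x} - descendants(x) - set(parents[x])
    print(f"{x} ⊥ {', '.join(sorted(nondesc)) or '∅'} | {', '.join(sorted(parents[x])) or '∅'}")
# prints exactly the set I_ℓ(G) listed on this slide
```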

  8. Local CIs from factorization: use the factorized form P(X) = ∏_i P(X_i ∣ Pa_{X_i}) to show that for every X_i, P(X_i, NonDesc_{X_i} ∣ Pa_{X_i}) = P(X_i ∣ Pa_{X_i}) P(NonDesc_{X_i} ∣ Pa_{X_i}), which means X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i}.

  9. Local CIs from factorization: example. Show S ⊥ G ∣ I, given the factorization P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G).

  10. Local CIs from factorization: example. Show S ⊥ G ∣ I, given P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G). Then P(G, S ∣ I) = [∑_{d,l} P(D, I, G, S, L)] / [∑_{d,g,s,l} P(D, I, G, S, L)].

  11. Local CIs from factorization: example. Show S ⊥ G ∣ I, given P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G). Then P(G, S ∣ I) = [∑_{d,l} P(D, I, G, S, L)] / [∑_{d,g,s,l} P(D, I, G, S, L)] = [∑_{d,l} P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)] / [∑_{d,g,s,l} P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)].

  12. Local CIs from factorization: example. Show S ⊥ G ∣ I, given P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G). Then P(G, S ∣ I) = [∑_{d,l} P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)] / [∑_{d,g,s,l} P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)] = [P(I) P(S ∣ I) ∑_{d,l} P(D) P(G ∣ D, I) P(L ∣ G)] / [P(I) ∑_{d,g,s,l} P(D) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)].

  13. Local CIs from factorization: example. Show S ⊥ G ∣ I, given P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G). Then P(G, S ∣ I) = [P(I) P(S ∣ I) ∑_{d,l} P(D) P(G ∣ D, I) P(L ∣ G)] / [P(I) ∑_{d,g,s,l} P(D) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)] = [P(S ∣ I) ∑_{d,l} P(D) P(G ∣ D, I) P(L ∣ G)] / 1 = P(S ∣ I) P(G ∣ I).
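
A hedged numeric companion to this derivation: the independence S ⊥ G ∣ I holds for any CPTs once the joint factorizes as above, and a brute-force check confirms it. The CPT numbers below are made up for illustration; they are not the lecture's values:

```python
import itertools

# illustrative CPTs (assumed numbers)
p_D = {0: 0.6, 1: 0.4}
p_I = {0: 0.7, 1: 0.3}
p_G = {(d, i): {0: 0.5 + 0.2 * i - 0.1 * d, 1: 0.5 - 0.2 * i + 0.1 * d}
       for d in (0, 1) for i in (0, 1)}                 # P(G | D, I)
p_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}      # P(S | I)
p_L = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}        # P(L | G)

# the factorized joint P(D, I, G, S, L) = P(D) P(I) P(G|D,I) P(S|I) P(L|G)
joint = {(d, i, g, s, l): p_D[d] * p_I[i] * p_G[(d, i)][g] * p_S[i][s] * p_L[g][l]
         for d, i, g, s, l in itertools.product((0, 1), repeat=5)}

def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(g=1, i=0)."""
    idx = {"d": 0, "i": 1, "g": 2, "s": 3, "l": 4}
    return sum(p for assign, p in joint.items()
               if all(assign[idx[k]] == v for k, v in fixed.items()))

# check P(G, S | I) = P(G | I) P(S | I) for every value combination
for i, g, s in itertools.product((0, 1), repeat=3):
    lhs = prob(g=g, s=s, i=i) / prob(i=i)
    rhs = (prob(g=g, i=i) / prob(i=i)) * (prob(s=s, i=i) / prob(i=i))
    assert abs(lhs - rhs) < 1e-12
print("S ⊥ G | I holds for this factorization")
```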

  14. Factorization from local CIs: given the local CIs I_ℓ(G) = { X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i}, ∀i }, find a topological ordering (parents before children) X_{i_1}, ..., X_{i_n}; use the chain rule P(X) = P(X_{i_1}) ∏_{j=2}^{n} P(X_{i_j} ∣ X_{i_1}, ..., X_{i_{j-1}}); simplify using the local CIs: P(X) = P(X_{i_1}) ∏_{j=2}^{n} P(X_{i_j} ∣ Pa_{X_{i_j}}).

  15. Factorization from local CIs: example. Local CIs I_ℓ(G) = { (D ⊥ I, S), (I ⊥ D), (G ⊥ S ∣ I, D), (S ⊥ G, L, D ∣ I), (L ⊥ D, I, S ∣ G) }. A topological ordering: D, I, G, L, S. Use the chain rule: P(D, I, G, S, L) = P(D) P(I ∣ D) P(G ∣ D, I) P(L ∣ D, I, G) P(S ∣ D, I, G, L). Simplify using I_ℓ(G): P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(L ∣ G) P(S ∣ I).
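
A small sketch of the same recipe in code, assuming Python 3.9+ for `graphlib`; the `parents` dictionary (the student network) and the printed strings are assumptions of this example, not lecture code:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}
order = list(TopologicalSorter(parents).static_order())   # parents always precede children

def factor(x, cond):
    return f"P({x} | {', '.join(cond)})" if cond else f"P({x})"

# full chain rule: condition each variable on all of its predecessors in the ordering
chain = [factor(x, order[:j]) for j, x in enumerate(order)]
# local CIs let us drop every non-parent from each conditioning set
reduced = [factor(x, parents[x]) for x in order]

print("chain rule:  P(X) = " + " ".join(chain))
print("simplified:  P(X) = " + " ".join(reduced))
# e.g. simplified:  P(X) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)   (factor order may vary)
```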

  16. Factorization ⇔ local CIs: P factorizes according to G, i.e. P(X) = ∏_i P(X_i ∣ Pa_{X_i}), ⇔ I_ℓ(G) holds in P.

  17. Factorization ⇔ local CIs: P factorizes according to G, i.e. P(X) = ∏_i P(X_i ∣ Pa_{X_i}), ⇔ I_ℓ(G) holds in P, that is, I_ℓ(G) ⊆ I(P).

  18. Factorization ⇔ local CIs: P factorizes according to G, i.e. P(X) = ∏_i P(X_i ∣ Pa_{X_i}), ⇔ I_ℓ(G) holds in P, that is, I_ℓ(G) ⊆ I(P). G is an I-map for P: it does not mislead us about independencies in P.

  19. Perfect map (P-map): which graph G should we use for P? Perfect map: I(G) = I(P). P may not have a P-map in the form of a BN.

  20. Perfect map (P-map): which graph G should we use for P? Perfect map: I(G) = I(P). P may not have a P-map in the form of a BN. Example: p(x, y, z) = 1/12 if x ⊕ y ⊕ z = 0, and 1/6 if x ⊕ y ⊕ z = 1. Then (X ⊥ Y), (Y ⊥ Z), (X ⊥ Z) ∈ I(P), but (X ⊥ Y ∣ Z), (Y ⊥ Z ∣ X), (X ⊥ Z ∣ Y) ∉ I(P).

  21. Perfect map (P-map): which graph G should we use for P? Perfect map: I(G) = I(P). P may not have a P-map in the form of a BN. Example: p(x, y, z) = 1/12 if x ⊕ y ⊕ z = 0, and 1/6 if x ⊕ y ⊕ z = 1. Then (X ⊥ Y), (Y ⊥ Z), (X ⊥ Z) ∈ I(P), but (X ⊥ Y ∣ Z), (Y ⊥ Z ∣ X), (X ⊥ Z ∣ Y) ∉ I(P).
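
A quick numeric check of this example (interpreting the slide's operator as XOR): the pairwise marginal independences hold, but conditioning on the third variable creates dependence, which is why no DAG over these three variables can be a perfect map of P. This is a sketch, not lecture code:

```python
import itertools

joint = {(x, y, z): (1 / 12 if (x ^ y ^ z) == 0 else 1 / 6)
         for x, y, z in itertools.product((0, 1), repeat=3)}

def prob(**fixed):
    """Marginal probability of a partial assignment over (x, y, z)."""
    idx = {"x": 0, "y": 1, "z": 2}
    return sum(p for a, p in joint.items() if all(a[idx[k]] == v for k, v in fixed.items()))

# pairwise marginal independence, e.g. X ⊥ Y: P(x, y) = P(x) P(y) for all values
assert all(abs(prob(x=x, y=y) - prob(x=x) * prob(y=y)) < 1e-12
           for x in (0, 1) for y in (0, 1))

# but X ⊥ Y | Z fails: P(X=0, Y=0 | Z=0) = 1/6 while P(X=0 | Z=0) P(Y=0 | Z=0) = 1/4
print(prob(x=0, y=0, z=0) / prob(z=0),
      (prob(x=0, z=0) / prob(z=0)) * (prob(y=0, z=0) / prob(z=0)))
```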

  22. Summary so far: simplification of the chain rule, P(X) = ∏_i P(X_i ∣ Pa_{X_i}); a Bayes-net is represented using a DAG (for example, naive Bayes); the local conditional independencies I_ℓ(G) = { X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i}, ∀i } hold in a Bayes-net and, conversely, imply the Bayes-net factorization. Note: the motivation is not just a compressed representation, but faster inference and learning as well.

  23. Global CIs from the graph: for any subsets of variables X, Y and Z, we can ask: X ⊥ Y ∣ Z? Global CIs: the set of all such CIs.

  24. Global CIs from the graph: for any subsets of variables X, Y and Z, we can ask: X ⊥ Y ∣ Z? Global CIs: the set of all such CIs. Factorized form of P ⇒ global CIs: I_ℓ(G) ⊆ I(G) ⊆ I(P).

  25. Global CIs from the graph: for any subsets of variables X, Y and Z, we can ask: X ⊥ Y ∣ Z? Global CIs: the set of all such CIs. Factorized form of P ⇒ global CIs: I_ℓ(G) ⊆ I(G) ⊆ I(P). Example: C ⊥ D ∣ B, F? Algorithm: directed separation (d-separation).

  26. Three canonical settings for three random variables. 1. Causal / evidence trail X → Y → Z: P(X, Y, Z) = P(X) P(Y ∣ X) P(Z ∣ Y).

  27. Three canonical settings. 1. Causal / evidence trail X → Y → Z: P(X, Y, Z) = P(X) P(Y ∣ X) P(Z ∣ Y). Marginal independence P(X, Z) = P(X) P(Z) does not hold in general. Conditional independence holds: P(Z ∣ X, Y) = P(X, Y, Z) / P(X, Y) = P(X) P(Y ∣ X) P(Z ∣ Y) / (P(X) P(Y ∣ X)) = P(Z ∣ Y), so X ⊥ Z ∣ Y.

  28. Three canonical settings. 2. Common cause X ← Y → Z: P(X, Y, Z) = P(Y) P(X ∣ Y) P(Z ∣ Y).

  29. Three canonical settings. 2. Common cause X ← Y → Z: P(X, Y, Z) = P(Y) P(X ∣ Y) P(Z ∣ Y). Marginal independence P(X, Z) = P(X) P(Z) does not hold in general. Conditional independence holds: P(X, Z ∣ Y) = P(X, Y, Z) / P(Y) = P(X ∣ Y) P(Z ∣ Y).

  30. Three canonical settings. 3. Common effect X → Y ← Z (a.k.a. collider, v-structure): P(X, Y, Z) = P(X) P(Z) P(Y ∣ X, Z).

  31. Three canonical settings. 3. Common effect X → Y ← Z (a.k.a. collider, v-structure): P(X, Y, Z) = P(X) P(Z) P(Y ∣ X, Z). Marginal independence holds: P(X, Z) = ∑_Y P(X, Y, Z) = ∑_Y P(X) P(Z) P(Y ∣ X, Z) = P(X) P(Z). Conditional independence does not hold: P(X, Z ∣ Y) = P(X, Y, Z) / P(Y) ≠ P(X ∣ Y) P(Z ∣ Y) in general.

  32. Three canonical settings. 3. Common effect X → Y ← Z: conditioning on a descendant W of Y also breaks the independence, P(X, Z ∣ W) ≠ P(X ∣ W) P(Z ∣ W) in general; even observing a descendant of Y makes X and Z dependent.
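
A concrete numeric instance of the common-effect case may help: below, X and Z are independent fair coins and Y = OR(X, Z) (an assumed toy CPT, not from the slides). X and Z are marginally independent but become dependent once Y is observed (explaining away):

```python
import itertools

# X, Z: independent fair coins; Y = OR(X, Z) is a deterministic collider
def p_y_given_xz(y, x, z):
    return 1.0 if y == (x | z) else 0.0

joint = {(x, y, z): 0.5 * 0.5 * p_y_given_xz(y, x, z)
         for x, y, z in itertools.product((0, 1), repeat=3)}

def prob(**fixed):
    idx = {"x": 0, "y": 1, "z": 2}
    return sum(p for a, p in joint.items() if all(a[idx[k]] == v for k, v in fixed.items()))

# marginally independent: P(X=1, Z=1) == P(X=1) P(Z=1)
print(prob(x=1, z=1), prob(x=1) * prob(z=1))              # 0.25 0.25
# but dependent given the common effect ("explaining away"):
print(prob(x=1, y=1, z=1) / prob(y=1, z=1),               # P(X=1 | Y=1, Z=1) = 0.5
      prob(x=1, y=1) / prob(y=1))                         # P(X=1 | Y=1)      = 2/3
```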

  33. Putting the three cases together: X_1, X_2 ⊥ Y_1 ∣ Z_1, Z_2? Consider all paths between variables in X and Y. (figure: example DAG over X_1, X_2, Y_1, Z_1, Z_2)

  34. Putting the three cases together: X_1, X_2 ⊥ Y_1 ∣ Z_1, Z_2? Consider all paths between variables in X and Y; so far, X ⊥ Y ∣ Z. (figure: example DAG over X_1, X_2, Y_1, Z_1, Z_2)

  35. Putting the three cases together: X_1, X_2 ⊥ Y_1 ∣ Z_1, Z_2? Consider all paths between variables in X and Y. Had we not observed Z_1, then (X_1, X_2 ⊥ Y_1 ∣ Z_2) ∈ I(G). (figure: example DAG over X_1, X_2, Y_1, Z_1, Z_2)

  36. D-separation (a.k.a. the Bayes-Ball algorithm): X ⊥ Y ∣ Z? See whether at least one ball from X reaches Y while Z is shaded. Image from: https://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html

  37. D-separation: algorithm (linear time complexity). Input: graph G and sets X, Y, Z. Output: X ⊥ Y ∣ Z? Mark the variables in Z and all of their ancestors in G; run breadth-first search starting from X, stopping any trail that reaches a blocked node: either the unmarked middle of a collider (v-structure), or a node that is in Z and is not a collider on the trail. Is any node in Y reached?
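
A sketch of this procedure in Python, using the usual reachability formulation in which each node is visited together with the direction the trail arrived from. The graph encoding (a dict of parent lists) and function name are assumptions of this example, not lecture code; the usage at the end answers the D ⊥ S questions from the opening slides:

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Return True iff X ⊥ Y | Z in the DAG given as {node: list of parents}."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # step 1: mark Z and all ancestors of Z (a collider is active iff it is marked)
    marked, stack = set(), list(Z)
    while stack:
        v = stack.pop()
        if v not in marked:
            marked.add(v)
            stack.extend(parents[v])

    # step 2: BFS over (node, direction) states;
    # 'up' = the trail arrived from a child, 'down' = it arrived from a parent
    reachable, visited = set(), set()
    queue = deque((x, "up") for x in X)
    while queue:
        v, d = queue.popleft()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v not in Z:
            reachable.add(v)
        if d == "up" and v not in Z:
            # non-collider: the trail may continue to parents and to children
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif d == "down":
            if v not in Z:                    # chain/fork: continue to children
                queue.extend((c, "down") for c in children[v])
            if v in marked:                   # active collider: bounce back to parents
                queue.extend((p, "up") for p in parents[v])
    return not (reachable & set(Y))

# the student network used throughout the lecture
student = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}
print(d_separated(student, {"D"}, {"S"}, set()))    # True:  D ⊥ S
print(d_separated(student, {"D"}, {"S"}, {"I"}))    # True:  D ⊥ S | I
print(d_separated(student, {"D"}, {"S"}, {"L"}))    # False: L is a descendant of the collider G
```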

  38. D-separation quiz

  39. D-separation quiz: G ⊥ S ∣ ∅?

  40. D-separation quiz: G ⊥ S ∣ ∅?

  41. D-separation quiz: G ⊥ S ∣ ∅? D ⊥ L ∣ G?

  42. D-separation quiz: G ⊥ S ∣ ∅? D ⊥ L ∣ G?

  43. D-separation quiz: G ⊥ S ∣ ∅? D ⊥ L ∣ G? D ⊥ I, S ∣ ∅?

  44. D-separation quiz: G ⊥ S ∣ ∅? D ⊥ L ∣ G? D ⊥ I, S ∣ ∅?

  45. D-separation quiz: G ⊥ S ∣ ∅? D ⊥ L ∣ G? D ⊥ I, S ∣ ∅? D, L ⊥ S ∣ I, G?

  46. D-separation quiz: G ⊥ S ∣ ∅? D ⊥ L ∣ G? D ⊥ I, S ∣ ∅? D, L ⊥ S ∣ I, G?

  47. Summary: the graph and the distribution are combined. Factorization of the distribution according to the graph G: P(X) = ∏_i P(X_i ∣ Pa_{X_i}). Conditional independencies of the distribution inferred from the graph: local CIs, X_i ⊥ NonDescendants_{X_i} ∣ Parents_{X_i}; global CIs, via d-separation.

  48. Summary: the factorization of the distribution, the local conditional independencies, and the global conditional independencies all identify the same family of distributions.

  49. Bonus slides

  50. Equivalence class of DAGs: two DAGs G and G′ are I-equivalent if I(G) = I(G′); P then factorizes on both of these graphs.

  51. Equivalence class of DAGs: two DAGs G and G′ are I-equivalent if I(G) = I(G′); P then factorizes on both of these graphs. From the d-separation algorithm, a sufficient condition is: same undirected skeleton and same v-structures.

  52. Equivalence class of DAGs: two DAGs G and G′ are I-equivalent if I(G) = I(G′). Example: different v-structures, yet I(G) = I(G′) = ∅.

  53. Equivalence class of DAGs: two DAGs G and G′ are I-equivalent if I(G) = I(G′). Example: different v-structures, yet I(G) = I(G′) = ∅; here the v-structures are irrelevant to I-equivalence because the parents are connected (moral parents!).

  54. Equivalence class of DAGs: two DAGs G and G′ are I-equivalent if I(G) = I(G′). I(G) = I(G′) ⇔ same undirected skeleton and same immoralities.
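
A short sketch of this test in code (the graph encoding and function names are illustrative): compare undirected skeletons and immoralities, i.e. v-structures whose two parents are not adjacent:

```python
def skeleton(parents):
    """Set of undirected edges of a DAG given as {node: list of parents}."""
    return {frozenset((p, v)) for v, ps in parents.items() for p in ps}

def immoralities(parents):
    """V-structures a -> v <- b whose parents a, b are not adjacent."""
    skel = skeleton(parents)
    return {(frozenset((a, b)), v)
            for v, ps in parents.items()
            for a in ps for b in ps if a < b and frozenset((a, b)) not in skel}

def i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

# the three canonical settings over X, Y, Z make a natural test case
chain   = {"X": [], "Y": ["X"], "Z": ["Y"]}     # X -> Y -> Z
fork    = {"Y": [], "X": ["Y"], "Z": ["Y"]}     # X <- Y -> Z
collide = {"X": [], "Z": [], "Y": ["X", "Z"]}   # X -> Y <- Z
print(i_equivalent(chain, fork))     # True:  same skeleton, no immoralities in either
print(i_equivalent(chain, collide))  # False: X -> Y <- Z is an immorality
```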

  55. Equivalence class of DAGs: I(G) = I(G′) ⇔ same undirected skeleton and same immoralities. (figure: two graphs over X, Y, Z; is X ⊥ Y ∣ Z?)

  56. I-equivalence quiz: do these DAGs have the same set of CIs? (figure: two graphs over X, Y, Z, W)

  57. I-equivalence quiz: do these DAGs have the same set of CIs? No! (figure: two graphs over X, Y, Z, W)

  58. I-equivalence quiz: do these DAGs have the same set of CIs? No! X ⊥ Z ∣ W holds in one but not the other. (figure: two graphs over X, Y, Z, W)

  59. Minimal I-map: which graph G should we use for P? G is a minimal I-map for P if: G is an I-map for P, i.e. I(G) ⊆ I(P), and removing any edge destroys this property. Example: P(X, Y, Z, W) = P(X ∣ Y, Z) P(W) P(Y ∣ Z) P(Z). (figure: four candidate graphs over X, Y, Z, W labelled I-map, min. I-map, NOT an I-map, min. I-map)

  60. Minimal I-map from CIs: which graph G should we use for P? Input: I(P) or a CI oracle, and an ordering X_1, ..., X_n. Output: a minimal I-map G. For i = 1...n: find a minimal U ⊆ {X_1, ..., X_{i-1}} such that X_i ⊥ {X_1, ..., X_{i-1}} − U ∣ U, and set Pa_{X_i} ← U.
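
A hedged sketch of this construction. The CI oracle here is implemented by brute force on the XOR-style distribution from the P-map slides (any oracle for I(P) would do); with that distribution, different variable orderings produce different minimal I-maps, each placing the last variable as a collider:

```python
from itertools import combinations, product

names = ("X", "Y", "Z")
joint = {(x, y, z): (1 / 12 if (x ^ y ^ z) == 0 else 1 / 6)
         for x, y, z in product((0, 1), repeat=3)}

def prob(assign):                      # marginal of a partial assignment {name: value}
    return sum(p for vals, p in joint.items()
               if all(vals[names.index(n)] == v for n, v in assign.items()))

def ci(A, B, C):
    """Brute-force check of A ⊥ B | C under the joint (A, B, C: disjoint sets of names)."""
    if not A or not B:
        return True
    for vals in product((0, 1), repeat=len(A | B | C)):
        a = dict(zip(sorted(A | B | C), vals))
        lhs = prob(a) * (prob({n: a[n] for n in C}) if C else 1.0)   # P(A,B,C) P(C)
        rhs = prob({n: a[n] for n in A | C}) * prob({n: a[n] for n in B | C})
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

def minimal_imap(order, ci):
    """For each X_i, pick a smallest U ⊆ predecessors with X_i ⊥ (predecessors - U) | U."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):
            U = next((set(u) for u in combinations(preds, size)
                      if ci({x}, set(preds) - set(u), set(u))), None)
            if U is not None:
                parents[x] = sorted(U)
                break
    return parents

print(minimal_imap(["X", "Y", "Z"], ci))   # {'X': [], 'Y': [], 'Z': ['X', 'Y']}
print(minimal_imap(["Z", "X", "Y"], ci))   # {'Z': [], 'X': [], 'Y': ['X', 'Z']}  (a different graph)
```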

  61. Minimal I-map from CIs: which graph G should we use for P? Input: I(P) or a CI oracle, and an ordering X_1, ..., X_n. Output: a minimal I-map G. Different orderings give different graphs.

  62. Minimal I-map from CIs: which graph G should we use for P? Input: I(P) or a CI oracle, and an ordering X_1, ..., X_n. Output: a minimal I-map G. Different orderings give different graphs. Example orderings: L, S, G, I, D; L, D, S, I, G; D, I, S, G, L (a topological ordering).

  63. Perfect map (P-map): which graph G should we use for P? The graphs built from the orderings L, D, S, I, G; D, I, S, G, L; and L, S, G, I, D are all minimal I-maps: I(G) ⊆ I(P). A perfect map satisfies I(G) = I(P).

  64. Perfect map (P-map): which graph G should we use for P? Perfect map: I(G) = I(P). P may not have a P-map in the form of a BN.
