Bayesian networks: conditional independencies
the quality of the letter (L) depends only on the grade (G): L ⊥ D, I, S | G
How about the following assertions?
D ⊥ S?   D ⊥ S | I?   D ⊥ S | L?
why? can we read these from the graph?
Conditional independencies (CI): notation
1. I(P): the set of all CIs of the distribution P
2. I_ℓ(G): the set of local CIs from the graph (DAG) G
3. I(G): the set of all (global) CIs from the graph
Local conditional independencies (CIs)
for any node X_i:  X_i ⊥ NonDescendants_{X_i} | Parents_{X_i}
for the graph G:
I_ℓ(G) = { D ⊥ I, S;   I ⊥ D;   G ⊥ S | I, D;   S ⊥ G, L, D | I;   L ⊥ D, I, S | G }
Local CIs from factorization
use the factorized form P(X) = ∏_i P(X_i | Pa_{X_i})
to show, for every X_i,
P(X_i, NonDesc_{X_i} | Pa_{X_i}) = P(X_i | Pa_{X_i}) P(NonDesc_{X_i} | Pa_{X_i})
which means X_i ⊥ NonDesc_{X_i} | Pa_{X_i}
Local CIs from factorization: example
show S ⊥ G | I, given the factorization
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)

P(G, S | I) = Σ_{d,l} P(d, I, G, S, l) / Σ_{d,g,s,l} P(d, I, g, s, l)
            = Σ_{d,l} P(d) P(I) P(G | d, I) P(S | I) P(l | G) / Σ_{d,g,s,l} P(d) P(I) P(g | d, I) P(s | I) P(l | g)
            = P(I) P(S | I) Σ_d P(d) P(G | d, I) / P(I)
            = P(S | I) P(G | I)
where the last step uses Σ_d P(d) P(G | d, I) = P(G | I), since D ⊥ I.
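The derivation can be sanity-checked numerically. A sketch with made-up CPTs for the student network (all probability values below are assumptions for illustration, not from the slides):

```python
import itertools

# Hypothetical CPTs for the student network, binary variables 0/1.
# P(D), P(I), P(G=1 | D, I), P(S=1 | I), P(L=1 | G).
P_D = {0: 0.6, 1: 0.4}
P_I = {0: 0.7, 1: 0.3}
P_G = {(0, 0): 0.3, (0, 1): 0.9, (1, 0): 0.05, (1, 1): 0.5}
P_S = {0: 0.05, 1: 0.8}
P_L = {0: 0.1, 1: 0.9}

def joint(d, i, g, s, l):
    """P(D,I,G,S,L) in the factorized form from the slide."""
    pg = P_G[(d, i)] if g == 1 else 1 - P_G[(d, i)]
    ps = P_S[i] if s == 1 else 1 - P_S[i]
    pl = P_L[g] if l == 1 else 1 - P_L[g]
    return P_D[d] * P_I[i] * pg * ps * pl

def marg(fixed):
    """Sum the joint over all variables not fixed by `fixed` (name -> value)."""
    names = ["d", "i", "g", "s", "l"]
    total = 0.0
    for vals in itertools.product([0, 1], repeat=5):
        assign = dict(zip(names, vals))
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(**assign)
    return total

# Check S ⊥ G | I=1:  P(G,S|I) should equal P(G|I) P(S|I).
pi = marg({"i": 1})
lhs = marg({"g": 1, "s": 1, "i": 1}) / pi
rhs = (marg({"g": 1, "i": 1}) / pi) * (marg({"s": 1, "i": 1}) / pi)
print(abs(lhs - rhs) < 1e-12)  # True: the CI holds exactly in the factorized P
```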
Factorization from local CIs
from the local CIs I_ℓ(G) = { X_i ⊥ NonDesc_{X_i} | Pa_{X_i} : ∀ i }
find a topological ordering (parents before children): X_{i_1}, …, X_{i_n}
use the chain rule:
P(X) = P(X_{i_1}) ∏_{j=2}^{n} P(X_{i_j} | X_{i_1}, …, X_{i_{j-1}})
simplify using the local CIs:
P(X) = P(X_{i_1}) ∏_{j=2}^{n} P(X_{i_j} | Pa_{X_{i_j}})
Factorization from local CIs: example
local CIs: I_ℓ(G) = { (D ⊥ I, S), (I ⊥ D), (G ⊥ S | I), (S ⊥ G, L, D | I), (L ⊥ D, I, S | G) }
a topological ordering: D, I, G, L, S
use the chain rule:
P(D, I, G, S, L) = P(D) P(I | D) P(G | D, I) P(L | D, I, G) P(S | D, I, G, L)
simplify using I_ℓ(G):
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(L | G) P(S | I)
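The recipe (topological ordering, then chain rule with parent sets) can be sketched in Python; the `parents` encoding of the student DAG is our assumed representation:

```python
from graphlib import TopologicalSorter

# The student DAG as node -> parents (an assumed encoding of the slide's graph).
parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

# graphlib orders a node after its predecessors, so feed it the parent sets.
order = list(TopologicalSorter(parents).static_order())
print(order)  # a valid topological order, e.g. ['D', 'I', 'G', 'S', 'L']

# Chain rule along this order, simplified by the local CIs: each factor
# conditions only on the node's parents.
factors = [f"P({x})" if not parents[x] else f"P({x}|{','.join(parents[x])})"
           for x in order]
print(" ".join(factors))  # e.g. P(D) P(I) P(G|D,I) P(S|I) P(L|G)
```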
Factorization ⇔ local CIs
P factorizes according to G, i.e. P(X) = ∏_i P(X_i | Pa_{X_i})  ⇔  I_ℓ(G) holds in P
hence I_ℓ(G) ⊆ I(P): G is an I-map for P
(it does not mislead us about independencies in P)
Perfect map (P-map)
which graph G should we use for P? a perfect map: I(G) = I(P)
P may not have a P-map in the form of a BN
Example: p(x, y, z) = 1/12 if x ⊕ y ⊕ z = 0, and 1/6 if x ⊕ y ⊕ z = 1 (⊕ is XOR)
(X ⊥ Y), (Y ⊥ Z), (X ⊥ Z) ∈ I(P)
(X ⊥ Y | Z), (Y ⊥ Z | X), (X ⊥ Z | Y) ∉ I(P)
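This counterexample is easy to verify by brute force; a sketch (the `marg` helper and variable names are ours):

```python
from itertools import product

# The slide's distribution: p = 1/12 when x XOR y XOR z = 0, else 1/6.
def p(x, y, z):
    return 1 / 12 if x ^ y ^ z == 0 else 1 / 6

def marg(**fixed):
    """Sum p over all binary triples consistent with the fixed values."""
    return sum(p(x, y, z) for x, y, z in product([0, 1], repeat=3)
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

# Marginal independence X ⊥ Y: P(x,y) = P(x) P(y) for all values.
print(all(abs(marg(x=x, y=y) - marg(x=x) * marg(y=y)) < 1e-12
          for x, y in product([0, 1], repeat=2)))  # True

# Conditional independence X ⊥ Y | Z fails: P(x,y|z) != P(x|z) P(y|z).
pz = marg(z=0)
gap = marg(x=0, y=0, z=0) / pz - (marg(x=0, z=0) / pz) * (marg(y=0, z=0) / pz)
print(abs(gap) > 0.01)  # True: (X ⊥ Y | Z) is not in I(P)
```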
Summary so far
simplification of the chain rule: P(X) = ∏_i P(X_i | Pa_{X_i})
Bayes-net represented using a DAG (example: naive Bayes)
local conditional independencies I_ℓ(G) = { X_i ⊥ NonDesc_{X_i} | Pa_{X_i} : ∀ i }
hold in a Bayes-net, and conversely imply a Bayes-net factorization
Note: the motivation is not just a compressed representation, but faster inference and learning as well
Global CIs from the graph
for any subsets of variables X, Y and Z we can ask: X ⊥ Y | Z?
global CIs: the set of all such CIs implied by the factorized form of P
I_ℓ(G) ⊆ I(G) ⊆ I(P)
Example: C ⊥ D | B, F?
algorithm: directed separation (D-separation)
Three canonical settings for three random variables
1. causal / evidential trail: X → Y → Z
P(X, Y, Z) = P(X) P(Y | X) P(Z | Y)
marginal independence? no: P(X, Z) ≠ P(X) P(Z) in general
conditional independence? yes:
P(Z | X, Y) = P(X, Y, Z) / P(X, Y) = P(X) P(Y | X) P(Z | Y) / (P(X) P(Y | X)) = P(Z | Y)
Three canonical settings
2. common cause: X ← Y → Z
P(X, Y, Z) = P(Y) P(X | Y) P(Z | Y)
marginal independence? no: P(X, Z) ≠ P(X) P(Z) in general
conditional independence? yes:
P(X, Z | Y) = P(X, Y, Z) / P(Y) = P(X | Y) P(Z | Y)
Three canonical settings
3. common effect: X → Y ← Z (a.k.a. collider, v-structure)
P(X, Y, Z) = P(X) P(Z) P(Y | X, Z)
marginal independence? yes:
P(X, Z) = Σ_y P(X, y, Z) = Σ_y P(X) P(Z) P(y | X, Z) = P(X) P(Z)
conditional independence? no:
P(X, Z | Y) = P(X, Y, Z) / P(Y) ≠ P(X | Y) P(Z | Y) in general
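A quick numeric sketch of the collider case, with assumed CPTs (X and Z fair coins, Y their OR), shows the "explaining away" effect:

```python
from itertools import product

# A collider X -> Y <- Z with assumed CPTs: X, Z fair coins, Y = X OR Z.
def joint(x, y, z):
    py = 1.0 if y == (x | z) else 0.0   # deterministic common effect
    return 0.25 * py                     # P(x) P(z) P(y | x, z)

def marg(**fixed):
    return sum(joint(x, y, z) for x, y, z in product([0, 1], repeat=3)
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

# Marginally X ⊥ Z: P(x,z) = P(x) P(z) = 1/4 for every pair.
print(all(abs(marg(x=x, z=z) - 0.25) < 1e-12 for x, z in product([0, 1], repeat=2)))

# Conditioning on Y couples them ("explaining away"): once Z=1 explains Y=1,
# X=1 becomes less likely.
print(marg(x=1, y=1) / marg(y=1))            # P(X=1 | Y=1)      = 2/3
print(marg(x=1, y=1, z=1) / marg(y=1, z=1))  # P(X=1 | Y=1, Z=1) = 1/2
```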
Three canonical settings
3. common effect (continued): X → Y ← Z, with W a descendant of Y
conditional independence? no: P(X, Z | W) ≠ P(X | W) P(Z | W) in general
even observing a descendant of Y makes X and Z dependent
Putting the three cases together
query: X_1, X_2 ⊥ Y_1 | Z_1, Z_2?
consider all paths between variables in X and Y
(figure: a graph over X_1, X_2, Y_1, Z_1, Z_2)
with Z_1 and Z_2 observed, every path is blocked, so X ⊥ Y | Z
had we not observed Z_1: (X_1, X_2 ⊥ Y_1 | Z_2) ∉ I(G)
D-separation (a.k.a. the Bayes-Ball algorithm)
X ⊥ Y | Z? see whether at least one ball starting from X reaches Y while Z is shaded
image from: https://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
D-separation: algorithm
linear time complexity
input: graph G and sets X, Y, Z
output: X ⊥ Y | Z?
1. mark the variables in Z and all of their ancestors in G
2. breadth-first search starting from X; stop any trail that reaches a blocked node:
   a node in Z that is not the middle of a collider, or
   the middle of a collider (v-structure) that is unmarked
3. is any node in Y reached? if not, X ⊥ Y | Z
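The procedure can be sketched as a reachability search over (node, direction) states; this is one possible implementation (the graph encoding and function names are ours), not necessarily the lecturer's code:

```python
def d_separated(graph, X, Y, Z):
    """Check X ⊥ Y | Z by d-separation in a DAG given as node -> children.

    A sketch of the standard reachability procedure over (node, direction)
    states: 'up' = the trail arrives from a child, 'down' = from a parent.
    """
    parents = {v: set() for v in graph}
    for u, children in graph.items():
        for c in children:
            parents[c].add(u)

    # Ancestors of Z (including Z): a collider is unblocked iff it is
    # an ancestor of (or in) Z.
    anc, stack = set(), list(Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    visited = set()
    frontier = [(x, "up") for x in X]
    while frontier:
        node, direction = frontier.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in Z and node in Y:
            return False  # an active trail reaches Y
        if direction == "up" and node not in Z:
            frontier += [(p, "up") for p in parents[node]]
            frontier += [(c, "down") for c in graph[node]]
        elif direction == "down":
            if node not in Z:            # non-collider continuation
                frontier += [(c, "down") for c in graph[node]]
            if node in anc:              # unblocked collider bounces upward
                frontier += [(p, "up") for p in parents[node]]
    return True

# The student network: D -> G <- I -> S, G -> L (assumed encoding).
student = {"D": ["G"], "I": ["G", "S"], "G": ["L"], "S": [], "L": []}
print(d_separated(student, {"D"}, {"S"}, set()))  # True: D ⊥ S
print(d_separated(student, {"D"}, {"S"}, {"G"}))  # False: collider G observed
print(d_separated(student, {"D"}, {"S"}, {"L"}))  # False: descendant of G observed
```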
D-separation quiz
G ⊥ S | ∅?
D ⊥ L | G?
D ⊥ I, S | ∅?
D, L ⊥ S | I, G?
Summary
graph and distribution are combined:
factorization of the distribution according to the graph G: P(X) = ∏_i P(X_i | Pa_{X_i})
conditional independencies of the distribution inferred from the graph:
local CIs: X_i ⊥ NonDescendants_{X_i} | Parents_{X_i}
global CIs: D-separation
Summary
factorization of the distribution, local conditional independencies, and global conditional independencies all identify the same family of distributions
Bonus slides
Equivalence class of DAGs
Two DAGs G, G′ are I-equivalent if I(G) = I(G′): P factorizes on both of these graphs
from the d-separation algorithm, it is sufficient to have:
the same undirected skeleton
the same v-structures
Equivalence class of DAGs
different v-structures, yet I(G) = I(G′) = ∅
here, the v-structures are irrelevant for I-equivalence because the parents are connected (moral parents!)
Equivalence class of DAGs
I(G) = I(G′)  ⇔  same undirected skeleton and same immoralities
(an immorality is a v-structure whose two parents are not connected by an edge)
example query on two graphs over X, Y, Z: X ⊥ Y | Z?
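The criterion "same skeleton and same immoralities" is easy to check mechanically; a sketch (graphs encoded as node → children dicts, an assumed representation):

```python
from itertools import combinations

def skeleton(g):
    """Undirected edge set of a DAG given as node -> children."""
    return {frozenset((u, v)) for u, ch in g.items() for v in ch}

def immoralities(g):
    """V-structures u -> w <- v whose parents u, v are non-adjacent."""
    parents = {v: {u for u in g if v in g[u]} for v in g}
    edges = skeleton(g)
    return {(frozenset((u, v)), w)
            for w, ps in parents.items()
            for u, v in combinations(sorted(ps), 2)
            if frozenset((u, v)) not in edges}

def i_equivalent(g1, g2):
    """Same skeleton and same immoralities  <=>  I(G1) = I(G2)."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

# X -> Y -> Z versus X <- Y <- Z: same skeleton, no immoralities -> equivalent.
chain    = {"X": ["Y"], "Y": ["Z"], "Z": []}
rev      = {"X": [], "Y": ["X"], "Z": ["Y"]}
collider = {"X": ["Y"], "Z": ["Y"], "Y": []}
print(i_equivalent(chain, rev))       # True
print(i_equivalent(chain, collider))  # False: X -> Y <- Z is an immorality
```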
I-equivalence quiz
do these DAGs over X, Y, Z, W have the same set of CIs?
no! (X ⊥ Z | W) holds in only one of them
Minimal I-map
which graph G should we use for P?
G is a minimal I-map for P:
G is an I-map for P: I(G) ⊆ I(P)
removing any edge destroys this property
Example: P(X, Y, Z, W) = P(X | Y, Z) P(W) P(Y | Z) P(Z)
(figure: four candidate graphs over X, Y, Z, W labeled: I-map, minimal I-map, NOT an I-map, minimal I-map)
Minimal I-map from CIs
which graph G should we use for P?
input: I(P) or a CI oracle; an ordering X_1, …, X_n
output: a minimal I-map G
for i = 1…n:
  find a minimal U ⊆ {X_1, …, X_{i-1}} s.t. (X_i ⊥ {X_1, …, X_{i-1}} − U | U)
  set Pa_{X_i} ← U
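The loop above can be sketched with a brute-force CI oracle over a small chain X → Y → Z (the CPT values are assumptions); it also illustrates that different orderings give different graphs:

```python
from itertools import combinations, product

# Joint for a chain X -> Y -> Z with assumed CPTs (binary variables).
names = ["X", "Y", "Z"]
pX, pYgX, pZgY = 0.3, {0: 0.2, 1: 0.9}, {0: 0.4, 1: 0.7}
joint = {}
for x, y, z in product([0, 1], repeat=3):
    p = (pX if x else 1 - pX)
    p *= pYgX[x] if y else 1 - pYgX[x]
    p *= pZgY[y] if z else 1 - pZgY[y]
    joint[(x, y, z)] = p

def marg(fixed):
    return sum(p for vals, p in joint.items()
               if all(vals[names.index(k)] == v for k, v in fixed.items()))

def ci(x, rest, cond):
    """Brute-force oracle: does x ⊥ rest | cond hold in the joint?"""
    if not rest:
        return True
    for vals in product([0, 1], repeat=1 + len(rest) + len(cond)):
        a = {x: vals[0]}
        b = dict(zip(rest, vals[1:1 + len(rest)]))
        c = dict(zip(cond, vals[1 + len(rest):]))
        pc = marg(c)
        if pc > 0 and abs(marg({**a, **b, **c}) * pc
                          - marg({**a, **c}) * marg({**b, **c})) > 1e-9:
            return False
    return True

def minimal_imap(order):
    """For each X_i pick a smallest parent set U of its predecessors with
    X_i ⊥ (predecessors minus U) | U — the slide's construction."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        parents[x] = next(set(U) for k in range(len(preds) + 1)
                          for U in combinations(preds, k)
                          if ci(x, [p for p in preds if p not in U], list(U)))
    return parents

print(minimal_imap(["X", "Y", "Z"]))  # the chain itself: minimal parent sets
print(minimal_imap(["X", "Z", "Y"]))  # a different, denser minimal I-map
```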
Minimal I-map from CIs
which graph G should we use for P?
input: I(P) or a CI oracle; an ordering X_1, …, X_n
output: a minimal I-map G
different orderings give different graphs
Example orderings: D, I, S, G, L (a topological ordering); L, S, G, I, D; L, D, S, I, G
Perfect map (P-map)
which graph G should we use for P?
the graphs from the orderings D,I,S,G,L; L,S,G,I,D; L,D,S,I,G are all minimal I-maps: I(G) ⊆ I(P)
a perfect map: I(G) = I(P)
P may not have a P-map in the form of a BN