directed graphical models


  1. Directed Graphical Models: Bayesian Networks. Probabilistic Graphical Models, Sharif University of Technology, Soleymani, Spring 2018

  2. Basics
     - Multivariate distributions with a large number of variables
     - Independence assumptions are useful
     - Independence and conditional independence relationships simplify representation and reduce inference complexity
     - Bayesian networks enable us to incorporate domain knowledge and structure
     - Modular combination of heterogeneous parts
     - Combining data and knowledge (Bayesian philosophy)

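  3. Conditional and marginal independence
     - $X$ and $Y$ are conditionally independent given $Z$, written $X \perp Y \mid Z$, if any of these equivalent conditions holds:
       $P(X \mid Y, Z) = P(X \mid Z)$
       $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$
       $P(Y \mid X, Z) = P(Y \mid Z)$
     - Written out over values: $\forall x \in Val(X),\ y \in Val(Y),\ z \in Val(Z)$:
       $P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$
     - $X$ and $Y$ are marginally independent, written $X \perp Y \mid \emptyset$, if any of these equivalent conditions holds:
       $P(X \mid Y) = P(X)$
       $P(X, Y) = P(X)\, P(Y)$
       $P(Y \mid X) = P(Y)$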
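The definitions above can be checked numerically on a small joint table. The sketch below is only an illustration: it assumes a hypothetical common-cause model (Z is a parent of both X and Y) with made-up probabilities, none of which come from the slides. It builds the joint and tests both definitions directly.

```python
from itertools import product

# Hypothetical CPDs (not from the slides) for binary X, Y, Z with Z -> X and Z -> Y.
p_z = {0: 0.3, 1: 0.7}            # P(Z = z)
p_x1_given_z = {0: 0.9, 1: 0.2}   # P(X = 1 | Z = z)
p_y1_given_z = {0: 0.8, 1: 0.1}   # P(Y = 1 | Z = z)

def bern(p_one, value):
    """Probability that a binary variable equals `value` when P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

# Joint table P(X, Y, Z) = P(Z) P(X | Z) P(Y | Z).
joint = {
    (x, y, z): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(keep):
    """Sum the joint down to the variables named in `keep`, e.g. "xz"."""
    out = {}
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        key = tuple(values[name] for name in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep_given_z(tol=1e-12):
    """Check P(x, y | z) = P(x | z) P(y | z) for every assignment."""
    pz, pxz, pyz = marginal("z"), marginal("xz"), marginal("yz")
    return all(
        abs(joint[(x, y, z)] / pz[(z,)]
            - (pxz[(x, z)] / pz[(z,)]) * (pyz[(y, z)] / pz[(z,)])) < tol
        for x, y, z in product((0, 1), repeat=3)
    )

def marg_indep(tol=1e-12):
    """Check P(x, y) = P(x) P(y) for every assignment."""
    px, py, pxy = marginal("x"), marginal("y"), marginal("xy")
    return all(
        abs(pxy[(x, y)] - px[(x,)] * py[(y,)]) < tol
        for x, y in product((0, 1), repeat=2)
    )
```

With these particular numbers, X and Y are conditionally independent given Z but not marginally independent, so the two notions are separate properties.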

  4. Bayesian network definition
     - Qualitative specification by a Directed Acyclic Graph (DAG)
       - Each node denotes a random variable
       - Edges denote dependencies
       - $X \rightarrow Y$ shows a "direct influence" of $X$ on $Y$ ($X$ is a parent of $Y$)
     - Quantitative specification by conditional probability distributions (CPDs)
       - The CPD for each node $X_i$ defines $P(X_i \mid Pa(X_i))$
     - A Bayesian network represents a joint distribution over the variables (via the DAG and the CPDs) compactly, in factorized form:
       $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$

  5. Burglary example
     - John does not perceive minor earthquakes
     - John does not perceive burglaries directly

  6. Burglary example
     - Bayesian networks define the joint distribution over the variables in terms of the graph structure and the conditional probability distributions:
       $P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)$

  7. Burglary example: DAG + CPTs
     - CPDs as quantitative specification
     - [Figure: the DAG annotated with the tables $P(A = t \mid B, E)$, $P(J = t \mid A)$, and $P(M = t \mid A)$]

  8. Burglary example: full joint probability
     - $P(J, M, A, B, E) = P(J \mid A)\, P(M \mid A)\, P(A \mid B, E)\, P(B)\, P(E)$
     - $P(J = t, M = t, A = t, B = f, E = f)$
       $= P(J = t \mid A = t)\, P(M = t \mid A = t)\, P(A = t \mid B = f, E = f)\, P(B = f)\, P(E = f)$
       $= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$
     - Shorthand: $J = t$ means JohnCalls = True; $B = f$ means Burglary = False; …
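As a quick arithmetic check of the product above, the following minimal sketch multiplies the five factors exactly as they appear on the slide. The dictionary labels are only descriptive strings chosen for this illustration.

```python
import math

# The five factors read off the slide for J = t, M = t, A = t, B = f, E = f.
factors = {
    "P(J=t | A=t)": 0.9,
    "P(M=t | A=t)": 0.7,
    "P(A=t | B=f, E=f)": 0.001,
    "P(B=f)": 0.999,
    "P(E=f)": 0.998,
}

# One joint entry is just the product of one entry from each CPD.
joint_entry = math.prod(factors.values())
print(f"{joint_entry:.6f}")
```

Each factor is looked up from a different CPD, which is why a single joint entry costs only five multiplications even though the full joint table has 32 entries.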

  9. Burglary example: inference
     - Conditional probability distribution:
       $P(B = t \mid J = t, M = f) = \dfrac{P(J = t, M = f, B = t)}{P(J = t, M = f)} = \dfrac{\sum_{A} \sum_{E} P(J = t, M = f, A, B = t, E)}{\sum_{B} \sum_{A} \sum_{E} P(J = t, M = f, A, B, E)}$
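The ratio of sums above can be evaluated by brute-force enumeration. The sketch below is a hedged illustration rather than the lecture's own code: the slides only reveal five CPT entries (in the worked product for the full joint), so every entry marked "assumed" is filled in from the commonly cited textbook version of this alarm example and may differ from the lecturer's figure.

```python
from itertools import product

# CPT entries. Values marked "slide" appear in the deck's worked product;
# values marked "assumed" are NOT in the source and are filled in for illustration.
P_B = 0.001                        # P(B = t); slide gives P(B = f) = 0.999
P_E = 0.002                        # P(E = t); slide gives P(E = f) = 0.998
P_A = {                            # P(A = t | B, E), keyed by (B, E)
    (True, True): 0.95,            # assumed
    (True, False): 0.94,           # assumed
    (False, True): 0.29,           # assumed
    (False, False): 0.001,         # slide
}
P_J = {True: 0.9, False: 0.05}     # P(J = t | A); 0.9 slide, 0.05 assumed
P_M = {True: 0.7, False: 0.01}     # P(M = t | A); 0.7 slide, 0.01 assumed

def bern(p_true, value):
    """Probability of `value` for a Boolean variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(j, m, a, b, e):
    """P(J, M, A, B, E) = P(J | A) P(M | A) P(A | B, E) P(B) P(E)."""
    return (bern(P_J[a], j) * bern(P_M[a], m) * bern(P_A[(b, e)], a)
            * bern(P_B, b) * bern(P_E, e))

def posterior_burglary(j, m):
    """P(B = t | J = j, M = m): sum out A, E on top and A, B, E on the bottom."""
    tf = (True, False)
    numerator = sum(joint(j, m, a, True, e) for a, e in product(tf, repeat=2))
    denominator = sum(joint(j, m, a, b, e) for a, b, e in product(tf, repeat=3))
    return numerator / denominator

print(posterior_burglary(True, False))
```

Enumeration touches every assignment of the hidden variables, so its cost grows exponentially with their number; it is shown here only because it mirrors the formula term by term.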

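  10. Student example
     - Nodes: Intelligence ($I$), Difficulty ($D$), Grade ($G$), SAT ($S$), Letter ($L$)
     - Priors: $P(D = t)$ and $P(I = t)$, with values 0.65 and 0.55 listed in that order on the slide
     - $P(G \mid I, D)$:

       I   D   G = 1   G = 2   G = 3
       f   f   0.3     0.4     0.3
       f   t   0.05    0.25    0.7
       t   f   0.9     0.08    0.02
       t   t   0.5     0.3     0.2

     - $P(S = 1 \mid I)$:

       I   P(S = 1 | I)
       f   0.1
       t   0.7

     - $P(L = t \mid G)$:

       G   P(L = t | G)
       1   0.9
       2   0.5
       3   0.05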
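A small sanity check on the student tables: every row of the Grade CPT should sum to one, and because Letter depends only on Grade, the probability of a letter given Intelligence and Difficulty follows by summing out Grade. The sketch below uses only the Grade and Letter tables from the slide; the function names are chosen for this illustration.

```python
# P(G | I, D) from the slide, keyed by (I, D) truth values; grades are 1, 2, 3.
P_G = {
    (False, False): {1: 0.3, 2: 0.4, 3: 0.3},
    (False, True): {1: 0.05, 2: 0.25, 3: 0.7},
    (True, False): {1: 0.9, 2: 0.08, 3: 0.02},
    (True, True): {1: 0.5, 2: 0.3, 3: 0.2},
}
# P(L = t | G) from the slide.
P_L = {1: 0.9, 2: 0.5, 3: 0.05}

def rows_normalized(cpt, tol=1e-9):
    """True if every conditional distribution in the table sums to one."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in cpt.values())

def p_letter(i, d):
    """P(L = t | I = i, D = d) = sum over g of P(G = g | i, d) P(L = t | g)."""
    return sum(p_g * P_L[g] for g, p_g in P_G[(i, d)].items())

print(rows_normalized(P_G), p_letter(False, False), p_letter(True, False))
```

For example, with low intelligence and an easy course the sum is 0.3 × 0.9 + 0.4 × 0.5 + 0.3 × 0.05 = 0.485.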

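  11. Continuous variables example
     - Linear Gaussian: $X \sim N(0, 1)$, $Y \mid X \sim N(b + X, \sigma)$
     - Parameters: $b = 0.5$, $\sigma = 0.1$
     - [Figure: graph $X \rightarrow Y$ and a plot of $p(y \mid x)$ over axes $x$ and $y$]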
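The linear Gaussian network can be simulated by ancestral sampling: draw the parent first, then the child given the parent. This is a minimal sketch under one loud assumption: the slide does not say whether the second Gaussian parameter is a variance or a standard deviation, and the code treats it as a standard deviation. The function names and the seed are illustrative.

```python
import random
import statistics

B = 0.5       # offset b from the slide
SIGMA = 0.1   # slide value; ASSUMED here to be a standard deviation

def sample_pair(rng):
    """Ancestral sampling: X ~ N(0, 1) first, then Y | X ~ N(B + X, SIGMA)."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(B + x, SIGMA)
    return x, y

def sample_many(n, seed=0):
    """Draw n independent (x, y) pairs from a seeded generator."""
    rng = random.Random(seed)
    return [sample_pair(rng) for _ in range(n)]

samples = sample_many(20_000)
ys = [y for _, y in samples]
# Under the assumption above, E[Y] = B and Var[Y] = 1 + SIGMA**2.
print(statistics.fmean(ys), statistics.pvariance(ys))
```

The marginal of Y is again Gaussian, so the sample mean and variance should land near 0.5 and 1.01 under the stated assumption.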

  12. Missing edges
     - In general, the chain rule represents the joint distribution as
       $P(X_1, \ldots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$
     - This is equivalent to a graph in which all of $X_1, \ldots, X_{i-1}$ are parents of $X_i$
     - Missing edges imply conditional independencies
     - If we use a DAG that is not complete, some links are removed, so some of the conditioning variables drop out of the factors

  13. Compact representation
     - A CPT for a Boolean variable with $k$ Boolean parents requires:
       - $2^k$ rows, one for each combination of parent values
       - $k = 0$: one row giving the prior probability
     - If each variable has no more than $k$ parents:
       - The full joint distribution requires $2^n - 1$ numbers
       - The Bayesian network requires at most $n \times 2^k$ numbers (linear in $n$)
     - ⇒ Exponential reduction in the number of parameters
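The counts above are easy to tabulate. The sketch below applies them to the five-node burglary network, using the parent sets implied by its factorization; the helper names are chosen for this illustration.

```python
def full_joint_params(n):
    """Independent numbers in a full joint table over n Boolean variables."""
    return 2 ** n - 1

def bn_param_bound(n, k):
    """Upper bound n * 2^k when every node has at most k Boolean parents."""
    return n * 2 ** k

def bn_params(parents):
    """Exact count: one number per parent configuration for each Boolean node."""
    return sum(2 ** len(pa) for pa in parents.values())

# Parent sets read off P(B) P(E) P(A | B, E) P(J | A) P(M | A).
burglary_parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

print(full_joint_params(5), bn_param_bound(5, 2), bn_params(burglary_parents))
```

For this network the full joint needs 31 numbers, the bound gives 20, and the exact count is 10; the gap widens quickly as the number of variables grows.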

  14. Bayesian network semantics
     - Local independencies: each node is conditionally independent of its non-descendants given its parents:
       $X_i \perp \text{NonDescendants}(X_i) \mid Pa(X_i)$
     - Are the local independencies all of the conditional independencies implied by a BN?
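The local independencies can be listed mechanically from the parent sets. The sketch below does this for the burglary network, with parent sets taken from its factorization. As a presentation choice, the parents are left out of the non-descendant set, since independence from a variable that is already conditioned on is trivial.

```python
# Parent sets read off P(B) P(E) P(A | B, E) P(J | A) P(M | A).
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

def descendants(node, parents):
    """All nodes reachable from `node` by following child links."""
    children = [c for c, pa in parents.items() if node in pa]
    found = set(children)
    for child in children:
        found |= descendants(child, parents)
    return found

def local_independencies(parents):
    """Map each node to (non-descendants other than its parents, its parents)."""
    result = {}
    for node, pa in parents.items():
        non_desc = set(parents) - {node} - descendants(node, parents) - set(pa)
        result[node] = (non_desc, set(pa))
    return result

for node, (non_desc, pa) in local_independencies(PARENTS).items():
    print(f"{node} is independent of {sorted(non_desc)} given {sorted(pa)}")
```

For instance, J comes out independent of B, E, and M given A, while A has no non-descendants left once its parents are excluded.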

  15. Factorization & independence
     - Let $G$ be a graph over $X_1, \ldots, X_n$. A distribution $P$ factorizes over $G$ if:
       $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$
     - Factorization ⇒ Independence
       - If $P$ factorizes over $G$, then any variable in $P$ is independent of its non-descendants given its parents (in the graph $G$)
       - Factorization according to $G$ implies the associated conditional independencies
     - Independence ⇒ Factorization
       - If any variable in the distribution $P$ is independent of its non-descendants given its parents (in the graph $G$), then $P$ factorizes over $G$
       - Conditional independencies imply factorization of the joint distribution (into a product of simpler terms)

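  16. Independence ⇒ factorization
     - Consider the chain rule, with the variables numbered in a topological order of the graph (parents before children):
       $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$
     - We can simplify it through the conditional independence assumptions
     - In such an order, $X_1, \ldots, X_{i-1}$ contains all parents of $X_i$ and none of its descendants, so using $X_i \perp \text{NonDescendants}(X_i) \mid Pa(X_i)$ we can show:
       $P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = P(X_i \mid Pa(X_i))$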
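As a worked instance of this argument, take the burglary network with the topological order B, E, A, J, M. The local independencies used are $E \perp B$, $J \perp \{B, E\} \mid A$, and $M \perp \{B, E, J\} \mid A$; each one lets a chain-rule factor drop the conditioning variables that are not parents.

```latex
\begin{align*}
P(B, E, A, J, M)
  &= P(B)\, P(E \mid B)\, P(A \mid B, E)\, P(J \mid B, E, A)\, P(M \mid B, E, A, J) \\
  &= P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)
\end{align*}
```

The second line is the factorization stated for the burglary example, recovered here from the chain rule alone.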

  17. Equivalence Theorem
     - For a graph G:
       - Let D1 denote the family of all distributions that satisfy the conditional independencies of G
       - Let D2 denote the family of all distributions that factorize according to G
       - Then D1 ≡ D2

  18. Other independencies
     - Are there other independencies that hold for every distribution $P$ that factorizes over $G$?
     - If $P$ factorizes over $G$, can we read these independencies from the structure of $G$?
     - Yes: the graphical criterion called d-separation lets us find independencies from the graph

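  19. Basic structures
     - Chain (head-to-tail), $X \rightarrow Z \rightarrow Y$: $X \perp Y \mid Z$
     - Common parent (tail-to-tail), $X \leftarrow Z \rightarrow Y$: $X \perp Y \mid Z$
     - V-structure (head-to-head), $X \rightarrow Z \leftarrow Y$: $X \perp Y$ (explaining away)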

  20. Explaining away
     - When we condition on $Z$, are $X$ and $Y$ independent?
     - V-structure $X \rightarrow Z \leftarrow Y$: $P(X, Y, Z) = P(X)\, P(Y)\, P(Z \mid X, Y)$
     - $X$ and $Y$ are marginally independent, but given $Z$ they are conditionally dependent
     - This is called explaining away
     - Two coins example
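The slide only names a two-coin example, so the setup below is a hypothetical reconstruction rather than the lecturer's: X and Y are independent fair coins and Z is 1 when at least one of them shows heads. Exact fractions make the effect visible without rounding.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Hypothetical setup (not from the slides): fair coins X, Y and Z = X OR Y.
joint = {(x, y, int(x or y)): HALF * HALF for x, y in product((0, 1), repeat=2)}

def prob(**fixed):
    """Total probability of outcomes matching the given values of x, y, z."""
    total = Fraction(0)
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total

p_x = prob(x=1)                                       # prior belief in X = 1
p_x_given_z = prob(x=1, z=1) / prob(z=1)              # after observing Z = 1
p_x_given_zy = prob(x=1, y=1, z=1) / prob(y=1, z=1)   # after also observing Y = 1

print(p_x, p_x_given_z, p_x_given_zy)
```

Observing Z = 1 raises the belief in X = 1 from 1/2 to 2/3; learning that Y = 1 already accounts for Z brings it back to 1/2. The coins are independent on their own but become dependent once their common child is observed.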

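  21. D-separation
     - Let $A$, $B$, $C$ denote three disjoint sets of nodes. $A$ is d-separated from $B$ by $C$ iff the graph implies $A \perp B \mid C$, that is, the independence holds in every distribution that factorizes over the graph
     - $A$ is d-separated from $B$ by $C$ if all undirected paths between $A$ and $B$ are blocked by $C$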

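  22. Undirected path blocking
     - A path between $X \in A$ and $Y \in B$ is blocked at a node $Z$ in any of these cases:
       - Head-to-tail at a node $Z \in C$: $X \rightarrow Z \rightarrow Y$
       - Tail-to-tail at a node $Z \in C$: $X \leftarrow Z \rightarrow Y$
       - Head-to-head (i.e., v-structure) at a node $Z$, where $Z \notin C$ and none of its descendants are in $C$: $X \rightarrow Z \leftarrow Y$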

  23. Undirected path blocking
     - $A \perp B \mid C$ holds if every trail (undirected path) between $A$ and $B$ contains either:
       - a node that is in $C$ and at which the path does not meet head-to-head, or
       - a head-to-head node such that neither the node nor any of its descendants is in $C$

  24. D-separation: active trail view
     - Definition: $X$ and $Y$ are d-separated in $G$ given $Z$ if there is no active trail in $G$ between $X$ and $Y$ given $Z$
     - A trail between $X$ and $Y$ is active if:
       - for any v-structure node $U$ in the trail, $X \ldots \rightarrow U \leftarrow \ldots Y$, either $U$ or one of its descendants is in $Z$
       - the other nodes in the trail are not in $Z$

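  25. D-separation: example
     - Graph nodes: Intelligence ($I$), Difficulty ($D$), Grade ($G$), Rank ($R$), Letter ($L$)
     - Independence statements shown on the slide:
       $R \perp G \mid I$
       $R \perp D \mid I$
       $R \perp D \mid G$
       $R \perp D \mid L$
       $R \perp L \mid G$
       $D \perp L \mid G$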
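Statements like these can be tested by enumerating trails and applying the active-trail definition directly. The sketch below does that for small graphs. One loud assumption: the slide's figure did not survive, so the edge list (Intelligence → Grade, Difficulty → Grade, Intelligence → Rank, Grade → Letter) is a guess based on the usual student network and is not confirmed by the source. Enumerating every trail is exponential in general; it is used here only because it mirrors the definition.

```python
# ASSUMED edges (parent, child); the original figure is not available.
EDGES = [("I", "G"), ("D", "G"), ("I", "R"), ("G", "L")]

def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def trails(x, y, edges, path=None):
    """Yield every simple undirected path (trail) from x to y."""
    path = path or [x]
    if path[-1] == y:
        yield list(path)
        return
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                yield from trails(x, y, edges, path + [nxt])

def is_active(trail, given, edges):
    """Apply the active-trail rules to each interior node of the trail."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(trail, trail[1:], trail[2:]):
        head_to_head = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if head_to_head:
            # A v-structure is open only if it or a descendant is observed.
            if not ({mid} | descendants(mid, edges)) & given:
                return False
        elif mid in given:
            # Any other observed node blocks the trail.
            return False
    return True

def d_separated(x, y, given, edges=EDGES):
    """True if no trail between x and y is active given the observed set."""
    given = set(given)
    return not any(is_active(t, given, edges) for t in trails(x, y, edges))

print(d_separated("R", "D", {"I"}), d_separated("R", "D", {"G"}))
```

Under the assumed edges, Rank and Difficulty are d-separated given Intelligence, but not given Grade or Letter, because observing Grade or its descendant Letter opens the v-structure at Grade.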

  26. Markov Blanket in Bayesian Network
     - A variable is conditionally independent of all other variables given its Markov blanket
     - Markov blanket of a node:
       - All parents
       - Children
       - Co-parents of children
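The three ingredients of the Markov blanket translate directly into set operations. The sketch below computes it for the burglary network, with parent sets taken from that example's factorization; the function name is illustrative.

```python
# Parent sets read off P(B) P(E) P(A | B, E) P(J | A) P(M | A).
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

def markov_blanket(node, parents):
    """Parents, children, and co-parents of children of `node`."""
    children = [c for c, pa in parents.items() if node in pa]
    co_parents = {p for c in children for p in parents[c]}
    # co_parents includes `node` itself, so remove it at the end.
    return (set(parents[node]) | set(children) | co_parents) - {node}

for node in PARENTS:
    print(node, sorted(markov_blanket(node, PARENTS)))
```

Burglary's blanket contains Earthquake even though the two have no edge between them: they share the child Alarm, which is the co-parent case.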

  27. D-Separation: soundness & completeness
     - Soundness: any conditional independence property that we can derive from $G$ should hold for every probability distribution that factorizes over $G$
       - Theorem: if $P$ factorizes over $G$ and d-sep$_G(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$, then $P$ satisfies $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$
     - Weak completeness:
       - For almost all distributions $P$ that factorize over $G$, if $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$ holds in $P$, then $\mathbf{X}$ and $\mathbf{Y}$ are d-separated given $\mathbf{Z}$ in the graph $G$
       - A particular $P$ can contain independencies that are not implied by the conditional independence properties of $G$
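A tiny example shows why completeness holds only for "almost all" distributions. In the two-node network X → Y the nodes are never d-separated, yet if the CPD for Y happens to have identical rows, X and Y are independent in that particular distribution. The numbers below are hypothetical and chosen to trigger exactly this case.

```python
# Hypothetical parameters (not from the slides) for the network X -> Y.
P_X1 = 0.3  # P(X = 1)

def joint(x, y, p_y1_given_x):
    """P(X = x, Y = y) = P(x) P(y | x) for binary X and Y."""
    px = P_X1 if x else 1 - P_X1
    py = p_y1_given_x[x] if y else 1 - p_y1_given_x[x]
    return px * py

def independent(p_y1_given_x, tol=1e-12):
    """Check P(x, y) = P(x) P(y) for every assignment."""
    for x in (0, 1):
        for y in (0, 1):
            p_x = sum(joint(x, v, p_y1_given_x) for v in (0, 1))
            p_y = sum(joint(u, y, p_y1_given_x) for u in (0, 1))
            if abs(joint(x, y, p_y1_given_x) - p_x * p_y) > tol:
                return False
    return True

degenerate = {0: 0.4, 1: 0.4}   # identical rows: Y ignores X
generic = {0: 0.4, 1: 0.41}     # any small perturbation restores dependence

print(independent(degenerate), independent(generic))
```

The degenerate table is independent and the perturbed one is not, which illustrates that such extra independencies need exactly matched parameters and vanish under slight changes.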

