 
Directed Graphical Models: Bayesian Networks
Probabilistic Graphical Models, Sharif University of Technology
Soleymani, Spring 2018
Basics
• Multivariate distributions with a large number of variables
• Independence assumptions are useful
• Independence and conditional independence relationships simplify representation and reduce inference complexity
• Bayesian networks let us incorporate domain knowledge and structure
• Modular combination of heterogeneous parts
• Combining data and knowledge (Bayesian philosophy)
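Conditional and marginal independence
• $X$ and $Y$ are conditionally independent given $Z$, written $X \perp Y \mid Z$, if any of the following equivalent conditions holds:
  $P(X \mid Y, Z) = P(X \mid Z)$
  $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$
  $P(Y \mid X, Z) = P(Y \mid Z)$
  That is, for all $x \in Val(X)$, $y \in Val(Y)$, $z \in Val(Z)$:
  $P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$
• $X$ and $Y$ are marginally independent, written $X \perp Y \mid \emptyset$, if any of the following equivalent conditions holds:
  $P(X \mid Y) = P(X)$
  $P(X, Y) = P(X)\, P(Y)$
  $P(Y \mid X) = P(Y)$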
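The product condition for conditional independence can be checked numerically on a small joint table. The following is a minimal sketch with two binary variables X and Y and a binary conditioning variable Z (labels chosen for this sketch); all probabilities are hypothetical numbers picked for illustration, and the joint is built so that the conditional independence holds by construction while marginal independence does not.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration (not from the slides).
p_z1 = 0.7                      # P(Z = 1)
p_x1_given_z = {0: 0.9, 1: 0.2}  # P(X = 1 | Z = z)
p_y1_given_z = {0: 0.8, 1: 0.1}  # P(Y = 1 | Z = z)


def bern(p_one, value):
    """Probability of a binary value under P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Joint built as P(Z) P(X | Z) P(Y | Z), so X and Y are independent given Z.
joint = {
    (x, y, z): bern(p_z1, z) * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Sum the joint over all entries consistent with the fixed values."""
    total = 0.0
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


def cond_indep_given_z(tol=1e-12):
    """Check P(x, y | z) == P(x | z) P(y | z) for every assignment."""
    for x, y, z in product((0, 1), repeat=3):
        pz = prob(z=z)
        lhs = prob(x=x, y=y, z=z) / pz
        rhs = (prob(x=x, z=z) / pz) * (prob(y=y, z=z) / pz)
        if abs(lhs - rhs) > tol:
            return False
    return True


def marg_indep(tol=1e-12):
    """Check P(x, y) == P(x) P(y) for every assignment."""
    return all(
        abs(prob(x=x, y=y) - prob(x=x) * prob(y=y)) <= tol
        for x, y in product((0, 1), repeat=2)
    )


print("conditionally independent given Z:", cond_indep_given_z())
print("marginally independent:", marg_indep())
```

With these numbers the first check passes and the second fails, which illustrates that conditional independence does not imply marginal independence.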
Bayesian network definition
• Bayesian network
  • Qualitative specification by a directed acyclic graph (DAG)
    • Each node denotes a random variable
    • Edges denote dependencies
    • $X \to Y$ shows a "direct influence" of $X$ on $Y$ ($X$ is a parent of $Y$)
  • Quantitative specification by conditional probability distributions (CPDs)
    • The CPD for each node $X_i$ defines $P(X_i \mid Pa(X_i))$
• A Bayesian network represents a joint distribution over the variables (via the DAG and the CPDs) compactly in factorized form:
  $$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
Burglary example
• John does not perceive minor earthquakes
• John does not perceive burglaries directly
Burglary example
• Bayesian networks define a joint distribution over the variables in terms of the graph structure and the conditional probability distributions:
  $$P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)$$
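Burglary example: DAG + CPTs
• CPDs as quantitative specification: tables for $P(A = t \mid B, E)$, $P(J = t \mid A)$, and $P(M = t \mid A)$
[Figure: DAG of the burglary network annotated with its CPTs]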
Burglary example: full joint probability
• $P(J, M, A, B, E) = P(J \mid A)\, P(M \mid A)\, P(A \mid B, E)\, P(B)\, P(E)$
• $P(J = t, M = t, A = t, B = f, E = f)$
  $= P(J = t \mid A = t)\, P(M = t \mid A = t)\, P(A = t \mid B = f, E = f)\, P(B = f)\, P(E = f)$
  $= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$
Shorthand: $J = t$ means JohnCalls = True, $B = f$ means Burglary = False, …
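The arithmetic of this joint entry can be reproduced directly. The sketch below only multiplies the five factors quoted in the worked example; the dictionary keys are descriptive labels added for readability.

```python
import math

# The five factors quoted in the worked example above.
factors = {
    "P(JohnCalls=t | Alarm=t)": 0.9,
    "P(MaryCalls=t | Alarm=t)": 0.7,
    "P(Alarm=t | Burglary=f, Earthquake=f)": 0.001,
    "P(Burglary=f)": 0.999,
    "P(Earthquake=f)": 0.998,
}

# The joint entry is the product of one CPD entry per node.
joint_entry = math.prod(factors.values())
print(f"{joint_entry:.6f}")  # prints 0.000628
```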
Burglary example: inference
• Conditional probability distribution:
  $$P(B = t \mid J = t, M = f) = \frac{P(J = t, M = f, B = t)}{P(J = t, M = f)} = \frac{\sum_{A} \sum_{E} P(J = t, M = f, A, B = t, E)}{\sum_{B} \sum_{A} \sum_{E} P(J = t, M = f, A, B, E)}$$
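This ratio of sums can be evaluated by brute-force enumeration over the unobserved variables. The sketch below is one way to do it under an explicit assumption: the CPT figure did not survive in this text, so the table entries are the commonly used textbook values for this example. Only 0.9, 0.7, 0.001, 0.999 and 0.998 are confirmed by the worked joint probability in these slides; every other number is an assumption, as are the function names.

```python
from itertools import product

# CPT entries. Confirmed by the slides: P(J=t|A=t)=0.9, P(M=t|A=t)=0.7,
# P(A=t|B=f,E=f)=0.001, P(B=f)=0.999, P(E=f)=0.998.
# All remaining entries are ASSUMED (standard textbook values).
P_B = 0.001                      # P(Burglary = True)
P_E = 0.002                      # P(Earthquake = True)
P_A = {                          # P(Alarm = True | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = True | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = True | Alarm)


def bern(p_true, value):
    """Probability of a Boolean value under P(value = True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(j, m, a, b, e):
    """Joint probability as the product of one CPD entry per node."""
    return (
        bern(P_B, b)
        * bern(P_E, e)
        * bern(P_A[(b, e)], a)
        * bern(P_J[a], j)
        * bern(P_M[a], m)
    )


def posterior_burglary(j, m):
    """P(Burglary = True | JohnCalls = j, MaryCalls = m) by enumeration."""
    values = (True, False)
    numer = sum(joint(j, m, a, True, e) for a, e in product(values, repeat=2))
    denom = sum(joint(j, m, a, b, e) for a, b, e in product(values, repeat=3))
    return numer / denom


print(posterior_burglary(j=True, m=False))
```

Enumeration sums over every assignment of the hidden variables, so its cost grows exponentially with their number; it is shown here only because it mirrors the formula term by term.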
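Student example
• Nodes: Difficulty ($D$), Intelligence ($I$), Grade ($G$), SAT ($S$), Letter ($L$)
• Priors: $P(D = t)$, $P(I = t)$; values 0.65, 0.55 (listed in this order on the slide)

$P(G \mid I, D)$:
  I  D  | G=1   G=2   G=3
  f  f  | 0.3   0.4   0.3
  f  t  | 0.05  0.25  0.7
  t  f  | 0.9   0.08  0.02
  t  t  | 0.5   0.3   0.2

$P(S = 1 \mid I)$:
  I  | P(S=1|I)
  f  | 0.1
  t  | 0.7

$P(L = t \mid G)$:
  G  | P(L=t|G)
  1  | 0.9
  2  | 0.5
  3  | 0.05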
Continuous variables example
• Linear Gaussian, with graph $X \to Y$:
  $X \sim N(0, 1)$
  $Y \mid X \sim N(b + X, \sigma)$
  $b = 0.5$, $\sigma = 0.1$
[Figure: plot of $p(y \mid x)$ over $x$ and $y$]
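A linear Gaussian network of this kind can be sampled ancestrally: draw the parent first, then the child from a Gaussian centred at the offset plus the parent value. The sketch below uses the offset 0.5 and spread 0.1 from the slide and assumes the spread is a standard deviation, which the slide does not specify; the function name is hypothetical.

```python
import random


def sample_linear_gaussian(n, b=0.5, sigma=0.1, seed=0):
    """Draw n (parent, child) pairs by ancestral sampling.

    ASSUMPTION: sigma is treated as a standard deviation; the slide
    does not say whether it denotes a standard deviation or a variance.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # parent: standard normal
        y = rng.gauss(b + x, sigma)  # child: mean shifts linearly with parent
        samples.append((x, y))
    return samples


samples = sample_linear_gaussian(20000)
# The residual child - parent should be centred near the offset b.
mean_residual = sum(y - x for x, y in samples) / len(samples)
print(round(mean_residual, 2))
```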
Missing edges
• In general, the joint distribution is represented by the chain rule:
  $$P(X_1, \dots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \dots, X_{i-1})$$
• This is equivalent to a graph in which all of $X_1, \dots, X_{i-1}$ are parents of $X_i$
• Missing edges imply conditional independencies
• If we use a DAG that is not complete:
  • removing links removes the corresponding variables from the conditioning sets of the factors
Compact representation
• A CPT for a Boolean variable with $k$ Boolean parents requires:
  • $2^k$ rows: the different combinations of parent values
  • $k = 0$: one row giving the prior probability
• If each variable has no more than $k$ parents:
  • The full joint distribution requires $2^n - 1$ numbers
  • The Bayesian network requires at most $n \times 2^k$ numbers (linear in $n$)
  • ⇒ Exponential reduction in the number of parameters
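The two counts can be compared directly. The sketch below evaluates both formulas for Boolean variables; the function names are hypothetical, and the burglary parent counts (0, 0, 2, 1, 1) follow from the factorization given earlier in these slides.

```python
def full_joint_params(n):
    """Independent numbers in a full joint table over n Boolean variables."""
    return 2 ** n - 1


def cpt_rows(num_parents):
    """Rows in the CPT of a Boolean node with Boolean parents."""
    return 2 ** num_parents


def bn_param_bound(n, k):
    """Upper bound when every one of n nodes has at most k parents."""
    return n * cpt_rows(k)


def bn_params(parent_counts):
    """Exact count given the number of parents of each node."""
    return sum(cpt_rows(k) for k in parent_counts)


# Burglary network: Burglary and Earthquake have no parents, Alarm has two,
# JohnCalls and MaryCalls have one each.
print(bn_params([0, 0, 2, 1, 1]), "vs", full_joint_params(5))   # 10 vs 31
print(bn_param_bound(30, 5), "vs", full_joint_params(30))       # 960 vs 1073741823
```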
Bayesian network semantics
• Local independencies:
  • Each node is conditionally independent of its non-descendants given its parents:
    $X_i \perp NonDescendants(X_i) \mid Pa(X_i)$
• Are the local independencies all of the conditional independencies implied by a BN?
Factorization & independence
• Let $G$ be a graph over $X_1, \dots, X_n$. A distribution $P$ factorizes over $G$ if:
  $$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
• Factorization ⇒ Independence
  • If $P$ factorizes over $G$, then any variable in $P$ is independent of its non-descendants given its parents (in the graph $G$)
  • Factorization according to $G$ implies the associated conditional independencies
• Independence ⇒ Factorization
  • If any variable in the distribution $P$ is independent of its non-descendants given its parents (in the graph $G$), then $P$ factorizes over $G$
  • Conditional independencies imply factorization of the joint distribution (into a product of simpler terms)
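Independence ⇒ factorization
• Consider the chain rule, with the variables ordered so that every parent precedes its children:
  $$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$$
• We can simplify it through the conditional independence assumptions
• Under such an ordering, $X_1, \dots, X_{i-1}$ contains all parents of $X_i$ and none of its descendants, so using $X_i \perp NonDescendants(X_i) \mid Pa(X_i)$ we can show:
  $$P(X_i \mid X_1, X_2, \dots, X_{i-1}) = P(X_i \mid Pa(X_i))$$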
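As a worked instance, the same simplification can be applied to the burglary network with B = Burglary, E = Earthquake, A = Alarm, J = JohnCalls, M = MaryCalls. The ordering B, E, A, J, M is one choice of ordering in which parents precede children; the local independencies used are E ⊥ B, J ⊥ {B, E} | A, and M ⊥ {B, E, J} | A.

```latex
\begin{align*}
P(B, E, A, J, M)
  &= P(B)\, P(E \mid B)\, P(A \mid B, E)\, P(J \mid B, E, A)\, P(M \mid B, E, A, J)
  && \text{(chain rule)} \\
  &= P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)
  && \text{(local independencies)}
\end{align*}
```

The result matches the factorization stated for the burglary example.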
Equivalence Theorem
• For a graph G:
  • Let D1 denote the family of all distributions that satisfy the conditional independencies of G
  • Let D2 denote the family of all distributions that factorize according to G
  • ⇒ D1 ≡ D2
Other independencies
• Are there other independencies that hold for every distribution $P$ that factorizes over $G$?
• If $P$ factorizes over $G$, can we read these independencies from the structure of $G$?
• Yes: a graphical criterion called d-separation lets us find independencies from the graph
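Basic structures
• Head-to-tail chain through $Z$ (e.g. $X \to Z \to Y$): $X \perp Y \mid Z$
• Common parent $Z$ ($X \leftarrow Z \to Y$): $X \perp Y \mid Z$
• Common child $Z$ ($X \to Z \leftarrow Y$): $X \perp Y$ (explaining away)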
Explaining away
• When we condition on $Z$, are $X$ and $Y$ independent?
  Structure: $X \to Z \leftarrow Y$
  $$P(X, Y, Z) = P(X)\, P(Y)\, P(Z \mid X, Y)$$
• $X$ and $Y$ are marginally independent, but given $Z$ they are conditionally dependent
• This is called explaining away
• Two coins example
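A two-coin setup makes the effect concrete. The slide does not spell out its two coins example, so the sketch below assumes one common version: two independent fair coins, and a common child that equals 1 when at least one coin shows heads. Exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# ASSUMED setup (not specified on the slide): two independent fair coins,
# and a child variable that is 1 when at least one coin shows heads (1).
half = Fraction(1, 2)
joint = {}
for coin1, coin2 in product((0, 1), repeat=2):
    child = int(coin1 or coin2)
    joint[(coin1, coin2, child)] = half * half


def prob(event):
    """Probability of an event given as a predicate over (coin1, coin2, child)."""
    return sum(p for outcome, p in joint.items() if event(*outcome))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(given)


def heads1(a, b, c):
    return a == 1


p_prior = prob(heads1)                                         # 1/2
p_given_coin2 = cond(heads1, lambda a, b, c: b == 1)           # 1/2
p_given_child = cond(heads1, lambda a, b, c: c == 1)           # 2/3
p_given_both = cond(heads1, lambda a, b, c: c == 1 and b == 1)  # 1/2

print(p_prior, p_given_coin2, p_given_child, p_given_both)
```

Observing the second coin alone tells us nothing about the first. Once the child is observed, learning that the second coin is heads lowers the probability of heads for the first coin from 2/3 back to 1/2: the second coin "explains away" the observation.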
D-separation
• Let $A$, $B$, $C$ denote three disjoint sets of nodes. $A$ is d-separated from $B$ by $C$ iff $A \perp B \mid C$
• $A$ is d-separated from $B$ by $C$ if all undirected paths between $A$ and $B$ are blocked by $C$
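Undirected path blocking
An undirected path between $X \in A$ and $Y \in B$ is blocked by $C$ if it contains one of the following:
• Head-to-tail at a node $Z \in C$: $X \to Z \to Y$
• Tail-to-tail at a node $Z \in C$: $X \leftarrow Z \to Y$
• Head-to-head (i.e., v-structure) at a node $Z$, where $Z \notin C$ and none of its descendants are in $C$: $X \to Z \leftarrow Y$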
Undirected path blocking
$A \perp B \mid C$ holds if every trail (undirected path) between $A$ and $B$ contains either:
• a node that is in $C$ and at which the arrows of the path do not meet head-to-head, or
• a head-to-head node such that neither the node nor any of its descendants is in $C$
D-separation: active trail view
• Definition: $X$ and $Y$ are d-separated in $G$ given $Z$ if there is no active trail in $G$ between $X$ and $Y$ given $Z$
• A trail between $X$ and $Y$ is active if:
  • for every v-structure node $U$ in the trail $X \dots \to U \leftarrow \dots Y$, either $U$ or one of its descendants is in $Z$
  • no other node in the trail is in $Z$
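The active-trail definition can be turned into a direct check for small graphs. The sketch below enumerates every simple trail between two nodes and tests each against the two conditions; this is a literal, exponential-time reading of the definition rather than an efficient reachability algorithm, and the function names are hypothetical. The example graph is the student network, with edges read off the CPDs given earlier (Grade depends on Intelligence and Difficulty, SAT on Intelligence, Letter on Grade).

```python
def descendants(children, node):
    """All nodes reachable from node by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_active(trail, edges, children, observed):
    """Apply the two active-trail conditions to each interior node."""
    for prev, node, nxt in zip(trail, trail[1:], trail[2:]):
        is_v_structure = (prev, node) in edges and (nxt, node) in edges
        if is_v_structure:
            # Needs the node or one of its descendants to be observed.
            if node not in observed and not (descendants(children, node) & observed):
                return False
        elif node in observed:
            # Any other interior node must be unobserved.
            return False
    return True


def d_separated(edges, x, y, observed):
    """True if no simple trail between x and y is active given observed."""
    edges, observed = set(edges), set(observed)
    children, neighbours = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def trails(path):
        if path[-1] == y:
            yield path
            return
        for nb in neighbours.get(path[-1], ()):
            if nb not in path:
                yield from trails(path + [nb])

    return not any(is_active(t, edges, children, observed) for t in trails([x]))


# Student network edges, read off the CPDs listed earlier in these slides.
student = [
    ("Difficulty", "Grade"),
    ("Intelligence", "Grade"),
    ("Intelligence", "SAT"),
    ("Grade", "Letter"),
]

print(d_separated(student, "Difficulty", "Intelligence", set()))       # True
print(d_separated(student, "Difficulty", "Intelligence", {"Letter"}))  # False
print(d_separated(student, "SAT", "Letter", {"Intelligence"}))         # True
```

The second call shows that observing a descendant of a v-structure node (Letter, a child of Grade) is enough to activate the trail through it.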
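D-separation: example
• Graph over Intelligence ($I$), Difficulty ($D$), Grade ($G$), Rank ($R$), Letter ($L$)
• Independence statements listed on the slide, to be checked against the graph:
  $R \perp G \mid I$, $R \perp D \mid I$, $R \perp D \mid G$, $R \perp D \mid L$, $R \perp L \mid G$, $D \perp L \mid G$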
Markov Blanket in Bayesian Network
• A variable is conditionally independent of all other variables given its Markov blanket
• Markov blanket of a node:
  • All parents
  • Children
  • Co-parents of children
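Collecting these three groups from an edge list is straightforward. The sketch below is a minimal version with a hypothetical function name; the example graph is the student network, with edges read off the CPDs given earlier in these slides.

```python
def markov_blanket(edges, node):
    """Parents, children, and co-parents of children of node in a DAG."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Co-parents: other parents of this node's children.
    co_parents = {u for u, v in edges if v in children and u != node}
    return parents | children | co_parents


# Student network edges, read off the CPDs listed earlier in these slides.
student = [
    ("Difficulty", "Grade"),
    ("Intelligence", "Grade"),
    ("Intelligence", "SAT"),
    ("Grade", "Letter"),
]

print(sorted(markov_blanket(student, "Intelligence")))
# ['Difficulty', 'Grade', 'SAT']
```

Difficulty appears in the blanket of Intelligence only because the two share the child Grade.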
D-Separation: soundness & completeness
• Soundness: any conditional independence property that we can derive from $G$ should hold for every probability distribution that factorizes over $G$
  • Theorem: if $P$ factorizes over $G$ and $\text{d-sep}_G(X, Y \mid Z)$, then $P$ satisfies $X \perp Y \mid Z$
• Weak completeness:
  • For almost all distributions $P$ that factorize over $G$, if $X \perp Y \mid Z$ holds in $P$, then $X$ and $Y$ are d-separated given $Z$ in the graph $G$
  • There can be independencies in $P$ that are not found by the conditional independence properties of $G$