Directed Graphical Models: Bayesian Networks Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2018
Basics
Multivariate distributions with a large number of variables call for independence assumptions.
Independence and conditional independence relationships simplify the representation and reduce the complexity of inference.
Bayesian networks enable us to incorporate domain knowledge and structure:
• Modular combination of heterogeneous parts
• Combining data and knowledge (Bayesian philosophy)
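Conditional and marginal independence
$X$ and $Y$ are conditionally independent given $Z$, written $X \perp Y \mid Z$, if (equivalently):
$P(X \mid Y, Z) = P(X \mid Z)$
$P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$
$P(Y \mid X, Z) = P(Y \mid Z)$
That is, $\forall x \in Val(X),\ y \in Val(Y),\ z \in Val(Z)$:
$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$
$X$ and $Y$ are marginally independent, written $X \perp Y \mid \emptyset$, if (equivalently):
$P(X \mid Y) = P(X)$
$P(X, Y) = P(X)\, P(Y)$
$P(Y \mid X) = P(Y)$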
Bayesian network definition
Qualitative specification by a Directed Acyclic Graph (DAG):
• Each node denotes a random variable
• Edges denote dependencies: $X \to Y$ shows a "direct influence" of $X$ on $Y$ ($X$ is a parent of $Y$)
Quantitative specification by CPDs:
• The CPD for each node $X_i$ defines $P(X_i \mid Pa(X_i))$
A Bayesian network represents a joint distribution over the variables (via the DAG and the CPDs) compactly in a factorized way:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
Burglary example
John does not perceive minor earthquakes.
John does not perceive burglaries directly.
Burglary example
Bayesian networks define the joint distribution (over the variables) in terms of the graph structure and the conditional probability distributions:
$$P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)$$
Burglary example: DAG + CPTs
CPDs as quantitative specification: tables for $P(A = t \mid B, E)$, $P(J = t \mid A)$ and $P(M = t \mid A)$.
Burglary example: full joint probability
$$P(J, M, A, B, E) = P(J \mid A)\, P(M \mid A)\, P(A \mid B, E)\, P(B)\, P(E)$$
$$P(J = t, M = t, A = t, B = f, E = f) = P(J = t \mid A = t)\, P(M = t \mid A = t)\, P(A = t \mid B = f, E = f)\, P(B = f)\, P(E = f)$$
$$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 = 0.000628$$
Short-hands:
$J = t$: JohnCalls = True
$B = f$: Burglary = False
…
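The arithmetic above can be checked in a few lines. This is only a sketch; the variable names are illustrative, and the five numbers are the CPT entries quoted in the worked example.

```python
# CPT entries quoted in the worked example
# (J = JohnCalls, M = MaryCalls, A = Alarm, B = Burglary, E = Earthquake).
p_j_true_given_a_true = 0.9             # P(J = t | A = t)
p_m_true_given_a_true = 0.7             # P(M = t | A = t)
p_a_true_given_b_false_e_false = 0.001  # P(A = t | B = f, E = f)
p_b_false = 0.999                       # P(B = f)
p_e_false = 0.998                       # P(E = f)

# The joint probability is the product of one CPD entry per node.
joint = (
    p_j_true_given_a_true
    * p_m_true_given_a_true
    * p_a_true_given_b_false_e_false
    * p_b_false
    * p_e_false
)
print(f"{joint:.6f}")  # 0.000628
```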
Burglary example: inference
Conditional probability distribution:
$$P(B = t \mid J = t, M = f) = \frac{P(J = t, M = f, B = t)}{P(J = t, M = f)} = \frac{\sum_{A} \sum_{E} P(J = t, M = f, A, B = t, E)}{\sum_{B} \sum_{A} \sum_{E} P(J = t, M = f, A, B, E)}$$
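The ratio of sums above can be evaluated by brute-force enumeration. The sketch below assumes the CPT values of the standard textbook version of this network; they are not given in the text and were chosen only because they agree with the numbers used in the full-joint example. The function names are illustrative.

```python
from itertools import product

# ASSUMED CPTs (standard textbook values, not stated in the slides).
P_B = 0.001                      # P(B = t)
P_E = 0.002                      # P(E = t)
P_A = {                          # P(A = t | B, E)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(J = t | A)
P_M = {True: 0.70, False: 0.01}  # P(M = t | A)


def bern(p_true, value):
    """Probability that a Boolean variable with P(true) = p_true equals value."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Factorized joint P(B, E, A, J, M) of the burglary network."""
    return (
        bern(P_B, b)
        * bern(P_E, e)
        * bern(P_A[(b, e)], a)
        * bern(P_J[a], j)
        * bern(P_M[a], m)
    )


def posterior_burglary(j, m):
    """P(B = t | J = j, M = m), summing out the unobserved variables."""
    values = [True, False]
    numer = sum(joint(True, e, a, j, m) for e, a in product(values, repeat=2))
    denom = sum(joint(b, e, a, j, m) for b, e, a in product(values, repeat=3))
    return numer / denom


print(posterior_burglary(j=True, m=False))
```

Enumeration touches every joint assignment, so its cost grows exponentially with the number of unobserved variables; it is useful here only because the network has five Boolean nodes.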
Student example
Nodes: Difficulty (D), Intelligence (I), Grade (G), SAT (S), Letter (L)

P(D = t) = 0.65, P(I = t) = 0.55

P(G | I, D):
I  D  G=1   G=2   G=3
f  f  0.3   0.4   0.3
f  t  0.05  0.25  0.7
t  f  0.9   0.08  0.02
t  t  0.5   0.3   0.2

P(S = 1 | I):
I  P(S=1|I)
f  0.1
t  0.7

P(L = t | G):
G  P(L=t|G)
1  0.9
2  0.5
3  0.05
Continuous variables example
Linear Gaussian:
$X \sim N(0, 1)$
$Y \mid X \sim N(b + X, \sigma)$, with $b = 0.5$ and $\sigma = 0.1$
[Figure: node $X$ with child $Y$; plot of $p(y \mid x)$ over $x$ and $y$]
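A linear Gaussian model like this one can be sampled ancestrally: draw the parent first, then the child given the parent. The sketch below assumes that the second argument of the normal distribution is a standard deviation (the slide does not say whether it is a standard deviation or a variance); the constant and function names are illustrative.

```python
import random

B_OFFSET = 0.5  # offset b from the slide
SIGMA = 0.1     # ASSUMPTION: treated as a standard deviation, not a variance


def sample_pair(rng):
    """Draw (x, y) by sampling the parent X, then the child Y given X."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(B_OFFSET + x, SIGMA)
    return x, y


rng = random.Random(0)  # fixed seed so that reruns give the same samples
samples = [sample_pair(rng) for _ in range(20000)]

# The residual y - x should concentrate around the offset b.
mean_residual = sum(y - x for x, y in samples) / len(samples)
print(f"mean of y - x: {mean_residual:.3f}")
```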
Missing edges
In general, the joint distribution is represented by the chain rule:
$$P(X_1, \dots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \dots, X_{i-1})$$
This is equivalent to a graph in which all of $X_1, \dots, X_{i-1}$ are parents of $X_i$.
Missing edges imply conditional independencies.
If we use a DAG that is not complete, some links are removed, so some of the conditioning variables drop out of the corresponding factors.
Compact representation
A CPT for a Boolean variable with $k$ Boolean parents requires $2^k$ rows, one per combination of parent values ($k = 0$: a single row giving the prior probability).
If each variable has no more than $k$ parents:
• The full joint distribution requires $2^n - 1$ numbers
• The Bayesian network requires at most $n \times 2^k$ numbers (linear in $n$)
⇒ Exponential reduction in the number of parameters
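To get a feel for the size of the reduction, the two counts can be compared directly. This is a small sketch with illustrative function names; the choice n = 30, k = 5 is an arbitrary example, not a value from the slides.

```python
def full_joint_params(n):
    """Free parameters of an unrestricted joint over n Boolean variables."""
    return 2 ** n - 1


def bayes_net_param_bound(n, k):
    """Upper bound on CPT rows when each of n Boolean nodes has at most k parents."""
    return n * 2 ** k


n, k = 30, 5  # arbitrary illustrative sizes
print(full_joint_params(n))         # 1073741823
print(bayes_net_param_bound(n, k))  # 960
```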
Bayesian network semantics
Local independencies: each node is conditionally independent of its non-descendants given its parents:
$$X_i \perp NonDescendants(X_i) \mid Pa(X_i)$$
Are local independencies all of the conditional independencies implied by a BN?
Factorization & independence
Let $G$ be a graph over $X_1, \dots, X_n$. A distribution $P$ factorizes over $G$ if:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
Factorization ⇒ Independence
If $P$ factorizes over $G$, then any variable in $P$ is independent of its non-descendants given its parents (in the graph $G$).
Factorization according to $G$ implies the associated conditional independencies.
Independence ⇒ Factorization
If any variable in the distribution $P$ is independent of its non-descendants given its parents (in the graph $G$), then $P$ factorizes over $G$.
Conditional independencies imply factorization of the joint distribution (into a product of simpler terms).
Independence ⇒ factorization
Consider the chain rule:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$$
We can simplify it through the conditional independence assumptions.
Using $X_i \perp NonDescendants(X_i) \mid Pa(X_i)$, and ordering the variables so that parents precede their children, we can show:
$$P(X_i \mid X_1, X_2, \dots, X_{i-1}) = P(X_i \mid Parents(X_i))$$
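As a worked instance, the same argument can be traced on the burglary network, assuming the topological ordering B, E, A, J, M (any ordering in which parents come before children would do). Each line replaces one chain-rule factor using the local independence of the corresponding node.

```latex
\begin{align*}
P(B, E, A, J, M)
  &= P(B)\, P(E \mid B)\, P(A \mid B, E)\, P(J \mid B, E, A)\, P(M \mid B, E, A, J)
     && \text{chain rule} \\
  &= P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid B, E, A)\, P(M \mid B, E, A, J)
     && E \perp B \\
  &= P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid B, E, A, J)
     && J \perp \{B, E\} \mid A \\
  &= P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)
     && M \perp \{B, E, J\} \mid A
\end{align*}
```

The last line is the factorization used for the burglary example earlier in the deck.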
Equivalence Theorem
For a graph G:
• Let D1 denote the family of all distributions that satisfy the conditional independencies of G
• Let D2 denote the family of all distributions that factor according to G
• ⇒ D1 ≡ D2
Other independencies
Are there other independencies that hold for every distribution $P$ that factorizes over $G$?
If $P$ factorizes over $G$, can we read these independencies from the structure of $G$?
Yes: the graphical criterion called d-separation lets us find independencies from the graph.
Basic structures
Chain, X → Z → Y: $X \perp Y \mid Z$
Common parent, X ← Z → Y: $X \perp Y \mid Z$
V-structure, X → Z ← Y: $X \perp Y$ marginally (explaining away)
Explaining away
V-structure X → Z ← Y: when we condition on $Z$, are $X$ and $Y$ independent?
$$P(X, Y, Z) = P(X)\, P(Y)\, P(Z \mid X, Y)$$
$X$ and $Y$ are marginally independent, but given $Z$ they are conditionally dependent.
This is called explaining away.
Two coins example
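The slide only names a two-coins example without spelling it out, so the following sketch uses one common version as an assumption: X and Y are independent fair coins, and Z = 1 when at least one of them shows heads. Exact fractions make the effect easy to see. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

# ASSUMED setup: X, Y are independent fair coins; Z = 1 iff at least one is heads.
HALF = Fraction(1, 2)
joint = {}
for x, y in product((0, 1), repeat=2):
    z = int(x or y)
    joint[(x, y, z)] = HALF * HALF


def prob(event):
    """Probability of an event given as a predicate over (x, y, z)."""
    return sum(p for (x, y, z), p in joint.items() if event(x, y, z))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda x, y, z: event(x, y, z) and given(x, y, z)) / prob(given)


# Marginally, learning Y tells us nothing about X.
print(prob(lambda x, y, z: x == 1))                           # 1/2
print(cond(lambda x, y, z: x == 1, lambda x, y, z: y == 1))   # 1/2

# Given Z = 1, learning Y = 1 lowers the probability of X = 1.
print(cond(lambda x, y, z: x == 1, lambda x, y, z: z == 1))              # 2/3
print(cond(lambda x, y, z: x == 1, lambda x, y, z: z == 1 and y == 1))   # 1/2
```

Once Z = 1 is observed, Y = 1 already accounts for it, so the belief in X = 1 drops from 2/3 back to 1/2.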
D-separation
Let $A$, $B$, $C$ denote three disjoint sets of nodes.
$A$ is d-separated from $B$ by $C$ iff the graph implies $A \perp B \mid C$.
$A$ is d-separated from $B$ by $C$ if all undirected paths between $A$ and $B$ are blocked by $C$.
Undirected path blocking
A path between $X \in A$ and $Y \in B$ is blocked by $C$ if it passes through a node $Z$ in one of the following ways:
• Head-to-tail at a node $Z \in C$
• Tail-to-tail at a node $Z \in C$
• Head-to-head (i.e., v-structure) at a node $Z$, where $Z \notin C$ and none of its descendants are in $C$
Undirected path blocking
$A \perp B \mid C$ holds if every trail (undirected path) between A and B contains either:
• A node that is in C, at which the arrows of the path do not meet head-to-head, or
• A head-to-head node such that neither the node nor any of its descendants is in C
D-separation: active trail view
Definition: $X$ and $Y$ are d-separated in $G$ given $Z$ if there is no active trail in $G$ between $X$ and $Y$ given $Z$.
A trail between $X$ and $Y$ is active given $Z$ if:
• For every v-structure node $U$ in the trail ($X \dots \to U \leftarrow \dots Y$), either $U$ or one of its descendants is in $Z$
• No other node in the trail is in $Z$
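D-separation: example
Nodes: Intelligence (I), Difficulty (D), Grade (G), Rank (R), Letter (L)
Statements to check against the graph:
$R \perp G \mid I$
$R \perp D \mid I$
$R \perp D \mid G$
$R \perp D \mid L$
$R \perp L \mid G$
$D \perp L \mid G$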
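Statements like these can be checked mechanically. The sketch below does not enumerate trails; it uses an equivalent test on the moralized ancestral graph, a technique not taken from the slides. It keeps only the query nodes and their ancestors, connects parents that share a child, drops edge directions, removes the conditioning set, and then asks whether the two sides are still connected. The edge list is an assumption, since the figure is not reproduced in the text: I → G, D → G, I → R, G → L. Function names are illustrative.

```python
from itertools import combinations


def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """True if every node in xs is d-separated from every node in ys given zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors_of(parents, xs | ys | zs)

    # Moralize the ancestral subgraph: undirected parent-child edges,
    # plus an edge between every pair of parents of a common child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # Search from xs without stepping onto conditioned nodes.
    stack = [x for x in xs if x not in zs]
    seen = set(stack)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in nbrs[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                stack.append(w)
    return True


# ASSUMED edges of the example graph, given as child -> list of parents.
student = {"I": [], "D": [], "G": ["I", "D"], "R": ["I"], "L": ["G"]}

queries = [("R", "G", "I"), ("R", "D", "I"), ("R", "D", "G"),
           ("R", "D", "L"), ("R", "L", "G"), ("D", "L", "G")]
for x, y, z in queries:
    print(f"{x} d-sep {y} given {z}: {d_separated(student, {x}, {y}, {z})}")
```

Under these assumed edges, the two queries that condition on G or L while asking about R and D come out as not d-separated: G is a v-structure node on the trail R ← I → G ← D, and L is its descendant, so observing either one activates the trail.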
Markov Blanket in Bayesian Network
A variable is conditionally independent of all other variables given its Markov blanket.
Markov blanket of a node:
• All parents
• Children
• Co-parents of children
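The three ingredients of the Markov blanket can be collected directly from a parent map. This sketch uses the burglary network from earlier in the deck as test data; the dictionary layout and function name are illustrative choices.

```python
def markov_blanket(parents, node):
    """Parents, children, and co-parents of children of the given node."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents.get(node, ()))   # all parents
    blanket.update(children)               # children
    for child in children:
        blanket.update(parents[child])     # co-parents of children
    blanket.discard(node)                  # the node is a parent of its own children
    return blanket


# Burglary network as child -> list of parents.
burglary = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

print(sorted(markov_blanket(burglary, "B")))  # ['A', 'E']
print(sorted(markov_blanket(burglary, "A")))  # ['B', 'E', 'J', 'M']
print(sorted(markov_blanket(burglary, "J")))  # ['A']
```

Earthquake appears in the blanket of Burglary only as a co-parent of Alarm, which is the explaining-away dependence showing up in the blanket.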
D-Separation: soundness & completeness
Soundness: any conditional independence property that we can derive from $G$ should hold for every probability distribution that factorizes over $G$.
Theorem: if $P$ factorizes over $G$ and $\text{d-sep}_G(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$, then $P$ satisfies $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$.
Weak completeness: for almost all distributions $P$ that factorize over $G$, if $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$ holds in $P$, then $\mathbf{X}$ and $\mathbf{Y}$ are d-separated given $\mathbf{Z}$ in the graph $G$.
There can be independencies in $P$ that are not captured by the conditional independence properties of $G$.