

  1. Directed Graphical Models: Bayesian Networks
  Probabilistic Graphical Models
  Sharif University of Technology
  Soleymani
  Spring 2018

  2. Basics
   Multivariate distributions with a large number of variables
   Independence assumptions are useful
   Independence and conditional independence relationships simplify representation and alleviate inference complexity
   Bayesian networks enable us to incorporate domain knowledge and structure
   Modular combination of heterogeneous parts
   Combining data and knowledge (Bayesian philosophy)

  3. Conditional and marginal independence
   X and Y are conditionally independent given Z if: X ⊥ Y | Z
  P(X | Y, Z) = P(X | Z)
  P(X, Y | Z) = P(X | Z) P(Y | Z)
  P(Y | X, Z) = P(Y | Z)
  ∀x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z): P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
   X and Y are marginally independent if: X ⊥ Y | ∅
  P(X | Y) = P(X)
  P(X, Y) = P(X) P(Y)
  P(Y | X) = P(Y)
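The definition above can be checked numerically. The sketch below builds a toy joint distribution (the numbers are illustrative, not from the slides) so that X ⊥ Y | Z holds by construction, then verifies P(X, Y | Z) = P(X | Z) P(Y | Z) for every assignment:

```python
from itertools import product

# Toy joint over binary X, Y, Z, built so that X and Y are conditionally
# independent given Z by construction: P(x, y, z) = P(z) P(x|z) P(y|z).
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p_y_given_z[z][y]

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

def cond_indep(joint, tol=1e-12):
    """Check P(X, Y | Z) == P(X | Z) P(Y | Z) for every assignment."""
    for z in [0, 1]:
        pz = sum(p for (_, _, zz), p in joint.items() if zz == z)
        for x, y in product([0, 1], repeat=2):
            pxz = sum(joint[(x, yy, z)] for yy in [0, 1])
            pyz = sum(joint[(xx, y, z)] for xx in [0, 1])
            if abs(joint[(x, y, z)] / pz - (pxz / pz) * (pyz / pz)) > tol:
                return False
    return True

print(cond_indep(joint))  # True
```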

  4. Bayesian network definition
   Bayesian Network
   Qualitative specification by a Directed Acyclic Graph (DAG)
   Each node denotes a random variable
   Edges denote dependencies
   X → Y shows a "direct influence" of X on Y (X is a parent of Y)
   Quantitative specification by CPDs
   CPD for each node X_i defines P(X_i | Pa(X_i))
   A Bayesian network represents a joint distribution over the variables (via the DAG and CPDs) compactly, in a factorized way:
  P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i))

  5. Burglary example
  John does not perceive minor earthquakes
  John does not perceive burglaries directly

  6. Burglary example
   Bayesian networks define the joint distribution (over the variables) in terms of the graph structure and conditional probability distributions:
  P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)

  7. Burglary example: DAG + CPTs
  CPDs as the quantitative specification
  [figure: the DAG with CPTs for P(A = t | B, E), P(J = t | A), and P(M = t | A)]

  8. Burglary example: full joint probability
   P(J, M, A, B, E) = P(J | A) P(M | A) P(A | B, E) P(B) P(E)
   P(J = t, M = t, A = t, B = f, E = f)
  = P(J = t | A = t) P(M = t | A = t) P(A = t | B = f, E = f) P(B = f) P(E = f)
  = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.000628
  Short-hands: J = t: JohnCalls = True; B = f: Burglary = False; …
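The slide's arithmetic (a product of one CPT entry per variable) is easy to reproduce:

```python
# Burglary network: P(J=t, M=t, A=t, B=f, E=f)
#   = P(J=t | A=t) P(M=t | A=t) P(A=t | B=f, E=f) P(B=f) P(E=f)
# CPT entries as given on the slide.
p = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
print(round(p, 6))  # 0.000628
```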

  9. Burglary example: inference
   Conditional probability distribution:
  P(B = t | J = t, M = f) = P(J = t, M = f, B = t) / P(J = t, M = f)
  = Σ_A Σ_E P(J = t, M = f, A, B = t, E) / Σ_B Σ_A Σ_E P(J = t, M = f, A, B, E)
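The inference-by-enumeration query above can be sketched in a few lines. The CPT entries below are the classic textbook values for this network (the ones visible on slide 8 match them; the remaining entries, e.g. P(A=t | B=t, E=t) = 0.95, are an assumption taken from the standard alarm-network example):

```python
from itertools import product

# CPTs of the classic burglary/alarm network (standard textbook values).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=t | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=t | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=t | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the network's factorization."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(B=t | J=t, M=f): sum out A and E in the numerator, B, A, E in the denominator.
num = sum(joint(True, e, a, True, False) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, False) for b, e, a in product([True, False], repeat=3))
print(num / den)  # ≈ 0.005: one call without confirmation barely raises belief
```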

  10. Student example
  Nodes: Difficulty (D), Intelligence (I), Grade (G), SAT (S), Letter (L)
  Priors: P(D = t) = 0.65, P(I = t) = 0.55
  P(G | I, D):
  I = f, D = f: G=1: 0.3, G=2: 0.4, G=3: 0.3
  I = f, D = t: G=1: 0.05, G=2: 0.25, G=3: 0.7
  I = t, D = f: G=1: 0.9, G=2: 0.08, G=3: 0.02
  I = t, D = t: G=1: 0.5, G=2: 0.3, G=3: 0.2
  P(S = 1 | I): I = f: 0.1, I = t: 0.7
  P(L = t | G): G=1: 0.9, G=2: 0.5, G=3: 0.05

  11. Continuous variables example
   Linear Gaussian:
  X ~ N(0, 1)
  Y | X ~ N(b + X, σ)
  with b = 0.5, σ = 0.1
  [figure: the DAG X → Y and a plot of p(y | x)]
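A linear Gaussian CPD is easy to sample from by ancestral sampling: draw the parent from its marginal, then the child from its conditional. A minimal sketch with the slide's parameters (treating σ = 0.1 as a standard deviation, which is an assumption; the slide does not say whether σ is a variance or a standard deviation). Under this model E[Y] = b and Var(Y) = 1 + σ²:

```python
import random

random.seed(0)
b, sigma = 0.5, 0.1  # slide's parameters; sigma taken as a standard deviation

# Ancestral sampling: X from its marginal, then Y from its CPD given X.
samples = []
for _ in range(100_000):
    x = random.gauss(0.0, 1.0)        # X ~ N(0, 1)
    y = random.gauss(b + x, sigma)    # Y | X ~ N(b + X, sigma^2)
    samples.append(y)

mean_y = sum(samples) / len(samples)
var_y = sum((y - mean_y) ** 2 for y in samples) / len(samples)
print(mean_y, var_y)  # close to 0.5 and 1 + sigma^2 = 1.01
```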

  12. Missing edges
   In general, the joint distribution is represented by the chain rule:
  P(X_1, …, X_n) = P(X_1) ∏_{i=2}^{n} P(X_i | X_1, …, X_{i-1})
   Equivalent to a graph in which all of X_1, …, X_{i-1} are parents of X_i
   Missing edges imply conditional independencies.
   If we use a DAG that is not complete: by removing some links, some of the conditioning variables are dropped from the corresponding factors

  13. Compact representation
   A CPT for a Boolean variable with k Boolean parents requires:
   2^k rows: different combinations of parent values
   k = 0: one row showing the prior probability
   If each variable has no more than k parents:
   The full joint distribution requires 2^n − 1 numbers
   A Bayesian network requires at most n × 2^k numbers (linear in n)
   ⇒ Exponential reduction in the number of parameters
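The parameter counts above can be made concrete (n = 30 and k = 5 are illustrative values, not from the slides):

```python
def full_joint_params(n):
    """Free parameters of a full joint distribution over n Boolean variables."""
    return 2 ** n - 1

def bn_params(n, k):
    """Upper bound on CPT rows when each of n Boolean nodes has at most k parents."""
    return n * 2 ** k

# e.g. n = 30 variables with at most k = 5 parents each:
print(full_joint_params(30))  # 1073741823
print(bn_params(30, 5))       # 960
```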

  14. Bayesian network semantics
   Local independencies:
   Each node is conditionally independent of its non-descendants given its parents:
  X_i ⊥ NonDescendants(X_i) | Pa(X_i)
   Are local independencies all of the conditional independencies implied by a BN?

  15. Factorization & independence
   Let G be a graph over X_1, …, X_n. A distribution P factorizes over G if:
  P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i))
   Factorization ⇒ Independence
   If P factorizes over G, then any variable in P is independent of its non-descendants given its parents (in the graph G)
   Factorization according to G implies the associated conditional independencies.
   Independence ⇒ Factorization
   If any variable in the distribution P is independent of its non-descendants given its parents (in the graph G), then P factorizes over G
   Conditional independencies imply factorization of the joint distribution (into a product of simpler terms)

  16. Independence ⇒ factorization
   Consider the chain rule:
  P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | X_1, …, X_{i-1})
   We can simplify it through conditional independence assumptions
   Using X_i ⫫ NonDescendants(X_i) | Pa(X_i), we can show
  P(X_i | X_1, X_2, …, X_{i-1}) = P(X_i | Pa(X_i))
  (taking X_1, …, X_n in a topological ordering of G, so that X_1, …, X_{i-1} are all non-descendants of X_i)

  17. Equivalence Theorem
   For a graph G:
  • Let D1 denote the family of all distributions that satisfy the conditional independencies of G
  • Let D2 denote the family of all distributions that factorize according to G
  • ⇒ D1 ≡ D2

  18. Other independencies
   Are there other independencies that hold for every distribution P that factorizes over G?
   According to the graphical criterion called d-separation, we can find independencies from the graph
   If P factorizes over G, can we read these independencies from the structure of G?

  19. Basic structures
   Head-to-tail (X → Z → Y): X ⊥ Y | Z
   Tail-to-tail (X ← Z → Y): X ⊥ Y | Z
   Head-to-head (X → Z ← Y): X ⊥ Y marginally; explaining away

  20. Explaining away
   When we condition on Z, are X and Y independent? (X → Z ← Y)
  P(X, Y, Z) = P(X) P(Y) P(Z | X, Y)
   X and Y are marginally independent, but given Z they are conditionally dependent
   This is called explaining away
   Two coins example
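The two coins example mentioned on the slide can be worked out numerically. One common way to formalize it (an assumption about the slide's intent): X and Y are fair coins and Z records whether they match, giving the v-structure X → Z ← Y:

```python
from itertools import product

# Two fair coins X and Y; Z is the deterministic indicator that they match.
joint = {}
for x, y in product(['H', 'T'], repeat=2):
    joint[(x, y, x == y)] = 0.25  # P(x) P(y) P(z | x, y)

# Marginally: P(X=H, Y=H) = P(X=H) P(Y=H), so X and Y are independent.
p_xh = sum(p for (x, y, z), p in joint.items() if x == 'H')
p_yh = sum(p for (x, y, z), p in joint.items() if y == 'H')
p_xh_yh = sum(p for (x, y, z), p in joint.items() if x == 'H' and y == 'H')
print(p_xh_yh == p_xh * p_yh)  # True

# Given Z = True (the coins matched), learning Y = H pins down X:
p_z = sum(p for (x, y, z), p in joint.items() if z)
p_xh_given_z = sum(p for (x, y, z), p in joint.items() if z and x == 'H') / p_z
p_xh_given_z_yh = (joint[('H', 'H', True)] /
                   sum(p for (x, y, z), p in joint.items() if z and y == 'H'))
print(p_xh_given_z, p_xh_given_z_yh)  # 0.5 1.0 -> dependent given Z
```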

  21. D-separation
   Let A, B, C denote three disjoint sets of nodes. A is d-separated from B by C iff A ⊥ B | C
   A is d-separated from B by C if all undirected paths between A and B are blocked by C

  22. Undirected path blocking
   Head-to-tail at a node Z ∈ C: X ∈ A, Z ∈ C, Y ∈ B (X → Z → Y)
   Tail-to-tail at a node Z ∈ C: X ∈ A, Z ∈ C, Y ∈ B (X ← Z → Y)
   Head-to-head (i.e., v-structure) at a node Z, where Z ∉ C and none of its descendants are in C: X ∈ A, Y ∈ B (X → Z ← Y)

  23. Undirected path blocking
  A ⊥ B | C holds if, in all trails (undirected paths) between A and B, either:
  • some node in the path is in C and the path does not meet head-to-head at that node, or
  • the path contains a head-to-head node, and neither that node nor any of its descendants is in C

  24. D-separation: active trail view
   Definition: X and Y are d-separated in G given Z if there is no active trail in G between X and Y given Z
   A trail between X and Y is active if:
   for any v-structure node V in the trail X … → V ← … Y, either V or one of its descendants is in Z
   the other nodes in the trail are not in Z
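The active-trail definition above translates directly into a checker. This is an illustrative sketch that enumerates all simple trails, which is fine for small graphs (practical implementations use the reachability / Bayes-ball algorithm instead):

```python
def descendants(children, node):
    """All descendants of `node`, given a parent -> children adjacency map."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def d_separated(edges, x, y, z):
    """True iff x and y are d-separated given the set z in the DAG `edges`."""
    children, parents = {}, {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
        parents.setdefault(b, []).append(a)

    def active(trail):
        for i in range(1, len(trail) - 1):
            prev, v, nxt = trail[i - 1], trail[i], trail[i + 1]
            collider = prev in parents.get(v, []) and nxt in parents.get(v, [])
            if collider:
                if v not in z and not (descendants(children, v) & z):
                    return False  # unobserved v-structure blocks the trail
            elif v in z:
                return False      # observed chain/fork node blocks the trail
        return True

    def trails(cur, path):
        if cur == y:
            yield path
            return
        for nbr in children.get(cur, []) + parents.get(cur, []):
            if nbr not in path:
                yield from trails(nbr, path + [nbr])

    return not any(active(t) for t in trails(x, [x]))

# Burglary network: B -> A <- E, A -> J, A -> M
edges = [('B', 'A'), ('E', 'A'), ('A', 'J'), ('A', 'M')]
print(d_separated(edges, 'B', 'E', set()))   # True: v-structure unobserved
print(d_separated(edges, 'B', 'E', {'J'}))   # False: a descendant of A is observed
print(d_separated(edges, 'J', 'M', {'A'}))   # True: A blocks the fork
```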

  25. D-separation: example
  [figure: student network variant with nodes Intelligence (I), Difficulty (D), Grade (G), Rank (R), Letter (L)]
  Queries considered on the slide:
  R ⊥ G | I
  R ⊥ D | I
  R ⊥ D | G
  R ⊥ D | L
  R ⊥ L | G
  D ⊥ L | G

  26. Markov blanket in a Bayesian network
   A variable is conditionally independent of all other variables given its Markov blanket
   Markov blanket of a node:
   All parents
   Children
   Co-parents of its children (the other parents of its children)
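The three-part definition above is mechanical to compute from an edge list; a small sketch, using the burglary network as the example graph:

```python
def markov_blanket(edges, node):
    """Parents, children, and co-parents of children of `node`,
    for a DAG given as a list of (parent, child) edges."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    co_parents = {p for p, c in edges if c in children and p != node}
    return parents | children | co_parents

# Burglary network: B -> A <- E, A -> J, A -> M
edges = [('B', 'A'), ('E', 'A'), ('A', 'J'), ('A', 'M')]
print(markov_blanket(edges, 'A'))  # {'B', 'E', 'J', 'M'}
print(markov_blanket(edges, 'B'))  # {'A', 'E'}: E is a co-parent via A
```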

  27. D-separation: soundness & completeness
   Soundness: any conditional independence property that we can derive from G holds for every probability distribution that factorizes over G
   Theorem: If P factorizes over G and d-sep_G(X, Y | Z), then P satisfies X ⊥ Y | Z
   Weak completeness:
   For almost all distributions P that factorize over G, if X ⊥ Y | Z holds in P then X and Y are d-separated given Z in the graph G
   There can be independencies in P that are not found by the conditional independence properties of G
