Graphical Models

CS 6355: Structured Prediction
So far…

We discussed sequence labeling tasks:
• HMM: Hidden Markov Models
• MEMM: Maximum Entropy Markov Models
• CRF: Conditional Random Fields

All these models use a linear chain structure to describe the interactions between random variables.

[Figure: the linear-chain structures over y_{t-1}, y_t, and x_t for the HMM, MEMM, and CRF]
This lecture

Graphical models
– Directed: Bayesian Networks
– Undirected: Markov Networks (Markov Random Fields)

• Representations
• Inference
• Learning
Probabilistic Graphical Models

• Languages that represent probability distributions over multiple random variables
  – Directed or undirected graphs
• Encode conditional independence assumptions
  – Or equivalently, encode a factorization of the joint probability
• General machinery for
  – Algorithms for computing marginal and conditional probabilities
    • Recall that we have been looking at most probable states so far
    • Exploiting the graph structure
  – An "inference engine"
  – Can introduce prior probability distributions
    • Because parameters are also random variables
Bayesian Networks

Decompose a joint probability via a directed acyclic graph:
– Nodes represent random variables
– Edges represent conditional dependencies
– Each node is associated with a conditional probability table

P(X_1, X_2, …, X_n) = ∏_i P(X_i | Parents(X_i))

Example (from Russell and Norvig): in the burglary alarm network over Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls, the joint probability factorizes as

P(B, E, A, J, M) = P(B) ⋅ P(E) ⋅ P(A | B, E) ⋅ P(J | A) ⋅ P(M | A)

The network and its parameters are a compact representation of the joint probability distribution.
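To make the factorization concrete, here is a minimal Python sketch of this network. The CPT values are the ones usually quoted for the textbook example; the helper names (bernoulli, joint) are my own:

```python
# A minimal sketch of the Russell & Norvig burglary network.
# Each CPT maps a parent assignment to P(variable = True).

P_B = 0.001                        # P(Burglary = true)
P_E = 0.002                        # P(Earthquake = true)
P_A = {(True, True): 0.95,         # P(Alarm = true | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}    # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}    # P(MaryCalls = true | Alarm)

def bernoulli(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B, E, A, J, M), computed via the network factorization."""
    return (bernoulli(P_B, b)
            * bernoulli(P_E, e)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j)
            * bernoulli(P_M[a], m))

# The full joint over five binary variables needs 2**5 - 1 = 31 numbers;
# the network stores only 1 + 1 + 4 + 2 + 2 = 10.
print(joint(True, False, True, True, False))  # P(b, ¬e, a, j, ¬m)
```

The comment at the end is the compactness point in miniature: the number of parameters grows with the sizes of the CPTs, not with the size of the full joint table.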
We can ask questions like:
• "What is the probability that Mary calls if there is an earthquake?"
• "If John called and Mary did not call, what is the probability that there was a burglary?"

(Example from Russell and Norvig)
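Both questions amount to summing the joint distribution over the unobserved variables. A brute-force inference-by-enumeration sketch, reusing the hypothetical joint() function above (smarter algorithms exploit the graph structure, as noted earlier):

```python
from itertools import product

def query(target, evidence):
    """P(target = true | evidence), by enumerating the full joint.
    Exponential in the number of variables -- fine for five, not in general."""
    names = ('b', 'e', 'a', 'j', 'm')
    numerator = normalizer = 0.0
    for values in product((True, False), repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[var] != val for var, val in evidence.items()):
            continue                      # inconsistent with the evidence
        p = joint(*values)
        normalizer += p
        if world[target]:
            numerator += p
    return numerator / normalizer

# "If John called and Mary did not, how likely was a burglary?"
print(query('b', {'j': True, 'm': False}))
# "How likely is Mary to call if there is an earthquake?"
print(query('m', {'e': True}))
```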
Independence Assumptions of a BN

If X, Y, Z are random variables, we write
• X ⊥ Y to say "X is independent of Y", and
• X ⊥ Y | Z to say "X is independent of Y given Z"

Local independencies: a node is independent of its non-descendants given its parents:

X_i ⊥ NonDescendants(X_i) | Parents(X_i)

Examples (from Daphne Koller's flu network):
– Flu ⊥ Hayfever | Season
– Congestion ⊥ Season | Flu, Hayfever

The parents of a node shield it from the influence of its ancestors and other non-descendants…
… but information about descendants can still influence beliefs about a node.
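We can sanity-check a local independence numerically. In the burglary network from earlier, JohnCalls' only parent is Alarm and Burglary is a non-descendant, so additionally conditioning on Burglary should leave P(JohnCalls | Alarm) unchanged. Using the hypothetical query() helper from above:

```python
# JohnCalls ⊥ Burglary | Alarm: all three queries should agree.
print(query('j', {'a': True}))               # 0.90, straight from the CPT
print(query('j', {'a': True, 'b': True}))    # 0.90 again
print(query('j', {'a': True, 'b': False}))   # still 0.90
```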
Topological independencies: a node is independent of all other nodes given its parents, its children, and its children's parents, together called the node's Markov blanket:

X_i ⊥ X_j | MarkovBlanket(X_i) for all j ≠ i

Example: the Markov blanket of Hayfever is the set {Season, Congestion, Flu}. If we know these variables, Hayfever is independent of MusclePain.

The Markov blanket of a node shields it from the influence of every other node.

(Example from Daphne Koller)
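The Markov blanket is a purely structural notion, so it can be read off the graph without touching any probabilities. A short sketch; the edge list encodes my assumed reading of the flu network from the example (Season → Flu, Season → Hayfever, Flu → Congestion, Hayfever → Congestion, Flu → MusclePain):

```python
# The graph as a dict mapping each node to the set of its parents.
parents = {
    'Season': set(),
    'Flu': {'Season'},
    'Hayfever': {'Season'},
    'Congestion': {'Flu', 'Hayfever'},
    'MusclePain': {'Flu'},
}

def markov_blanket(node):
    """Parents, children, and children's other parents of a node."""
    children = {n for n, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | coparents

print(markov_blanket('Hayfever'))  # {'Season', 'Congestion', 'Flu'}
```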
To summarize:
• Local independencies: a node is independent of its non-descendants given its parents:
  X_i ⊥ NonDescendants(X_i) | Parents(X_i)
• Topological independencies: a node is independent of all other nodes given its parents, children, and children's parents, that is, given its Markov blanket:
  X_i ⊥ X_j | MB(X_i) for all j ≠ i
• More general notions of independence exist.

Where do the independence assumptions come from? Domain knowledge.
We have seen Bayesian networks before

• The naïve Bayes model is a simple Bayesian network
  – The naïve Bayes assumption is an example of an independence assumption
• The hidden Markov model is another Bayesian network
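To make the connection explicit, here is naïve Bayes written out as a tiny Bayesian network: the label Y is the lone parent of every feature, so the general BN factorization collapses to P(y, x_1, …, x_n) = P(y) ∏_i P(x_i | y). All probabilities below are invented purely for illustration:

```python
# Naive Bayes as a Bayesian network: Y -> X_1, Y -> X_2, ...
P_Y = {'spam': 0.3, 'ham': 0.7}
P_X_given_Y = {                      # P(word appears | label), one CPT per feature
    'viagra': {'spam': 0.40, 'ham': 0.01},
    'meeting': {'spam': 0.05, 'ham': 0.30},
}

def nb_joint(y, features):
    """P(y, x_1, ..., x_n) under the naive Bayes factorization."""
    p = P_Y[y]
    for word, present in features.items():
        q = P_X_given_Y[word][y]
        p *= q if present else 1.0 - q
    return p

x = {'viagra': True, 'meeting': False}
scores = {y: nb_joint(y, x) for y in P_Y}   # unnormalized posteriors
print(max(scores, key=scores.get))          # MAP label: 'spam'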