CS480/680 Machine Learning Lecture 8: January 30th, 2020 - Graphical Models


1. CS480/680 Machine Learning Lecture 8: January 30th, 2020. Graphical Models. Zahra Sheikhbahaee, University of Waterloo, CS480/680 Winter 2020.

2. Outline
• Graphical Model
• Bayesian Network
• Conditional Independence
• Naïve Bayes

3. Review: Probability Theory
• Sum rule (marginal distributions): $p(x) = \sum_y p(x, y)$
• Product rule: $p(x, y) = p(x \mid y)\, p(y)$
• From these we have Bayes' theorem: $p(y \mid x) = \dfrac{p(x \mid y)\, p(y)}{p(x)}$
• The normalization factor: $p(x) = \int p(x \mid y)\, p(y)\, dy$
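To make these rules concrete, here is a minimal Python sketch (not from the slides; the joint-table values are invented) that checks the sum rule, the product rule, and Bayes' theorem on a small discrete distribution:

```python
# Toy discrete joint p(x, y) over x, y in {0, 1}; the numbers are illustrative.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Sum rule: p(x) = sum_y p(x, y), and likewise p(y).
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Product rule: p(x, y) = p(x | y) p(y).
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}

for (x, y), p in joint.items():
    # Product rule reproduces the joint ...
    assert abs(p - p_x_given_y[(x, y)] * p_y[y]) < 1e-12
    # ... and Bayes' theorem inverts the conditional: p(y|x) = p(x|y) p(y) / p(x).
    p_y_given_x = p_x_given_y[(x, y)] * p_y[y] / p_x[x]
    assert abs(p_y_given_x - p / p_x[x]) < 1e-12
print(p_x, p_y)
```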

4. Graphical Models
• Graphical Models (GMs) are depictions of the independence/dependence relationships among the distributions in a probabilistic model; a central goal of a graphical model is to make the conditional independence properties of a probability distribution explicit.
• A GM is a framework for representing, reasoning with, and learning complex probabilistic models.
• A graph comprises nodes connected by links (edges). Each node corresponds to a random variable $X$, with an associated probability $P(X)$.
• In a directed acyclic graph, a directed edge from node $X$ to node $Y$ indicates that $X$ has a direct influence on $Y$; this influence is specified by the conditional probability $P(Y \mid X)$.
• The graph captures the way in which the joint distribution over all of the random variables can be decomposed into a product of factors, each depending only on a subset of the variables.

5. Bayesian Networks
Example: Tracey leaves her house and realises that her grass is wet. Rain causes the grass to get wet. The random variables are binary; they are either true or false.
• The probability that it rains on a given day: $P(R) = 0.4$
• The chance that the grass gets wet when it rains: $P(W \mid R) = 0.9$
• The chance that the grass stays dry even though it rains: $P(\neg W \mid R) = 0.1$
• The probability that the grass gets wet without rain, e.g. when a sprinkler is used: $P(W \mid \neg R) = 0.2$
• The probability that the grass is not wet given that it doesn't rain: $P(\neg W \mid \neg R) = 0.8$

6. Bayesian Networks
Example (continued): rain causes the grass to get wet.
• The joint distribution is $P(R, W) = P(R)\, P(W \mid R)$.
• The individual (marginal) probability of wet grass can be computed by summing over the possible values that its parent node can take:
$P(W) = \sum_R P(R, W) = P(W \mid R)\, P(R) + P(W \mid \neg R)\, P(\neg R) = 0.9 \times 0.4 + 0.2 \times 0.6 = 0.48$
• If we knew that it rained, the probability of wet grass would be 0.9; if we knew for sure that it did not, it would be as low as 0.2; not knowing whether it rained or not, the probability is 0.48.
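This marginalization is a one-liner; a sketch with the slide's numbers (the variable names are mine):

```python
# CPTs from the slide: P(R=1) = 0.4 and P(W=1 | R).
p_r = {1: 0.4, 0: 0.6}
p_w_given_r = {1: 0.9, 0: 0.2}  # grass wet given rain / no rain

# Sum rule over the parent: P(W=1) = sum_r P(W=1 | R=r) P(R=r).
p_w = sum(p_w_given_r[r] * p_r[r] for r in (0, 1))
print(p_w)  # 0.48
```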

7. Bayesian Networks
Example (continued): Bayes' rule helps us invert the dependencies and perform a diagnosis.
• If we know that the grass is wet, the probability that it rained can be calculated as follows:
$P(R \mid W) = \dfrac{P(W \mid R)\, P(R)}{P(W)} = \dfrac{0.9 \times 0.4}{0.48} = 0.75$
• Knowing that the grass is wet increases the probability of rain from 0.4 to 0.75.
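The same inversion as a self-contained sketch (names mine, numbers from the slides):

```python
# CPTs from the slide: P(R=1) = 0.4, P(W=1 | R).
p_r = {1: 0.4, 0: 0.6}
p_w_given_r = {1: 0.9, 0: 0.2}

# P(W=1) by the sum rule, then P(R=1 | W=1) by Bayes' rule.
p_w = sum(p_w_given_r[r] * p_r[r] for r in (0, 1))
p_r_given_w = p_w_given_r[1] * p_r[1] / p_w
print(p_r_given_w)  # 0.36 / 0.48 = 0.75
```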

8. Bayesian Network
• $R \in \{0, 1\}$ ($R = 1$ means that it has been raining, and 0 otherwise).
• $S \in \{0, 1\}$ ($S = 1$ means that Tracey has forgotten to turn off the sprinkler, and 0 otherwise).
• $J \in \{0, 1\}$ ($J = 1$ means that Jack's grass is wet, and 0 otherwise).
• $T \in \{0, 1\}$ ($T = 1$ means that Tracey's grass is wet, and 0 otherwise).
A model of Tracey's world corresponds to a probability distribution on the joint set of the variables of interest, $P(T, J, R, S)$. There are $2^4 = 16$ states. Using the product rule repeatedly (the chain rule), we have:
$P(T, J, R, S) = P(T \mid J, R, S)\, P(J, R, S) = P(T \mid J, R, S)\, P(J \mid R, S)\, P(R, S) = P(T \mid J, R, S)\, P(J \mid R, S)\, P(R \mid S)\, P(S)$
With the conditional independence assumptions
$P(T \mid J, R, S) = P(T \mid R, S), \qquad P(J \mid R, S) = P(J \mid R), \qquad P(R \mid S) = P(R)$
the joint factorizes as
$P(T, J, R, S) = P(T \mid R, S)\, P(J \mid R)\, P(R)\, P(S)$
The conditional probability tables:
• $P(R=1) = 0.2$, $P(S=1) = 0.1$
• $P(J=1 \mid R=1) = 1$, $P(J=1 \mid R=0) = 0.2$
• $P(T=1 \mid R=1, S=1) = 1$, $P(T=1 \mid R=1, S=0) = 1$, $P(T=1 \mid R=0, S=1) = 0.9$, $P(T=1 \mid R=0, S=0) = 0$
We need to specify only $4 + 2 + 1 + 1 = 8$ values.
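A minimal sketch (my own encoding; the eight numbers are the slide's) that builds the joint from the four factors and checks that the $2^4$ states sum to one:

```python
from itertools import product

# The eight specified values (probability of each variable being 1).
p_r1, p_s1 = 0.2, 0.1                      # P(R=1), P(S=1)
p_j1 = {1: 1.0, 0: 0.2}                    # P(J=1 | R=r)
p_t1 = {(1, 1): 1.0, (1, 0): 1.0,          # P(T=1 | R=r, S=s)
        (0, 1): 0.9, (0, 0): 0.0}

def bern(p1, v):
    """Probability that a binary variable with P(v=1)=p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1

def joint(t, j, r, s):
    """P(T,J,R,S) = P(T|R,S) P(J|R) P(R) P(S)."""
    return bern(p_t1[(r, s)], t) * bern(p_j1[r], j) * bern(p_r1, r) * bern(p_s1, s)

total = sum(joint(*state) for state in product((0, 1), repeat=4))
print(total)  # 1.0 -- eight numbers determine all 16 joint probabilities
```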

9. Bayesian Network
• What is the probability that the sprinkler was on overnight, given that Tracey's grass is wet?
$P(S=1 \mid T=1) = \dfrac{P(S=1, T=1)}{P(T=1)} = \dfrac{\sum_{J,R} P(T=1, J, R, S=1)}{\sum_{J,R,S} P(T=1, J, R, S)} = \dfrac{\sum_{R} P(T=1 \mid R, S=1)\, P(R)\, P(S=1)}{\sum_{R,S} P(T=1 \mid R, S)\, P(R)\, P(S)}$
(summing out $J$ uses $\sum_J P(J \mid R) = 1$)
$= \dfrac{0.1 \times (0.9 \times 0.8 + 1 \times 0.2)}{0.1 \times (0.9 \times 0.8 + 1 \times 0.2) + 0.9 \times (0 \times 0.8 + 1 \times 0.2)} = 0.3382$
The belief that the sprinkler is on increases above the prior probability 0.1, due to the fact that the grass is wet.
• What is the probability that Tracey's sprinkler was on overnight, given that her grass is wet and that Jack's grass is also wet?
$P(S=1 \mid T=1, J=1) = \dfrac{P(T=1, J=1, S=1)}{P(T=1, J=1)} = \dfrac{\sum_{R} P(T=1 \mid R, S=1)\, P(J=1 \mid R)\, P(R)\, P(S=1)}{\sum_{R,S} P(T=1 \mid R, S)\, P(J=1 \mid R)\, P(R)\, P(S)}$
$= \dfrac{0.1 \times (1 \times 1 \times 0.2 + 0.9 \times 0.2 \times 0.8)}{0.2144} = \dfrac{0.0344}{0.2144} = 0.1604$
The probability that the sprinkler is on, given the extra evidence that Jack's grass is wet, is lower than the probability given only that Tracey's grass is wet: Jack's wet grass makes rain the more plausible cause, which "explains away" the sprinkler.
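Both posteriors can be reproduced by brute-force enumeration over the joint; a sketch under the same hypothetical encoding as above:

```python
from itertools import product

p_r1, p_s1 = 0.2, 0.1
p_j1 = {1: 1.0, 0: 0.2}
p_t1 = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 0.9, (0, 0): 0.0}

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def joint(t, j, r, s):
    return bern(p_t1[(r, s)], t) * bern(p_j1[r], j) * bern(p_r1, r) * bern(p_s1, s)

def posterior_s1(**evidence):
    """P(S=1 | evidence) by summing the joint over all consistent states."""
    states = [dict(zip("tjrs", v)) for v in product((0, 1), repeat=4)]
    consistent = [st for st in states
                  if all(st[k] == v for k, v in evidence.items())]
    z = sum(joint(**st) for st in consistent)
    return sum(joint(**st) for st in consistent if st["s"] == 1) / z

print(posterior_s1(t=1))       # ~0.3382
print(posterior_s1(t=1, j=1))  # ~0.1604 (explaining away)
```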

10. Bayesian Network
• If there is an arrow from node $A$ to another node $B$, $A$ is called a parent of $B$, and $B$ is a child of $A$. In addition, the parents of $A$ are ancestors of $B$.
• The set of parent nodes of a node $x_i$ is denoted by $\mathrm{parents}(x_i)$, and the joint distribution factorizes as
$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(x_i))$
• Example (a network with nodes Earthquake, Burglary, Radio, Alarm, and Call): if the alarm rings, your neighbour may call you at work to let you know. If, while rushing home, you hear a radio report of an earthquake, your degree of confidence (i.e. belief) that there was a burglary diminishes.
$P(E, B, R, A, C) = P(E)\, P(B)\, P(R \mid E)\, P(A \mid E, B)\, P(C \mid A)$
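The factorization over parents can be written generically. A sketch below uses the slide's network structure; the CPT numbers are invented purely to exercise the code, since the slide gives none:

```python
from itertools import product

# Structure from the slide: P(E,B,R,A,C) = P(E) P(B) P(R|E) P(A|E,B) P(C|A).
parents = {"E": (), "B": (), "R": ("E",), "A": ("E", "B"), "C": ("A",)}

# Invented CPTs: probability of each node being 1 given its parents' values.
p1 = {
    "E": lambda pa: 0.01,
    "B": lambda pa: 0.02,
    "R": lambda pa: 0.9 if pa == (1,) else 0.0,        # radio report given quake
    "A": lambda pa: {(1, 1): 0.99, (1, 0): 0.8,
                     (0, 1): 0.95, (0, 0): 0.001}[pa], # alarm given quake/burglary
    "C": lambda pa: 0.7 if pa == (1,) else 0.05,       # call given alarm
}

def joint(x):
    """P(x) = prod_i P(x_i | parents(x_i)) for an assignment dict x."""
    p = 1.0
    for node, pa in parents.items():
        pr1 = p1[node](tuple(x[q] for q in pa))
        p *= pr1 if x[node] == 1 else 1.0 - pr1
    return p

# Any valid set of CPTs yields a normalized joint over the 2**5 states.
total = sum(joint(dict(zip("EBRAC", v))) for v in product((0, 1), repeat=5))
print(round(total, 12))  # 1.0
```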

11. Bayesian Network
• The node Call (child) is independent of Burglary and Earthquake (ancestors) given the node Alarm (parent). The node Call is a descendant of the nodes Alarm and Earthquake.
• Given Alarm, Call is conditionally independent of Burglary and Earthquake.
• Using conditional independence reduces the dimensionality of the network from the full joint probability table, $2^5 - 1 = 31$, to $1 + 1 + 4 + 2 + 2 = 10$ parameters.
The conditional probability table for the node Alarm ($\neg$ means "not"):
• $e, b$: $P(A \mid e, b)$, $P(\neg A \mid e, b)$
• $e, \neg b$: $P(A \mid e, \neg b)$, $P(\neg A \mid e, \neg b)$
• $\neg e, b$: $P(A \mid \neg e, b)$, $P(\neg A \mid \neg e, b)$
• $\neg e, \neg b$: $P(A \mid \neg e, \neg b)$, $P(\neg A \mid \neg e, \neg b)$
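To see where the counts come from: a node with $k$ binary parents needs $2^k$ numbers (one probability of being true per parent configuration). A hypothetical tally in Python:

```python
# Parents of each node in the earthquake network (structure from the slides).
parents = {"E": [], "B": [], "R": ["E"], "A": ["E", "B"], "C": ["A"]}

# A node with k binary parents needs 2**k parameters: P(node=1 | parent config).
n_params = sum(2 ** len(ps) for ps in parents.values())
full_joint = 2 ** len(parents) - 1  # unconstrained joint over 5 binary variables
print(n_params, full_joint)  # 10 31
```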

12. Independence
$P(x, y) = P(x)\, P(y)$ (independence), compared with the general product rule $P(x, y) = P(x)\, P(y \mid x) = P(y)\, P(x \mid y)$.
• Two sets of variables $A$ and $B$ are independent iff $P(A) = P(A \mid B)$, or equivalently $P(A, B) = P(A)\, P(B)$. In this case we write $A \perp\!\!\!\perp B$.
• Conditional independence: let $C$ be the parent of two nodes $A$ and $B$ (a tail-to-tail connection). Variables $A$ and $B$ are conditionally independent given all states of the variable $C$ if
$P(A, B \mid C) = P(A \mid C)\, P(B \mid C)$
This is written $A \perp\!\!\!\perp B \mid C$. It follows from the factorization:
$P(A, B \mid C) = \dfrac{P(A, B, C)}{P(C)} = \dfrac{P(C)\, P(A \mid C)\, P(B \mid C)}{P(C)} = P(A \mid C)\, P(B \mid C)$
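A numerical check of tail-to-tail conditional independence, with CPTs for $C \to A$ and $C \to B$ invented for illustration:

```python
from itertools import product

# Tail-to-tail: C is the parent of both A and B (numbers are invented).
p_c1 = 0.3
p_a1_given_c = {1: 0.9, 0: 0.2}
p_b1_given_c = {1: 0.7, 0: 0.4}

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def joint(a, b, c):
    """P(A,B,C) = P(C) P(A|C) P(B|C)."""
    return bern(p_c1, c) * bern(p_a1_given_c[c], a) * bern(p_b1_given_c[c], b)

p_c = {c: sum(joint(a, b, c) for a, b in product((0, 1), repeat=2)) for c in (0, 1)}
for a, b, c in product((0, 1), repeat=3):
    lhs = joint(a, b, c) / p_c[c]                              # P(A,B | C)
    rhs = bern(p_a1_given_c[c], a) * bern(p_b1_given_c[c], b)  # P(A|C) P(B|C)
    assert abs(lhs - rhs) < 1e-12  # A is independent of B given C in every state
print("A and B are conditionally independent given C")
```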
