Lecture 19: Conditional Independence, Bayesian Networks Intro


  1. Lecture 19: Conditional Independence, Bayesian Networks Intro

  2. Announcement • Assignment 4 will be out next week. • Due Friday Dec 1 (you can still use late days if you have any left).

  3. Lecture Overview • Recap lecture 18 • Marginal Independence • Conditional Independence • Bayesian Networks Introduction

  4. Probability Distributions
Consider the case where possible worlds are simply assignments to one random variable.
Definition (probability distribution): a probability distribution P on a random variable X is a function dom(X) → [0,1] such that Σ over x ∈ dom(X) of P(X=x) = 1.
Example: X represents a female adult's height in Canada with domain {short, normal, tall} – based on some definition of these terms:
P(height = short) = 0.2, P(height = normal) = 0.5, P(height = tall) = 0.3

  5. Joint Probability Distribution (JPD)
• Joint probability distribution over random variables X1, …, Xn: a probability distribution over the joint random variable <X1, …, Xn> with domain dom(X1) × … × dom(Xn) (the Cartesian product)
• Think of a joint distribution over n variables as the table of the corresponding possible worlds
• There is a column (dimension) for each variable, and one for the probability
• Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn); we can also write P(X1 = x1 ∧ … ∧ Xn = xn)
• The sum of probabilities across the whole table is 1.
{Weather, Temperature} example from before:
Weather | Temperature | µ(w)
sunny | hot | 0.10
sunny | mild | 0.20
sunny | cold | 0.10
cloudy | hot | 0.05
cloudy | mild | 0.35
cloudy | cold | 0.20
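A minimal Python sketch (not part of the slides; the dictionary name `joint` and the representation are illustrative assumptions) of the {Weather, Temperature} JPD above, stored with one entry per possible world and checked to sum to 1:

```python
# Illustrative sketch: the {Weather, Temperature} JPD from the table above,
# one dictionary entry per possible world.
joint = {
    ("sunny",  "hot"):  0.10,
    ("sunny",  "mild"): 0.20,
    ("sunny",  "cold"): 0.10,
    ("cloudy", "hot"):  0.05,
    ("cloudy", "mild"): 0.35,
    ("cloudy", "cold"): 0.20,
}

# The probabilities across the whole table must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```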

  6. Recap: Conditioning • Conditioning: revise beliefs based on new observations • We need to integrate two sources of knowledge • Prior probability distribution P(X): all background knowledge • New evidence e • Combine the two to form a posterior probability distribution • The conditional probability P(h|e)

  7. Recap: Conditional Probability
JPD over {Weather, Temperature}:
Possible world | Weather | Temperature | µ(w)
w1 | sunny | hot | 0.10
w2 | sunny | mild | 0.20
w3 | sunny | cold | 0.10
w4 | cloudy | hot | 0.05
w5 | cloudy | mild | 0.35
w6 | cloudy | cold | 0.20
Conditioning on W=sunny gives the posterior distribution P(T | W=sunny):
T | P(T | W=sunny)
hot | 0.10/0.40 = 0.25
mild | 0.20/0.40 = 0.50
cold | 0.10/0.40 = 0.25
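Continuing the illustrative sketch above (reusing the `joint` dictionary, which is an assumption of this example, not part of the slides), conditioning on Weather = sunny reproduces the 0.25 / 0.50 / 0.25 column:

```python
# Illustrative sketch: condition the JPD above on the observation W = sunny.
evidence_w = "sunny"

# P(W = sunny): sum over the possible worlds consistent with the evidence.
p_evidence = sum(p for (w, t), p in joint.items() if w == evidence_w)  # 0.40

# P(T = t | W = sunny) = P(W = sunny, T = t) / P(W = sunny)
posterior = {t: p / p_evidence
             for (w, t), p in joint.items() if w == evidence_w}
print(posterior)  # ≈ {'hot': 0.25, 'mild': 0.5, 'cold': 0.25}
```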

  8. Recap: Inference by Enumeration • Great, we can compute arbitrary probabilities now! • Given • a prior joint probability distribution (JPD) on a set of variables X • specific values e for the evidence variables E (a subset of X) • We want to compute • the posterior joint distribution of the query variables Y (a subset of X) given evidence e • Step 1: Condition to get distribution P(X|e) • Step 2: Marginalize to get distribution P(Y|e)

  9. Inference by Enumeration: example
• Given P(W,C,T) as the JPD below, and evidence e: "Windy = yes"
• What is the probability that it is cold? I.e., P(T=cold | W=yes)
• Step 1: condition to get distribution P(C, T | W=yes), using
P(C=c ∧ T=t | W=yes) = P(C=c ∧ T=t ∧ W=yes) / P(W=yes)
Windy W | Cloudy C | Temperature T | P(W, C, T)
yes | no | hot | 0.04
yes | no | mild | 0.09
yes | no | cold | 0.07
yes | yes | hot | 0.01
yes | yes | mild | 0.10
yes | yes | cold | 0.12
no | no | hot | 0.06
no | no | mild | 0.11
no | no | cold | 0.03
no | yes | hot | 0.04
no | yes | mild | 0.25
no | yes | cold | 0.08
Cloudy C | Temperature T | P(C, T | W=yes)
no | hot | 0.04/0.43 ≈ 0.10
no | mild | 0.09/0.43 ≈ 0.21
no | cold | 0.07/0.43 ≈ 0.16
yes | hot | 0.01/0.43 ≈ 0.02
yes | mild | 0.10/0.43 ≈ 0.23
yes | cold | 0.12/0.43 ≈ 0.28

  10. Inference by Enumeration: example
• Given P(W,C,T) as the JPD in the previous slide, and evidence e: "Windy = yes"
• What is the probability that it is cold? I.e., P(T=cold | W=yes)
• Step 2: marginalize over Cloudy to get distribution P(T | W=yes)
Cloudy C | Temperature T | P(C, T | W=yes)
no | hot | 0.10
no | mild | 0.21
no | cold | 0.16
yes | hot | 0.02
yes | mild | 0.23
yes | cold | 0.28
Temperature T | P(T | W=yes)
hot | 0.10 + 0.02 = 0.12
mild | 0.21 + 0.23 = 0.44
cold | 0.16 + 0.28 = 0.44
• This is a probability distribution: it defines the probability of all the possible values of Temperature (three here), given the observed value for Windy (yes). P(T=cold | W=yes) is a specific entry of the distribution P(T | W=yes).
• Because this is a probability distribution, the sum of all its values is 1.
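A hedged Python sketch (names such as `joint_wct` are illustrative, not from the slides) that carries out both steps of inference by enumeration on the P(W, C, T) table and reproduces the 0.12 / 0.44 / 0.44 result:

```python
# Illustrative sketch: inference by enumeration for the query P(T | W = yes).
joint_wct = {
    # (Windy, Cloudy, Temperature): probability
    ("yes", "no",  "hot"):  0.04, ("yes", "no",  "mild"): 0.09, ("yes", "no",  "cold"): 0.07,
    ("yes", "yes", "hot"):  0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no",  "no",  "hot"):  0.06, ("no",  "no",  "mild"): 0.11, ("no",  "no",  "cold"): 0.03,
    ("no",  "yes", "hot"):  0.04, ("no",  "yes", "mild"): 0.25, ("no",  "yes", "cold"): 0.08,
}

# Step 1: condition on the evidence W = yes.
p_w_yes = sum(p for (w, c, t), p in joint_wct.items() if w == "yes")  # 0.43
cond = {(c, t): p / p_w_yes
        for (w, c, t), p in joint_wct.items() if w == "yes"}

# Step 2: marginalize out Cloudy to get P(T | W = yes).
p_t_given_w = {}
for (c, t), p in cond.items():
    p_t_given_w[t] = p_t_given_w.get(t, 0.0) + p

print(p_t_given_w)  # ≈ {'hot': 0.12, 'mild': 0.44, 'cold': 0.44}
```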

  11. Conditional Probability among Random Variables
P(X | Y) = P(X, Y) / P(Y) expresses the conditional probability of each possible value for X given each possible value for Y.
P(X | Y) = P(Temperature | Weather) = P(Temperature ∧ Weather) / P(Weather)
Example: Temperature = {hot, cold}; Weather = {sunny, cloudy}
P(Temperature | Weather):
 | T = hot | T = cold
W = sunny | P(hot|sunny) | P(cold|sunny)
W = cloudy | P(hot|cloudy) | P(cold|cloudy)
Which of the following is true?
A. The probabilities in each row should sum to 1
B. The probabilities in each column should sum to 1
C. Both of the above
D. None of the above

  12. Conditional Probability among Random Variables
P(X | Y) = P(X, Y) / P(Y) expresses the conditional probability of each possible value for X given each possible value for Y.
P(X | Y) = P(Temperature | Weather) = P(Temperature ∧ Weather) / P(Weather)
Example: Temperature = {hot, cold}; Weather = {sunny, cloudy}
P(Temperature | Weather):
 | T = hot | T = cold
W = sunny | P(hot|sunny) | P(cold|sunny)  ← P(T | Weather = sunny)
W = cloudy | P(hot|cloudy) | P(cold|cloudy)  ← P(T | Weather = cloudy)
Answer: A. The probabilities in each row should sum to 1 – the rows are two probability distributions over Temperature, one for each value of Weather!
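As a quick check of the answer, the following sketch (again reusing the illustrative `joint` dictionary from the earlier example) builds P(Temperature | Weather) and verifies that each row sums to 1:

```python
# Illustrative sketch: each row of P(T | W) is a distribution over T.
p_w = {}
for (w, t), p in joint.items():
    p_w[w] = p_w.get(w, 0.0) + p                 # marginal P(W)

cond_table = {(w, t): p / p_w[w] for (w, t), p in joint.items()}

for w in p_w:
    row_sum = sum(p for (w2, t), p in cond_table.items() if w2 == w)
    print(w, round(row_sum, 6))                  # 1.0 for every value of W
```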

  13. Recap: Inference by Enumeration • Great, we can compute arbitrary probabilities now! • Given • a prior joint probability distribution (JPD) on a set of variables X • specific values e for the evidence variables E (a subset of X) • We want to compute • the posterior joint distribution of the query variables Y (a subset of X) given evidence e • Step 1: Condition to get distribution P(X|e) • Step 2: Marginalize to get distribution P(Y|e) • Generally applicable, but memory-heavy and slow. We will see a better way to do probabilistic inference.

  14. Bayes Rule and Chain Rule
Example: Bayes rule lets us compute P(fire | alarm) from P(alarm | fire), P(fire), and P(alarm):
P(fire | alarm) = P(alarm | fire) × P(fire) / P(alarm)

  15. Bayes Rule and Chain Rule (continued)

  16. Product Rule
• By definition, we know that: P(f2 | f1) = P(f2 ∧ f1) / P(f1)
• We can rewrite this to P(f2 ∧ f1) = P(f2 | f1) × P(f1)
• In general, applying this decomposition repeatedly gives the Chain Rule (next slide).

  17. Chain Rule
Theorem (Chain Rule): P(f1 ∧ … ∧ fn) = ∏ for i = 1..n of P(fi | fi−1 ∧ … ∧ f1)

  18. Chain Rule example
P(f1 ∧ … ∧ fn) = ∏ for i = 1..n of P(fi | fi−1 ∧ … ∧ f1)
P(A,B,C,D) = P(D|A,B,C) × P(A,B,C)
= P(D|A,B,C) × P(C|A,B) × P(A,B)
= P(D|A,B,C) × P(C|A,B) × P(B|A) × P(A)
= P(A) P(B|A) P(C|A,B) P(D|A,B,C)
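A small numerical check of the chain rule (an illustrative sketch, not from the slides): build a random JPD over four binary variables and confirm that the product of conditionals matches every joint entry:

```python
import itertools
import random

# Illustrative sketch: verify P(x1,...,x4) = prod_i P(xi | x1,...,x_{i-1})
# on a randomly generated JPD over four binary variables.
random.seed(0)
worlds = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in worlds]
total = sum(weights)
jpd = {w: x / total for w, x in zip(worlds, weights)}

def marginal(assignment):
    """P(the first len(assignment) variables take these values)."""
    k = len(assignment)
    return sum(p for w, p in jpd.items() if w[:k] == assignment)

for world in worlds:
    product = 1.0
    for i in range(1, 5):
        # P(x_i | x_1..x_{i-1}) = P(x_1..x_i) / P(x_1..x_{i-1})
        product *= marginal(world[:i]) / marginal(world[:i - 1])
    assert abs(product - jpd[world]) < 1e-9   # chain rule holds for every world
```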

  19. Chain Rule
• Allows representing a Joint Probability Distribution (JPD) as the product of conditional probability distributions
Theorem (Chain Rule): P(f1 ∧ … ∧ fn) = ∏ for i = 1..n of P(fi | fi−1 ∧ … ∧ f1)

  20. Why does the chain rule help us?
We will see how, under specific circumstances (independence between variables), this rule helps us gain compactness.
• We can represent the JPD as a product of conditional probability distributions
• We can simplify some terms when the variables involved are marginally independent or conditionally independent

  21. Lecture Overview • Recap lecture 18 • Marginal Independence • Conditional Independence • Bayesian Networks Introduction

  22. Marginal Independence
• Intuitively: if X ╨ Y, then learning that Y=y does not change your belief in X, and this is true for all values y that Y could take
• Formally: X ╨ Y iff P(X=x | Y=y) = P(X=x) for all values x and y
• For example, the weather is marginally independent of the result of a coin toss

  23. Examples for marginal independence
• Is Temperature marginally independent of Weather (see previous example)?
Weather W | Temperature T | P(W,T)
sunny | hot | 0.10
sunny | mild | 0.20
sunny | cold | 0.10
cloudy | hot | 0.05
cloudy | mild | 0.35
cloudy | cold | 0.20

  24. Is Temperature marginally independent of Weather (see previous example)?
A. Yes
B. No
C. It depends on the value of T
D. It depends on the value of W
T | P(T)
hot | 0.15
mild | 0.55
cold | 0.30
T | P(T | W=sunny)
hot | 0.25
mild | 0.50
cold | 0.25
Answer: B – no. For example, P(T=hot) = 0.15 but P(T=hot | W=sunny) = 0.25, so observing Weather changes the belief about Temperature.
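The answer can also be checked mechanically; this sketch (reusing the illustrative `joint` dictionary from earlier) compares P(T) with P(T | W=w) for both weather values:

```python
# Illustrative sketch: is Temperature marginally independent of Weather?
p_t = {}
for (w, t), p in joint.items():
    p_t[t] = p_t.get(t, 0.0) + p   # P(T): hot 0.15, mild 0.55, cold 0.30
print(p_t)

for w_val in ("sunny", "cloudy"):
    p_w_val = sum(p for (w, t), p in joint.items() if w == w_val)
    cond_t = {t: p / p_w_val for (w, t), p in joint.items() if w == w_val}
    print(w_val, cond_t)

# P(T | W=sunny) = {hot: 0.25, mild: 0.50, cold: 0.25} differs from P(T),
# so Temperature is NOT marginally independent of Weather here.
```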
