Introduction to Artificial Intelligence
V22.0472-001 Fall 2009
Lecture 14: Bayes’ Nets 2
Rob Fergus – Dept of Computer Science, Courant Institute, NYU
Slides from Karen Livescu, Jeff Bilmes, Dan Klein, Stuart Russell or Andrew Moore

Announcements
• How was mid-term?
• Will grade mid-term / assignment 2 this weekend
• Assignment 3 due this time next week
• Office hours today after class

Example Bayes’ Net
[Figure: example Bayes’ net]

Bayes’ Nets
• A Bayes’ net is an efficient encoding of a probabilistic model of a domain
• Questions we can ask:
  • Inference: given a fixed BN, what is P(X | e)?
  • Representation: given a fixed BN, what kinds of distributions can it encode?
  • Modeling: what BN is most appropriate for a given domain?

Example: Traffic
• Variables
  • T: Traffic
  • R: It rains
  • L: Low pressure
  • D: Roof drips
  • B: Ballgame
[Figure: network over the nodes L, R, B, D, T]

Bayes’ Net Semantics
• A Bayes’ net:
  • A set of nodes, one per variable X
  • A directed, acyclic graph
  • A conditional distribution of each variable conditioned on its parents (the parameters θ)
[Figure: node X with parents A_1 … A_n]
• Semantics:
  • A BN defines a joint probability distribution over its variables:
    $P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$
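As a small illustration of the semantics, the sketch below computes one entry of a joint distribution as the product of each node's local conditional. It assumes a three-node chain L → R → T taken from the traffic variables; the chain structure and every number in the tables are invented for illustration and do not come from the lecture.

```python
from itertools import product

# Hypothetical CPTs for the chain L -> R -> T (all numbers are made up).
p_low = {True: 0.3, False: 0.7}                           # P(L)
p_rain_given_low = {True: {True: 0.6, False: 0.4},        # P(R | L=True)
                    False: {True: 0.1, False: 0.9}}       # P(R | L=False)
p_traffic_given_rain = {True: {True: 0.8, False: 0.2},    # P(T | R=True)
                        False: {True: 0.3, False: 0.7}}   # P(T | R=False)


def joint(l, r, t):
    """P(L=l, R=r, T=t) as the product of each node's local conditional."""
    return p_low[l] * p_rain_given_low[l][r] * p_traffic_given_rain[r][t]


# Summing the product over all assignments should give 1.
total = sum(joint(l, r, t) for l, r, t in product([True, False], repeat=3))
print(joint(True, True, True), total)
```

Only the entries that are actually needed have to be computed; nothing here materializes the full table unless it is asked for.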
Building the (Entire) Joint
• We can take a Bayes’ net and build any entry from the full joint distribution it encodes:
  $P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$
• Typically, there’s no reason to build ALL of it
• We build what we need on the fly
• To emphasize: every BN over a domain implicitly defines a joint distribution over that domain, specified by local probabilities and graph structure

Example: Alarm Network
[Figure: alarm network]

Size of a Bayes’ Net
• How big is a joint distribution over N Boolean variables?
  2^N
• How big is an N-node net if nodes have up to k parents?
  O(N * 2^(k+1))
• Both give you the power to calculate $P(X_1, X_2, \ldots, X_n)$
• BNs: Huge space savings!
• Also easier to elicit local CPTs
• Also turns out to be faster to answer queries (coming)

Bayes’ Nets
• So far:
  • What is a Bayes’ net?
  • What joint distribution does it encode?
• Next: how to answer queries about that distribution
  • Key idea: conditional independence
  • Last class: assembled BNs using an intuitive notion of conditional independence as causality
  • Today: formalize these ideas
  • Main goal: answer queries about conditional independence and influence
• After that: how to answer numerical queries (inference)

Conditional Independence
• Reminder: independence
  • X and Y are independent if $\forall x, y:\ P(x, y) = P(x)\,P(y)$
  • X and Y are conditionally independent given Z if $\forall x, y, z:\ P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$
• (Conditional) independence is a property of a distribution

Example: Independence
• For this graph, you can fiddle with θ (the CPTs) all you want, but you won’t be able to represent any distribution in which the flips are dependent!
[Figure: nodes X_1 and X_2 with no arc between them; figure label: All distributions]
  P(X_1): h 0.5, t 0.5
  P(X_2): h 0.5, t 0.5
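The size comparison from the "Size of a Bayes’ Net" slide can be made concrete with a few lines of arithmetic. This is a minimal sketch for Boolean variables; the function names are hypothetical, and the net size is the upper bound N * 2^(k+1) from the slide, not an exact parameter count.

```python
def joint_table_size(n):
    """Number of entries in a full joint table over n Boolean variables."""
    return 2 ** n


def bayes_net_size_bound(n, k):
    """Upper bound on CPT entries: n nodes, each with at most k Boolean parents.

    Each CPT has at most 2^k parent configurations times 2 values of the node.
    """
    return n * 2 ** (k + 1)


# Example: 30 Boolean variables, at most 3 parents per node.
print(joint_table_size(30))         # 1073741824
print(bayes_net_size_bound(30, 3))  # 480
```

The gap grows exponentially in N for fixed k, which is the space saving the slide refers to.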
Topology Limits Distributions
• Given some graph topology G, only certain joint distributions can be encoded
• The graph structure guarantees certain (conditional) independences
• (There might be more independence)
• Adding arcs increases the set of distributions, but has several costs
[Figure: graphs over X, Y, Z]

Independence in a BN
• Important question about a BN:
  • Are two nodes independent given certain evidence?
  • If yes, can show it using algebra (really tedious)
  • If no, can prove with a counterexample
• Example: X → Y → Z
  • Question: are X and Z independent?
  • Answer: not necessarily, we’ve seen examples otherwise: low pressure causes rain, which causes traffic.
  • X can influence Z, Z can influence X (via Y)
  • Addendum: they could be independent: how?

1. Causal Chains
• This configuration is a “causal chain”: X → Y → Z
  • X: Low pressure
  • Y: Rain
  • Z: Traffic
• Is X independent of Z given Y?
  $P(z \mid x, y) = \frac{P(x, y, z)}{P(x, y)} = \frac{P(x)\,P(y \mid x)\,P(z \mid y)}{P(x)\,P(y \mid x)} = P(z \mid y)$
  Yes!
• Evidence along the chain “blocks” the influence

2. Common Cause
• Another basic configuration: two effects of the same cause: X ← Y → Z
  • Y: Midterm exam
  • X: Email list busy
  • Z: Library full
• Are X and Z independent?
• Are X and Z independent given Y?
  $P(z \mid x, y) = \frac{P(x, y, z)}{P(x, y)} = \frac{P(y)\,P(x \mid y)\,P(z \mid y)}{P(y)\,P(x \mid y)} = P(z \mid y)$
  Yes!
• Observing the cause blocks influence between effects.

Common Cause Example: Is height independent of hair length?
Slide credit: Karen Livescu

Is height independent of hair length? (2)
[Figure: scatter plot of hair length L (short, mid, long) against height H (5’ to 7’)]
Slide credit: Karen Livescu
Is height independent of hair length? (3)
• Generally, no
• If gender known, yes
• This is the “common cause” scenario
[Figure: gender G with children hair length L and height H]
  $p(h \mid l) \ne p(h)$, so $H \not\perp L$
  $p(h \mid l, g) = p(h \mid g)$, so $H \perp L \mid G$
Slide credit: Karen Livescu

3. Common Effect
• Last configuration: two causes of one effect (v-structures): X → Y ← Z
  • X: Raining
  • Z: Ballgame
  • Y: Traffic
• Are X and Z independent?
  • Yes: remember the ballgame and the rain causing traffic, no correlation?
  • Still need to prove they must be (try it!)
• Are X and Z independent given Y?
  • No: remember that seeing traffic put the rain and the ballgame in competition?
• This is backwards from the other cases
  • Observing the effect enables influence between its causes.

Common Effect Example
• Let X, Z be two i.i.d. coin tosses {0,1}
• Let Y = X + Z
• If we observe Y then X and Z become coupled

  X  Z  Y
  0  0  0
  1  0  1
  0  1  1
  1  1  2

• P(X=1 | Z=1) = P(X=1) = 0.5 for fair coins (the joint P(X=1, Z=1) is 0.25), but P(X=1 | Z=1, Y=2) = 1

More explaining away...
[Figure: five causes C_1 … C_5 (pipes, faucet, caulking, drain, upstairs) with a common effect L (leak)]
  $C_i \perp C_j \quad \forall i \ne j$: $p(c_i \mid c_j) = p(c_i)$
  $C_i \not\perp C_j \mid L \quad \forall i \ne j$: $p(c_i \mid c_j, l) \ne p(c_i \mid l)$
Slide credit: Karen Livescu

Examples of the three cases
• Causal chain: SUVs → Greenhouse Gases → Global Warming
• Common cause: Lung Cancer ← Smoking → Bad Breath
• Common effect: Genetics → Cancer ← Smoking
Slide: J. Bilmes

The General Case
• Any complex example can be analyzed using these three canonical cases
• General question: in a given BN, are two variables independent (given evidence)?
• Solution: analyze the graph
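The coin-toss common-effect example (Y = X + Z) is small enough to check by brute-force enumeration. The sketch below assumes both coins are fair, which the slide does not state explicitly; the helper names `prob` and `cond` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All (x, z, y) outcomes with y = x + z; fair coins assumed, so each has weight 1/4.
outcomes = [(x, z, x + z) for x, z in product([0, 1], repeat=2)]
weight = Fraction(1, 4)


def prob(event):
    """Probability of the set of outcomes for which event(x, z, y) is true."""
    return sum((weight for o in outcomes if event(*o)), Fraction(0))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda x, z, y: event(x, z, y) and given(x, z, y)) / prob(given)


# Without observing Y, Z carries no information about X.
p_x1 = prob(lambda x, z, y: x == 1)
p_x1_given_z1 = cond(lambda x, z, y: x == 1, lambda x, z, y: z == 1)

# Once Y is observed, learning Z changes the belief about X.
p_x1_given_y1 = cond(lambda x, z, y: x == 1, lambda x, z, y: y == 1)
p_x1_given_y1_z1 = cond(lambda x, z, y: x == 1, lambda x, z, y: y == 1 and z == 1)
p_x1_given_y2_z1 = cond(lambda x, z, y: x == 1, lambda x, z, y: y == 2 and z == 1)

print(p_x1, p_x1_given_z1)              # 1/2 1/2
print(p_x1_given_y1, p_x1_given_y1_z1)  # 1/2 0
print(p_x1_given_y2_z1)                 # 1
```

The pair P(X=1 | Y=1) versus P(X=1 | Y=1, Z=1) is the cleanest view of the coupling: with Y fixed, knowing Z pins down X.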
Reachability
• Recipe: shade evidence nodes
• Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are conditionally dependent
• Almost works, but not quite
  • Where does it break?
  • Answer: the v-structure at T doesn’t count as a link in a path unless “active”
[Figure: network over the nodes L, R, B, D, T, T’]

Reachability (the Bayes’ Ball)
• Correct algorithm:
  • Shade in evidence
  • Start at source node
  • Try to reach target by search
• States: pair of (node X, previous state S)
• Successor function:
  • X unobserved:
    • To any child
    • To any parent if coming from a child
  • X observed:
    • From parent to parent
• If you can’t reach a node, it’s conditionally independent of the start node given evidence
[Figure: successor cases for a node X reached from a previous state S]

Reachability (D-Separation)
• Question: Are X and Y conditionally independent given evidence variables {Z}?
  • Look for “active paths” from X to Y
  • No active paths = independence!
• A path is active if each triple is either a:
  • Causal chain A → B → C where B is unobserved (either direction)
  • Common cause A ← B → C where B is unobserved
  • Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
• Also known as Bayes Ball
[Figure: Active Triples and Inactive Triples]

Example
[Figure: independence query on an example network; answer shown: Yes]

Example
[Figure: network over the nodes L, R, B, D, T, T’ with independence queries; answers shown: Yes, Yes, Yes]

Example
• Variables:
  • R: Raining
  • T: Traffic
  • D: Roof drips
  • S: I’m sad
• Questions:
[Figure: network over the nodes R, T, D, S with independence queries; answers shown: Yes, Yes]
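The reachability recipe above can be sketched as a breadth-first search over (node, direction) states. This is one common way to implement d-separation, not necessarily the exact formulation used in the lecture. The function name `d_separated`, the parent-list graph encoding, and the edge set used for the traffic network (L → R, R → D, R → T, B → T, T → T’) are assumptions; the edges are guessed from the node names on the slides.

```python
from collections import deque


def d_separated(parents, x, y, evidence):
    """Return True if x and y are d-separated given evidence.

    parents maps every node to a list of its parents (a DAG).
    x and y are assumed not to be evidence nodes themselves.
    """
    evidence = set(evidence)
    children = {n: [] for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    # Nodes that are observed or have an observed descendant;
    # only these can act as active v-structures.
    activates_v = set()
    stack = list(evidence)
    while stack:
        n = stack.pop()
        if n not in activates_v:
            activates_v.add(n)
            stack.extend(parents[n])

    # "up": the node was reached from one of its children (or is the start).
    # "down": the node was reached from one of its parents.
    queue = deque([(x, "up")])
    visited = set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y:
            return False  # found an active path
        if direction == "up" and node not in evidence:
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in evidence:       # chain continues downward
                queue.extend((c, "down") for c in children[node])
            if node in activates_v:        # active v-structure: bounce to parents
                queue.extend((p, "up") for p in parents[node])
    return True


# Assumed structure of the traffic network (edges are a guess from the slides).
traffic = {"L": [], "R": ["L"], "B": [], "D": ["R"], "T": ["R", "B"], "T'": ["T"]}

print(d_separated(traffic, "L", "B", []))          # True
print(d_separated(traffic, "L", "B", ["T"]))       # False
print(d_separated(traffic, "L", "B", ["T'"]))      # False
print(d_separated(traffic, "L", "B", ["T", "R"]))  # True
print(d_separated(traffic, "L", "T'", ["T"]))      # True
```

Tracking the arrival direction is what lets the search treat a v-structure differently from a chain or a common cause at the same node.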
Causality?
• When Bayes’ nets reflect the true causal patterns:
  • Often simpler (nodes have fewer parents)
  • Often easier to think about
  • Often easier to elicit from experts
• BNs need not actually be causal
  • Sometimes no causal net exists over the domain
  • E.g. consider the variables Traffic and Drips
  • End up with arrows that reflect correlation, not causation
• What do the arrows really mean?
  • Topology may happen to encode causal structure
  • Topology only guaranteed to encode conditional independence

Example: Coins
• Extra arcs don’t prevent representing independence, just allow non-independence
[Figure: two nets over X_1 and X_2, one with no arc and one with the arc X_1 → X_2]
  No arc:
    P(X_1): h 0.5, t 0.5
    P(X_2): h 0.5, t 0.5
  With arc X_1 → X_2:
    P(X_1): h 0.5, t 0.5
    P(X_2 | X_1): h | h 0.5, t | h 0.5, h | t 0.5, t | t 0.5

Changing Bayes’ Net Structure
• The same joint distribution can be encoded in many different Bayes’ nets
  • Causal structure tends to be the simplest
• Analysis question: given some edges, what other edges do you need to add?
  • One answer: fully connect the graph
  • Better answer: don’t make any false conditional independence assumptions

Example: Alternate Alarm
[Figure: alarm network with Burglary and Earthquake as parents of Alarm, and Alarm as parent of John calls and Mary calls; and an alternate network with the edges reversed]
• If we reverse the edges, we make different conditional independence assumptions
• To capture the same joint distribution, we have to add more edges to the graph

Summary
• Bayes nets compactly encode joint distributions
• Guaranteed independencies of distributions can be deduced from BN graph structure
• The Bayes’ ball algorithm (aka d-separation)
• A Bayes’ net may have other independencies that are not detectable until you inspect its specific distribution
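The point of the "Example: Coins" slide, that an extra arc does not prevent representing independence, can be checked directly. The sketch below uses the CPT values from the slide for both nets; the third table, `p_x2_copies_x1`, is an invented example of a dependent distribution that only the net with the arc can hold.

```python
from itertools import product

# Net without an arc: two independent fair coins (values from the slide).
p_x1 = {"h": 0.5, "t": 0.5}
p_x2 = {"h": 0.5, "t": 0.5}

# Net with the arc X1 -> X2, using the slide's CPT: every row is 0.5 / 0.5.
p_x2_given_x1 = {"h": {"h": 0.5, "t": 0.5},
                 "t": {"h": 0.5, "t": 0.5}}

# Invented CPT for the same arc: X2 always copies X1 (a dependent distribution).
p_x2_copies_x1 = {"h": {"h": 1.0, "t": 0.0},
                  "t": {"h": 0.0, "t": 1.0}}


def joint_no_arc(a, b):
    return p_x1[a] * p_x2[b]


def joint_with_arc(a, b, cpt):
    return p_x1[a] * cpt[a][b]


pairs = list(product("ht", repeat=2))

# With the slide's CPT, the arc changes nothing: both nets give the same joint.
same_joint = all(joint_no_arc(a, b) == joint_with_arc(a, b, p_x2_given_x1)
                 for a, b in pairs)

# With the copying CPT, the joint no longer factors as P(x1) * P(x2).
copy_marginal_x2 = {b: sum(joint_with_arc(a, b, p_x2_copies_x1) for a in "ht")
                    for b in "ht"}
copy_is_independent = all(
    joint_with_arc(a, b, p_x2_copies_x1) == p_x1[a] * copy_marginal_x2[b]
    for a, b in pairs)

print(same_joint, copy_is_independent)  # True False
```

So the arc enlarges the set of representable distributions without removing the independent ones, at the cost of a larger CPT.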