CS 188: Artificial Intelligence
Bayes' Nets: Representation and Independence
Pieter Abbeel – UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

Probability recap
§ Conditional probability: P(x | y) = P(x, y) / P(y)
§ Product rule: P(x, y) = P(x | y) P(y)
§ Chain rule: P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... = ∏_i P(xi | x1, ..., xi−1)
§ X, Y independent iff: ∀x, y : P(x, y) = P(x) P(y)
§ X and Y are conditionally independent given Z iff: ∀x, y, z : P(x, y | z) = P(x | z) P(y | z)
§ (These identities are checked numerically in the short Python sketch at the end of this group of slides.)

Probabilistic Models
§ Models describe how (a portion of) the world works
§ Models are always simplifications
  § May not account for every variable
  § May not account for all interactions between variables
  § "All models are wrong; but some are useful." – George E. P. Box
§ What do we do with probabilistic models?
  § We (or our agents) need to reason about unknown variables, given evidence
  § Example: explanation (diagnostic reasoning)
  § Example: prediction (causal reasoning)
  § Example: value of information

Bayes' Nets: Big Picture
§ Two problems with using full joint distribution tables as our probabilistic models:
  § Unless there are only a few variables, the joint is WAY too big to represent explicitly. For n variables with domain size d, the joint table has d^n entries --- exponential in n.
  § Hard to learn (estimate) anything empirically about more than a few variables at a time
§ Bayes' nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
  § More properly called graphical models
  § We describe how variables locally interact
  § Local interactions chain together to give global, indirect interactions

Bayes' Nets
§ Representation
  § Informal first introduction of Bayes' nets through causality "intuition"
  § More formal introduction of Bayes' nets
§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes' Nets from Data

Graphical Model Notation
§ Nodes: variables (with domains)
  § Can be assigned (observed) or unassigned (unobserved)
§ Arcs: interactions
  § Similar to CSP constraints
  § Indicate "direct influence" between variables
  § Formally: encode conditional independence (more later)
  § For now: imagine that arrows mean direct causation (in general, they don't!)
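A minimal Python sketch (not part of the original slides) that checks the recap identities numerically on a tiny joint table; the particular probabilities are made up for illustration:

```python
# Minimal sketch (hypothetical numbers): a joint P(X, Y) as a dict,
# used to check the product rule and test independence numerically.

joint = {  # P(x, y); here X and Y happen to be independent by construction
    ('+x', '+y'): 0.08, ('+x', '-y'): 0.12,
    ('-x', '+y'): 0.32, ('-x', '-y'): 0.48,
}

def marginal_x(x):
    return sum(p for (xv, _), p in joint.items() if xv == x)

def marginal_y(y):
    return sum(p for (_, yv), p in joint.items() if yv == y)

def conditional_x_given_y(x, y):          # P(x | y) = P(x, y) / P(y)
    return joint[(x, y)] / marginal_y(y)

# Product rule: P(x, y) = P(x | y) P(y)
assert abs(joint[('+x', '+y')]
           - conditional_x_given_y('+x', '+y') * marginal_y('+y')) < 1e-12

# Independence: P(x, y) = P(x) P(y) for all x, y
independent = all(abs(p - marginal_x(x) * marginal_y(y)) < 1e-12
                  for (x, y), p in joint.items())
print(independent)  # True for these numbers
```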
Example: Coin Flips
§ N independent coin flips:  X1   X2   ...   Xn   (no arcs)
§ No interactions between variables: absolute independence

Example: Traffic
§ Variables:
  § R: It rains
  § T: There is traffic
§ Model 1: independence (R and T unconnected)
§ Model 2: rain causes traffic (R → T)
§ Why is an agent using model 2 better?

Example: Traffic II
§ Let's build a causal graphical model
§ Variables:
  § T: Traffic
  § R: It rains
  § L: Low pressure
  § D: Roof drips
  § B: Ballgame
  § C: Cavity

Example: Alarm Network
§ Variables:
  § B: Burglary
  § A: Alarm goes off
  § M: Mary calls
  § J: John calls
  § E: Earthquake!

Bayes' Net Semantics
§ Let's formalize the semantics of a Bayes' net
§ A set of nodes, one per variable X
§ A directed, acyclic graph
§ A conditional distribution for each node
  § A collection of distributions over X, one for each combination of the parents' values: P(X | a1, ..., an) for parents A1, ..., An
  § CPT: conditional probability table
  § Description of a noisy "causal" process
§ A Bayes net = Topology (graph) + Local Conditional Probabilities

Probabilities in BNs
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
      P(x1, x2, ..., xn) = ∏_i P(xi | parents(Xi))
  § Example: the sketch after these slides evaluates one such product for the traffic net
§ This lets us reconstruct any entry of the full joint
§ Not every BN can represent every joint distribution
  § The topology enforces certain conditional independencies
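To make the product-of-conditionals semantics concrete, here is a small sketch (ours, not from the deck); the net encoding and function names are our own choices, and the CPT numbers come from the traffic example on the next slides:

```python
# Sketch: a Bayes net as (parents, CPT) per variable; the probability of a
# full assignment is the product of the local conditionals,
#   P(x1, ..., xn) = prod_i P(xi | parents(Xi)).

def joint_probability(net, assignment):
    """net: {var: (parent_list, cpt)}, where cpt maps
    (parent_values_tuple, value) -> probability.
    assignment: {var: value} for every variable in the net."""
    p = 1.0
    for var, (parents, cpt) in net.items():
        parent_vals = tuple(assignment[q] for q in parents)
        p *= cpt[(parent_vals, assignment[var])]
    return p

# The rain -> traffic net from the slides:
net = {
    'R': ([], {((), '+r'): 1/4, ((), '-r'): 3/4}),
    'T': (['R'], {(('+r',), '+t'): 3/4, (('+r',), '-t'): 1/4,
                  (('-r',), '+t'): 1/2, (('-r',), '-t'): 1/2}),
}
print(joint_probability(net, {'R': '+r', 'T': '+t'}))  # 1/4 * 3/4 = 3/16
```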
Example: Coin Flips
X1   X2   ...   Xn   (no arcs)

P(X1):        P(X2):        ...   P(Xn):
  h  0.5        h  0.5              h  0.5
  t  0.5        t  0.5              t  0.5

§ Only distributions whose variables are absolutely independent can be represented by a Bayes' net with no arcs.

Example: Traffic
R → T

P(R):           P(T | R):
  +r  1/4         +r:  +t 3/4,  ¬t 1/4
  ¬r  3/4         ¬r:  +t 1/2,  ¬t 1/2

Example: Alarm Network
Burglary → Alarm ← Earthquake;  Alarm → John calls;  Alarm → Mary calls

P(B):              P(E):
  +b  0.001          +e  0.002
  ¬b  0.999          ¬e  0.998

P(A | B, E):
  +b +e:  +a 0.95,   ¬a 0.05
  +b ¬e:  +a 0.94,   ¬a 0.06
  ¬b +e:  +a 0.29,   ¬a 0.71
  ¬b ¬e:  +a 0.001,  ¬a 0.999

P(J | A):                   P(M | A):
  +a:  +j 0.9,   ¬j 0.1       +a:  +m 0.7,   ¬m 0.3
  ¬a:  +j 0.05,  ¬j 0.95      ¬a:  +m 0.01,  ¬m 0.99

(A worked multiplication over these CPTs follows after these slides.)

Example Bayes' Net: Insurance
[Figure: insurance network diagram]

Example Bayes' Net: Car
[Figure: car diagnosis network diagram]

Build your own Bayes nets!
§ http://www.aispace.org/bayes/index.shtml
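A worked check of the alarm network above (our sketch, not from the deck): multiplying the relevant local conditionals gives the probability of one full assignment, here P(+b, ¬e, +a, +j, +m):

```python
# Joint probability of one full assignment in the alarm network:
#   P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)

P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,    # P(+a | b, e)
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}
P_J = {'+a': 0.9, '-a': 0.05}                     # P(+j | a)
P_M = {'+a': 0.7, '-a': 0.01}                     # P(+m | a)

# P(+b, -e, +a, +j, +m)
p = P_B['+b'] * P_E['-e'] * P_A[('+b', '-e')] * P_J['+a'] * P_M['+a']
print(p)  # 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 5.91e-4
```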
Size of a Bayes' Net
§ How big is a joint distribution over N Boolean variables?  2^N
§ How big is an N-node net if nodes have up to k parents?  O(N · 2^(k+1))
§ Both give you the power to calculate any full-joint probability
§ BNs: Huge space savings! (illustrated in a counting sketch at the end of this group of slides)
§ Also easier to elicit local CPTs
§ Also turns out to be faster to answer queries (coming)

Bayes' Nets
§ Representation
  § Informal first introduction of Bayes' nets through causality "intuition"
  § More formal introduction of Bayes' nets
§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes' Nets from Data

Representing Joint Probability Distributions
§ Table representation:
  § number of parameters: d^n − 1 --- exponential in n
§ Chain rule representation: applies to ALL distributions
  § Pick any ordering of variables, rename accordingly as x1, x2, ..., xn
  § number of parameters: (d−1) + d(d−1) + d^2(d−1) + ... + d^(n−1)(d−1) = d^n − 1
  § Size of each CPT = (number of different joint instantiations of the preceding variables) × (number of values the current variable can take on, minus 1)
§ Both can represent any distribution over the n random variables. Makes sense: the same number of parameters needs to be stored.
§ The chain rule applies to all orderings of the variables, so a given distribution can be represented in n! = n(n−1)(n−2)···2·1 different ways with the chain rule

Chain Rule → Bayes' net
§ Chain rule representation: applies to ALL distributions
  § Pick any ordering of variables, rename accordingly as x1, x2, ..., xn
§ Bayes' net representation: makes assumptions
  § Pick any ordering of variables, rename accordingly as x1, x2, ..., xn
  § Pick any directed acyclic graph consistent with the ordering
  § Assume the following conditional independencies:
      P(xi | x1, ..., xi−1) = P(xi | parents(Xi))
    → Joint:  P(x1, ..., xn) = ∏_i P(xi | parents(Xi))
  § number of parameters: linear in n (with maximum number of parents = K, at most n · d^K · (d−1))
§ Note: no causality assumption made anywhere.

Causality?
§ When Bayes' nets reflect the true causal patterns:
  § Often simpler (nodes have fewer parents)
  § Often easier to think about
  § Often easier to elicit from experts
§ BNs need not actually be causal
  § Sometimes no causal net exists over the domain, e.g. consider the variables Traffic and Drips
  § End up with arrows that reflect correlation, not causation
§ What do the arrows really mean?
  § Topology may happen to encode causal structure
  § Topology is only guaranteed to encode conditional independence

Example: Traffic
§ Basic traffic net: R → T
§ Let's multiply out the joint

P(R):          P(T | R):               P(R, T) = P(R) P(T | R):
  r   1/4        r:  t 3/4,  ¬t 1/4      r  t   3/16
  ¬r  3/4        ¬r: t 1/2,  ¬t 1/2      r  ¬t  1/16
                                         ¬r t   6/16
                                         ¬r ¬t  6/16
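The multiplied-out traffic joint can be checked mechanically; a short sketch (ours), using exact fractions:

```python
# Multiply out P(R, T) = P(R) P(T | R) for the basic traffic net.
from fractions import Fraction as F

P_R = {'r': F(1, 4), '-r': F(3, 4)}
P_T_given_R = {('r', 't'): F(3, 4), ('r', '-t'): F(1, 4),
               ('-r', 't'): F(1, 2), ('-r', '-t'): F(1, 2)}

joint = {(r, t): P_R[r] * P_T_given_R[(r, t)]
         for r in P_R for t in ('t', '-t')}
print(joint)  # matches the slide: 3/16, 1/16, 6/16 (=3/8), 6/16 (=3/8)
```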
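And the parameter counts from the preceding slides, with hypothetical sizes n, d, K (the upper bound n · d^K · (d−1) counts at most d^K parent instantiations times d−1 free entries per node):

```python
# Parameter counts: full joint (= chain rule) vs. a Bayes net with
# at most K parents per node; n variables, each with domain size d.

def full_joint_params(n, d):
    return d**n - 1                      # exponential in n

def bayes_net_params_upper_bound(n, d, K):
    return n * d**K * (d - 1)            # linear in n for fixed K

n, d, K = 20, 2, 3                       # hypothetical sizes
print(full_joint_params(n, d))                # 1048575
print(bayes_net_params_upper_bound(n, d, K))  # 160
```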
Example: Reverse Traffic
§ Reverse causality?  T → R

P(T):           P(R | T):               P(T, R) = P(T) P(R | T):
  t   9/16        t:  r 1/3,  ¬r 2/3      r  t   3/16
  ¬t  7/16        ¬t: r 1/7,  ¬r 6/7      r  ¬t  1/16
                                          ¬r t   6/16
                                          ¬r ¬t  6/16

§ Same joint as the forward net (the first sketch after these slides recovers these reversed CPTs mechanically)

Example: Coins
§ Extra arcs don't prevent representing independence, they just allow non-independence

Net 1: X1   X2 (no arc)          Net 2: X1 → X2
P(X1):  h 0.5,  t 0.5            P(X1):  h 0.5,  t 0.5
P(X2):  h 0.5,  t 0.5            P(X2 | X1):  h|h 0.5,  t|h 0.5,  h|t 0.5,  t|t 0.5

§ Adding unneeded arcs isn't wrong, it's just inefficient

Bayes' Nets
§ Representation
  § Informal first introduction of Bayes' nets through causality "intuition"
  § More formal introduction of Bayes' nets
§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes' Nets from Data

Bayes Nets: Assumptions
§ To go from the chain rule to the Bayes' net representation, we made the following assumption about the distribution:
    P(xi | x1, ..., xi−1) = P(xi | parents(Xi))
§ It turns out that probability distributions satisfying the above ("chain rule → Bayes net") conditional independence assumptions
  § often can be guaranteed to have many more conditional independences
  § These guaranteed additional conditional independences can be read off directly from the graph
§ Important for modeling: understand the assumptions made when choosing a Bayes net graph

Example
X → Y → Z → W

§ Conditional independence assumptions directly from the simplifications in the chain rule:
    P(x, y, z, w) = P(x) P(y | x) P(z | y) P(w | z)
  i.e. P(z | x, y) = P(z | y) and P(w | x, y, z) = P(w | z)
§ Additional implied conditional independence assumptions?

Independence in a BN
§ Given a Bayes net graph
§ Important question: Are two nodes guaranteed to be independent given certain evidence?
  § Equivalent question: Are two nodes independent given the evidence in all distributions that can be encoded with the Bayes net graph?
§ Before proceeding, consider the opposite question: Are two nodes guaranteed to be dependent given certain evidence?
  § No! For any BN graph you can choose all CPTs such that all variables are independent, by having P(X | Pa(X) = pa_X) not depend on the value of the parents. A simple way of doing so: pick all entries in all CPTs equal to 0.5 (assuming binary variables). (This is verified in the second sketch after these slides.)
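The reversed CPTs above follow mechanically from the joint; a sketch (ours) that recovers P(T) and P(R | T) from the P(R, T) table and confirms the re-multiplied joint matches:

```python
# Recover the reversed CPTs from the joint P(R, T):
#   P(t) = sum_r P(r, t),   P(r | t) = P(r, t) / P(t).
from fractions import Fraction as F

joint = {('r', 't'): F(3, 16), ('r', '-t'): F(1, 16),
         ('-r', 't'): F(6, 16), ('-r', '-t'): F(6, 16)}

P_T = {t: sum(p for (r, tv), p in joint.items() if tv == t)
       for t in ('t', '-t')}
P_R_given_T = {(r, t): joint[(r, t)] / P_T[t] for (r, t) in joint}

print(P_T)           # {'t': 9/16, '-t': 7/16}
print(P_R_given_T)   # P(r|t)=1/3, P(-r|t)=2/3, P(r|-t)=1/7, P(-r|-t)=6/7

# Re-multiplying gives back exactly the same joint:
assert all(P_T[t] * P_R_given_T[(r, t)] == joint[(r, t)] for (r, t) in joint)
```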
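Finally, a numeric check (ours) of the last bullet's claim that uniform CPTs make every variable independent regardless of graph structure, for the chain X → Y → Z:

```python
# Verify: with all CPT entries equal to 0.5 (binary variables),
# every variable in X -> Y -> Z is independent of every other.
from itertools import product

vals = ('+', '-')
# P(x) P(y | x) P(z | y) with every entry 0.5 gives a uniform joint:
joint = {(x, y, z): 0.5 * 0.5 * 0.5 for x, y, z in product(vals, repeat=3)}

def marginal(i, v):   # P(Xi = v) by summing out the other variables
    return sum(p for outcome, p in joint.items() if outcome[i] == v)

factorized = all(
    abs(p - marginal(0, x) * marginal(1, y) * marginal(2, z)) < 1e-12
    for (x, y, z), p in joint.items())
print(factorized)  # True: the arcs carry no actual influence here
```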