CS 188: Artificial Intelligence
Review of Probability, Bayes' Nets

Pieter Abbeel – UC Berkeley
Many slides adapted from Dan Klein

DISCLAIMER: It is insufficient to simply study these slides; they are merely meant as a quick refresher of the high-level ideas covered. You need to study all materials covered in lecture, section, assignments, and projects!

Probability Recap
§ Conditional probability: $P(x \mid y) = P(x, y) / P(y)$
§ Product rule: $P(x, y) = P(x \mid y) \, P(y)$
§ Chain rule: $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid x_1, \ldots, x_{i-1})$
§ X, Y independent iff: $\forall x, y : P(x, y) = P(x) \, P(y)$
  § equivalently, iff: $\forall x, y : P(x \mid y) = P(x)$
  § equivalently, iff: $\forall x, y : P(y \mid x) = P(y)$
§ X and Y are conditionally independent given Z iff: $\forall x, y, z : P(x, y \mid z) = P(x \mid z) \, P(y \mid z)$
  § equivalently, iff: $\forall x, y, z : P(x \mid y, z) = P(x \mid z)$
  § equivalently, iff: $\forall x, y, z : P(y \mid x, z) = P(y \mid z)$

Inference by Enumeration
§ From the joint distribution P(S, T, W):

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20

§ P(sun)?
§ P(sun | winter)?
§ P(sun | winter, hot)?

Bayes' Nets Recap
§ Representation
  § Chain rule → Bayes' net = DAG + CPTs
§ Conditional independences
  § D-separation
§ Probabilistic inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Probabilistic inference is NP-complete
  § Sampling (approximate)

Chain Rule → Bayes' Net
§ Chain rule: can always write any joint distribution as an incremental product of conditional distributions:
  $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid x_1, \ldots, x_{i-1})$
§ Bayes' nets: make conditional independence assumptions of the form
  $P(x_i \mid x_1, \ldots, x_{i-1}) = P(x_i \mid \mathrm{parents}(X_i))$
  giving us:
  $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid \mathrm{parents}(X_i))$
§ Example: the burglary network with B → A ← E and A → J, A → M

Probabilities in BNs
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
§ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
  $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid \mathrm{parents}(X_i))$
§ This lets us reconstruct any entry of the full joint
§ Not every BN can represent every joint distribution
  § The topology enforces certain conditional independencies
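To make the three enumeration queries above concrete, here is a minimal Python sketch. The table values come straight from the slide; the `joint` and `prob` names are illustrative choices, not course code:

```python
# Joint distribution P(S, T, W) from the slide, keyed by (season, temp, weather).
joint = {
    ("summer", "hot",  "sun"):  0.30,
    ("summer", "hot",  "rain"): 0.05,
    ("summer", "cold", "sun"):  0.10,
    ("summer", "cold", "rain"): 0.05,
    ("winter", "hot",  "sun"):  0.10,
    ("winter", "hot",  "rain"): 0.05,
    ("winter", "cold", "sun"):  0.15,
    ("winter", "cold", "rain"): 0.20,
}

def prob(predicate):
    """Sum the joint entries (s, t, w) that satisfy the predicate."""
    return sum(p for (s, t, w), p in joint.items() if predicate(s, t, w))

# P(sun) = 0.30 + 0.10 + 0.10 + 0.15 = 0.65
p_sun = prob(lambda s, t, w: w == "sun")

# P(sun | winter) = P(sun, winter) / P(winter) = 0.25 / 0.50 = 0.50
p_sun_given_winter = (prob(lambda s, t, w: w == "sun" and s == "winter") /
                      prob(lambda s, t, w: s == "winter"))

# P(sun | winter, hot) = 0.10 / 0.15 ≈ 0.667
p_sun_given_winter_hot = (prob(lambda s, t, w: w == "sun" and s == "winter" and t == "hot") /
                          prob(lambda s, t, w: s == "winter" and t == "hot"))
```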
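The chain rule slide claims any joint can be rewritten as an incremental product of conditionals. As a hedged sanity check (reusing the `joint` table and `prob` helper from the sketch above; the helper names are again illustrative), this verifies $P(s, t, w) = P(s)\,P(t \mid s)\,P(w \mid s, t)$ numerically for every entry:

```python
def p_s(s):
    return prob(lambda s2, t, w: s2 == s)

def p_t_given_s(t, s):
    return prob(lambda s2, t2, w: s2 == s and t2 == t) / p_s(s)

def p_w_given_st(w, s, t):
    return joint[(s, t, w)] / prob(lambda s2, t2, w2: s2 == s and t2 == t)

# Every joint entry is recovered exactly by the chain-rule product.
for (s, t, w), p in joint.items():
    assert abs(p_s(s) * p_t_given_s(t, s) * p_w_given_st(w, s, t) - p) < 1e-12
```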
Example: Alarm Network
§ Nodes: Burglary (B), Earthquake (E), Alarm (A), John calls (J), Mary calls (M); edges B → A, E → A, A → J, A → M

  B    P(B)         E    P(E)
  +b   0.001        +e   0.002
  -b   0.999        -e   0.998

  B    E    A    P(A|B,E)
  +b   +e   +a   0.95
  +b   +e   -a   0.05
  +b   -e   +a   0.94
  +b   -e   -a   0.06
  -b   +e   +a   0.29
  -b   +e   -a   0.71
  -b   -e   +a   0.001
  -b   -e   -a   0.999

  A    J    P(J|A)       A    M    P(M|A)
  +a   +j   0.9          +a   +m   0.7
  +a   -j   0.1          +a   -m   0.3
  -a   +j   0.05         -a   +m   0.01
  -a   -j   0.95         -a   -m   0.99

Size of a Bayes' Net
§ How big is a joint distribution over N Boolean variables? $2^N$
  § Size of the representation if we use the chain rule: also $2^N$
§ How big is an N-node net if nodes have up to k parents? $O(N \cdot 2^{k+1})$
§ Both give you the power to calculate $P(x_1, \ldots, x_n)$
§ BNs:
  § Huge space savings!
  § Easier to elicit local CPTs
  § Faster to answer queries

Bayes' Nets: Assumptions
§ Assumptions made by specifying the graph:
  $P(x_i \mid x_1, \ldots, x_{i-1}) = P(x_i \mid \mathrm{parents}(X_i))$
§ Given a Bayes' net graph, additional conditional independences can be read off directly from the graph
§ Question: are two nodes guaranteed to be independent given certain evidence?
  § If no, can prove with a counterexample: i.e., pick a set of CPTs and show that the independence assumption is violated by the resulting distribution
  § If yes, can prove with:
    § Algebra (tedious)
    § D-separation (analyzes the graph)

D-Separation
§ Question: are X and Y conditionally independent given evidence variables {Z}?
  § Yes, if X and Y are "separated" by Z
  § Consider all (undirected) paths from X to Y
  § No active paths = independence!
§ A path is active if each triple along it is active:
  § Causal chain A → B → C where B is unobserved (either direction)
  § Common cause A ← B → C where B is unobserved
  § Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
§ All it takes to block a path is a single inactive segment
§ (Figure: table of active vs. inactive triples)

D-Separation Example
§ Given query: is $X_i \perp\!\!\!\perp X_j \mid \{X_{k_1}, \ldots, X_{k_n}\}$?
§ Shade all evidence nodes
§ For all (undirected!) paths between $X_i$ and $X_j$:
  § Check whether the path is active
  § If active, return: the independence is not guaranteed
§ (If reaching this point, all paths have been checked and shown inactive)
§ Return $X_i \perp\!\!\!\perp X_j \mid \{X_{k_1}, \ldots, X_{k_n}\}$
§ (Figure: worked example on a network over nodes L, R, B, D, T, T′; the three queries shown are each answered Yes)
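A minimal sketch of scoring one full assignment in the alarm network, multiplying one row from each CPT above. The dictionary layout and function name are illustrative assumptions, but the numbers are the slide's:

```python
# Alarm-network CPTs, transcribed from the slide.
P_B = {"+b": 0.001, "-b": 0.999}
P_E = {"+e": 0.002, "-e": 0.998}
P_A = {("+b", "+e", "+a"): 0.95,  ("+b", "+e", "-a"): 0.05,
       ("+b", "-e", "+a"): 0.94,  ("+b", "-e", "-a"): 0.06,
       ("-b", "+e", "+a"): 0.29,  ("-b", "+e", "-a"): 0.71,
       ("-b", "-e", "+a"): 0.001, ("-b", "-e", "-a"): 0.999}
P_J = {("+a", "+j"): 0.9,  ("+a", "-j"): 0.1,
       ("-a", "+j"): 0.05, ("-a", "-j"): 0.95}
P_M = {("+a", "+m"): 0.7,  ("+a", "-m"): 0.3,
       ("-a", "+m"): 0.01, ("-a", "-m"): 0.99}

def joint_prob(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# e.g. P(+b, -e, +a, +j, +m) = 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 5.91e-4
print(joint_prob("+b", "-e", "+a", "+j", "+m"))
```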
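The three triple rules above are mechanical enough to code. Below is a rough sketch of just the per-triple test; a full d-separation check would additionally enumerate all undirected paths and apply this to every triple along each. The `kind` encoding and `descendants` map are assumptions made for illustration:

```python
def triple_active(kind, middle, observed, descendants):
    """Is one path triple active, given the set of observed nodes?

    kind: "chain" for A -> B -> C (either direction),
          "common_cause" for A <- B -> C,
          "v_structure" for A -> B <- C; `middle` is B.
    """
    if kind in ("chain", "common_cause"):
        return middle not in observed        # blocked once B is observed
    if kind == "v_structure":                # active only if B or one of
        return (middle in observed or        # its descendants is observed
                any(d in observed for d in descendants.get(middle, ())))
    raise ValueError(f"unknown triple kind: {kind}")

# In the alarm network, B -> A <- E is a v-structure: inactive with
# nothing observed, but activated by observing A (or a descendant like J).
desc = {"A": ("J", "M")}
assert not triple_active("v_structure", "A", observed=set(), descendants=desc)
assert triple_active("v_structure", "A", observed={"J"}, descendants=desc)
```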
All Conditional Independences
§ Given a Bayes' net structure, can run d-separation to build a complete list of conditional independences that are necessarily true, each of the form
  $X_i \perp\!\!\!\perp X_j \mid \{X_{k_1}, \ldots, X_{k_n}\}$
§ This list determines the set of probability distributions that can be represented by Bayes' nets with this graph structure

Topology Limits Distributions
§ Given some graph topology G, only certain joint distributions can be encoded
§ The graph structure guarantees certain (conditional) independences
§ (There might be more independence)
§ Adding arcs increases the set of distributions, but has several costs
§ Full conditioning can encode any distribution
§ Example over three variables X, Y, Z:
  § No edges: $\{X \perp\!\!\!\perp Y,\; X \perp\!\!\!\perp Z,\; Y \perp\!\!\!\perp Z,\; X \perp\!\!\!\perp Z \mid Y,\; X \perp\!\!\!\perp Y \mid Z,\; Y \perp\!\!\!\perp Z \mid X\}$
  § X and Z linked only through Y (e.g., X → Y → Z): $\{X \perp\!\!\!\perp Z \mid Y\}$
  § Fully connected: $\{\}$ (no guaranteed independences)

Inference by Enumeration
§ Given unlimited time, inference in BNs is easy
§ Recipe:
  § State the marginal probabilities you need
  § Figure out ALL the atomic probabilities you need
  § Calculate and combine them
§ In this simple method, we only need the BN to synthesize the joint entries
§ Example: the B, E, A, J, M alarm network

Example: Enumeration
§ (Figure: worked enumeration calculation on the alarm network, not reproduced here)

Variable Elimination Outline
§ Why is inference by enumeration so slow?
  § You join up the whole joint distribution before you sum out the hidden variables
  § You end up repeating a lot of work!
§ Idea: interleave joining and marginalizing!
  § Called "variable elimination"
  § Still NP-hard, but usually much faster than inference by enumeration
§ VE: alternately join factors and eliminate variables

Variable Elimination: Factors
§ Track objects called factors; the initial factors are the local CPTs (one per node), here for the chain R → T → L:

  R   P(R)      R   T   P(T|R)      T   L   P(L|T)
  +r  0.1       +r  +t  0.8         +t  +l  0.3
  -r  0.9       +r  -t  0.2         +t  -l  0.7
                -r  +t  0.1         -t  +l  0.1
                -r  -t  0.9         -t  -l  0.9

§ Any known values are selected
§ E.g., if we know L = +l, the initial factors are P(R), P(T|R), and the selected rows of P(L|T), i.e. P(+l|T):

  R   P(R)      R   T   P(T|R)      T   L   P(+l|T)
  +r  0.1       +r  +t  0.8         +t  +l  0.3
  -r  0.9       +r  -t  0.2         -t  +l  0.1
                -r  +t  0.1
                -r  -t  0.9
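One way to picture these factors is as dictionaries from assignment tuples to numbers. A minimal sketch, assuming that representation (the `f_*` names are illustrative, not course code); selecting the evidence L = +l just keeps the consistent rows:

```python
# Initial factors for the chain R -> T -> L, transcribed from the slide.
f_R  = {("+r",): 0.1, ("-r",): 0.9}                 # P(R)
f_TR = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
        ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}       # P(T | R)
f_LT = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
        ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}       # P(L | T)

# Instantiate evidence L = +l: keep only rows with l == "+l", leaving
# a smaller factor P(+l | T) over T alone.
f_lT = {(t,): p for (t, l), p in f_LT.items() if l == "+l"}
assert f_lT == {("+t",): 0.3, ("-t",): 0.1}
```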
Variable Elimination Example (* VE is variable elimination)
§ Query: P(L) in the chain R → T → L; alternate joining and summing out: join R, sum out R, join T, sum out T

§ Join R: multiply P(R) into P(T|R), giving P(R, T):

  R   T   P(R,T)
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

§ Sum out R, giving P(T):

  T   P(T)
  +t  0.17
  -t  0.83

§ Join T: multiply P(T) into P(L|T), giving P(T, L):

  T   L   P(T,L)
  +t  +l  0.051
  +t  -l  0.119
  -t  +l  0.083
  -t  -l  0.747

§ Sum out T, giving P(L):

  L   P(L)
  +l  0.134
  -l  0.866

Example: Variable Elimination in the Alarm Network
§ Choose E: join all factors mentioning E, then sum out E
§ Choose A: join all factors mentioning A, then sum out A
§ Finish with B: join the remaining factors
§ Normalize
§ (Figure: the intermediate factor tables are not reproduced here)

General Variable Elimination
§ Query: $P(Q \mid E_1 = e_1, \ldots, E_k = e_k)$
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize

Another (a Bit More Abstractly Worked Out) Variable Elimination Example
§ Computational complexity critically depends on the largest factor generated in this process; the size of a factor is the number of entries in its table
§ In the example above (assuming binary variables), all factors generated are of size 2, as they each have only one variable (Z, Z, and X3, respectively)
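Putting the loop above into code: a minimal end-to-end sketch of the join/eliminate steps for the R → T → L query, reusing the factor dictionaries from the earlier sketch (illustrative names again; the slide's numbers are recovered):

```python
# Join on R: P(R, T) = P(R) * P(T | R)
f_RT = {(r, t): f_R[(r,)] * f_TR[(r, t)] for (r, t) in f_TR}
# -> {(+r,+t): 0.08, (+r,-t): 0.02, (-r,+t): 0.09, (-r,-t): 0.81}

# Eliminate (sum out) R: P(T)
f_T = {}
for (r, t), p in f_RT.items():
    f_T[(t,)] = f_T.get((t,), 0.0) + p
# -> {(+t,): 0.17, (-t,): 0.83}

# Join on T with P(L | T), then sum out T: P(L)
f_L = {}
for (t, l), p in f_LT.items():
    f_L[(l,)] = f_L.get((l,), 0.0) + f_T[(t,)] * p
# -> {(+l,): 0.134, (-l,): 0.866}; no hidden variables remain, and P(L)
# is already normalized because no evidence was conditioned on.
```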