Bayesian networks (1) Lirong Xia
Random variables and joint distributions
Ø A random variable is a variable with a domain
  • Random variables: capital letters, e.g. W, D, L
  • Values: small letters, e.g. w, d, l
Ø A joint distribution over a set of random variables X1, X2, …, Xn specifies a real number for each assignment (or outcome)
  • p(X1 = x1, X2 = x2, …, Xn = xn), abbreviated p(x1, x2, …, xn)
Ø Example: the joint distribution p(T, W)

  T    | W    | p
  hot  | sun  | 0.4
  hot  | rain | 0.1
  cold | sun  | 0.2
  cold | rain | 0.3

Ø This is a special (structured) probability space
  • Sample space Ω: all combinations of values
  • Probability mass function: p
Ø A probabilistic model is a joint distribution over a set of random variables
  • This will be the focus of this course
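As a concrete illustration (not from the original slides), here is a minimal Python sketch that stores the joint distribution p(T, W) from the table above as a dictionary keyed by assignments:

```python
# The joint distribution p(T, W) from the slide, keyed by (T, W) assignments.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# A valid joint distribution sums to 1 over all assignments.
assert abs(sum(joint.values()) - 1.0) < 1e-9
print(joint[("hot", "sun")])  # p(T=hot, W=sun) = 0.4
```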
Marginal Distributions
Ø Marginal distributions are sub-tables which eliminate variables
Ø Marginalization (summing out): combine collapsed rows by adding
  • p(X1 = x1) = Σ_{x2} p(X1 = x1, X2 = x2)
Ø Example: the joint p(T, W) and its marginals

  T    | W    | p
  hot  | sun  | 0.4
  hot  | rain | 0.1
  cold | sun  | 0.2
  cold | rain | 0.3

  Marginal p(W): sun 0.6, rain 0.4
  Marginal p(T): hot 0.5, cold 0.5
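A minimal Python sketch of summing out, using the joint table above; the helper name marginal_W is my own:

```python
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def marginal_W(joint):
    """p(W=w) = sum over t of p(T=t, W=w): combine collapsed rows by adding."""
    p_W = {}
    for (t, w), p in joint.items():
        p_W[w] = p_W.get(w, 0.0) + p
    return p_W

print(marginal_W(joint))  # {'sun': 0.6, 'rain': 0.4}, matching the table
```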
Conditional Distributions
Ø Conditional distributions are probability distributions over some variables given fixed values of others
Ø Example: from the joint distribution p(T, W)

  T    | W    | p
  hot  | sun  | 0.4
  hot  | rain | 0.1
  cold | sun  | 0.2
  cold | rain | 0.3

  Conditional p(W | T = hot):  sun 0.8, rain 0.2
  Conditional p(W | T = cold): sun 0.4, rain 0.6
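A short Python sketch of conditioning (my own illustration): select the rows consistent with the evidence, then renormalize so the probabilities sum to 1. The helper name condition_on_T is illustrative:

```python
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def condition_on_T(joint, t):
    """p(W=w | T=t) = p(T=t, W=w) / p(T=t)."""
    rows = {w: p for (t2, w), p in joint.items() if t2 == t}
    z = sum(rows.values())  # p(T=t), the normalization constant
    return {w: p / z for w, p in rows.items()}

print(condition_on_T(joint, "hot"))   # {'sun': 0.8, 'rain': 0.2}
print(condition_on_T(joint, "cold"))  # {'sun': 0.4, 'rain': 0.6}
```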
Independence
Ø Two variables are independent in a joint distribution if for all x, y, the events X = x and Y = y are independent:
  • p(X, Y) = p(X) p(Y), i.e. ∀x, y: p(x, y) = p(x) p(y)
  • The joint distribution factors into a product of two simple ones
  • Usually variables aren't independent!
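The definition can be tested mechanically on a two-variable table. A sketch (is_independent is a made-up helper, not from the lecture):

```python
def is_independent(joint, tol=1e-9):
    """Check p(x, y) == p(x) * p(y) for every assignment, up to tolerance."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < tol
               for x in xs for y in ys)

# The weather joint from the earlier slides is NOT independent:
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
print(is_independent(joint))  # False: p(hot,sun)=0.4 but p(hot)p(sun)=0.5*0.6=0.3
```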
The Chain Rule
Ø Write any joint distribution as an incremental product of conditional distributions
  • p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
  • p(x1, x2, …, xn) = ∏i p(xi | x1, …, xi−1)
Ø Why is this always true?
  • Key: p(A | B) = p(A, B) / p(B)
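To see how the identity p(A | B) = p(A, B) / p(B) makes the chain rule hold, this sketch (my own) verifies the two-variable case p(x1, x2) = p(x1) p(x2 | x1) numerically on the weather table:

```python
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

p_t = lambda t: sum(p for (t2, _), p in joint.items() if t2 == t)
p_w_given_t = lambda w, t: joint[(t, w)] / p_t(t)  # p(w|t) = p(t,w)/p(t)

for (t, w), p in joint.items():
    # Each joint entry equals p(t) * p(w|t), by construction of p(w|t).
    assert abs(p - p_t(t) * p_w_given_t(w, t)) < 1e-9
print("Chain rule verified on all entries.")
```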
Today's schedule
Ø Conditional independence
Ø Bayesian networks
  • definitions
  • independence
Conditional Independence among random variables
Ø p(Toothache, Cavity, Catch)
Ø If I don't have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  • p(+Catch | +Toothache, -Cavity) = p(+Catch | -Cavity)
Ø The same independence holds if I have a cavity:
  • p(+Catch | +Toothache, +Cavity) = p(+Catch | +Cavity)
Ø Catch is conditionally independent of Toothache given Cavity:
  • p(Catch | Toothache, Cavity) = p(Catch | Cavity)
Ø Equivalent statements:
  • p(Toothache | Catch, Cavity) = p(Toothache | Cavity)
  • p(Toothache, Catch | Cavity) = p(Toothache | Cavity) × p(Catch | Cavity)
  • One can be derived from the other easily (part of Homework 1)
Conditional Independence
Ø Unconditional (absolute) independence is very rare
Ø Conditional independence is our most basic and robust form of knowledge about uncertain environments
Ø Definition: X and Y are conditionally independent given Z if
  • ∀x, y, z: p(x, y | z) = p(x | z) × p(y | z)
  • or equivalently, ∀x, y, z: p(x | z, y) = p(x | z)
  • X, Y, Z are random variables
  • written as X ⊥ Y | Z
Ø Brain teaser: in a probabilistic model with three random variables X, Y, Z
  • If X and Y are independent, can we say X and Y are conditionally independent given Z?
  • If X and Y are conditionally independent given Z, can we say X and Y are independent?
  • Bonus questions in Homework 1
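The definition can be checked mechanically on a full joint table. A sketch (cond_independent is a made-up helper): build a joint where X ⊥ Y | Z holds by construction, then verify the definition on every assignment:

```python
from itertools import product

def cond_independent(joint, xs, ys, zs, tol=1e-9):
    """Check X ⊥ Y | Z via p(x, y | z) = p(x | z) * p(y | z) for all x, y, z."""
    p_z = {z: sum(joint[(x, y, z)] for x, y in product(xs, ys)) for z in zs}
    for x, y, z in product(xs, ys, zs):
        p_xy_z = joint[(x, y, z)] / p_z[z]
        p_x_z = sum(joint[(x, y2, z)] for y2 in ys) / p_z[z]
        p_y_z = sum(joint[(x2, y, z)] for x2 in xs) / p_z[z]
        if abs(p_xy_z - p_x_z * p_y_z) > tol:
            return False
    return True

# Build a joint with X ⊥ Y | Z by construction: p(x,y,z) = p(z) p(x|z) p(y|z).
px = {0: {1: 0.8, 0: 0.2}, 1: {1: 0.2, 0: 0.8}}  # p(x|z), invented numbers
py = {0: {1: 0.3, 0: 0.7}, 1: {1: 0.6, 0: 0.4}}  # p(y|z), invented numbers
joint = {(x, y, z): 0.5 * px[z][x] * py[z][y]
         for x, y, z in product([0, 1], [0, 1], [0, 1])}
print(cond_independent(joint, [0, 1], [0, 1], [0, 1]))  # True
```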
The Chain Rule
Ø p(X1, …, Xn) = p(X1) p(X2 | X1) p(X3 | X1, X2) …
Ø Trivial decomposition:
  • p(Catch, Cavity, Toothache) = p(Cavity) p(Catch | Cavity) p(Toothache | Catch, Cavity)
Ø With the assumption of conditional independence Toothache ⊥ Catch | Cavity:
  • p(Toothache, Catch, Cavity) = p(Cavity) p(Catch | Cavity) p(Toothache | Cavity)
Ø Bayesian networks / graphical models help us express conditional independence assumptions
Bayesian networks: Big Picture
Ø Problems with using full joint distribution tables as our probabilistic models:
  • Representation: n random variables means at least 2^n entries
  • Computation: hard to learn (estimate) anything empirically about more than a few variables at a time
Ø Example: a full joint table p(S, T, W)

  S      | T    | W    | p
  summer | hot  | sun  | 0.30
  summer | hot  | rain | 0.05
  summer | cold | sun  | 0.10
  summer | cold | rain | 0.05
  winter | hot  | sun  | 0.10
  winter | hot  | rain | 0.05
  winter | cold | sun  | 0.15
  winter | cold | rain | 0.20

Ø Bayesian networks: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
  • More properly called graphical models
  • We describe how variables locally interact
  • Local interactions chain together to give global, indirect interactions
Example Bayesian networks: Car
Ø Initial observation: car won't start
Ø Orange: "broken" nodes
Ø Green: testable evidence
Ø Gray: "hidden variables" to ensure sparse structure, reduce parameters
Graphical Model Notation
Ø Nodes: variables (with domains)
Ø Arcs: interactions
  • Indicate "direct influence" between variables
  • Formally: encode conditional independence (more later)
Ø For now: imagine that arrows mean direct causation (in general, they don't!)
Example: Coin Flips
Ø n independent coin flips (different coins)
Ø No interactions between variables: independence
  • Really? How about independent flips of the same coin?
  • How about a skillful coin flipper?
  • Bottom line: build an application-oriented model
Example: Traffic
Ø Variables:
  • R: It rains
  • T: There is traffic
Ø Model 1: independence
Ø Model 2: rain causes traffic
Ø Which model is better?
Example: Burglar Alarm Network
Ø Variables:
  • B: Burglary
  • A: Alarm goes off
  • M: Mary calls
  • J: John calls
  • E: Earthquake!
Bayesian network
Ø Definition of a Bayesian network (Bayes' net or BN):
Ø A set of nodes, one per variable X
Ø A directed, acyclic graph
Ø A conditional distribution for each node
  • A collection of distributions over X, one for each combination of parents' values: p(X | A1, …, An), i.e. p(X | a1, …, an) for each assignment (a1, …, an) to the parents A1, …, An
  • CPT: conditional probability table
  • Description of a noisy "causal" process
Ø A Bayesian network = Topology (graph) + Local Conditional Probabilities
Probabilities in BNs
Ø Bayesian networks implicitly encode joint distributions
  • As a product of local conditional distributions:
    p(x1, x2, …, xn) = ∏_{i=1}^{n} p(xi | parents(Xi))
  • Example: p(+Cavity, +Catch, -Toothache)
Ø This lets us reconstruct any entry of the full joint
Ø Not every BN can represent every joint distribution
  • The topology enforces certain conditional independencies
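A sketch of the factorization for the cavity network (Cavity is the parent of both Catch and Toothache). Note the CPT numbers below are invented for illustration; the slides do not specify them:

```python
# Invented CPTs for the cavity network: p(C), p(K|C), p(T|C).
p_cavity = {"+c": 0.1, "-c": 0.9}                 # p(Cavity), assumed
p_catch = {("+k", "+c"): 0.9, ("-k", "+c"): 0.1,  # p(Catch | Cavity), assumed
           ("+k", "-c"): 0.2, ("-k", "-c"): 0.8}
p_tooth = {("+t", "+c"): 0.8, ("-t", "+c"): 0.2,  # p(Toothache | Cavity), assumed
           ("+t", "-c"): 0.1, ("-t", "-c"): 0.9}

def joint_entry(c, k, t):
    """p(cavity, catch, toothache) = p(c) * p(k|c) * p(t|c)."""
    return p_cavity[c] * p_catch[(k, c)] * p_tooth[(t, c)]

print(joint_entry("+c", "+k", "-t"))  # p(+Cavity, +Catch, -Toothache)
```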
Example: Coin Flips
Ø One local table per coin: p(X1), p(X2), …, p(Xn), each with h 0.5, t 0.5
Ø p(h, h, t, h) = 0.5 × 0.5 × 0.5 × 0.5 = 0.0625
Ø Only distributions whose variables are absolutely independent can be represented by a Bayesian network with no arcs.
Example: Traffic
Ø CPTs:

  p(R):     +r 0.25, -r 0.75
  p(T | R): +r: +t 0.75, -t 0.25
            -r: +t 0.50, -t 0.50

Ø p(+r, -t) = p(+r) p(-t | +r) = 0.25 × 0.25 = 0.0625
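The same computation as a code sketch (table values are from the slide; the variable names are mine):

```python
p_r = {"+r": 0.25, "-r": 0.75}                        # p(R)
p_t_r = {("+t", "+r"): 0.75, ("-t", "+r"): 0.25,      # p(T | R)
         ("+t", "-r"): 0.50, ("-t", "-r"): 0.50}

print(p_r["+r"] * p_t_r[("-t", "+r")])  # p(+r, -t) = 0.25 * 0.25 = 0.0625
```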
Example: Alarm Network

  p(B): +b 0.001, -b 0.999
  p(E): +e 0.002, -e 0.998

  p(A | B, E):
    B  | E  | A  | p
    +b | +e | +a | 0.95
    +b | +e | -a | 0.05
    +b | -e | +a | 0.94
    +b | -e | -a | 0.06
    -b | +e | +a | 0.29
    -b | +e | -a | 0.71
    -b | -e | +a | 0.001
    -b | -e | -a | 0.999

  p(J | A): +a: +j 0.9, -j 0.1;  -a: +j 0.05, -j 0.95
  p(M | A): +a: +m 0.7, -m 0.3;  -a: +m 0.01, -m 0.99
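A sketch that encodes these CPTs and reads off one full joint entry; the queried assignment is my own choice:

```python
p_b = {"+b": 0.001, "-b": 0.999}
p_e = {"+e": 0.002, "-e": 0.998}
p_a = {("+a", "+b", "+e"): 0.95, ("+a", "+b", "-e"): 0.94,
       ("+a", "-b", "+e"): 0.29, ("+a", "-b", "-e"): 0.001}
# Fill in the complementary -a rows: p(-a|b,e) = 1 - p(+a|b,e).
p_a.update({("-a", b, e): 1.0 - p_a[("+a", b, e)]
            for b in ("+b", "-b") for e in ("+e", "-e")})
p_j = {("+j", "+a"): 0.9, ("-j", "+a"): 0.1,
       ("+j", "-a"): 0.05, ("-j", "-a"): 0.95}
p_m = {("+m", "+a"): 0.7, ("-m", "+a"): 0.3,
       ("+m", "-a"): 0.01, ("-m", "-a"): 0.99}

def joint(b, e, a, j, m):
    """p(b,e,a,j,m) = p(b) p(e) p(a|b,e) p(j|a) p(m|a), per the BN topology."""
    return p_b[b] * p_e[e] * p_a[(a, b, e)] * p_j[(j, a)] * p_m[(m, a)]

print(joint("+b", "-e", "+a", "+j", "+m"))
# 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 0.000591
```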
Size of a Bayesian network
Ø How big is a joint distribution over N Boolean variables?
  • 2^N
Ø How big is an N-node net if nodes have up to k parents?
  • O(N × 2^(k+1))
Ø Both give you the power to calculate p(X1, …, Xn)
Ø BNs: huge space savings!
Ø Also easier to elicit local CPTs
Ø Also turns out to be faster to answer queries
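The arithmetic behind the savings, as a sketch for illustrative values N = 30 and k = 3 (each CPT has at most 2^k parent rows times 2 values, hence N × 2^(k+1) entries in total):

```python
N, k = 30, 3
print(2 ** N)            # 1,073,741,824 entries for the full joint table
print(N * 2 ** (k + 1))  # 480 entries across all the BN's CPTs
```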
Bayesian networks
Ø So far: how a Bayesian network encodes a joint distribution
Ø Next: how to answer queries about that distribution
  • Key idea: conditional independence
  • Main goal: answer queries about conditional independence and influence from the graph
Ø After that: how to answer numerical queries (inference)
Conditional Independence in a BN
Ø Important question about a BN:
  • Are two nodes independent given certain evidence?
  • If yes, this can be proven using algebra (tedious in general)
  • If no, it can be proven with a counterexample
  • Example: X: pressure, Y: rain, Z: traffic
  • Question: are X and Z necessarily independent?
  • Answer: no. Example: low pressure causes rain, which causes traffic
  • X can influence Z, and Z can influence X (via Y)
Causal Chains
Ø This configuration is a "causal chain": X → Y → Z
  • X: Low pressure, Y: Rain, Z: Traffic
  • p(x, y, z) = p(x) p(y | x) p(z | y)
Ø Is X independent of Z given Y?
  • p(z | x, y) = p(x, y, z) / p(x, y) = [p(x) p(y | x) p(z | y)] / [p(x) p(y | x)] = p(z | y)
  • Yes!
Ø Evidence along the chain "blocks" the influence
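A numeric sanity check of the blocking argument (a sketch; the CPT numbers are invented): build the chain's joint from its factorization and confirm p(z | x, y) = p(z | y) for every assignment:

```python
from itertools import product

p_x = {0: 0.3, 1: 0.7}                                          # p(x), assumed
p_y_x = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}    # p(y|x), assumed
p_z_y = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.25, (1, 1): 0.75}  # p(z|y), assumed

# Chain factorization: p(x, y, z) = p(x) p(y|x) p(z|y).
joint = {(x, y, z): p_x[x] * p_y_x[(y, x)] * p_z_y[(z, y)]
         for x, y, z in product([0, 1], repeat=3)}

for x, y, z in product([0, 1], repeat=3):
    p_xy = sum(joint[(x, y, z2)] for z2 in [0, 1])   # p(x, y)
    p_z_given_xy = joint[(x, y, z)] / p_xy
    assert abs(p_z_given_xy - p_z_y[(z, y)]) < 1e-9  # equals p(z | y)
print("Z is independent of X given Y in the chain.")
```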