
Graphical Models - Part I
Greg Mori - CMPT 419/726
Bishop PRML Ch. 8; some slides from Russell and Norvig, AIMA2e

Outline

• Probabilistic Models
• Bayesian Networks


Probabilistic Models

• We now turn our focus to probabilistic models for pattern recognition
• Probabilities express beliefs about uncertain events; they are useful for decision making and for combining sources of information
• The key quantity in probabilistic reasoning is the joint distribution

    p(x_1, x_2, ..., x_K)

  where x_1 to x_K are all the variables in the model
• We address two problems:
  • Inference: answering queries given the joint distribution
  • Learning: deciding what the joint distribution is (involves inference)
• All inference and learning problems involve manipulations of the joint distribution

Reminder - Three Tricks

• Bayes' rule:

    p(Y|X) = p(X|Y) p(Y) / p(X) = α p(X|Y) p(Y)

• Marginalization:

    p(X) = Σ_y p(X, Y=y)   or   p(X) = ∫ p(X, Y=y) dy

• Product rule:

    p(X, Y) = p(X) p(Y|X)

• All 3 work with extra conditioning, e.g.:

    p(X|Z) = Σ_y p(X, Y=y | Z)
    p(Y|X, Z) = α p(X|Y, Z) p(Y|Z)
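The three tricks are easy to sanity-check numerically. Below is a minimal sketch, not part of the slides, that verifies all three on a made-up 2 × 2 joint distribution (NumPy assumed):

```python
import numpy as np

# A made-up joint distribution p(X, Y); rows index X, columns index Y
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])

# Marginalization: p(X) = sum_y p(X, Y=y), and likewise for p(Y)
p_x = joint.sum(axis=1)                      # [0.4, 0.6]
p_y = joint.sum(axis=0)                      # [0.5, 0.5]

# Product rule: p(X, Y) = p(X) p(Y|X)
p_y_given_x = joint / p_x[:, None]
assert np.allclose(p_x[:, None] * p_y_given_x, joint)

# Bayes' rule: p(Y|X) = alpha p(X|Y) p(Y), with alpha = 1 / p(X)
p_x_given_y = joint / p_y[None, :]
bayes = p_x_given_y * p_y[None, :] / p_x[:, None]
assert np.allclose(bayes, p_y_given_x)
```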

Joint Distribution

• Consider a model with 3 boolean random variables: cavity, catch, toothache
• The full joint distribution:

                   toothache            ¬toothache
                 catch   ¬catch       catch   ¬catch
    cavity       .108    .012         .072    .008
    ¬cavity      .016    .064         .144    .576

• We can answer queries such as (reproduced in code below):

    p(¬cavity | toothache) = p(¬cavity, toothache) / p(toothache)
                           = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                           = 0.4

• In general, to answer a query on random variables Q = Q_1, ..., Q_N given evidence E = E_1, ..., E_M with values e = e_1, ..., e_M:

    p(Q | E=e) = p(Q, E=e) / p(E=e)
               = Σ_h p(Q, E=e, H=h) / Σ_{q,h} p(Q=q, E=e, H=h)

  where H are the remaining (hidden) variables, which are summed out

Problems

• The joint distribution is large: e.g. with K boolean random variables, 2^K entries
• Inference is slow: the summations above take O(2^K) time
• Learning is difficult: it requires data for 2^K parameters
• Analogous problems arise for continuous random variables
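As a concrete illustration, here is a small sketch of inference by enumeration (the encoding and helper name are my own; the probabilities are the slide's table). It reproduces p(¬cavity | toothache) = 0.4, summing out the hidden variable catch as in the general formula above:

```python
# joint[(cavity, toothache, catch)] = p(...); numbers from the table
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(cavity=None, toothache=None, catch=None):
    """Marginal probability of the given settings; None means sum out."""
    return sum(p for (cav, tooth, cat), p in joint.items()
               if (cavity is None or cav == cavity)
               and (toothache is None or tooth == toothache)
               and (catch is None or cat == catch))

# p(¬cavity | toothache) = p(¬cavity, toothache) / p(toothache)
print(prob(cavity=False, toothache=True) / prob(toothache=True))  # 0.4
```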

Reminder - Independence

• A and B are independent iff

    p(A|B) = p(A)   or   p(B|A) = p(B)   or   p(A, B) = p(A) p(B)

• (Figure: the graph over Cavity, Toothache, Catch, Weather decomposes into one over Cavity, Toothache, Catch plus a separate node Weather)
• p(Toothache, Catch, Cavity, Weather) = p(Toothache, Catch, Cavity) p(Weather)
• 32 entries reduced to 12 (Weather takes one of 4 values)
• Absolute independence is powerful but rare
• Dentistry is a large field with hundreds of variables, none of which are independent. What to do?

Reminder - Conditional Independence

• p(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:

    (1) p(catch | toothache, cavity) = p(catch | cavity)

• The same independence holds if I haven't got a cavity:

    (2) p(catch | toothache, ¬cavity) = p(catch | ¬cavity)

• Catch is conditionally independent of Toothache given Cavity:

    p(Catch | Toothache, Cavity) = p(Catch | Cavity)

• Equivalent statements:
  • p(Toothache | Catch, Cavity) = p(Toothache | Cavity)
  • p(Toothache, Catch | Cavity) = p(Toothache | Cavity) p(Catch | Cavity)
  • Toothache ⊥⊥ Catch | Cavity

Conditional Independence contd.

• Write out the full joint distribution using the chain rule:

    p(Toothache, Catch, Cavity)
      = p(Toothache | Catch, Cavity) p(Catch, Cavity)
      = p(Toothache | Catch, Cavity) p(Catch | Cavity) p(Cavity)
      = p(Toothache | Cavity) p(Catch | Cavity) p(Cavity)

  i.e. 2 + 2 + 1 = 5 independent numbers (see the sketch below)
• In many cases, the use of conditional independence greatly reduces the size of the representation of the joint distribution

Graphical Models

• Graphical models provide a visual depiction of a probabilistic model
• Conditional independence assumptions can be read off the graph
• Inference and learning algorithms can be expressed in terms of graph operations
• We will look at 2 types of graph (which can be combined):
  • Directed graphs: Bayesian networks
  • Undirected graphs: Markov random fields
  • Factor graphs (won't cover)
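To see the 5-number claim concretely, here is a minimal sketch assuming the conditionals read off the earlier joint table (p(cavity) = 0.2, p(catch|cavity) = 0.9, p(catch|¬cavity) = 0.2, p(toothache|cavity) = 0.6, p(toothache|¬cavity) = 0.1; these values are my own derivation, not stated on the slide). It rebuilds all 8 joint entries from those 5 numbers:

```python
# 5 numbers suffice when Toothache ⊥⊥ Catch | Cavity
p_cavity = 0.2
p_catch_given = {True: 0.9, False: 0.2}   # p(catch | Cavity)
p_tooth_given = {True: 0.6, False: 0.1}   # p(toothache | Cavity)

def p_joint(toothache, catch, cavity):
    """p(Toothache, Catch, Cavity) = p(T|Cav) p(C|Cav) p(Cav)."""
    p_cav = p_cavity if cavity else 1 - p_cavity
    p_t = p_tooth_given[cavity] if toothache else 1 - p_tooth_given[cavity]
    p_c = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return p_t * p_c * p_cav

print(p_joint(True, True, True))     # 0.108, matching the table
print(p_joint(False, False, False))  # 0.576
```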

Bayesian Networks

• A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions
• Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (link ≈ "directly influences")
  • a conditional distribution for each node given its parents: p(X_i | pa(X_i))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values

Example

• (Figure: Cavity has children Toothache and Catch; Weather is a separate node)
• The topology of the network encodes conditional independence assertions:
  • Weather is independent of the other variables
  • Toothache and Catch are conditionally independent given Cavity

Example

• I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
• Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call

Example contd.

• (Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls and Alarm → MaryCalls, with CPTs:)

    P(B) = .001        P(E) = .002

    B  E | P(A|B,E)
    T  T | .95
    T  F | .94
    F  T | .29
    F  F | .001

    A | P(J|A)         A | P(M|A)
    T | .90            T | .70
    F | .05            F | .01
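One plausible way to encode this network in code (the structure and CPT numbers are from the slide; the dict representation and helper name are my own):

```python
# Parent sets and CPTs of the burglary network;
# cpt[var][parent_values] gives P(var = True | parents)
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
cpt = {
    'B': {(): 0.001},
    'E': {(): 0.002},
    'A': {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    'J': {(True,): 0.90, (False,): 0.05},
    'M': {(True,): 0.70, (False,): 0.01},
}

def p_node(var, value, assignment):
    """P(var = value | values of var's parents in `assignment`)."""
    p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

# e.g. P(Alarm = True | Burglary = True, Earthquake = False) = 0.94
print(p_node('A', True, {'B': True, 'E': False}))
```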

Compactness

• A CPT for Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values
• Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31; verified in the sketch below)

Global Semantics

• The global semantics defines the full joint distribution as the product of the local conditional distributions:

    P(x_1, ..., x_n) = Π_{i=1}^n P(x_i | pa(X_i))

• e.g.:

    P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
                            = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
                            ≈ 0.00063

Constructing Bayesian Networks

• We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics:
  1. Choose an ordering of variables X_1, ..., X_n
  2. For i = 1 to n: add X_i to the network and select parents from X_1, ..., X_{i−1} such that p(X_i | pa(X_i)) = p(X_i | X_1, ..., X_{i−1})
• This choice of parents guarantees the global semantics:

    p(X_1, ..., X_n) = Π_{i=1}^n p(X_i | X_1, ..., X_{i−1})   (chain rule)
                     = Π_{i=1}^n p(X_i | pa(X_i))             (by construction)
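Both the parameter count and the worked joint probability above can be checked in a few lines (the parent sets follow the burglary network; the code itself is my own sketch):

```python
# Parent sets of the burglary network
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}

# Compactness: each Boolean node with k Boolean parents needs 2^k numbers
n_params = sum(2 ** len(ps) for ps in parents.values())
print(n_params)          # 1 + 1 + 4 + 2 + 2 = 10 (vs 2**5 - 1 = 31)

# Global semantics: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
#   = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
print(0.90 * 0.70 * 0.001 * (1 - 0.001) * (1 - 0.002))  # ≈ 0.00063
```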

Example

• Suppose we choose the ordering M, J, A, B, E (adding nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake in that order)
• P(J|M) = P(J)? No
• P(A|J,M) = P(A|J)? No;  P(A|J,M) = P(A)? No
• P(B|A,J,M) = P(B|A)? Yes;  P(B|A,J,M) = P(B)? No
• P(E|B,A,J,M) = P(E|A)? No;  P(E|B,A,J,M) = P(E|A,B)? Yes
• So J gets parent M; A gets parents J and M; B gets parent A; E gets parents A and B
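A quick count (my own addition, not stated on the slide) shows this ordering yields a less compact network than the causal ordering's 10 numbers:

```python
# Parent sets implied by the ordering M, J, A, B, E (read off the slide)
parents = {'M': [], 'J': ['M'], 'A': ['J', 'M'], 'B': ['A'], 'E': ['A', 'B']}
print(sum(2 ** len(ps) for ps in parents.values()))  # 1 + 2 + 4 + 2 + 4 = 13
```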
