Probabilistic Reasoning wrt Time [R&N, Chapter 15]
Decision Theoretic Agents
- Introduction to Probability [Ch 13]
- Belief networks [Ch 14]
- Dynamic Belief Networks [Ch 15]
  - Foundations
  - Markov Chains (Classification)
  - Hidden Markov Models (HMM)
  - Kalman Filter
  - General: Dynamic Belief Networks (DBN)
  - Applications
  - Future Work, Extensions, ...
- Single Decision [Ch 16]
- Sequential Decisions [Ch 17]
- Game Theory [Ch 17.6 - 17.7]
Markovian Models
- In general, X_{t+1} depends on everything: X_t, X_{t-1}, ...
- Markovian means the future is independent of the past once you know the present:
  P( X_{t+1} | X_t, X_{t-1}, ... ) = P( X_{t+1} | X_t )
- Markov Chain: the "state" (everything important) is visible:
  P( x_{t+1} | x_t, 〈 everything 〉 ) = P( x_{t+1} | x_t )
- Eg, first-order Markov chains:
  1. Random walk along the x axis, changing x-position by ±1 at each time step
  2. Predicting rain
- Stationarity: P( X_2 | X_1 ) = P( X_3 | X_2 ) = ... = P( X_{t+1} | X_t )
- Hidden Markov Model: state information is not visible
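The random-walk example can be sketched in a few lines of Python (a minimal illustration; the function name, seed, and step count are ours). Note the Markov property in the loop: the next position is computed from the current position alone.

```python
import random

def random_walk(steps, seed=0):
    """First-order Markov chain: the next position depends only on
    the current position (move +1 or -1 with equal probability)."""
    rng = random.Random(seed)
    x = 0
    path = [x]
    for _ in range(steps):
        x += rng.choice([-1, 1])  # transition uses only the current state
        path.append(x)
    return path

path = random_walk(10)
```

The chain is also stationary: the same ±1 transition rule is applied at every time step.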
Using a Markov Chain for Classification
- Two classes of DNA, with different di-nucleotide distributions
- Use this to classify a nucleotide sequence x = 〈 ACATTGACCA… 〉:
  P( x | + ) = p+( x_1 ) p+( x_2 | x_1 ) p+( x_3 | x_2 ) ... p+( x_k | x_{k-1} )
             = ∏_{i=1}^{k} p+( x_i | x_{i-1} )
             = ∏_{i=1}^{k} a+_{x_{i-1} x_i}
  using the Markov property (with x_0 the start state)
Using a Markov Chain for Classification
- Is x = 〈 ACATTGACCAT 〉 positive?
  P( x | + ) = p+( x_1 ) p+( x_2 | x_1 ) p+( x_3 | x_2 ) ... p+( x_k | x_{k-1} )
             = p+( A ) p+( C | A ) p+( A | C ) ... p+( T | A )
             = 0.25 × 0.274 × 0.171 × ... × 0.355
  P( x | – ) = p–( x_1 ) p–( x_2 | x_1 ) p–( x_3 | x_2 ) ... p–( x_k | x_{k-1} )
             = p–( A ) p–( C | A ) p–( A | C ) ... p–( T | A )
             = 0.25 × 0.205 × 0.322 × ... × 0.239
- Pick the larger: classify as + if P( x | + ) > P( x | – )
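The scoring rule above can be sketched as follows. The two-letter alphabet and the transition tables here are made up for illustration (the slide's real tables over {A, C, G, T} are only partially shown); in practice the probabilities are estimated from di-nucleotide counts in labelled training sequences, and sums of logs are used instead of products to avoid underflow on long sequences.

```python
import math

def chain_log_likelihood(seq, start, trans):
    """log P(seq | model) under a first-order Markov chain:
    log p(x_1) + sum_{i>1} log p(x_i | x_{i-1})."""
    ll = math.log(start[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[(prev, cur)])
    return ll

# Illustrative, made-up tables over a two-letter alphabet:
start = {'A': 0.5, 'C': 0.5}
plus = {('A', 'A'): 0.6, ('A', 'C'): 0.4, ('C', 'A'): 0.3, ('C', 'C'): 0.7}
minus = {('A', 'A'): 0.4, ('A', 'C'): 0.6, ('C', 'A'): 0.7, ('C', 'C'): 0.3}

seq = 'AACC'
label = '+' if chain_log_likelihood(seq, start, plus) > \
               chain_log_likelihood(seq, start, minus) else '-'
```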
Results (Markov Chain)
- Results over 48 sequences (confusion table: Predict + vs Predict –)
- Here: everything is visible
- Sometimes, you can't see the "states"
Phydeaux, the Dog
- Sometimes Grumpy, sometimes Happy, but he hides his emotional state
- Only observations: { slobbers, frowns, yelps }
- Known correlations:
  - State { G, H } on day t to state { G, H } on day t+1
    (transition arcs in the diagram, labelled p = 0.15, p = 0.85, p = 0.95, p = 0.05)
  - State { G, H } to observations { s, f, y }:
    p( s | g ) = 0.15    p( s | h ) = 0.80
    p( f | g ) = 0.75    p( f | h ) = 0.15
    p( y | g ) = 0.10    p( y | h ) = 0.05
- Challenge: given the observation sequence 〈 s, s, f, y, y, f, … 〉,
  what were Phydeaux's states? Perhaps 〈 H, H, H, G, G, G, … 〉?
Umbrella + Rain Situation
- State: X_t ∈ { +rain, –rain }
- Observation: E_t ∈ { +umbrella, –umbrella }
- Simple belief net: R_0 → R_1 → R_2 → ..., with each U_t a child of R_t
- Note: Umbrella_t depends only on Rain_t; Rain_t depends only on Rain_{t-1}
HMM Tasks
1. Filtering / Monitoring: P( X_t | e_1:t )
   - What is P( R_3 = + | U_1 = +, U_2 = +, U_3 = – )?
   - Need the distribution over the current state to make rational decisions
2. Prediction: P( X_{t+k} | e_1:t )
   - What is P( R_5 = – | U_1 = +, U_2 = +, U_3 = – )?
   - Use to evaluate possible courses of action
3. Smoothing / Hindsight: P( X_{t-k} | e_1:t )
   - What is P( R_1 = – | U_1 = +, U_2 = +, U_3 = – )?
4. Likelihood: P( e_1:t )
   - What is P( U_1 = +, U_2 = +, U_3 = – )?
   - For comparing different models ... classification
5. Most likely explanation: argmax_{x_1:t} { P( x_1:t | e_1:t ) }
   - Given 〈 U_1 = +, U_2 = +, U_3 = – 〉, what is the most likely value for 〈 R_1, R_2, R_3 〉?
   - Compute assignments, for DNA, sounds, ...
1. Filtering
- At time 3: have P( R_2 | u_1:2 ) = 〈 P( +r_2 | +u_1, +u_2 ), P( –r_2 | +u_1, +u_2 ) 〉
- ... then observe u_3 = –
- Update with the new evidence:
  P( R_3 | u_1:3 ) = P( R_3 | u_1:2, u_3 )
                   = α P( u_3 | R_3, u_1:2 ) P( R_3 | u_1:2 )
                   = α P( u_3 | R_3 ) P( R_3 | u_1:2 )
  with α = 1 / P( u_3 | u_1:2 )
- One-step prediction:
  P( R_3 | u_1:2 ) = Σ_{r_2} P( R_3, r_2 | u_1:2 )
                   = Σ_{r_2} P( R_3 | r_2, u_1:2 ) P( r_2 | u_1:2 )
                   = Σ_{r_2} P( R_3 | r_2 ) P( r_2 | u_1:2 )
1. Filtering
- At time t: have P( X_t | e_1:t )
- ... then update from e_{t+1}:
  P( X_{t+1} | e_1:t+1 ) = α P( e_{t+1} | X_{t+1} ) Σ_{x_t} P( X_{t+1} | x_t ) P( x_t | e_1:t )
  where P( e_{t+1} | X_{t+1} ) are the emission probabilities,
  P( X_{t+1} | x_t ) the transition probabilities,
  and P( x_t | e_1:t ) the state distribution at time t
- Called the "Forward Algorithm"
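One forward update can be sketched directly from this equation. The numbers below are the umbrella example's from later slides (P( +r_{t+1} | +r_t ) = 0.7, P( +r_{t+1} | –r_t ) = 0.2, P( +u | +r ) = 0.9, P( +u | –r ) = 0.2); the dictionary representation is our choice.

```python
def forward_step(f, e, trans, emit):
    """One filtering update:
    f'(x') = alpha * P(e | x') * sum_x P(x' | x) * f(x),
    where f maps each state to P(state | evidence so far)."""
    g = {}
    for x2 in f:
        pred = sum(trans[(x1, x2)] * f[x1] for x1 in f)  # transition model
        g[x2] = emit[(e, x2)] * pred                     # sensor model
    z = sum(g.values())                                  # alpha = 1/z
    return {x: v / z for x, v in g.items()}

# Umbrella example: states +r / -r, evidence 'u' (= +umbrella)
trans = {('+r', '+r'): 0.7, ('+r', '-r'): 0.3,
         ('-r', '+r'): 0.2, ('-r', '-r'): 0.8}
emit = {('u', '+r'): 0.9, ('u', '-r'): 0.2}

f1 = forward_step({'+r': 0.5, '-r': 0.5}, 'u', trans, emit)
# f1['+r'] is approximately 0.786
```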
P( x_t, e_1:t ) vs P( x_t | e_1:t )
To compute P( X_t = a | e_1:t ), just compute 〈 P( X_t = 1, e_1:t ), ..., P( X_t = k, e_1:t ) 〉:
1. Compute P( e_1:t ) = Σ_i P( X_t = i, e_1:t )
2. Return P( X_t = a | e_1:t ) = P( X_t = a, e_1:t ) / P( e_1:t )
                               = P( X_t = a, e_1:t ) / Σ_i P( X_t = i, e_1:t )
Normalizing constant: α = 1 / P( e_1:t )
Filtering – Forward Algorithm
- Let f_1:t = P( X_t | e_1:t ) = 〈 P( X_t = 1 | e_1:t ), ..., P( X_t = r | e_1:t ) 〉
  f_1:t+1( x_{t+1} ) = P( x_{t+1} | e_1:t+1 )
                     = α P( e_{t+1} | x_{t+1} ) Σ_{x_t} P( x_{t+1} | x_t ) f_1:t( x_t )
- f_1:t+1 = α Forward( f_1:t, e_{t+1} )
- Update (for discrete state variables): constant time & constant space!
Filtering Process
- State_t from State_{t-1}
- State_t from Percept_t
- State_{t+1} from State_t
Forward( ) Process
Given: P( R_0 ) = 〈 0.5, 0.5 〉; evidence 〈 U_1 = +, U_2 = + 〉:
- Predict state distribution (before evidence):
  P( R_1 ) = Σ_{r_0} P( R_1 | r_0 ) P( r_0 )
           = 〈 0.7, 0.3 〉 × 0.5 + 〈 0.2, 0.8 〉 × 0.5 = 〈 0.45, 0.55 〉
- Incorporate "Day 1 evidence" +u_1:
  P( R_1 | +u_1 ) = α P( +u_1 | R_1 ) P( R_1 )
                  = α 〈 0.9, 0.2 〉 .* 〈 0.45, 0.55 〉
                  = α 〈 0.405, 0.11 〉 ≈ 〈 0.786, 0.214 〉
- Predict (from t = 1 to t = 2, before new evidence):
  P( R_2 | +u_1 ) = Σ_{r_1} P( R_2 | r_1 ) P( r_1 | +u_1 )
                  = 〈 0.7, 0.3 〉 × 0.786 + 〈 0.2, 0.8 〉 × 0.214 ≈ 〈 0.593, 0.407 〉
- Incorporate "Day 2 evidence" +u_2:
  P( R_2 | +u_1, +u_2 ) = α P( +u_2 | R_2 ) P( R_2 | +u_1 )
                        = α 〈 0.9, 0.2 〉 .* 〈 0.593, 0.407 〉
                        = α 〈 0.533, 0.081 〉 ≈ 〈 0.868, 0.132 〉
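The two predict/update rounds above can be reproduced in a few lines (a sketch; the dictionary representation is ours, the probabilities are the slide's).

```python
trans = {('+r', '+r'): 0.7, ('+r', '-r'): 0.3,   # P(R_t | R_{t-1})
         ('-r', '+r'): 0.2, ('-r', '-r'): 0.8}
p_u = {'+r': 0.9, '-r': 0.2}                     # P(+u | R)

f = {'+r': 0.5, '-r': 0.5}                       # P(R_0)
for _ in range(2):                               # observe +u_1, then +u_2
    # predict: sum over previous states
    pred = {x2: sum(trans[(x1, x2)] * f[x1] for x1 in f) for x2 in f}
    # incorporate evidence +u, then normalize
    g = {x: p_u[x] * pred[x] for x in f}
    z = sum(g.values())
    f = {x: v / z for x, v in g.items()}

# f is approximately {'+r': 0.868, '-r': 0.132}
```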
4. Likelihood
- How to compute the likelihood P( e_1:t )?
- Let L_1:t = P( X_t, e_1:t ). Then
  L_1:t+1( x_{t+1} ) = P( x_{t+1}, e_1:t+1 )
                     = Σ_{x_t} P( x_{t+1}, x_t, e_1:t, e_{t+1} )
                     = Σ_{x_t} P( e_{t+1} | x_{t+1}, x_t, e_1:t ) P( x_{t+1} | x_t, e_1:t ) P( x_t, e_1:t )
                     = P( e_{t+1} | x_{t+1} ) Σ_{x_t} P( x_{t+1} | x_t ) L_1:t( x_t )
- Note: same Forward( ) algorithm, just without the normalization!
- To compute the actual likelihood:
  P( e_1:t ) = Σ_{x_t} P( X_t = x_t, e_1:t ) = Σ_{x_t} L_1:t( x_t )
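The unnormalized recursion can be sketched as below, again with the umbrella example's numbers (the function name is ours).

```python
def likelihood(prior, trans, emit, evidence):
    """P(e_1:t): forward recursion WITHOUT normalizing, keeping
    L(x) = P(x_t, e_1:t); the likelihood is the sum over final states."""
    L = dict(prior)
    for e in evidence:
        L = {x2: emit[(e, x2)] * sum(trans[(x1, x2)] * L[x1] for x1 in L)
             for x2 in L}
    return sum(L.values())

trans = {('+r', '+r'): 0.7, ('+r', '-r'): 0.3,
         ('-r', '+r'): 0.2, ('-r', '-r'): 0.8}
emit = {('u', '+r'): 0.9, ('u', '-r'): 0.2}

# P(U_1 = +) = 0.9 * 0.45 + 0.2 * 0.55 = 0.515
p = likelihood({'+r': 0.5, '-r': 0.5}, trans, emit, ['u'])
```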
Best Model of Phydeaux?
Model I:
- Transition arcs labelled p = 0.15, p = 0.85, p = 0.95, p = 0.05
- Emissions:
  p( s | g ) = 0.15    p( s | h ) = 0.80
  p( f | g ) = 0.75    p( f | h ) = 0.15
  p( y | g ) = 0.10    p( y | h ) = 0.05
Model II:
- Transition arcs labelled p = 0.25, p = 0.75, p = 0.75, p = 0.25
- Emissions:
  p( s | g ) = 0.10    p( s | h ) = 0.50
  p( f | g ) = 0.80    p( f | h ) = 0.25
  p( y | g ) = 0.10    p( y | h ) = 0.25
Challenge: given the observation sequence 〈 s, s, f, y, y, f, … 〉,
which model of Phydeaux is "correct"?
Want to compare P_I( e ) vs P_II( e ).
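Comparing the two models by their likelihood P( e ) can be sketched as follows. Two assumptions are ours, not stated on the slide: the transition arc labels are assigned by reading the large probabilities as self-loops, and the initial state distribution is taken to be uniform over { G, H }.

```python
def model_likelihood(prior, trans, emit, obs):
    """P(e_1:T) via the unnormalized forward recursion:
    keep L(x) = P(x_t, e_1:t), then sum over the final state."""
    L = dict(prior)
    for e in obs:
        L = {x2: emit[(x2, e)] * sum(trans[(x1, x2)] * L[x1] for x1 in L)
             for x2 in L}
    return sum(L.values())

# Model I -- arc labels assigned assuming the large probabilities
# are self-loops (our reading of the diagram)
trans1 = {('G', 'G'): 0.85, ('G', 'H'): 0.15,
          ('H', 'H'): 0.95, ('H', 'G'): 0.05}
emit1 = {('G', 's'): 0.15, ('G', 'f'): 0.75, ('G', 'y'): 0.10,
         ('H', 's'): 0.80, ('H', 'f'): 0.15, ('H', 'y'): 0.05}

# Model II -- same reading of the diagram
trans2 = {('G', 'G'): 0.75, ('G', 'H'): 0.25,
          ('H', 'H'): 0.75, ('H', 'G'): 0.25}
emit2 = {('G', 's'): 0.10, ('G', 'f'): 0.80, ('G', 'y'): 0.10,
         ('H', 's'): 0.50, ('H', 'f'): 0.25, ('H', 'y'): 0.25}

prior = {'G': 0.5, 'H': 0.5}  # assumption: uniform initial state
obs = ['s', 's', 'f', 'y', 'y', 'f']

p1 = model_likelihood(prior, trans1, emit1, obs)
p2 = model_likelihood(prior, trans2, emit2, obs)
# pick the model with the larger likelihood P(e)
```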
Use HMMs to Classify Words in Speech Recognition
- Use one HMM for each word: hmm_j for the j-th word
- Convert the acoustic signal to a sequence of fixed-duration frames (eg, 60 ms)
  (assumes the start/end of each word in the speech signal is known)
- Map each frame to the nearest "codebook" frame, a discrete symbol e_t,
  giving e_1:T = 〈 e_1, ..., e_T 〉
- To classify the sequence of frames e_1:T:
  1. Compute P( e_1:T | hmm_j ): the likelihood that e_1:T was generated by word model hmm_j
  2. Return argmax_j { P( e_1:T | hmm_j ) }: the word j whose hmm_j gave the highest likelihood