Temporal probability models
Chapter 15, Sections 1–5
Outline
♦ Time and uncertainty
♦ Inference: filtering, prediction, smoothing
♦ Hidden Markov models
♦ Dynamic Bayesian networks
Time and uncertainty
The world changes; we need to track and predict it
Diabetes management vs. vehicle diagnosis
Basic idea: copy state and evidence variables for each time step
$\mathbf{X}_t$ = set of unobservable state variables at time $t$
  e.g., $BloodSugar_t$, $StomachContents_t$, etc.
$\mathbf{E}_t$ = set of observable evidence variables at time $t$
  e.g., $MeasuredBloodSugar_t$, $PulseRate_t$, $FoodEaten_t$
This assumes discrete time; step size depends on problem
Notation: $\mathbf{X}_{a:b} = \mathbf{X}_a, \mathbf{X}_{a+1}, \ldots, \mathbf{X}_{b-1}, \mathbf{X}_b$
Markov processes (Markov chains)
Construct a Bayes net from these variables: parents? CPTs?
Markov assumption: $\mathbf{X}_t$ depends on a bounded subset of $\mathbf{X}_{0:t-1}$
First-order Markov process: $P(\mathbf{X}_t \mid \mathbf{X}_{0:t-1}) = P(\mathbf{X}_t \mid \mathbf{X}_{t-1})$
Second-order Markov process: $P(\mathbf{X}_t \mid \mathbf{X}_{0:t-1}) = P(\mathbf{X}_t \mid \mathbf{X}_{t-2}, \mathbf{X}_{t-1})$
[Figure: first-order and second-order chains over $\mathbf{X}_{t-2}, \ldots, \mathbf{X}_{t+2}$]
Stationary process: transition model $P(\mathbf{X}_t \mid \mathbf{X}_{t-1})$ fixed for all $t$
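A minimal sketch (not from the slides) of sampling a stationary first-order Markov chain; the two-state weather model and all names here are a hypothetical example:

```python
import random

# Hypothetical two-state chain; the transition model P(X_t | X_{t-1})
# is the same dict at every step (stationarity).
TRANSITION = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun":  {"rain": 0.3, "sun": 0.7}}

def sample_chain(x0, steps):
    """Sample a trajectory; each step depends only on the previous state
    (first-order Markov assumption)."""
    x, path = x0, [x0]
    for _ in range(steps):
        nxt = list(TRANSITION[x])
        weights = [TRANSITION[x][s] for s in nxt]
        x = random.choices(nxt, weights)[0]
        path.append(x)
    return path

print(sample_chain("rain", 5))   # e.g. ['rain', 'rain', 'sun', ...]
```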
Hidden Markov Model (HMM)
Sensor Markov assumption: $P(\mathbf{E}_t \mid \mathbf{X}_{0:t}, \mathbf{E}_{1:t-1}) = P(\mathbf{E}_t \mid \mathbf{X}_t)$
Stationary process: transition model $P(\mathbf{X}_t \mid \mathbf{X}_{t-1})$ and sensor model $P(\mathbf{E}_t \mid \mathbf{X}_t)$ fixed for all $t$
An HMM is a special type of Bayes net in which $X_t$ is a single discrete random variable, with joint probability distribution
$P(\mathbf{X}_{0:t}, \mathbf{E}_{1:t}) = P(\mathbf{X}_0) \prod_{i=1}^{t} P(\mathbf{X}_i \mid \mathbf{X}_{i-1}) P(\mathbf{E}_i \mid \mathbf{X}_i)$
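A sketch of this factorization in code, using the umbrella model from the next slide and an assumed uniform prior $P(Rain_0) = \langle 0.5, 0.5 \rangle$; the function and variable names are our own:

```python
prior = {True: 0.5, False: 0.5}    # assumed P(Rain_0)
trans = {True: 0.7, False: 0.3}    # P(Rain_t = true | Rain_{t-1})
sensor = {True: 0.9, False: 0.2}   # P(Umbrella_t = true | Rain_t)

def joint(rains, umbrellas):
    """P(rain_0..rain_t, u_1..u_t) = P(rain_0) * prod_i P(rain_i|rain_{i-1}) P(u_i|rain_i)."""
    p = prior[rains[0]]
    for prev, cur, u in zip(rains, rains[1:], umbrellas):
        p *= trans[prev] if cur else 1 - trans[prev]   # transition factor
        p *= sensor[cur] if u else 1 - sensor[cur]     # sensor factor
    return p

print(joint([True, True, True], [True, True]))  # 0.5 * (0.7*0.9)**2 = 0.19845
```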
Example
[Figure: umbrella network $Rain_{t-1} \to Rain_t \to Rain_{t+1}$, with $Umbrella_t$ observed at each step]
$R_{t-1}$ | $P(R_t)$        $R_t$ | $P(U_t)$
t         | 0.7             t     | 0.9
f         | 0.3             f     | 0.2
First-order Markov assumption not exactly true in real world!
Possible fixes:
1. Increase order of Markov process
2. Augment state, e.g., add $Temp_t$, $Pressure_t$
Example: robot motion. Augment position and velocity with $Battery_t$
Inference tasks
Filtering: $P(\mathbf{X}_t \mid \mathbf{e}_{1:t})$
  the belief state, input to the decision process of a rational agent
Prediction: $P(\mathbf{X}_{t+k} \mid \mathbf{e}_{1:t})$ for $k > 0$
  evaluation of possible action sequences; like filtering without the evidence
Smoothing: $P(\mathbf{X}_k \mid \mathbf{e}_{1:t})$ for $0 \le k < t$
  better estimate of past states, essential for learning
Most likely explanation: $\arg\max_{\mathbf{x}_{1:t}} P(\mathbf{x}_{1:t} \mid \mathbf{e}_{1:t})$
  speech recognition, decoding with a noisy channel
Filtering
Aim: devise a recursive state estimation algorithm:
$P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t+1}) = f(\mathbf{e}_{t+1}, P(\mathbf{X}_t \mid \mathbf{e}_{1:t}))$
$P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t+1}) = P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t}, \mathbf{e}_{t+1})$
  $= \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}, \mathbf{e}_{1:t}) P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t})$
  $= \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t})$
I.e., prediction + estimation. Prediction by summing out $\mathbf{X}_t$:
$P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t+1}) = \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} P(\mathbf{X}_{t+1}, \mathbf{x}_t \mid \mathbf{e}_{1:t})$
  $= \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} P(\mathbf{X}_{t+1} \mid \mathbf{x}_t, \mathbf{e}_{1:t}) P(\mathbf{x}_t \mid \mathbf{e}_{1:t})$
  $= \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} P(\mathbf{X}_{t+1} \mid \mathbf{x}_t) P(\mathbf{x}_t \mid \mathbf{e}_{1:t})$
$\mathbf{f}_{1:t+1} = \textsc{Forward}(\mathbf{f}_{1:t}, \mathbf{e}_{t+1})$ where $\mathbf{f}_{1:t} = P(\mathbf{X}_t \mid \mathbf{e}_{1:t})$
Time and space constant (independent of $t$)
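A minimal sketch of the Forward update for the umbrella model (parameters from the example slide; the names normalize/forward are our own):

```python
trans = {True: 0.7, False: 0.3}    # P(Rain_{t+1} = true | Rain_t)
sensor = {True: 0.9, False: 0.2}   # P(Umbrella_t = true | Rain_t)

def normalize(d):
    """The alpha in the update: rescale so the entries sum to 1."""
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def forward(f, u):
    """One step of f_{1:t+1} = Forward(f_{1:t}, e_{t+1}):
    predict by summing out Rain_t, then weight by the evidence likelihood."""
    predicted = {r1: sum((trans[r0] if r1 else 1 - trans[r0]) * f[r0]
                         for r0 in (True, False))
                 for r1 in (True, False)}
    return normalize({r1: (sensor[r1] if u else 1 - sensor[r1]) * predicted[r1]
                      for r1 in (True, False)})
```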
Filtering example
[Figure: umbrella network $Rain_0 \to Rain_1 \to Rain_2$ with $Umbrella_1$, $Umbrella_2$ observed]
Prior: $P(Rain_0) = \langle 0.500, 0.500 \rangle$
Predicted: $P(Rain_2 \mid u_1) = \langle 0.627, 0.373 \rangle$
Filtered: $P(Rain_1 \mid u_1) = \langle 0.818, 0.182 \rangle$, $P(Rain_2 \mid u_{1:2}) = \langle 0.883, 0.117 \rangle$
$P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t+1}) = \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} P(\mathbf{X}_{t+1} \mid \mathbf{x}_t) P(\mathbf{x}_t \mid \mathbf{e}_{1:t})$
$R_{t-1}$ | $P(R_t)$        $R_t$ | $P(U_t)$
t         | 0.7             t     | 0.9
f         | 0.3             f     | 0.2
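Running forward() from the sketch above for two days of umbrella sightings reproduces the figure's values:

```python
f = {True: 0.5, False: 0.5}        # prior P(Rain_0)
for u in [True, True]:             # umbrella observed on days 1 and 2
    f = forward(f, u)
    print(f)
# day 1: {True: 0.818..., False: 0.181...}
# day 2: {True: 0.883..., False: 0.116...}
```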
Most likely explanation
Most likely sequence ≠ sequence of most likely states!
Most likely path to each $x_{t+1}$ = most likely path to some $x_t$ plus one more step:
$\max_{x_1 \ldots x_t} P(x_1, \ldots, x_t, \mathbf{X}_{t+1} \mid \mathbf{e}_{1:t+1})$
  $= P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \max_{x_t} \left( P(\mathbf{X}_{t+1} \mid x_t) \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, x_t \mid \mathbf{e}_{1:t}) \right)$
Identical to filtering, except $\mathbf{f}_{1:t}$ is replaced by
$\mathbf{m}_{1:t} = \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, \mathbf{X}_t \mid \mathbf{e}_{1:t})$,
i.e., $m_{1:t}(i)$ gives the probability of the most likely path to state $i$.
Update has sum replaced by max, giving the Viterbi algorithm:
$\mathbf{m}_{1:t+1} = P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \max_{x_t} \left( P(\mathbf{X}_{t+1} \mid x_t) \mathbf{m}_{1:t} \right)$
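A sketch of one Viterbi update for the umbrella model, reusing trans and sensor from the filtering sketch; the backpointer bookkeeping is the standard addition needed to recover the path:

```python
def viterbi_step(m, u):
    """One Viterbi update: forward() with the sum over Rain_t replaced by max.
    Also returns each state's best predecessor for backtracking the path."""
    new_m, back = {}, {}
    for r1 in (True, False):
        def p_trans(r0):
            return trans[r0] if r1 else 1 - trans[r0]
        best = max((True, False), key=lambda r0: p_trans(r0) * m[r0])
        p_sens = sensor[r1] if u else 1 - sensor[r1]
        new_m[r1] = p_sens * p_trans(best) * m[best]
        back[r1] = best
    return new_m, back
```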
Viterbi example
[Figure: state-space paths through $Rain_1, \ldots, Rain_5$; bold arrows mark the most likely path to each state]
Umbrella observations: true, true, false, true, true
Most likely path probabilities:
$m_{1:1} = \langle .8182, .1818 \rangle$, $m_{1:2} = \langle .5155, .0491 \rangle$, $m_{1:3} = \langle .0361, .1237 \rangle$, $m_{1:4} = \langle .0334, .0173 \rangle$, $m_{1:5} = \langle .0210, .0024 \rangle$
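Running viterbi_step() from the sketch above on this evidence, starting from $m_{1:1}$, reproduces the figure's message values (rounded):

```python
m = {True: 0.8182, False: 0.1818}      # m_{1:1}: filtered estimate after u_1
for u in [True, False, True, True]:    # u_2 .. u_5
    m, back = viterbi_step(m, u)
    print(m)
# m_{1:2} ~ {True: 0.5155, False: 0.0491}
# m_{1:3} ~ {True: 0.0361, False: 0.1237}
# m_{1:4} ~ {True: 0.0334, False: 0.0173}
# m_{1:5} ~ {True: 0.0210, False: 0.0024}
```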
Implementation issues
Viterbi message: $\mathbf{m}_{1:t+1} = P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \max_{x_t} \left( P(\mathbf{X}_{t+1} \mid x_t) \mathbf{m}_{1:t} \right)$
or filtering update: $P(\mathbf{X}_{t+1} \mid \mathbf{e}_{1:t+1}) = \alpha P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \sum_{\mathbf{x}_t} P(\mathbf{X}_{t+1} \mid \mathbf{x}_t) P(\mathbf{x}_t \mid \mathbf{e}_{1:t})$
What is $10^{-6} \cdot 10^{-6} \cdot 10^{-6} \cdots$, repeated over many time steps?
Floating-point arithmetic has finite precision: IEEE double precision underflows below about $5 \times 10^{-324}$, so after a few dozen such factors the product is exactly 0
Answer? Use either:
– Rescaling: multiply values by a (large) constant
– The log-sum trick (Assignment 5)
log is monotone increasing, so: $\arg\max f(x) = \arg\max \log f(x)$
Also, $\log(a \cdot b) = \log a + \log b$
Therefore, work with sums of logarithms of probabilities rather than products of probabilities:
$\mathbf{m}_{1:t+1} = P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) \max_{x_t} \left( P(\mathbf{X}_{t+1} \mid x_t) \mathbf{m}_{1:t} \right)$
$\rightarrow \log \mathbf{m}_{1:t+1} = \log P(\mathbf{e}_{t+1} \mid \mathbf{X}_{t+1}) + \max_{x_t} \left( \log P(\mathbf{X}_{t+1} \mid x_t) + \log \mathbf{m}_{1:t} \right)$
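The earlier viterbi_step() sketch, rewritten in log space under the same assumptions (trans and sensor as before):

```python
import math

def log_viterbi_step(log_m, u):
    """viterbi_step() in log space: products become sums, so long products
    of small probabilities no longer underflow to zero."""
    new_m, back = {}, {}
    for r1 in (True, False):
        def log_p_trans(r0):
            return math.log(trans[r0] if r1 else 1 - trans[r0])
        best = max((True, False), key=lambda r0: log_p_trans(r0) + log_m[r0])
        log_p_sens = math.log(sensor[r1] if u else 1 - sensor[r1])
        new_m[r1] = log_p_sens + log_p_trans(best) + log_m[best]
        back[r1] = best
    return new_m, back

# Start from log m_{1:1} and proceed exactly as before.
log_m = {True: math.log(0.8182), False: math.log(0.1818)}
```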
Hidden Markov models
$X_t$ is a single, discrete variable (usually $E_t$ is too)
Domain of $X_t$ is $\{1, \ldots, S\}$
Transition matrix $T_{ij} = P(X_t = j \mid X_{t-1} = i)$, e.g., $\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}$
Sensor matrix $O_t$ for each time step, diagonal elements $P(e_t \mid X_t = i)$
  e.g., with $U_1 = true$, $O_1 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}$
Forward messages as column vectors: $\mathbf{f}_{1:t+1} = \alpha O_{t+1} T^\top \mathbf{f}_{1:t}$
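A sketch of the matrix-form forward update with NumPy (state order: rain, not rain; the diagonal $O$ matrices follow the sensor model above, and forward_matrix is our own name):

```python
import numpy as np

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])             # T[i, j] = P(X_t = j | X_{t-1} = i)

def forward_matrix(f, umbrella):
    """f_{1:t+1} = alpha * O_{t+1} @ T.T @ f_{1:t}."""
    O = np.diag([0.9, 0.2]) if umbrella else np.diag([0.1, 0.8])
    f = O @ T.T @ f
    return f / f.sum()                 # alpha: normalize

f = np.array([0.5, 0.5])               # prior P(Rain_0)
for u in [True, True]:
    f = forward_matrix(f, u)
print(f)                               # approx [0.883, 0.117]
```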
Dynamic Bayesian networks
$\mathbf{X}_t$, $\mathbf{E}_t$ contain arbitrarily many variables in a replicated Bayes net
[Figures: the umbrella DBN with prior $P(R_0) = 0.7$ and the usual transition CPT (0.7/0.3) and sensor CPT (0.9/0.2); a robot DBN with state $X_t$, $Battery_t$, battery meter $BMeter_1$, and observation $Z_1$]
Summary
Temporal models use state and sensor variables replicated over time
Markov assumptions and stationarity assumption, so we need
– transition model $P(\mathbf{X}_t \mid \mathbf{X}_{t-1})$
– sensor model $P(\mathbf{E}_t \mid \mathbf{X}_t)$
Tasks are filtering, prediction, smoothing, most likely sequence;
all done recursively with constant cost per time step
Hidden Markov models have a single discrete state variable; used for speech recognition
Dynamic Bayes nets subsume HMMs; exact update intractable