temporal probability models




  1. Temporal probability models, Chapter 15, Sections 1–3. Based on AIMA Slides © Stuart Russell and Peter Norvig, 2004. Artificial Intelligence, spring 2013, Peter Ljunglöf.

  2. Outline
  ♦ Time and uncertainty
  ♦ Inference: filtering, prediction, smoothing
  ♦ Hidden Markov models

  3. Time and uncertainty
  The world changes; we need to track and predict it.
  Basic idea: copy the state and evidence variables for each time step.
  X_t = set of unobservable state variables at time t, e.g. BloodSugar_t, StomachContents_t, etc.
  E_t = set of observable evidence variables at time t, e.g. MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t
  This assumes discrete time; the step size depends on the problem.
  Notation: X_{a:b} = X_a, X_{a+1}, ..., X_{b-1}, X_b
  We want to construct a Bayes net from these variables: what are the parents of X_t and E_t?

  4. Markov chains
  A Markov chain has a single observable state X_t that obeys the Markov assumption: X_t depends on a bounded subset of X_{0:t-1}.
  First-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
  Second-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-2}, X_{t-1})
  (a second-order process can be reduced to first order by using ⟨X_{t-2}, X_{t-1}⟩ as the state)
  [Figure: first-order and second-order Markov chains over X_{t-2}, ..., X_{t+2}]
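A first-order Markov chain is easy to simulate: the next state is sampled from a distribution that depends only on the current state. A minimal sketch in Python, assuming two states with illustrative labels "rain"/"dry" and using the 0.7/0.3 transition probabilities that appear later in the slides:

```python
import random

# Transition model P(X_t | X_{t-1}) as a nested dict; the 0.7/0.3
# values are the rain-matrix probabilities from the slides, while the
# state labels "rain"/"dry" are illustrative.
TRANSITION = {
    "rain": {"rain": 0.7, "dry": 0.3},
    "dry":  {"rain": 0.3, "dry": 0.7},
}

def step(state, rng):
    """Sample X_t from P(X_t | X_{t-1} = state) by inverse-CDF sampling."""
    r = rng.random()
    cumulative = 0.0
    for nxt, p in TRANSITION[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt  # guard against floating-point round-off

def simulate(start, n, seed=0):
    """Simulate n transitions of the first-order chain from `start`."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n):
        states.append(step(states[-1], rng))
    return states
```

Note how the sampler only ever looks at the most recent state; a second-order chain would instead index `TRANSITION` by the pair of the two previous states.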

  5. Hidden Markov models (HMM)
  An HMM contains a Markov chain X_t which is not observable. Instead we observe the evidence variables E_t, and assume that they obey the sensor Markov assumption: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)
  Both Markov chains and HMMs are stationary processes:
  – the transition model P(X_t | X_{t-1}) and
  – the sensor model P(E_t | X_t)
  are fixed for all t.

  6. Example
  The umbrella world: hidden variable Rain_t, evidence variable Umbrella_t.
  Transition model: P(Rain_t = true | Rain_{t-1} = true) = 0.7, P(Rain_t = true | Rain_{t-1} = false) = 0.3
  Sensor model: P(Umbrella_t = true | Rain_t = true) = 0.9, P(Umbrella_t = true | Rain_t = false) = 0.2
  Neither the Markov assumption nor the sensor Markov assumption is exactly true in the real world! Possible fixes:
  1. Increase the order of the Markov process
  2. Augment the state, e.g. add Temp_t, Pressure_t
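The two conditional probability tables above are all that is needed to specify the model. A minimal sketch in Python; the dict encoding and helper names are illustrative choices, while the numbers come from the slide:

```python
# Umbrella world: hidden Rain_t, observed Umbrella_t.
P_RAIN = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_{t-1} = key)
P_UMBRELLA = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t = key)

def transition(prev_rain, rain):
    """P(Rain_t = rain | Rain_{t-1} = prev_rain)."""
    p = P_RAIN[prev_rain]
    return p if rain else 1.0 - p

def sensor(rain, umbrella):
    """P(Umbrella_t = umbrella | Rain_t = rain)."""
    p = P_UMBRELLA[rain]
    return p if umbrella else 1.0 - p
```

Because the process is stationary, these two functions serve for every time step; no per-step tables are needed.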

  7. Inference tasks
  Filtering: P(X_t | e_{1:t}), the current belief state given all evidence so far (a better name: state estimation)
  Prediction: P(X_{t+k} | e_{1:t}) for k > 0, a future belief state given the current evidence (like filtering without all the evidence)
  Smoothing: P(X_k | e_{1:t}) for 0 ≤ k < t, a better estimate of a past state
  Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t}), the state sequence most likely to have produced the evidence
  Applications: speech recognition, decoding over a noisy channel, etc.

  8. Filtering / state estimation
  A useful filtering algorithm maintains a current state estimate and updates it, instead of recalculating everything. I.e., we need a function f such that:
  P(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, P(X_t | e_{1:t}))
  We split the evidence e_{1:t+1} into e_{1:t} and e_{t+1}:
  P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})   (divide the evidence)
    = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})   (Bayes' rule)
    = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})   (sensor Markov assumption)
  where P(e_{t+1} | X_{t+1}) is the sensor model and P(X_{t+1} | e_{1:t}) is a one-step prediction. We obtain the prediction by conditioning on the current state X_t and summing it out:
  P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
    = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})   (Markov assumption)
  where P(X_{t+1} | x_t) is the transition model and P(x_t | e_{1:t}) is the previous estimate. Our final equation becomes:
  f_{1:t+1} = P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
  with f_{1:t} = P(X_t | e_{1:t}) the previous estimate.
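The filtering equation can be sketched directly for the umbrella world. A minimal implementation, assuming the 0.7/0.9/0.2 numbers from the example slide and an illustrative dict representation of distributions:

```python
P_RAIN = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_{t-1} = key)
P_UMBRELLA = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t = key)

def forward_step(prior, umbrella):
    """One step of f_{1:t+1} = α P(e_{t+1}|X_{t+1}) Σ_{x_t} P(X_{t+1}|x_t) f_{1:t}.

    `prior` maps each Rain value to P(Rain_t | e_{1:t});
    `umbrella` is the new observation e_{t+1}.
    """
    posterior = {}
    for rain in (True, False):
        # one-step prediction: sum out the previous state
        predicted = sum((P_RAIN[prev] if rain else 1 - P_RAIN[prev]) * prior[prev]
                        for prev in (True, False))
        # weight by the sensor model for the new evidence
        likelihood = P_UMBRELLA[rain] if umbrella else 1 - P_UMBRELLA[rain]
        posterior[rain] = likelihood * predicted
    norm = sum(posterior.values())  # normalizing: norm = 1/α
    return {r: p / norm for r, p in posterior.items()}

# Two umbrella days starting from a uniform prior:
f1 = forward_step({True: 0.5, False: 0.5}, True)  # P(Rain_1 | u_1) ≈ 0.818
f2 = forward_step(f1, True)                       # P(Rain_2 | u_1, u_2) ≈ 0.883
```

The update touches only the previous estimate and the newest observation, which is exactly what makes filtering constant cost per time step.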

  9. Smoothing
  [Figure: chain X_0, X_1, ..., X_k, ..., X_t with evidence E_1, ..., E_k, ..., E_t]
  Divide the evidence e_{1:t} into e_{1:k} and e_{k+1:t}:
  P(X_k | e_{1:t}) = P(X_k | e_{1:k}, e_{k+1:t})
    = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k})   (Bayes' rule)
    = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)   (conditional independence)
    = α f_{1:k} b_{k+1:t}
  The backward message b_{k+1:t} is computed by a backward recursion:
  P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
    = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
    = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
  where P(e_{k+1} | x_{k+1}) is the sensor model, P(e_{k+2:t} | x_{k+1}) is b_{k+2:t}, and P(x_{k+1} | X_k) is the transition model.
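One step of the backward recursion can be sketched for the umbrella world. As before the 0.7/0.9/0.2 numbers are from the example slide and the dict encoding is an illustrative choice:

```python
P_RAIN = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_{t-1} = key)
P_UMBRELLA = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t = key)

def backward_step(b_next, umbrella):
    """One step of the backward recursion: turn b_{k+2:t} into b_{k+1:t}.

    `b_next` maps each value of Rain_{k+1} to P(e_{k+2:t} | Rain_{k+1});
    `umbrella` is the observation e_{k+1}.
    """
    b = {}
    for rain_k in (True, False):
        total = 0.0
        for rain_k1 in (True, False):
            sens = P_UMBRELLA[rain_k1] if umbrella else 1 - P_UMBRELLA[rain_k1]
            trans = P_RAIN[rain_k] if rain_k1 else 1 - P_RAIN[rain_k]
            total += sens * b_next[rain_k1] * trans
        b[rain_k] = total
    return b

# Past the end of the sequence, b_{t+1:t} is all ones; with u_2 = true
# one step back gives b_{2:2} = {True: 0.69, False: 0.41}.
b22 = backward_step({True: 1.0, False: 1.0}, True)
```

Unlike the forward message, b is a likelihood rather than a distribution, so no normalization is needed until it is multiplied with f_{1:k}.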

  10. Forward and backward
  The forward algorithm computes the current belief state.
  The backward algorithm computes a previous belief state.
  Forward–backward algorithm: cache the forward messages along the way; they can then be reused when going backward.
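The caching idea can be sketched end to end for the umbrella world: one forward pass stores every filtered estimate, then a backward pass combines each with the corresponding backward message. A self-contained sketch, assuming the 0.7/0.9/0.2 numbers from the example slide:

```python
def forward_backward(evidence, prior=None):
    """Smoothed estimates P(Rain_k | e_{1:t}) for every k in the
    umbrella model; `evidence` is a list of umbrella observations."""
    P_RAIN = {True: 0.7, False: 0.3}
    P_UMB = {True: 0.9, False: 0.2}
    prior = prior or {True: 0.5, False: 0.5}
    trans = lambda prev, cur: P_RAIN[prev] if cur else 1 - P_RAIN[prev]
    sens = lambda rain, umb: P_UMB[rain] if umb else 1 - P_UMB[rain]
    def normalize(d):
        z = sum(d.values())
        return {k: v / z for k, v in d.items()}

    # forward pass: cache the filtered estimates f_{1:1}, ..., f_{1:t}
    fs, f = [], dict(prior)
    for umb in evidence:
        f = normalize({r: sens(r, umb) * sum(trans(p, r) * f[p] for p in f)
                       for r in (True, False)})
        fs.append(f)

    # backward pass: combine each cached f_{1:k} with b_{k+1:t}
    b = {True: 1.0, False: 1.0}  # b_{t+1:t} is all ones
    smoothed = [None] * len(evidence)
    for k in range(len(evidence) - 1, -1, -1):
        smoothed[k] = normalize({r: fs[k][r] * b[r] for r in (True, False)})
        b = {r: sum(sens(r1, evidence[k]) * b[r1] * trans(r, r1)
                    for r1 in (True, False))
             for r in (True, False)}
    return smoothed
```

With two umbrella observations, the smoothed day-1 rain probability rises to about 0.883 from the filtered 0.818: the second umbrella retroactively supports rain on day 1.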

  11. Most likely explanation
  The most likely sequence ≠ the sequence of most likely states!
  P(x_{1:t}, X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | x_{1:t}, X_{t+1}, e_{1:t}) P(x_{1:t}, X_{t+1} | e_{1:t})
    = α P(e_{t+1} | x_{1:t}, X_{t+1}, e_{1:t}) P(X_{t+1} | x_{1:t}, e_{1:t}) P(x_{1:t} | e_{1:t})
    = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | x_t) P(x_{1:t-1}, x_t | e_{1:t})
  The most likely path to each x_{t+1} = the most likely path to some x_t, plus one step. Since we only care about the arg max, we can drop α:
  m_{1:t+1} = max_{x_{1:t}} P(x_{1:t}, X_{t+1} | e_{1:t+1})
    = P(e_{t+1} | X_{t+1}) max_{x_t} (P(X_{t+1} | x_t) max_{x_{1:t-1}} P(x_{1:t-1}, x_t | e_{1:t}))
    = P(e_{t+1} | X_{t+1}) max_{x_t} (P(X_{t+1} | x_t) m_{1:t})
  m_{1:t} gives, for each state x_t, the probability of the most likely path ending in x_t; it is computed by the Viterbi algorithm using this recursion.
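The Viterbi recursion above can be sketched for the umbrella world: keep, for each state, the probability of the best path ending there, plus back-pointers for recovering the path. A minimal sketch assuming the slide's 0.7/0.9/0.2 numbers and a uniform initial prior:

```python
def viterbi(evidence):
    """Most likely sequence of Rain values given umbrella observations."""
    P_RAIN = {True: 0.7, False: 0.3}
    P_UMB = {True: 0.9, False: 0.2}
    prior = {True: 0.5, False: 0.5}
    trans = lambda prev, cur: P_RAIN[prev] if cur else 1 - P_RAIN[prev]
    sens = lambda rain, umb: P_UMB[rain] if umb else 1 - P_UMB[rain]

    # m[r] = probability of the most likely path ending in state r
    m = {r: prior[r] * sens(r, evidence[0]) for r in (True, False)}
    backptr = []
    for umb in evidence[1:]:
        prev_m, m, ptr = m, {}, {}
        for r in (True, False):
            best = max((True, False), key=lambda p: trans(p, r) * prev_m[p])
            m[r] = sens(r, umb) * trans(best, r) * prev_m[best]
            ptr[r] = best
        backptr.append(ptr)

    # backtrack from the best final state
    state = max(m, key=m.get)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))
```

For the observation sequence umbrella on days 1, 2, 4, 5 but not 3, the most likely explanation is rain on exactly those days, even though filtering alone would judge day 3 differently.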

  12. Hidden Markov models (matrix form)
  X_t is a single discrete variable (and usually E_t is too). Assume that the domain of X_t is {1, ..., S}.
  Transition matrix: T_ij = P(X_t = j | X_{t-1} = i), e.g. the rain matrix
      T = | 0.7 0.3 |
          | 0.3 0.7 |
  Sensor matrix O_t for each time step t: diagonal, with elements P(e_t | X_t = i), e.g. with U_1 = true:
      O_1 = | 0.9 0   |
            | 0   0.2 |
  Forward and backward messages can now be represented as column vectors:
      f_{1:t+1} = α O_{t+1} T⊤ f_{1:t}
      b_{k+1:t} = T O_{k+1} b_{k+2:t}
  The forward–backward algorithm needs O(S²t) time and O(St) space.
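The two matrix equations can be sketched with plain nested lists standing in for a linear-algebra library; the matrices are the rain and umbrella matrices from the slide:

```python
T = [[0.7, 0.3],   # T[i][j] = P(X_t = j | X_{t-1} = i); state 0 = rain
     [0.3, 0.7]]

def O(umbrella):
    """Diagonal sensor matrix O_t for one umbrella observation."""
    return ([[0.9, 0.0], [0.0, 0.2]] if umbrella
            else [[0.1, 0.0], [0.0, 0.8]])

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def forward(f, umbrella):
    """f_{1:t+1} = α O_{t+1} T⊤ f_{1:t}, normalized."""
    v = matvec(O(umbrella), matvec(transpose(T), f))
    z = sum(v)
    return [x / z for x in v]

def backward(b, umbrella):
    """b_{k+1:t} = T O_{k+1} b_{k+2:t} (no normalization needed)."""
    return matvec(T, matvec(O(umbrella), b))
```

Each update is one S×S matrix–vector product, which is where the O(S²t) total time comes from.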

  13. Summary for HMMs
  Temporal models use state variables X_t and sensor variables E_t replicated over time.
  To make the models tractable, we introduce simplifying assumptions:
  – Markov assumption: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
  – sensor assumption: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)
  – stationarity: P(X_t | X_{t-1}) = P(X_{t'} | X_{t'-1}) and P(E_t | X_t) = P(E_{t'} | X_{t'}) for all t, t'
  With these assumptions we only need two models:
  – the transition model P(X_t | X_{t-1})
  – the sensor model P(E_t | X_t)
  Possible inference tasks: filtering/state estimation, prediction, smoothing, most likely sequence; all can be done with constant cost per time step.
