CSE 473: Artificial Intelligence
Hidden Markov Models
Steve Tanimoto --- University of Washington
[Most slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Hidden Markov Models
- Markov chains are not so useful for most agents: we need observations to update our beliefs.
- Hidden Markov models (HMMs):
  - Underlying Markov chain over states X
  - You observe outputs (effects) at each time step
- As a Bayes net (or more generally, a graphical model): X_1 -> X_2 -> X_3 -> X_4 -> X_5, with each X_t emitting an observation E_t.
- "Don't complain; the weather could be worse."

Example: Weather HMM
- Chain: Rain_{t-1} -> Rain_t -> Rain_{t+1}, with Umbrella_{t-1}, Umbrella_t, Umbrella_{t+1} as the observations.
- An HMM is defined by:
  - Initial distribution: P(X_1)
  - Transitions: P(X_t | X_{t-1})
  - Emissions: P(E_t | X_t)
- Transition model P(R_{t+1} | R_t):
  R_t   R_{t+1}   P(R_{t+1} | R_t)
  +r    +r        0.7
  +r    -r        0.3
  -r    +r        0.3
  -r    -r        0.7
- Emission model P(U_t | R_t):
  R_t   U_t   P(U_t | R_t)
  +r    +u    0.9
  +r    -u    0.1
  -r    +u    0.2
  -r    -u    0.8
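As a concrete illustration (not part of the original slides), the weather HMM above can be written down in a few lines of Python; the variable names are my own, and the 0.5/0.5 initial distribution is the one used in the worked example later in the deck.

```python
# A minimal encoding of the Weather HMM from the slide above.
# Names and data layout are illustrative, not taken from the slides.

# Initial distribution P(R_1)
initial = {'+r': 0.5, '-r': 0.5}

# Transition model P(R_{t+1} | R_t)
transition = {
    '+r': {'+r': 0.7, '-r': 0.3},
    '-r': {'+r': 0.3, '-r': 0.7},
}

# Emission model P(U_t | R_t)
emission = {
    '+r': {'+u': 0.9, '-u': 0.1},
    '-r': {'+u': 0.2, '-u': 0.8},
}
```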
Example: Ghostbusters HMM
- P(X_1) = uniform: 1/9 for each of the nine grid squares.
- P(X' | X) = ghosts usually move clockwise, but sometimes move in a random direction or stay put. For example, P(X' | X = <1,2>) is:
  1/6   1/6   1/2
  0     1/6   0
  0     0     0
- P(E | X) = same sensor model as before: red means close, green means far away. For a ghost at distance 3:
  P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
  0.05         0.15            0.5             0.3
  (must specify the distribution for the other distances as well)

Joint Distribution of an HMM
- Joint distribution (for three time steps):
  P(X_1, E_1, X_2, E_2, X_3, E_3) = P(X_1) P(E_1|X_1) P(X_2|X_1) P(E_2|X_2) P(X_3|X_2) P(E_3|X_3)
- More generally:
  P(X_1, E_1, ..., X_T, E_T) = P(X_1) P(E_1|X_1) ∏_{t=2}^{T} P(X_t|X_{t-1}) P(E_t|X_t)
- Questions to be resolved:
  - Does this indeed define a joint distribution?
  - Can every joint distribution be factored this way, or are we making some assumptions about the joint distribution by using this factorization?

Chain Rule and HMMs
- From the chain rule, every joint distribution over X_1, E_1, ..., X_T, E_T can be written as:
  P(X_1, E_1, ..., X_T, E_T) = P(X_1) P(E_1|X_1) ∏_{t=2}^{T} P(X_t | X_1, E_1, ..., X_{t-1}, E_{t-1}) P(E_t | X_1, E_1, ..., X_t)
- Assuming that, for all t:
  - the state is independent of all past states and all past evidence given the previous state, i.e.
    P(X_t | X_1, E_1, ..., X_{t-1}, E_{t-1}) = P(X_t | X_{t-1})
  - the evidence is independent of all past states and all past evidence given the current state, i.e.
    P(E_t | X_1, E_1, ..., X_t) = P(E_t | X_t)
  gives us the expression posited on the earlier slide:
  P(X_1, E_1, ..., X_T, E_T) = P(X_1) P(E_1|X_1) ∏_{t=2}^{T} P(X_t|X_{t-1}) P(E_t|X_t)
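A small sketch (my own code, not from the slides) of how the factored joint distribution can be evaluated for a concrete state/evidence sequence, reusing the initial, transition, and emission dictionaries from the earlier weather HMM snippet:

```python
def joint_probability(states, evidence, initial, transition, emission):
    """P(x_1, e_1, ..., x_T, e_T) under the HMM factorization:
    P(x_1) P(e_1|x_1) * prod_{t>=2} P(x_t|x_{t-1}) P(e_t|x_t)."""
    p = initial[states[0]] * emission[states[0]][evidence[0]]
    for t in range(1, len(states)):
        p *= transition[states[t - 1]][states[t]] * emission[states[t]][evidence[t]]
    return p

# e.g. P(+r, +u, +r, +u) = 0.5 * 0.9 * 0.7 * 0.9
print(joint_probability(['+r', '+r'], ['+u', '+u'], initial, transition, emission))
```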
Conditional Independence
- HMMs have two important independence properties:
  - Markov hidden process: the future depends on the past via the present.
  - The current observation is independent of everything else given the current state.
- Quiz: does this mean that the evidence variables are guaranteed to be independent?
  [No, they are correlated by the hidden state(s).]

HMM Computations
- Given:
  - the parameters
  - evidence E_{1:n} = e_{1:n}
- Inference problems include:
  - Filtering: find P(X_t | e_{1:t}) for all t
  - Smoothing: find P(X_t | e_{1:n}) for all t
  - Most probable explanation: find x*_{1:n} = argmax_{x_{1:n}} P(x_{1:n} | e_{1:n})

Filtering / Monitoring
- Filtering, or monitoring, is the task of tracking the distribution B_t(X) = P(X_t | e_1, ..., e_t) (the belief state) over time.
- We start with B_1(X) in an initial setting, usually uniform.
- As time passes, or as we get observations, we update B(X).
- The Kalman filter was invented in the 1960s and first implemented as a method of trajectory estimation for the Apollo program. (The Kalman filter is a type of HMM with continuous values.)

Real HMM Examples
- Speech recognition HMMs:
  - Observations are acoustic signals (continuous valued)
  - States are specific positions in specific words (so, tens of thousands)
- Machine translation HMMs:
  - Observations are words (tens of thousands)
  - States are translation options
- Robot tracking:
  - Observations are range readings (continuous)
  - States are positions on a map (continuous)

Example: Robot Localization
(Example from Michael Pfeiffer. Figure shows the belief over grid cells at t = 0, on a grey scale from Prob 0 to 1.)
- Sensor model: can read in which directions there is a wall, never more than 1 mistake.
- Motion model: may not execute the action, with small probability.
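As a concrete illustration of the belief state that filtering tracks (my own sketch, not from the slides), here is how B_1(X) might be initialized uniformly over a small grid of candidate robot positions; the grid size and names are assumptions for the example:

```python
def uniform_belief(states):
    """Initial belief B_1(X): uniform over the state space."""
    return {s: 1.0 / len(states) for s in states}

# e.g. a robot that could be in any cell of a 3x3 grid
grid_states = [(x, y) for x in range(3) for y in range(3)]
belief = uniform_belief(grid_states)   # each cell gets probability 1/9
```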
Example: Robot Localization (continued)
(Figures show the belief over grid cells at t = 1 through t = 5, on a grey scale from Prob 0 to 1.)
- t = 1: lighter grey cells were possible given the reading, but are less likely because they require 1 sensor mistake.
- t = 2, t = 3, t = 4, t = 5: the belief sharpens as further readings arrive.

Inference: Base Cases
- Observation base case (a single state X_1 with evidence E_1):
  P(X_1 | e_1) ∝ P(e_1 | X_1) P(X_1)
- Passage-of-time base case (X_1 followed by X_2, no evidence):
  P(X_2) = Σ_{x_1} P(X_2 | x_1) P(x_1)
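A short sketch of the two base cases (my own code, assuming the initial, transition, and emission dictionaries from the earlier weather HMM snippet):

```python
def observe_base_case(prior, emission, e1):
    """P(X_1 | e_1) proportional to P(e_1 | X_1) P(X_1), then renormalized."""
    unnormalized = {x: emission[x][e1] * prior[x] for x in prior}
    z = sum(unnormalized.values())
    return {x: p / z for x, p in unnormalized.items()}

def predict_base_case(prior, transition):
    """P(X_2) = sum_{x_1} P(X_2 | x_1) P(x_1)."""
    return {x2: sum(transition[x1][x2] * prior[x1] for x1 in prior) for x2 in prior}

# With the weather HMM: seeing an umbrella on day 1 raises the belief in rain,
# while one step of pure prediction leaves the 0.5/0.5 prior unchanged.
print(observe_base_case(initial, emission, '+u'))   # {'+r': 0.818..., '-r': 0.181...}
print(predict_base_case(initial, transition))       # {'+r': 0.5, '-r': 0.5}
```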
Passage of Time
- Assume we have a current belief P(X | evidence to date):
  B(X_t) = P(X_t | e_{1:t})
- Then, after one time step passes:
  P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
- Or, compactly:
  B'(X_{t+1}) = Σ_{x_t} P(X_{t+1} | x_t) B(x_t)
- Basic idea: beliefs get "pushed" through the transitions.
- With the "B" notation, we have to be careful about what time step t the belief is about, and what evidence it includes.

Observation
- Assume we have a current belief P(X | previous evidence):
  B'(X_{t+1}) = P(X_{t+1} | e_{1:t})
- Then, after evidence comes in:
  P(X_{t+1} | e_{1:t+1}) ∝ P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
- Or, compactly:
  B(X_{t+1}) ∝ P(e_{t+1} | X_{t+1}) B'(X_{t+1})
- Basic idea: beliefs are "reweighted" by the likelihood of the evidence.
- Unlike the passage of time, we have to renormalize.

Example: Passage of Time
- As time passes, uncertainty "accumulates".
  (Transition model: ghosts usually go clockwise. Figures show the belief at T = 1, T = 2, and T = 5.)

Example: Observation
- As we get observations, beliefs get reweighted and uncertainty "decreases".
  (Figures show the belief before and after an observation.)

Example: Weather HMM
- Rain_0: B(+r) = 0.5, B(-r) = 0.5
- After one time step: B'(+r) = 0.5, B'(-r) = 0.5
- After observing Umbrella_1 = +u: B(+r) = 0.818, B(-r) = 0.182
- After another time step: B'(+r) = 0.627, B'(-r) = 0.373
- After observing Umbrella_2 = +u: B(+r) = 0.883, B(-r) = 0.117
- (Same tables as before.)
  Transition P(R_{t+1} | R_t): +r,+r: 0.7; +r,-r: 0.3; -r,+r: 0.3; -r,-r: 0.7
  Emission P(U_t | R_t): +r,+u: 0.9; +r,-u: 0.1; -r,+u: 0.2; -r,-u: 0.8

Video of Passage of Time (Transition Model)
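A short sketch of the two update rules (my own code, reusing the weather HMM dictionaries defined earlier); running the loop reproduces the numbers from the weather example above:

```python
def elapse_time(belief, transition):
    """Passage of time: B'(X_{t+1}) = sum_{x_t} P(X_{t+1} | x_t) B(x_t)."""
    return {x2: sum(transition[x1][x2] * belief[x1] for x1 in belief) for x2 in belief}

def observe(belief_prime, emission, e):
    """Observation: B(X_{t+1}) proportional to P(e_{t+1} | X_{t+1}) B'(X_{t+1}), then renormalize."""
    unnormalized = {x: emission[x][e] * belief_prime[x] for x in belief_prime}
    z = sum(unnormalized.values())
    return {x: p / z for x, p in unnormalized.items()}

belief = {'+r': 0.5, '-r': 0.5}                # Rain_0
for e in ['+u', '+u']:                         # umbrella observed on days 1 and 2
    belief = elapse_time(belief, transition)   # B'(+r) = 0.5, then 0.627
    belief = observe(belief, emission, e)      # B(+r)  = 0.818, then 0.883
print(belief)                                  # {'+r': 0.883..., '-r': 0.116...}
```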
The Forward Algorithm
- We are given evidence at each time step and want to know P(X_t | e_{1:t}).
- We can derive the following update over the (unnormalized) quantity f_t(x_t) = P(x_t, e_{1:t}):
  f_t(x_t) = P(e_t | x_t) Σ_{x_{t-1}} P(x_t | x_{t-1}) f_{t-1}(x_{t-1})
- We can normalize as we go if we want to have P(x | e) at each time step, or just once at the end.

Video of Demo Pacman – Sonar (with beliefs)

Online Belief Updates
- Every time step, we start with the current P(X | evidence).
- We update for time:
  P(x_t | e_{1:t-1}) = Σ_{x_{t-1}} P(x_{t-1} | e_{1:t-1}) P(x_t | x_{t-1})
- We update for evidence:
  P(x_t | e_{1:t}) ∝ P(e_t | x_t) P(x_t | e_{1:t-1})
- The forward algorithm does both at once (and doesn't normalize).
- Potential issue: space is |X| and time is |X|^2 per time step.

HMM Computations (Reminder)
- Given:
  - the parameters
  - evidence E_{1:n} = e_{1:n}
- Inference problems include:
  - Filtering: find P(X_t | e_{1:t}) for all t
  - Smoothing: find P(X_t | e_{1:n}) for all t
  - Most probable explanation: find x*_{1:n} = argmax_{x_{1:n}} P(x_{1:n} | e_{1:n})

Pacman – Sonar (P4)
[Demo: Pacman – Sonar – No Beliefs (L14D1)]

Smoothing
- Smoothing is the process of using all the evidence to obtain better individual estimates for a hidden state (or for all hidden states).
- Idea: run the FORWARD algorithm up until t, and a similar BACKWARD algorithm from the final timestep n down to t+1.
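A compact sketch of the forward algorithm (my own code, again assuming the weather HMM dictionaries from earlier); it keeps the unnormalized quantities f_t(x_t) = P(x_t, e_{1:t}) and normalizes only once at the end:

```python
def forward(evidence, initial, transition, emission):
    """Forward algorithm: returns P(X_T | e_{1:T}) after folding in all evidence."""
    # Base case: f_1(x_1) = P(e_1 | x_1) P(x_1)
    f = {x: emission[x][evidence[0]] * initial[x] for x in initial}
    for e in evidence[1:]:
        # f_t(x_t) = P(e_t | x_t) * sum_{x_{t-1}} P(x_t | x_{t-1}) f_{t-1}(x_{t-1})
        f = {x2: emission[x2][e] * sum(transition[x1][x2] * f[x1] for x1 in f)
             for x2 in f}
    z = sum(f.values())                      # normalize once at the end
    return {x: p / z for x, p in f.items()}

print(forward(['+u', '+u'], initial, transition, emission))  # ~{'+r': 0.883, '-r': 0.117}
```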
Most Likely Explanation

HMMs: MLE Queries
- HMMs are defined by:
  - States X
  - Observations E
  - Initial distribution: P(X_1)
  - Transitions: P(X_t | X_{t-1})
  - Emissions: P(E_t | X_t)
- New query: the most likely explanation, argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})
- New method: the Viterbi algorithm

State Trellis
- State trellis: a graph of states and transitions over time (e.g., sun/rain at each of four time steps).
- Each arc represents some transition x_{t-1} -> x_t.
- Each arc has weight P(x_t | x_{t-1}) P(e_t | x_t).
- Each path is a sequence of states.
- The product of the weights on a path is that sequence's probability along with the evidence.
- The forward algorithm computes sums over paths; Viterbi computes the best paths.

Forward / Viterbi Algorithms
- Forward algorithm (sum):
  f_t(x_t) = P(e_t | x_t) Σ_{x_{t-1}} P(x_t | x_{t-1}) f_{t-1}(x_{t-1})
- Viterbi algorithm (max):
  m_t(x_t) = P(e_t | x_t) max_{x_{t-1}} P(x_t | x_{t-1}) m_{t-1}(x_{t-1})

Most Probable Explanation (Sequence)
- The Viterbi algorithm is very similar to the filtering (FORWARD) algorithm.
- Essentially: replace the "sum" with a "max", and keep back pointers to recover the best state sequence.
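A minimal Viterbi sketch (my own code, once more assuming the weather HMM dictionaries from earlier): the same recursion as the forward algorithm with the sum replaced by a max, plus back pointers to recover the best sequence:

```python
def viterbi(evidence, initial, transition, emission):
    """Most likely state sequence: argmax_{x_{1:T}} P(x_{1:T} | e_{1:T})."""
    # Base case: m_1(x_1) = P(e_1 | x_1) P(x_1)
    m = {x: emission[x][evidence[0]] * initial[x] for x in initial}
    backpointers = []
    for e in evidence[1:]:
        new_m, back = {}, {}
        for x2 in m:
            # m_t(x_t) = P(e_t | x_t) * max_{x_{t-1}} P(x_t | x_{t-1}) m_{t-1}(x_{t-1})
            best_prev = max(m, key=lambda x1: transition[x1][x2] * m[x1])
            new_m[x2] = emission[x2][e] * transition[best_prev][x2] * m[best_prev]
            back[x2] = best_prev
        backpointers.append(back)
        m = new_m
    # Follow the back pointers from the best final state to recover the sequence.
    path = [max(m, key=m.get)]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path))

print(viterbi(['+u', '+u', '-u'], initial, transition, emission))  # ['+r', '+r', '-r']
```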