Exact Inference for Hidden Markov Models
Michael Gutmann
Probabilistic Modelling and Reasoning (INFR11134)
School of Informatics, University of Edinburgh
Spring Semester 2020
Recap
◮ Assuming a factorisation / set of statistical independencies allowed us to efficiently represent the pdf or pmf of random variables
◮ Factorisation can be exploited for inference
  ◮ by using the distributive law
  ◮ by re-using already computed quantities
◮ Inference for general factor graphs (variable elimination)
◮ Inference for factor trees
  ◮ Sum-product and max-product message passing
Program
1. Markov models
2. Inference by message passing
Program
1. Markov models
   Markov chains
   Transition distribution
   Hidden Markov models
   Emission distribution
   Mixture of Gaussians as special case
2. Inference by message passing
Applications of (hidden) Markov models
Markov and hidden Markov models have many applications, e.g.
◮ speech modelling (speech recognition)
◮ text modelling (natural language processing)
◮ gene sequence modelling (bioinformatics)
◮ spike train modelling (neuroscience)
◮ object tracking (robotics)
Markov chains
◮ Chain rule with ordering x_1, ..., x_d:
  p(x_1, ..., x_d) = \prod_{i=1}^d p(x_i \mid x_1, ..., x_{i-1})
◮ If p satisfies the ordered Markov property, the number of variables in the conditioning set can be reduced to a subset π_i ⊆ {x_1, ..., x_{i-1}}
◮ Not all predecessors but only the subset π_i is "relevant" for x_i.
◮ L-th order Markov chain: π_i = {x_{i-L}, ..., x_{i-1}}
  p(x_1, ..., x_d) = \prod_{i=1}^d p(x_i \mid x_{i-L}, ..., x_{i-1})
◮ 1st order Markov chain: π_i = {x_{i-1}}
  p(x_1, ..., x_d) = \prod_{i=1}^d p(x_i \mid x_{i-1})
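To make the first-order factorisation concrete, here is a minimal Python sketch (not from the slides) that evaluates the log joint pmf of a discrete first-order chain as log p(x_1) plus a sum of log transition probabilities. The function name log_joint_first_order, the variables init and trans, and the 3-state numbers are illustrative assumptions.

import numpy as np

# Sketch: log p(x_1,...,x_d) = log p(x_1) + sum_i log p(x_i | x_{i-1})
# for a homogeneous discrete chain. Convention used later in the deck:
# trans[k, k'] = p(x_i = k | x_{i-1} = k'), so the columns of trans sum to one.

def log_joint_first_order(x, init, trans):
    logp = np.log(init[x[0]])                      # log p(x_1)
    for prev, cur in zip(x[:-1], x[1:]):
        logp += np.log(trans[cur, prev])           # log p(x_i | x_{i-1})
    return logp

init = np.array([0.5, 0.3, 0.2])                   # p(x_1), made-up values
trans = np.array([[0.8, 0.2, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.1, 0.1, 0.7]])                # made-up transition matrix
print(log_joint_first_order([0, 0, 1, 2], init, trans))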
Markov chain — DAGs
[Figure: three DAGs over x_1, x_2, x_3, x_4. Chain rule: each node has edges from all of its predecessors. Second-order Markov chain: each node has edges from its two immediate predecessors. First-order Markov chain: each node has an edge from its immediate predecessor only.]
Vector-valued Markov chains
◮ While not explicitly discussed, the graphical models extend to vector-valued variables
◮ Chain rule with ordering \mathbf{x}_1, ..., \mathbf{x}_d:
  p(\mathbf{x}_1, ..., \mathbf{x}_d) = \prod_{i=1}^d p(\mathbf{x}_i \mid \mathbf{x}_1, ..., \mathbf{x}_{i-1})
  [DAG over x_1, x_2, x_3, x_4 with edges from all predecessors]
◮ 1st order Markov chain:
  p(\mathbf{x}_1, ..., \mathbf{x}_d) = \prod_{i=1}^d p(\mathbf{x}_i \mid \mathbf{x}_{i-1})
  [DAG: x_1 → x_2 → x_3 → x_4]
Modelling time series
◮ The index i may refer to time t
◮ L-th order Markov chain of length T:
  p(x_1, ..., x_T) = \prod_{t=1}^T p(x_t \mid x_{t-L}, ..., x_{t-1})
  Only the recent past of L time points x_{t-L}, ..., x_{t-1} is relevant for x_t.
◮ 1st order Markov chain of length T:
  p(x_1, ..., x_T) = \prod_{t=1}^T p(x_t \mid x_{t-1})
  Only the last time point x_{t-1} is relevant for x_t.
Transition distribution
(Consider a 1st order Markov chain.)
◮ p(x_i | x_{i-1}) is called the transition distribution
◮ For discrete random variables, p(x_i | x_{i-1}) is defined by a transition matrix A_i:
  p(x_i = k \mid x_{i-1} = k') = [A_i]_{k,k'}
◮ For continuous random variables, p(x_i | x_{i-1}) is a conditional pdf, e.g.
  p(x_i \mid x_{i-1}) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{(x_i - f_i(x_{i-1}))^2}{2\sigma_i^2} \right)
  for some function f_i
◮ Homogeneous Markov chain: p(x_i | x_{i-1}) does not depend on i, e.g. A_i = A, σ_i = σ, f_i = f
◮ Inhomogeneous Markov chain: p(x_i | x_{i-1}) does depend on i
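The following Python sketch (illustrative, not from the slides) spells out the convention [A]_{k,k'} = p(x_i = k | x_{i-1} = k') for a homogeneous chain and uses it for ancestral sampling; the matrix A, the initial distribution p1 and the function sample_chain are made-up examples.

import numpy as np

# Each column of A is the conditional pmf p(x_i = . | x_{i-1} = k'),
# so the columns sum to one.
A = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.1, 0.7]])
assert np.allclose(A.sum(axis=0), 1.0)

rng = np.random.default_rng(0)

def sample_chain(A, p1, T):
    """Ancestral sampling of x_1,...,x_T from a homogeneous 1st order chain."""
    x = [rng.choice(len(p1), p=p1)]                      # x_1 ~ p(x_1)
    for _ in range(T - 1):
        x.append(rng.choice(A.shape[0], p=A[:, x[-1]]))  # x_t ~ p(x_t | x_{t-1})
    return np.array(x)

print(sample_chain(A, p1=np.array([1/3, 1/3, 1/3]), T=10))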
Hidden Markov model
DAG:
[DAG: h_1 → h_2 → h_3 → h_4, with an edge h_i → v_i for each i]
◮ 1st order Markov chain on the hidden (latent) variables h_i
◮ Each visible (observed) variable v_i only depends on the corresponding hidden variable h_i
◮ Factorisation
  p(h_{1:d}, v_{1:d}) = p(v_1 \mid h_1)\, p(h_1) \prod_{i=2}^d p(v_i \mid h_i)\, p(h_i \mid h_{i-1})
◮ The visibles are d-connected if the hiddens are not observed
◮ The visibles are d-separated (independent) given the hiddens
◮ The h_i model/explain all dependencies between the v_i
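As an illustration of this factorisation (not part of the original slides), the sketch below samples from a stationary HMM by ancestral sampling: first h_1 ~ p(h_1), then alternately h_i ~ p(h_i | h_{i-1}) and v_i ~ p(v_i | h_i). The matrices A and B, the initial distribution p1 and the function sample_hmm are made-up examples; emissions are taken to be discrete here for simplicity.

import numpy as np

rng = np.random.default_rng(1)

p1 = np.array([0.6, 0.4])            # p(h_1)
A  = np.array([[0.9, 0.3],           # A[k, k'] = p(h_i = k | h_{i-1} = k')
               [0.1, 0.7]])
B  = np.array([[0.8, 0.1],           # B[v, k] = p(v_i = v | h_i = k)
               [0.2, 0.9]])

def sample_hmm(p1, A, B, d):
    h = [rng.choice(len(p1), p=p1)]                        # h_1 ~ p(h_1)
    v = [rng.choice(B.shape[0], p=B[:, h[0]])]             # v_1 ~ p(v_1 | h_1)
    for i in range(1, d):
        h.append(rng.choice(A.shape[0], p=A[:, h[-1]]))    # h_i ~ p(h_i | h_{i-1})
        v.append(rng.choice(B.shape[0], p=B[:, h[-1]]))    # v_i ~ p(v_i | h_i)
    return np.array(h), np.array(v)

h, v = sample_hmm(p1, A, B, d=8)
print("hidden: ", h)
print("visible:", v)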
Emission distribution
◮ p(v_i | h_i) is called the emission distribution
◮ Discrete-valued v_i and h_i: p(v_i | h_i) can be represented as a matrix
◮ Discrete-valued v_i and continuous-valued h_i: p(v_i | h_i) is a conditional pmf
◮ Continuous-valued v_i: p(v_i | h_i) is a (conditional) density
◮ As for the transition distribution, the emission distribution p(v_i | h_i) may or may not depend on i
◮ If neither the transition nor the emission distribution depends on i, we have a stationary (or homogeneous) hidden Markov model
Gaussian emission model with discrete-valued latents
◮ Special case: h_i ⊥⊥ h_{i-1}, with v_i ∈ R^m, h_i ∈ {1, ..., K}, and
  p(h = k) = p_k
  p(v \mid h = k) = \frac{1}{|\det(2\pi\Sigma_k)|^{1/2}} \exp\left( -\frac{1}{2} (v - \mu_k)^\top \Sigma_k^{-1} (v - \mu_k) \right)
  for all h_i and v_i.
◮ DAG:
  [DAG: h_1 → v_1, h_2 → v_2, ..., h_d → v_d, with no edges between the h_i]
◮ Corresponds to d iid draws from a Gaussian mixture model with K mixture components
◮ Mean E[v | h = k] = \mu_k
◮ Covariance matrix V[v | h = k] = \Sigma_k
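A minimal sketch of this special case (made-up parameters, not from the slides): each of the d draws independently picks a component h ~ p(h) and then emits v ~ N(\mu_h, \Sigma_h), i.e. iid samples from a Gaussian mixture. The names sample_gmm, p_k, mus and Sigmas are illustrative.

import numpy as np

rng = np.random.default_rng(2)

p_k    = np.array([0.5, 0.3, 0.2])                              # mixture weights p(h = k)
mus    = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigmas = [0.5 * np.eye(2), 0.3 * np.eye(2), np.diag([0.2, 1.0])]

def sample_gmm(d):
    """d iid draws (h_i, v_i) with h_i ~ p(h) and v_i | h_i = k ~ N(mu_k, Sigma_k)."""
    h = rng.choice(len(p_k), size=d, p=p_k)
    v = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in h])
    return h, v

h, v = sample_gmm(5)
print(h)
print(v)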
Gaussian emission model with discrete-valued latents
The HMM is a generalisation of the Gaussian mixture model where cluster membership at "time" i (the value of h_i) generally depends on cluster membership at "time" i − 1 (the value of h_{i-1}).
[Figure: example for v_i ∈ R^2, h_i ∈ {1, 2, 3}. Left: the emission densities p(v | h = k) for k = 1, 2, 3. Right: samples. (Bishop, Figure 13.8)]
Program
1. Markov models
   Markov chains
   Transition distribution
   Hidden Markov models
   Emission distribution
   Mixture of Gaussians as special case
2. Inference by message passing
Program
1. Markov models
2. Inference by message passing
   Inference: filtering, prediction, smoothing, Viterbi
   Filtering: sum-product message passing yields the alpha-recursion from the HMM literature
   Smoothing: sum-product message passing yields the alpha-beta recursion from the HMM literature
   Sum-product message passing for prediction, for inference of the most likely hidden path, and for inference of joint distributions
The classical inference problems
(Considering the index i to refer to time t)
Filtering (inferring the present):             p(h_t | v_{1:t})
Smoothing (inferring the past):                p(h_t | v_{1:u}),  t < u
Prediction (inferring the future):             p(h_t | v_{1:u}),  t > u
Most likely hidden path (Viterbi alignment):   argmax_{h_{1:t}} p(h_{1:t} | v_{1:t})
For prediction, one is also often interested in p(v_t | v_{1:u}) for t > u.
(slide courtesy of David Barber)
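As a concrete illustration of filtering (not the lecture's own code), the sketch below computes p(h_t | v_{1:t}) for a stationary HMM with discrete hiddens using the forward (alpha) recursion alpha_t(h) ∝ p(v_t | h) \sum_{h'} p(h | h') alpha_{t-1}(h'), which is what sum-product message passing yields on this model. The parameters p1, A, B and the function forward_filter are made-up examples using the same conventions as the earlier sketches.

import numpy as np

def forward_filter(v, p1, A, B):
    """Return the filtered posteriors p(h_t | v_{1:t}), one row per t."""
    alpha = B[v[0], :] * p1                  # ∝ p(h_1, v_1)
    alpha /= alpha.sum()                     # normalise -> p(h_1 | v_1)
    alphas = [alpha]
    for obs in v[1:]:
        alpha = B[obs, :] * (A @ alpha)      # predict with A, correct with the emission
        alpha /= alpha.sum()
        alphas.append(alpha)
    return np.array(alphas)

p1 = np.array([0.6, 0.4])
A  = np.array([[0.9, 0.3],                   # A[k, k'] = p(h_t = k | h_{t-1} = k')
               [0.1, 0.7]])
B  = np.array([[0.8, 0.1],                   # B[v, k] = p(v_t = v | h_t = k)
               [0.2, 0.9]])
print(forward_filter([0, 0, 1, 1, 0], p1, A, B))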
The classical inference problems
[Figure: timelines for filtering, smoothing, and prediction; for each task the shaded region denotes the extent of data available relative to the time point t being inferred.]
(slide courtesy of Chris Williams)
Factor graph for hidden Markov model
(see tutorial 4)
DAG:
[DAG: h_1 → h_2 → h_3 → h_4, with an edge h_i → v_i for each i]
Factor graph:
[Factor graph: variable nodes h_1, ..., h_4 linked in a chain through the factors p(h_1), p(h_2 | h_1), p(h_3 | h_2), p(h_4 | h_3); each h_i is additionally connected to its visible v_i through the factor p(v_i | h_i).]
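Running sum-product message passing on this chain-structured factor graph with the v_i observed gives, for smoothing, the alpha-beta (forward-backward) recursion covered later in the lecture. The sketch below (illustrative parameters, same conventions as before; forward_backward is a made-up name) computes p(h_t | v_{1:T}) by combining forward messages alpha_t with backward messages beta_t.

import numpy as np

def forward_backward(v, p1, A, B):
    """Return the smoothed posteriors p(h_t | v_{1:T}), one row per t."""
    T, K = len(v), len(p1)
    alpha = np.zeros((T, K))                 # forward messages, alpha_t ∝ p(h_t, v_{1:t})
    beta  = np.ones((T, K))                  # backward messages, beta_t ∝ p(v_{t+1:T} | h_t)
    alpha[0] = B[v[0], :] * p1
    alpha[0] /= alpha[0].sum()               # rescaling for numerical stability
    for t in range(1, T):
        alpha[t] = B[v[t], :] * (A @ alpha[t - 1])
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A.T @ (B[v[t + 1], :] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta                      # ∝ p(h_t | v_{1:T}) pointwise in t
    return post / post.sum(axis=1, keepdims=True)

p1 = np.array([0.6, 0.4])
A  = np.array([[0.9, 0.3],
               [0.1, 0.7]])
B  = np.array([[0.8, 0.1],
               [0.2, 0.9]])
print(forward_backward([0, 0, 1, 1, 0], p1, A, B))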