Exact Inference for Hidden Markov Models
Michael Gutmann
Probabilistic Modelling and Reasoning (INFR11134)
School of Informatics, University of Edinburgh
Spring Semester 2020
Recap
◮ Assuming a factorisation / set of statistical independencies allowed us to efficiently represent the pdf or pmf of random variables
◮ Factorisation can be exploited for inference
  ◮ by using the distributive law
  ◮ by re-using already computed quantities
◮ Inference for general factor graphs (variable elimination)
◮ Inference for factor trees
  ◮ Sum-product and max-product message passing
Program
1. Markov models
2. Inference by message passing
Program
1. Markov models
   Markov chains
   Transition distribution
   Hidden Markov models
   Emission distribution
   Mixture of Gaussians as special case
2. Inference by message passing
Applications of (hidden) Markov models
Markov and hidden Markov models have many applications, e.g.
◮ speech modelling (speech recognition)
◮ text modelling (natural language processing)
◮ gene sequence modelling (bioinformatics)
◮ spike train modelling (neuroscience)
◮ object tracking (robotics)
Markov chains
◮ Chain rule with ordering x_1, ..., x_d:
  p(x_1, ..., x_d) = \prod_{i=1}^d p(x_i \mid x_1, ..., x_{i-1})
◮ If p satisfies the ordered Markov property, the number of variables in the conditioning set can be reduced to a subset π_i ⊆ {x_1, ..., x_{i-1}}
◮ Not all predecessors but only the subset π_i is "relevant" for x_i.
◮ L-th order Markov chain: π_i = {x_{i-L}, ..., x_{i-1}}
  p(x_1, ..., x_d) = \prod_{i=1}^d p(x_i \mid x_{i-L}, ..., x_{i-1})
◮ 1st order Markov chain: π_i = {x_{i-1}}
  p(x_1, ..., x_d) = \prod_{i=1}^d p(x_i \mid x_{i-1})
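To make the first-order factorisation concrete, here is a minimal Python sketch (not from the slides) that evaluates the log joint pmf of a discrete first-order chain as log p(x_1) plus a sum of log transition probabilities. The function name log_joint_first_order, the variables init and trans, and the 3-state numbers are illustrative assumptions.

import numpy as np

# Sketch: log p(x_1,...,x_d) = log p(x_1) + sum_i log p(x_i | x_{i-1})
# for a homogeneous discrete chain. Convention used later in the deck:
# trans[k, k'] = p(x_i = k | x_{i-1} = k'), so the columns of trans sum to one.

def log_joint_first_order(x, init, trans):
    logp = np.log(init[x[0]])                      # log p(x_1)
    for prev, cur in zip(x[:-1], x[1:]):
        logp += np.log(trans[cur, prev])           # log p(x_i | x_{i-1})
    return logp

init = np.array([0.5, 0.3, 0.2])                   # p(x_1), made-up values
trans = np.array([[0.8, 0.2, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.1, 0.1, 0.7]])                # made-up transition matrix
print(log_joint_first_order([0, 0, 1, 2], init, trans))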
Markov chain — DAGs
[Figure: three DAGs over x_1, x_2, x_3, x_4. Chain rule: each node has edges from all of its predecessors. Second-order Markov chain: each node has edges from its two immediate predecessors. First-order Markov chain: each node has an edge from its immediate predecessor only.]
Vector-valued Markov chains
◮ While not explicitly discussed, the graphical models extend to vector-valued variables
◮ Chain rule with ordering \mathbf{x}_1, ..., \mathbf{x}_d:
  p(\mathbf{x}_1, ..., \mathbf{x}_d) = \prod_{i=1}^d p(\mathbf{x}_i \mid \mathbf{x}_1, ..., \mathbf{x}_{i-1})
  [DAG over x_1, x_2, x_3, x_4 with edges from all predecessors]
◮ 1st order Markov chain:
  p(\mathbf{x}_1, ..., \mathbf{x}_d) = \prod_{i=1}^d p(\mathbf{x}_i \mid \mathbf{x}_{i-1})
  [DAG: x_1 → x_2 → x_3 → x_4]
Modelling time series
◮ The index i may refer to time t
◮ L-th order Markov chain of length T:
  p(x_1, ..., x_T) = \prod_{t=1}^T p(x_t \mid x_{t-L}, ..., x_{t-1})
  Only the recent past of L time points x_{t-L}, ..., x_{t-1} is relevant for x_t.
◮ 1st order Markov chain of length T:
  p(x_1, ..., x_T) = \prod_{t=1}^T p(x_t \mid x_{t-1})
  Only the last time point x_{t-1} is relevant for x_t.
Transition distribution
(Consider a 1st order Markov chain.)
◮ p(x_i | x_{i-1}) is called the transition distribution
◮ For discrete random variables, p(x_i | x_{i-1}) is defined by a transition matrix A_i:
  p(x_i = k \mid x_{i-1} = k') = [A_i]_{k,k'}
◮ For continuous random variables, p(x_i | x_{i-1}) is a conditional pdf, e.g.
  p(x_i \mid x_{i-1}) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{(x_i - f_i(x_{i-1}))^2}{2\sigma_i^2} \right)
  for some function f_i
◮ Homogeneous Markov chain: p(x_i | x_{i-1}) does not depend on i, e.g. A_i = A, σ_i = σ, f_i = f
◮ Inhomogeneous Markov chain: p(x_i | x_{i-1}) does depend on i
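The following Python sketch (illustrative, not from the slides) spells out the convention [A]_{k,k'} = p(x_i = k | x_{i-1} = k') for a homogeneous chain and uses it for ancestral sampling; the matrix A, the initial distribution p1 and the function sample_chain are made-up examples.

import numpy as np

# Each column of A is the conditional pmf p(x_i = . | x_{i-1} = k'),
# so the columns sum to one.
A = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.1, 0.7]])
assert np.allclose(A.sum(axis=0), 1.0)

rng = np.random.default_rng(0)

def sample_chain(A, p1, T):
    """Ancestral sampling of x_1,...,x_T from a homogeneous 1st order chain."""
    x = [rng.choice(len(p1), p=p1)]                      # x_1 ~ p(x_1)
    for _ in range(T - 1):
        x.append(rng.choice(A.shape[0], p=A[:, x[-1]]))  # x_t ~ p(x_t | x_{t-1})
    return np.array(x)

print(sample_chain(A, p1=np.array([1/3, 1/3, 1/3]), T=10))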
Hidden Markov model
DAG:
[DAG: h_1 → h_2 → h_3 → h_4, with an edge h_i → v_i for each i]
◮ 1st order Markov chain on the hidden (latent) variables h_i
◮ Each visible (observed) variable v_i only depends on the corresponding hidden variable h_i
◮ Factorisation
  p(h_{1:d}, v_{1:d}) = p(v_1 \mid h_1)\, p(h_1) \prod_{i=2}^d p(v_i \mid h_i)\, p(h_i \mid h_{i-1})
◮ The visibles are d-connected if the hiddens are not observed
◮ The visibles are d-separated (independent) given the hiddens
◮ The h_i model/explain all dependencies between the v_i
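As an illustration of this factorisation (not part of the original slides), the sketch below samples from a stationary HMM by ancestral sampling: first h_1 ~ p(h_1), then alternately h_i ~ p(h_i | h_{i-1}) and v_i ~ p(v_i | h_i). The matrices A and B, the initial distribution p1 and the function sample_hmm are made-up examples; emissions are taken to be discrete here for simplicity.

import numpy as np

rng = np.random.default_rng(1)

p1 = np.array([0.6, 0.4])            # p(h_1)
A  = np.array([[0.9, 0.3],           # A[k, k'] = p(h_i = k | h_{i-1} = k')
               [0.1, 0.7]])
B  = np.array([[0.8, 0.1],           # B[v, k] = p(v_i = v | h_i = k)
               [0.2, 0.9]])

def sample_hmm(p1, A, B, d):
    h = [rng.choice(len(p1), p=p1)]                        # h_1 ~ p(h_1)
    v = [rng.choice(B.shape[0], p=B[:, h[0]])]             # v_1 ~ p(v_1 | h_1)
    for i in range(1, d):
        h.append(rng.choice(A.shape[0], p=A[:, h[-1]]))    # h_i ~ p(h_i | h_{i-1})
        v.append(rng.choice(B.shape[0], p=B[:, h[-1]]))    # v_i ~ p(v_i | h_i)
    return np.array(h), np.array(v)

h, v = sample_hmm(p1, A, B, d=8)
print("hidden: ", h)
print("visible:", v)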
Emission distribution
◮ p(v_i | h_i) is called the emission distribution
◮ Discrete-valued v_i and h_i: p(v_i | h_i) can be represented as a matrix
◮ Discrete-valued v_i and continuous-valued h_i: p(v_i | h_i) is a conditional pmf
◮ Continuous-valued v_i: p(v_i | h_i) is a (conditional) density
◮ As for the transition distribution, the emission distribution p(v_i | h_i) may or may not depend on i
◮ If neither the transition nor the emission distribution depends on i, we have a stationary (or homogeneous) hidden Markov model
Gaussian emission model with discrete-valued latents
◮ Special case: h_i ⊥⊥ h_{i-1}, with v_i ∈ R^m, h_i ∈ {1, ..., K}, and
  p(h = k) = p_k
  p(v \mid h = k) = \frac{1}{|\det(2\pi\Sigma_k)|^{1/2}} \exp\left( -\frac{1}{2} (v - \mu_k)^\top \Sigma_k^{-1} (v - \mu_k) \right)
  for all h_i and v_i.
◮ DAG:
  [DAG: h_1 → v_1, h_2 → v_2, ..., h_d → v_d, with no edges between the h_i]
◮ Corresponds to d iid draws from a Gaussian mixture model with K mixture components
◮ Mean E[v | h = k] = \mu_k
◮ Covariance matrix V[v | h = k] = \Sigma_k
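A minimal sketch of this special case (made-up parameters, not from the slides): each of the d draws independently picks a component h ~ p(h) and then emits v ~ N(\mu_h, \Sigma_h), i.e. iid samples from a Gaussian mixture. The names sample_gmm, p_k, mus and Sigmas are illustrative.

import numpy as np

rng = np.random.default_rng(2)

p_k    = np.array([0.5, 0.3, 0.2])                              # mixture weights p(h = k)
mus    = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigmas = [0.5 * np.eye(2), 0.3 * np.eye(2), np.diag([0.2, 1.0])]

def sample_gmm(d):
    """d iid draws (h_i, v_i) with h_i ~ p(h) and v_i | h_i = k ~ N(mu_k, Sigma_k)."""
    h = rng.choice(len(p_k), size=d, p=p_k)
    v = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in h])
    return h, v

h, v = sample_gmm(5)
print(h)
print(v)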
Gaussian emission model with discrete-valued latents
The HMM is a generalisation of the Gaussian mixture model where cluster membership at "time" i (the value of h_i) generally depends on cluster membership at "time" i − 1 (the value of h_{i-1}).
[Figure: example for v_i ∈ R^2, h_i ∈ {1, 2, 3}. Left: the emission densities p(v | h = k) for k = 1, 2, 3. Right: samples. (Bishop, Figure 13.8)]
Program
1. Markov models
   Markov chains
   Transition distribution
   Hidden Markov models
   Emission distribution
   Mixture of Gaussians as special case
2. Inference by message passing
Program
1. Markov models
2. Inference by message passing
   Inference: filtering, prediction, smoothing, Viterbi
   Filtering: sum-product message passing yields the alpha-recursion from the HMM literature
   Smoothing: sum-product message passing yields the alpha-beta recursion from the HMM literature
   Sum-product message passing for prediction, for inference of the most likely hidden path, and for inference of joint distributions
The classical inference problems
(Considering the index i to refer to time t)
Filtering (inferring the present):             p(h_t | v_{1:t})
Smoothing (inferring the past):                p(h_t | v_{1:u}),  t < u
Prediction (inferring the future):             p(h_t | v_{1:u}),  t > u
Most likely hidden path (Viterbi alignment):   argmax_{h_{1:t}} p(h_{1:t} | v_{1:t})
For prediction, one is also often interested in p(v_t | v_{1:u}) for t > u.
(slide courtesy of David Barber)
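As a concrete illustration of filtering (not the lecture's own code), the sketch below computes p(h_t | v_{1:t}) for a stationary HMM with discrete hiddens using the forward (alpha) recursion alpha_t(h) ∝ p(v_t | h) \sum_{h'} p(h | h') alpha_{t-1}(h'), which is what sum-product message passing yields on this model. The parameters p1, A, B and the function forward_filter are made-up examples using the same conventions as the earlier sketches.

import numpy as np

def forward_filter(v, p1, A, B):
    """Return the filtered posteriors p(h_t | v_{1:t}), one row per t."""
    alpha = B[v[0], :] * p1                  # ∝ p(h_1, v_1)
    alpha /= alpha.sum()                     # normalise -> p(h_1 | v_1)
    alphas = [alpha]
    for obs in v[1:]:
        alpha = B[obs, :] * (A @ alpha)      # predict with A, correct with the emission
        alpha /= alpha.sum()
        alphas.append(alpha)
    return np.array(alphas)

p1 = np.array([0.6, 0.4])
A  = np.array([[0.9, 0.3],                   # A[k, k'] = p(h_t = k | h_{t-1} = k')
               [0.1, 0.7]])
B  = np.array([[0.8, 0.1],                   # B[v, k] = p(v_t = v | h_t = k)
               [0.2, 0.9]])
print(forward_filter([0, 0, 1, 1, 0], p1, A, B))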
The classical inference problems
[Figure: timelines for filtering, smoothing, and prediction; for each task the shaded region denotes the extent of data available relative to the time point t being inferred.]
(slide courtesy of Chris Williams)
Factor graph for hidden Markov model
(see tutorial 4)
DAG:
[DAG: h_1 → h_2 → h_3 → h_4, with an edge h_i → v_i for each i]
Factor graph:
[Factor graph: variable nodes h_1, ..., h_4 linked in a chain through the factors p(h_1), p(h_2 | h_1), p(h_3 | h_2), p(h_4 | h_3); each h_i is additionally connected to its visible v_i through the factor p(v_i | h_i).]
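Running sum-product message passing on this chain-structured factor graph with the v_i observed gives, for smoothing, the alpha-beta (forward-backward) recursion covered later in the lecture. The sketch below (illustrative parameters, same conventions as before; forward_backward is a made-up name) computes p(h_t | v_{1:T}) by combining forward messages alpha_t with backward messages beta_t.

import numpy as np

def forward_backward(v, p1, A, B):
    """Return the smoothed posteriors p(h_t | v_{1:T}), one row per t."""
    T, K = len(v), len(p1)
    alpha = np.zeros((T, K))                 # forward messages, alpha_t ∝ p(h_t, v_{1:t})
    beta  = np.ones((T, K))                  # backward messages, beta_t ∝ p(v_{t+1:T} | h_t)
    alpha[0] = B[v[0], :] * p1
    alpha[0] /= alpha[0].sum()               # rescaling for numerical stability
    for t in range(1, T):
        alpha[t] = B[v[t], :] * (A @ alpha[t - 1])
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A.T @ (B[v[t + 1], :] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta                      # ∝ p(h_t | v_{1:T}) pointwise in t
    return post / post.sum(axis=1, keepdims=True)

p1 = np.array([0.6, 0.4])
A  = np.array([[0.9, 0.3],
               [0.1, 0.7]])
B  = np.array([[0.8, 0.1],
               [0.2, 0.9]])
print(forward_backward([0, 0, 1, 1, 0], p1, A, B))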