Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010
i.i.d. to sequential data
• So far we assumed independent, identically distributed (i.i.d.) data
• Sequential data:
  – Time-series data, e.g. speech
  – Characters in a sentence
  – Base pairs along a DNA strand
Markov Models
• Joint distribution, by the chain rule:
  p(X_1, …, X_n) = p(X_1) p(X_2 | X_1) p(X_3 | X_2, X_1) … p(X_n | X_{n-1}, …, X_1)
• Markov assumption (m-th order): the current observation depends only on the past m observations:
  p(X_t | X_{t-1}, …, X_1) = p(X_t | X_{t-1}, …, X_{t-m})
Markov Models
• Markov assumption:
  1st order: p(X_t | X_{t-1}, …, X_1) = p(X_t | X_{t-1})
  2nd order: p(X_t | X_{t-1}, …, X_1) = p(X_t | X_{t-1}, X_{t-2})
Markov Models
• Markov assumption: # parameters in a stationary model with K-ary variables
  1st order: O(K^2)
  m-th order: O(K^{m+1})
  (n-1)-th order: O(K^n) ≡ no assumptions, i.e. a complete (but directed) graph
• Homogeneous/stationary Markov model: the probabilities don't depend on the time index n
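As a rough sanity check on these counts, the number of free parameters can be tallied directly. A small sketch (the function name is my own; the slide's O(K^{m+1}) counts table entries, while the free parameters are K^m(K-1) after normalization):

```python
def n_free_params(K, m):
    """Free parameters of a stationary m-th order Markov model over K-ary
    variables: one conditional distribution per context (K^m contexts),
    each with K-1 free probabilities (the last is fixed by normalization)."""
    return K**m * (K - 1)

# 1st order, binary chain: 2 contexts x 1 free probability each
print(n_free_params(2, 1))   # -> 2
# growth matches the slide's O(K^(m+1)) scaling
print(n_free_params(6, 2))   # -> 180
```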
Hidden Markov Models
• Distributions that characterize sequential data with few parameters but are not limited by strong Markov assumptions.
  [Graphical model: hidden chain S_1 → S_2 → … → S_{T-1} → S_T, each state S_t emitting an observation O_t]
• Observation space: O_t ∈ {y_1, y_2, …, y_K}
• Hidden states: S_t ∈ {1, …, I}
Hidden Markov Models
• 1st-order Markov assumption on the hidden states {S_t}, t = 1, …, T (can be extended to higher order).
• Note: marginally, O_t depends on all previous observations {O_{t-1}, …, O_1}, so the observations themselves are not Markov of any finite order.
Hidden Markov Models
• Parameters of a stationary/homogeneous Markov model (independent of time t):
  Initial probabilities:    p(S_1 = i) = π_i
  Transition probabilities: p(S_t = j | S_{t-1} = i) = p_ij
  Emission probabilities:   p(O_t = y | S_t = i) = q_i(y)
HMM Example: The Dishonest Casino
A casino has two dice:
  Fair die:   P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
  Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and loaded die about once every 20 turns.
HMM Example
[Figure: a sample hidden state sequence, e.g. L F F F L L L F, with the die rolls each state generates]
State Space Representation
• Switch between F and L about once every 20 turns (1/20 = 0.05)
  [Two-state diagram: F and L, each with self-transition probability 0.95 and switch probability 0.05]
• HMM parameters
  Initial probs:    P(S_1 = L) = 0.5 = P(S_1 = F)
  Transition probs: P(S_t = L | S_{t-1} = L) = P(S_t = F | S_{t-1} = F) = 0.95
                    P(S_t = F | S_{t-1} = L) = P(S_t = L | S_{t-1} = F) = 0.05
  Emission probs:   P(O_t = y | S_t = F) = 1/6,  y = 1, 2, 3, 4, 5, 6
                    P(O_t = y | S_t = L) = 1/10, y = 1, 2, 3, 4, 5
                    P(O_t = 6 | S_t = L) = 1/2
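These parameters can be written down directly. A minimal NumPy sketch (array names `pi`, `A`, `B` are my own; state 0 = Fair, state 1 = Loaded, die faces 0-indexed):

```python
import numpy as np

pi = np.array([0.5, 0.5])            # initial probabilities P(S_1)
A = np.array([[0.95, 0.05],          # A[i, j] = P(S_t = j | S_t-1 = i)
              [0.05, 0.95]])
B = np.array([[1/6] * 6,             # fair die: uniform over faces 1..6
              [0.1] * 5 + [0.5]])    # loaded die: face 6 has probability 1/2

# every row of A and B must be a probability distribution
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```

The same three arrays parameterize every algorithm on the following slides.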
Three main problems in HMMs
• Evaluation: given HMM parameters and an observation sequence, find the probability of the observed sequence.
• Decoding: given HMM parameters and an observation sequence, find the most probable sequence of hidden states.
• Learning: given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the observed data.
HMM Algorithms
• Evaluation: What is the probability of the observed sequence? → Forward algorithm
• Decoding: What is the probability that the third roll was loaded, given the observed sequence? → Forward-Backward algorithm
  What is the most likely die sequence, given the observed sequence? → Viterbi algorithm
• Learning: Under what parameterization is the observed sequence most probable? → Baum-Welch algorithm (EM)
Evaluation Problem
• Given HMM parameters and an observation sequence, find the probability of the observed sequence:
  p(O_1, …, O_T) = ∑_{S_1, …, S_T} p(O_1, …, O_T, S_1, …, S_T)
• This requires summing over all possible hidden state values at all times: K^T terms, exponential in T!
• Instead: compute α_T^k recursively, where α_t^k = p(O_1, …, O_t, S_t = k).
Forward Probability
Compute the forward probability α_t^k = p(O_1, …, O_t, S_t = k) recursively over t:
  α_t^k = ∑_i p(O_1, …, O_t, S_{t-1} = i, S_t = k)                       (introduce S_{t-1})
        = ∑_i p(O_t | S_t = k) p(S_t = k | S_{t-1} = i) p(O_1, …, O_{t-1}, S_{t-1} = i)   (chain rule, Markov assumptions)
        = p(O_t | S_t = k) ∑_i p(S_t = k | S_{t-1} = i) α_{t-1}^i
Forward Algorithm
Can compute α_t^k for all k, t using dynamic programming:
• Initialize: α_1^k = p(O_1 | S_1 = k) p(S_1 = k) for all k
• Iterate: for t = 2, …, T
    α_t^k = p(O_t | S_t = k) ∑_i α_{t-1}^i p(S_t = k | S_{t-1} = i) for all k
• Termination: p(O_1, …, O_T) = ∑_k α_T^k
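The recursion above is a few lines of NumPy. A sketch under the casino parameterization from earlier (function and array names are my own; `alpha[t]` is the slides' α_{t+1} because Python indexes from 0, and observations are 0-based face indices):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: alpha[t, k] = p(O_1, ..., O_{t+1}, S_{t+1} = k)
    with 0-based t, i.e. the slides' alpha_{t+1}^k."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = B[:, obs[0]] * pi                       # initialization
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)     # recursion
    return alpha

# dishonest-casino parameters (state 0 = Fair, 1 = Loaded)
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.05, 0.95]])
B = np.array([[1/6] * 6, [0.1] * 5 + [0.5]])
obs = [5, 5, 5]                                        # three sixes in a row
alpha = forward(pi, A, B, obs)
likelihood = alpha[-1].sum()                           # p(O_1, ..., O_T)
```

The O(K^2) matrix product `alpha[t-1] @ A` implements the sum over the previous state i.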
Decoding Problem 1
• Given HMM parameters and an observation sequence, find the probability that the hidden state at time t was k:
  p(S_t = k | O_1, …, O_T) ∝ p(S_t = k, O_1, …, O_T) = α_t^k β_t^k
• Compute α_t^k and β_t^k recursively.
Backward Probability
Compute the backward probability β_t^k = p(O_{t+1}, …, O_T | S_t = k) recursively over t:
  β_t^k = ∑_i p(O_{t+1}, …, O_T, S_{t+1} = i | S_t = k)                  (introduce S_{t+1})
        = ∑_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) p(O_{t+2}, …, O_T | S_{t+1} = i)   (chain rule, Markov assumptions)
        = ∑_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}^i
Backward Algorithm
Can compute β_t^k for all k, t using dynamic programming:
• Initialize: β_T^k = 1 for all k
• Iterate: for t = T-1, …, 1
    β_t^k = ∑_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}^i for all k
• Termination: p(S_t = k | O_1, …, O_T) = α_t^k β_t^k / ∑_i α_t^i β_t^i
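A matching sketch of the backward pass and the resulting state posterior (same assumed `pi`, `A`, `B` layout as the forward example; the forward pass is repeated so the snippet is self-contained):

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: beta[t, k] = p(O_{t+2}, ..., O_T | S_{t+1} = k)
    with 0-based t, i.e. the slides' beta_{t+1}^k."""
    T, K = len(obs), A.shape[0]
    beta = np.ones((T, K))                              # beta_T^k = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])      # recursion
    return beta

def forward(pi, A, B, obs):
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = B[:, obs[0]] * pi
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)
    return alpha

# dishonest-casino parameters; posterior p(S_t = k | O) ∝ alpha * beta
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.05, 0.95]])
B = np.array([[1/6] * 6, [0.1] * 5 + [0.5]])
obs = [5, 5, 0, 5]
alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
posterior = alpha * beta / (alpha * beta).sum(axis=1, keepdims=True)
```

A useful check: ∑_k α_t^k β_t^k equals the sequence likelihood p(O_1, …, O_T) at every t.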
Most likely state vs. most likely sequence
• Most likely state assignment at time t: argmax_{S_t} p(S_t | O_1, …, O_T)
  E.g. which die was most likely used by the casino in the third roll, given the observed sequence?
• Most likely assignment of the entire state sequence: argmax_{S_1, …, S_T} p(S_1, …, S_T | O_1, …, O_T)
  E.g. what was the most likely sequence of dice used by the casino, given the observed sequence?
• Not the same solution! The most likely assignment of x alone need not match the x-component of the most likely joint assignment of (x, y).
Decoding Problem 2
• Given HMM parameters and an observation sequence, find the most likely assignment of the state sequence:
  argmax_{S_1, …, S_T} p(S_1, …, S_T | O_1, …, O_T)
• Compute V_T^k recursively, where V_t^k is the probability of the most likely sequence of states ending at state S_t = k:
  V_t^k = max_{S_1, …, S_{t-1}} p(S_1, …, S_{t-1}, S_t = k, O_1, …, O_t)
Viterbi Decoding
Compute V_t^k recursively over t:
  V_t^k = max_{S_1, …, S_{t-1}} p(S_1, …, S_{t-1}, S_t = k, O_1, …, O_t)     (Bayes rule / chain rule)
        = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}^i          (Markov assumption)
Viterbi Algorithm
Can compute V_t^k for all k, t using dynamic programming:
• Initialize: V_1^k = p(O_1 | S_1 = k) p(S_1 = k) for all k
• Iterate: for t = 2, …, T
    V_t^k = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}^i for all k
    (record the argmax over i as ptr_t(k) for the traceback)
• Termination: probability of the most likely sequence = max_k V_T^k
  Traceback: S_T* = argmax_k V_T^k, then S_{t-1}* = ptr_t(S_t*) for t = T, …, 2
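The same dynamic program, with max in place of sum and a pointer table for the traceback. A log-space sketch (names are my own; logs avoid underflow on long sequences, and the argmax is unchanged since log is monotone):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state sequence, computed in log space."""
    T, K = len(obs), len(pi)
    logV = np.zeros((T, K))
    ptr = np.zeros((T, K), dtype=int)             # argmax of each recursion
    logV[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = logV[t-1][:, None] + np.log(A)   # scores[i, k]: come from i
        ptr[t] = scores.argmax(axis=0)
        logV[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # traceback from the best final state
    path = [int(logV[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(ptr[t, path[-1]]))
    return path[::-1]

# casino: a long run of sixes should decode to Loaded (state 1) throughout
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.05, 0.95]])
B = np.array([[1/6] * 6, [0.1] * 5 + [0.5]])
path = viterbi(pi, A, B, [5] * 10)
```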
Computational Complexity
• What is the running time of Forward, Forward-Backward, and Viterbi?
  O(K^2 T), linear in T, instead of O(K^T), exponential in T!
Learning Problem
• Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the observed data.
• But the likelihood doesn't factorize, since the observations are not i.i.d.; the hidden variables are the state sequence.
• EM (Baum-Welch) algorithm:
  E-step: fix the parameters, find the expected state assignments
  M-step: fix the expected state assignments, update the parameters
Baum-Welch (EM) Algorithm
• Start with a random initialization of the parameters
• E-step: fix the parameters and find the expected state assignments
  γ_t(i) = p(S_t = i | O_1, …, O_T), computed with the Forward-Backward algorithm
Baum-Welch (EM) Algorithm
• Start with a random initialization of the parameters
• E-step: with γ_t(i) = p(S_t = i | O_1, …, O_T) and ξ_t(i,j) = p(S_t = i, S_{t+1} = j | O_1, …, O_T):
  ∑_{t=1}^{T} γ_t(i)     = expected # times in state i
  ∑_{t=1}^{T-1} γ_t(i)   = expected # transitions from state i
  ∑_{t=1}^{T-1} ξ_t(i,j) = expected # transitions from state i to j
• M-step: re-estimate the parameters from the expected counts:
  π_i = γ_1(i)
  p_ij = ∑_{t=1}^{T-1} ξ_t(i,j) / ∑_{t=1}^{T-1} γ_t(i)
  q_i(y) = ∑_{t: O_t = y} γ_t(i) / ∑_{t=1}^{T} γ_t(i)
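One full EM iteration can be sketched end to end: forward-backward for the expected counts, then the standard re-estimates. This is my own single-sequence implementation (function and variable names assumed), not the lecture's code:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM iteration on a single observation sequence.
    Returns updated (pi, A, B) and the likelihood under the INPUT parameters."""
    T, K = len(obs), len(pi)
    o = np.array(obs)
    # E-step: forward-backward
    alpha = np.zeros((T, K)); beta = np.ones((T, K))
    alpha[0] = B[:, o[0]] * pi
    for t in range(1, T):
        alpha[t] = B[:, o[t]] * (alpha[t-1] @ A)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t+1]] * beta[t+1])
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                 # p(S_t = i | O)
    # xi[t, i, j] = p(S_t = i, S_{t+1} = j | O)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, o[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: re-estimate from expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for y in range(B.shape[1]):
        new_B[:, y] = gamma[o == y].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# one update from the casino parameters on a short roll sequence
pi0 = np.array([0.5, 0.5])
A0 = np.array([[0.95, 0.05], [0.05, 0.95]])
B0 = np.array([[1/6] * 6, [0.1] * 5 + [0.5]])
obs = [5, 5, 0, 5, 1, 5, 5]
new_pi, new_A, new_B, ll = baum_welch_step(pi0, A0, B0, obs)
```

The updated rows remain valid distributions, and by the EM monotonicity guarantee the likelihood cannot decrease from one iteration to the next.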
Some Connections
• HMM & dynamic mixture models: the choice of mixture component depends on the choice of components for the previous observations.
  [Figures: static mixture, one latent variable S_1 generating O_1, replicated over N i.i.d. points; dynamic mixture, a chain S_1 → S_2 → S_3 → … → S_T generating O_1, O_2, O_3, …, O_T]
Some Connections
• HMM vs. linear dynamical systems (Kalman filters)
  HMM: states are discrete; observations are discrete or continuous.
  Linear dynamical systems: observations and states are multivariate Gaussians whose means are linear functions of their parent states (see Bishop, Sec. 13.3).
HMMs: What You Should Know
• Useful for modeling sequential data with few parameters, using discrete hidden states that satisfy the Markov assumption
• Representation: initial probabilities, transition probabilities, emission probabilities; state space representation
• Algorithms for inference and learning in HMMs:
  – Computing the marginal likelihood of the observed sequence: Forward algorithm
  – Predicting a single hidden state: Forward-Backward
  – Predicting an entire sequence of hidden states: Viterbi
  – Learning HMM parameters: an EM algorithm known as Baum-Welch