8: Hidden Markov Models


  1. 8: Hidden Markov Models. Machine Learning and Real-world Data. Helen Yannakoudakis, Computer Laboratory, University of Cambridge, Lent 2018. Based on slides created by Simone Teufel.

  2. So far we’ve looked at (statistical) classification. Experimented with different ideas for sentiment detection. Let us now talk about . . .

  3. So far we’ve looked at (statistical) classification. Experimented with different ideas for sentiment detection. Let us now talk about . . . the weather!

  4. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day.

  5. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow?

  6. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow? We can use a history of weather observations: P(w_t = Rainy | w_{t-1} = Rainy, w_{t-2} = Cloudy, w_{t-3} = Cloudy, w_{t-4} = Rainy).

  7. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow? We can use a history of weather observations: P(w_t = Rainy | w_{t-1} = Rainy, w_{t-2} = Cloudy, w_{t-3} = Cloudy, w_{t-4} = Rainy). Markov Assumption (first order): P(w_t | w_{t-1}, w_{t-2}, ..., w_1) ≈ P(w_t | w_{t-1}).

  8. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow? We can use a history of weather observations: P(w_t = Rainy | w_{t-1} = Rainy, w_{t-2} = Cloudy, w_{t-3} = Cloudy, w_{t-4} = Rainy). Markov Assumption (first order): P(w_t | w_{t-1}, w_{t-2}, ..., w_1) ≈ P(w_t | w_{t-1}). The joint probability of a sequence of observations/events is then: P(w_1, w_2, ..., w_n) = ∏_{t=1}^{n} P(w_t | w_{t-1}).
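
A minimal Python sketch (not part of the original slides) of the product above: it scores a weather sequence under the first-order Markov assumption. The 0.7/0.3 transition probabilities anticipate the matrix on the next slide; the uniform initial distribution is an assumption made for illustration.

```python
# Sketch: P(w_1, ..., w_n) = P(w_1) * prod_{t=2..n} P(w_t | w_{t-1}).
# Transition probabilities are the 0.7/0.3 values from the Markov Chain slide;
# the uniform initial distribution is an assumption.
TRANSITION = {
    ("Rainy", "Rainy"): 0.7, ("Rainy", "Cloudy"): 0.3,
    ("Cloudy", "Rainy"): 0.3, ("Cloudy", "Cloudy"): 0.7,
}
INITIAL = {"Rainy": 0.5, "Cloudy": 0.5}

def sequence_probability(weather):
    """Probability of a weather sequence under the first-order Markov assumption."""
    prob = INITIAL[weather[0]]
    for prev, curr in zip(weather, weather[1:]):
        prob *= TRANSITION[(prev, curr)]
    return prob

print(sequence_probability(["Rainy", "Cloudy", "Cloudy", "Rainy"]))  # 0.5 * 0.3 * 0.7 * 0.3
```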

  9. Markov Chains. Transition probability matrix (rows: today, columns: tomorrow): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7.

  10. Markov Chains. Two states: rainy and cloudy. Transition probability matrix (rows: today, columns: tomorrow): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7.

  11. Markov Chains. Two states: rainy and cloudy. Transition probability matrix (rows: today, columns: tomorrow): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7. A Markov Chain is a stochastic process that embodies the Markov Assumption. It can be viewed as a probabilistic finite-state automaton: states are fully observable, finite and discrete; transitions are labelled with transition probabilities. It models sequential problems, where your current situation depends on what happened in the past.
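
The finite-state-automaton view can be made concrete by sampling from the chain. The sketch below is an illustration, not code from the course: it uses the transition matrix on the slide, while the start state and sequence length are arbitrary choices.

```python
import random

# The Markov Chain as a probabilistic finite-state automaton: each day's weather
# is sampled given only the previous day's weather (Markov Assumption).
# Transition probabilities are the 0.7/0.3 matrix from the slide.
TRANSITION = {
    "Rainy":  {"Rainy": 0.7, "Cloudy": 0.3},
    "Cloudy": {"Rainy": 0.3, "Cloudy": 0.7},
}

def simulate(start, days):
    """Sample a weather sequence of the given length, starting from `start`."""
    states = [start]
    for _ in range(days - 1):
        options = TRANSITION[states[-1]]
        states.append(random.choices(list(options), weights=list(options.values()))[0])
    return states

print(simulate("Rainy", days=7))
```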

  12. Markov Chains. Useful for modeling the probability of a sequence of events: valid phone sequences in speech recognition; sequences of speech acts in dialog systems (answering, ordering, opposing); predictive texting.

  13. Markov Chains. Useful for modeling the probability of a sequence of events that can be unambiguously observed: valid phone sequences in speech recognition; sequences of speech acts in dialog systems (answering, ordering, opposing); predictive texting.

  14. Markov Chains. Useful for modeling the probability of a sequence of events that can be unambiguously observed: valid phone sequences in speech recognition; sequences of speech acts in dialog systems (answering, ordering, opposing); predictive texting. What if we are interested in events that are not unambiguously observed?

  15. Markov Model. [State-transition diagram: Rainy and Cloudy, each with a self-transition probability of 0.7 and a transition probability of 0.3 to the other state.]

  16. Markov Model: A Time-elapsed view

  17. Hidden Markov Model: A Time-elapsed view. Underlying Markov Chain over hidden states; we only have access to the observations at each time step. There is no 1:1 mapping between observations and hidden states: a number of hidden states can be associated with a particular observation, but the association of states and observations is governed by statistical behaviour. We now have to infer the sequence of hidden states that corresponds to a sequence of observations.

  18. Hidden Markov Model: A Time-elapsed view. Emission probabilities P(o_t | w_t) (observation likelihoods): P(Umbrella | Rainy) = 0.9, P(No umbrella | Rainy) = 0.1, P(Umbrella | Cloudy) = 0.2, P(No umbrella | Cloudy) = 0.8. Transition probabilities P(w_t | w_{t-1}): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7.
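
For reference, the umbrella HMM on this slide can be written out directly. The sketch below just records the two probability tables as Python dictionaries; the marginalisation at the end is an added sanity check, not something from the slides.

```python
# The umbrella HMM from the slide: hidden states are the weather,
# observations are whether an umbrella is seen.
TRANSITION = {                     # P(w_t | w_{t-1})
    "Rainy":  {"Rainy": 0.7, "Cloudy": 0.3},
    "Cloudy": {"Rainy": 0.3, "Cloudy": 0.7},
}
EMISSION = {                       # P(o_t | w_t), the observation likelihoods
    "Rainy":  {"Umbrella": 0.9, "No umbrella": 0.1},
    "Cloudy": {"Umbrella": 0.2, "No umbrella": 0.8},
}

# Example (not on the slide): probability of seeing an umbrella today given
# yesterday was rainy, marginalising over today's hidden weather.
p = sum(TRANSITION["Rainy"][w] * EMISSION[w]["Umbrella"] for w in TRANSITION)
print(p)  # 0.7 * 0.9 + 0.3 * 0.2 = 0.69
```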

  19. Hidden Markov Model: A Time-elapsed view, with start state s_0 and end state s_f. We could use an initial probability distribution over hidden states. Instead, for simplicity, we will also model this probability as a transition, and we will explicitly add a special start state. Similarly, we will add a special end state to explicitly model the end of the sequence. The special start and end states are not associated with “real” observations.

  20. More formal definition of Hidden Markov Models: States and Observations. S_e = {s_1, ..., s_N}: a set of N emitting hidden states; s_0: a special start state; s_f: a special end state. K = {k_1, ..., k_M}: an output alphabet of M observations (“vocabulary”); k_0: a special start symbol; k_f: a special end symbol. O = O_1 ... O_T: a sequence of T observations, each one drawn from K. X = X_1 ... X_T: a sequence of T states, each one drawn from S_e.

  21. More formal definition of Hidden Markov Models: First-order Hidden Markov Model. (1) Markov Assumption (Limited Horizon): transitions depend only on the current state: P(X_t | X_1 ... X_{t-1}) ≈ P(X_t | X_{t-1}). (2) Output Independence: the probability of an output observation depends only on the current state and not on any other states or any other observations: P(O_t | X_1, ..., X_t, ..., X_T, O_1, ..., O_t, ..., O_T) ≈ P(O_t | X_t).
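
Together, the two assumptions mean the joint probability of a state sequence and an observation sequence factorises into per-time-step terms, P(X, O) ≈ ∏_t P(X_t | X_{t-1}) P(O_t | X_t). The self-contained sketch below is an illustration (not code from the course) using the umbrella model; a uniform initial distribution stands in for the explicit start-state transition and is an assumption.

```python
# Sketch: scoring a joint state/observation sequence under the two assumptions,
#   P(X, O) = prod_t P(X_t | X_{t-1}) * P(O_t | X_t).
# The uniform INITIAL distribution replaces the explicit start state and is an assumption.
TRANSITION = {"Rainy": {"Rainy": 0.7, "Cloudy": 0.3},
              "Cloudy": {"Rainy": 0.3, "Cloudy": 0.7}}
EMISSION = {"Rainy": {"Umbrella": 0.9, "No umbrella": 0.1},
            "Cloudy": {"Umbrella": 0.2, "No umbrella": 0.8}}
INITIAL = {"Rainy": 0.5, "Cloudy": 0.5}

def joint_probability(states, observations):
    """P(X, O) for aligned hidden-state and observation sequences of equal length."""
    prob = INITIAL[states[0]] * EMISSION[states[0]][observations[0]]
    for t in range(1, len(states)):
        prob *= TRANSITION[states[t - 1]][states[t]]   # Limited Horizon
        prob *= EMISSION[states[t]][observations[t]]   # Output Independence
    return prob

print(joint_probability(["Rainy", "Rainy", "Cloudy"],
                        ["Umbrella", "Umbrella", "No umbrella"]))
```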

  22. More formal definition of Hidden Markov Models: State Transition Probabilities. A: a state transition probability matrix of size (N+2) × (N+2):

          | –   a_01   a_02   a_03   ...   a_0N   –    |
          | –   a_11   a_12   a_13   ...   a_1N   a_1f |
          | –   a_21   a_22   a_23   ...   a_2N   a_2f |
      A = | –   ...                        ...    ...  |
          | –   a_N1   a_N2   a_N3   ...   a_NN   a_Nf |
          | –   –      –      –      ...   –      –    |

      a_ij is the probability of moving from state s_i to state s_j: a_ij = P(X_t = s_j | X_{t-1} = s_i), with ∑_{j=0}^{N+1} a_ij = 1 for all i.

  24. More formal definition of Hidden Markov Models: Start state s_0 and end state s_f. These are not associated with “real” observations. The a_0i describe transition probabilities out of the start state into state s_i; the a_if describe transition probabilities into the end state. Transitions into the start state (a_i0) and out of the end state (a_fi) are undefined.

  25. More formal definition of Hidden Markov Models: Emission Probabilities. B: an emission probability matrix of size (M+2) × (N+2):

          | b_0(k_0)   –          –          –          ...   –          –        |
          | –          b_1(k_1)   b_2(k_1)   b_3(k_1)   ...   b_N(k_1)   –        |
          | –          b_1(k_2)   b_2(k_2)   b_3(k_2)   ...   b_N(k_2)   –        |
      B = | –          ...                                     ...       –        |
          | –          b_1(k_M)   b_2(k_M)   b_3(k_M)   ...   b_N(k_M)   –        |
          | –          –          –          –          ...   –          b_f(k_f) |

      b_i(k_j) is the probability of emitting vocabulary item k_j from state s_i: b_i(k_j) = P(O_t = k_j | X_t = s_i). Our HMM is defined by its parameters μ = (A, B).
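
To make the bookkeeping with the special start and end states concrete, here is a small sketch of A and B for the umbrella model with explicit s_0 and s_f rows and columns. The end-transition probabilities (and the values they displace) are invented for illustration, so they do not match the slide’s 0.7/0.3 numbers exactly.

```python
import numpy as np

# Transition matrix A of size (N+2) x (N+2), with the special start state s_0
# (first row/column) and end state s_f (last row/column) made explicit.
# The 0.1 end-transition probabilities are illustrative assumptions.
states = ["s0", "Rainy", "Cloudy", "sf"]
A = np.array([
    [0.0, 0.5, 0.5, 0.0],   # a_01, a_02; no direct start -> end transition
    [0.0, 0.6, 0.3, 0.1],   # a_11, a_12, a_1f
    [0.0, 0.3, 0.6, 0.1],   # a_21, a_22, a_2f
    [0.0, 0.0, 0.0, 0.0],   # transitions out of the end state are undefined
])
# Every row except the end state's must be a probability distribution:
assert np.allclose(A[:-1].sum(axis=1), 1.0)

# Emission matrix B of size (M+2) x (N+2): rows are observations
# (k_0, Umbrella, No umbrella, k_f), columns are states (s_0, Rainy, Cloudy, s_f).
B = np.array([
    [1.0, 0.0, 0.0, 0.0],   # b_0(k_0): the start state emits only the start symbol
    [0.0, 0.9, 0.2, 0.0],   # P(Umbrella | state)
    [0.0, 0.1, 0.8, 0.0],   # P(No umbrella | state)
    [0.0, 0.0, 0.0, 1.0],   # b_f(k_f): the end state emits only the end symbol
])
# Each state's column of emission probabilities must sum to 1:
assert np.allclose(B.sum(axis=0), 1.0)
```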

  27. Examples where states are hidden. Speech recognition: observations are the audio signal, states are phonemes. Part-of-speech tagging (assigning tags like Noun and Verb to words): observations are words, states are part-of-speech tags. Machine translation: observations are target words, states are source words.
