8: Hidden Markov Models


  1. 8: Hidden Markov Models. Machine Learning and Real-world Data. Helen Yannakoudakis, Computer Laboratory, University of Cambridge, Lent 2018. Based on slides created by Simone Teufel.

  2. So far we’ve looked at (statistical) classification. Experimented with different ideas for sentiment detection. Let us now talk about . . .

  3. So far we’ve looked at (statistical) classification. Experimented with different ideas for sentiment detection. Let us now talk about . . . the weather!

  4. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day.

  5. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow?

  6. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow? We can use a history of weather observations: P(w_t = Rainy | w_{t-1} = Rainy, w_{t-2} = Cloudy, w_{t-3} = Cloudy, w_{t-4} = Rainy).

  7. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow? We can use a history of weather observations: P(w_t = Rainy | w_{t-1} = Rainy, w_{t-2} = Cloudy, w_{t-3} = Cloudy, w_{t-4} = Rainy). Markov Assumption (first order): P(w_t | w_{t-1}, w_{t-2}, ..., w_1) ≈ P(w_t | w_{t-1}).

  8. Weather prediction. Two types of weather: rainy and cloudy. The weather doesn’t change within the day. Can we guess what the weather will be like tomorrow? We can use a history of weather observations: P(w_t = Rainy | w_{t-1} = Rainy, w_{t-2} = Cloudy, w_{t-3} = Cloudy, w_{t-4} = Rainy). Markov Assumption (first order): P(w_t | w_{t-1}, w_{t-2}, ..., w_1) ≈ P(w_t | w_{t-1}). The joint probability of a sequence of observations/events is then: P(w_1, w_2, ..., w_n) = ∏_{t=1}^{n} P(w_t | w_{t-1}).
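
A minimal Python sketch (not part of the original slides) of the product above: it scores a weather sequence under the first-order Markov assumption. The 0.7/0.3 transition probabilities anticipate the matrix on the next slide; the uniform initial distribution is an assumption made for illustration.

```python
# Sketch: P(w_1, ..., w_n) = P(w_1) * prod_{t=2..n} P(w_t | w_{t-1}).
# Transition probabilities are the 0.7/0.3 values from the Markov Chain slide;
# the uniform initial distribution is an assumption.
TRANSITION = {
    ("Rainy", "Rainy"): 0.7, ("Rainy", "Cloudy"): 0.3,
    ("Cloudy", "Rainy"): 0.3, ("Cloudy", "Cloudy"): 0.7,
}
INITIAL = {"Rainy": 0.5, "Cloudy": 0.5}

def sequence_probability(weather):
    """Probability of a weather sequence under the first-order Markov assumption."""
    prob = INITIAL[weather[0]]
    for prev, curr in zip(weather, weather[1:]):
        prob *= TRANSITION[(prev, curr)]
    return prob

print(sequence_probability(["Rainy", "Cloudy", "Cloudy", "Rainy"]))  # 0.5 * 0.3 * 0.7 * 0.3
```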

  9. Markov Chains. Transition probability matrix (rows: today, columns: tomorrow): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7.

  10. Markov Chains. Two states: rainy and cloudy. Transition probability matrix (rows: today, columns: tomorrow): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7.

  11. Markov Chains. Two states: rainy and cloudy. Transition probability matrix (rows: today, columns: tomorrow): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7. A Markov Chain is a stochastic process that embodies the Markov Assumption. It can be viewed as a probabilistic finite-state automaton: states are fully observable, finite and discrete; transitions are labelled with transition probabilities. It models sequential problems, where your current situation depends on what happened in the past.
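
The finite-state-automaton view can be made concrete by sampling from the chain. The sketch below is an illustration, not code from the course: it uses the transition matrix on the slide, while the start state and sequence length are arbitrary choices.

```python
import random

# The Markov Chain as a probabilistic finite-state automaton: each day's weather
# is sampled given only the previous day's weather (Markov Assumption).
# Transition probabilities are the 0.7/0.3 matrix from the slide.
TRANSITION = {
    "Rainy":  {"Rainy": 0.7, "Cloudy": 0.3},
    "Cloudy": {"Rainy": 0.3, "Cloudy": 0.7},
}

def simulate(start, days):
    """Sample a weather sequence of the given length, starting from `start`."""
    states = [start]
    for _ in range(days - 1):
        options = TRANSITION[states[-1]]
        states.append(random.choices(list(options), weights=list(options.values()))[0])
    return states

print(simulate("Rainy", days=7))
```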

  12. Markov Chains. Useful for modeling the probability of a sequence of events: valid phone sequences in speech recognition; sequences of speech acts in dialog systems (answering, ordering, opposing); predictive texting.

  13. Markov Chains. Useful for modeling the probability of a sequence of events that can be unambiguously observed: valid phone sequences in speech recognition; sequences of speech acts in dialog systems (answering, ordering, opposing); predictive texting.

  14. Markov Chains. Useful for modeling the probability of a sequence of events that can be unambiguously observed: valid phone sequences in speech recognition; sequences of speech acts in dialog systems (answering, ordering, opposing); predictive texting. What if we are interested in events that are not unambiguously observed?

  15. Markov Model. [State-transition diagram: Rainy and Cloudy, each with a self-transition probability of 0.7 and a transition probability of 0.3 to the other state.]

  16. Markov Model: A Time-elapsed view

  17. Hidden Markov Model: A Time-elapsed view. Underlying Markov Chain over hidden states; we only have access to the observations at each time step. There is no 1:1 mapping between observations and hidden states: a number of hidden states can be associated with a particular observation, but the association of states and observations is governed by statistical behaviour. We now have to infer the sequence of hidden states that corresponds to a sequence of observations.

  18. Hidden Markov Model: A Time-elapsed view. Emission probabilities P(o_t | w_t) (observation likelihoods): P(Umbrella | Rainy) = 0.9, P(No umbrella | Rainy) = 0.1, P(Umbrella | Cloudy) = 0.2, P(No umbrella | Cloudy) = 0.8. Transition probabilities P(w_t | w_{t-1}): P(Rainy | Rainy) = 0.7, P(Cloudy | Rainy) = 0.3, P(Rainy | Cloudy) = 0.3, P(Cloudy | Cloudy) = 0.7.
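
For reference, the umbrella HMM on this slide can be written out directly. The sketch below just records the two probability tables as Python dictionaries; the marginalisation at the end is an added sanity check, not something from the slides.

```python
# The umbrella HMM from the slide: hidden states are the weather,
# observations are whether an umbrella is seen.
TRANSITION = {                     # P(w_t | w_{t-1})
    "Rainy":  {"Rainy": 0.7, "Cloudy": 0.3},
    "Cloudy": {"Rainy": 0.3, "Cloudy": 0.7},
}
EMISSION = {                       # P(o_t | w_t), the observation likelihoods
    "Rainy":  {"Umbrella": 0.9, "No umbrella": 0.1},
    "Cloudy": {"Umbrella": 0.2, "No umbrella": 0.8},
}

# Example (not on the slide): probability of seeing an umbrella today given
# yesterday was rainy, marginalising over today's hidden weather.
p = sum(TRANSITION["Rainy"][w] * EMISSION[w]["Umbrella"] for w in TRANSITION)
print(p)  # 0.7 * 0.9 + 0.3 * 0.2 = 0.69
```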

  19. Hidden Markov Model: A Time-elapsed view, with start state s_0 and end state s_f. We could use an initial probability distribution over hidden states. Instead, for simplicity, we will also model this probability as a transition, and we will explicitly add a special start state. Similarly, we will add a special end state to explicitly model the end of the sequence. The special start and end states are not associated with “real” observations.

  20. More formal definition of Hidden Markov Models: States and Observations. S_e = {s_1, ..., s_N}: a set of N emitting hidden states; s_0: a special start state; s_f: a special end state. K = {k_1, ..., k_M}: an output alphabet of M observations (“vocabulary”); k_0: a special start symbol; k_f: a special end symbol. O = O_1 ... O_T: a sequence of T observations, each one drawn from K. X = X_1 ... X_T: a sequence of T states, each one drawn from S_e.

  21. More formal definition of Hidden Markov Models: First-order Hidden Markov Model. (1) Markov Assumption (Limited Horizon): transitions depend only on the current state: P(X_t | X_1 ... X_{t-1}) ≈ P(X_t | X_{t-1}). (2) Output Independence: the probability of an output observation depends only on the current state and not on any other states or any other observations: P(O_t | X_1, ..., X_t, ..., X_T, O_1, ..., O_t, ..., O_T) ≈ P(O_t | X_t).
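
Together, the two assumptions mean the joint probability of a state sequence and an observation sequence factorises into per-time-step terms, P(X, O) ≈ ∏_t P(X_t | X_{t-1}) P(O_t | X_t). The self-contained sketch below is an illustration (not code from the course) using the umbrella model; a uniform initial distribution stands in for the explicit start-state transition and is an assumption.

```python
# Sketch: scoring a joint state/observation sequence under the two assumptions,
#   P(X, O) = prod_t P(X_t | X_{t-1}) * P(O_t | X_t).
# The uniform INITIAL distribution replaces the explicit start state and is an assumption.
TRANSITION = {"Rainy": {"Rainy": 0.7, "Cloudy": 0.3},
              "Cloudy": {"Rainy": 0.3, "Cloudy": 0.7}}
EMISSION = {"Rainy": {"Umbrella": 0.9, "No umbrella": 0.1},
            "Cloudy": {"Umbrella": 0.2, "No umbrella": 0.8}}
INITIAL = {"Rainy": 0.5, "Cloudy": 0.5}

def joint_probability(states, observations):
    """P(X, O) for aligned hidden-state and observation sequences of equal length."""
    prob = INITIAL[states[0]] * EMISSION[states[0]][observations[0]]
    for t in range(1, len(states)):
        prob *= TRANSITION[states[t - 1]][states[t]]   # Limited Horizon
        prob *= EMISSION[states[t]][observations[t]]   # Output Independence
    return prob

print(joint_probability(["Rainy", "Rainy", "Cloudy"],
                        ["Umbrella", "Umbrella", "No umbrella"]))
```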

  22. More formal definition of Hidden Markov Models: State Transition Probabilities. A: a state transition probability matrix of size (N+2) × (N+2):

          | –   a_01   a_02   a_03   ...   a_0N   –    |
          | –   a_11   a_12   a_13   ...   a_1N   a_1f |
          | –   a_21   a_22   a_23   ...   a_2N   a_2f |
      A = | –   ...                        ...    ...  |
          | –   a_N1   a_N2   a_N3   ...   a_NN   a_Nf |
          | –   –      –      –      ...   –      –    |

      a_ij is the probability of moving from state s_i to state s_j: a_ij = P(X_t = s_j | X_{t-1} = s_i), with ∑_{j=0}^{N+1} a_ij = 1 for all i.

  24. More formal definition of Hidden Markov Models: Start state s_0 and end state s_f. These are not associated with “real” observations. The a_0i describe transition probabilities out of the start state into state s_i; the a_if describe transition probabilities into the end state. Transitions into the start state (a_i0) and out of the end state (a_fi) are undefined.

  25. More formal definition of Hidden Markov Models: Emission Probabilities. B: an emission probability matrix of size (M+2) × (N+2):

          | b_0(k_0)   –          –          –          ...   –          –        |
          | –          b_1(k_1)   b_2(k_1)   b_3(k_1)   ...   b_N(k_1)   –        |
          | –          b_1(k_2)   b_2(k_2)   b_3(k_2)   ...   b_N(k_2)   –        |
      B = | –          ...                                     ...       –        |
          | –          b_1(k_M)   b_2(k_M)   b_3(k_M)   ...   b_N(k_M)   –        |
          | –          –          –          –          ...   –          b_f(k_f) |

      b_i(k_j) is the probability of emitting vocabulary item k_j from state s_i: b_i(k_j) = P(O_t = k_j | X_t = s_i). Our HMM is defined by its parameters μ = (A, B).
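
To make the bookkeeping with the special start and end states concrete, here is a small sketch of A and B for the umbrella model with explicit s_0 and s_f rows and columns. The end-transition probabilities (and the values they displace) are invented for illustration, so they do not match the slide’s 0.7/0.3 numbers exactly.

```python
import numpy as np

# Transition matrix A of size (N+2) x (N+2), with the special start state s_0
# (first row/column) and end state s_f (last row/column) made explicit.
# The 0.1 end-transition probabilities are illustrative assumptions.
states = ["s0", "Rainy", "Cloudy", "sf"]
A = np.array([
    [0.0, 0.5, 0.5, 0.0],   # a_01, a_02; no direct start -> end transition
    [0.0, 0.6, 0.3, 0.1],   # a_11, a_12, a_1f
    [0.0, 0.3, 0.6, 0.1],   # a_21, a_22, a_2f
    [0.0, 0.0, 0.0, 0.0],   # transitions out of the end state are undefined
])
# Every row except the end state's must be a probability distribution:
assert np.allclose(A[:-1].sum(axis=1), 1.0)

# Emission matrix B of size (M+2) x (N+2): rows are observations
# (k_0, Umbrella, No umbrella, k_f), columns are states (s_0, Rainy, Cloudy, s_f).
B = np.array([
    [1.0, 0.0, 0.0, 0.0],   # b_0(k_0): the start state emits only the start symbol
    [0.0, 0.9, 0.2, 0.0],   # P(Umbrella | state)
    [0.0, 0.1, 0.8, 0.0],   # P(No umbrella | state)
    [0.0, 0.0, 0.0, 1.0],   # b_f(k_f): the end state emits only the end symbol
])
# Each state's column of emission probabilities must sum to 1:
assert np.allclose(B.sum(axis=0), 1.0)
```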

  27. Examples where states are hidden. Speech recognition: observations are the audio signal, states are phonemes. Part-of-speech tagging (assigning tags like Noun and Verb to words): observations are words, states are part-of-speech tags. Machine translation: observations are target words, states are source words.
