Hidden Markov Models

Steven J. Zeil
Old Dominion Univ.
Fall 2010
Outline

1. Discrete Markov Processes
2. Hidden Markov Models
3. Inferences from HMMs
   - Evaluation
   - Decoding
4. Training an HMM
   - Baum-Welch Algorithm
   - Model Selection
Introduction

Sequences of input, not i.i.d.:
- Sequences in time: phonemes in a word, words in a sentence, pen movements in handwriting
- Sequences in space: base pairs in DNA
Discrete Markov Processes

- N states: S_1, S_2, ..., S_N
- State at "time" t: q_t = S_i
- First-order Markov property: the probability of entering a state depends only on the most recent prior state:

      P(q_{t+1} = S_j | q_t = S_i, q_{t-1} = S_k, ...) = P(q_{t+1} = S_j | q_t = S_i)

- Transition probabilities are independent of time:

      a_{ij} ≡ P(q_{t+1} = S_j | q_t = S_i),   a_{ij} ≥ 0,   Σ_{j=1}^N a_{ij} = 1

- Initial probabilities:

      π_i ≡ P(q_1 = S_i),   Σ_{i=1}^N π_i = 1
Stochastic Automaton

[Figure: state-transition diagram of a stochastic automaton]
Example: Balls & Urns

Three urns, each full of balls of a single color. A genie moves randomly from urn to urn, selecting balls.

S_1: red, S_2: blue, S_3: green

    π = [0.5, 0.2, 0.3]^T

        | 0.4  0.3  0.3 |
    A = | 0.2  0.6  0.2 |
        | 0.1  0.1  0.8 |

Suppose we observe O = [red, red, green, green].

    P(O | A, π) = P(S_1) P(S_1 | S_1) P(S_3 | S_1) P(S_3 | S_3)
                = π_1 a_11 a_13 a_33
                = 0.5 · 0.4 · 0.3 · 0.8
                = 0.048
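The urn calculation can be checked with a few lines of code. A minimal sketch in Python, encoding states 0, 1, 2 for S_1 (red), S_2 (blue), S_3 (green); the helper name `sequence_prob` is made up for the example:

```python
pi = [0.5, 0.2, 0.3]
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

def sequence_prob(states, pi, A):
    """P(q_1, ..., q_T) = pi[q_1] * product over t of A[q_t][q_{t+1}]."""
    p = pi[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= A[prev][nxt]
    return p

O = [0, 0, 2, 2]  # red, red, green, green
print(sequence_prob(O, pi, A))  # ≈ 0.048, as computed above
```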
Hiding the Model

Now suppose that:
- the urns and the genie are hidden behind a screen,
- the urns start with different mixtures of all three colors, and
- (if we're really unlucky) we don't even know how many urns there are.

Suppose we observe O = [red, red, green, green]. Can we say anything at all?
Hidden Markov Models

- States are not observable.
- Discrete observations [v_1, v_2, ..., v_M] are recorded; each is a probabilistic function of the state.
- Emission probabilities:

      b_j(m) ≡ P(O_t = v_m | q_t = S_j)

- For any given sequence of observations, there may be multiple possible state sequences.
HMM Unfolded in Time

[Figure: an HMM unfolded in time]
Elements of an HMM

An HMM is λ = (A, B, π):
- A = [a_{ij}]: N × N state transition probability matrix (N is the number of hidden states)
- B = [b_j(m)]: N × M emission probability matrix (M is the number of observation symbols)
- π = [π_i]: N × 1 initial state probability vector
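The three elements translate directly into a data structure, and a model can be exercised by sampling from it. A sketch in Python, using a hypothetical 2-state, 3-symbol model (the numbers are illustrative, not from the slides):

```python
import random

N, M = 2, 3
A  = [[0.7, 0.3], [0.4, 0.6]]            # N x N transition matrix
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # N x M emission matrix
pi = [0.6, 0.4]                          # initial state probabilities

def sample(T, seed=None):
    """Generate (states, observations) of length T from lambda = (A, B, pi)."""
    rng = random.Random(seed)
    q = rng.choices(range(N), weights=pi)[0]
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(rng.choices(range(M), weights=B[q])[0])  # emit from b_q
        q = rng.choices(range(N), weights=A[q])[0]          # transition
    return states, obs
```

Only the observation list would be visible to an outside observer; the state list is exactly what is "hidden."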
Making Inferences from an HMM

Evaluation: Given λ and O, calculate P(O | λ).
  Example: Given several HMMs, each trained to recognize a different handwritten character, and given a sequence of pen strokes, which character is most likely denoted by that sequence?

Decoding: Given λ and O, what is the most probable sequence of states leading to that observation?
  Example: Given an HMM trained on sentences, and a sequence of words some of which can belong to multiple syntactic classes (e.g., "green" can be an adjective, a noun, or a verb), determine the most likely syntactic class from surrounding context.

Related problems: most likely starting or ending state.
Decoding Example

What's the weather been? States can be "labeled" even though "hidden".
Evaluation

Given λ and O, calculate P(O | λ).

If we knew the state sequence q, we could compute

    P(O | q, λ) = Π_{t=1}^T P(O_t | q_t, λ) = Π_{t=1}^T b_{q_t}(O_t)

The probability of a state sequence is

    P(q | λ) = π_{q_1} Π_{t=1}^{T-1} a_{q_t q_{t+1}}

so

    P(O, q | λ) = π_{q_1} b_{q_1}(O_1) Π_{t=1}^{T-1} a_{q_t q_{t+1}} b_{q_{t+1}}(O_{t+1})

    P(O | λ) = Σ_{all possible q} P(O, q | λ)

which is totally impractical: there are N^T possible state sequences.
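The impractical sum can be written directly, which makes its cost obvious: the loop runs over all N^T state sequences. A sketch in Python, with an illustrative 2-state model (the numbers are made up for the example, and `brute_force_prob` is a hypothetical name):

```python
import itertools

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

def brute_force_prob(O, A, B, pi):
    """P(O | lambda) by summing P(O, q | lambda) over all N**T state sequences."""
    N, T = len(pi), len(O)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0]][O[0]]
        for t in range(T - 1):
            p *= A[q[t]][q[t + 1]] * B[q[t + 1]][O[t + 1]]
        total += p
    return total

print(brute_force_prob([0, 0, 1], A, B, pi))
```

For N = 10 states and T = 100 observations this would be 10^100 terms; the forward variable, introduced next, reduces the work to O(N²T).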
Forward Variable

    α_t(i) ≡ P(O_1 ... O_t, q_t = S_i | λ)

    P(O | λ) = Σ_{i=1}^N α_T(i)

Computed recursively:
- Initial: α_1(i) = π_i b_i(O_1)
- Recursion: α_{t+1}(j) = [ Σ_{i=1}^N α_t(i) a_{ij} ] b_j(O_{t+1})
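The recursion above translates almost line for line into code. A minimal Python sketch, reusing an illustrative 2-state model (the numbers are made up for the example):

```python
def forward(O, A, B, pi):
    """Forward variables: alpha[t][i] = P(O_1..O_t, q_t = S_i | lambda)."""
    N = len(pi)
    # Initial: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    # Recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(O_{t+1})
    for t in range(1, len(O)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    return alpha

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
alpha = forward([0, 0, 1], A, B, pi)
print(sum(alpha[-1]))  # P(O | lambda): the sum of the final forward variables
```

For long sequences a practical implementation would scale (or work in log space) to avoid underflow, since each α_t shrinks geometrically.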
Decoding

Given λ and O, what is the most probable sequence of states leading to that observation?

Start by introducing a backward variable:

    β_t(i) ≡ P(O_{t+1} ... O_T | q_t = S_i, λ)

- Initial: β_T(i) = 1
- Recursion: β_t(i) = Σ_{j=1}^N a_{ij} b_j(O_{t+1}) β_{t+1}(j)
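The backward recursion runs the same kind of computation from the other end of the sequence. A Python sketch with an illustrative 2-state model (made-up numbers); a useful sanity check is that Σ_i π_i b_i(O_1) β_1(i) equals P(O | λ) from the forward pass:

```python
def backward(O, A, B):
    """Backward variables: beta[t][i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    N = len(A)
    T = len(O)
    beta = [[1.0] * N for _ in range(T)]  # Initial: beta_T(i) = 1
    # Recursion, working from t = T-1 down to t = 1:
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
O  = [0, 0, 1]
beta = backward(O, A, B)
# Sanity check: this should match the forward computation of P(O | lambda).
print(sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(2)))
```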
Viterbi's Algorithm

A constrained optimizer for state graph traversal. A dynamic programming algorithm:
- Assign a cost to each edge.
- Update path metrics by addition from shorter paths, discarding suboptimal cases.
- Starting from the final state, trace back the optimal path.
The HMM Trellis

[Figure: trellis of states unfolded over time]
Viterbi's Algorithm for HMMs

    δ_t(i) ≡ max_{q_1 q_2 ... q_{t-1}} p(q_1 q_2 ... q_{t-1}, q_t = S_i, O_1 ... O_t | λ)

- Initial: δ_1(i) = π_i b_i(O_1),  ψ_1(i) = 0
- Iterate: δ_t(j) = max_i δ_{t-1}(i) a_{ij} b_j(O_t),  ψ_t(j) = argmax_i δ_{t-1}(i) a_{ij}
- Optimum: p* = max_i δ_T(i),  q*_T = argmax_i δ_T(i)
- Backtrack: q*_t = ψ_{t+1}(q*_{t+1}),  t = T-1, ..., 1

Examples:
- Numeric sequence, fixed problem
- Coin flipping, customizable
- Spelling correction as a decoding problem
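The iterate/backtrack steps can be sketched directly in Python. The model numbers below are illustrative, not from the slides; a production version would work with log probabilities to avoid underflow on long sequences:

```python
def viterbi(O, A, B, pi):
    """Most probable state sequence for observations O under lambda = (A, B, pi)."""
    N, T = len(pi), len(O)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]  # delta_1(i)
    psi = [[0] * N]                                  # psi_1(i) = 0 (unused)
    for t in range(1, T):
        new_delta, new_psi = [], []
        for j in range(N):
            # Best predecessor state for S_j at time t:
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            new_psi.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][O[t]])
        delta = new_delta
        psi.append(new_psi)
    # Optimum ending state, then backtrack through psi:
    q = max(range(N), key=lambda i: delta[i])
    path = [q]
    for t in range(T - 1, 0, -1):
        q = psi[t][q]
        path.append(q)
    return list(reversed(path))

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(viterbi([0, 0, 1, 1], A, B, pi))  # decoded state sequence
```

Note the structural similarity to the forward algorithm: the sum over predecessors is simply replaced by a max, plus bookkeeping for the backtrack.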
Training an HMM

We need to estimate the a_{ij}, π_i, and b_j(m) that maximize the likelihood of observing a set of training instances X = {O^k}, k = 1, ..., K.
Baum-Welch Algorithm - Overview

An E-M style algorithm. Repeatedly apply the steps:

E: Use the current λ = (A, B, π) to compute, for each training instance,
   - the probability of being in S_i at time t
   - the probability of making the transition from S_i to S_j at time t+1

M: Update the values of λ = (A, B, π) to maximize the likelihood of matching those probabilities.
From the Training Data

    z_i^t = 1 if q_t = S_i, 0 otherwise

Note that the estimate P̂(q_t = S_i | λ) = (Σ_k z_i^{t,k}) / K.

    z_{ij}^t = 1 if q_t = S_i ∧ q_{t+1} = S_j, 0 otherwise

Note that the estimate P̂(q_t = S_i, q_{t+1} = S_j | λ) = (Σ_k z_{ij}^{t,k}) / K.
From the HMM

    γ_t(i) ≡ P(q_t = S_i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^N α_t(j) β_t(j)

During Baum-Welch, we estimate γ_t^k(i) ≈ E[z_i^t].
From the HMM

    ξ_t(i,j) ≡ P(q_t = S_i, q_{t+1} = S_j | O, λ)
             = α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j) / Σ_k Σ_m α_t(k) a_{km} b_m(O_{t+1}) β_{t+1}(m)

During Baum-Welch, we estimate ξ_t^k(i,j) ≈ E[z_{ij}^t].
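Both posteriors are simple ratios over the forward and backward variables. A Python sketch (`gamma_xi` is a made-up helper name; the α and β values in the check below are hand-computed for a small illustrative 2-state model):

```python
def gamma_xi(O, A, B, alpha, beta):
    """Posteriors gamma_t(i) = P(q_t = S_i | O, lambda) and
    xi_t(i,j) = P(q_t = S_i, q_{t+1} = S_j | O, lambda),
    given precomputed forward (alpha) and backward (beta) variables."""
    N, T = len(A), len(O)
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][j] * beta[t][j] for j in range(N))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(N)])
    xi = []
    for t in range(T - 1):
        raw = [[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                for j in range(N)] for i in range(N)]
        norm = sum(map(sum, raw))
        xi.append([[raw[i][j] / norm for j in range(N)] for i in range(N)])
    return gamma, xi

# Hand-computed alpha/beta for an illustrative 2-state model and O = [0, 0, 1]:
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
O  = [0, 0, 1]
alpha = [[0.45, 0.1], [0.3195, 0.039], [0.023925, 0.0954]]
beta  = [[0.2265, 0.174], [0.31, 0.52], [1.0, 1.0]]
gamma, xi = gamma_xi(O, A, B, alpha, beta)
```

Two properties worth checking: each γ_t is a distribution over states (sums to 1), and marginalizing ξ_t over j recovers γ_t, i.e., Σ_j ξ_t(i,j) = γ_t(i).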
Baum-Welch Algorithm - E

Repeatedly apply the steps:

E: For each O^k,

       γ_t^k(i) ← E[z_i^t]
       ξ_t^k(i,j) ← E[z_{ij}^t]

   then average over all observations:

       γ_t(i) ← (Σ_{k=1}^K γ_t^k(i)) / K
       ξ_t(i,j) ← (Σ_{k=1}^K ξ_t^k(i,j)) / K

M: Update the values of λ = (A, B, π) to maximize the likelihood of matching those probabilities.
Updating A

- The expected number of transitions from S_i to S_j is Σ_t ξ_t(i,j).
- The expected number of times we are in S_i is Σ_t γ_t(i).
- Therefore the estimated probability of the transition from S_i to S_j is

      â_{ij} = Σ_t ξ_t(i,j) / Σ_t γ_t(i)
Updating B

- The expected number of times we see v_m when the system is in S_j is Σ_{t=1}^T γ_t(j) 1(O_t = v_m).
- The expected number of times we are in S_j is Σ_t γ_t(j).
- Therefore the estimated probability of emitting v_m from S_j is

      b̂_j(m) = Σ_k Σ_t γ_t^k(j) 1(O_t^k = v_m) / Σ_k Σ_t γ_t^k(j)
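Putting the update formulas together, one M step for a single training sequence might look like the following sketch (`reestimate` is a made-up name; the γ and ξ values below are small toy numbers constructed to be mutually consistent, not posteriors from real data):

```python
def reestimate(O, gamma, xi, N, M):
    """One Baum-Welch M step for a single sequence:
    pi_i  <- gamma_1(i)
    a_ij  <- sum_t xi_t(i,j) / sum_t gamma_t(i)
    b_j(m) <- sum_t gamma_t(j) 1(O_t = v_m) / sum_t gamma_t(j)"""
    T = len(O)
    pi = [gamma[0][i] for i in range(N)]
    A = [[sum(xi[t][i][j] for t in range(T - 1)) /
          sum(gamma[t][i] for t in range(T - 1))
          for j in range(N)] for i in range(N)]
    B = [[sum(gamma[t][j] for t in range(T) if O[t] == m) /
          sum(gamma[t][j] for t in range(T))
          for m in range(M)] for j in range(N)]
    return pi, A, B

# Toy posteriors, consistent by construction (rows of xi_t sum to gamma_t):
gamma = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
xi = [[[0.2, 0.4], [0.1, 0.3]],
      [[0.1, 0.2], [0.4, 0.3]]]
pi_new, A_new, B_new = reestimate([0, 1, 0], gamma, xi, N=2, M=2)
```

By construction the updated π is a distribution and each row of the updated A and B sums to 1, so every iteration yields a valid model λ = (A, B, π).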