HIDDEN MARKOV MODELS IN SPEECH RECOGNITION
Wayne Ward
Carnegie Mellon University
Pittsburgh, PA
Acknowledgements
Much of this talk is derived from the paper "An Introduction to Hidden Markov Models" by Rabiner and Juang, and from the talk "Hidden Markov Models: Continuous Speech Recognition" by Kai-Fu Lee.
Topics
• Markov Models and Hidden Markov Models
• HMMs applied to speech recognition
• Training
• Decoding
Speech Recognition
[Figure: block diagram. A front end converts analog speech into a sequence of discrete observations O1 O2 … OT; a match/search stage maps the observations to a word sequence W1 W2 … WT.]
ML Continuous Speech Recognition
Goal: Given acoustic data A = a1, a2, ..., ak, find the word sequence W = w1, w2, ..., wn such that P(W | A) is maximized.
Bayes rule:
P(W | A) = P(A | W) • P(W) / P(A)
where P(A | W) is the acoustic model (HMMs), P(W) is the language model, and P(A) is a constant for a complete sentence.
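The decision rule above is just an argmax over word sequences; a minimal sketch in Python follows. The candidate sentences and their log-domain acoustic and language model scores are made-up placeholders, not outputs of a real recognizer.

```python
# Pick the word sequence W maximizing P(A | W) * P(W); P(A) is dropped
# because it is constant over W. Scores below are illustrative assumptions.
candidates = {
    "show all alerts": {"log_p_a_given_w": -120.4, "log_p_w": -7.1},
    "show all alarms": {"log_p_a_given_w": -118.9, "log_p_w": -9.6},
}

best = max(candidates,
           key=lambda w: (candidates[w]["log_p_a_given_w"]
                          + candidates[w]["log_p_w"]))
print(best)
```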
Markov Models
Elements:
States: S = S0, S1, …, SN
Transition probabilities: P(qt = Si | qt-1 = Sj)
Markov assumption: the transition probability depends only on the current state:
P(qt = Si | qt-1 = Sj, qt-2 = Sk, …) = P(qt = Si | qt-1 = Sj) = aji
with aji ≥ 0 for all j, i and Σi aji = 1 for all j.
[Figure: two-state model with states A and B and transitions P(A | A), P(B | A), P(A | B), P(B | B).]
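A minimal sketch of such a two-state Markov model in Python; the numeric transition probabilities are illustrative assumptions, not values from the slide.

```python
import random

# P(next state | current state); depends only on the current state
trans = {"A": {"A": 0.6, "B": 0.4},
         "B": {"A": 0.1, "B": 0.9}}

def sample_chain(start, length):
    seq = [start]
    for _ in range(length - 1):
        nxt = trans[seq[-1]]            # Markov assumption: only current state matters
        seq.append(random.choices(list(nxt), weights=list(nxt.values()))[0])
    return seq

print(sample_chain("A", 10))            # e.g. ['A', 'A', 'B', 'B', ...]
```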
Single Fair Coin
[Figure: two states 1 and 2, each with self-transition probability 0.5 and cross-transition probability 0.5.]
State 1: P(H) = 1.0, P(T) = 0.0
State 2: P(H) = 0.0, P(T) = 1.0
Outcome head corresponds to state 1, tail to state 2.
The observation sequence uniquely defines the state sequence.
Hidden Markov Models
Elements:
States: S = S0, S1, …, SN
Transition probabilities: P(qt = Si | qt-1 = Sj) = aji
Output probability distributions: P(yt = Ok | qt = Sj) = bj(k)   (at state j for symbol k)
[Figure: two-state model with states A and B, transitions P(A | A), P(B | A), P(A | B), P(B | B), and an output probability table P(O1 | ·) … P(OM | ·) attached to each state.]
Discrete Observation HMM
[Figure: three states, each with its own output distribution over the colours R, B, Y.]
State 1: P(R) = 0.31, P(B) = 0.50, P(Y) = 0.19
State 2: P(R) = 0.50, P(B) = 0.25, P(Y) = 0.25
State 3: P(R) = 0.38, P(B) = 0.12, P(Y) = 0.50
Observation sequence: R B Y Y • • • R
The observation sequence is not unique to a state sequence.
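A minimal sketch of generating from such a discrete-observation HMM. The slide only gives the output distributions, so the transition probabilities below are assumptions added for illustration.

```python
import random

symbols = ["R", "B", "Y"]
output = [[0.31, 0.50, 0.19],   # state 1: P(R), P(B), P(Y)
          [0.50, 0.25, 0.25],   # state 2
          [0.38, 0.12, 0.50]]   # state 3
trans  = [[0.6, 0.3, 0.1],      # assumed transition probabilities
          [0.2, 0.6, 0.2],
          [0.1, 0.3, 0.6]]

def sample_hmm(state, length):
    obs = []
    for _ in range(length):
        obs.append(random.choices(symbols, weights=output[state])[0])
        state = random.choices(range(3), weights=trans[state])[0]
    return obs

# Only the colour sequence is observed; the state sequence stays hidden,
# and many different state sequences could have produced it.
print(sample_hmm(0, 8))
```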
HMMs in Speech Recognition
Represent speech as a sequence of observations.
Use an HMM to model some unit of speech (phone, word).
Concatenate units into larger units.
[Figure: a left-to-right phone model for /ih/, and a word model built by concatenating the phone models /d/ /ih/ /d/.]
HMM Problems and Solutions
Evaluation:
• Problem - compute the probability of an observation sequence given a model
• Solution - Forward Algorithm and Viterbi Algorithm
Decoding:
• Problem - find the state sequence which maximizes the probability of the observation sequence
• Solution - Viterbi Algorithm
Training:
• Problem - adjust model parameters to maximize the probability of observed sequences
• Solution - Forward-Backward Algorithm
Evaluation
The probability of observation sequence O = O1 O2 … OT given HMM model λ is:
P(O | λ) = Σ over all Q of P(O | Q, λ) P(Q | λ)
         = Σ over all Q of a_{q0 q1} b_{q1}(O1) a_{q1 q2} b_{q2}(O2) … a_{q(T-1) qT} b_{qT}(OT)
where Q = q0 q1 … qT is a state sequence.
Not practical, since the number of paths is O(N^T), where N is the number of states in the model and T is the number of observations in the sequence.
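A minimal sketch of this direct (impractical) evaluation: enumerate every state sequence of length T and sum the path probabilities. It uses the two-state model of the trellis slides that follow, with state 0 assumed to be the start state and state 1 the final state.

```python
from itertools import product

import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],     # state 0: P(A), P(B)
              [0.3, 0.7]])    # state 1: P(A), P(B)
obs = [0, 0, 1]               # the sequence A A B (0 = A, 1 = B)
start, final = 0, 1

total = 0.0
for q in product(range(2), repeat=len(obs)):   # N**T candidate paths
    if q[-1] != final:                         # must end in the final state
        continue
    p, prev = 1.0, start
    for t, j in enumerate(q):
        p *= A[prev, j] * B[j, obs[t]]
        prev = j
    total += p

print(total)   # ~0.13, the same value the Forward Algorithm computes in O(N^2 T)
```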
The Forward Algorithm
Define α_t(j) = P(O1 O2 … Ot, qt = Sj | λ).
Compute α recursively:
α_0(j) = 1 if j is the start state, 0 otherwise
α_t(j) = [ Σ_{i=0..N} α_{t-1}(i) a_ij ] b_j(Ot)    for t > 0
Then P(O | λ) = α_T(S_N). The computation is O(N² T).
Forward Trellis
Model: state 1 has self-transition 0.6, transition to state 2 with probability 0.4, and outputs P(A) = 0.8, P(B) = 0.2; state 2 has self-transition 1.0 and outputs P(A) = 0.3, P(B) = 0.7. State 1 is the initial state, state 2 the final state. Observation sequence: A A B.

           t=0    t=1 (A)   t=2 (A)   t=3 (B)
state 1    1.0    0.48      0.23      0.03
state 2    0.0    0.12      0.09      0.13
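A minimal sketch of the forward recursion for the trellis above (numpy assumed; state 0 plays the role of state 1 on the slide, state 1 the role of state 2). Running it reproduces the table cell by cell.

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],     # state 0: P(A), P(B)
              [0.3, 0.7]])    # state 1: P(A), P(B)
obs = [0, 0, 1]               # A A B
start, final = 0, 1

def forward(A, B, obs, start):
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0, start] = 1.0                       # alpha_0(j)
    for t in range(1, T + 1):
        for j in range(N):
            # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(O_t)
            alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, obs[t - 1]]
    return alpha

alpha = forward(A, B, obs, start)
print(alpha)                  # reproduces the trellis rows (1.0, 0.48, 0.23, 0.03, ...)
print(alpha[-1, final])       # P(O | lambda) ~= 0.13
```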
The Backward Algorithm
Define β_t(i) = P(O_{t+1} O_{t+2} … O_T | qt = Si, λ).
Compute β recursively:
β_T(i) = 1 if i is the end state, 0 otherwise
β_t(i) = Σ_{j=0..N} a_ij b_j(O_{t+1}) β_{t+1}(j)    for t < T
Then P(O | λ) = β_0(S_0) = α_T(S_N). The computation is O(N² T).
Backward Trellis
Same model and observation sequence (A A B) as the forward trellis.

           t=0    t=1       t=2       t=3
state 1    0.13   0.22      0.28      0.0
state 2    0.06   0.21      0.7       1.0
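A minimal sketch of the backward recursion, using the same model and observations as the forward sketch above (state 1 is the end state).

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
obs = [0, 0, 1]               # A A B
final = 1

def backward(A, B, obs, final):
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T + 1, N))
    beta[T, final] = 1.0                        # beta_T(i)
    for t in range(T - 1, -1, -1):
        for i in range(N):
            # beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
            beta[t, i] = np.sum(A[i, :] * B[:, obs[t]] * beta[t + 1, :])
    return beta

beta = backward(A, B, obs, final)
print(beta)                   # reproduces the trellis rows (0.13, 0.22, 0.28, 0.0, ...)
print(beta[0, 0])             # P(O | lambda) ~= 0.13 again, now from the start state
```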
The Viterbi Algorithm
For decoding: find the state sequence Q which maximizes P(O, Q | λ).
Similar to the Forward Algorithm, except MAX instead of SUM:
VP_t(i) = MAX over q0, …, q_{t-1} of P(O1 O2 … Ot, qt = i | λ)
Recursive computation:
VP_t(j) = MAX_{i=0..N} VP_{t-1}(i) a_ij b_j(Ot)    for t > 0
P(O, Q | λ) = VP_T(S_N)
Save each maximizing predecessor for the backtrace at the end.
Viterbi Trellis
Same model and observation sequence (A A B) as the forward trellis; each cell now holds the best single-path score instead of the sum.

           t=0    t=1 (A)   t=2 (A)   t=3 (B)
state 1    1.0    0.48      0.23      0.03
state 2    0.0    0.12      0.06      0.06
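A minimal sketch of the Viterbi recursion with backtrace, on the same model and observation sequence A A B.

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
obs = [0, 0, 1]
start, final = 0, 1

def viterbi(A, B, obs, start, final):
    N, T = A.shape[0], len(obs)
    vp = np.zeros((T + 1, N))
    back = np.zeros((T + 1, N), dtype=int)      # best predecessor, for backtrace
    vp[0, start] = 1.0
    for t in range(1, T + 1):
        for j in range(N):
            scores = vp[t - 1] * A[:, j]        # MAX instead of SUM
            back[t, j] = int(np.argmax(scores))
            vp[t, j] = scores[back[t, j]] * B[j, obs[t - 1]]
    # trace the best path backwards from the final state
    path = [final]
    for t in range(T, 0, -1):
        path.append(back[t, path[-1]])
    return vp[T, final], path[::-1]

score, path = viterbi(A, B, obs, start, final)
print(score)   # ~0.06, the best single-path score in the trellis
print(path)    # [0, 0, 0, 1]: stay in state 1, move to state 2 on the last step
```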
Training HMM Parameters
Train the parameters of the HMM: tune λ to maximize P(O | λ).
• No efficient algorithm for the global optimum
• An efficient iterative algorithm finds a local optimum
Baum-Welch (Forward-Backward) re-estimation:
• Compute probabilities using the current model λ
• Refine λ → λ̂ based on the computed values
• Uses α and β from the Forward-Backward algorithm
Forward-Backward Algorithm
Define ξ_t(i, j) = probability of transiting from Si to Sj at time t, given O:
ξ_t(i, j) = P(qt = Si, q_{t+1} = Sj | O, λ)
          = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O | λ)
Baum-Welch Reestimation
â_ij = expected number of transitions from Si to Sj / expected number of transitions from Si
     = Σ_{t=0..T-1} ξ_t(i, j) / Σ_{t=0..T-1} Σ_{j=0..N} ξ_t(i, j)

b̂_j(k) = expected number of times in state j with symbol k / expected number of times in state j
       = Σ_{t : O_{t+1} = k} Σ_i ξ_t(i, j) / Σ_{t=0..T-1} Σ_i ξ_t(i, j)
Convergence of FB Algorithm
1. Initialize λ = (A, B)
2. Compute α, β, and ξ
3. Estimate λ̂ = (Â, B̂) from ξ
4. Replace λ with λ̂
5. If not converged, go to 2
It can be shown that P(O | λ̂) > P(O | λ) unless λ̂ = λ.
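A minimal sketch of one Baum-Welch iteration on a single observation sequence, reusing the forward() and backward() sketches above and the same toy model (start state 0, final state 1). A real system pools the ξ counts over many training utterances before re-estimating.

```python
import numpy as np

def reestimate(A, B, obs, start, final):
    N, T = A.shape[0], len(obs)
    alpha = forward(A, B, obs, start)
    beta = backward(A, B, obs, final)
    p_obs = alpha[T, final]                     # P(O | lambda)

    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = np.zeros((T, N, N))
    for t in range(T):
        for i in range(N):
            for j in range(N):
                xi[t, i, j] = (alpha[t, i] * A[i, j] *
                               B[j, obs[t]] * beta[t + 1, j]) / p_obs

    # a_hat: expected transitions i -> j over expected transitions out of i
    trans = xi.sum(axis=0)
    a_hat = trans / trans.sum(axis=1, keepdims=True)

    # b_hat: expected times in state j emitting symbol k over expected times in j
    occ = xi.sum(axis=1)                        # occ[t, j] = sum_i xi_t(i, j)
    b_hat = np.zeros_like(B)
    for t in range(T):
        b_hat[:, obs[t]] += occ[t]
    b_hat /= b_hat.sum(axis=1, keepdims=True)
    return a_hat, b_hat

A_hat, B_hat = A, B
for it in range(5):                             # step 5: iterate until converged
    A_hat, B_hat = reestimate(A_hat, B_hat, obs, start, final)
    print(it, forward(A_hat, B_hat, obs, start)[-1, final])   # never decreases
```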
HMMs in Speech Recognition
Represent speech as a sequence of symbols.
Use an HMM to model some unit of speech (phone, word).
Output probabilities - probability of observing a symbol in a state.
Transition probabilities - probability of staying in or skipping a state.
[Figure: a left-to-right phone model with self-loop, forward, and skip transitions.]
Training HMMs for Continuous Speech
• Use only the orthographic transcription of the sentence; no need for segmented/labelled data
• Concatenate phone models to give a word model (a sketch of this step follows below)
• Concatenate word models to give a sentence model
• Train the entire sentence model on the entire spoken sentence
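A minimal sketch of the concatenation step: left-to-right HMMs are chained by routing the exit probability of each model's last state into the first state of the next model. The 3-state phone topology and its numbers are illustrative assumptions, not the values of any particular system.

```python
import numpy as np

def concatenate(models):
    """Chain transition matrices of left-to-right HMMs in sequence."""
    n = sum(m.shape[0] for m in models)
    A = np.zeros((n, n))
    offset = 0
    for k, m in enumerate(models):
        s = m.shape[0]
        A[offset:offset + s, offset:offset + s] = m
        if k + 1 < len(models):
            exit_p = 1.0 - m[-1].sum()          # probability of leaving this model
            A[offset + s - 1, offset + s] = exit_p
        offset += s
    return A

# a toy 3-state left-to-right phone model; the last row sums to 0.6,
# so 0.4 is the probability of exiting the phone
phone = np.array([[0.4, 0.6, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.6]])

word = concatenate([phone, phone, phone])       # e.g. /d/ /ih/ /d/ as in the word model earlier
print(word.shape)                               # (9, 9): three phones chained in sequence
```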
Forward-Backward Training for Continuous Speech
[Figure: the sentence model for "SHOW ALL ALERTS", built by concatenating the phone models SH OW | AA L | AX L ER TS.]
Recognition Search
[Figure: a search network. The words "what's" (/w/ /ah/ /ts/) and "the" (/th/ /ax/) are expanded into phone models; the network then branches to vocabulary words such as willamette's, kirk's, sterett's, location, longitude, latitude, and display.]