HIDDEN MARKOV MODELS IN SPEECH RECOGNITION
Wayne Ward
Carnegie Mellon University
Pittsburgh, PA
Acknowledgements
Much of this talk is derived from the paper "An Introduction to Hidden Markov Models" by Rabiner and Juang, and from the talk "Hidden Markov Models: Continuous Speech Recognition" by Kai-Fu Lee.
Topics
• Markov Models and Hidden Markov Models
• HMMs applied to speech recognition
• Training
• Decoding
Speech Recognition
[Figure: block diagram. A front end converts analog speech into a sequence of discrete observations O1 O2 … OT; a match/search stage maps the observations to a word sequence W1 W2 … WT.]
ML Continuous Speech Recognition
Goal: Given acoustic data A = a1, a2, ..., ak, find the word sequence W = w1, w2, ..., wn such that P(W | A) is maximized.
Bayes rule:
P(W | A) = P(A | W) • P(W) / P(A)
where P(A | W) is the acoustic model (HMMs), P(W) is the language model, and P(A) is a constant for a complete sentence.
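The decision rule above is just an argmax over word sequences; a minimal sketch in Python follows. The candidate sentences and their log-domain acoustic and language model scores are made-up placeholders, not outputs of a real recognizer.

```python
# Pick the word sequence W maximizing P(A | W) * P(W); P(A) is dropped
# because it is constant over W. Scores below are illustrative assumptions.
candidates = {
    "show all alerts": {"log_p_a_given_w": -120.4, "log_p_w": -7.1},
    "show all alarms": {"log_p_a_given_w": -118.9, "log_p_w": -9.6},
}

best = max(candidates,
           key=lambda w: (candidates[w]["log_p_a_given_w"]
                          + candidates[w]["log_p_w"]))
print(best)
```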
Markov Models
Elements:
States: S = S0, S1, …, SN
Transition probabilities: P(qt = Si | qt-1 = Sj)
Markov assumption: the transition probability depends only on the current state:
P(qt = Si | qt-1 = Sj, qt-2 = Sk, …) = P(qt = Si | qt-1 = Sj) = aji
with aji ≥ 0 for all j, i and Σi aji = 1 for all j.
[Figure: two-state model with states A and B and transitions P(A | A), P(B | A), P(A | B), P(B | B).]
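A minimal sketch of such a two-state Markov model in Python; the numeric transition probabilities are illustrative assumptions, not values from the slide.

```python
import random

# P(next state | current state); depends only on the current state
trans = {"A": {"A": 0.6, "B": 0.4},
         "B": {"A": 0.1, "B": 0.9}}

def sample_chain(start, length):
    seq = [start]
    for _ in range(length - 1):
        nxt = trans[seq[-1]]            # Markov assumption: only current state matters
        seq.append(random.choices(list(nxt), weights=list(nxt.values()))[0])
    return seq

print(sample_chain("A", 10))            # e.g. ['A', 'A', 'B', 'B', ...]
```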
Single Fair Coin
[Figure: two states 1 and 2, each with self-transition probability 0.5 and cross-transition probability 0.5.]
State 1: P(H) = 1.0, P(T) = 0.0
State 2: P(H) = 0.0, P(T) = 1.0
Outcome head corresponds to state 1, tail to state 2.
The observation sequence uniquely defines the state sequence.
Hidden Markov Models
Elements:
States: S = S0, S1, …, SN
Transition probabilities: P(qt = Si | qt-1 = Sj) = aji
Output probability distributions: P(yt = Ok | qt = Sj) = bj(k)   (at state j for symbol k)
[Figure: two-state model with states A and B, transitions P(A | A), P(B | A), P(A | B), P(B | B), and an output probability table P(O1 | ·) … P(OM | ·) attached to each state.]
Discrete Observation HMM
[Figure: three states, each with its own output distribution over the colours R, B, Y.]
State 1: P(R) = 0.31, P(B) = 0.50, P(Y) = 0.19
State 2: P(R) = 0.50, P(B) = 0.25, P(Y) = 0.25
State 3: P(R) = 0.38, P(B) = 0.12, P(Y) = 0.50
Observation sequence: R B Y Y • • • R
The observation sequence is not unique to a state sequence.
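A minimal sketch of generating from such a discrete-observation HMM. The slide only gives the output distributions, so the transition probabilities below are assumptions added for illustration.

```python
import random

symbols = ["R", "B", "Y"]
output = [[0.31, 0.50, 0.19],   # state 1: P(R), P(B), P(Y)
          [0.50, 0.25, 0.25],   # state 2
          [0.38, 0.12, 0.50]]   # state 3
trans  = [[0.6, 0.3, 0.1],      # assumed transition probabilities
          [0.2, 0.6, 0.2],
          [0.1, 0.3, 0.6]]

def sample_hmm(state, length):
    obs = []
    for _ in range(length):
        obs.append(random.choices(symbols, weights=output[state])[0])
        state = random.choices(range(3), weights=trans[state])[0]
    return obs

# Only the colour sequence is observed; the state sequence stays hidden,
# and many different state sequences could have produced it.
print(sample_hmm(0, 8))
```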
HMMs in Speech Recognition
Represent speech as a sequence of observations.
Use an HMM to model some unit of speech (phone, word).
Concatenate units into larger units.
[Figure: a left-to-right phone model for /ih/, and a word model built by concatenating the phone models /d/ /ih/ /d/.]
HMM Problems and Solutions
Evaluation:
• Problem - compute the probability of an observation sequence given a model
• Solution - Forward Algorithm and Viterbi Algorithm
Decoding:
• Problem - find the state sequence which maximizes the probability of the observation sequence
• Solution - Viterbi Algorithm
Training:
• Problem - adjust model parameters to maximize the probability of observed sequences
• Solution - Forward-Backward Algorithm
Evaluation
The probability of observation sequence O = O1 O2 … OT given HMM model λ is:
P(O | λ) = Σ over all Q of P(O | Q, λ) P(Q | λ)
         = Σ over all Q of a_{q0 q1} b_{q1}(O1) a_{q1 q2} b_{q2}(O2) … a_{q(T-1) qT} b_{qT}(OT)
where Q = q0 q1 … qT is a state sequence.
Not practical, since the number of paths is O(N^T), where N is the number of states in the model and T is the number of observations in the sequence.
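A minimal sketch of this direct (impractical) evaluation: enumerate every state sequence of length T and sum the path probabilities. It uses the two-state model of the trellis slides that follow, with state 0 assumed to be the start state and state 1 the final state.

```python
from itertools import product

import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],     # state 0: P(A), P(B)
              [0.3, 0.7]])    # state 1: P(A), P(B)
obs = [0, 0, 1]               # the sequence A A B (0 = A, 1 = B)
start, final = 0, 1

total = 0.0
for q in product(range(2), repeat=len(obs)):   # N**T candidate paths
    if q[-1] != final:                         # must end in the final state
        continue
    p, prev = 1.0, start
    for t, j in enumerate(q):
        p *= A[prev, j] * B[j, obs[t]]
        prev = j
    total += p

print(total)   # ~0.13, the same value the Forward Algorithm computes in O(N^2 T)
```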
The Forward Algorithm
Define α_t(j) = P(O1 O2 … Ot, qt = Sj | λ).
Compute α recursively:
α_0(j) = 1 if j is the start state, 0 otherwise
α_t(j) = [ Σ_{i=0..N} α_{t-1}(i) a_ij ] b_j(Ot)    for t > 0
Then P(O | λ) = α_T(S_N). The computation is O(N² T).
Forward Trellis
Model: state 1 has self-transition 0.6, transition to state 2 with probability 0.4, and outputs P(A) = 0.8, P(B) = 0.2; state 2 has self-transition 1.0 and outputs P(A) = 0.3, P(B) = 0.7. State 1 is the initial state, state 2 the final state. Observation sequence: A A B.

           t=0    t=1 (A)   t=2 (A)   t=3 (B)
state 1    1.0    0.48      0.23      0.03
state 2    0.0    0.12      0.09      0.13
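A minimal sketch of the forward recursion for the trellis above (numpy assumed; state 0 plays the role of state 1 on the slide, state 1 the role of state 2). Running it reproduces the table cell by cell.

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],     # state 0: P(A), P(B)
              [0.3, 0.7]])    # state 1: P(A), P(B)
obs = [0, 0, 1]               # A A B
start, final = 0, 1

def forward(A, B, obs, start):
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0, start] = 1.0                       # alpha_0(j)
    for t in range(1, T + 1):
        for j in range(N):
            # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(O_t)
            alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, obs[t - 1]]
    return alpha

alpha = forward(A, B, obs, start)
print(alpha)                  # reproduces the trellis rows (1.0, 0.48, 0.23, 0.03, ...)
print(alpha[-1, final])       # P(O | lambda) ~= 0.13
```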
The Backward Algorithm
Define β_t(i) = P(O_{t+1} O_{t+2} … O_T | qt = Si, λ).
Compute β recursively:
β_T(i) = 1 if i is the end state, 0 otherwise
β_t(i) = Σ_{j=0..N} a_ij b_j(O_{t+1}) β_{t+1}(j)    for t < T
Then P(O | λ) = β_0(S_0) = α_T(S_N). The computation is O(N² T).
Backward Trellis
Same model and observation sequence (A A B) as the forward trellis.

           t=0    t=1       t=2       t=3
state 1    0.13   0.22      0.28      0.0
state 2    0.06   0.21      0.7       1.0
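A minimal sketch of the backward recursion, using the same model and observations as the forward sketch above (state 1 is the end state).

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
obs = [0, 0, 1]               # A A B
final = 1

def backward(A, B, obs, final):
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T + 1, N))
    beta[T, final] = 1.0                        # beta_T(i)
    for t in range(T - 1, -1, -1):
        for i in range(N):
            # beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
            beta[t, i] = np.sum(A[i, :] * B[:, obs[t]] * beta[t + 1, :])
    return beta

beta = backward(A, B, obs, final)
print(beta)                   # reproduces the trellis rows (0.13, 0.22, 0.28, 0.0, ...)
print(beta[0, 0])             # P(O | lambda) ~= 0.13 again, now from the start state
```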
The Viterbi Algorithm
For decoding: find the state sequence Q which maximizes P(O, Q | λ).
Similar to the Forward Algorithm, except MAX instead of SUM:
VP_t(i) = MAX over q0, …, q_{t-1} of P(O1 O2 … Ot, qt = i | λ)
Recursive computation:
VP_t(j) = MAX_{i=0..N} VP_{t-1}(i) a_ij b_j(Ot)    for t > 0
P(O, Q | λ) = VP_T(S_N)
Save each maximizing predecessor for the backtrace at the end.
Viterbi Trellis
Same model and observation sequence (A A B) as the forward trellis; each cell now holds the best single-path score instead of the sum.

           t=0    t=1 (A)   t=2 (A)   t=3 (B)
state 1    1.0    0.48      0.23      0.03
state 2    0.0    0.12      0.06      0.06
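A minimal sketch of the Viterbi recursion with backtrace, on the same model and observation sequence A A B.

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
obs = [0, 0, 1]
start, final = 0, 1

def viterbi(A, B, obs, start, final):
    N, T = A.shape[0], len(obs)
    vp = np.zeros((T + 1, N))
    back = np.zeros((T + 1, N), dtype=int)      # best predecessor, for backtrace
    vp[0, start] = 1.0
    for t in range(1, T + 1):
        for j in range(N):
            scores = vp[t - 1] * A[:, j]        # MAX instead of SUM
            back[t, j] = int(np.argmax(scores))
            vp[t, j] = scores[back[t, j]] * B[j, obs[t - 1]]
    # trace the best path backwards from the final state
    path = [final]
    for t in range(T, 0, -1):
        path.append(back[t, path[-1]])
    return vp[T, final], path[::-1]

score, path = viterbi(A, B, obs, start, final)
print(score)   # ~0.06, the best single-path score in the trellis
print(path)    # [0, 0, 0, 1]: stay in state 1, move to state 2 on the last step
```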
Training HMM Parameters
Train the parameters of the HMM: tune λ to maximize P(O | λ).
• No efficient algorithm for the global optimum
• An efficient iterative algorithm finds a local optimum
Baum-Welch (Forward-Backward) re-estimation:
• Compute probabilities using the current model λ
• Refine λ → λ̂ based on the computed values
• Uses α and β from the Forward-Backward algorithm
Forward-Backward Algorithm
Define ξ_t(i, j) = probability of transiting from Si to Sj at time t, given O:
ξ_t(i, j) = P(qt = Si, q_{t+1} = Sj | O, λ)
          = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O | λ)
Baum-Welch Reestimation
â_ij = expected number of transitions from Si to Sj / expected number of transitions from Si
     = Σ_{t=0..T-1} ξ_t(i, j) / Σ_{t=0..T-1} Σ_{j=0..N} ξ_t(i, j)

b̂_j(k) = expected number of times in state j with symbol k / expected number of times in state j
       = Σ_{t : O_{t+1} = k} Σ_i ξ_t(i, j) / Σ_{t=0..T-1} Σ_i ξ_t(i, j)
Convergence of FB Algorithm
1. Initialize λ = (A, B)
2. Compute α, β, and ξ
3. Estimate λ̂ = (Â, B̂) from ξ
4. Replace λ with λ̂
5. If not converged, go to 2
It can be shown that P(O | λ̂) > P(O | λ) unless λ̂ = λ.
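A minimal sketch of one Baum-Welch iteration on a single observation sequence, reusing the forward() and backward() sketches above and the same toy model (start state 0, final state 1). A real system pools the ξ counts over many training utterances before re-estimating.

```python
import numpy as np

def reestimate(A, B, obs, start, final):
    N, T = A.shape[0], len(obs)
    alpha = forward(A, B, obs, start)
    beta = backward(A, B, obs, final)
    p_obs = alpha[T, final]                     # P(O | lambda)

    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = np.zeros((T, N, N))
    for t in range(T):
        for i in range(N):
            for j in range(N):
                xi[t, i, j] = (alpha[t, i] * A[i, j] *
                               B[j, obs[t]] * beta[t + 1, j]) / p_obs

    # a_hat: expected transitions i -> j over expected transitions out of i
    trans = xi.sum(axis=0)
    a_hat = trans / trans.sum(axis=1, keepdims=True)

    # b_hat: expected times in state j emitting symbol k over expected times in j
    occ = xi.sum(axis=1)                        # occ[t, j] = sum_i xi_t(i, j)
    b_hat = np.zeros_like(B)
    for t in range(T):
        b_hat[:, obs[t]] += occ[t]
    b_hat /= b_hat.sum(axis=1, keepdims=True)
    return a_hat, b_hat

A_hat, B_hat = A, B
for it in range(5):                             # step 5: iterate until converged
    A_hat, B_hat = reestimate(A_hat, B_hat, obs, start, final)
    print(it, forward(A_hat, B_hat, obs, start)[-1, final])   # never decreases
```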
HMMs in Speech Recognition
Represent speech as a sequence of symbols.
Use an HMM to model some unit of speech (phone, word).
Output probabilities - probability of observing a symbol in a state.
Transition probabilities - probability of staying in or skipping a state.
[Figure: a left-to-right phone model with self-loop, forward, and skip transitions.]
Training HMMs for Continuous Speech
• Use only the orthographic transcription of the sentence; no need for segmented/labelled data
• Concatenate phone models to give a word model (a sketch of this step follows below)
• Concatenate word models to give a sentence model
• Train the entire sentence model on the entire spoken sentence
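A minimal sketch of the concatenation step: left-to-right HMMs are chained by routing the exit probability of each model's last state into the first state of the next model. The 3-state phone topology and its numbers are illustrative assumptions, not the values of any particular system.

```python
import numpy as np

def concatenate(models):
    """Chain transition matrices of left-to-right HMMs in sequence."""
    n = sum(m.shape[0] for m in models)
    A = np.zeros((n, n))
    offset = 0
    for k, m in enumerate(models):
        s = m.shape[0]
        A[offset:offset + s, offset:offset + s] = m
        if k + 1 < len(models):
            exit_p = 1.0 - m[-1].sum()          # probability of leaving this model
            A[offset + s - 1, offset + s] = exit_p
        offset += s
    return A

# a toy 3-state left-to-right phone model; the last row sums to 0.6,
# so 0.4 is the probability of exiting the phone
phone = np.array([[0.4, 0.6, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.6]])

word = concatenate([phone, phone, phone])       # e.g. /d/ /ih/ /d/ as in the word model earlier
print(word.shape)                               # (9, 9): three phones chained in sequence
```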
Forward-Backward Training for Continuous Speech
[Figure: the sentence model for "SHOW ALL ALERTS", built by concatenating the phone models SH OW | AA L | AX L ER TS.]
Recognition Search
[Figure: a search network. The words "what's" (/w/ /ah/ /ts/) and "the" (/th/ /ax/) are expanded into phone models; the network then branches to vocabulary words such as willamette's, kirk's, sterett's, location, longitude, latitude, and display.]