Speech Processing 15-492/18-492: Speech Recognition Grammars and Other ASR Techniques
But not just acoustics
• Not all phones (or word sequences) are equi-probable
• Find the word sequence W that maximizes P(W|A) given the acoustics A
• Using Bayes' Law: P(W|A) = P(A|W)P(W)/P(A)
• Combine models
  – Use HMMs to provide the acoustic probability P(A|W)
  – Use a language model to provide the prior P(W)
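A minimal sketch of this combination, with hypothetical scores for two candidate transcriptions (P(A) is constant over candidates, so it can be dropped from the argmax):

```python
import math

def decode(candidates, acoustic_logprob, lm_logprob):
    """Pick the word sequence W maximizing P(A|W) * P(W),
    done in log space to avoid underflow."""
    return max(candidates,
               key=lambda w: acoustic_logprob[w] + lm_logprob[w])

# Assumed scores: the acoustic model slightly prefers "good mourning",
# but the language model strongly prefers "good morning".
acoustic = {"good morning": math.log(0.04), "good mourning": math.log(0.05)}
lm       = {"good morning": math.log(0.01), "good mourning": math.log(0.0001)}

best = decode(acoustic.keys(), acoustic, lm)
```

Here the language model overrules the small acoustic preference, which is exactly why ASR needs more than acoustics.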
Beyond n-grams
• Tri-gram language models
  – Good for general ASR
• More targeted models for dialog systems
  – Look for more structure
Formal Language Theory
• Chomsky Hierarchy
  – Finite State Machines
  – Context Free Grammars
  – Context Sensitive Grammars
  – Generalized Rewrite Rules/Turing machines
• As LM or as understanding mechanism
  – Folded into the ASR or only run on output
Finite State Machines
• Trigram is a word^2 FSM
• FSM for greeting
  [FSM diagram: paths "Hello", "Good Morning", "Good Afternoon"]
Finite State Grammar
• Sentences -> Start Greeting End
• Greeting -> "Hello"
• Greeting -> "Good" TOD
• TOD -> Morning
• TOD -> Afternoon
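The greeting grammar above is finite state, so it can be checked with a simple acceptor; a minimal sketch:

```python
def accepts_greeting(words):
    """Accept word sequences of the finite-state greeting grammar:
    Greeting -> "hello" | "good" ("morning" | "afternoon")."""
    if words == ["hello"]:
        return True
    return (len(words) == 2
            and words[0] == "good"
            and words[1] in ("morning", "afternoon"))
```

In a real recognizer this acceptor would be compiled into the search network rather than run on the output.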
Context Free Grammar
• X -> Y Z
• Y -> "Terminal"
• Y -> NonTerminal NonTerminal
JSGF
• Simple grammar formalism for ASR
• Standard for writing ASR grammars
• Actually finite state
• http://www.w3.org/TR/jsgf
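As an illustration, the greeting grammar could be written in JSGF roughly as follows (grammar name and rule names are chosen for this example):

```
#JSGF V1.0;
grammar greeting;

public <greeting> = hello | good <tod>;
<tod> = morning | afternoon;
```

The `public` rule is the entry point the recognizer matches against; `<tod>` is a private helper rule.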
Finite State Machines
• Deterministic
  – Each arc leaving a state has a unique label
  – There always exists a deterministic machine representing a non-deterministic one
• Minimal
  – There exists an FSM with fewer (or equally many) states that accepts the same language
Probabilistic FSMs
• Each arc has a label and a probability
• Collect probabilities from data
  – Can do smoothing like n-grams
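A sketch of the greeting FSM with probabilities attached to its arcs (the probability values are assumed for illustration):

```python
import math

# state -> {word: (next_state, arc_probability)}
PFSM = {
    "start": {"hello": ("end", 0.5), "good": ("tod", 0.5)},
    "tod":   {"morning": ("end", 0.7), "afternoon": ("end", 0.3)},
}

def sequence_logprob(words):
    """Return log P(words) under the probabilistic FSM,
    or None if the sequence is not accepted."""
    state, logp = "start", 0.0
    for w in words:
        if state not in PFSM or w not in PFSM[state]:
            return None
        state, p = PFSM[state][w]
        logp += math.log(p)
    return logp if state == "end" else None
```

In practice the arc probabilities would be estimated from data and smoothed, as the slide notes.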
Natural Language Processing
• Probably mildly context sensitive
  – i.e. you need context sensitive rules
• But if we only accept context free
  – Probably OK
• If we only accept finite state
  – Probably OK too
Writing Grammars for Speech
• What do people say?
  – No, what do people *really* say!
• Write examples
  – Please, I'd like a flight to Boston
  – I want to fly to Boston
  – What do you have going to Boston
  – What about Boston
  – Boston
• Write rules grouping things together
Ignore the unimportant things
• I'm terribly sorry but I would greatly appreciate if you might be able to help me find an acceptable flight to Boston.
• I, I wanna want to go to ehm Boston.
What do people really say
• A: see who else will somebody else important all the {mumble} the whole school are out for a week
• B: oh really
• A: {lipsmack} {breath} yeah
• B: okay {breath} well when are you going to come up then
• A: um let's see well I guess I I could come up actually anytime
• B: okay well how about now
• A: now
• B: yeah
• A: have to work tonight -laugh-
Class based language models
• Conflate all words in same class
  – Cities, names, numbers, etc.
• Can be automatic or designed
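A minimal sketch of the idea, with a hypothetical CITY class (all word lists and probabilities are assumed): the bigram is estimated over classes, then multiplied by the within-class word probability.

```python
# P(word | prev) = P(class(word) | prev) * P(word | class)
CLASS_OF = {"boston": "CITY", "pittsburgh": "CITY"}

CLASS_BIGRAM = {("to", "CITY"): 0.2}                     # P(CITY | "to")
WORD_GIVEN_CLASS = {"boston": 0.1, "pittsburgh": 0.05}   # P(word | CITY)

def class_lm_prob(prev_word, word):
    """Bigram probability of `word` after `prev_word` under the class model."""
    cls = CLASS_OF.get(word, word)   # words outside any class act as their own class
    p_class = CLASS_BIGRAM.get((prev_word, cls), 0.0)
    p_word = WORD_GIVEN_CLASS.get(word, 1.0)
    return p_class * p_word
```

The payoff is that adding a new city only changes P(word | CITY); the class bigram statistics are shared.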
Adaptive Language Models
• Update with new news stories
  – Update your language model every day
• Update your language model with daily use
  – Using user generated data (if ASR is good)
Combining models
• Use "background" model
  – General tri-gram model
• Use specific model
  – Grammar based
  – Very localized
• Combine
  – Interpolated (just a weight factor)
  – More elaborate combinations
    • Maximum entropy models
Vocabulary size
• Command and control
  – < 100 words, grammar based
• Simple dialog
  – < 1000 words, grammar/tri-gram
• Complex dialog
  – < 10K words, tri-gram (some grammar for control)
• Dictation
  – < 64K words, tri-gram
• Broadcast News
  – 256K plus, tri-gram (and lots of other possibilities)
Homework 1
• Build a speech recognition system
  – An acoustic model
  – A pronunciation lexicon
  – A language model
• Note it takes time to build
• What is your initial WER?
  – How did you improve it?
• Submitted by 3:30pm Monday 29th Sep
WFSTs