Speech Processing 15-492/18-492: Speech Recognition Grammars, Other ASR Techniques
  1. Speech Processing 15-492/18-492: Speech Recognition Grammars, Other ASR techniques

  2. But not just acoustics
     • Not all phones are equi-probable
     • Find the word sequence that maximizes P(W|A)
     • Using Bayes' law
     • Combine models
       – Use HMMs to provide the acoustic score P(A|W)
       – Use a language model to provide P(W)
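The Bayes'-law combination on this slide can be sketched as follows. Since P(A) is constant across hypotheses, decoding maximizes P(A|W)·P(W); the candidate sentences and their log-probabilities below are made up for illustration.

```python
# Sketch of the decoding objective: W* = argmax_W P(A|W) * P(W).
# Scores are hypothetical log-probabilities, not real model outputs.
candidates = {
    "recognize speech": {"log_p_acoustic": -12.0, "log_p_lm": -3.0},
    "wreck a nice beach": {"log_p_acoustic": -11.5, "log_p_lm": -7.0},
}

def score(h):
    # log P(A|W) + log P(W); P(A) cancels in the argmax
    return h["log_p_acoustic"] + h["log_p_lm"]

best = max(candidates, key=lambda w: score(candidates[w]))
print(best)  # "recognize speech": the LM outweighs slightly worse acoustics
```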

  3. Beyond n-grams
     • Tri-gram language models
       – Good for general ASR
     • More targeted models for dialog systems
       – Look for more structure
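For reference, a count-based trigram model of the kind the slide calls "good for general ASR" can be sketched in a few lines (no smoothing; the training sentences are illustrative):

```python
from collections import Counter

def train_trigram(sentences):
    """Relative-frequency trigram model with <s> padding (no smoothing)."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(2, len(toks)):
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
            bi[(toks[i - 2], toks[i - 1])] += 1
    def p(w, u, v):  # P(w | u v)
        return tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    return p

p = train_trigram(["i want to fly to boston", "i want to go to boston"])
print(p("to", "i", "want"))  # 1.0 - "to" always follows "i want" in this data
```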

  4. Formal Language Theory
     • Chomsky Hierarchy
       – Finite State Machines
       – Context Free Grammars
       – Context Sensitive Grammars
       – Generalized Rewrite Rules/Turing machines
     • As LM or as understanding mechanism
       – Folded into the ASR or only run on its output

  5. Finite State Machines
     • A trigram is a word^2 FSM
     • FSM for greeting: Hello | Good (Morning | Afternoon)

  6. Finite State Grammar
     • Sentences -> Start Greeting End
     • Greeting -> "Hello"
     • Greeting -> "Good" TOD
     • TOD -> Morning
     • TOD -> Afternoon
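The greeting rules above can be run directly as a finite state acceptor. The state names and transition table below are my own encoding of the slide's rules:

```python
# Finite state acceptor for: "Hello" | "Good" ("Morning" | "Afternoon")
TRANSITIONS = {
    ("start", "hello"): "end",
    ("start", "good"): "tod",
    ("tod", "morning"): "end",
    ("tod", "afternoon"): "end",
}

def accepts(words):
    state = "start"
    for w in words:
        state = TRANSITIONS.get((state, w.lower()))
        if state is None:        # no arc with this label leaves the state
            return False
    return state == "end"        # must finish in the accepting state

print(accepts("Good Morning".split()))  # True
print(accepts("Good Evening".split()))  # False
```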

  7. Context Free Grammar
     • X -> Y Z
     • Y -> "Terminal"
     • Y -> NonTerminal NonTerminal
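A quick illustration of why context free grammars exceed finite state power: the CFG `S -> "a" S "b" | "a" "b"` generates a^n b^n, which no FSM can accept. A recognizer for that language (my own example, not from the slides):

```python
def accepts_anbn(s):
    """Recognizer for the context-free language a^n b^n (n >= 1),
    generated by the CFG  S -> "a" S "b" | "a" "b"."""
    half = len(s) // 2
    return (len(s) >= 2 and len(s) % 2 == 0
            and s[:half] == "a" * half
            and s[half:] == "b" * half)

print(accepts_anbn("aabb"))  # True
print(accepts_anbn("aab"))   # False
```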

  8. JSGF
     • Simple grammar formalism for ASR
     • Standard for writing ASR grammars
     • Actually finite state
     • http://www.w3.org/TR/jsgf
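As a hedged illustration (not taken from the slides), the greeting grammar from slide 6 might be written in JSGF like this:

```
#JSGF V1.0;
grammar greeting;

public <greeting> = hello | good <tod>;
<tod> = morning | afternoon;
```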

  9. Finite State Machines
     • Deterministic
       – Each arc leaving a state has a unique label
       – There always exists a deterministic machine representing a non-deterministic one
     • Minimal
       – There exists an FSM with fewer (or equal) states that accepts the same language

  10. Probabilistic FSMs
     • Each arc has a label and a probability
     • Collect probabilities from data
       – Can do smoothing like n-grams
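Collecting arc probabilities from data amounts to relative-frequency estimation per state. A sketch over the greeting machine, with made-up observations:

```python
from collections import Counter

# Illustrative observed utterances; the state layout follows the greeting FSM.
observed = ["hello", "good morning", "good morning", "good afternoon"]

arc_counts, state_counts = Counter(), Counter()
for utt in observed:
    state = "start"
    for word in utt.split():
        arc_counts[(state, word)] += 1
        state_counts[state] += 1
        state = "tod" if word == "good" else "end"

def arc_prob(state, word):
    # Relative-frequency estimate P(arc | state); could be smoothed like n-grams
    return arc_counts[(state, word)] / state_counts[state]

print(arc_prob("start", "good"))   # 0.75
print(arc_prob("tod", "morning"))  # 2/3
```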

  11. Natural Language Processing
     • Probably mildly context sensitive
       – i.e. you need context sensitive rules
     • But if we only accept context free
       – Probably OK
     • If we only accept finite state
       – Probably OK too

  12. Writing Grammars for Speech
     • What do people say?
       – No, what do people *really* say!
     • Write examples
       – Please, I'd like a flight to Boston
       – I want to fly to Boston
       – What do you have going to Boston
       – What about Boston
       – Boston
     • Write rules grouping things together

  13. Ignore the unimportant things
     • I'm terribly sorry but I would greatly appreciate if you might be able to help me find an acceptable flight to Boston.
     • I, I wanna want to go to ehm Boston.
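One way to "ignore the unimportant things" is to look only for the slot the dialog system cares about and skip everything else. A minimal sketch, assuming a hypothetical city list:

```python
import re

# Illustrative slot extraction: find the destination city, ignore fillers
# and politeness. The city list is a made-up example.
CITIES = {"boston", "pittsburgh", "denver"}

def destination(utt):
    words = re.findall(r"[a-z']+", utt.lower())
    for w in words:
        if w in CITIES:
            return w
    return None

print(destination("I would greatly appreciate an acceptable flight to Boston"))
print(destination("I, I wanna want to go to ehm Boston."))
# Both return "boston" despite very different surface forms.
```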

  14. What do people really say
     • A: see who else will somebody else important all the {mumble} the whole school are out for a week
     • B: oh really
     • A: {lipsmack} {breath} yeah
     • B: okay {breath} well when are you going to come up then
     • A: um let's see well I guess I I could come up actually anytime
     • B: okay well how about now
     • A: now
     • B: yeah
     • A: have to work tonight -laugh-

  15. Class based language models
     • Conflate all words in the same class
       – Cities, names, numbers, etc.
     • Can be automatic or designed
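Conflation can be sketched as replacing each class member with its class token before counting n-grams, so data is shared across members. The class lists here are illustrative:

```python
# Hypothetical designed classes; a class-based LM counts n-grams over
# the conflated tokens instead of the raw words.
CLASSES = {
    "boston": "CITY", "pittsburgh": "CITY",
    "monday": "DAY", "tuesday": "DAY",
}

def conflate(sentence):
    return " ".join(CLASSES.get(w, w) for w in sentence.lower().split())

print(conflate("I want to fly to Boston on Monday"))
# "i want to fly to CITY on DAY" - shares all its n-grams with:
print(conflate("i want to fly to pittsburgh on tuesday"))
```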

  16. Adaptive Language Models
     • Update with new news stories
       – Update your language model every day
     • Update your language model with daily use
       – Using user generated data (if ASR is good)

  17. Combining models
     • Use "background" model
       – General tri-gram model
     • Use specific model
       – Grammar based
       – Very localized
     • Combine
       – Interpolated (just a weight factor)
       – More elaborate combinations: maximum entropy models
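The interpolated ("just a weight factor") combination can be sketched in one line; the probabilities and weight below are hypothetical:

```python
# Linear interpolation of a specific model with a background tri-gram model:
#   P(w|h) = lam * P_specific(w|h) + (1 - lam) * P_background(w|h)
# lam would normally be tuned on held-out data; 0.3 here is arbitrary.
def interpolate(p_background, p_specific, lam=0.3):
    return lam * p_specific + (1 - lam) * p_background

# Hypothetical next-word probabilities for "boston"
print(interpolate(p_background=0.001, p_specific=0.2))  # 0.0607
```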

  18. Vocabulary size
     • Command and control
       – < 100 words, grammar based
     • Simple dialog
       – < 1000 words, grammar/tri-gram
     • Complex dialog
       – < 10K words, tri-gram (some grammar for control)
     • Dictation
       – < 64K words, tri-gram
     • Broadcast News
       – 256K plus, tri-gram (and lots of other possibilities)

  19. Homework 1
     • Build a speech recognition system
       – An acoustic model
       – A pronunciation lexicon
       – A language model
     • Note it takes time to build
     • What is your initial WER?
       – How did you improve it?
     • Submitted by 3:30pm Monday 29th Sep

  20. WFSTs
