Speech Processing 11-492/18-492: Speech Recognition Grammars


  1. Speech Processing 11-492/18-492
     • Speech Recognition Grammars
     • Other ASR techniques

  2. But not just acoustics
     • Not all phones are equi-probable
     • Find the word sequence W that maximizes P(W | O), for acoustic observations O
     • Using Bayes' Law (worked form below)
     • Combine models:
       – Use HMMs to provide the acoustic likelihood P(O | W)
       – Use a language model to provide the prior P(W)
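
The slide's truncated bullets follow the standard noisy-channel formulation for ASR; a worked form, with O the acoustic observations and W a candidate word sequence (notation assumed, not from the slide image):

    \hat{W} = \arg\max_W P(W \mid O)
            = \arg\max_W \frac{P(O \mid W)\, P(W)}{P(O)}
            = \arg\max_W P(O \mid W)\, P(W)

P(O | W) comes from the HMM acoustic model and P(W) from the language model; P(O) does not depend on W, so it drops out of the maximization.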

  3. Beyond n-grams
     • Tri-gram language models
       – Good for general ASR
     • More targeted models for dialog systems
       – Look for more structure
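
As a hedged illustration of what a tri-gram model computes, a toy maximum-likelihood estimate in Python (the corpus is made up, and real systems add smoothing, as a later slide notes):

    from collections import Counter

    # Toy maximum-likelihood trigram estimate:
    # P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2)
    corpus = "good morning good afternoon good morning".split()
    trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p(w3, w1, w2):
        """Probability of w3 following the history (w1, w2)."""
        h = (w1, w2)
        return trigrams[(w1, w2, w3)] / bigrams[h] if bigrams[h] else 0.0

    print(p("good", "good", "morning"))  # 0.5 on this toy corpus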

  4. Formal Language Theory
     • Chomsky Hierarchy:
       – Finite State Machines
       – Context Free Grammars
       – Context Sensitive Grammars
       – Generalized Rewrite Rules / Turing machines
     • As LM or as understanding mechanism
     • Folded into the ASR, or only run on its output

  5. Finite State Machines
     • A trigram is a word^2 FSM
     • FSM for greeting
       [FSM diagram; arc labels: Hello, Good, Morning, Afternoon]
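
A minimal sketch of that greeting FSM in Python, with states as strings and word-labeled arcs (the state names are illustrative, not from the slide):

    # Arcs are (state, word) -> next state.
    GREETING_FSM = {
        ("start", "hello"): "end",
        ("start", "good"): "tod",       # expects a time-of-day word next
        ("tod", "morning"): "end",
        ("tod", "afternoon"): "end",
    }

    def accepts(words):
        """True if the word sequence is in the greeting language."""
        state = "start"
        for w in words:
            state = GREETING_FSM.get((state, w))
            if state is None:
                return False
        return state == "end"

    assert accepts("good morning".split())
    assert not accepts("good evening".split())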

  6. Finite State Grammar
     • Sentences -> Start Greeting End
     • Greeting -> "Hello"
     • Greeting -> "Good" TOD
     • TOD -> Morning
     • TOD -> Afternoon

  7. Context Free Grammar
     • X -> Y Z
     • Y -> "Terminal"
     • Y -> NonTerminal NonTerminal
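
Rules of this shape are Chomsky normal form, which admits a simple cubic-time recognizer. A minimal CYK sketch in Python; the grammar and test sentences are illustrative, not from the slides:

    # Chomsky-normal-form rules: X -> Y Z (binary) or Y -> "terminal" (lexical).
    LEXICAL = {
        "good": {"Good"},
        "morning": {"TOD"},
        "afternoon": {"TOD"},
    }
    BINARY = {
        ("Good", "TOD"): {"Greeting"},
    }

    def cyk_accepts(words, start="Greeting"):
        n = len(words)
        # chart[i][j] = set of nonterminals deriving words[i..j]
        chart = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            chart[i][i] = set(LEXICAL.get(w, ()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span - 1
                for k in range(i, j):            # split point
                    for y in chart[i][k]:
                        for z in chart[k + 1][j]:
                            chart[i][j] |= BINARY.get((y, z), set())
        return start in chart[0][n - 1]

    assert cyk_accepts("good morning".split())
    assert not cyk_accepts("morning good".split())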

  8. JSGF
     • Simple grammar formalism for ASR
     • Standard for writing ASR grammars
     • Actually finite state
     • http://www.w3.org/TR/jsgf
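
A hedged sketch of what the greeting grammar of slides 5-6 might look like in JSGF, quoted here as a Python string (JSGF itself is plain text; the grammar and rule names are illustrative):

    # Hypothetical JSGF source for the greeting language.
    GREETING_JSGF = """\
    #JSGF V1.0;
    grammar greeting;

    public <greeting> = hello | good <tod>;
    <tod> = morning | afternoon;
    """
    print(GREETING_JSGF)

The alternations and rule references expand to a finite set of sentences, which is why the slide calls the formalism "actually finite state".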

  9. Finite State Machines
     • Deterministic
       – Each arc leaving a state has a unique label
       – There always exists a deterministic machine representing a non-deterministic one
     • Minimal
       – For any FSM there exists a minimal machine, with fewer (or equal) states, that accepts the same language
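
The determinization claim is constructive: the subset construction builds a DFA whose states are sets of NFA states. A minimal sketch (epsilon-free for brevity; the toy NFA is illustrative):

    # Toy epsilon-free NFA: (state, label) -> set of next states.
    NFA = {
        (0, "a"): {0, 1},
        (1, "b"): {2},
    }
    ALPHABET = {"a", "b"}

    def determinize(start_states):
        """Subset construction: DFA states are frozensets of NFA states."""
        start = frozenset(start_states)
        dfa, seen, todo = {}, {start}, [start]
        while todo:
            S = todo.pop()
            for label in ALPHABET:
                T = frozenset(q for s in S for q in NFA.get((s, label), ()))
                if not T:
                    continue
                dfa[(S, label)] = T
                if T not in seen:
                    seen.add(T)
                    todo.append(T)
        return dfa

    for (S, label), T in sorted(determinize({0}).items(), key=str):
        print(set(S), "--" + label + "->", set(T))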

  10. Probabilistic FSMs
     • Each arc has a label and a probability
     • Collect probabilities from data
     • Can do smoothing, as with n-grams
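
A minimal sketch of estimating arc probabilities from observed traversals, with add-one smoothing over each state's outgoing arcs, analogous to n-gram smoothing (the data and state names are made up):

    from collections import Counter

    # Observed (state, arc-label) traversals from hypothetical training data.
    observed = [
        ("start", "hello"), ("start", "good"), ("start", "good"),
        ("tod", "morning"), ("tod", "morning"), ("tod", "afternoon"),
    ]
    arcs = {"start": ["hello", "good"], "tod": ["morning", "afternoon"]}

    counts = Counter(observed)
    prob = {}
    for state, labels in arcs.items():
        # Add-one smoothing over this state's outgoing arcs.
        total = sum(counts[(state, l)] for l in labels) + len(labels)
        for l in labels:
            prob[(state, l)] = (counts[(state, l)] + 1) / total

    print(prob[("start", "good")])  # (2 + 1) / (3 + 2) = 0.6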

  11. Natural Language Processing
     • Probably mildly context sensitive
       – i.e. you need context sensitive rules
     • But if we only accept context free
       – Probably OK
     • If we only accept finite state
       – Probably OK too

  12. Writing Grammars for Speech
     • What do people say?
       – No, what do people *really* say!
     • Write examples:
       – Please, I'd like a flight to Boston
       – I want to fly to Boston
       – What do you have going to Boston
       – What about Boston
       – Boston
     • Write rules grouping things together (see the sketch after this list)
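
A hedged JSGF-style grouping of the example sentences above, again quoted as a Python string (the rule names and city list are illustrative, not from the slides):

    # Hypothetical JSGF rules: the carrier phrases are optional,
    # the destination slot is what matters.
    FLIGHT_JSGF = """\
    #JSGF V1.0;
    grammar flight;

    public <request> = [<preamble>] <city>;
    <preamble> = i want to fly to | what do you have going to | what about;
    <city> = boston | denver | pittsburgh;
    """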

  13. Ignore the unimportant things
     • I'm terribly sorry but I would greatly appreciate if you might be able to help me find an acceptable *flight to Boston*.
     • I, I wanna want to go to ehm *Boston*.

  14. What do people really say
     • A: see who else will somebody else important all the {mumble} the whole school are out for a week
     • B: oh really
     • A: {lipsmack} {breath} yeah
     • B: okay {breath} well when are you going to come up then
     • A: um let's see well I guess I I could come up actually anytime
     • B: okay well how about now
     • A: now
     • B: yeah
     • A: have to work tonight {laugh}

  15. Class based language models
     • Conflate all words in the same class
     • Cities, names, numbers, etc.
     • Classes can be automatic or designed
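
A minimal sketch of class-based conflation: map every class member to its class token before collecting n-gram counts, so "fly to Boston" and "fly to Denver" share statistics (the city list is hypothetical):

    # Any class list (names, numbers, ...) works the same way.
    CITY = {"boston", "denver", "pittsburgh"}

    def conflate(words):
        """Replace class members with their class token before counting."""
        return ["CITY" if w in CITY else w for w in words]

    print(conflate("i want to fly to boston".split()))
    # ['i', 'want', 'to', 'fly', 'to', 'CITY']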

  16. Adaptive Language Models
     • Update with new news stories
     • Update your language model every day
     • Update your language model with daily use
       – Using user-generated data (if the ASR is good)

  17. Combining models
     • Use a "background" model
       – General tri-gram/neural model
     • Use a specific model
       – Grammar based
       – Very localized
     • Combine (see the sketch below)
       – Interpolated (just a weight factor)
       – More elaborate combinations
       – Maximum entropy models
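
A minimal sketch of the interpolated combination: one weight mixes the background and specific next-word distributions (the weight and distributions here are made up):

    # Linear interpolation of two next-word distributions:
    # P(w | h) = lam * P_background(w | h) + (1 - lam) * P_specific(w | h)
    def interpolate(p_bg, p_spec, lam=0.7):
        vocab = set(p_bg) | set(p_spec)
        return {w: lam * p_bg.get(w, 0.0) + (1 - lam) * p_spec.get(w, 0.0)
                for w in vocab}

    mixed = interpolate({"boston": 0.01, "the": 0.2}, {"boston": 0.5})
    print(mixed["boston"])  # 0.7*0.01 + 0.3*0.5 = 0.157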

  18. Vocabulary size
     • Command and control: < 100 words, grammar based
     • Simple dialog: < 1,000 words, grammar/tri-gram
     • Complex dialog: < 10K words, tri-gram (some grammar for control)
     • Dictation: < 64K words, tri-gram
     • Broadcast News: 256K plus, tri-gram/neural (and lots of other possibilities)

  19. Homework 1
     • Build a speech recognition system:
       – An acoustic model
       – A pronunciation lexicon
       – A language model
     • Note: it takes time to build
     • What is your initial WER? (see the sketch below)
     • How did you improve it?
     • Two stages:
       – Fri 25th Sep, 3:30pm: install and run all software
       – Fri 2nd Oct, 3:30pm: final submission
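
For the "initial WER" question, a minimal sketch of word error rate as word-level edit distance normalized by reference length (the sentences are examples, not homework data):

    def wer(ref, hyp):
        """Word error rate: word-level edit distance / reference length."""
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i          # all deletions
        for j in range(len(h) + 1):
            d[0][j] = j          # all insertions
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(r)][len(h)] / len(r)

    print(wer("i want to fly to boston", "i want to fly boston"))  # ~0.167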
