How to Wreck a Nice Beach: Theory and Practice
Paul Hsu, CSAIL Spoken Language Systems
March 6, 2007
Speech Recognition Today
Dictation: transcribe spoken words to text; support punctuation and correction; Dragon NaturallySpeaking (2004)
Interactive Voice Response: system-initiated dialog; Saturday Night Live Mock (2005)
6.Insight - How to Wreck a Nice Beach
Theory
Speech Recognition Overview
[Figure: recognizer block diagram. Speech Signal, Representation, Search, Recognized Words; the search draws on Acoustic Models, Lexical Models, and Language Models. Below it, a search lattice of lexical nodes (a, r, z, m, -) against time (t0 to t8).]
Speech Signal Processing
[Figure: Speech Spectrum, labeled "9 Davis Square, Somerville"; energy (dB) versus frequency (Hz, 0 to 8000).]
[Figure: MFCC Features (C0 - C12) versus frame (1 sec = 100 frames).]
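As a rough illustration of how MFCC features such as C0 - C12 can be computed, the sketch below frames a signal at 100 frames per second, takes a windowed power spectrum, applies a triangular mel filterbank, and decorrelates the log energies with a DCT. The 16 kHz sampling rate is inferred from the 0 to 8000 Hz spectrum axis; the 25 ms window, the 26 filters, and all function names are assumptions made for this example, not details from the talk.

```python
import cmath
import math

SAMPLE_RATE = 16000   # assumed: the slide's spectrum spans 0-8000 Hz
FRAME_SHIFT = 160     # 10 ms shift, i.e. 100 frames per second
FRAME_LEN = 400       # 25 ms analysis window (assumed)
NUM_FILTERS = 26      # mel filterbank size (assumed)
NUM_CEPS = 13         # C0 through C12


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(num_bins):
    """Triangular filters spaced evenly on the mel scale up to Nyquist."""
    top = hz_to_mel(SAMPLE_RATE / 2)
    edges = [mel_to_hz(top * i / (NUM_FILTERS + 1))
             for i in range(NUM_FILTERS + 2)]
    bin_hz = (SAMPLE_RATE / 2) / (num_bins - 1)
    bank = []
    for m in range(1, NUM_FILTERS + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct(values, num_out):
    """Unnormalized DCT-II, keeping the first num_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(num_out)]


def mfcc(signal):
    """Return one list of NUM_CEPS coefficients per frame."""
    bank = mel_filterbank(FRAME_LEN // 2 + 1)
    features = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        spec = power_spectrum(signal[start:start + FRAME_LEN])
        log_energies = [
            math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
            for row in bank
        ]
        features.append(dct(log_energies, NUM_CEPS))
    return features


# A 0.05 s synthetic 440 Hz tone stands in for real speech.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]
feats = mfcc(tone)
print(len(feats), "frames of", len(feats[0]), "coefficients")
```

The naive DFT keeps the sketch dependency-free; a practical front end would use an FFT and typically adds pre-emphasis and liftering, which are omitted here.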
Acoustic Modeling
Techniques: pattern matching, dimensionality reduction
Challenges: lots of overlap, data annotation, speaker / accent, noise
Lexical Modeling
Techniques: dictionary, pronunciation generation
Challenges: missing words (6.Insight), pronunciation variation (Nice, Stata)
Dictionary excerpt (word: pronunciation):
  a: (ax | ey)
  beach: b iy ch
  nice: n (iy | ay) s
  recognize: r eh k ax gd n ay z
  speech: s p- iy ch
  stata: s t- (ey | aa) tf ax
  tomato: t ax m (ey | aa) tf ow
  wreck: r eh kd
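The dictionary notation on this slide, where "(x | y)" marks alternative phones, can be expanded mechanically into concrete pronunciations. The sketch below is one possible way to do that; the `LEXICON` table copies a few entries from the slide, while the `expand` and `pronunciations` helpers are hypothetical names introduced for this example.

```python
import itertools
import re

# A few entries copied from the slide's dictionary excerpt.
LEXICON = {
    "a": "(ax | ey)",
    "beach": "b iy ch",
    "nice": "n (iy | ay) s",
    "tomato": "t ax m (ey | aa) tf ow",
    "wreck": "r eh kd",
}


def expand(pron):
    """Expand '(x | y)' alternations into every concrete phone sequence."""
    slots = []
    for group, phone in re.findall(r"\(([^)]*)\)|(\S+)", pron):
        if group:
            # A parenthesized group offers several alternatives for one slot.
            slots.append([alt.strip() for alt in group.split("|")])
        else:
            slots.append([phone])
    return [" ".join(choice) for choice in itertools.product(*slots)]


def pronunciations(word):
    """Return all pronunciations, or an empty list for a missing word."""
    entry = LEXICON.get(word.lower())
    return expand(entry) if entry is not None else []


for word in ["nice", "tomato", "6.Insight"]:
    print(word, pronunciations(word))
```

The empty result for "6.Insight" mirrors the missing-words challenge: a recognizer built on a fixed dictionary cannot hypothesize a word it has no pronunciation for.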
Language Modeling
Purpose: constrain word order; assign probability (1. recognize speech, 2. wreck a nice beach)
Techniques: context-free grammar, N-gram
Challenges: data sparsity, domain adaptation
N-gram excerpt (other entries elided):
  One-word entries: the 0.036, a 0.026, of 0.018, good 0.007, day 0.005, morning 0.003
  Two-word entries: a good 0.011, a morning 0.003, good morning 0.086, good day 0.026, of a 0.149, of day 0.057
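To make "assign probability" concrete, here is a minimal bigram language model with add-one smoothing, which is one simple response to data sparsity. The four-sentence training corpus is invented for this example and is not data from the talk; the `train` and `log_prob` names are likewise assumptions.

```python
import math
from collections import Counter

# Toy training text, invented for illustration only.
CORPUS = [
    "it is hard to recognize speech",
    "computers recognize speech",
    "good morning",
    "a nice day at the beach",
]


def train(sentences):
    """Count histories and bigrams, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>", "<unk>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return histories, bigrams, vocab


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    histories, bigrams, vocab = model
    words = [w if w in vocab else "<unk>" for w in sentence.split()]
    words = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = histories[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


model = train(CORPUS)
for hypothesis in ["recognize speech", "wreck a nice beach"]:
    print(f"{hypothesis!r}: {log_prob(hypothesis, model):.2f}")
```

With this toy corpus the model prefers "recognize speech", partly because "wreck" is out of vocabulary and partly because the hypothesis is shorter. A real system would be trained on far more text and would normally use a better smoothing method than add-one.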
Search
Techniques: dynamic programming, A* search, pruning
Challenges: huge search space
[Figure: search lattice of lexical nodes (-, m, a, r, z) against time (t0 to t8); a backtrace recovers the path "- m a r z -".]
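The lattice on this slide can be searched with dynamic programming. The sketch below runs a Viterbi pass over the left-to-right node sequence "- m a r z -", applies a simple beam to prune weak hypotheses, and recovers the alignment by backtrace. The per-frame scores are synthetic (0.8 for the intended phone, 0.05 for the others), and the function names and beam width are assumptions for this example. A* search, also listed on the slide, is not shown.

```python
import math

PATH = ["-", "m", "a", "r", "z", "-"]   # left-to-right lexical nodes
PHONES = ["-", "m", "a", "r", "z"]
NEG_INF = float("-inf")


def viterbi(frames, path=PATH, beam=10.0):
    """frames: one dict per time step mapping phone -> log-likelihood.

    Returns (best log score, node label aligned to each frame).
    """
    n = len(path)
    score = [NEG_INF] * n
    score[0] = frames[0][path[0]]          # must start in the first node
    backpointers = []
    for obs in frames[1:]:
        new_score = [NEG_INF] * n
        pointer = [0] * n
        for j in range(n):
            stay = score[j]
            move = score[j - 1] if j > 0 else NEG_INF
            best, pointer[j] = (stay, j) if stay >= move else (move, j - 1)
            if best > NEG_INF:
                new_score[j] = best + obs[path[j]]
        # Beam pruning: drop hypotheses far below the current best.
        top = max(new_score)
        new_score = [s if s >= top - beam else NEG_INF for s in new_score]
        backpointers.append(pointer)
        score = new_score
    if score[-1] == NEG_INF:
        raise ValueError("final node was pruned or is unreachable")
    # Backtrace from the final node to the first frame.
    j = n - 1
    states = [j]
    for pointer in reversed(backpointers):
        j = pointer[j]
        states.append(j)
    states.reverse()
    return score[-1], [path[s] for s in states]


def toy_frames(alignment):
    """Synthetic scores: the intended phone gets 0.8, the others 0.05."""
    return [{p: math.log(0.8 if p == spoken else 0.05) for p in PHONES}
            for spoken in alignment]


truth = ["-", "m", "m", "a", "a", "r", "z", "z", "-"]   # frames t0..t8
best_score, alignment = viterbi(toy_frames(truth))
print(alignment, round(best_score, 3))
```

Each time step only looks at the previous column of scores, so the cost grows linearly with the number of frames rather than with the number of possible paths; the beam trades a small risk of losing the best path for a smaller active search space.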
Practice
Command & Control
Microsoft Windows Vista Speech Recognition
Features: control PC apps, dictate documents, accessibility
Challenges: constrained commands, user training
Interactive Dialog Systems
SLS City Browser: http://web.sls.csail.mit.edu/city/
Features: restaurants and POI, free-form dialog, query refinement, multimodal control
Challenges: labor intensive, data collection
Audio Indexing & Search
SLS Lecture Browser: http://web.sls.csail.mit.edu/lectures/
Features: keyword search, topic segmentation, lecture transcript, A/V navigation
Challenges: disfluencies, jargon
Mobile Speech Recognition
SLS Pocket SUMMIT Speech Recognizer
Features: small footprint, low CPU/memory
Challenges: noise robustness, limited grammar
Challenges
Noise Robustness
Microphone quality: close-talking headset, Bluetooth headset, mounted GPS
Environmental/background noise: music, babble, heating vent
Adaptation
Speaker adaptation: gender, accent
Domain adaptation: GPS navigation, lecture transcription
Application Diversity
Labor intensive
Few applications: weather, flight reservation, restaurants
Can the system automatically generate spoken dialogue systems via user feedback?
Resources
Related Courses
Machine Learning: 6.825, 6.867
Linguistics: 24.901
Natural Language Processing: 6.864
Signal Processing: 6.003, 6.011, 6.341
Algorithms: 6.034, 6.046, 6.851
Acoustic Phonetics: 6.541, 6.543, 6.551, 6.552
Speech Recognition: 6.345
[Figure: the recognizer block diagram (Speech Signal, Representation, Search, Recognized Words; Acoustic, Lexical, and Language Models) annotated with the courses above.]
Research Groups @ MIT
CSAIL Spoken Language Systems Group
  PIs: James Glass, Stephanie Seneff, Victor Zue
  Research: speech recognition and dialog systems
  http://groups.csail.mit.edu/sls/
RLE Speech Communications Group
  PIs: Kenneth Stevens, Stefanie Shattuck-Hufnagel
  Research: speech production and perception
  http://www.rle.mit.edu/speech/
External Opportunities
Companies & research labs (alphabetical order): AT&T, BBN, Google, IBM, Microsoft, Nuance, SRI, VoiceSignal Technology, Yahoo, …
Conclusion
To wreck a nice beach, you need: shovel, bulldozer, …
Questions?
Paul Hsu, bohsu@mit.edu, 32-G442