
How to Wreck a Nice Beach: Theory and Practice. Paul Hsu, CSAIL



  1. How to Wreck a Nice Beach: Theory and Practice. Paul Hsu, CSAIL Spoken Language Systems. March 6, 2007

  2. Speech Recognition Today. Dictation: transcribe spoken words to text; support punctuation and correction; Dragon NaturallySpeaking (2004). Interactive Voice Response: system-initiated dialog; Saturday Night Live Mock (2005). 6.Insight - How to Wreck a Nice Beach

  3. Theory

  4. Speech Recognition Overview. [Diagram: Speech Signal, Representation, Search, Recognized Words; Acoustic Models, Lexical Models, Language Models. Search grid: labels a, r, z, m, - against Time (t0 to t8).]
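
The three model boxes on this slide correspond to the usual textbook decomposition of recognition as a search for the most probable word sequence. The formulation below is a sketch of that standard view, not something stated on the slide; the symbols A (acoustic observations), U (phone sequence), and W (word sequence) are assumptions introduced here.

```latex
\begin{aligned}
\hat{W} &= \arg\max_{W} P(W \mid A) \\
        &= \arg\max_{W} P(A \mid W)\, P(W) \\
        &\approx \arg\max_{W} \max_{U} \; P(A \mid U)\, P(U \mid W)\, P(W)
\end{aligned}
```

The second line follows from Bayes' rule, since P(A) does not depend on W. In the third line, P(A | U) plays the role of the acoustic model, P(U | W) the lexical model, and P(W) the language model; the inner max replaces a sum over pronunciations, which is a common approximation.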

  5. Speech Signal Processing. [Figure: Speech Spectrum, Energy (dB) versus Frequency (Hz), 0 to 8000 Hz; text on slide: 9 Davis Square, Somerville.] [Figure: MFCC Features (C0 - C12) versus Frame (1 sec = 100 frames).]
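
The "1 sec = 100 frames" label implies a 10 ms frame step. The sketch below shows only that framing step plus a per-frame log energy in dB; it does not compute MFCCs. The 16 kHz sample rate is inferred from the 8000 Hz upper limit of the spectrum plot, and the 25 ms window length, the function names, and the test tone are assumptions made for illustration.

```python
import math

SAMPLE_RATE = 16000        # assumed: the spectrum plot ends at 8000 Hz (Nyquist)
HOP = SAMPLE_RATE // 100   # 10 ms step, giving 100 frames per second
WINDOW = 400               # assumed 25 ms analysis window


def frame_signal(samples, window=WINDOW, hop=HOP):
    """Split samples into overlapping frames, dropping any incomplete tail."""
    if len(samples) < window:
        return []
    count = 1 + (len(samples) - window) // hop
    return [samples[i * hop:i * hop + window] for i in range(count)]


def log_energy_db(frame, floor=1e-10):
    """Mean squared amplitude of one frame, in dB (floored to avoid log(0))."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(energy, floor))


# Hypothetical input: a 440 Hz tone long enough to yield exactly 100 frames.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE + WINDOW - HOP)]
frames = frame_signal(tone)
energies = [log_energy_db(f) for f in frames]
```

A real front end would apply a window function, an FFT, a mel filterbank, and a cosine transform to each frame to obtain coefficients such as C0 to C12.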

  6. Acoustic Modeling. Techniques: pattern match; dimensionality reduction. Challenges: lots of overlap; data annotation; speaker / accent; noise.

  7. Lexical Modeling. Techniques: dictionary; pronunciation generation. Challenges: missing words (6.Insight); pronunciation variation (Nice, Stata). Dictionary excerpt (word: pronunciation):
     a: (ax | ey)
     beach: b iy ch
     nice: n (iy | ay) s
     recognize: r eh k ax gd n ay z
     speech: s p- iy ch
     stata: s t- (ey | aa) tf ax
     tomato: t ax m (ey | aa) tf ow
     wreck: r eh kd
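
The dictionary entries above use `(x | y)` to mark alternative phones. As a small illustration of pronunciation variation, the sketch below expands each entry into its concrete phone sequences. The entries are copied from the slide; the `LEXICON` and `expand` names and the parsing approach are assumptions, not the format or tooling actually used by the recognizer.

```python
import itertools
import re

# Entries copied from the slide's dictionary excerpt.
LEXICON = {
    "a": "(ax | ey)",
    "beach": "b iy ch",
    "nice": "n (iy | ay) s",
    "recognize": "r eh k ax gd n ay z",
    "speech": "s p- iy ch",
    "stata": "s t- (ey | aa) tf ax",
    "tomato": "t ax m (ey | aa) tf ow",
    "wreck": "r eh kd",
}


def expand(pron):
    """Expand '(x | y)' alternations into every concrete phone sequence."""
    # Each token is either a parenthesized alternation or a single phone.
    tokens = re.findall(r"\([^)]*\)|\S+", pron)
    choices = []
    for tok in tokens:
        if tok.startswith("("):
            choices.append([alt.strip() for alt in tok[1:-1].split("|")])
        else:
            choices.append([tok])
    return [" ".join(seq) for seq in itertools.product(*choices)]


variants = {word: expand(pron) for word, pron in LEXICON.items()}
```

For example, `variants["nice"]` holds the two sequences `n iy s` and `n ay s`, which is the kind of variation listed under the slide's challenges.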

  8. Language Modeling. Purpose: constrain word order; assign probability (1. recognize speech, 2. wreck a nice beach). Techniques: context-free grammar; N-gram. Challenges: data sparsity; domain adaptation. N-gram figure entries (value, words):
     0.036 the
     0.026 a
     0.018 of
     0.007 good
     0.005 day
     0.003 morning
     0.011 a good
     0.003 a morning
     0.086 good morning
     0.026 good day
     0.149 of a
     0.057 of day
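
To make "assign probability" concrete, here is a minimal sketch of a bigram model with add-one smoothing, scoring the two competing transcriptions from the slide. The toy corpus, the function names, and the choice of add-one smoothing are assumptions for illustration; they are not taken from the talk, and the slide's own figure values are not used.

```python
import math
from collections import Counter

# Hypothetical toy training corpus, deliberately biased toward speech topics.
CORPUS = [
    "recognize speech",
    "recognize speech today",
    "we recognize speech",
    "a nice beach",
    "wreck a car",
]


def train_bigram(sentences):
    """Count bigrams and their histories, with <s> and </s> sentence markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, vocab


def log_prob(sentence, model):
    """Log probability of a sentence under add-one smoothed bigrams."""
    histories, bigrams, vocab = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) /
                          (histories[prev] + len(vocab)))
    return total


model = train_bigram(CORPUS)
score_speech = log_prob("recognize speech", model)
score_beach = log_prob("wreck a nice beach", model)
```

With this corpus `score_speech` comes out higher than `score_beach`, partly because the shorter sentence has fewer factors. Smoothing is one simple response to the data sparsity challenge named on the slide; training on text from the target domain is the corresponding response to domain adaptation.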

  9. Search. Techniques: dynamic programming; A* search; pruning. Challenges: huge search space. [Figure: Lexical Nodes (a, r, z, m, -) versus Time (t0 to t8); labels: backtrace; - m a r z -.]
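
The grid of lexical nodes against time, with its backtrace, suggests a Viterbi-style dynamic program. The sketch below aligns the label sequence from the figure to nine frames and recovers the path by backtrace. It covers only the dynamic programming part, with no A* search or pruning; the per-frame scores are made up, and the function and variable names are hypothetical.

```python
import itertools

# Left-to-right label sequence read off the slide figure ("-" taken as silence).
STATES = ["-", "m", "a", "r", "z", "-"]
NEG_INF = float("-inf")


def viterbi_align(frame_scores, states=STATES):
    """Best left-to-right path; frame_scores[t][label] is a log score."""
    n_frames, n_states = len(frame_scores), len(states)
    best = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    best[0][0] = frame_scores[0][states[0]]  # the path must start in state 0
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else NEG_INF
            score = max(stay, advance)
            if score > NEG_INF:
                best[t][s] = score + frame_scores[t][states[s]]
                back[t][s] = s if stay >= advance else s - 1
    # Backtrace from the final state at the last frame.
    path, s = [], n_states - 1
    for t in range(n_frames - 1, -1, -1):
        path.append(states[s])
        s = back[t][s]
    return path[::-1]


# Hypothetical scores for frames t0..t8: high for one label per frame.
truth = ["-", "m", "m", "a", "a", "r", "z", "z", "-"]
scores = [{p: (-0.1 if p == label else -3.0) for p in set(STATES)}
          for label in truth]
path = viterbi_align(scores)
collapsed = [label for label, _ in itertools.groupby(path)]
```

Collapsing repeated labels in `path` gives `- m a r z -`, matching the sequence printed under the figure. The table has one cell per state per frame, which is why pruning matters once the state space covers a full vocabulary.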

  10. Practice

  11. Command & Control. Microsoft Windows Vista Speech Recognition. Features: control PC apps; dictate documents; accessibility. Challenges: constrained commands; user training.

  12. Interactive Dialog Systems. SLS City Browser, http://web.sls.csail.mit.edu/city/ Features: restaurants, POI; free-form dialog; query refinement; multimodal control. Challenges: labor intensive; data collection.

  13. Audio Indexing & Search. SLS Lecture Browser, http://web.sls.csail.mit.edu/lectures/ Features: keyword search; topic segmentation; lecture transcript; A/V navigation. Challenges: disfluencies; jargon.

  14. Mobile Speech Recognition. SLS Pocket SUMMIT Speech Recognizer. Features: small footprint; low CPU/memory. Challenges: noise robustness; limited grammar.

  15. Challenges

  16. Noise Robustness. Microphone quality: close-talking headset; Bluetooth headset; mounted GPS. Environmental/background noise: music; babble; heating vent.

  17. Adaptation. Speaker adaptation: gender; accent. Domain adaptation: GPS navigation; lecture transcription.

  18. Application Diversity. Labor intensive; few applications: weather, flight reservation, restaurants. Can the system automatically generate spoken dialogue systems via user feedback?

  19. Resources

  20. Related Courses. Machine Learning: 6.825, 6.867. Linguistics: 24.901. Natural Language Processing: 6.864. Signal Processing: 6.003, 6.011, 6.341. Algorithms: 6.034, 6.046, 6.851. Acoustic Phonetics: 6.541, 6.543, 6.551, 6.552. Speech Recognition: 6.345. [Diagram repeated from slide 4: Speech Signal, Representation, Search, Recognized Words; Acoustic Models, Lexical Models, Language Models.]

  21. Research Groups @ MIT. CSAIL Spoken Language Systems Group: PIs: James Glass, Stephanie Seneff, Victor Zue; research: speech recognition and dialog systems; http://groups.csail.mit.edu/sls/ RLE Speech Communications Group: PIs: Kenneth Stevens, Stefanie Shattuck-Hufnagel; research: speech production and perception; http://www.rle.mit.edu/speech/

  22. External Opportunities. Companies & Research Labs (alphabetical order): AT&T; BBN; Google; IBM; Microsoft; Nuance; SRI; VoiceSignal Technology; Yahoo; …

  23. Conclusion. To wreck a nice beach, you need: shovel; bulldozer; … Questions? Paul Hsu, bohsu@mit.edu, 32-G442
