
EECS E6870: Speech Recognition



  1. EECS E6870: Speech Recognition
     Michael Picheny, Stanley F. Chen, Bhuvana Ramabhadran
     IBM T.J. Watson Research Center
     Yorktown Heights, NY, USA
     {picheny,stanchen,bhuvana}@us.ibm.com
     8 September 2009

     What Is Speech Recognition?
     ■ converting speech to text
       ● automatic speech recognition (ASR), speech-to-text (STT)
     ■ what it’s not
       ● speaker recognition: recognizing who is speaking
       ● natural language understanding: understanding what is being said
       ● speech synthesis: converting text to speech (TTS)

     Why Is Speech Recognition Important?
     ■ speech is potentially the fastest way people can communicate with machines
       ● natural; requires no specialized training
       ● can be used in parallel with other modalities
     ■ remote speech access is ubiquitous
       ● not everyone has Internet; everyone has a phone
     ■ archiving/indexing/compressing/understanding human speech
       ● e.g., transcription: legal, medical, TV
       ● e.g., transaction: flight information, name dialing
       ● e.g., embedded: navigation from the car

     Why Is Speech Recognition Important?
     Ways that people communicate

     modality   method                     rate (words/min)
     sound      speech                     150–200
     sight      sign language; gestures    100–150
     touch      typing; mousing            60
     taste      covering self in food      < 1
     smell      not showering              < 1

  2. This Course
     ■ cover fundamentals of ASR in depth (weeks 1–9)
     ■ survey state-of-the-art techniques (weeks 10–13)
     ■ force you, the student, to implement key algorithms in C++
       ● C++ is the international language of ASR
     ■ three lecturers (no TA?)
       ● Michael Picheny
       ● Stanley F. Chen
       ● Bhuvana Ramabhadran
     ■ from IBM T.J. Watson Research Center, Yorktown Heights, NY
       ● hotbed of speech recognition research

     Speech Recognition Is Multidisciplinary
     ■ too much knowledge to fit in one brain
       ● signal processing, machine learning
       ● linguistics
       ● computational linguistics, natural language processing
       ● pattern recognition, artificial intelligence, cognitive science

     Meets Here and Now
     ■ 1300 Mudd; 4:10-6:40pm Tuesday
       ● 5 minute break at 5:25pm
     ■ hardcopy of slides distributed at each lecture
       ● 4 per page

     Assignments
     ■ four programming assignments (80% of grade)
       ● implement key algorithms for ASR in C++ (best supported)
       ● some short written questions
       ● optional exercises for those with excessive leisure time
       ● check, check-plus, check-minus grading
     ■ final reading project (undecided; 20% of grade)
       ● choose paper(s) about topic not covered in depth in course; give 15-minute presentation summarizing paper(s)
       ● programming project
     ■ weekly readings
       ● journal/conference articles; book chapters

  3. Course Outline

     week  topic                                    assigned  due
     1     Introduction
     2     Signal processing; DTW                   lab 1
     3     Gaussian mixture models; HMMs
     4     Hidden Markov Models                     lab 2     lab 1
     5     Language modeling
     6     Pronunciation modeling, Decision Trees   lab 3     lab 2
     7     LVCSR and finite-state transducers
     8     Search                                   lab 4     lab 3
     9     Robustness; Adaptation
     10    Advanced language modeling               project   lab 4
     11    Discriminative training, ROVER
     12    Spoken Document Retrieval, S2S
     13    Project presentations                              project

     Programming Assignments
     ■ C++ (g++ compiler) on x86 PC’s running Linux
       ● knowledge of C++ and Unix helpful
     ■ extensive code infrastructure in C++ with SWIG to make it accessible from Java and Python (provided by IBM)
       ● you, the student, only have to write the “fun” parts
       ● by end of course, you will have written key parts of basic large vocabulary continuous speech recognition system
     ■ get account on ILAB computer cluster
       ● complete the survey
     ■ labs due Wednesday at 6pm

     Readings
     ■ PDF versions of readings will be available on the web site
     ■ recommended text (bookstore):
       ● Speech Synthesis and Recognition, Holmes, 2nd edition (paperback, 256 pp., 2001, ISBN 0748408576) [Holmes]
     ■ reference texts (library, online, bookstore, EE?):
       ● Fundamentals of Speech Recognition, Rabiner, Juang (paperback, 496 pp., 1993, ISBN 0130151572) [R+J]
       ● Speech and Language Processing, Jurafsky, Martin (2nd edition, hardcover, 1024 pp., 2008, ISBN 0131873210) [J+M]
       ● Statistical Methods for Speech Recognition, Jelinek (hardcover, 305 pp., 1998, ISBN 0262100665) [Jelinek]
       ● Spoken Language Processing, Huang, Acero, Hon (paperback, 1008 pp., 2001, ISBN 0130226165) [HAH]

     How To Contact Us
     ■ in E-mail, prefix subject line with “EECS E6870:” !!!
     ■ Michael Picheny: picheny@us.ibm.com
     ■ Stanley F. Chen: stanchen@watson.ibm.com
     ■ Bhuvana Ramabhadran: bhuvana@us.ibm.com
       ● phone: 914-945-2593, 914-945-2976
     ■ office hours: right after class; or before class by appointment
     ■ Courseworks
       ● for posting questions about labs

  4. Web Site
     http://www.ee.columbia.edu/~stanchen/fall09/e6870/
     ■ syllabus
     ■ slides from lectures (PDF)
       ● online by 8pm the night before each lecture
     ■ lab assignments (PDF)
     ■ reading assignments (PDF)
       ● online by lecture they are assigned
       ● password-protected (not working right now)
       ● username: speech, password: pythonrules

     Help Us Help You
     ■ feedback questionnaire after each lecture (2 questions)
       ● feedback welcome any time
     ■ EE’s may find CS parts challenging, and vice versa
     ■ you, the student, are partially responsible for quality of course
     ■ together, we can get through this
     ■ let’s go!

     Outline For Rest of Today
     1. a brief history of speech recognition
     2. speech recognition as pattern classification
        ■ why is speech recognition hard?
     3. speech production and perception
     4. introduction to signal processing

     A Quick Historical Tour
     1. the early years: 1920–1960’s
        ■ ad hoc methods
     2. the birth of modern ASR: 1970–1980’s
        ■ maturation of statistical methods; basic HMM/GMM framework developed
     3. the golden years: 1990’s–now
        ■ more processing power, data
        ■ variations on a theme; tuning
        ■ demand from downstream technologies (search, translation)

  5. The Start of it All
     Radio Rex (1920’s)
     ■ speaker-independent single-word recognizer (“Rex”)
       ● triggered if sufficient energy at 500Hz detected (from “e” in “Rex”)

     The Early Years: 1920–1960’s
     Ad hoc methods
     ■ simple signal processing/feature extraction
       ● detect energy at various frequency bands; or find dominant frequencies
     ■ many ideas central to modern ASR introduced, but not used all together
       ● e.g., statistical training; language modeling
     ■ small vocabulary
       ● digits; yes/no; vowels
     ■ not tested with many speakers (usually < 10)
     ■ error rates < 10%

     The Turning Point
     Whither Speech Recognition? John Pierce, Bell Labs, 1969

     Speech recognition has glamour. Funds have been available. Results have been less glamorous . . .
     . . . General-purpose speech recognition seems far away. Special-purpose speech recognition is severely limited. It would seem appropriate for people to ask themselves why they are working in the field and what they can expect to accomplish . . .
     . . . These considerations lead us to believe that a general phonetic typewriter is simply impossible unless the typewriter has an intelligence and a knowledge of language comparable to those of a native speaker of English . . .

     The Turning Point
     ■ killed ASR research at Bell Labs for many years
     ■ partially served as impetus for first (D)ARPA program (1971–1976) funding ASR research
       ● goal: integrate speech knowledge, linguistics, and AI to make a breakthrough in ASR
       ● large vocabulary: 1000 words; artificial syntax
       ● < 60 × “real time”
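
The Radio Rex trigger ("sufficient energy at 500Hz") and the early feature extractors ("detect energy at various frequency bands") can be illustrated with a small digital sketch. This is only an analogy under stated assumptions: the slides do not say how Radio Rex was built, and the Goertzel recurrence, the 8 kHz sample rate, the 400-sample frame, the 0.1 threshold, and the names `tone`, `band_mean_power`, and `rex_triggered` are all hypothetical choices made here for illustration.

```python
import math


def tone(freq_hz, sample_rate=8000, n=400, amplitude=1.0):
    """Synthesize n samples of a sine wave (hypothetical test signal)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


def band_mean_power(samples, sample_rate, target_hz):
    """Mean power near target_hz, measured with the Goertzel recurrence."""
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(n * target_hz / sample_rate)  # nearest DFT bin to target_hz
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of DFT bin k.
    bin_power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    # Scaled so a sine of amplitude A centred on bin k gives A^2 / 2.
    return 2.0 * bin_power / (n * n)


def rex_triggered(samples, sample_rate=8000, target_hz=500.0, threshold=0.1):
    """Fire when the power near target_hz reaches an assumed threshold."""
    return band_mean_power(samples, sample_rate, target_hz) >= threshold
```

With these assumed settings, `rex_triggered(tone(500.0))` fires while `rex_triggered(tone(1500.0))` does not. Calling `band_mean_power` once per target frequency gives a crude version of the "energy at various frequency bands" features. A detector this simple responds to any sound with enough energy near 500 Hz, not only to the word "Rex".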
