Speech Technology for Mobile Phones Part I : ASR and TTS on the Mobile Phone Rajesh M. Hegde rhegde@iitk.ac.in Associate Professor Dept. of EE Indian Institute of Technology Kanpur Several pictures used in this presentation have been collected from various sources available on the web and have been acknowledged in the slides.
Topics Covered • How can speech technology be used for developing applications on a mobile phone? • What are automatic speech recognition (ASR) and text-to-speech synthesis (TTS)? • What are the challenges in implementing ASR and TTS systems on a mobile phone? • What are the potential applications of speech technology in delivering personalized services on a mobile phone?
Broad Objectives of Speech Technology for Machines Speech to Text (ASR) Text to Speech (TTS) Source : Reynolds et al., Apple developer page
Speech Recognition for Mobile Phones • Speech recognition converts a speech signal, acquired by a mobile phone, to a sequence of words. • The recognition output can be used in command and control, email, search, and communication. • This output can also be used in dialog management and natural language understanding. • What you can do with it : Dictation, Call routing, Directory assistance, Travel planning, and Logistics.
Overview of the Automatic Speech Recognition (ASR) Technology Open Source Tools : HTK and CMU Sphinx Source : Google Image Search
Popular Commercial Applications : Siri, Google Voice Source : Google, Apple
Client and Server Based Speech Recognition on the Mobile Phone [Two panels: Server based Speech Recognition on the Mobile Phone; Speech Recognition at the Client Mobile Phone] Source : Pearce et al., ETSI
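One common way to split the work between handset and server, assumed here rather than taken from the slide, is to run only the signal-processing front end on the phone and send compact per-frame features to the server. The sketch below uses log frame energy as a stand-in for a real feature vector; the 8 kHz sampling rate, 25 ms frame length, 10 ms frame shift, and the function name are illustrative assumptions.

```python
import math

def frame_log_energy(samples, frame_len=200, frame_shift=80):
    """Split a signal into overlapping frames and return one log-energy value per frame.

    frame_len=200 and frame_shift=80 correspond to 25 ms and 10 ms at an
    assumed 8 kHz sampling rate. A real front end would emit a full feature
    vector (for example cepstral coefficients) per frame, not one number.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats

# One second of a synthetic 440 Hz tone standing in for microphone input.
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
feats = frame_log_energy(signal)
```

The point of the sketch is the data reduction: the client uploads one small feature record per 10 ms frame instead of 80 raw samples, and the search runs on the server.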
Speech Recognition in a Little More Detail Source : MIT OC, Reynolds et al.
Speech Recognition on Mobile Phones Source : Rose et al.
ASR Issues on Mobile Phones • Memory Constraints • Computational Complexity • Power Requirements • Floating Point Support
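The floating-point issue can be made concrete: on a processor without a floating-point unit, scores are typically computed with integer arithmetic on scaled values. The sketch below uses the Q15 format (16-bit integers representing values in [-1, 1)); the choice of Q15 and all function names are illustrative assumptions, not something the slides specify.

```python
Q = 15
SCALE = 1 << Q  # 32768: the integer that represents 1.0 in Q15

def saturate(v: int) -> int:
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(-SCALE, min(SCALE - 1, v))

def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a Q15 integer."""
    return saturate(int(round(x * SCALE)))

def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float (for inspection only)."""
    return v / SCALE

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values with rounding, using integer operations only."""
    prod = a * b                                  # Q30 intermediate
    return saturate((prod + (1 << (Q - 1))) >> Q)  # round, rescale to Q15

def q15_dot(xs, ys) -> int:
    """Dot product that accumulates in Q30 and rescales once at the end."""
    acc = 0
    for a, b in zip(xs, ys):
        acc += a * b
    return saturate((acc + (1 << (Q - 1))) >> Q)

features = [0.25, -0.5, 0.125]
weights = [0.5, 0.25, -0.75]
fixed = q15_dot([to_q15(f) for f in features], [to_q15(w) for w in weights])
exact = sum(f * w for f, w in zip(features, weights))
```

Accumulating in the wider intermediate format and rescaling once keeps the rounding error to a single step, which matters when many small terms are summed.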
ASR Issues on Mobile Phones : Search Complexity [Figure, built up over nine slides: the phone-level search network for the utterance “Their Car” (DH EH R [word] K AA R). The first slide begins the path score for “Their Car” with P(“DH”). Each later slide adds competing phone branches and word hypotheses (“Their”, “The”, “Ear”, then “Car”, “Cap”, “Cat”), so the number of active paths grows with every phone.] Source : Slides from Krishna et al., U Michigan
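The growth of the search network for “Their Car” can be put into rough numbers. The sketch below counts word-sequence hypotheses for a tiny vocabulary and shows how sharing common phone prefixes in a tree reduces the number of arcs to search. The lexicon and its pronunciations are illustrative assumptions, loosely based on the words in the figure.

```python
# Hypothetical six-word lexicon; pronunciations are simplified for illustration.
LEXICON = {
    "their": ["DH", "EH", "R"],
    "the":   ["DH", "AX"],
    "ear":   ["IY", "R"],
    "car":   ["K", "AA", "R"],
    "cap":   ["K", "AE", "P"],
    "cat":   ["K", "AE", "T"],
}

def build_prefix_tree(lexicon):
    """Merge pronunciations that share leading phones into one tree."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.setdefault(phone, {})
        node["#word"] = word  # marks the end of a word
    return root

def count_arcs(node):
    """Count phone arcs in the tree, ignoring the word-end markers."""
    return sum(1 + count_arcs(child)
               for key, child in node.items() if key != "#word")

def word_sequence_hypotheses(vocab_size, num_words):
    """Without constraints, every word can follow every other word."""
    return vocab_size ** num_words

flat_arcs = sum(len(phones) for phones in LEXICON.values())
tree_arcs = count_arcs(build_prefix_tree(LEXICON))
two_word_paths = word_sequence_hypotheses(len(LEXICON), 2)
```

With these six words the flat lexicon has 16 phone arcs and the prefix tree has 12, while a two-word utterance already has 36 candidate word sequences; the saving from sharing and the growth in paths both become far larger at realistic vocabulary sizes.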
Search : Computing Requirements on the Mobile Phone 1. Search • Search accounts for roughly 50% of the total speech recognition time • The share is even higher for large vocabulary recognition • It is considerably lower for small vocabulary tasks 2. Solutions • Network optimization • Efficient search techniques • Pruning methods i) Look-ahead based strategy ii) Pruning threshold dependent on the grammar • Multi-pass methods i) A fast first pass produces a short list of candidates or a lattice, followed by second-pass rescoring with larger acoustic and language models Source : Rose et al.
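The pruning idea listed above can be sketched as a beam search: at each step, hypotheses whose score falls more than a fixed margin below the current best are dropped. This is a simplified illustration, not a real decoder; it assumes context-independent log probabilities per step, and the scores and function name are made up for the example.

```python
import math

def beam_search(step_scores, beam_width):
    """Extend hypotheses step by step, keeping those within beam_width of the best.

    step_scores: list of dicts mapping a symbol to its log probability.
    Returns (best sequence, its log score, number of expansions performed).
    """
    hyps = {(): 0.0}
    expanded = 0
    for scores in step_scores:
        new_hyps = {}
        for seq, total in hyps.items():
            for sym, logp in scores.items():
                new_hyps[seq + (sym,)] = total + logp
                expanded += 1
        best = max(new_hyps.values())
        # Beam pruning: discard everything too far below the best hypothesis.
        hyps = {s: v for s, v in new_hyps.items() if v >= best - beam_width}
    best_seq = max(hyps, key=hyps.get)
    return best_seq, hyps[best_seq], expanded

# Hypothetical per-step phone scores (log probabilities).
STEPS = [
    {"DH": math.log(0.7), "TH": math.log(0.2), "D": math.log(0.1)},
    {"EH": math.log(0.6), "AX": math.log(0.3), "IY": math.log(0.1)},
    {"R": math.log(0.8), "L": math.log(0.15), "N": math.log(0.05)},
]

pruned = beam_search(STEPS, beam_width=1.0)
full = beam_search(STEPS, beam_width=math.inf)  # no pruning
```

On this toy input both runs return the same best sequence, but the pruned search performs fewer expansions. In a real decoder a beam that is too narrow can discard the eventual winner, which is why the threshold is usually tuned per task or grammar.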
Exploiting Task Constraints on the Mobile Phone Form Filling Example (Rose et al.) • Recognize first and last names independently • Switch between precompiled grammars • Generate dynamic grammars
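The form-filling constraints can be sketched as follows: each form field activates its own small grammar, and a last-name grammar can be generated at run time once the first name is known. The names, scores, directory, and function names are hypothetical, and the recognizer is reduced to filtering an n-best list.

```python
# Precompiled grammars, one per form field (hypothetical contents).
GRAMMARS = {
    "first_name": {"anita", "ravi", "maria"},
    "last_name": {"sharma", "gupta", "smith"},
}

# Hypothetical directory used to build a dynamic grammar.
DIRECTORY = [("ravi", "sharma"), ("ravi", "gupta"), ("maria", "smith")]

def recognize_field(nbest, grammar):
    """Return the best-scoring hypothesis that the active grammar allows.

    nbest: list of (word, log score) pairs from a recognizer (assumed given).
    """
    allowed = [(word, score) for word, score in nbest if word in grammar]
    if not allowed:
        return None
    return max(allowed, key=lambda pair: pair[1])[0]

def make_dynamic_grammar(first_name, directory):
    """Build a last-name grammar from entries that match the first name."""
    return {last for first, last in directory if first == first_name}

nbest_last = [("smith", -1.0), ("sharma", -1.2), ("gupta", -3.0)]

static_result = recognize_field(nbest_last, GRAMMARS["last_name"])
dynamic_result = recognize_field(nbest_last, make_dynamic_grammar("ravi", DIRECTORY))
```

With the static grammar the acoustically best word wins; with the dynamic grammar the same n-best list resolves to the only plausible last name for the recognized first name, and the search space is smaller as well.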
What is Text to Speech Synthesis (TTS)? • Process of converting a given text in a specific language to human-like speech • Software or hardware based methods • Software based methods are preferred • Involves text analysis, automatic phonetization, and dictionary or rule based synthesis • Types : Concatenative, Unit Selection, Diphone based, Formant based, Articulatory, and HMM based Synthesis • What you can do with it : E-Learning, Screen Readers, Audio Books, ATM Banking, Call Centers, Interactive Kiosks
Overview of Text to Speech Synthesis (TTS) Technology Open Source Tool : Festival speech synthesis system from CSTR Source : Google image search, Wikipedia
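The text analysis and phonetization stages can be sketched as a small front end: normalize the text, look each word up in a pronunciation dictionary, and fall back to letter-to-sound rules for unknown words. The dictionary entries, the crude one-letter-one-phone rules, and the function names are illustrative assumptions; waveform generation is not shown.

```python
import re

# Hypothetical digit expansions and pronunciation dictionary.
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
PRON_DICT = {
    "call": ["K", "AO", "L"],
    "two": ["T", "UW"],
    "home": ["HH", "OW", "M"],
}

# Very crude fallback: one phone per letter (real rules are context dependent).
LETTER_TO_PHONE = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    "AE B K D EH F G HH IH JH K L M N AA P K R S T AH V W K Y Z".split(),
))

def normalize(text):
    """Text analysis: lowercase, drop punctuation, expand single digits."""
    tokens = re.findall(r"[a-z]+|\d", text.lower())
    return [DIGITS.get(tok, tok) for tok in tokens]

def phonetize(word):
    """Dictionary lookup first, letter-to-sound rules as a fallback."""
    if word in PRON_DICT:
        return PRON_DICT[word]
    return [LETTER_TO_PHONE[ch] for ch in word]

def text_to_phones(text):
    """Front end of a TTS system: text in, one phone list per word out."""
    return [phonetize(word) for word in normalize(text)]

phones = text_to_phones("Call 2 cats!")
```

The resulting phone sequences would then drive whichever synthesis method is used, for example selecting and joining recorded units in a concatenative system.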
Cell Phone based Applications using Speech Inputs PSTN Web/Database Server Asterisk Server Network SMS Gateway
Cell Phone based Agriculture Information Systems Crop Advisory Weather Advisory Source : Digital Mandi for the Indian Kisan
Questions? rhegde@iitk.ac.in URL : http://202.3.77.107/mips/