Speech Technology for Mobile Phones Part I : ASR, and TTS on the - PowerPoint PPT Presentation

Speech Technology for Mobile Phones Part I: ASR and TTS on the Mobile Phone
Rajesh M. Hegde, rhegde@iitk.ac.in
Associate Professor, Dept. of EE, Indian Institute of Technology Kanpur
Several pictures used in this presentation have been collected from various sources on the web and are acknowledged in the slides.

Topics Covered
• How can speech technology be used to develop applications on a mobile phone?
• What are automatic speech recognition (ASR) and text-to-speech synthesis (TTS)?
• What are the challenges in implementing ASR and TTS systems on a mobile phone?
• What are the potential applications of speech technology in delivering personalized services on a mobile phone?

Broad Objectives of Speech Technology for Machines
• Speech to text (ASR)
• Text to speech (TTS)
Source: Reynolds et al., Apple developer page

Speech Recognition for Mobile Phones
• Speech recognition converts a speech signal acquired by a mobile phone into a sequence of words.
• The recognition output can be used for command and control, email, search, and communication.
• The output can also feed dialog management and natural language understanding.
• What you can do with it: dictation, call routing, directory assistance, travel planning, and logistics.
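As a small illustration of the command-and-control use listed above, the sketch below maps a recognized word sequence to an action by matching the longest known command prefix. The command table, the action names, and the `dispatch` helper are hypothetical; a deployed system would more likely use a grammar or a natural language understanding component than exact phrase matching.

```python
# Hypothetical command table: recognized phrase (as a tuple of words) -> action name.
COMMANDS = {
    ("call", "home"): "dial_home",
    ("read", "email"): "open_inbox",
    ("search",): "start_search",
}


def dispatch(words):
    """Map a recognized word sequence to an action by longest matching prefix.

    Returns (action, remaining_words), or (None, lowercased_words) if no
    command matches.
    """
    tokens = [w.lower() for w in words]
    # Try the longest prefix first so multi-word commands win over shorter ones.
    for length in range(len(tokens), 0, -1):
        action = COMMANDS.get(tuple(tokens[:length]))
        if action is not None:
            return action, tokens[length:]
    return None, tokens


if __name__ == "__main__":
    print(dispatch(["Search", "weather", "Kanpur"]))
```

The words left over after the command (here the search terms) are passed along to whatever handles the action.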

Overview of Automatic Speech Recognition (ASR) Technology
Open-source tools: HTK and CMU Sphinx
Source: Google Image Search

Popular Commercial Applications: Siri, Google Voice
Source: Google, Apple

Client- and Server-Based Speech Recognition on the Mobile Phone
[Figure panels: Server-based speech recognition on the mobile phone; Speech recognition at the client (mobile phone)]
Source: Pearce et al., ETSI
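In a client/server split, one option is to run only the signal-processing front end on the phone and send compact features, rather than audio, to a server-side recognizer. The sketch below assumes 8 kHz audio, 25 ms frames with a 10 ms shift, and a single log-energy value per frame quantized to one byte; these settings and the helper names are illustrative assumptions, and a full front end would send a vector of cepstral coefficients per frame instead.

```python
import math
import struct

# Illustrative front-end settings (assumptions, not taken from the slides).
SAMPLE_RATE = 8000   # narrowband telephone audio, in Hz
FRAME_LEN = 200      # 25 ms at 8 kHz
FRAME_SHIFT = 80     # 10 ms at 8 kHz


def frame_log_energies(samples):
    """Client side: split audio into overlapping frames, return each frame's log energy."""
    energies = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_SHIFT):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(x * x for x in frame)
        energies.append(math.log(max(energy, 1e-10)))  # floor avoids log(0) on silence
    return energies


def pack_features(energies, lo=-23.0, hi=30.0):
    """Quantize each value in [lo, hi] to one unsigned byte for transmission."""
    scale = 255.0 / (hi - lo)
    codes = [min(255, max(0, round((e - lo) * scale))) for e in energies]
    return struct.pack(f"{len(codes)}B", *codes)


if __name__ == "__main__":
    one_second = [1000.0 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
                  for n in range(SAMPLE_RATE)]
    payload = pack_features(frame_log_energies(one_second))
    print(len(payload), "bytes to send instead of", 2 * len(one_second))
```

For one second of 8 kHz audio this produces 98 one-byte codes instead of 16,000 bytes of 16-bit PCM; a real front end sends more values per frame, so the saving is smaller than this toy ratio suggests.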

Speech Recognition in a Little More Detail
Source: MIT OC, Reynolds et al.
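Recognition is commonly formulated as a search for the word sequence W that maximizes P(O | W) · P(W), where O is the sequence of acoustic observations, P(O | W) comes from the acoustic model, and P(W) comes from the language model. A minimal sketch over a small, explicitly listed set of candidate hypotheses, working in the log domain; the scores and the language-model weight are hypothetical.

```python
import math


def decode(candidates, lm_weight=1.0):
    """Return the word sequence W maximizing log P(O|W) + lm_weight * log P(W).

    `candidates` maps a tuple of words to (acoustic_log_likelihood, lm_probability).
    """
    best_words, best_score = None, -math.inf
    for words, (acoustic_ll, lm_prob) in candidates.items():
        score = acoustic_ll + lm_weight * math.log(lm_prob)  # combine in the log domain
        if score > best_score:
            best_words, best_score = words, score
    return best_words, best_score


if __name__ == "__main__":
    # Hypothetical scores for two acoustically similar hypotheses.
    hypotheses = {
        ("their", "car"): (-120.0, 0.02),
        ("the", "ear", "car"): (-119.0, 0.0005),
    }
    print(decode(hypotheses)[0])  # acoustics slightly favor "the ear car"; the LM picks "their car"
```

Real decoders never enumerate whole hypotheses like this, which is exactly why search becomes the dominant cost on a phone.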

Speech Recognition on Mobile Phones
Source: Rose et al.

ASR Issues on Mobile Phones
• Limited memory
• Computational complexity
• Power requirements
• Limited floating-point support
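One common workaround for limited floating-point support is fixed-point arithmetic. The sketch below assumes the Q15 format (16-bit signed integers representing values in [-1, 1)), with Python integers standing in for the int16/int32 types that on-device code would use; the helper names are hypothetical.

```python
Q = 15                       # Q15: 1 sign bit, 15 fractional bits
ONE = 1 << Q                 # 32768 stands for 1.0
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(value):
    """Clamp to the int16 range instead of letting the value wrap around."""
    return max(INT16_MIN, min(INT16_MAX, value))


def to_q15(x):
    """Convert a float in [-1, 1) to Q15 (e.g. offline, when storing model parameters)."""
    return saturate(int(round(x * ONE)))


def q15_mul(a, b):
    """Multiply two Q15 values with integer operations only."""
    return saturate((a * b) >> Q)  # Q30 product shifted back to Q15


def q15_dot(xs, ws):
    """Dot product: accumulate Q30 products in a wide integer, rescale once at the end."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w
    return saturate(acc >> Q)


def from_q15(q):
    """Convert back to float, for inspection only."""
    return q / ONE
```

Accumulating at full precision and shifting once at the end loses less accuracy than rounding after every multiply, at the cost of needing a wider accumulator.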

ASR Issues on Mobile Phones: Search Complexity
[Figure, built up over several slides: a phone-level search network that starts with the phrase “Their Car” (DH EH R, K AA R) and grows as competing hypotheses such as “The”, “Ear”, “Cap”, and “Cat”, and then many more phones, are added, illustrating how quickly the recognition search space expands.]
Source: Slides from Krishna et al., U Michigan
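The growth in the search-complexity slides comes from every word hypothesis expanding into its own phone sequence. One widely used way to share work, shown here as an illustration rather than as the method in the cited slides, is a pronunciation prefix tree, in which words with a common initial phone sequence share nodes. The mini-lexicon and its ARPAbet-style pronunciations are hypothetical.

```python
# Hypothetical mini-lexicon with ARPAbet-style pronunciations (illustrative only).
LEXICON = {
    "their": ["DH", "EH", "R"],
    "the": ["DH", "AX"],
    "car": ["K", "AA", "R"],
    "cap": ["K", "AE", "P"],
    "cat": ["K", "AE", "T"],
}

WORD_KEY = "#words"  # marks which words end at a node


def build_prefix_tree(lexicon):
    """Insert every pronunciation into a tree whose edges are phones."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.setdefault(phone, {})  # reuse the branch if the prefix exists
        node.setdefault(WORD_KEY, []).append(word)
    return root


def count_phone_nodes(node):
    """Count phone nodes below `node`, ignoring the word-end markers."""
    return sum(1 + count_phone_nodes(child)
               for key, child in node.items() if key != WORD_KEY)


if __name__ == "__main__":
    flat = sum(len(phones) for phones in LEXICON.values())
    tree = count_phone_nodes(build_prefix_tree(LEXICON))
    print(flat, tree)  # 14 10
```

For these five words the flat lexicon has 14 phone slots and the tree has 10; the saving tends to grow with vocabulary size because more words share prefixes.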

Search: Computing Requirements on the Mobile Phone
1. Search
• Roughly 50% of the total speech recognition time is spent in search
• Even more for large-vocabulary recognition
• Considerably less for small-vocabulary tasks
2. Solutions
• Network optimization
• Efficient search techniques
• Pruning methods
  i) Look-ahead based strategy
  ii) Pruning threshold dependent on the grammar
• Multi-pass methods
  i) A fast first pass produces a short list of candidates or a lattice, followed by a second pass that rescores it with larger acoustic and language models
Source: Rose et al.
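The pruning idea in the list above can be sketched as beam pruning: at each step, hypotheses whose log score falls more than a beam width below the current best are discarded, optionally with a cap on the number of active hypotheses (histogram pruning). The function name and parameters are illustrative assumptions; the beam is the kind of threshold that could be set per grammar.

```python
def beam_prune(hypotheses, beam, max_active=None):
    """Keep hypotheses whose log score is within `beam` of the best score.

    hypotheses: dict mapping a hypothesis label to its log score.
    beam: pruning threshold in log units (could be chosen per grammar).
    max_active: optional cap on the number of survivors (histogram pruning).
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    survivors = {h: s for h, s in hypotheses.items() if s >= best - beam}
    if max_active is not None and len(survivors) > max_active:
        ranked = sorted(survivors.items(), key=lambda item: item[1], reverse=True)
        survivors = dict(ranked[:max_active])
    return survivors


if __name__ == "__main__":
    # Hypothetical partial-path scores at one time frame.
    active = {"their": -10.0, "the": -12.0, "ear": -30.0}
    print(beam_prune(active, beam=5.0))
```

A narrower beam saves time but risks pruning the correct path early, which is one reason a fast first pass is often paired with second-pass rescoring.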

Exploiting Task Constraints on the Mobile Phone
Form-filling example (Rose et al.)
• Recognize first and last names independently
• Switch between precompiled grammars
• Generate dynamic grammars
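The form-filling constraints can be made concrete with a small sketch. The grammars, names, and helpers below are hypothetical, and plain word lists stand in for compiled grammars. The last helper shows the arithmetic behind recognizing first and last names independently: with N first names and M last names, two separate steps cover N + M entries, whereas a single full-name grammar must cover N × M combinations.

```python
# Hypothetical precompiled grammars, one per form field (word lists stand in for grammars).
GRAMMARS = {
    "first_name": ["anita", "priya", "vikram"],
    "last_name": ["iyer", "rao", "sharma"],
    "yes_no": ["yes", "no"],
}


def select_grammar(field):
    """Switch the recognizer to the precompiled grammar for the current field."""
    return GRAMMARS[field]


def dynamic_grammar(field, contacts):
    """Build a grammar at run time from (first, last) name pairs, e.g. a contact list."""
    index = {"first_name": 0, "last_name": 1}[field]
    return sorted({pair[index].lower() for pair in contacts})


def search_space_sizes(n_first, n_last):
    """Entries to cover: two independent name steps vs. one full-name grammar."""
    return {"independent": n_first + n_last, "joint": n_first * n_last}


if __name__ == "__main__":
    contacts = [("Anita", "Rao"), ("Vikram", "Rao"), ("Priya", "Iyer")]
    print(dynamic_grammar("last_name", contacts))   # ['iyer', 'rao']
    print(search_space_sizes(1000, 1000))           # {'independent': 2000, 'joint': 1000000}
```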

What Is Text-to-Speech Synthesis (TTS)?
• The process of converting text in a specific language into human-like speech
• Can be implemented in software or hardware; software-based methods are preferred
• Involves text analysis, automatic phonetization, and dictionary- or rule-based synthesis
• Types: concatenative, unit selection, diphone-based, formant-based, articulatory, and HMM-based synthesis
• What you can do with it: e-learning, screen readers, audio books, ATM banking, call centers, and interactive kiosks
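The text analysis and dictionary-based phonetization steps can be sketched as follows. The mini-lexicon and helpers are hypothetical; digits are read one at a time as a simplification (a real normalizer would read "25" as "twenty five"), and words missing from the dictionary are returned separately, where a real system would apply letter-to-sound rules.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Hypothetical mini pronunciation dictionary (ARPAbet-style, illustrative only).
LEXICON = {
    "rain": ["R", "EY", "N"],
    "in": ["IH", "N"],
    "two": ["T", "UW"],
    "five": ["F", "AY", "V"],
    "days": ["D", "EY", "Z"],
}


def normalize(text):
    """Text analysis: lowercase, drop punctuation, read digit strings digit by digit."""
    words = []
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            words.extend(DIGITS[int(d)] for d in token)
        else:
            words.append(token)
    return words


def phonetize(words, lexicon=LEXICON):
    """Dictionary-based phonetization; out-of-vocabulary words are returned separately."""
    phones, oov = [], []
    for word in words:
        if word in lexicon:
            phones.extend(lexicon[word])
        else:
            oov.append(word)  # a rule-based (letter-to-sound) fallback would go here
    return phones, oov


if __name__ == "__main__":
    print(phonetize(normalize("Rain in 25 days!")))
```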

Overview of Text-to-Speech Synthesis (TTS) Technology
Open-source tool: the Festival Speech Synthesis System from CSTR
Source: Google Image Search, Wikipedia
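In concatenative synthesis, recorded units such as diphones are joined end to end, and audible discontinuities at the joins are a common problem. A minimal sketch, assuming the units are already selected and are lists of float samples at the same sample rate; the linear crossfade is one simple smoothing choice for illustration, not a description of Festival's own join method, and the helper name is hypothetical.

```python
def crossfade_concat(units, overlap):
    """Join waveform units (lists of float samples), linearly cross-fading
    `overlap` samples at each boundary."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # append the rest of the incoming unit unchanged
    return out


if __name__ == "__main__":
    print(crossfade_concat([[0.0, 0.0], [1.0, 1.0]], overlap=1))
```

Each join shortens the output by `overlap` samples, so unit durations must account for the overlap when timing matters.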

Cell-Phone-Based Applications Using Speech Input
[Figure: system components: PSTN, Asterisk server, web/database server, network, SMS gateway]

Cell-Phone-Based Agriculture Information Systems
• Crop advisory
• Weather advisory
Source: Digital Mandi for the Indian Kisan
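The component labels on the applications slide suggest a flow in which a spoken query reaches an Asterisk server over the PSTN and the answer can be returned by voice or by SMS. A minimal sketch of the final step for the crop-advisory case, assuming the recognizer has already produced the crop and district names; the table, its placeholder text, and the helper are hypothetical, and a real system would query the web/database server instead.

```python
# Hypothetical advisory table keyed by (crop, district); placeholder text only.
ADVISORIES = {
    ("wheat", "kanpur"): "Sample advisory text for wheat in Kanpur.",
}
SMS_LIMIT = 160  # characters in a single SMS using the GSM 7-bit alphabet
NOT_FOUND = "Sorry, no advisory was found. Please say the crop and district again."


def advisory_reply(words, table=ADVISORIES, limit=SMS_LIMIT):
    """Turn a recognized query such as ['wheat', 'kanpur'] into an SMS-sized reply."""
    key = tuple(w.lower() for w in words[:2])  # assumes crop first, then district
    text = table.get(key, NOT_FOUND)
    return text if len(text) <= limit else text[:limit - 3] + "..."


if __name__ == "__main__":
    print(advisory_reply(["Wheat", "Kanpur"]))
```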

Questions?
rhegde@iitk.ac.in
URL: http://202.3.77.107/mips/
