Speech Processing 11-492/18-495
Current Topics and Future Challenges
Commercial and Research

Current and Future
- What are the hot topics in Speech
- What currently works
- What could work soon (5-10 years)
- What are the industry hot topics
- What are the research challenges

Spoken Dialog: Now
- Industry:
  - Location based querying
  - On phone: Apple (Siri)
  - In home: Amazon (Echo)
  - Smartphones, Tablets:
    - (Owners have money)
    - IoT deployment
  - How do you make money out of this …

Spoken Dialog: Now
- Research
  - Error recovery
  - Adaptive systems
  - Rapid deployment
  - Learning dialog structure from data
  - Non-task oriented dialog

ASR: Now
- Industry
  - Adapting cloud ASR per app
  - Broadcast news transcription
  - Robust speech recognition:
    - In car, outside, in noisy office, far field
  - LM adaptation from other sources
    - Using click through and search queries
  - Pronunciation variants (“wrong” ones too)
  - Medical transcription

ASR: Now
- Research:
  - Discriminative training
    - Acoustic parameter projections to discriminate between the correct answers and competitors
  - Robust recognition
    - Far field microphones
    - Blind source separation
  - Out of vocabulary words
  - Unsupervised training
  - Deep Learning (Neural Nets)
  - Zero-resource ASR

TTS: Now
- Industry
  - Building custom voices (and your voice)
  - Multilingual on small devices
    - E.g. for GPS Navigation over Europe
  - Easy methods to build new languages
  - Conversational Speech

TTS: Now
- Research
  - Improving neural synthesis
    - Quality/Resources/Runtime computation
  - Rapid support in new languages
  - Emotional speech synthesis
  - Automatic building of voices from data
    - Without any human intervention
  - Languages without Orthography
  - Synthesis beyond the sentence
  - Synthesis with more text analysis

Speech to Speech Translation
- Industry
  - One way systems, domain limited systems
  - Simple targeted cell phone systems
  - Youtube/Broadcast translation
  - Skype translation
- Research
  - Two way systems, large domains
  - One way lecture/broadcast news

VC and SID: Now
- Voice conversion
  - Cross Lingual Voice Conversion
  - Emotion/style conversion
  - Conversion without training data
- Speaker ID
  - Accuracy on large data sets (> 1000 speakers)
  - Cross channel/language ID
  - More information in ID (prosody, vocab)

CALL: Now
- Industry
  - Pronunciation training
  - Scenario practicing
- Research
  - Game based tools
  - Measuring educational contribution

Speech Processing Future
- Hard challenges (PhD topics and beyond)
  - All on the research side
  - But maybe in Research Labs

Speech Reco without Speech
- Using other modalities
  - Lip movement, muscle movement
- Silent speech
  - No generated audio
  - Just think about the words
- Gesture recognition
- Brain Computer Interfaces
- ASR without text
  - Find “….” in all this audio

Beyond the Words
- Recognition of more than words
  - Intent, style, emotion
- Human-Machine
  - Frustration, confidence, agreement
- Human-Human
  - Rapport, relationships, persuasion
- Truth and lies
- Sentiment

Conversational Systems
- Participant in a meeting
- True conversational speech
  - Appropriate non-word speech generation
  - Know when to speak, when to laugh, when to listen
- Appropriate timing conversation
  - Able to interrupt when having something to say
- Have something to say

Summaries and Discussions
- Describe a paper/movie/event
  - Appropriate summary
  - Allow questions
  - Know when to use style/emotion
- Not just speech<->text
  - Understand more of the text content
  - Answer complex questions
  - Engage user and discuss topic
