Speech Processing 15-492/18-492 Using Speech with Computers Alan W - PowerPoint PPT Presentation

Speech Processing 15-492/18-492
Using Speech with Computers
Alan W Black
August 2008

Overview
- Practical and Theory:
  - Understand concepts, implement solutions
- Speech Recognition
  - Speech to text
- Speech Synthesis
  - Text to speech
- Spoken Dialog Systems
  - Interaction with machines

Course Schedule
- MWF 3:30-4:20
- DH 1117
- Lecturer: Alan W Black (awb@cs.cmu.edu)
- TA: David Huggins (dhuggins@cs.cmu.edu)
- http://www.speech.cs.cmu.edu/15-492/

Course Details
- Three lectures a week
- 4 homeworks
  - Speech Recognition
  - Speech Synthesis
  - Spoken Dialog System
  - Other
- Final Exam

Homeworks
- (Mostly) practical
  - Build something that talks/can be spoken to
  - Software and speech data will be provided
    - Will run on Windows/Linux or OSX
    - Access to Linux servers if required
- Written description of what you did

Schedule Details
- Week 1 (Aug 15th)
  - Applications, human and computer speech processing
- Weeks 2-4 (Sep 3rd): Speech Recognition
  - Signal representation, acoustic modeling
  - Language modeling, applications
  - Tuning, evaluation, expectations

Course Details
- Weeks 5-7 (22nd Sep): Speech Synthesis
  - Text processing, prosody, waveform synthesis
  - Building voices, evaluations, voice conversion
- Week 8 (13th Oct): Multilinguality
  - Supporting new languages efficiently
- Weeks 9-11 (20th Oct): Dialog Systems
  - VoiceXML, mixed initiative, barge-in
  - Design, installation and tuning

Course Details
- Week 12 (10th Nov)
  - Speech-to-speech translation
  - Language support, tight integration
- Week 13 (17th Nov)
  - Evaluation and expectations
- Week 14 (24th Nov)
  - Speaker ID, silent speech, conversion
  - What still needs to be done
- Week 15 (1st Dec)
  - Exam

Why Speech
- Most natural way to communicate
  - (For humans)
- Not ideal for everything
  - Graphics and text can be better (sometimes)
- Doesn't compress well
- Hard to search

Compression
- Alice in Wonderland
  - Text
    - 150K uncompressed
    - 43K compressed
  - Speech (2 hrs 20 mins)
    - 270M uncompressed
    - 600K compressed (mp3, 24KBS)
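The speech sizes on this slide can be checked with a little arithmetic: raw size is duration x sample rate x bytes per sample, and a constant-bitrate file is duration x bitrate / 8. The sketch below assumes the uncompressed speech is 16 kHz, 16-bit mono PCM (the slide does not state the format) and reads "24KBS" as a constant 24 kbit/s; the helper names are illustrative. Under those assumptions the raw size comes out at about 269 MB, matching the slide's 270M, while a constant 24 kbit/s stream of the same length would be about 25 MB, so the 600K figure cannot be reproduced from that bitrate alone.

```python
def pcm_bytes(seconds, sample_rate_hz=16000, bytes_per_sample=2, channels=1):
    """Uncompressed PCM size in bytes (assumed default: 16 kHz, 16-bit, mono)."""
    return seconds * sample_rate_hz * bytes_per_sample * channels


def cbr_bytes(seconds, bitrate_bps):
    """Size in bytes of a constant-bitrate encoding (bitrate in bits/second)."""
    return seconds * bitrate_bps // 8


duration_s = 2 * 3600 + 20 * 60        # 2 hrs 20 mins = 8400 seconds
raw = pcm_bytes(duration_s)             # 268,800,000 bytes, about 270 MB
mp3 = cbr_bytes(duration_s, 24_000)     # 25,200,000 bytes at an assumed 24 kbit/s
text_ratio = 150 / 43                   # text on the slide: 150K -> 43K

print(f"raw PCM:            {raw / 1e6:.1f} MB")
print(f"24 kbit/s stream:   {mp3 / 1e6:.1f} MB")
print(f"text compression:   {text_ratio:.1f}x")
```

Even under these assumptions the point of the slide holds: the compressed speech is still hundreds of times larger than the compressed text.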

Searching
- Find all NPR broadcasts mentioning Obama
  - Listen to them all
- From lecture recordings
  - Find all occurrences of "this will be in the exam"
- So listen to it faster ...
  - Normal, 2x speed
  - 2x, 4x, 8x
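The slide does not say how its sped-up examples were produced. One simple way to play speech faster without the raised pitch of plain resampling is overlap-add (OLA) time-scale modification: windowed frames are read from the input at a larger hop than the one used to write them to the output. The sketch below is plain OLA with a periodic Hann window at 50% output overlap, in pure Python for illustration; the function name `ola_speedup` and the 512-sample frame length are assumptions, and practical systems usually add refinements such as WSOLA to reduce phase artifacts.

```python
import math


def ola_speedup(samples, rate, frame_len=512):
    """Time-compress `samples` by `rate` (2.0 = twice as fast) using plain
    overlap-add. Pitch is roughly preserved because each frame is copied at
    its original speed; only the spacing between frames changes."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    hop_out = frame_len // 2                      # synthesis hop: 50% overlap
    hop_in = max(1, int(round(hop_out * rate)))   # analysis hop: skip ahead faster
    # Periodic Hann window: copies overlapped at a 50% hop sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(0, (len(samples) - frame_len) // hop_in + 1)
    out = [0.0] * ((n_frames - 1) * hop_out + frame_len) if n_frames else []
    for k in range(n_frames):
        src = k * hop_in    # where this frame is read from the input
        dst = k * hop_out   # where it is added into the output
        for n in range(frame_len):
            out[dst + n] += window[n] * samples[src + n]
    return out


if __name__ == "__main__":
    # One second of a 200 Hz tone at an assumed 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
    for rate in (2.0, 4.0, 8.0):
        fast = ola_speedup(tone, rate)
        print(f"{rate}x: {len(fast)} samples")
```

Each doubling of `rate` roughly halves the output length, which is the listening-time saving the slide is pointing at.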

Eyes/Hands Free
- Interaction when driving
  - Look at screen to see next turnoff
  - "In 200 yards turn right onto Murray Ave."
- Blind users/assistive technology
  - Text isn't very useful
- Alerts
  - "Will self-destruct in 10 seconds" vs. blinking light
- Telephone dialog systems

Speech Applications
- Command and Control
- Information Agents
- Speech-to-Speech Translation
- Speech summarization
  - Lecture or meeting summarization
- Transcription/Dictation
- Speaker Identification
  - emotion/dialect/language
- Language Learning

"Hot" Commercial Applications
- Location-based services:
  - Yahoo GO
  - Google Maps
  - Microsoft Live Search
- All phone/PDA based
  - Use speech-in
  - Directions speech-out

Other Speech Uses
- Spoken Dialog Systems
  - Let's Go Public: 412 268 3526 (evenings: 412 442 2000)
  - Pittsburgh bus timetables by phone
- Assistive Technologies
  - Screen readers
  - Augmentative and assistive communication devices
- On-line Personalization
  - Blogcasts (your voice, or an appropriate voice)
  - Game character customization
- Talking Heads
  - CMU's roboceptionist
- Singing Synthesis
  - XML interface for song specification