Speech Processing 15-492/18-492
Speech Synthesis: Building Voices

Building a Voice
- Designing the prompts
- Recording the prompts
- Labeling the utterances
- Finding parameters (F0, MCEP)
- Building the synthesis voice
- Tuning and testing

Software Requirements
- Festival Speech Synthesizer
  - Free-software, language-independent synthesizer
  - Multiplatform: Windows, Linux, OS X
  - Used for research and commercial synthesis
- Festvox
  - Voice building tools
  - Scripts, instructions, example databases
  - Used for over 40 different languages

Festival Speech Synthesis
- After installation:
  - festival --tts stuff.txt   (speak the contents of a text file)
  - festival   (start the interactive interpreter)
    festival> (SayText "hello world")

Building Synthetic Voices
- http://festvox.org/bsv
  - Look at the section on "Telling the Time"
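A "Telling the Time" voice is a limited-domain voice, so prompt design starts by enumerating the sentences the domain can produce and then picking a recording script that covers them. The sketch below is illustrative only: the sentence template ("The time is now ..., in the morning.") and the helper names time_prompt, all_prompts and greedy_cover are hypothetical and are not taken from the festvox example. The coverage criterion (every word type recorded at least once) is a deliberately crude assumption.

```python
import re

# Hypothetical talking-clock domain; the phrasing is an assumption.
HOURS = ["twelve", "one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ten", "eleven"]
MINUTE_PHRASES = {
    5: "five past", 10: "ten past", 15: "quarter past", 20: "twenty past",
    25: "twenty-five past", 30: "half past", 35: "twenty-five to",
    40: "twenty to", 45: "quarter to", 50: "ten to", 55: "five to",
}


def time_prompt(hour24: int, minute: int) -> str:
    """Render a time (minutes in steps of 5) as one domain sentence."""
    if not (0 <= hour24 < 24 and 0 <= minute < 60 and minute % 5 == 0):
        raise ValueError("expected hour 0-23 and a multiple of 5 minutes")
    # "... to X" phrases name the *next* hour.
    hour = hour24 if minute <= 30 else hour24 + 1
    name = HOURS[hour % 12]
    if minute == 0:
        clock = f"{name} o'clock"
    else:
        clock = f"{MINUTE_PHRASES[minute]} {name}"
    if hour24 < 12:
        period = "in the morning"
    elif hour24 < 18:
        period = "in the afternoon"
    else:
        period = "in the evening"
    return f"The time is now {clock}, {period}."


def words(sentence: str) -> set:
    """Lowercased word types in a prompt (hyphens and apostrophes kept)."""
    return set(re.findall(r"[a-z'-]+", sentence.lower()))


def all_prompts() -> list:
    """Every sentence the domain can produce: 24 hours x 12 slots."""
    return [time_prompt(h, m) for h in range(24) for m in range(0, 60, 5)]


def greedy_cover(prompts: list) -> list:
    """Greedily pick prompts until every word type appears at least once."""
    needed = set().union(*(words(p) for p in prompts))
    chosen = []
    while needed:
        best = max(prompts, key=lambda p: len(words(p) & needed))
        chosen.append(best)
        needed -= words(best)
    return chosen


if __name__ == "__main__":
    script = greedy_cover(all_prompts())
    print(len(script), "prompts cover the vocabulary")
    for line in script:
        print(line)
```

A real limited-domain script usually wants more than one token of each word, and wants words in the positions and contexts where they will actually be spoken, so treat the greedy cover as a starting point rather than a finished prompt list.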

Automatic Labeling

Automatic Labeling (bad)

Parameterization
- Extract pitch marks from the data
  - Find voiced/unvoiced regions
  - Add "fake" pitch marks during unvoiced regions
- Extract MFCCs pitch-synchronously
  - Instead of using a fixed frame advance (e.g. 5 ms)
  - Extract them at each pitch mark
  - Try to capture the spectrum at the pitch period
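The two steps on the Parameterization slide (fill unvoiced gaps with fake pitch marks, then analyse one frame per mark rather than every 5 ms) can be sketched as follows. Assumptions, not from the slides: pitch marks and unvoiced regions are already given in seconds, fake marks are spaced at a fixed 10 ms step, and each frame spans one local pitch period either side of its mark with a Hann window. The functions fill_unvoiced and pitch_synchronous_frames are hypothetical names; computing MFCCs or MCEPs from each windowed frame is left out.

```python
import math
from typing import List, Sequence, Tuple


def fill_unvoiced(pitchmarks: Sequence[float],
                  unvoiced: Sequence[Tuple[float, float]],
                  step: float = 0.01) -> List[float]:
    """Add evenly spaced 'fake' pitch marks (seconds) inside unvoiced regions."""
    fake = []
    for start, end in unvoiced:
        k = 1
        # Multiply instead of accumulating, to avoid float drift.
        while start + k * step < end:
            fake.append(round(start + k * step, 6))
            k += 1
    return sorted(set(pitchmarks) | set(fake))


def hann(n: int) -> List[float]:
    """Hann window of length n (zero at both ends for n > 1)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def pitch_synchronous_frames(signal: Sequence[float], sample_rate: int,
                             marks: Sequence[float]) -> List[List[float]]:
    """Cut one windowed frame per pitch mark, spanning +/- one local period."""
    frames = []
    for i, mark in enumerate(marks):
        if len(marks) == 1:
            period = 0.01                      # assumed fallback period
        elif i == 0:
            period = marks[1] - marks[0]       # first mark: use next period
        else:
            period = marks[i] - marks[i - 1]   # otherwise: previous period
        centre = round(mark * sample_rate)
        half = max(1, round(period * sample_rate))
        lo = max(0, centre - half)
        hi = min(len(signal), centre + half)
        segment = signal[lo:hi]
        window = hann(len(segment))
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


if __name__ == "__main__":
    # Voiced marks every 5 ms near 0.1 s; first 45 ms assumed unvoiced.
    marks = fill_unvoiced([0.10, 0.105, 0.11], [(0.0, 0.045)])
    frames = pitch_synchronous_frames([1.0] * 16000, 16000, marks)
    print(marks)
    print([len(f) for f in frames])
```

Because the frame length follows the local pitch period, voiced frames shrink and grow with F0, while the fake marks give unvoiced regions a steady frame rate.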

Pitchmarks

Building an LDOM (Limited Domain) Synthesizer
- Build a cluster tree on each unit type
  - Not just on phones
  - Tag phones with the word they come from
  - d_limited and d_domain are treated as different unit types
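The unit-type naming on this slide (phone plus the word it came from, so d_limited and d_domain never share a tree) can be sketched as a grouping step that collects every instance of each unit type before clustering. Assumptions: utterances arrive as lists of (word, phones) pairs, the phone strings below are made-up illustrations rather than a real Festival phone set, and group_units is a hypothetical helper; building the actual cluster tree for each group is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (word, [phones]) pairs for one utterance; phone symbols are illustrative.
Utterance = Sequence[Tuple[str, Sequence[str]]]


def group_units(utterances: Sequence[Utterance]) -> Dict[str, List[Tuple[int, int, int]]]:
    """Group phone instances by unit type '<phone>_<word>'.

    Each instance is recorded as (utterance index, word index, phone index),
    so a later step could build one cluster tree per unit type.
    """
    groups = defaultdict(list)
    for utt_id, utt in enumerate(utterances):
        for word_idx, (word, phones) in enumerate(utt):
            for ph_idx, phone in enumerate(phones):
                groups[f"{phone}_{word.lower()}"].append((utt_id, word_idx, ph_idx))
    return dict(groups)


if __name__ == "__main__":
    limited = ("limited", ["l", "ih", "m", "ih", "t", "ih", "d"])
    domain = ("domain", ["d", "ow", "m", "ey", "n"])
    units = group_units([[limited, domain], [domain]])
    for unit_type, instances in sorted(units.items()):
        print(unit_type, instances)
```

Tagging by word splits the data into many small groups, which is acceptable in a limited domain because the same words recur often in the recordings.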

Tuning and Testing
- Test it on some real data
  - Ensure number/symbol expansions are correct
  - Prompts should probably be word-expanded
    - Flight US187 -> flight u s one eight seven
- Remove bad prompts
  - Or fix the labels
- Remember to keep access to the speaker
  - If you have to update the system, you need the same speaker available
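Word-expanding prompts means the text given to the speaker already says exactly what will be spoken, as in the US187 example. A minimal sketch, assuming one simple rule: any token containing a digit is read symbol by symbol (letters spelled, digits named, other symbols dropped), and every other token is just lowercased. expand_prompt is a hypothetical helper, not Festival's token-to-word machinery, and a real domain may need other readings (for example "one eighty-seven").

```python
import string

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def expand_prompt(text: str) -> str:
    """Expand codes like 'US187' symbol by symbol; lowercase other tokens."""
    out = []
    for token in text.split():
        if any(ch in string.digits for ch in token):
            for ch in token:
                if ch in string.digits:
                    out.append(DIGIT_WORDS[int(ch)])
                elif ch.isalpha():
                    out.append(ch.lower())
                # Other symbols (e.g. '-', '/') are dropped: an assumption.
        else:
            out.append(token.lower())
    return " ".join(out)


if __name__ == "__main__":
    print(expand_prompt("Flight US187"))
```

Running the same expansion over test sentences also makes it easy to spot tokens the voice was never recorded saying.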

Summary
- Building a voice
  - Database design, recording, labeling
  - Parameter extraction and model building
- Limited domain synthesis
