Speech Processing 15-492/18-492 Speech Synthesis Building Voices


  1. Speech Processing 15-492/18-492 Speech Synthesis Building Voices

  2. Building a Voice
     - Designing the Prompts
     - Recording the Prompts
     - Labeling the Utterances
     - Finding parameters (F0, MCEP)
     - Building the synthesis voice
     - Tuning and Testing

  3. Software Requirements
     - Festival Speech Synthesizer
       - Free software, language-independent synthesizer
       - Multiplatform: Windows, Linux, OSX
       - Used for research and commercial synthesis
     - Festvox
       - Voice building tools
       - Scripts, instructions, example databases
       - Used for over 40 different languages

  4. Festival Speech Synthesis
     - After Installation
       - festival --tts stuff.txt
       - festival
       - festival> (SayText "hello world")

  5. Building Synthetic Voices
     - http://festvox.org/bsv
       - Look at section on “Telling the Time”

  6. Automatic Labeling

  7. Automatic Labeling (bad)

  8. Parameterization
     - Extract pitch marks from data
       - Find voiced/unvoiced regions
       - Add “fake” pitch marks during unvoiced regions
     - Extract MFCC pitch synchronously
       - Instead of a fixed frame advance (e.g. 5ms)
       - Extract it at each pitch mark
       - Try to capture the spectrum at the pitch period
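
The pitch-mark steps on this slide can be sketched roughly as follows. This is only an illustration, not the Festvox implementation: the function names, the 16 kHz sample rate, the 20 ms gap threshold, and the 10 ms spacing for fake marks are all assumptions made for the example. It fills unvoiced stretches with evenly spaced fake marks, then derives one analysis window per mark instead of using a fixed frame advance.

```python
# Hypothetical sketch: sample indices assume 16 kHz audio (assumption).
MAX_GAP = 320    # 20 ms; a longer gap is treated as unvoiced (assumed threshold)
FAKE_STEP = 160  # 10 ms spacing for fake pitch marks (assumed value)


def fill_pitchmarks(voiced_marks, n_samples, max_gap=MAX_GAP, fake_step=FAKE_STEP):
    """Return (sample_index, is_real) pairs covering the whole utterance.

    voiced_marks: sorted sample indices of pitch marks found in voiced speech.
    Gaps longer than max_gap get fake marks every fake_step samples.
    """
    out = []
    prev = 0
    for mark in list(voiced_marks) + [n_samples]:
        if mark - prev > max_gap:
            # Unvoiced (or silent) stretch: insert evenly spaced fake marks.
            t = prev + fake_step
            while t < mark:
                out.append((t, False))
                t += fake_step
        if mark < n_samples:
            out.append((mark, True))
        prev = mark
    return out


def pitch_synchronous_windows(marks, n_samples):
    """One analysis window per mark, spanning previous mark to next mark."""
    times = [t for t, _ in marks]
    windows = []
    for i in range(len(times)):
        start = times[i - 1] if i > 0 else 0
        end = times[i + 1] if i + 1 < len(times) else n_samples
        windows.append((start, end))
    return windows


marks = fill_pitchmarks([100, 200, 300, 900], n_samples=1000)
windows = pitch_synchronous_windows(marks, n_samples=1000)
```

Each window covers roughly two pitch periods centered on its mark, so a spectral analysis (for example MFCC extraction) run per window follows the pitch period rather than a fixed 5 ms step.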

  9. Pitchmarks

  10. Building a LDOM synthesizer
     - Build cluster tree on each unit type
       - Not just on phones
       - Tag phones with word they come from
       - d_limited and d_domain are treated as different unit types
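
A minimal sketch of the word-tagging idea above, assuming each utterance is available as a list of (word, phones) pairs. The function names and the phone strings are hypothetical and chosen for illustration; the actual Festvox scripts are not shown here. Each phone is renamed to phone_word, and units are grouped by that name so a separate cluster tree could be built per group.

```python
from collections import defaultdict


def unit_types(words_with_phones):
    """Tag every phone with the word it comes from, e.g. 'd' in 'limited' -> 'd_limited'."""
    return [
        f"{phone}_{word.lower()}"
        for word, phones in words_with_phones
        for phone in phones
    ]


def group_by_unit_type(utterances):
    """Map each unit type to the (utterance id, unit index) pairs where it occurs.

    One cluster tree would then be built per key of the returned dict.
    """
    groups = defaultdict(list)
    for utt_id, words in utterances.items():
        for idx, name in enumerate(unit_types(words)):
            groups[name].append((utt_id, idx))
    return dict(groups)


# Illustrative phone sequences (assumed, not taken from a real lexicon).
utterances = {
    "u1": [
        ("limited", ["l", "ih", "m", "ax", "t", "ax", "d"]),
        ("domain", ["d", "ow", "m", "ey", "n"]),
    ]
}
groups = group_by_unit_type(utterances)
```

Here the final d of "limited" and the initial d of "domain" land in different groups, so they never compete for the same slot at synthesis time.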

  11. Tuning and Testing
     - Test it on some real data
       - Ensure number/symbol expansions are correct
     - Prompts should probably be word expanded
       - Flight US187 -> flight u s one eight seven
     - Remove bad prompts
       - Or fix labels
     - Remember to keep access to the speaker
       - If you have to update the system, you need the same speaker available
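
The word expansion in the flight example can be sketched as below. This is a simplified illustration with hypothetical function names, not Festival's text normalization: it assumes mixed alphanumeric tokens and all-caps tokens are spelled out character by character, digits are read one at a time, and punctuation is dropped.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def expand_token(token):
    """Expand one token into a list of lowercase words."""
    if token.isalpha() and not token.isupper():
        # Ordinary word: keep it whole.
        return [token.lower()]
    words = []
    for ch in token:
        if ch in "0123456789":
            words.append(DIGITS[int(ch)])   # read digits one by one
        elif ch.isalpha():
            words.append(ch.lower())        # spell out letters
        # anything else (punctuation) is dropped in this sketch
    return words


def expand_prompt(prompt):
    """Word-expand a whole prompt."""
    words = []
    for token in prompt.split():
        words.extend(expand_token(token))
    return " ".join(words)


expanded = expand_prompt("Flight US187")
```

Storing prompts in this expanded form means the recorded words and the labeled words agree, which makes wrong number or symbol expansions easier to spot during testing.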

  12. Summary
     - Building a voice
       - Database design, recording, labeling
       - Parameter extraction and model building
     - Limited domain synthesis
