Text-to-Speech Synthesis, Bernd Möbius


  1. Text-to-Speech Synthesis. Bernd Möbius, Language Science and Technology, Saarland University. Lecture 3, May 28, 2020: Formant Synthesis

  2. Formant synthesis
     ▪ acoustic-parametric synthesis method
     ▪ modeling the acoustic properties of speech sounds
     ▪ based on
       ▪ acoustic theory of speech production [Fant 1960]
       ▪ source-filter model

  3. Source-filter model of speech production

  4. Source-filter model of speech production

  5. Source-filter model of speech production [figure panels: glottal excitation; vocal tract frequency response; sound spectrum]
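
The three panels named on this slide can be illustrated numerically: the sound spectrum is the source spectrum multiplied by the frequency response of the vocal tract. The following is only a minimal sketch under stated assumptions, not the lecture's own material: the source is modeled as harmonics falling by 12 dB per octave, the filter as a product of two-pole resonances, and the formant values in `FORMANTS` as well as all function names are hypothetical.

```python
import math

def resonance_gain(f, center, bandwidth):
    """Magnitude response of one two-pole resonance, normalized to 1 at 0 Hz."""
    half_bw = bandwidth / 2.0
    num = center ** 2 + half_bw ** 2
    den = math.hypot(f - center, half_bw) * math.hypot(f + center, half_bw)
    return num / den

def vocal_tract_gain(f, formants):
    """Filter: product of the individual formant resonances (all-pole model)."""
    gain = 1.0
    for center, bandwidth in formants:
        gain *= resonance_gain(f, center, bandwidth)
    return gain

def source_amplitude(harmonic):
    """Source: harmonic amplitudes falling by 12 dB per octave (assumption)."""
    return 1.0 / harmonic ** 2

def output_spectrum(f0, formants, n_harmonics):
    """Sound spectrum = source spectrum times vocal tract frequency response."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        spectrum.append((f, source_amplitude(k) * vocal_tract_gain(f, formants)))
    return spectrum

# Hypothetical schwa-like filter: (center frequency, bandwidth) in Hz.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]

for f, amp in output_spectrum(100.0, FORMANTS, 30):
    print(f"{f:6.0f} Hz  {20 * math.log10(amp):7.1f} dB")
```

In the printed spectrum the harmonics near the assumed formant frequencies stand out against the falling source slope, which is the interaction of source and filter that the figure depicts.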

  6. Vocal tract as acoustic filter
     ▪ vocal tract geometry, determined by tongue position (and jaw opening and lip protrusion, not shown)

  7. Vocal tract: acoustic tube model [Clark et al., 2007a, p. 241]

  8. Idealized simple tube model
     ▪ acoustic signals propagate as longitudinal waves in the vocal tract
     ▪ 2 physical parameters of acoustic waves
       ▪ sound pressure p: change of air pressure caused by the sound at the place of measurement
       ▪ sound (particle) velocity v: speed of the air particles set in motion by the sound event (note: this is not the speed of sound c!)
     ▪ perfect reflection at the sound-hard (lossless) walls of the tube
       ▪ v = 0 at place of reflection
     ▪ (lossy) reflection at the sound-soft transition from vocal tract to free acoustic field (i.e. from lips to air)
       ▪ p = 0 at place of radiation
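
The two boundary conditions above are enough to fix the resonances of the idealized tube. A short derivation as a sketch, assuming a tube of length L that is closed (v = 0) at the glottis end x = 0 and open (p = 0) at the lips x = L. The coordinate x, the wavenumber k, the amplitude P, and the index n are notation introduced here, not taken from the slides. Since v = 0 at x = 0 implies a pressure maximum there, the standing pressure wave has a cosine shape:

```latex
\begin{align*}
p(x,t) &= P \cos(kx)\cos(\omega t), \qquad k = \frac{2\pi}{\lambda} \\
p(L,t) = 0 \;&\Rightarrow\; \cos(kL) = 0 \;\Rightarrow\; kL = (2n-1)\,\frac{\pi}{2}, \quad n = 1, 2, 3, \dots \\
\lambda_n &= \frac{4L}{2n-1}, \qquad f_n = \frac{c}{\lambda_n} = \frac{(2n-1)\,c}{4L}
\end{align*}
```

Under these assumptions the tube acts as a quarter-wavelength resonator: the lowest resonance has a wavelength of four tube lengths, and the higher ones are its odd multiples in frequency.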

  9. Sound pressure waves in vocal tract [figure, with the positions of p = 0 and v = 0 marked] [Hess, ms.]

  10. Computing formant frequencies
     ▪ resonance frequencies of the neutral vocal tract are computed as speed of sound divided by wavelength: f_i = c / λ_i
     ▪ frequencies of resonances/formants, with c = 340 m/s and a tube length of 0.17 m:
       F1 = 340 / (4 * 0.17) = 340 / 0.68 = 500 Hz
       F2 = 340 / (4/3 * 0.17) = 3 * 340 / (4 * 0.17) = 1500 Hz
       F3 = 340 / (4/5 * 0.17) = 5 * 340 / (4 * 0.17) = 2500 Hz
     ▪ the distribution of formant frequencies in the neutral vocal tract corresponds to the formants of the central vowel 'schwa' [ə]
     ▪ the simple tube model, with constant cross-section, is inadequate for computing the formants of other vowels (cf. acoustic theory of vowel articulation [Ungeheuer 1962])
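
The arithmetic on this slide follows one pattern: the wavelengths are 4L, 4L/3, 4L/5, so the n-th resonance is (2n - 1) * c / (4L). A minimal sketch that reproduces the three values; the function name and its parameter names are hypothetical, and the defaults are the slide's values (340 m/s, 0.17 m).

```python
def uniform_tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonances of a uniform tube, closed at one end and open at the other.

    The n-th resonance has wavelength 4L / (2n - 1), hence
    f_n = (2n - 1) * c / (4L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

formants = uniform_tube_formants()
print([round(f) for f in formants])  # prints [500, 1500, 2500]
```

Calling the function with a shorter `length_m` raises all resonances proportionally, which is the only change this simple model can express; it cannot move F1 and F2 independently, as other vowels would require.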

  11. Tube model with varying cross-section [Clark et al., 2007a, p. 246]

  12. Acoustic theory of vowel articulation

  13. Vowels (IPA) [figure: vowel chart with axes F2 and F1]

  14. Vowels (German) [Pompino-Marschall 1995]

  15. Vowels (German, F1/F2/F3) [Möbius 2001a]

  16. Cascade vs. parallel resonators [Allen et al. 1987]

  17. Cascade/parallel resonators and voice source [Allen et al. 1987]
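
The difference between the two arrangements can be sketched in a few lines. The example below uses the two-pole digital resonator described in [Klatt 1980], y[n] = A x[n] + B y[n-1] + C y[n-2], and connects several of them either in series (cascade) or side by side (parallel). It is a simplified illustration, not the synthesizer shown on the slides: the impulse-train source, the formant values, the branch gains, and all function names are assumptions made here.

```python
import math

def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients A, B, C of a two-pole digital resonator with unit gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Apply y[n] = a*x[n] + b*y[n-1] + c*y[n-2] to a list of samples."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade(signal, formants, sample_rate):
    """Resonators in series: each one filters the output of the previous one."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

def parallel(signal, formants, gains, sample_rate):
    """Resonators side by side: each filters the source, weighted outputs are summed."""
    branches = [resonate(signal, f, bw, sample_rate) for f, bw in formants]
    return [sum(g * branch[n] for g, branch in zip(gains, branches))
            for n in range(len(signal))]

def impulse_train(f0_hz, n_samples, sample_rate):
    """Crude voiced source (assumption): one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

# Hypothetical settings: 10 kHz sampling, schwa-like formants (Hz, bandwidth Hz).
SAMPLE_RATE = 10000
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]

source = impulse_train(100.0, 500, SAMPLE_RATE)
vowel_cascade = cascade(source, FORMANTS, SAMPLE_RATE)
vowel_parallel = parallel(source, FORMANTS, [1.0, 0.5, 0.25], SAMPLE_RATE)
```

In the cascade arrangement the relative formant amplitudes follow from the frequencies and bandwidths alone, whereas the parallel arrangement needs one explicit gain per branch. That extra control is the practical trade-off between the two designs.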

  18. Klatt's formant synthesizer [Klatt 1980]

  19. Klatt parameter values [Allen et al. 1987]

  20. IMSkpe: Klatt parameter editor
     ▪ Klatt parameter editor GUI
     ▪ interactive tool for doing formant synthesis
       http://sourceforge.net/projects/imskpe/
       https://github.com/imskpe/imskpe/
       (Andreas Madsack, IMS, Univ. Stuttgart)

  21. Formant synthesis: Summary
     ▪ acoustic-parametric synthesis method
     ▪ modeling the acoustic properties of speech sounds
     ▪ based on
       ▪ acoustic theory of speech production [Fant 1960]
       ▪ source-filter model
     ▪ explicit control of voice source parameters and prosody
     ▪ fair approximation of formant structure of speech sounds
     ▪ extensive knowledge acquisition and rule building phases
     ▪ TTS systems: Klatt-Talk (MITalk, DECtalk), Delta, Infovox

  22. Essential content: formant synthesis
     ▪ architecture and functional principle of a formant synthesizer, here: Klatt synthesizer
     ▪ relationship between a formant synthesizer and the source-filter model of speech production
