Text-to-Speech Synthesis Bernd Möbius Language Science and Technology Saarland University Lecture 3 May 28, 2020 Formant Synthesis B Möbius Formant synthesis 1
Formant synthesis
▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model
Source-filter model of speech production
[Figure: glottal excitation, vocal tract frequency response, and resulting sound spectrum]
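A minimal sketch of the source-filter idea in the frequency domain: the level of each harmonic in the sound spectrum is the level of the glottal source at that harmonic plus the gain of the vocal tract filter at that frequency (both in dB). The -12 dB/octave source slope, the formant bandwidths, and all function names are assumptions made for illustration, not values from the lecture.

```python
import math

def source_level_db(harmonic, slope_db_per_octave=-12.0):
    """Level of the k-th harmonic of an idealized glottal source.
    The spectral slope is an assumed textbook value."""
    return slope_db_per_octave * math.log2(harmonic)

def resonator_gain_db(f, formant_hz, bandwidth_hz):
    """Gain (dB) of one vocal tract resonance, modeled as a complex pole pair
    normalized to unity gain at 0 Hz."""
    pole = complex(-math.pi * bandwidth_hz, 2 * math.pi * formant_hz)
    s = complex(0.0, 2 * math.pi * f)
    h = (pole * pole.conjugate()) / ((s - pole) * (s - pole.conjugate()))
    return 20 * math.log10(abs(h))

def output_spectrum_db(f0_hz, formants, n_harmonics):
    """Sound spectrum = source spectrum + filter response (in dB),
    evaluated at the harmonics of f0."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        f = k * f0_hz
        filter_db = sum(resonator_gain_db(f, fk, bk) for fk, bk in formants)
        spectrum.append((f, source_level_db(k) + filter_db))
    return spectrum

# Hypothetical example: f0 = 100 Hz, three formants with assumed bandwidths.
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
for f, level in output_spectrum_db(100.0, formants, 30):
    print(f"{f:6.0f} Hz  {level:7.1f} dB")
```

In the printed spectrum, the harmonics closest to the formant frequencies stand out as local peaks against the falling source slope.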
Vocal tract as acoustic filter
▪ Vocal tract geometry, determined by tongue position (and jaw opening and lip protrusion, not shown)
Vocal tract: acoustic tube model [Clark et al., 2007a, p.241]
Idealized simple tube model
▪ acoustic signals propagate as longitudinal waves in the vocal tract
▪ 2 physical parameters of acoustic waves
  ▪ sound pressure p: change of air pressure evoked by sound at place of measurement
  ▪ sound (particle) velocity v: speed of air particles caused by sound event (note: this is not the speed of sound c!)
▪ perfect reflection at sound-hard (lossless) walls of tube
  ▪ v = 0 at place of reflection
▪ (lossy) reflection at sound-soft transition from vocal tract to free acoustic field (i.e. from lips to air)
  ▪ p = 0 at place of radiation
Sound pressure waves in vocal tract [Hess, ms.]
[Figure: pressure waves in the tube, annotated with the boundary conditions p = 0 and v = 0]
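The boundary conditions of the idealized tube fix which wavelengths can resonate. As a sketch, assuming the usual reading that the glottal end is the closed end (v = 0) and the lips are the open end (p = 0), and writing L for the tube length (a symbol introduced here for illustration), only standing waves with an odd number of quarter wavelengths fit into the tube:

```latex
\begin{align}
L &= (2i - 1)\,\frac{\lambda_i}{4}, \qquad i = 1, 2, 3, \dots \\
\lambda_i &= \frac{4L}{2i - 1} \\
f_i &= \frac{c}{\lambda_i} = \frac{(2i - 1)\,c}{4L}
\end{align}
```

The first three wavelengths are therefore 4L, 4L/3, and 4L/5.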
Computing formant frequencies
▪ resonance frequencies of neutral vocal tract computed as speed of sound divided by wavelength: f_i = c / λ_i
▪ frequencies of resonances/formants (c = 340 m/s, vocal tract length 0.17 m):
  F1 = 340 / (4 * 0.17) = 340 / 0.68 = 500 Hz
  F2 = 340 / (4/3 * 0.17) = 3 * 340 / (4 * 0.17) = 1500 Hz
  F3 = 340 / (4/5 * 0.17) = 5 * 340 / (4 * 0.17) = 2500 Hz
▪ distribution of formant frequencies in neutral vocal tract corresponds to formants of central vowel 'schwa' [ə]
▪ simple tube model, with constant cross-section, is inadequate for computing formants of other vowels (cf. acoustic theory of vowel articulation [Ungeheuer 1962])
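The three computations above follow the pattern of odd multiples 1, 3, 5 of c / (4 * length). A small sketch that reproduces them; the function name and parameter names are made up for illustration, and the default values are the ones used on the slide:

```python
def uniform_tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonance frequencies (Hz) of a uniform tube, closed at one end and
    open at the other: odd multiples of c / (4 * length)."""
    return [(2 * i - 1) * speed_of_sound / (4 * length_m)
            for i in range(1, count + 1)]

# Neutral vocal tract from the slide: 0.17 m long, c = 340 m/s.
print([round(f) for f in uniform_tube_formants()])  # [500, 1500, 2500]
```

Changing `length_m` shows how a shorter tube raises all resonances by the same factor, which is as far as the constant cross-section model can go.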
Tube model with varying cross-section [Clark et al., 2007a, p.246]
Acoustic theory of vowel articulation
Vowels (IPA)
[Figure: vowel chart; axes F2 and F1]
Vowels (German, [Pompino-Marschall 1995])
Vowels (German, F1/F2/F3 [Möbius 2001a])
Cascade vs. parallel resonators [Allen et al. 1987]
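A minimal frequency-domain sketch of the two configurations. Each formant is a second-order digital resonator with the coefficient formulas commonly given for Klatt's synthesizer [Klatt 1980]. In the cascade arrangement the transfer functions multiply, so relative formant amplitudes follow from frequencies and bandwidths alone; in the parallel arrangement each branch gets its own amplitude control and the branch outputs are summed. Function names, bandwidths, and the sampling rate are assumptions for illustration, and details of a real parallel branch (such as alternating branch signs) are left out.

```python
import cmath
import math

def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c

def resonator_response(f, freq_hz, bandwidth_hz, sample_rate):
    """Complex frequency response of one resonator at frequency f."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    z_inv = cmath.exp(-2j * math.pi * f / sample_rate)
    return a / (1 - b * z_inv - c * z_inv ** 2)

def cascade_response(f, formants, sample_rate):
    """Cascade: resonators in series, responses multiply."""
    h = complex(1.0, 0.0)
    for freq_hz, bandwidth_hz in formants:
        h *= resonator_response(f, freq_hz, bandwidth_hz, sample_rate)
    return h

def parallel_response(f, formants, amplitudes, sample_rate):
    """Parallel: each branch is scaled by its own amplitude, then summed."""
    return sum(amp * resonator_response(f, freq_hz, bandwidth_hz, sample_rate)
               for (freq_hz, bandwidth_hz), amp in zip(formants, amplitudes))

# Hypothetical settings: neutral-vowel formants, assumed bandwidths, 10 kHz.
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
for f in (500.0, 1000.0, 1500.0):
    casc = abs(cascade_response(f, formants, 10000))
    par = abs(parallel_response(f, formants, [1.0, 0.5, 0.25], 10000))
    print(f"{f:6.0f} Hz  cascade {casc:6.2f}  parallel {par:6.2f}")
```

The cascade form needs no per-formant amplitudes, which suits vowels; the parallel form exposes them as extra parameters for cases where the amplitudes have to be set explicitly.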
Cascade/parallel resonators and voice source [Allen et al. 1987]
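To make the link between voice source and resonators concrete, here is a time-domain sketch: an impulse train stands in for the voice source and is passed through a cascade of three second-order resonators (difference equation as commonly given for [Klatt 1980]). The impulse-train source, the bandwidths, the sampling rate, and all names are simplifying assumptions; a real formant synthesizer uses a shaped glottal waveform and many more parameters.

```python
import cmath
import math

SAMPLE_RATE = 10000  # Hz, assumed for this sketch

def impulse_train(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Crude voice source: one unit impulse per fundamental period."""
    period = round(sample_rate / f0_hz)
    n_samples = int(duration_s * sample_rate)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonate(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Second-order resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, formants, duration_s):
    """Source -> cascade of formant resonators -> waveform samples."""
    signal = impulse_train(f0_hz, duration_s)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz)
    return signal

def magnitude_at(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of the signal's spectrum at a single frequency."""
    w = -2j * math.pi * freq_hz / sample_rate
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(signal)))

# Schwa-like vowel: formants 500/1500/2500 Hz, f0 = 100 Hz, 0.5 s.
samples = synthesize_vowel(100.0, [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)], 0.5)
print(len(samples), magnitude_at(samples, 500.0) > magnitude_at(samples, 1000.0))
```

Changing `f0_hz` alters the pitch without moving the formants, and changing the formant list alters the vowel quality without touching the source, which is the separation the source-filter model describes.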
Klatt's formant synthesizer [Klatt 1980]
Klatt parameter values [Allen et al. 1987]
IMSkpe: Klatt parameter editor
▪ Klatt parameter editor GUI
▪ interactive tool for doing formant synthesis
http://sourceforge.net/projects/imskpe/
https://github.com/imskpe/imskpe/
(Andreas Madsack, IMS, Univ. Stuttgart)
Formant synthesis: Summary
▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model
▪ explicit control of voice source parameters and prosody
▪ fair approximation of formant structure of speech sounds
▪ extensive knowledge acquisition and rule building phases
▪ TTS Systems: Klatt-Talk (MITalk, DECtalk), Delta, Infovox
Essential content: Formant synthesis
▪ architecture and functional principle of a formant synthesizer, here: Klatt synthesizer
▪ relationship between a formant synthesizer and the source-filter model of speech production