
Text-to-Speech Synthesis
Bernd Möbius
Language Science and Technology, Saarland University
Lecture 3, May 28, 2020
Formant Synthesis

Formant synthesis
▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model

Source-filter model of speech production


[Figure: source-filter model of speech production, showing glottal excitation, vocal tract frequency response, and sound spectrum]
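The source-filter idea can be illustrated with a minimal sketch: an idealized glottal excitation (an impulse train) is filtered by the impulse response of a single vocal tract resonance (a damped sinusoid). The sampling rate, the 100 Hz fundamental, the 500 Hz resonance with 60 Hz bandwidth, and all function names are assumptions chosen for illustration; they are not taken from the slides.

```python
import math

def impulse_train(f0_hz, fs_hz, n_samples):
    """Idealized glottal excitation: one unit pulse per pitch period."""
    period = round(fs_hz / f0_hz)  # pitch period in samples
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def damped_sinusoid(freq_hz, bw_hz, fs_hz, n_samples):
    """Impulse response of a single resonance: a sine decaying at exp(-pi * bw * t)."""
    return [
        math.exp(-math.pi * bw_hz * n / fs_hz)
        * math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
        for n in range(n_samples)
    ]

def convolve(source, impulse_response):
    """Filter the source with the impulse response (output truncated to len(source))."""
    out = [0.0] * len(source)
    for n, s in enumerate(source):
        if s == 0.0:
            continue  # only the pulses contribute
        for k, h in enumerate(impulse_response):
            if n + k < len(out):
                out[n + k] += s * h
    return out

FS = 10000  # sampling rate in Hz (assumed)
source = impulse_train(f0_hz=100, fs_hz=FS, n_samples=800)
h = damped_sinusoid(freq_hz=500, bw_hz=60, fs_hz=FS, n_samples=400)
speech = convolve(source, h)
```

Once the first few periods have passed, the output repeats every 100 samples (the pitch period of the source), while the oscillation within each period is set by the resonance (the filter). Convolution in the time domain corresponds to multiplying the source spectrum by the filter's frequency response.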

Vocal tract as acoustic filter
▪ vocal tract geometry, determined by tongue position (and jaw opening and lip protrusion, not shown)

Vocal tract: acoustic tube model [Clark et al., 2007a, p.241]

Idealized simple tube model
▪ acoustic signals propagate as longitudinal waves in the vocal tract
▪ 2 physical parameters of acoustic waves
  ▪ sound pressure p: change of air pressure evoked by sound at place of measurement
  ▪ sound velocity v: speed of air particles caused by sound event (note: this is not the speed of sound c!)
▪ perfect reflection at sound-hard (lossless) walls of tube
  ▪ v = 0 at place of reflection
▪ (lossy) reflection at sound-soft transition from vocal tract to free acoustic field (i.e. from lips to air)
  ▪ p = 0 at place of radiation

Sound pressure waves in vocal tract [Hess, ms.]
[Figure: standing waves in the tube, with the boundary conditions p = 0 and v = 0 marked]

Computing formant frequencies
▪ resonance frequencies of neutral vocal tract computed as speed of sound divided by wavelength: f_i = c / λ_i
▪ frequencies of resonances/formants (c = 340 m/s, tube length 0.17 m):
  F1 = 340 / (4 * 0.17) = 340 / 0.68 = 500 Hz
  F2 = 340 / (4/3 * 0.17) = 3 * 340 / (4 * 0.17) = 1500 Hz
  F3 = 340 / (4/5 * 0.17) = 5 * 340 / (4 * 0.17) = 2500 Hz
▪ distribution of formant frequencies in neutral vocal tract corresponds to formants of central vowel 'schwa' [ə]
▪ simple tube model, with constant cross-section, is inadequate for computing formants of other vowels (cf. acoustic theory of vowel articulation [Ungeheuer 1962])
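The arithmetic above follows the quarter-wavelength pattern of a uniform tube with v = 0 at one end and p = 0 at the other: the i-th resonance has wavelength 4L / (2i - 1), so f_i = (2i - 1) * c / (4L). A minimal sketch, where the function name and its defaults (L = 0.17 m, c = 340 m/s, as used in the computation above) are illustrative choices:

```python
def tube_formants(length_m=0.17, c=340.0, n_formants=3):
    """Resonance frequencies (Hz) of a uniform tube closed at one end, open at the other."""
    # Quarter-wavelength resonator: lambda_i = 4L / (2i - 1), f_i = c / lambda_i
    return [(2 * i - 1) * c / (4.0 * length_m) for i in range(1, n_formants + 1)]

# Neutral vocal tract (schwa-like): values close to 500, 1500, 2500 Hz
for i, f in enumerate(tube_formants(), start=1):
    print(f"F{i} = {f:.0f} Hz")
```

Because the length appears in the denominator, a shorter tube shifts all resonances upward in proportion; tube_formants(length_m=0.145), for example, gives formants roughly 17% higher than the 0.17 m tube.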

Tube model with varying cross-section [Clark et al., 2007a, p.246]

Acoustic theory of vowel articulation

Vowels (IPA)
[Figure: vowel chart with axes F2 and F1]

Vowels (German, [Pompino-Marschall 1995])

Vowels (German, F1/F2/F3 [Möbius 2001a])

Cascade vs. parallel resonators [Allen et al. 1987]
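A rough sketch of the two configurations, using the second-order digital resonator y[n] = A x[n] + B y[n-1] + C y[n-2] with coefficients computed from formant frequency and bandwidth as in [Klatt 1980]. The function names, the bandwidth values, and the unit branch gains are assumptions for illustration; a full synthesizer adds further components (voice source shaping, anti-resonators, radiation characteristic) that are omitted here.

```python
import math

def resonator(x, freq_hz, bw_hz, fs_hz):
    """Second-order digital resonator with unity gain at 0 Hz."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y, y1, y2 = [], 0.0, 0.0  # output and the two previous output samples
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

def cascade(x, formants, fs_hz):
    """Resonators in series: the output of one feeds the next."""
    for freq_hz, bw_hz in formants:
        x = resonator(x, freq_hz, bw_hz, fs_hz)
    return x

def parallel(x, formants, gains, fs_hz):
    """Resonators side by side: each gets the same input, outputs are weighted and summed."""
    branches = [resonator(x, f, bw, fs_hz) for f, bw in formants]
    return [sum(g * br[n] for g, br in zip(gains, branches)) for n in range(len(x))]

FS = 10000                                       # sampling rate in Hz (assumed)
SCHWA = [(500, 60), (1500, 90), (2500, 150)]     # (frequency, bandwidth); bandwidths assumed
pulse = [1.0] + [0.0] * 499                      # single excitation pulse
out_cascade = cascade(pulse, SCHWA, FS)
out_parallel = parallel(pulse, SCHWA, [1.0, 1.0, 1.0], FS)
```

In the cascade arrangement the relative formant amplitudes follow from the formant frequencies and bandwidths alone, whereas the parallel arrangement needs an explicit gain per formant, which gives individual control over each formant's amplitude.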

Cascade/parallel resonators and voice source [Allen et al. 1987]

Klatt's formant synthesizer [Klatt 1980]

Klatt parameter values [Allen et al. 1987]

IMSkpe: Klatt parameter editor
▪ Klatt parameter editor GUI
▪ interactive tool for doing formant synthesis
http://sourceforge.net/projects/imskpe/
https://github.com/imskpe/imskpe/
(Andreas Madsack, IMS, Univ. Stuttgart)

Formant synthesis: Summary
▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model
▪ explicit control of voice source parameters and prosody
▪ fair approximation of formant structure of speech sounds
▪ extensive knowledge acquisition and rule building phases
▪ TTS systems: Klatt-Talk (MITalk, DECtalk), Delta, Infovox

Essential content: Formant synthesis
▪ architecture and functional principle of a formant synthesizer, here: Klatt synthesizer
▪ relationship between a formant synthesizer and the source-filter model of speech production
