Introduction to Articulatory Speech Synthesis, Eva Lasarcyk, M.A.

Foundations of Language Science and Technology
Introduction to Articulatory Speech Synthesis
Eva Lasarcyk, M.A.
January 25, 2010
Eva Lasarcyk Foundations of Language Science and Technology: Saarland University 2010 Articulatory Synthesis

Guten Tag, liebe Zuhörer. (Hello, dear listeners.)

Why speech synthesis?
Applications:
- A machine reads text aloud for you, e.g. for people with disabilities or for authors checking their texts
- Avatars
- Telephone dialog systems
- Natural interaction with service robots
- Part of "speech-to-speech" translation systems
Research (phonetic applications):
- Imitate, manipulate, and understand speech production and perception

How can we create synthetic speech? Three main strategies:
- Imitate the acoustics directly: formant synthesis
- Record speech, chop it up, and regroup it: concatenative synthesis (most systems nowadays use this technique)
- Imitate, i.e. simulate, the speech production process: articulatory synthesis (long history; some recent major improvements)

Concatenation of speech segments
Record speech, chop it up, and regroup it: concatenative synthesis.
Goal: record a LOT in order to manipulate LITTLE.
Trend: huge databases with intelligent selection of units.
Advantages:
- Sounds quite natural (example: "Willkommen beim Tag der offenen Tür." (Welcome to the open house.))
- Requires little phonetic knowledge; it is more of a signal processing task
- High quality can be obtained by using a LOT of speech data
Disadvantages:
- Recording the data is costly (time and money)
- Speaker-dependent
- Post-hoc manipulations decrease quality
- Structurally new words may easily sound "funny"
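The "intelligent selection of units" is commonly framed as a search that trades off how well each stored unit matches the target (a target cost) against how smoothly neighbouring units join (a join cost). The sketch below shows that idea as a Viterbi search in Python. The function name `select_units` and the cost functions are hypothetical illustrations, not the API of any particular synthesizer.

```python
from typing import Callable, Sequence, TypeVar

Unit = TypeVar("Unit")


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> list[Unit]:
    """Pick one stored unit per target position, minimizing the sum of
    target costs plus join costs between neighbouring units (Viterbi search)."""
    if not candidates:
        return []
    # best[i][j]: cheapest total cost of any path ending in candidate j at position i
    best = [[target_cost(0, u) for u in candidates[0]]]
    back: list[list[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            # cost of reaching u from every candidate at the previous position
            costs = [best[i - 1][k] + join_cost(prev, u)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(i, u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # trace back from the cheapest unit at the last position
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]
```

Any cost functions can be plugged in; for instance, units could be (label, pitch) pairs, with a target cost that penalizes a wrong label and a join cost equal to the pitch jump between neighbours.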

… an "ideal" synthesis should be able to …
- sound as natural and intelligible as a human
- recreate a specific voice
- create "generic" voices
- sound like extraordinary speakers (opera singer, alien)
- speak any language with any emotion without much effort
- … be freely controllable
- … allow us insights into speech production and perception
Cf.: Christine H. Shadle and Robert I. Damper (2001). Prospects for Articulatory Synthesis: A Position Paper. In: Proceedings 4th International Speech Communication Association (ISCA) Workshop on Speech Synthesis, Pitlochry, 121-126.
Do it yourself: imitate speech production through a physical simulation of sound with an articulatory model.
Challenges: highly complex; simulation is time-intensive; high quality is hard to achieve.

How are speech waves created? Source (vocal folds) + filter (vocal tract) = speech signal.
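To make "source + filter" concrete, here is a minimal Python sketch: an impulse train stands in for the vocal fold source, and a cascade of second-order digital resonators (in the style of Klatt's formant synthesizer) stands in for the vocal tract filter. Note that this models the filter acoustically, as formant synthesis does, not articulatorily. The function names, the bandwidths, and the sampling rate are assumptions chosen for illustration.

```python
import math


def impulse_train(f0: float, fs: int, n: int) -> list[float]:
    """Source: one unit impulse per glottal period (a crude stand-in for the vocal folds)."""
    period = fs / f0
    out = [0.0] * n
    k = 0
    while round(k * period) < n:
        out[round(k * period)] = 1.0
        k += 1
    return out


def resonator(x: list[float], freq: float, bw: float, fs: int) -> list[float]:
    """Filter: Klatt-style second-order digital resonator with unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # the two previous output samples
    y = []
    for sample in x:
        out = a * sample + b * y1 + c * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def synthesize_vowel(f0: float, formants: list[tuple[float, float]],
                     fs: int = 16000, dur: float = 0.5) -> list[float]:
    """Source + filter: pass the impulse train through (frequency, bandwidth) resonators in cascade."""
    signal = impulse_train(f0, fs, int(fs * dur))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal
```

For example, `synthesize_vowel(120, [(500, 60), (1500, 90), (2500, 120)])` returns half a second of a schwa-like vowel at 120 Hz; the formant frequencies are those of an idealized uniform tube, and the bandwidths are illustrative guesses.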

The source: vocal fold oscillation. The vocal folds take different default positions for breathing, speaking, and, e.g., whispering. The oscillation is not just an "open-close" movement; it has a vertical component, too.
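A common simplified description of the source looks at the glottal airflow pulse rather than at the fold mechanics. The sketch below generates one period of a Rosenberg-style pulse: a raised-cosine opening phase, a quarter-cosine closing phase, and then a closed phase with no flow. The open and closing fractions are assumed example values. The sketch deliberately ignores the vertical component of the oscillation; mechanical models such as two-mass models are needed to capture that.

```python
import math


def rosenberg_pulse(n_samples: int, open_frac: float = 0.4,
                    close_frac: float = 0.16) -> list[float]:
    """One period of a Rosenberg-style glottal flow pulse, normalized to a peak of 1.

    open_frac and close_frac are the (assumed) fractions of the period spent
    in the opening and closing phases; the rest of the period is closed.
    """
    if not (open_frac > 0 and close_frac > 0 and open_frac + close_frac <= 1):
        raise ValueError("phase fractions must be positive and sum to at most 1")
    tp = open_frac * n_samples   # length of the opening phase in samples
    tn = close_frac * n_samples  # length of the closing phase in samples
    pulse = []
    for i in range(n_samples):
        if i <= tp:
            g = 0.5 * (1 - math.cos(math.pi * i / tp))       # folds opening
        elif i <= tp + tn:
            g = math.cos(math.pi * (i - tp) / (2 * tn))      # folds closing
        else:
            g = 0.0                                          # closed phase: no airflow
        pulse.append(g)
    return pulse
```

Repeating this pulse once per period (e.g. 100 samples at 10 kHz for 100 Hz) gives a smoother source signal than the bare impulse train.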

The filter: resonance cavity shapes. (X-ray movie showing articulatory movements during speaking.)

Filter: tongue position of vowels. (Chart of vocal tract shapes for different vowels.) Depending on the vowel, the tongue takes a different shape.
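A worked baseline for the filter: if the vocal tract is idealized as a uniform tube, closed at the glottis and open at the lips, its resonances fall at odd multiples of the quarter-wavelength frequency, F_k = (2k - 1) * c / (4 * L). With an assumed length L = 17.5 cm and speed of sound c = 350 m/s, this gives roughly 500, 1500, and 2500 Hz, a schwa-like vowel. The tongue shapes in the chart move the formants away from these values.

```python
def uniform_tube_formants(length_m: float = 0.175, c: float = 350.0, n: int = 3) -> list[float]:
    """First n resonances (Hz) of a uniform tube closed at one end and open at the other:
    F_k = (2k - 1) * c / (4 * L). Default length and speed of sound are assumptions."""
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]
```

A shorter tube, e.g. `uniform_tube_formants(0.15)`, shifts all resonances upward, which is one reason why shorter vocal tracts have higher formants.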

Now we have almost all we need: source (vocal folds) + filter (vocal tract) = speech signal … to create speech sounds ourselves!

Mechanical speaking machine
Wolfgang von Kempelen, 1791: "Mechanismus der menschlichen Sprache nebst der Beschreibung einer sprechenden Maschine." (The mechanism of human speech, with a description of a speaking machine.) One of the first attempts to recreate human speech. Available in the Phonetics department. (Image: see e.g. http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html)

Vocal tract: geometrical model. (Figure labels: lungs, subglottal system, glottis, supraglottal system, oral cavity, nasal cavity, mouth, nostrils, area slice.)
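The area slices suggest a classic acoustic simplification: treat the tract as a chain of short cylindrical tubes (Kelly-Lochbaum style), where each junction between neighbouring slices reflects part of the sound wave. The sketch computes one common form of the reflection coefficient, r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}). Sign conventions differ between texts, and this is not necessarily how the synthesizer presented here performs its aerodynamic-acoustic simulation.

```python
def reflection_coefficients(areas: list[float]) -> list[float]:
    """Reflection coefficient at each junction between adjacent tube sections,
    ordered from glottis to lips: r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}).
    Equal neighbouring areas give r = 0 (no reflection)."""
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]
```

For example, `reflection_coefficients([2.0, 2.0, 1.0, 3.0])` gives 0 at the first junction, about 0.33 where the tube narrows, and -0.5 where it widens; a uniform tube gives all zeros.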

Supraglottal system (shown for /a:/ and /i:/): hyoid bone (2), lower jaw (3), lips (2), velum (1), tongue (12).
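This sketch assumes that the numbers in parentheses count the control parameters per articulator. The slide does not say so. Under that reading, the supraglottal model is driven by a 20-dimensional vector. The hypothetical helpers below split such a vector into articulator groups and blend two configurations linearly, e.g. between shapes for /a:/ and /i:/. The group order and names are illustrative, not the synthesizer's own parameter layout.

```python
# Assumed reading of the slide: control parameters per articulator (hypothetical names).
PARAMETER_GROUPS = {"hyoid": 2, "jaw": 3, "lips": 2, "velum": 1, "tongue": 12}
N_PARAMS = sum(PARAMETER_GROUPS.values())  # 20 parameters in total


def split_parameters(vector: list[float]) -> dict[str, list[float]]:
    """Split a flat parameter vector into per-articulator groups."""
    if len(vector) != N_PARAMS:
        raise ValueError(f"expected {N_PARAMS} parameters, got {len(vector)}")
    groups, start = {}, 0
    for name, size in PARAMETER_GROUPS.items():  # dicts keep insertion order
        groups[name] = vector[start:start + size]
        start += size
    return groups


def blend_shapes(shape_a: list[float], shape_b: list[float], w: float) -> list[float]:
    """Linear blend of two configurations: w = 0 gives shape_a, w = 1 gives shape_b."""
    if len(shape_a) != N_PARAMS or len(shape_b) != N_PARAMS:
        raise ValueError("both shapes need one value per parameter")
    return [(1.0 - w) * a + w * b for a, b in zip(shape_a, shape_b)]
```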

Computer speaking machine: control
The temporal coordination of the gestures needs to be controlled: a "brain" needs to give the instructions. In this synthesis system, this is realized by the "gestural score".

3D articulatory speech synthesizer
Components: 3D model of the vocal tract and glottis; aerodynamic-acoustic simulation; gestural score.
Main advantage over other synthesis strategies: speech production becomes transparent.
VocalTractLab by Peter Birkholz, University Hospital Aachen, www.vocaltractlab.de

Consonants and vowels: vocalic gestures, consonantal gestures, glottal gestures. Only the targets are specified; the transitions are calculated automatically. Sometimes the realization of a target changes with the phonetic context (e.g. the [g] target in [i:gi:] vs. [u:gu:]). More examples of simple gesture patterns: [a:sa: i:si: u:su:], [aSa iSi uSu] ...
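The principle that "only the targets are specified" can be illustrated with a target approximation sketch. Each articulatory parameter moves toward the currently active target like a critically damped second-order system. When a new gesture starts, the parameter keeps its current position and velocity. The time constant `tau`, the function names, and the choice of a second-order system are assumptions for illustration; the synthesizer's own model may differ, and context effects such as the [g] example are not modeled.

```python
import math


def approach(x0: float, v0: float, target: float, tau: float, t: float) -> tuple[float, float]:
    """Critically damped second-order approach to `target`.

    Starts at position x0 with velocity v0 and returns (position, velocity) after time t.
    """
    e0 = x0 - target
    k = v0 + e0 / tau
    decay = math.exp(-t / tau)
    return target + (e0 + k * t) * decay, (v0 - k * t / tau) * decay


def trajectory(gestures: list[tuple[float, float]], x_start: float, tau: float,
               dt: float, t_end: float) -> list[float]:
    """Sample one articulatory parameter from 0 to t_end.

    gestures: (onset_time, target) pairs sorted by onset. Only targets are given;
    the transitions between them follow from the dynamics in `approach`.
    """
    pending = list(gestures)
    x, v = x_start, 0.0                 # state at the start of the current segment
    seg_start, target = 0.0, x_start    # rest at x_start until the first onset
    samples = []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        # Switch to the next target once its onset is reached, keeping position and velocity.
        while pending and pending[0][0] <= t:
            onset, new_target = pending.pop(0)
            x, v = approach(x, v, target, tau, onset - seg_start)
            seg_start, target = onset, new_target
        samples.append(approach(x, v, target, tau, t - seg_start)[0])
    return samples
```

For instance, `trajectory([(0.0, 1.0), (0.1, -1.0)], x_start=0.0, tau=0.01, dt=0.001, t_end=0.3)` moves smoothly toward 1 and then toward -1. No intermediate points are specified.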

Single gestures: Lips

Single gestures: Velum

Gestural score: vocalic gestures, consonantal gestures, velic gestures, glottal gestures, F0 (pitch) gestures, pulmonic gestures. (Diagram labels: gestural control model + dominance model.)
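A gestural score can be represented as a set of gestures. Each gesture sits on one tier and has an onset, an offset, and a target. The minimal Python sketch below uses hypothetical `Gesture` and `active_at` names and takes its tier names from the slide. It answers the basic question a control model must ask at every time step: which gesture is active on which tier? It omits the dominance model and the actual mapping to articulator movements.

```python
from dataclasses import dataclass

# Tier names follow the slide; the representation itself is a hypothetical sketch.
TIERS = ("vocalic", "consonantal", "velic", "glottal", "f0", "pulmonic")


@dataclass(frozen=True)
class Gesture:
    tier: str     # one of TIERS
    start: float  # onset in seconds
    end: float    # offset in seconds
    target: str   # target label, e.g. a vowel, a constriction or an F0 value

    def __post_init__(self) -> None:
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier}")
        if self.end <= self.start:
            raise ValueError("a gesture must end after it starts")


def active_at(score: list[Gesture], t: float) -> dict[str, str]:
    """Map each tier to the target of the gesture active at time t.

    Intervals are start-inclusive and end-exclusive; if gestures on the same tier
    overlap, the one listed last wins (a real system would need a rule for this).
    """
    active = {}
    for gesture in score:
        if gesture.start <= t < gesture.end:
            active[gesture.tier] = gesture.target
    return active
```

For [a:sa:], one such score would be a vocalic "a:" gesture spanning the whole utterance. It is overlapped in the middle by a consonantal "s" gesture and a glottal "open" gesture.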
