Data-Driven Speech Synthesis

Seminar on Language Technology
Data-Driven Speech Synthesis
Konstantin Tretjakov
kt@ut.ee
11.12.07

Speech Synthesis
“Computers are getting smarter all the time. Scientists tell us that soon they will be able to talk with us. (By “they”, I mean computers. I doubt scientists will ever be able to talk to us.)” - Dave Barry

Speech Synthesis in year 1791

Speech Synthesis in year 1835 J. Faber “Euphonia” http://www.ling.su.se/staff/hartmut/kemplne.htm

Speech Synthesis in year 1937 Riesz Model http://www.ling.su.se/staff/hartmut/kemplne.htm

Speech Synthesis in year 1939 H.Dudley “VODER” http://www.ling.su.se/staff/hartmut/kemplne.htm

Speech Synthesis in year 1953 Gunnar Fant's “OVE” (Orator Verbis Electris) Formant Synthesizer for vowels http://www.ling.su.se/staff/hartmut/kemplne.htm

Formant Synthesis

http://www.geofex.com/Article_Folders/wahpedl/voicewah.htm


Modern Speech Synthesis
● 1968 – First full TTS (Umeda et al.)
● 1977 – Diphone concatenation (J. Olive)
● 1979 – MITalk (Allen et al.)
● 1984 – DECtalk (Klatt, DEC)
● 1995 – Eurovocs (rule-based)
● 200? – IBM (data-driven)

Outline
● History of Speech Synthesis
● Text-To-Speech System Architecture

Text-to-Speech System
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis
http://www.stanford.edu/class/linguist236/

Text-to-Speech System
Data-driven?
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis

1) Text Normalization
● He stole $100 million from the bank.
● It's 13 St. Andrews St.
● The home page is http://www.ut.ee.
Method:
● Split the text into tokens.
● Map tokens to words.
● Identify types for words.
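The three-step method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables, the function names and the rule that "St." means "saint" before a capitalized word and "street" otherwise are all hypothetical, and URLs are not handled at all.

```python
import re

# Hypothetical, deliberately tiny lookup tables.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
MAGNITUDES = {"thousand", "million", "billion"}


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only spells out 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return ONES[hundreds] + " hundred" + tail


def token_type(token):
    """Identify the type of a token (punctuation already stripped)."""
    if re.fullmatch(r"\$\d+", token):
        return "money"
    if re.fullmatch(r"\d+", token):
        return "number"
    return "word"


def normalize(text):
    """Split into tokens, type each token, and map it to spoken words."""
    tokens = text.split()
    words = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "St.":
            # Assumed heuristic: "saint" before a capitalized word, else "street".
            words.append("saint" if nxt[:1].isupper() else "street")
        else:
            core = tok.rstrip(".,!?")
            kind = token_type(core)
            if kind == "money":
                amount = number_to_words(int(core[1:]))
                magnitude = nxt.rstrip(".,!?").lower()
                if magnitude in MAGNITUDES:
                    # "$100 million" reads as "one hundred million dollars".
                    words.append(f"{amount} {magnitude} dollars")
                    i += 1  # the magnitude word has been consumed
                else:
                    words.append(f"{amount} dollars")
            elif kind == "number":
                words.append(number_to_words(int(core)))
            else:
                words.append(core.lower())
        i += 1
    return " ".join(words)
```

Note how the currency word moves behind the magnitude and how the two "St." tokens in the second example receive different expansions; both need context beyond the single token.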

2) Phonetic Analysis
● My latest project is to learn how to better project my voice.
● On May 5 1996, the university bought 1996 computers.
● Yesterday it rained 3 in. Take 1 out, then put 3 in.

2) Phonetic Analysis
● How to pronounce a word?
  – Look in the dictionary!
● But what about unknown words and names?
● Complex languages: German/French/Turkish
  – Letter-to-sound rules
    ● ... also neural networks (NETtalk)
    ● ... pronunciation by analogy (PRONOUNCE)
    ● ... case-based (MBRtalk) (more later)
    ● ... and much more.

3) Prosodic Analysis
● Prosody: phrases, accents, F0 contour, duration
● The Tilt Intonation Model
e.g. Trees

4) Waveform synthesis
● Articulatory synthesis (à la VODER)
● Formant (à la OVE)
● Concatenative synthesis
  – Domain-specific (“talking clock”, “weather”)
  – Diphones (PSOLA, MBROLA)
  – Unit selection

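4) Waveform synthesis
● Domain-specific synthesis is easy:

    #!/bin/bash
    # Talking clock: play one prerecorded clip per time component.
    # Assumes GNU date, the SoX "play" command, and clips such as
    # 3.wav, 27.wav and pm.wav in the current directory.
    hours=$(date +"%-l")   # hour, 1-12, no padding
    mins=$(date +"%-M")    # minute, 0-59, no padding
    ampm=$(date +"%P")     # "am" or "pm"
    play "$hours.wav"
    play "$mins.wav"
    play "$ampm.wav"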

4) Waveform synthesis
● Diphone synthesis
  – Use diphones: middle of one phone to middle of next.
  – Just a bit of DSP to connect diphones.
    ● PSOLA
    ● MBROLA
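As a small illustration of what "middle of one phone to middle of next" means for the unit inventory, the sketch below turns a phone sequence into the diphone names that would be looked up. The silence symbol "pau" and the "a-b" naming scheme are assumptions made for this example, and the DSP smoothing step (PSOLA or MBROLA) is not shown.

```python
def to_diphones(phones, silence="pau"):
    """Return the diphone names needed to synthesize a phone sequence.

    The utterance is padded with silence on both sides, so that the first
    and last phones also get a transition unit.
    """
    padded = [silence] + list(phones) + [silence]
    # Each diphone spans from one phone to the next one.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]
```

For the three phones of "cat", `to_diphones(["k", "ae", "t"])` gives four units: `pau-k`, `k-ae`, `ae-t` and `t-pau`. An inventory for a language with N phones therefore needs at most about N squared recorded units.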

4) Waveform synthesis
● Unit selection
  – Use the entire speech corpus as the acoustic inventory.
  – Select at runtime the longest available string of phonetic segments.
  – Minimize number of concatenations.
  – Reduce DSP.
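The "longest available string" idea can be sketched as a greedy longest-match search over a phone-labelled corpus. This is a simplification for illustration: the corpus representation (a list of phone lists) and the function names are hypothetical, the greedy strategy does not always give the minimum number of joins, and practical unit-selection systems also score how well candidate units fit the target and each other.

```python
def find_longest_unit(target, start, corpus):
    """Find the longest run target[start:end] that occurs in some utterance.

    Returns (utterance index, position, length), or None if not even the
    single phone target[start] occurs anywhere in the corpus.
    """
    for end in range(len(target), start, -1):  # try the longest run first
        chunk = target[start:end]
        for utt_id, utt in enumerate(corpus):
            for pos in range(len(utt) - len(chunk) + 1):
                if utt[pos:pos + len(chunk)] == chunk:
                    return utt_id, pos, len(chunk)
    return None


def select_units(target, corpus):
    """Cover the target phone sequence greedily with the longest corpus runs."""
    units = []
    i = 0
    while i < len(target):
        match = find_longest_unit(target, i, corpus)
        if match is None:
            raise ValueError(f"phone {target[i]!r} is not in the corpus")
        units.append(match)
        i += match[2]  # skip the phones covered by this unit
    return units


# Hypothetical three-utterance corpus.
corpus = [["h", "eh", "l", "ow"], ["w", "er", "l", "d"], ["y", "eh", "l", "p"]]
units = select_units(["h", "eh", "l", "p"], corpus)
```

Here the target is covered by two units, three phones from the first utterance and one from the third, so only one concatenation point is needed.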

Text-to-Speech System
Data-driven?
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis

Outline
● History of Speech Synthesis
● Text-To-Speech System Architecture
● Grapheme-to-Phoneme transcription

GTP transcription
● Lexicon:
  – “cepstra” -> (k eh p)' (s t r aa)
  – What about unknown words?
  – Commercial systems have 3-part system:
    ● Big dictionary
    ● Special code for names/acronyms/etc.
    ● Machine-learned letter-to-sound (LTS) system for other unknown words
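The three-part arrangement amounts to a lookup cascade, sketched below. Everything apart from the "cepstra" entry is a made-up placeholder: the letter-name table covers just two letters, the acronym test is a crude all-uppercase check, and `naive_lts` merely stands in for a trained letter-to-sound model.

```python
# One-entry "big dictionary" (syllable and stress marks omitted).
LEXICON = {"cepstra": ["k", "eh", "p", "s", "t", "r", "aa"]}

# Hypothetical letter names used for spelling out acronyms.
LETTER_NAMES = {"t": ["t", "iy"], "s": ["eh", "s"]}


def naive_lts(word):
    """Placeholder for a learned LTS model: one pseudo-phone per letter."""
    return list(word)


def pronounce(word):
    """Dictionary first, then special cases, then the LTS fallback."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    if word.isupper() and all(ch in LETTER_NAMES for ch in key):
        # Acronym: read it letter by letter.
        return [phone for ch in key for phone in LETTER_NAMES[ch]]
    return naive_lts(key)
```

The order matters: the dictionary is the most reliable source, so the learned model is consulted only when nothing else applies.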

Learning LTS rules
● Induce LTS from a dictionary of the language (Black et al. 1998)
● Two steps:
  – Alignment
  – Decision tree-based rule-induction

Alignment
● Letters: c h e c k e d
● Phones: ch _ eh _ k _ t
● Black et al. propose 2 methods:
  – Expectation-Maximization
  – Estimate p(letter | phone) from valid alignments, take best.
● Devil in the details
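A rough sketch of the second method follows. It is not the procedure of Black et al.: here every way of padding the phone string with epsilons counts as a "valid" alignment, each letter maps to at most one phone, and words with more phones than letters are not supported. All function names are made up for this example.

```python
import math
from collections import Counter
from itertools import combinations

EPS = "_"  # epsilon: the letter produces no phone


def candidate_alignments(letters, phones):
    """Yield every way to pad `phones` with epsilons to len(letters)."""
    n = len(letters)
    for slots in combinations(range(n), len(phones)):
        row = [EPS] * n
        for slot, phone in zip(slots, phones):
            row[slot] = phone
        yield row


def estimate(dictionary):
    """Estimate p(letter | phone) by counting over all candidate alignments."""
    pair_count = Counter()
    phone_count = Counter()
    for letters, phones in dictionary:
        for row in candidate_alignments(letters, phones):
            for letter, phone in zip(letters, row):
                pair_count[(letter, phone)] += 1
                phone_count[phone] += 1
    return {pair: count / phone_count[pair[1]]
            for pair, count in pair_count.items()}


def best_alignment(letters, phones, prob):
    """Pick the candidate alignment with the highest log-probability."""
    def score(row):
        # Unseen (letter, phone) pairs get a tiny floor probability.
        return sum(math.log(prob.get((letter, phone), 1e-9))
                   for letter, phone in zip(letters, row))
    return max(candidate_alignments(letters, phones), key=score)


letters = list("checked")
phones = ["ch", "eh", "k", "t"]
candidates = list(candidate_alignments(letters, phones))
prob = estimate([(letters, phones)])
best = best_alignment(letters, phones, prob)
```

For "checked" there are 35 candidate alignments, and the one shown on the slide is among them. Which candidate wins depends on the statistics gathered from the whole dictionary; with a single training word, as here, the estimates are too weak to be meaningful.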

Decision trees for LTS
● Now that aligned data is available, train a decision tree:
  – ### c hec -> ch
  – che c ked -> _
● 92-96% letter accuracy (58-75% word accuracy) for English
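The training examples for the tree are letter windows taken from the aligned data: a few letters of left context, the letter itself, a few letters of right context, and the phone to predict. The sketch below only builds these examples; the window width of three, the "#" padding symbol and the function name are assumptions chosen to mirror the slide, and the tree learner itself is not shown.

```python
def letter_windows(letters, aligned_phones, width=3, pad="#"):
    """Turn one aligned word into (left, letter, right, phone) examples."""
    if len(letters) != len(aligned_phones):
        raise ValueError("alignment must give exactly one symbol per letter")
    padded = [pad] * width + list(letters) + [pad] * width
    examples = []
    for i, phone in enumerate(aligned_phones):
        left = "".join(padded[i:i + width])
        letter = padded[i + width]
        right = "".join(padded[i + width + 1:i + 2 * width + 1])
        examples.append((left, letter, right, phone))
    return examples


examples = letter_windows("checked", ["ch", "_", "eh", "_", "k", "_", "t"])
```

Each tuple becomes one training row whose context letters are the features and whose phone (possibly the epsilon "_") is the class label. The two occurrences of "c" in "checked" end up with different contexts and different targets, which is exactly what the tree has to learn to separate.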

GTP transcription
● Decision-tree based (Black et al.)
● ANN-based (NETtalk, Sejnowski et al.)
● Pronunciation-by-Analogy (Damper et al.)
● Memory-based (MBRtalk, Stanfill)
● Transducer-based (I. Bulyko)
● Non-segmental (A. Cohen)

Outline
● History of Speech Synthesis
● Text-To-Speech System Architecture
● Grapheme-to-Phoneme transcription
● Conclusion

Text-to-Speech System
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis
http://www.stanford.edu/class/linguist236/
