Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit

Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit. Presenter: Omer Nawaz, Research Officer (III)

Speech Synthesis Overview: Text to be Synthesized → Natural Language Processing (NLP) → Speech Synthesis Engine → Synthesized Speech

Introduction:
Rule-based, formant synthesis:
• Hand-crafting each phonetic unit by rules.
Corpus-based:
• Concatenative synthesis
  • High quality speech can be synthesized using waveform concatenation algorithms.
  • To obtain various voices, a large amount of speech data is necessary.
• Statistical parametric synthesis
  • Generate speech parameters from statistical models.
  • Voice quality can easily be changed by transforming HMM parameters.

Approaches at CLE:
Corpus-based:
• Unit Selection
• HMM based
Comparison of two Approaches:
                Unit Selection                      HMM based
Advantages      • High quality at waveform level    • Small footprint
                  (specific domain)                 • Smooth
                                                    • Stable quality
Disadvantages   • Large footprint                   • Vocoder sound
                • Discontinuous                       (domain-independent)
                • Unstable quality

Synthesis Model:
Source Filter Model:
• Source excitation part: excitation e(n), a pulse train or white noise
• Vocal tract resonance part: linear time-invariant system h(n)
• Speech: x(n) = h(n) * e(n)
• The h(n) is defined by the state output vector of the HMM, e.g. mel-cepstrum
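The source-filter relation x(n) = h(n) * e(n), where * denotes convolution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the impulse response h, the pitch period, and the signal length are made-up values, and none of the function names come from HTS, which derives the filter from mel-cepstral coefficients rather than from a fixed impulse response.

```python
import random

def pulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def white_noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def convolve(h, e):
    """Linear convolution x(n) = sum_k h(k) e(n - k), truncated to len(e)."""
    x = [0.0] * len(e)
    for n in range(len(e)):
        for k in range(min(len(h), n + 1)):
            x[n] += h[k] * e[n - k]
    return x

# Hypothetical impulse response: a short, exponentially decaying filter.
h = [0.5 ** k for k in range(8)]

voiced = convolve(h, pulse_train(32, period=10))   # periodic copies of h
unvoiced = convolve(h, white_noise(32))            # noise shaped by h
```

With a pulse train as input, the output is a copy of h repeated at every pitch period; with white noise, the same filter shapes a noisy signal.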

General Overview (HTS):
Training Part: Speech Input → Extract Spectrum, F0, Labels → Train Acoustic Models → Stored Models
Synthesis Part: Text Input → Labels → Parameter Generation → Synthesis Filter → Synthesized Speech

Challenges:
• Generation of the full-context style labels.
• Addition of Stress/Syllable Layer.
• Defining the Question Set.
• Optimizing the Synthesized Quality.

Full-Context Label Style:
Phoneme sequence: P A K I S T_D A N
Tri-phone context dependent model: P-A-K, T_D-A-N
Full-context style context dependent model:
x^P-A+K=I@x_x/A …
S^T_D-A+N=x@x_x/A …
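The segmental part of the full-context labels above (two preceding phonemes, the current phoneme, two succeeding phonemes) can be generated mechanically from the phoneme sequence. The sketch below is a minimal illustration; the function name `quinphone_labels` is hypothetical, and the use of `x` as the padding symbol is inferred from the slide's examples.

```python
def quinphone_labels(phonemes, pad="x"):
    """Return one 'LL^L-C+R=RR' string per phoneme, padding the edges with `pad`."""
    padded = [pad, pad] + list(phonemes) + [pad, pad]
    labels = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2:i + 3]
        labels.append(f"{ll}^{l}-{c}+{r}={rr}")
    return labels

phonemes = ["P", "A", "K", "I", "S", "T_D", "A", "N"]
labels = quinphone_labels(phonemes)
# labels[1] is the first "A", labels[6] the second "A"
```

The two A phonemes receive different labels because their neighbours differ, which is what lets the models distinguish them.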

Full-Context Format:
x^x-SIL+A=L@1_0/A:0_0_0/B:0-0-0@1-0&1-1#1-1$1-1!0-0;0- …
x^SIL-A+L=I_I@1_1/A:0_0_0/B:0-0-1@1-2&1-9#1-3$1-1!0-2;0- …
SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0 …
A^L-I_I+A=P@2_1/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0- …

Full-Context Format:
SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0/H:9=5^1=2|NONE/I:8=6/J:17+11-2
Segmental Context:
• Current Phoneme
• Previous two Phonemes
• Next two Phonemes
Supra-Segmental Context:
• Syllable
• Stress
• Word
• Phrase
• POS
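The segmental context can be read back out of a full-context label with a regular expression keyed on the delimiters `^`, `-`, `+`, `=` and `@`. The sketch below assumes phoneme names never contain those delimiter characters; the names `QUINPHONE` and `parse_segmental` are hypothetical, and the supra-segmental fields after `@` are left unparsed.

```python
import re

# LL^L-C+R=RR@... : two previous phonemes, current phoneme, two next phonemes.
QUINPHONE = re.compile(
    r"^(?P<ll>.+?)\^(?P<l>.+?)-(?P<c>.+?)\+(?P<r>.+?)=(?P<rr>.+?)@"
)

def parse_segmental(label):
    """Return the five segmental-context phonemes of a full-context label."""
    match = QUINPHONE.match(label)
    if match is None:
        raise ValueError("not a full-context label: " + label)
    return match.groupdict()

label = ("SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I"
         "/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0"
         "/H:9=5^1=2|NONE/I:8=6/J:17+11-2")
context = parse_segmental(label)
```

For this label the current phoneme is `L`, preceded by `SIL` and `A` and followed by `I_I` and `A`.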

Steps to Generate Full-Context Labels:
TextGrid File → Extract Segmental & Word Layer → Apply Stress & Syllabification Rules → Align Syllable Boundaries with Segmental Layer → Generate new TextGrid File with Additional Layers → Convert to Full-Context format

TextGrid Format: (figure not reproduced)

Steps to Generate Full-Context Labels:
TextGrid File → Extract Segmental & Word Layer → Apply Stress & Syllabification Rules → Align Syllable Boundaries with Segmental Layer → Generate new TextGrid File with Additional Layers → Convert to Full-Context format

TextGrid Format with Additional Layers: (figure not reproduced)

Context Clustering (Question Set) 1/2:
The number of possible combinations is enormous with these 53 different contexts.
With only Segmental Context, the possible models are: 66^5 ≈ 1252 million
If we consider all the contexts, it will be practically infinite.
Solution:
• Record data having maximum phoneme coverage at tri-phone or di-phone level.
• Apply a context clustering technique to classify and share acoustically similar models.
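The segmental-context figure can be checked with a short calculation. The reading assumed here is that 66 is the size of the phoneme inventory and 5 is the context window (the current phoneme plus two on each side); the function name is hypothetical.

```python
def context_combinations(n_phonemes, window):
    """Number of distinct phoneme sequences of length `window`."""
    return n_phonemes ** window

quinphone_models = context_combinations(66, 5)   # current phoneme +/- 2
triphone_models = context_combinations(66, 3)    # current phoneme +/- 1

print(f"{quinphone_models:,}")  # 1,252,332,576, i.e. about 1252 million
print(f"{triphone_models:,}")   # 287,496
```

Even the tri-phone count is far larger than what a recorded corpus can cover, which is why similar contexts have to share models.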

Context Clustering (Question Set) 2/2:
Phoneme:
• {preceding, current, succeeding} phonemes
Stress/Syllable/Word:
• # of phonemes at {preceding, current, succeeding} syllable
• Stress of {preceding, current, succeeding} syllable
• Position of current syllable in current word
• # of syllables {from previous, to next} stressed syllable
• Vowel within current syllable
• # of syllables in {preceding, current, succeeding} word
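Phoneme questions of this kind are usually written as wildcard patterns matched against the full-context label. The sketch below builds such entries in the `QS "name" {pattern,...}` style used by HTK/HTS question files; treat the exact line syntax as an assumption to verify against your HTS version. The `Vowel` class and its members are hypothetical examples, not the presenter's question set.

```python
# Wildcard template for each position in the LL^L-C+R=RR@ segmental context.
PATTERNS = {
    "LL": "{}^*",    # phoneme two positions back
    "L": "*^{}-*",   # preceding phoneme
    "C": "*-{}+*",   # current phoneme
    "R": "*+{}=*",   # succeeding phoneme
    "RR": "*={}@*",  # phoneme two positions ahead
}

def question(name, phonemes, position):
    """Build one question line asking whether `position` is in `phonemes`."""
    template = PATTERNS[position]
    body = ",".join(template.format(p) for p in phonemes)
    return 'QS "%s-%s" {%s}' % (position, name, body)

vowels = ["A", "I_I"]  # hypothetical phoneme class
questions = [question("Vowel", vowels, pos) for pos in ("L", "C", "R")]
```

Each question splits the set of context-dependent models in two during clustering, so acoustically similar contexts end up sharing a model.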

Some Synthesized Examples:
• Seen Context: Training Set
• Un-seen Context: Different Carrier Word

Questions?
