
Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit - PowerPoint PPT Presentation

Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit. Presenter: Omer Nawaz, Research Officer (III).


  1. Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit. Presenter: Omer Nawaz, Research Officer (III)

  2. Speech Synthesis Overview: Text to be Synthesized → Natural Language Processing (NLP) → Speech Synthesis Engine → Synthesized Speech

  3. Introduction:
     - Rule-based, formant synthesis: hand-crafting each phonetic unit by rules.
     - Corpus-based:
       - Concatenative synthesis: high-quality speech can be synthesized using waveform concatenation algorithms. To obtain various voices, a large amount of speech data is necessary.
       - Statistical parametric synthesis: generates speech parameters from statistical models. Voice quality can easily be changed by transforming HMM parameters.

  4. Approaches at CLE (corpus-based): Unit Selection and HMM based. Comparison of the two approaches:
     - Unit Selection. Advantages: high quality at waveform level (specific domain). Disadvantages: large footprint, discontinuous, unstable quality.
     - HMM based. Advantages: small footprint, smooth, stable quality. Disadvantages: vocoder sound (domain-independent).

  5. Synthesis Model: Source-Filter Model. The source excitation part (a pulse train or white noise) produces the excitation $e(n)$, which drives a linear time-invariant system $h(n)$ (the vocal tract resonance part) to produce speech: $x(n) = h(n) * e(n)$. Here $h(n)$ is defined by the state output vector of the HMM, e.g. mel-cepstrum.
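The source-filter relation on this slide can be illustrated with a minimal sketch in plain Python. The pulse period, the toy impulse response, and all function names are illustrative assumptions, not part of the presentation; a real system would derive the filter from the mel-cepstral parameters rather than take h(n) directly.

```python
import random

def pulse_train(length, period):
    """Voiced excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def convolve(h, e):
    """Discrete convolution x(n) = sum over k of h(k) * e(n - k)."""
    x = [0.0] * (len(h) + len(e) - 1)
    for k, hk in enumerate(h):
        for n, en in enumerate(e):
            x[k + n] += hk * en
    return x

# Hypothetical decaying impulse response standing in for the vocal tract filter.
h = [0.5 ** k for k in range(4)]      # [1.0, 0.5, 0.25, 0.125]
e = pulse_train(length=8, period=4)   # pulses at n = 0 and n = 4
x = convolve(h, e)                    # each pulse triggers one copy of h
```

Swapping `pulse_train` for `white_noise` gives the unvoiced case with the same filter.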

  6. General Overview (HTS):
     - Training part: Speech Input → Extract Spectrum, F0, Labels → Train Acoustic Models → Stored Models
     - Synthesis part: Text Input → Labels → Parameter Generation → Synthesis Filter → Synthesized Speech

  7. Challenges:
     - Generation of the full-context style labels.
     - Addition of Stress/Syllable Layer.
     - Defining the Question Set.
     - Optimizing the Synthesized Quality.

  8. Full-Context Label Style:
     - Phoneme sequence: P A K I S T_D A N
     - Tri-phone context dependent model: P-A-K, T_D-A-N
     - Full-context style context dependent model: x^P-A+K=I@x_x/A …, S^T_D-A+N=x@x_x/A …
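The segmental prefix of the full-context labels above follows the pattern `LL^L-C+R=RR` (two preceding phonemes, the current phoneme, two succeeding phonemes). A minimal sketch of how such prefixes could be built from a phoneme sequence is given below; the function name and the use of `x` as edge padding are assumptions inferred from the slide's examples.

```python
def quinphone_labels(phonemes, pad="x"):
    """Return one 'LL^L-C+R=RR' string per phoneme, padding the edges with `pad`."""
    padded = [pad, pad] + list(phonemes) + [pad, pad]
    labels = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2:i + 3]
        labels.append(f"{ll}^{l}-{c}+{r}={rr}")
    return labels

labels = quinphone_labels(["P", "A", "K", "I", "S", "T_D", "A", "N"])
first_a = labels[1]    # label for the first A in the sequence
second_a = labels[6]   # label for the second A in the sequence
```

The two A labels reproduce the segmental prefixes shown on the slide; the positional and supra-segmental fields after `@` are not generated here.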

  9. Full-Context Format:
     - x^x-SIL+A=L@1_0/A:0_0_0/B:0-0-0@1-0&1-1#1-1$1-1!0-0;0- …
     - x^SIL-A+L=I_I@1_1/A:0_0_0/B:0-0-1@1-2&1-9#1-3$1-1!0-2;0- …
     - SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0 …
     - A^L-I_I+A=P@2_1/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0- …

  10. Full-Context Format: SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0/H:9=5^1=2|NONE/I:8=6/J:17+11-2
     - Segmental context: current phoneme, previous two phonemes, next two phonemes.
     - Supra-segmental context: syllable, stress, word, phrase, POS.
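As a rough illustration of how the segmental context can be read back out of such a label, the sketch below splits off the five phoneme fields with a regular expression. The field names (`ll`, `l`, `c`, `r`, `rr`) and the helper name are hypothetical, and the supra-segmental part after the first `@` is returned unparsed because its field layout is not spelled out on the slide.

```python
import re

# LL^L-C+R=RR@rest : previous two, current, and next two phonemes.
SEGMENTAL = re.compile(
    r"^(?P<ll>[^\^]+)\^(?P<l>[^-]+)-(?P<c>[^+]+)\+(?P<r>[^=]+)=(?P<rr>[^@]+)@(?P<rest>.*)$"
)

def parse_segmental(label):
    """Return the five phoneme fields plus the unparsed remainder of the label."""
    match = SEGMENTAL.match(label)
    if match is None:
        raise ValueError(f"not a full-context label: {label!r}")
    return match.groupdict()

label = (
    "SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I"
    "/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0"
    "/H:9=5^1=2|NONE/I:8=6/J:17+11-2"
)
fields = parse_segmental(label)
```

The pattern assumes phoneme names never contain `^`, `-`, `+`, `=`, or `@`, which holds for the names seen in the examples (including `I_I` and `T_D`).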

  11. Steps to Generate Full-Context Labels:
     - TextGrid File
     - Extract Segmental & Word Layer
     - Apply Stress & Syllabification Rules
     - Align Syllable Boundaries with Segmental Layer
     - Generate new TextGrid File with Additional Layers
     - Convert to Full-Context format

  12. TextGrid Format (figure)

  13. Steps to Generate Full-Context Labels:
     - TextGrid File
     - Extract Segmental & Word Layer
     - Apply Stress & Syllabification Rules
     - Align Syllable Boundaries with Segmental Layer
     - Generate new TextGrid File with Additional Layers
     - Convert to Full-Context format

  14. TextGrid Format with Additional Layers (figure)

  15. Context Clustering (Question Set) 1/2: The number of possible combinations is enormous with these 53 different contexts. With only the segmental context, the number of possible models is $66^5 \approx 1252$ million. If we consider all the contexts, it is practically infinite. Solution:
     - Record data having maximum phoneme coverage at tri-phone or di-phone level.
     - Apply a context clustering technique to classify and share acoustically similar models.
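The model count quoted on this slide can be checked with a few lines of arithmetic. Reading 66 as the phoneme inventory size and 5 as the width of the segmental context (current phoneme plus two on each side) is an interpretation of the slide, not something it states explicitly.

```python
num_phonemes = 66    # assumed: size of the phoneme inventory
context_width = 5    # assumed: current phoneme plus two on each side

possible_models = num_phonemes ** context_width
print(f"{possible_models:,} possible segmental contexts")
print(f"about {possible_models / 1e6:.0f} million")
```

Even before stress, syllable, word, and phrase features are added, this is far more contexts than any recorded corpus can cover, which is why models must be shared through clustering.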

  16. Context Clustering (Question Set) 2/2:
     - Phoneme: {preceding, current, succeeding} phonemes
     - Stress/Syllable/Word:
       - # of phonemes at {preceding, current, succeeding} syllable
       - Stress of {preceding, current, succeeding} syllable
       - Position of current syllable in current word
       - # of syllables {from previous, to next} stressed syllable
       - Vowel within current syllable
       - # of syllables in {preceding, current, succeeding} word
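To make the idea of a question set concrete, the sketch below treats each question as a set of wildcard patterns over the label string and uses it to split a pool of labels into "yes" and "no" groups, which is the basic operation behind decision-tree context clustering. The question names, the patterns, and the helper functions are hypothetical and are not taken from the presentation's actual question file.

```python
from fnmatch import fnmatchcase

# Hypothetical questions: name -> wildcard patterns over the label string.
QUESTIONS = {
    "C-is-A": ["*-A+*"],        # current phoneme is A
    "L-is-SIL": ["*^SIL-*"],    # preceding phoneme is SIL
    "R-is-I_I": ["*+I_I=*"],    # succeeding phoneme is I_I
}

def answer(label, patterns):
    """A question is answered 'yes' if any of its patterns matches the label."""
    return any(fnmatchcase(label, p) for p in patterns)

def split_by_question(labels, name):
    """Partition labels into (yes, no) lists for the named question."""
    yes = [l for l in labels if answer(l, QUESTIONS[name])]
    no = [l for l in labels if not answer(l, QUESTIONS[name])]
    return yes, no

# Segmental prefixes of the four example labels from the Full-Context Format slide.
pool = ["x^x-SIL+A=L@1_0", "x^SIL-A+L=I_I@1_1", "SIL^A-L+I_I=A@1_2", "A^L-I_I+A=P@2_1"]
yes, no = split_by_question(pool, "C-is-A")
```

A clustering procedure would evaluate many such questions at each tree node and keep the split that best separates the acoustic statistics; that selection step is omitted here.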

  17. Some Synthesized Examples:
     - Seen Context: Training Set
     - Un-seen Context: Different Carrier Word

  18. Questions?
