
Speech Processing 15-492/18-492: Speech Synthesis, Pronunciation



  1. Speech Processing 15-492/18-492
     - Speech Synthesis
     - Pronunciation
     - Letter to Sound rules

  2. Speech Synthesis
     - Linguistic Analysis
       - Pronunciations
       - Prosody

  3. Part of Speech Tagging
     - Find the most likely tag for each word
       - Most words only have one tag (92% correct)
     - Context often defines tag type
       - “The project” vs “To project”
     - Use HMM Part of Speech tagger
       - But need data to train it (English PennTreeBank)

  4. Poor Man’s PoS Tagger
     - Hand list “function” word types
       - (determiners a an the this)
       - (conjunctions and or but)
       - (pp in on to)
       - (content everything else)
     - Better than nothing
       - Easy to do on new languages
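The hand-listed approach can be sketched in a few lines of Python. The word lists are the ones on the slide; the function names `poor_mans_tag` and `tag_sentence` are hypothetical and exist only for illustration.

```python
# Hand-listed function words from the slide; anything not listed is "content".
FUNCTION_WORDS = {
    "determiners": {"a", "an", "the", "this"},
    "conjunctions": {"and", "or", "but"},
    "pp": {"in", "on", "to"},
}


def poor_mans_tag(word):
    """Return the function-word class of `word`, or "content" if unlisted."""
    w = word.lower()
    for tag, words in FUNCTION_WORDS.items():
        if w in words:
            return tag
    return "content"


def tag_sentence(sentence):
    """Tag each whitespace-separated token independently (no context used)."""
    return [(w, poor_mans_tag(w)) for w in sentence.split()]


print(tag_sentence("The project and to project"))
```

Because no context is used, “project” gets the same tag in both positions; the preceding function word is what a later stage would use to tell noun from verb.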

  5. Pronunciation Lexicon
     - List of words and their pronunciation
       - (“pencil” n (p eh1 n s ih l))
       - (“table” n (t ey1 b ax l))
     - Need the right phoneme set
     - Need other information
       - Part of speech
       - Lexical stress
       - Other information (Tone, Lexical accent …)
       - Syllable boundaries

  6. Homograph Representation
     - Must distinguish different pronunciations
       - (“project” n (p r aa1 jh eh k t))
       - (“project” v (p r ax jh eh1 k t))
       - (“bass” n_music (b ey1 s))
       - (“bass” n_fish (b ae1 s))
     - ASR multiple pronunciations
       - (“route” n (r uw t))
       - (“route(2)” n (r aw t))
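One way to picture the homograph entries is a table keyed on the word together with its tag. The sketch below uses the entries from the slide; the `lookup` helper and its fallback policy (first listed entry when the tag is missing or unknown) are assumptions made for illustration, not something the slides specify.

```python
# Entries copied from the slide, keyed on (word, part-of-speech-like tag).
LEXICON = {
    ("project", "n"): ["p", "r", "aa1", "jh", "eh", "k", "t"],
    ("project", "v"): ["p", "r", "ax", "jh", "eh1", "k", "t"],
    ("bass", "n_music"): ["b", "ey1", "s"],
    ("bass", "n_fish"): ["b", "ae1", "s"],
}


def lookup(word, pos=None):
    """Return the phones for (word, pos).

    Assumed fallback: if the tag is absent or has no entry, return the first
    listed pronunciation of the word; return None for unknown words.
    """
    w = word.lower()
    if (w, pos) in LEXICON:
        return LEXICON[(w, pos)]
    for (entry_word, _tag), phones in LEXICON.items():
        if entry_word == w:
            return phones
    return None


print(lookup("project", "n"))
print(lookup("project", "v"))
```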

  7. Pronunciation of Unknown Words
     - How do you pronounce new words
     - 4% of tokens (in news) are new
     - You can’t synthesize them without pronunciations
     - You can’t recognize them without pronunciations
     - Letter-to-Sound rules
     - Grapheme-to-Phoneme rules

  8. LTS: Hand written
     - Hand written rules
       - [LeftContext] X [RightContext] -> Y
       - e.g.
         - c [h r] -> k
         - c [h] -> ch
         - c [i] -> s
         - c -> k
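The rule format can be read as an ordered list in which the first matching rule wins. The sketch below encodes only the four “c” rules from the slide; the tuple layout and the name `apply_rules` are hypothetical, and rules for the other letters (including what happens to the “h”) are left out.

```python
# Ordered rules for the letter "c": (left_context, right_context, phone).
# The four rules on the slide all have an empty left context.
C_RULES = [
    ("", "hr", "k"),   # c [h r] -> k
    ("", "h", "ch"),   # c [h]   -> ch
    ("", "i", "s"),    # c [i]   -> s
    ("", "", "k"),     # c       -> k  (default)
]


def apply_rules(word, i, rules):
    """Return the phone of the first rule whose contexts match around word[i]."""
    left, right = word[:i], word[i + 1:]
    for left_ctx, right_ctx, phone in rules:
        if left.endswith(left_ctx) and right.startswith(right_ctx):
            return phone
    return None


for w in ["chrome", "chip", "city", "cat"]:
    print(w, "->", apply_rules(w, 0, C_RULES))
```

Order matters: the more specific “c [h r]” rule has to come before “c [h]”, and the context-free default goes last.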

  9. LTS: Machine Learning Techniques
     - Need an existing lexicon
       - Pronunciations: words and phones
       - But different number of letters and phones
     - Need an alignment
       - Between letters and phones
       - checked -> ch eh k t

  10. LTS: alignment
     - checked -> ch eh k t
         c   h   e   c   k   e   d
         ch  _   eh  k   _   _   t
     - Some letters go to nothing
     - Some letters go to two phones
       - box -> b aa k-s
       - table -> t ey b ax-l -

  11. Find alignment automatically
     - Epsilon scattering
       - Find all possible alignments
       - Estimate p(L,P) on each alignment
       - Find most probable alignment
     - Hand seed
       - Hand specify allowable pairs
       - Estimate p(L,P) on each possible alignment
       - Find most probable alignment
     - Statistical Machine Translation (IBM model 1)
       - Estimate p(L,P) on each possible alignment
       - Find most probable alignment
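The first step of epsilon scattering and the hand-seed filter can be sketched as follows. This covers only enumeration and filtering; estimating p(L,P) and picking the most probable alignment are not shown, and dual phones such as k-s are ignored. The `ALLOWABLE` table is a hypothetical hand seed written for this one word.

```python
from itertools import combinations

EPS = "_"  # epsilon: the letter produces no phone


def all_alignments(letters, phones):
    """Yield every way to scatter epsilons so that each letter gets exactly
    one symbol and the phones keep their original order."""
    n, m = len(letters), len(phones)
    for slots in combinations(range(n), m):
        row = [EPS] * n
        for slot, phone in zip(slots, phones):
            row[slot] = phone
        yield row


# Hypothetical hand seed: which symbols each letter is allowed to map to.
ALLOWABLE = {
    "c": {"ch", "k", EPS},
    "h": {EPS},
    "e": {"eh", EPS},
    "k": {"k", EPS},
    "d": {"t", "d"},
}


def allowed(letters, row):
    """True if every letter/phone pair in the alignment is in the hand seed."""
    return all(p in ALLOWABLE.get(l, set()) for l, p in zip(letters, row))


letters = list("checked")
phones = ["ch", "eh", "k", "t"]
candidates = list(all_alignments(letters, phones))
seeded = [row for row in candidates if allowed(letters, row)]

print(len(candidates), "possible alignments")
for row in seeded:
    print(" ".join(row))
```

Even with the seed, more than one alignment survives for “checked” (the k can sit on either the second “c” or the “k”), which is why a probability estimate is still needed to choose between them.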

  12. Not everything aligns
     - 0, 1, and 2 letter cases
       - e -> epsilon “moved”
       - x -> k-s, g-z “box” “example”
       - e -> y-uw “askew”
     - Some alignments aren’t sensible
       - dept -> d ih p aa r t m ax n t
       - cmu -> s iy eh m y uw

  13. Training LTS models
     - Use CART trees
       - One model for each letter
     - Predict phone (epsilon, phone, dual phone)
       - From letter 3-context (and POS)
     - # # # c h e c -> ch
     - # # c h e c k -> _
     - # c h e c k e -> eh
     - c h e c k e d -> k
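The training examples on the slide can be generated from an aligned lexicon entry with a sliding window. This is a minimal sketch: `letter_windows` is a hypothetical helper, padding the right edge with “#” is an assumption (the slide only shows left padding), and the CART training itself is not shown.

```python
def letter_windows(word, aligned_phones, context=3, pad="#"):
    """Pair each letter's window (context letters on both sides) with the
    phone aligned to that letter."""
    padded = [pad] * context + list(word) + [pad] * context
    examples = []
    for i, phone in enumerate(aligned_phones):
        window = padded[i:i + 2 * context + 1]  # centre is the i-th letter
        examples.append((" ".join(window), phone))
    return examples


examples = letter_windows("checked", ["ch", "_", "eh", "k", "_", "_", "t"])
for window, phone in examples:
    print(window, "->", phone)
```

With one model per letter, each example would be routed to the tree for its centre letter, with the six surrounding letters (and optionally the part of speech) as features.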

  14. LTS results
     - Split lexicon into train/test 90%/10%
       - i.e. every tenth entry is extracted for testing

     Lexicon     Letter Acc   Word Acc
     OALD        95.80%       75.56%
     CMUDICT     91.99%       57.80%
     BRULEX      99.00%       93.03%
     DE-CELEX    98.79%       89.38%
     Thai        95.60%       68.76%
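The evaluation setup can be sketched as below. Which entry in each block of ten is held out (here the tenth, i.e. index 9, 19, …) is an assumption, as are the helper names; predictions and references are taken to be aligned one symbol per letter, so letter accuracy is a per-position comparison.

```python
def split_every_tenth(entries):
    """Hold out every tenth entry for testing; train on the rest."""
    test = entries[9::10]
    train = [e for i, e in enumerate(entries) if i % 10 != 9]
    return train, test


def accuracies(predicted, reference):
    """Return (letter accuracy, word accuracy) over aligned phone lists.

    A word counts as correct only if every letter's phone is correct.
    """
    letters = correct_letters = correct_words = 0
    for pred, ref in zip(predicted, reference):
        hits = sum(p == r for p, r in zip(pred, ref))
        letters += len(ref)
        correct_letters += hits
        correct_words += int(hits == len(ref) and len(pred) == len(ref))
    return correct_letters / letters, correct_words / len(reference)


train, test = split_every_tenth(list(range(20)))
letter_acc, word_acc = accuracies(
    [["k", "ae", "t"], ["d", "aa", "g"]],
    [["k", "ae", "t"], ["d", "ao", "g"]],
)
print(len(train), len(test), letter_acc, word_acc)
```

The toy numbers show why word accuracy is always the lower figure in the table: a single wrong letter makes the whole word wrong.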

  15. Example Tree

  16. But we need more than phones
     - What about lexical stress
       - p r aa1 jh eh k t -> p r aa jh eh1 k t
     - Two possibilities
       - A separate prediction model
       - Joint model: introduce eh/eh1 (BETTER)

               LTP+S     LTPS
     L no S    96.36%    96.27%
     Letter    ---       95.80%
     W no S    76.92%    74.69%
     Word      63.68%    74.56%

  17. Does it really work
     - 40K words from Time Magazine
       - 1775 (4.6%) not in OALD
       - LTS gets 70% correct (test set was 74%)

                   Occurs    %
     Names         1360      76.6
     Unknown       351       19.8
     US Spelling   57        3.2
     Typos         7         0.4

  18. Dialect Lexicons
     - Need different lexicons for different dialects
       - US, UK, Indian, Australia, Europeans
     - Build dialect independent lexicons
       - Dialect independent vowels (“key-vowels”)
         - The vowel in coffee and conference
         - Map to aa in US, and o in the UK
       - Post-vocalic r in UK English
         - Car -> k aa
       - Specific words
         - Leisure, route, tortoise, poem
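A dialect-independent entry can be turned into a dialect-specific one with a small mapping step. The sketch below is illustrative only: the key-vowel symbol `KEY_O`, the vowel inventory, and the function `to_dialect` are all hypothetical, and the post-vocalic r rule is the simplest version (drop r after a vowel unless a vowel follows within the word).

```python
# Assumed vowel inventory for this sketch (not a complete phone set).
VOWELS = {"aa", "o", "iy", "ax", "eh", "ey", "uw", "aw", "ih", "ae"}

# Hypothetical key-vowel symbol for the vowel in "coffee"/"conference".
KEY_VOWEL_MAP = {"KEY_O": {"us": "aa", "uk": "o"}}


def to_dialect(phones, dialect):
    """Resolve key-vowels, then drop post-vocalic r for the UK dialect."""
    out = [KEY_VOWEL_MAP.get(p, {}).get(dialect, p) for p in phones]
    if dialect != "uk":
        return out
    kept = []
    for i, p in enumerate(out):
        nxt = out[i + 1] if i + 1 < len(out) else None
        post_vocalic = (
            p == "r" and i > 0 and out[i - 1] in VOWELS and nxt not in VOWELS
        )
        if not post_vocalic:
            kept.append(p)
    return kept


print(to_dialect(["k", "KEY_O", "f", "iy"], "us"))
print(to_dialect(["k", "KEY_O", "f", "iy"], "uk"))
print(to_dialect(["k", "aa", "r"], "uk"))
```

Words like “leisure” or “route” differ in ways no general rule captures, so they would still need per-dialect entries.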

  19. Post-lexical Rules
     - Sometimes you need context
     - “the” as dh ax or dh iy
       - The banana and The apple
     - R-insertion in UK English
       - Car door vs car alarm
     - Liaison in French
       - Petit vs Petit ami
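The “the” example is a rule that looks at the first phone of the following word: dh iy before a vowel, dh ax otherwise. A minimal sketch, assuming a small vowel inventory and stress digits written as suffixes (as in eh1); `pronounce_the` is a hypothetical name.

```python
# Assumed vowel inventory for this sketch (not a complete phone set).
VOWELS = {"aa", "ae", "ax", "eh", "ey", "ih", "iy", "uw", "aw"}


def pronounce_the(next_word_phones):
    """Choose the pronunciation of "the" from the next word's first phone."""
    first = next_word_phones[0].rstrip("012")  # drop a trailing stress digit
    return ["dh", "iy"] if first in VOWELS else ["dh", "ax"]


print(pronounce_the(["b", "ax", "n", "ae1", "n", "ax"]))  # the banana
print(pronounce_the(["ae1", "p", "ax", "l"]))             # the apple
```

The rule has to run after lexical lookup, since it depends on the pronunciation of the neighbouring word rather than its spelling.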

  20. Summary
     - Linguistic analysis
       - Part of speech tagging
       - Pronunciation
         - Phones, stress, (syllables)
         - Letter to sound rules
       - Post lexical rules
