
Speech Processing 15-492/18-492



  1. Speech Processing 15-492/18-492: Speech Synthesis, Prosody

  2. Speech Synthesis
     - Linguistic Analysis
     - Pronunciations
     - Prosody

  3. Prosody
     - How the phonemes will be said
     - Four aspects of prosody:
       - Phrasing: where the breaks will be
       - Intonation: pitch accents and F0 generation
       - Duration: how long the phonemes will be
       - Power: energy in the signal

  4. Phrase Breaks
     - Need to take a breath
     - Need to chunk relevant parts together
       - Sub-sentential
       - Supra-word
     - First approximation: break at punctuation (comma, semicolon, etc.)
       - Too few breaks
     - Second approximation: break at each (or some) of the content/function word boundaries
       - Too many breaks

  5. Phrasing
     - Punctuation:
       Next week, some inmates released early from the Hampton County jail in Springfield, will be wearing a wristband that hooks up to a special jack on their home phones.
     - Content/function words:
       Next week || some inmates released early || from the Hampton County jail || in Springfield || will be wearing || a wristband || that hooks || up to a special jack || on their home phones.
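
The two first approximations for phrase breaks can be sketched in a few lines of Python. This is a minimal illustration, not the method of any system named in these slides: it assumes a small, hypothetical hand-written function-word list (a real system would use a POS tagger) and places a break wherever a function word follows a content word. With that list, the rule reproduces the break positions shown for the example sentence.

```python
import string

# Hypothetical, deliberately small function-word list; "up" is treated as a
# function word (particle). Real systems use a POS tagger or a larger lexicon.
FUNCTION_WORDS = {
    "a", "an", "the", "some", "that", "this", "from", "in", "on", "of",
    "to", "up", "with", "at", "by", "for", "will", "be", "their", "his",
    "her", "and", "or", "but",
}


def breaks_at_punctuation(text):
    """First approximation: break only where a token ends in punctuation."""
    phrases, current = [], []
    for token in text.split():
        current.append(token.rstrip(",;:."))
        if token[-1] in ",;:.":
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def breaks_at_function_words(text):
    """Second approximation: break wherever a function word follows a content word."""
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    phrases, current = [], []
    prev_is_content = False
    for word in words:
        is_function = word.lower() in FUNCTION_WORDS
        if is_function and prev_is_content and current:
            phrases.append(" ".join(current))
            current = []
        current.append(word)
        prev_is_content = not is_function
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    sentence = (
        "Next week, some inmates released early from the Hampton County jail "
        "in Springfield, will be wearing a wristband that hooks up to a "
        "special jack on their home phones."
    )
    print(breaks_at_punctuation(sentence))
    print(" || ".join(breaks_at_function_words(sentence)))
```

On this sentence the punctuation rule yields only three phrases, while the function-word rule yields nine, illustrating the "too few" versus "too many" trade-off.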

  6. Phrasing
     - Bachenko and Fitzpatrick 90
       - Rule driven, using punctuation, POS and syntax
       - Balanced phrasing:
         (the boy saw) (the girl in the park)
         (the boy in the park) (saw the girl)
     - Hirschberg and Prieto 94
       - CART trees (similar features)
     - Ostendorf and Veilleux 94
       - Hierarchical statistical model
       - Multilevel breaks

  7. Phrasing (Black and Taylor 97)
     - Balance the length of phrases
       - Predict the probability of a break with CART (using POS)
       - Use an n-gram of B/NB (break/no break) to keep the balance
     - Trained on BBC Radio 4 (NPR-like)
       - 31,707 words, 6,346 breaks
       - 91% correct with a 6-gram
       - Still makes errors, especially around "I"
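
Combining a per-juncture break classifier with an n-gram over break/no-break labels amounts to a Viterbi search for the best label sequence. A minimal sketch follows, assuming the CART's P(B) values are already computed (stubbed here with made-up numbers) and using a bigram rather than the 6-gram reported above. Scoring each label as local probability times bigram probability is a simplification, not necessarily the exact formulation used by Black and Taylor.

```python
import math

LABELS = ("NB", "B")

# Illustrative bigram over break labels; "<s>" marks the sentence start.
# The low P(B | B) discourages two breaks in a row.
BIGRAM = {
    ("<s>", "NB"): 0.8, ("<s>", "B"): 0.2,
    ("NB", "NB"): 0.6, ("NB", "B"): 0.4,
    ("B", "NB"): 0.95, ("B", "B"): 0.05,
}


def decode_breaks(p_break, bigram=BIGRAM):
    """Return the most likely B/NB label sequence for the word junctures.

    p_break[i] is the classifier's P(break) at juncture i (e.g. from a CART
    on POS features). Each label is scored by its local probability times
    the bigram probability of following the previous label.
    """
    def local(p, label):
        return p if label == "B" else 1.0 - p

    # best[label] = (log score, best path ending in label)
    best = {
        lab: (math.log(local(p_break[0], lab)) + math.log(bigram[("<s>", lab)]), [lab])
        for lab in LABELS
    }
    for p in p_break[1:]:
        new_best = {}
        for lab in LABELS:
            # Best predecessor for this label, including the transition cost.
            score, path = max(
                (best[prev][0] + math.log(bigram[(prev, lab)]), best[prev][1])
                for prev in LABELS
            )
            new_best[lab] = (score + math.log(local(p, lab)), path + [lab])
        best = new_best
    return max(best.values())[1]


print(decode_breaks([0.1, 0.8, 0.7, 0.2]))  # ['NB', 'B', 'NB', 'NB']
```

With these illustrative numbers, thresholding each juncture at 0.5 would give NB B B NB; the bigram's low P(B | B) removes the second, adjacent break, which is the balancing effect the slide describes.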

  8. Phrasing
     - What is correct?
       - Lots of answers are correct.
       - But some are definitely bad.
     - Ostendorf and Veilleux 94
       - Multiple people read the same paragraphs.
       - If your method matches any single person's version, it is correct.
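
Under this criterion a prediction is compared against every reader's version rather than against a single reference. A minimal sketch, assuming (hypothetically) that each sentence's phrasing is represented as the set of word indices followed by a break, and that a sentence counts as correct only on an exact match with at least one reader:

```python
def matches_any_reader(predicted, reader_versions):
    """True if the predicted break set equals at least one reader's break set."""
    return any(predicted == version for version in reader_versions)


def phrasing_score(predictions, references):
    """Fraction of sentences whose predicted breaks match some reader.

    predictions[i] is a set of word indices followed by a break;
    references[i] is a list of such sets, one per reader of sentence i.
    """
    hits = sum(
        matches_any_reader(pred, refs) for pred, refs in zip(predictions, references)
    )
    return hits / len(predictions)


# Hypothetical readings of "the boy saw the girl in the park": one reader
# breaks after "saw" (index 2), another after "boy" (index 1).
readers = [{2}, {1}]
print(phrasing_score([{2}, {4}], [readers, readers]))  # 0.5
```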

  9. Intonation
     - The fundamental tune
       - Accents (highlighting important parts)
       - F0 generation (the tune itself)

  10. Intonation Contour

  11. Intonation Information
     - Large pitch range (female)
     - Authoritative, since the pitch goes down at the end
       - News reader
     - Emphasis for "Finance": H*
     - Final has a rise: more information to come
     - Female American newsreader from WBUR
       - (Boston University Public Radio)

  12. Intonation Examples
     - Fixed durations, flat F0
     - Declining F0
     - "Hat" accents on stressed syllables
     - Accents and end tones
     - Statistically trained

  13. Intonational Phonology
     - Accents and boundaries
       - Where are the important changes in F0?
     - Accents on syllables
       - Identify "important" words
         It will be RAINY today in Boston
         It will be rainy TODAY in Boston
         It will BE rainy today IN Boston (strange)

  14. Where do the accents go?
     - On important words
     - First approximation
       - On stressed syllables in content words:
         It WILL be RAINY TODAY in BOSTON
       - About 80% correct on news reader speech
     - CART training on more features
       - Content, proper nouns, POS, position in text
       - (not semantic information)
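
The first approximation is easy to sketch. This minimal Python version works at the word level (a real system would put the accent on the lexically stressed syllable from the pronunciation lexicon) and uses a hypothetical function-word list that omits modals such as "will", which is why its output matches the slide's example:

```python
# Hypothetical function-word list. It deliberately omits modals such as
# "will", which is why "WILL" is accented below, as on the slide.
FUNCTION_WORDS = {
    "a", "an", "the", "it", "is", "be", "in", "on", "of", "to", "at",
    "and", "or", "but", "for", "from", "with", "that",
}


def accent_content_words(sentence):
    """First approximation: accent every content word (shown in upper case).

    Works at word level; a real system would place the accent on the
    lexically stressed syllable.
    """
    return " ".join(
        word if word.lower() in FUNCTION_WORDS else word.upper()
        for word in sentence.split()
    )


print(accent_content_words("It will be rainy today in Boston"))
# It WILL be RAINY TODAY in BOSTON
```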

  15. ToBI
     - Tones and Break Indices
       - A labeling scheme for intonation (English)
     - Different accent types
       - H*, !H*, L*, L+H*
     - Different boundary types
       - L-L%, L-H%, H-H%, etc.

  16. ToBI examples

  17. F0 Generation
     - Contour from accents (and durations)
     - Piece together the shapes of different accents
     - Generated
       - By rule
       - Trained from data
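
A rule-based generator of this kind can be sketched as a declining baseline with one accent shape added per accented syllable, placed using the already-predicted durations. This is a minimal sketch, assuming a straight-line declination and a triangular "hat" shape; the Hz values and the 10 ms frame step are hypothetical, and real systems use richer accent models.

```python
def f0_contour(syllables, start_hz=130.0, end_hz=100.0, accent_hz=40.0, step_s=0.01):
    """Rule-based F0: a declining baseline plus a triangular "hat" per accent.

    syllables: list of (start_s, end_s, accented) with durations already
    predicted. Returns (time_s, f0_hz) samples every step_s seconds.
    """
    total = syllables[-1][1]
    n_samples = int(round(total / step_s)) + 1
    contour = []
    for i in range(n_samples):
        t = i * step_s
        # Declination: straight line from start_hz down to end_hz.
        f0 = start_hz + (end_hz - start_hz) * t / total
        for start, end, accented in syllables:
            if accented and end > start and start <= t <= end:
                mid = (start + end) / 2
                half = (end - start) / 2
                # Rise to accent_hz above the baseline at mid-syllable, then fall.
                f0 += accent_hz * (1 - abs(t - mid) / half)
        contour.append((round(t, 3), f0))
    return contour


# Three 200 ms syllables, accent on the middle one.
contour = f0_contour([(0.0, 0.2, False), (0.2, 0.4, True), (0.4, 0.6, False)])
print(max(contour, key=lambda sample: sample[1]))
```

Here the peak falls at the middle of the accented syllable, on top of a baseline that keeps falling across the utterance; setting `accent_hz=0` and `end_hz=start_hz` gives the flat-F0 case.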

  18. Using real contours
     - From a database of different contours
       - Select the most appropriate one
     - Record lots of different intonation examples
       - He DID then KNOW what HAD occurred
       - TARZAN and JANE raised THEIR heads
       - …
     - Label them, and select the contours when you want emphasis
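
Selecting a stored contour can be framed as a nearest-match search over accent patterns. A minimal sketch, assuming (hypothetically) that each recording is labeled with one accented/unaccented flag per syllable, that only recordings with the same syllable count are eligible, and that closeness is the number of mismatched flags. The syllabifications below are illustrative, and no F0 data is included.

```python
from dataclasses import dataclass, field


@dataclass
class StoredContour:
    text: str
    accents: tuple  # one bool per syllable: True where the speaker accented it
    f0: list = field(default_factory=list)  # recorded F0 track would go here


def select_contour(database, wanted):
    """Return the stored contour whose accent pattern is closest to `wanted`.

    Only contours with the same number of syllables are considered, and
    closeness is the number of syllables whose accent flag differs.
    """
    candidates = [c for c in database if len(c.accents) == len(wanted)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: sum(a != b for a, b in zip(c.accents, wanted)),
    )


T, F = True, False
DATABASE = [
    # He DID then KNOW what HAD oc-curred (8 syllables)
    StoredContour("He DID then KNOW what HAD occurred", (F, T, F, T, F, T, F, F)),
    # TAR-zan and JANE raised THEIR heads (7 syllables)
    StoredContour("TARZAN and JANE raised THEIR heads", (T, F, F, T, F, T, F)),
]

best = select_contour(DATABASE, (F, T, F, T, F, F, F, F))
print(best.text)  # He DID then KNOW what HAD occurred
```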

  19. Emphasis Synthesis
     - This is a short example
     - THIS is a short example
     - This IS a short example
     - This is A short example
     - This is a SHORT example
     - This is a short EXAMPLE

  20. Duration Prediction
     - Each phone needs a duration
       - Make it 80 ms
     - Vowels are typically longer than consonants
     - Emphasis/accent/stress lengthens them
     - Initial and final phones are longer
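
These rules of thumb can be combined into a simple multiplicative model that starts from the flat 80 ms value. A minimal sketch; the vowel set (lower-case ARPAbet-style names), the factors 1.4, 1.3 and 1.2, and the choice to multiply them are all hypothetical, not values from the slides:

```python
# Hypothetical vowel set (lower-case ARPAbet-style phone names).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}

BASE_MS = 80.0       # first approximation: every phone is 80 ms
VOWEL_FACTOR = 1.4   # hypothetical: vowels longer than consonants
STRESS_FACTOR = 1.3  # hypothetical: stress/accent lengthening
EDGE_FACTOR = 1.2    # hypothetical: initial/final lengthening


def predict_durations(phones, stressed):
    """Return one duration (ms) per phone.

    phones: phone names; stressed: parallel list of bools marking
    stressed/accented phones.
    """
    last = len(phones) - 1
    durations = []
    for i, (phone, is_stressed) in enumerate(zip(phones, stressed)):
        dur = BASE_MS
        if phone in VOWELS:
            dur *= VOWEL_FACTOR
        if is_stressed:
            dur *= STRESS_FACTOR
        if i == 0 or i == last:
            dur *= EDGE_FACTOR
        durations.append(dur)
    return durations


# "hat" as hh ae t, with stress on the vowel.
print(predict_durations(["hh", "ae", "t"], [False, True, False]))
```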

  21. Prediction Models
     - By rule
       - Klatt rules
     - By training (using Klatt features)
       - CART / linear regression
       - Easy to get reasonable durations
       - Hard to get very good durations
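
Klatt's rules are usually stated with an inherent duration INHDUR and a minimum duration MINDUR per segment. Each applicable rule scales a percentage PRCNT, and only the part above the minimum is stretched or compressed: DUR = (INHDUR - MINDUR) × PRCNT / 100 + MINDUR. A minimal sketch of that formula follows; the segment durations and rule percentages in the example are illustrative, not Klatt's published values.

```python
def klatt_duration(inhdur_ms, mindur_ms, percents):
    """Klatt-style duration: rules scale only the compressible part above MINDUR.

    percents: rule percentages applied multiplicatively (100 = no change).
    """
    prcnt = 100.0
    for p in percents:
        prcnt = prcnt * p / 100.0
    return (inhdur_ms - mindur_ms) * prcnt / 100.0 + mindur_ms


# Illustrative segment: INHDUR 160 ms, MINDUR 60 ms, with two hypothetical
# rules firing: a lengthening rule (140%) and a shortening rule (70%).
print(klatt_duration(160, 60, [140, 70]))  # 158.0
```

Because rules act on the excess above MINDUR, heavy shortening can never push a segment below its minimum duration, which is the main design point of the formula.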

  22. Fast and Slow Speech
     - Speaking fast: not uniformly shorter durations
       - Fewer prosodic breaks
       - Reduce syllables
       - Make consonants shorter
       - Make vowels a little shorter
     - Speaking slow: not uniformly longer durations
       - Add more prosodic breaks
       - Small increases in vowel duration (?)
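
The non-uniform behavior of fast speech can be sketched as separate scale factors for consonants and vowels, plus the removal of minor pauses. A minimal sketch, assuming hypothetical scale factors (0.7 for consonants, 0.9 for vowels) and a hypothetical 200 ms threshold separating minor from major breaks; syllable reduction is not modeled:

```python
def speed_up(segments, consonant_scale=0.7, vowel_scale=0.9, min_pause_ms=200.0):
    """Non-uniform fast-speech sketch.

    segments: list of (label, kind, duration_ms) with kind "C", "V" or "pause".
    Consonants shrink more than vowels; pauses shorter than min_pause_ms
    (treated as minor breaks) are dropped, longer ones are kept unchanged.
    """
    result = []
    for label, kind, dur in segments:
        if kind == "pause":
            if dur >= min_pause_ms:
                result.append((label, kind, dur))
        elif kind == "C":
            result.append((label, kind, dur * consonant_scale))
        else:
            result.append((label, kind, dur * vowel_scale))
    return result


normal = [("s", "C", 100.0), ("ae", "V", 150.0), ("pau", "pause", 120.0),
          ("t", "C", 80.0), ("pau", "pause", 300.0)]
print(speed_up(normal))
```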
