Speech Processing 15-492/18-492
Speech Synthesis: Prosody

Speech Synthesis
- Linguistic Analysis
  - Pronunciations
  - Prosody

Prosody
- How the phonemes will be said
- Four aspects of prosody
  - Phrasing: where the breaks will be
  - Intonation: pitch accents and F0 generation
  - Duration: how long the phonemes will be
  - Power: energy in signal

Phrase Breaks
- Need to take a breath
- Need to chunk relevant parts together
  - Sub-sentential
  - Supra-word
- First approximation
  - At punctuation (comma, semicolon, etc.)
  - Too few breaks
- Second approximation
  - At each (or some) of the content/function word boundaries
  - Too many breaks

Phrasing
- Punctuation
  - Next week, some inmates released early from the Hampton County jail in Springfield, will be wearing a wristband that hooks up to a special jack on their home phones.
- Content/function words
  - Next week || some inmates released early || from the Hampton County jail || in Springfield || will be wearing || a wristband || that hooks || up to a special jack || on their home phones.

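
The two approximations can be sketched in a few lines of Python. This is a minimal illustration, not a production phraser: the `FUNCTION_WORDS` set is a tiny hand-picked list assumed for the example, and the second rule (break after a content word that is followed by a function word) is one plausible reading of "at content/function words".

```python
# Assumption: a tiny illustrative function-word list, not a real lexicon.
FUNCTION_WORDS = {"a", "an", "the", "from", "in", "on", "to", "that",
                  "will", "be", "their", "some", "up"}
PUNCTUATION = ",;:.!?"

def breaks_at_punctuation(tokens):
    """First approximation: break after any token that ends in punctuation."""
    return [tok[-1] in PUNCTUATION for tok in tokens]

def breaks_at_content_function(tokens):
    """Second approximation: break after a content word followed by a function word."""
    words = [tok.strip(PUNCTUATION).lower() for tok in tokens]
    breaks = []
    for i, word in enumerate(words):
        if i == len(words) - 1:
            breaks.append(True)  # always break at the end of the utterance
        else:
            breaks.append(word not in FUNCTION_WORDS
                          and words[i + 1] in FUNCTION_WORDS)
    return breaks

def render(tokens, breaks):
    """Show the phrasing with || after each predicted break."""
    parts = []
    for tok, is_break in zip(tokens, breaks):
        parts.append(tok)
        if is_break:
            parts.append("||")
    return " ".join(parts)

tokens = "Next week, some inmates will be wearing a wristband.".split()
print(render(tokens, breaks_at_punctuation(tokens)))
print(render(tokens, breaks_at_content_function(tokens)))
```

On this shortened sentence the punctuation rule yields two breaks and the content/function rule yields four, which mirrors the "too few" versus "too many" contrast.
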
Phrasing
- Bachenko and Fitzpatrick 90
  - Rule driven with punctuation, POS and syntax
  - Balanced phrasing
    - (the boy saw) (the girl in the park)
    - (the boy in the park) (saw the girl)
- Hirschberg and Prieto 94
  - CART trees (similar features)
- Ostendorf and Veilleux 94
  - Hierarchical statistical model
  - Multilevel breaks

Phrasing (Black and Taylor 97)
- Balance length of phrases
  - Predict probability of break with CART (use POS)
  - Use n-gram of B/NB (break/non-break) to keep balance
- Trained on BBC Radio 4 (NPR-like)
  - 31,707 words, 6,346 breaks
  - 91% correct with 6-gram
  - Still makes errors, especially around "I"

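
The idea of combining a per-juncture break probability with an n-gram over B/NB labels can be sketched with a small Viterbi search. This is an assumption-laden illustration rather than the published model: it uses a bigram instead of a 6-gram for brevity, the per-juncture probabilities stand in for the output of a trained CART, and every number below is made up.

```python
import math

STATES = ("NB", "B")

def viterbi_breaks(p_break, trans, init):
    """Pick the best B/NB sequence.

    p_break: P(break) at each word juncture (hypothetical CART output).
    trans[prev][cur], init[cur]: bigram and start probabilities over B/NB.
    All probabilities must be strictly between 0 and 1.
    """
    def emit(state, p):
        return p if state == "B" else 1.0 - p

    # Log score of the best path ending in each state at the first juncture.
    score = {s: math.log(init[s]) + math.log(emit(s, p_break[0]))
             for s in STATES}
    back = []
    for p in p_break[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            prev = max(STATES,
                       key=lambda s: score[s] + math.log(trans[s][cur]))
            new_score[cur] = (score[prev] + math.log(trans[prev][cur])
                              + math.log(emit(cur, p)))
            pointers[cur] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Illustrative numbers only: two adjacent breaks are made very unlikely.
trans = {"NB": {"NB": 0.8, "B": 0.2}, "B": {"NB": 0.95, "B": 0.05}}
init = {"NB": 0.8, "B": 0.2}
print(viterbi_breaks([0.05, 0.9, 0.85, 0.05, 0.95], trans, init))
```

Thresholding each juncture independently at 0.5 would place breaks at the second, third and fifth junctures; with the bigram, the search drops the third one so that two breaks do not fall on adjacent words.
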
Phrasing
- What is correct?
  - Lots of answers are correct.
  - But some are definitely bad.
- Ostendorf and Veilleux 94
  - Multiple people read same paragraphs
  - If your method matches any single person's version it is correct.

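
The "matches any single reader" criterion is easy to state in code. The sketch below assumes each phrasing is represented as the set of word-juncture indices where a break occurs; that representation and the function name are choices made for this example.

```python
def is_acceptable(predicted_breaks, reader_breaks):
    """True if the predicted break set equals at least one reader's break set."""
    predicted = frozenset(predicted_breaks)
    return any(predicted == frozenset(reader) for reader in reader_breaks)

# Hypothetical data: three readers of the same sentence, breaks given as
# the index of the word after which each reader paused.
readers = [{1, 5, 9}, {1, 9}, {1, 3, 5, 9}]
print(is_acceptable({1, 9}, readers))
print(is_acceptable({2, 9}, readers))
```

The first call returns True because the prediction matches the second reader exactly; the second returns False because no reader paused after word 2.
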
Intonation
- The fundamental tune
  - Accents (highlighting important parts)
  - F0 generation (the tune itself)

Intonation Contour
[figure: intonation contour]

Intonation Information
- Large pitch range (female)
- Sounds authoritative because the pitch goes down at the end
  - News reader
- Emphasis on "Finance" (H* accent)
- Final has a rise: more information to come
- Female American newsreader from WBUR
  - (Boston University Public Radio)

Intonation Examples
- Fixed durations, flat F0
- Declining F0
- "Hat" accents on stressed syllables
- Accents and end tones
- Statistically trained

Intonational Phonology
- Accents and Boundaries
  - Where are the important changes in F0?
- Accents on syllables
  - Identifies "important" words
- Examples
  - It will be RAINY today in Boston
  - It will be rainy TODAY in Boston
  - It will BE rainy today IN Boston (strange)

Where do the accents go?
- On important words
- First approximation
  - On stressed syllables in content words
  - It WILL be RAINY TODAY in BOSTON
  - About 80% correct on news reader speech
- CART training on more features
  - Content, proper nouns, POS, position in text
  - (not semantic information)

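
The first approximation (accent the stressed syllable of every content word) can be sketched as follows. The POS tag names, the `CONTENT_POS` set and the mini-lexicon with syllable stress are all assumptions made for this example; a real system would take them from a tagger and a pronunciation lexicon.

```python
# Assumption: coarse POS tags that count as content-word classes.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

# Hypothetical mini-lexicon: word -> (syllables, index of stressed syllable).
LEXICON = {
    "this": (["this"], 0),
    "is": (["is"], 0),
    "a": (["a"], 0),
    "short": (["short"], 0),
    "example": (["ex", "am", "ple"], 1),
}

def assign_accents(tagged_words):
    """Return (word, accented_syllable) pairs; None means no accent."""
    result = []
    for word, pos in tagged_words:
        syllables, stressed = LEXICON[word.lower()]
        if pos in CONTENT_POS:
            result.append((word, syllables[stressed]))
        else:
            result.append((word, None))
    return result

sentence = [("This", "PRON"), ("is", "AUX"), ("a", "DET"),
            ("short", "ADJ"), ("example", "NOUN")]
accents = assign_accents(sentence)
# Show accented words in capitals, in the style used on the slides.
print(" ".join(w.upper() if syl else w for w, syl in accents))
```

A trained CART would replace the single `pos in CONTENT_POS` test with a decision over more features (proper-noun status, position in text, and so on).
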
ToBI
- Tones and Break Indices
  - A labeling for intonation (English)
- Different accent types
  - H*, !H*, L*, L+H*
- Different boundary types
  - L-L%, L-H%, H-H%, ...

ToBI examples
[figure: ToBI examples]

F0 Generation
- Contour from accents (and durations)
- Piece together shapes of different accents
- Generated
  - By rule
  - Trained from data

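
A very small "by rule" generator can combine a declining baseline with a fixed bump on each accented syllable. This is only a sketch: the one-target-per-syllable representation and the default values (130 Hz start, 100 Hz end, 20 Hz accent height) are assumptions chosen for illustration, not values from the lecture.

```python
def f0_targets(syllable_accents, start_hz=130.0, end_hz=100.0, accent_hz=20.0):
    """Return one F0 target (Hz) per syllable.

    syllable_accents: list of booleans, True where the syllable is accented.
    The baseline declines linearly from start_hz to end_hz over the phrase,
    and each accented syllable is raised by accent_hz above the baseline.
    """
    n = len(syllable_accents)
    targets = []
    for i, accented in enumerate(syllable_accents):
        frac = i / (n - 1) if n > 1 else 0.0
        baseline = start_hz + (end_hz - start_hz) * frac
        targets.append(baseline + (accent_hz if accented else 0.0))
    return targets

# "This is a SHORT exAMple": six syllables, accents on "short" and "am".
print(f0_targets([False, False, False, True, False, True, False][:6]))
```

A data-trained generator would predict the targets from features of each syllable instead of using fixed constants.
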
Using real contours
- From a database of different contours
  - Select most appropriate one
- Record lots of different intonation examples
  - He DID then KNOW what HAD occurred
  - TARZAN and JANE raised THEIR heads
  - …
- Label them and select the contours when you want emphasis

Emphasis Synthesis
- This is a short example
- THIS is a short example
- This IS a short example
- This is A short example
- This is a SHORT example
- This is a short EXAMPLE

Duration Prediction
- Each phone needs a duration
  - Simplest approach: make every phone 80 ms
- Vowels are typically longer than consonants
- Emphasis/accent/stress lengthens them
- Initial and final phones are longer

Prediction Models
- By rule
  - Klatt rules
- By training (using Klatt features)
  - CART / linear regression
  - Easy to get reasonable durations
  - Hard to get very good durations

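
Klatt-style rules adjust each phone between a minimum and an inherent duration: DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100, where each applicable rule scales PRCNT. The sketch below implements that formula; the inherent and minimum durations and the rule factors are hypothetical numbers picked for the example, not Klatt's published tables.

```python
def klatt_duration(inherent_ms, minimum_ms, factors):
    """Apply Klatt-style rules to one phone.

    factors: multiplicative rule effects, e.g. 1.4 for lengthening or
    0.7 for shortening (values here are illustrative assumptions).
    """
    prcnt = 100.0
    for factor in factors:
        prcnt *= factor
    return minimum_ms + (inherent_ms - minimum_ms) * prcnt / 100.0

# Hypothetical vowel: inherent 100 ms, minimum 40 ms.
print(klatt_duration(100.0, 40.0, []))          # no rules apply
print(klatt_duration(100.0, 40.0, [1.5]))       # e.g. phrase-final lengthening
print(klatt_duration(100.0, 40.0, [1.5, 0.7]))  # lengthened, then unstressed
```

Because only the part above the minimum is scaled, no combination of shortening rules can push a phone below its minimum duration. A trained model would instead feed the same features (stress, position, phone class) to a CART or linear regression.
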
Fast and Slow Speech
- Speaking fast: not uniformly shorter durations
  - Have fewer prosodic breaks
  - Reduce syllables
  - Make consonants shorter
  - Make vowels a little shorter
- Speaking slow: not uniformly longer durations
  - Add more prosodic breaks
  - Small increases in vowel duration (?)

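
The point that fast speech is not a uniform time compression can be illustrated by scaling consonants and vowels differently. The scale factors and the partial vowel set below are assumptions for the example only, and the sketch leaves out the other effects listed above (fewer breaks, reduced syllables).

```python
# Assumption: a partial, illustrative set of vowel phone names.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "uw", "ey", "ay", "ow"}

def speed_up(phones, consonant_scale=0.7, vowel_scale=0.9):
    """Shorten consonants more than vowels.

    phones: list of (phone_name, duration_ms) pairs.
    """
    return [
        (name, dur * (vowel_scale if name in VOWELS else consonant_scale))
        for name, dur in phones
    ]

print(speed_up([("s", 100.0), ("iy", 100.0)]))
```
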