CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark
Modelling intonation in prosody
A conditional variational autoencoder captures the different intonations
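As a rough illustration of why a variational model can produce more than one intonation for the same input, the sketch below shows the two generic building blocks of a variational autoencoder: sampling a latent vector with the reparameterisation trick, and the KL term that keeps the latent distribution close to a standard normal. This is not the CHiVE implementation; the function names, the two-dimensional latent, and the numeric values are assumptions made for the example.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps per dimension, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# Hypothetical encoder output for one utterance (illustrative values only).
mu = [0.2, -0.1]
log_var = [-1.0, -0.5]

rng = random.Random(0)
z_first = reparameterize(mu, log_var, rng)
z_second = reparameterize(mu, log_var, rng)
# Two draws give two latent vectors; a decoder conditioned on the same
# linguistic features would map them to two different prosody contours.
```

At synthesis time, different latent samples stand in for the different intonations, while the conditioning input stays fixed.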
Language has a hierarchical linguistic structure
Sentence
Words: sil hello sil
Syllables: sil h+e l+ou sil
Phonemes: sil h e l ou sil
Frames
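The "hello" example above can be written down as a nested data structure, which makes the containment between levels explicit. The sketch below is only one possible representation; the class and function names are hypothetical, and the frame level is left out because the slide gives no frame values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    phonemes: List[str]


@dataclass
class Word:
    text: str
    syllables: List[Syllable]


@dataclass
class Sentence:
    words: List[Word]


def syllable_labels(sentence: Sentence) -> List[str]:
    """Label each syllable by joining its phonemes with '+'."""
    return [
        "+".join(syl.phonemes)
        for word in sentence.words
        for syl in word.syllables
    ]


def phoneme_sequence(sentence: Sentence) -> List[str]:
    """Flatten the hierarchy down to the phoneme level."""
    return [
        ph
        for word in sentence.words
        for syl in word.syllables
        for ph in syl.phonemes
    ]


# The utterance from the slide: silence, "hello", silence.
sentence = Sentence(words=[
    Word("sil", [Syllable(["sil"])]),
    Word("hello", [Syllable(["h", "e"]), Syllable(["l", "ou"])]),
    Word("sil", [Syllable(["sil"])]),
])

print(syllable_labels(sentence))   # ['sil', 'h+e', 'l+ou', 'sil']
print(phoneme_sequence(sentence))  # ['sil', 'h', 'e', 'l', 'ou', 'sil']
```

Each level groups the units of the level below it, so a model that follows this structure can summarise frames into phonemes, phonemes into syllables, and so on up to the sentence.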
Add linguistic knowledge to the network
The structured model is better
Baseline: 30.7%
CHiVE: 46.1%
No preference: 23.2%