TTS and Data Selection: Improving Systems for Low-Resource Languages



  1. TTS and Data Selection: Improving Systems for Low-Resource Languages Chevy Levitan, DREU 2015

  2. outline I. Project II. Approach III. Methods IV. Status V. Future

  3. I. Project synthesize natural, intelligible voices for low-resource languages using data selection

  4. motivation ▷ bridge the gap

  5. motivation ▷ bridge the gap ▷ allow for cross-language communication

  6. why data selection?

  7. HRLs vs. LRLs
     HRLs: prepared data ★ abundance of training material ★ high quality speech systems
     LRLs: found data ★ limited training material ★ low quality speech systems

  8. A. filter out unwanted data from training set

  9. A. filter out unwanted data from training set B. supplement limited LRL data with choice data from similar HRL

  10. II. APPROACH preparing the experiment

  11. corpus ▷ Boston Radio News Corpus ▷ pre-processed ▷ English

  12. data selection process ▷ extract features ▷ sort values ▷ create subsets ▷ synthesize data

  13. evaluate.

  14. evaluate. compare/contrast voices

  15. example VOICE 1 VOICE 2

  16. solution 1. subset data 2. complete dataset
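The data selection steps on slide 12 (extract features, sort values, create subsets) can be sketched roughly as follows. This is only an illustration, not the project's actual code: the slides do not say which features were extracted, so the `mean_f0` values, the utterance IDs, and the `select_subset` helper are all hypothetical.

```python
from typing import Dict, List


def select_subset(features: Dict[str, float], fraction: float,
                  highest: bool = False) -> List[str]:
    """Sort utterance IDs by one feature value and keep a fraction of them.

    `features` maps an utterance ID to a single extracted feature value.
    With highest=False the lowest-valued utterances are kept; with
    highest=True the highest-valued ones are kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Sort values: order utterance IDs by their feature value.
    ranked = sorted(features, key=lambda utt: features[utt], reverse=highest)
    # Create subsets: keep the requested share of the ranked list.
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical per-utterance feature values (assumed: mean pitch in Hz).
mean_f0 = {"utt01": 182.4, "utt02": 210.9, "utt03": 165.0, "utt04": 198.7}

low_half = select_subset(mean_f0, 0.5)                 # lowest-pitch half
high_half = select_subset(mean_f0, 0.5, highest=True)  # highest-pitch half
```

Each resulting list of utterance IDs would then be used as the training set for one synthesized voice, which can be compared against a voice trained on the complete dataset.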

  17. III. METHODS testing our hypothesis

  18. standards ★ follow standard procedures for evaluating TTS voices

  19. standards ★ follow standard procedures for evaluating TTS voices ★ successful voice = intelligible + natural

  20. standards ★ follow standard procedures for evaluating TTS voices ★ successful voice = intelligible + natural ★ use crowdsourcing for unbiased results

  21. mechanical turk Intelligibility ➔ transcribe nonsense sentences ➔ accurate transcription = intelligible voice
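The slides do not state how transcription accuracy was scored. One common choice is word error rate (WER), the word-level edit distance between the sentence that was synthesized and the worker's transcription, divided by the number of reference words. The sketch below assumes that metric; the function name and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical nonsense sentence and one worker's transcription of it.
wer = word_error_rate("the green cat sang loudly", "the green hat sang loudly")
```

A lower WER averaged over workers and sentences would indicate a more intelligible voice.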

  22. mechanical turk Intelligibility ➔ transcribe nonsense sentences ➔ accurate transcription = intelligible voice Naturalness ➔ use Likert scale to rate voices from very unnatural to very natural ➔ identify the voices that are categorized as natural+
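The naturalness ratings could be summarized per voice along these lines. The sketch makes two assumptions the slides do not confirm: that the Likert scale has five points, and that "natural+" means a rating of "natural" or higher. The voice names and ratings are made up for illustration.

```python
from statistics import mean
from typing import Dict, List

# Assumed 5-point scale; the slides only name the two endpoints.
LIKERT_LABELS = {1: "very unnatural", 2: "unnatural", 3: "neutral",
                 4: "natural", 5: "very natural"}


def summarize_naturalness(ratings: Dict[str, List[int]],
                          threshold: int = 4) -> Dict[str, Dict[str, float]]:
    """Return the mean rating and the share of ratings >= threshold per voice."""
    summary = {}
    for voice, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for {voice}")
        if any(score not in LIKERT_LABELS for score in scores):
            raise ValueError(f"rating outside the scale for {voice}")
        summary[voice] = {
            "mean": mean(scores),
            # Share of ratings at "natural" or above (assumed meaning of natural+).
            "natural_plus": sum(score >= threshold for score in scores) / len(scores),
        }
    return summary


# Hypothetical worker ratings for two voices.
ratings = {"voice_1": [4, 5, 3, 4], "voice_2": [2, 3, 2, 1]}
summary = summarize_naturalness(ratings)
```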

  23. IV. STATUS our current state

  24. intelligibility HIT ✓ create subsets

  25. intelligibility HIT ✓ create subsets ✓ synthesize voices with this data

  26. intelligibility HIT ✓ create subsets ✓ synthesize voices with this data ✓ design and implement HIT

  27. intelligibility HIT ✓ create subsets ✓ synthesize voices with this data ✓ design and implement HIT ✓ publish on MTurk site

  28. intelligibility HIT ✓ create subsets ✓ synthesize voices with this data ✓ design and implement HIT ✓ publish on MTurk site ✓ workers complete HITs

  29. intelligibility HIT ✓ create subsets ✓ synthesize voices with this data ✓ design and implement HIT ✓ publish on MTurk site ✓ workers complete HITs ✓ accept/reject work

  30. naturalness HIT ✓ create subsets

  31. naturalness HIT ✓ create subsets ✓ synthesize voices with this data

  32. naturalness HIT ✓ create subsets ✓ synthesize voices with this data ✓ design and implement HIT

  33. naturalness HIT ✓ create subsets ✓ synthesize voices with this data ✓ design and implement HIT - publish on MTurk site - workers complete HITs - accept/reject work

  34. V. FUTURE further exploration of this research

  35. evaluation analyze mechanical turk responses

  36. evaluation analyze mechanical turk responses low-resource implement data selection for LRLs

  37. evaluation analyze mechanical turk responses low-resource implement data selection for LRLs text apply similar methods to automatically select text data

  38. Thanks! Any questions?
