Speech Processing 11-492/18-492

Speech Processing 11-492/18-492: Speech Synthesis Evaluation

Evaluating Speech Synthesis
- How good is the voice?
  - “This voice is a 45.67”
- Is voice X better than voice Y?
- Why?

Evaluation
- Objective measures
  - Run a program and get a number
- Subjective measures
  - Have human listeners give a score
- Do objective and subjective scores correlate?
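One way to check whether an objective number tracks listener scores is a Pearson correlation over a set of voices. The sketch below is only an illustration: the function name `pearson` and the per-voice numbers are hypothetical, not data from the course.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical numbers for four voices: an objective distortion measure
# (lower is better) and the mean listener score for the same voices.
objective = [6.8, 5.9, 5.1, 4.7]
subjective = [2.4, 3.0, 3.6, 3.9]
r = pearson(objective, subjective)
print(f"correlation: {r:.2f}")
```

A strongly negative value here would mean that lower distortion goes with higher listener scores; a value near zero would mean the objective number says little about what listeners hear.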

Human Tests
- Synthesis people are warped
  - The more you listen, the better it becomes
  - They hear things others don’t
- Non-synthesis people are warped
- People are very sensitive to listening conditions
  - What question you ask
  - What hardware you play it on
- There are (at least) two orthogonal scales
  - Understandability
  - Naturalness

Standard Tests
- DRT: diagnostic rhyme tests
  - Test confusable phones
    - “bat” vs “pat”
  - Good for identifying phone errors
  - Sometimes in carrier sentences
    - “Now we will say pat again.”
- Unit selection
  - Just include the standard words in the database
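Scoring a rhyme test of this kind can be sketched as follows. The trial format (played word, word the listener chose) and the function name `drt_score` are assumptions made for illustration. The second figure applies a common correction for guessing in two-alternative tests, 100 * (right - wrong) / total; treat it as one possible convention rather than the scoring used in the course.

```python
def drt_score(trials):
    """Score two-alternative rhyme-test trials.

    trials: list of (played_word, chosen_word) pairs.
    Returns percent correct and a chance-adjusted score.
    """
    if not trials:
        raise ValueError("no trials to score")
    total = len(trials)
    right = sum(1 for played, chosen in trials if played == chosen)
    wrong = total - right
    return {
        "percent_correct": 100.0 * right / total,
        # A listener who guesses on every trial scores about 0 here.
        "chance_adjusted": 100.0 * (right - wrong) / total,
    }

# Hypothetical responses for the "bat" / "pat" pair.
trials = [("bat", "bat"), ("pat", "bat"), ("pat", "pat"), ("bat", "bat")]
scores = drt_score(trials)
```

Grouping the trials by word pair before scoring would show which phone contrasts the voice gets wrong.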

Standard Tests
- SUS: semantically unpredictable sentences
  - Det adj noun verb det adj noun
  - Automatically filled in with low-frequency words
    - The parklike holders threw the vague vegetables
    - The simplistic consonants swam the episcopal quartet
    - The dark geniuses woke the humane emptiness.
    - The masterly serials withdrew the collaborative brochure
- Test for understandability
  - Ask users to type in what they hear
  - Good as discrimination
  - Very hard for even fluent non-natives
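The template filling and the scoring of typed responses can be sketched like this. The tiny lexicon reuses words from the example sentences above; a real test would draw from a much larger low-frequency word list. The names `make_sus` and `wer` are hypothetical, and word error rate is only one plausible way to score what listeners type.

```python
import random
import re

TEMPLATE = ["det", "adj", "noun", "verb", "det", "adj", "noun"]

# Small illustrative lexicon; number agreement is ignored in this sketch.
LEXICON = {
    "det": ["the"],
    "adj": ["parklike", "vague", "simplistic", "episcopal", "dark", "humane"],
    "noun": ["holders", "vegetables", "consonants", "quartet", "geniuses"],
    "verb": ["threw", "swam", "woke", "withdrew"],
}

def make_sus(rng):
    """Fill each template slot with a randomly chosen word of that class."""
    words = [rng.choice(LEXICON[slot]) for slot in TEMPLATE]
    return " ".join(words).capitalize()

def _words(text):
    """Lowercase, drop punctuation, split on whitespace."""
    return re.sub(r"[^a-z' ]", "", text.lower()).split()

def wer(reference, typed):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = _words(reference), _words(typed)
    if not ref:
        raise ValueError("empty reference sentence")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                            # deletion
                           cur[j - 1] + 1,                         # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

sentence = make_sus(random.Random(0))
error = wer("The dark geniuses woke the humane emptiness.",
            "the dark genius woke the human emptiness")
```

Because the sentences carry no usable context, a listener cannot guess a missed word from its neighbours, so the error rate reflects what was actually heard.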

Standard Tests
- MOS: mean opinion scores
  - 1-5 quality, naturalness, “like it”
  - Take average score
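Taking the average is a one-liner, but it helps to report how uncertain the average is. The sketch below adds a 95% interval using a normal approximation, which is an assumption added here and not part of the slide; the function name `mos` and the ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mos(ratings, z=1.96):
    """Return (mean opinion score, half-width of an approximate 95% interval).

    ratings: integer scores on the 1-5 scale, one per listener judgment.
    The interval uses a normal approximation (z = 1.96), an assumption
    that is rough for small listener counts.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width

# Hypothetical ratings from five listeners for one voice.
score, half_width = mos([4, 5, 3, 4, 4])
print(f"MOS = {score:.2f} +/- {half_width:.2f}")
```

When the intervals of two voices overlap heavily, the difference between their averages is probably not meaningful.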

Some experimental problems
- Order of presentation
- Other aids change perception
  - Showing the text makes it much easier
  - Having a talking head “improves” the synthesis
- Hardware quality
  - Some voices better on the telephone
  - Loudspeaker quality (headphone quality)
  - Room acoustics
  - Volume
- Understandability
  - Harder if doing other task
- Personal preference
  - Voice is fully understandable but “creepy”
  - Voice is incomprehensible but “funny”
  - Sounds like my grade school teacher

TTS Evaluation
- How good are your ears?

SUS Sentences
- sus_00005
- sus_00012
- sus_00017
- sus_00022

SUS Sentences
- The sorrowful premieres sang the ostentation gymnast
- The temperamental gateways forgave the weatherbeaten finalist
- The disruptive billboards blew the sugary endorsement
- The serene adjustments foresaw the acceptable acquisition

TTS Evaluation

TTS Evaluation
- In mud eels are, in mud none are
- A 1918 state constitutional amendment made Massachusetts one of 23 states where citizens can enact laws by plebiscite.
- Which is which
  - The numbers are 25 and 34.
  - The numbers 20 5 and 34.
- What is the temperature in Pittsburgh

Objective Synthesis Tests
- Text analysis
  - How well do you cover NSWs (non-standard words)?
  - How well do you cover homographs?
- Lexical coverage
  - How often do you see a new word?
- Lexical correctness
  - How correct are pronunciations?
    - For unseen words
    - For seen words
- Phonetic intelligibility
  - DRT tests
- Semantic intelligibility
  - SUS tests
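Lexical coverage is one of the easier items above to turn into a number: count how often a token in new text is missing from the pronunciation lexicon. The tokenizer, the function name `oov_rate`, and the toy lexicon below are assumptions for illustration.

```python
import re

def oov_rate(text, lexicon):
    """Return (fraction of tokens not in the lexicon, sorted unseen words).

    text: raw text to be synthesized.
    lexicon: set of lowercase words with known pronunciations.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("no word tokens found in text")
    unseen = [t for t in tokens if t not in lexicon]
    return len(unseen) / len(tokens), sorted(set(unseen))

# Toy lexicon; a real one would hold tens of thousands of entries.
lexicon = {"what", "is", "the", "temperature", "in"}
rate, new_words = oov_rate("What is the temperature in Pittsburgh", lexicon)
```

The list of unseen words is also the natural test set for letter-to-sound rules, since those are the words whose pronunciations must be predicted.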

Blizzard Challenge
- Annual event since 2005 (15 years plus)
- Distribute large databases of speech
- Participants
  - Build a voice
  - Synthesize a set of sentences
- Listeners
  - Listen and grade results

Blizzard Challenge
- 2005: US English synthesis, 4 voices, 1 hour each
  - 4 teams plus “Studio” (human speech)
- 2006: US English, 1 voice: 6 hours and 1 hour
  - 12 teams
- 2007: US English, 1 voice: 9 hours and 1 hour
  - 14 teams
- 2008: UK English 15 hours; Mandarin 5 hours
  - 19 teams
- 2009: UK English 15 hours; Mandarin 5 hours
- 2010: UK English 18 hours; Mandarin 6 hours
- 2010-: Audio books, Indian languages, speaking in noise
- Split between industry and academia
- Split between Asia, Europe, America (mostly Europe and Asia)

Listeners
- Three sets of listeners
  - Speech experts (participants)
  - Paid undergrads (native speakers)
  - Volunteers
- Types of tests
  - MOS tests (1-5)
  - SUS tests
  - DRT tests
- About 300 listeners in total

Listening
- Web based
  - So everyone did it in a different environment
  - But we got access to more people
- Asked to do it in a quiet office with headphones
- Could listen multiple times

Blizzard Challenge Results
- Speech experts
  - Like synthesis better
  - Understand synthesis better
- Volunteers don’t always finish tests
- Undergrads sometimes finish tests
  - (or put in filler answers)
- Results were correlated over different subgroups
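Agreement between listener subgroups can be checked with a rank correlation over the per-system scores. The sketch below uses Spearman's coefficient, chosen here as an assumption: the slide does not say which statistic was used. The scores are hypothetical and are not Blizzard results.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    rx, ry = ranks(xs), ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return cov / sqrt(var_x * var_y)

# Hypothetical mean scores for four systems from two listener groups.
experts = [4.1, 3.2, 2.5, 3.8]
volunteers = [3.9, 3.0, 2.2, 3.5]
agreement = spearman(experts, volunteers)
```

Ranks are used instead of raw scores because one group may rate everything higher, as the speech experts do, while still ordering the systems the same way.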

Application Tests
- How does it work *in* the application?
  - With real application data
- A good voice is not noticed
- Have *real* users evaluate it
  - Give them a choice (even if artificial)
- CEO chooses the one they like!
