Analysis of the Voice Conversion Challenge 2016 Evaluation Results - PowerPoint PPT Presentation

Analysis of the Voice Conversion Challenge 2016 Evaluation Results • Mirjam Wester, Zhizheng Wu & Junichi Yamagishi

Voice Conversion • Voice-converted voices were evaluated in terms of naturalness and similarity. The questions we addressed were: 1. How natural does the voice-converted voice sound? 2. How similar does the voice-converted voice sound compared to the target speaker and to the source speaker?

Naturalness • How to make the task doable for listeners? • How to measure naturalness?

Amount of data… • 5 target and 5 source speakers -> 25 source-target (ST) pairs. • 17 participants + baseline = 18 systems: 25 * 18 = 450 voices! • Reduced ST pairs from 25 to 16 • 16 * 18 = 288 voices + 4 source + 4 target = 296 stimuli -> 50 minutes • It would take too long for a single listener to judge naturalness and similarity
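The stimulus counts on this slide can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping: the variable names are illustrative, and the per-stimulus time is derived here from the 50-minute figure rather than stated on the slide.

```python
# Stimulus bookkeeping for the naturalness test (numbers taken from the slide).
N_SYSTEMS = 17 + 1            # 17 participants plus the baseline

full_pairs = 5 * 5            # 5 source x 5 target speakers
full_voices = full_pairs * N_SYSTEMS

reduced_pairs = 16            # after reducing the source-target pairs
reduced_voices = reduced_pairs * N_SYSTEMS
stimuli = reduced_voices + 4 + 4   # plus 4 source and 4 target recordings

# Derived, not from the slide: average time budget per stimulus
# if all stimuli take 50 minutes in total.
seconds_per_stimulus = 50 * 60 / stimuli

print(full_voices, reduced_voices, stimuli, round(seconds_per_stimulus, 1))
```

At roughly ten seconds per stimulus, a full pass over all 296 stimuli is what leads to the 50-minute estimate.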

Amount of data… • Instead of asking each listener to judge all ST pairs, how about just a single ST pair? • In terms of time this would be an excellent solution. • However, each listener would then encounter only one gender condition, and listeners needed to encounter the full range of gender conditions because ratings are context-sensitive.

Our solution… • Intermediate solution: each listener hears 8 source-target (ST) pairs • Two from each gender condition, to make the two sets as comparable as possible.
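One way to build two such sets is sketched below. It assumes 2 male and 2 female source speakers and 2 male and 2 female target speakers, which would give 4 ST pairs per gender condition (MM, MF, FM, FF); the speaker IDs are hypothetical, and the random split is an illustration, not necessarily how the sets were actually chosen.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical speaker IDs: S/T = source/target, M/F = gender.
SOURCES = ["SM1", "SM2", "SF1", "SF2"]
TARGETS = ["TM1", "TM2", "TF1", "TF2"]


def gender(speaker):
    """Return 'M' or 'F' from an ID such as 'SM1'."""
    return speaker[1]


def split_into_sets(seed=0):
    """Split the 16 ST pairs into two sets with 2 pairs per gender condition each."""
    rng = random.Random(seed)
    by_condition = defaultdict(list)
    for src, tgt in product(SOURCES, TARGETS):
        by_condition[gender(src) + gender(tgt)].append((src, tgt))

    set1, set2 = [], []
    for condition in sorted(by_condition):
        pairs = by_condition[condition][:]
        rng.shuffle(pairs)          # assumption: random allocation within a condition
        set1.extend(pairs[:2])
        set2.extend(pairs[2:])
    return set1, set2


set1, set2 = split_into_sets()
print(len(set1), len(set2))
```

Each listener then rates the 8 pairs of one set, so every listener still hears all four gender conditions.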

How to measure? • Standard MOS like Blizzard for naturalness • (1) totally unnatural to (5) completely natural • The subjects were instructed that the score should reflect their opinion of how natural or unnatural the sentence sounded
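A mean opinion score is the average of the 1 to 5 ratings that a system receives. The sketch below shows that computation; the function name and the ratings are invented for illustration and are not challenge data.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average 1-5 naturalness ratings per system.

    `ratings` is an iterable of (system, score) tuples.
    """
    by_system = defaultdict(list)
    for system, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range for {system}: {score}")
        by_system[system].append(score)
    return {system: mean(scores) for system, scores in by_system.items()}


# Toy ratings, invented for this example.
toy_ratings = [("A", 3), ("A", 4), ("B", 2), ("B", 1), ("B", 3)]
mos = mean_opinion_scores(toy_ratings)
print(mos)
```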

Listeners • Each set was rated by 100 subjects • Duration roughly 25 minutes • The order of stimuli was random • Each sentence was selected at random with replacement from a pool of 30 test sentences • Sentences longer than 5 sec or shorter than 2 sec were removed for the listening tests (hence not 54 sentences)
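The sentence selection described on this slide can be sketched as a duration filter followed by sampling with replacement. The function names, sentence IDs and durations below are hypothetical; only the 2 to 5 second window and the with-replacement sampling come from the slide.

```python
import random


def eligible_sentences(durations, min_sec=2.0, max_sec=5.0):
    """Keep sentence IDs whose duration lies within [min_sec, max_sec]."""
    return [sid for sid, sec in durations.items() if min_sec <= sec <= max_sec]


def draw_sentences(pool, n_stimuli, seed=None):
    """Pick one sentence per stimulus, at random and with replacement."""
    rng = random.Random(seed)
    return rng.choices(pool, k=n_stimuli)


# Toy durations in seconds, invented for this example.
toy_durations = {"s01": 1.4, "s02": 2.6, "s03": 4.9, "s04": 5.7, "s05": 3.3}
pool = eligible_sentences(toy_durations)
draws = draw_sentences(pool, n_stimuli=10, seed=1)
print(pool, draws)
```

Because sampling is with replacement, a listener can hear the same sentence for several different systems.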

Similarity • Judging how similar voices are on a scale from 1 to 5 may not be all that meaningful. • Judging how similar two voices are is not part of everyday speech perception. • However, recognising speakers is something we do all the time. • -> Same/different paradigm

Similarity: exp set-up • Listeners were given pairs of stimuli and the instructions: • “Do you think these two samples could have been produced by the same speaker? Some of the samples may sound somewhat degraded/distorted. Please try to listen beyond the distortion and concentrate on identifying the voice. Are the two voices the same or different? You have the option to indicate how sure you are of your decision.”

Similarity: exp set-up • The scale for judging was: • Same: absolutely sure • Same: not sure • Different: not sure • Different: absolutely sure • VC stimuli compared to target speaker and to source speaker.
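Responses on this four-point scale can be summarised as the percentage of answers in each category, and collapsed into an overall "same" percentage. The sketch below assumes responses are stored as the label strings shown on the slide; the function names and the example responses are illustrative only.

```python
from collections import Counter

# The four response options, in the order given on the slide.
SCALE = (
    "Same: absolutely sure",
    "Same: not sure",
    "Different: not sure",
    "Different: absolutely sure",
)


def response_percentages(responses):
    """Return the percentage of responses in each of the four categories."""
    if not responses:
        raise ValueError("no responses given")
    counts = Counter(responses)
    unknown = set(counts) - set(SCALE)
    if unknown:
        raise ValueError(f"unknown response labels: {sorted(unknown)}")
    total = len(responses)
    return {label: 100.0 * counts[label] / total for label in SCALE}


def percent_same(responses):
    """Collapse the scale: share of 'same' answers regardless of confidence."""
    pct = response_percentages(responses)
    return pct[SCALE[0]] + pct[SCALE[1]]


# Toy responses, invented for this example.
toy = [SCALE[0], SCALE[1], SCALE[2], SCALE[3]]
print(response_percentages(toy), percent_same(toy))
```

The same summary can be computed twice per system: once for comparisons with the target speaker and once for comparisons with the source speaker.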

Similarity: exp set-up • Each listener was given three ST pairs to judge: one within-gender, one cross-gender and one at random, ensuring that all ST pairs were covered across listeners. • 200 listeners
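A possible assignment scheme is sketched below. It assumes 16 ST pairs built from hypothetical speaker IDs (2 male and 2 female speakers on each side) and uses a simple round-robin over the within-gender and cross-gender pairs to guarantee coverage; the actual allocation procedure is not described on the slide.

```python
import random
from itertools import product

# Hypothetical speaker IDs: S/T = source/target, M/F = gender.
SOURCES = ["SM1", "SM2", "SF1", "SF2"]
TARGETS = ["TM1", "TM2", "TF1", "TF2"]
PAIRS = list(product(SOURCES, TARGETS))
WITHIN = [p for p in PAIRS if p[0][1] == p[1][1]]   # MM and FF pairs
CROSS = [p for p in PAIRS if p[0][1] != p[1][1]]    # MF and FM pairs


def assign_pairs(n_listeners, seed=0):
    """Give each listener one within-gender, one cross-gender and one random pair."""
    rng = random.Random(seed)
    assignments = []
    for i in range(n_listeners):
        within = WITHIN[i % len(WITHIN)]   # round-robin ensures coverage
        cross = CROSS[i % len(CROSS)]
        remaining = [p for p in PAIRS if p not in (within, cross)]
        assignments.append((within, cross, rng.choice(remaining)))
    return assignments


assignments = assign_pairs(200)
covered = {pair for triple in assignments for pair in triple}
print(len(assignments), len(covered))
```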

Results • Naturalness: MOS

[Figure: naturalness scores (1 to 5) per system, all ST pairs. y-axis: Score; x-axis: System, in the order S T N K J L O P G F A B Q E H D M I B_ C.]

[Figure: naturalness scores (1 to 5) per system, shown separately for the two listener sets. Set 1 x-axis order: S T N K J O L P G F Q B A E H D M I B_ C. Set 2 x-axis order: S T N K J L O P G F A B Q E H D M I B_ C. A second version of the slide adds significance groupings of the systems for all ST pairs, Set 1 and Set 2.]

[Figure: naturalness scores (1 to 5) per system for each gender condition, with the following x-axis orders. MM: S T N K O L J P A H F Q B G E D M B_ I C. FF: T S N J K L P Q G F B O H E A D M I B_ C. MF: S T K G L O F J P N A E B D Q M H B_ C I. FM: S T O K N J G L B P F A Q E D M H I B_ C.]

Results • Similarity: Same-Different

[Figure: similarity results per system on a 0 to 100 axis, broken down into the response categories Same: absolutely sure, Same: not sure, Different: not sure, Different: absolutely sure. Target panel x-axis order: T J P G O L D A B K B_ Q M F H E I S N C. Source panel x-axis order: S H K N E I P T B Q D F B_ J O A L C G M.]

VCC - evaluation • Such a large evaluation is complex; compromises were inevitable. • Using two sets of source-target pairs for the naturalness ratings was not ideal. • Including comparisons to source as well as target was informative.

VCC data set • Database (training and test samples) • Participants’ submissions • Listening test materials • Available at: http://dx.doi.org/10.7488/ds/1430
