A study of hypo- and hyper-articulated synthesized speech
Mauro Nicolao
Speech and Hearing Research Group, Department of Computer Science, The University of Sheffield
SCALE (Speech Communication with Adaptive Learning) 2nd Winter School, Aachen, February 15, 2011

Outline
a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) TTS prototype with control on speech quality (towards H&H)
   a) Weighted MLLR transformation
   b) Global Variance model manipulation
   c) Dynamic- vs static-feature weight control in speech generation
d) Next steps
Mauro Nicolao, A study of hypo- and hyper-articulated synthesised speech, Aachen, February 15, 2011

Speech Synthesis by Analysis Project
• Modifications of human speech:
  ‒ voice intensity increasing
  ‒ speech rate adjustments
  ‒ noise rhythm adaptation
  ‒ signal processing (i.e. Lombard effect)
  ‒ change of word vocabulary
• Success in communication:
  ‒ to produce intelligible speech
  ‒ to satisfy the listener’s needs
  ‒ to transfer a concept from the talker’s to the listener’s mind
Lindblom (1990), Lane et al. (2007), Levelt et al. (1999)

Speech Synthesis by Analysis Project
• Automatic TTS systems ignore environmental effects on speech and any feedback from the listener.
• Many researchers in different disciplines are investigating models to describe human behaviour.
• A new way of thinking about automatic speech synthesis.
Moore (2007), Casserly and Pisoni (2010)

Complete project architecture
[Diagram: system architecture; labels: FEEDFORWARD, FEEDBACK, SII]

TTS prototype with control on speech quality
• Control function:
  ‒ none
• Synthesis:
  ‒ HTS + SAT synthesis
  ‒ STRAIGHT parameters
  ‒ GV control
• Control actions:
  ‒ Phoneme substitution
  ‒ MLLR transformation
  ‒ GV Gaussian model manipulation
  ‒ Dynamic feature weight control

TTS prototype with control on speech quality
[Diagram: a line from hypo-articulated speech (muttered but “friendly”) through HTS-Demo speech to hyper-articulated speech (intelligible but unnatural)]
• Aim:
  ‒ Manipulate HTS model parameters to shift the speech quality along this line
  ‒ Act on generation parameters
  ‒ Only acoustic model manipulation
• Strategies:
  ‒ Weighted MLLR transformation
  ‒ Global Variance model manipulation
  ‒ Dynamic- vs static-feature weight control in speech generation

Weighted MLLR transformation
Idea: hypo-articulation can be obtained by reducing all normally articulated vowels to a minimally articulated schwa. A CMLLR transformation can be trained to perform this change. Ideally, the “opposite” CMLLR transformation should then define a mapping from the standard to the hyper-articulated acoustic space.
[Diagram: transformations T1, T2 and T’1, T’2]
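The vowel reduction described in the idea above can be illustrated on a plain phone sequence. This is only a sketch: the vowel inventory below is an assumed CMU-style phone set, the name `reduce_vowels` is hypothetical, and real HTS generation labels carry much more context than a list of phone names.

```python
# Assumed vowel inventory (CMU-style phone names); not taken from the slides.
VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
    "ey", "ih", "iy", "ow", "oy", "uh", "uw",
}
SCHWA = "ax"  # assumed symbol for the schwa


def reduce_vowels(phones):
    """Replace every full vowel in a phone sequence with a schwa."""
    return [SCHWA if phone in VOWELS else phone for phone in phones]


if __name__ == "__main__":
    print(reduce_vowels(["ae", "l", "ax", "s"]))
```

Consonants and existing schwas pass through unchanged, so only the vowel quality is collapsed.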

Weighted MLLR transformation
1. Substitute each vowel in the generation label files with a schwa, because this is the least articulated of the vowels.
2. Generate a small corpus of hypo-articulated speech (about 1100 utterances). [Audio examples: HTS-Demo speech, Hypo speech]
3. Train a CMLLR transformation from the standard to the hypo acoustic model.
4. New observation vectors (spectrum, F0 and duration):
$$\hat{o} = A o + b$$
   o: observation vector generated by the standard model. A, b: parameters of the transformation.
5. This transformation can be weighted by using a scalar α:
$$\hat{o} = (\alpha A + (1 - \alpha) I)\, o + (\alpha b + (1 - \alpha)\, 0)$$
   I: identity matrix. 0: all-zero matrix.
6. Ideally, the “opposite” CMLLR transformation should define a transformation from the standard to the hyper-articulated acoustic space. [Audio examples: HTS-Demo speech, Hyper speech]
7. The inverse transformation has been computed:
$$o = (\alpha A + (1 - \alpha) I)^{-1}\, \hat{o} - (\alpha A + (1 - \alpha) I)^{-1} (\alpha b + (1 - \alpha)\, 0)$$
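The weighted transformation and its inverse can be sketched in a few lines of plain Python. The function names (`interpolate`, `to_hypo`, `invert`) and the small linear solver are illustrative assumptions, not part of HTS or of the original prototype, and the 2-dimensional A and b in the usage part are made-up numbers.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def interpolate(A, b, alpha):
    """Return (alpha*A + (1-alpha)*I, alpha*b); the (1-alpha)*0 term vanishes."""
    n = len(b)
    M = [[alpha * A[i][j] + (1.0 - alpha) * (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = [alpha * b[i] for i in range(n)]
    return M, c


def to_hypo(A, b, alpha, o):
    """Apply the weighted transformation to a standard-model observation."""
    M, c = interpolate(A, b, alpha)
    return [sum(M[i][j] * o[j] for j in range(len(o))) + c[i]
            for i in range(len(o))]


def invert(A, b, alpha, o_hat):
    """Apply the inverse of the weighted transformation."""
    M, c = interpolate(A, b, alpha)
    return solve(M, [o_hat[i] - c[i] for i in range(len(o_hat))])


if __name__ == "__main__":
    A = [[0.5, 0.1], [0.0, 0.8]]  # made-up transformation matrix
    b = [0.2, -0.1]               # made-up bias
    o = [1.0, 2.0]
    print(to_hypo(A, b, 0.5, o))
    print(invert(A, b, 0.5, to_hypo(A, b, 0.5, o)))
```

With α = 0 the observation is returned unchanged, and with α = 1 the full transformation Ao + b is applied; values in between interpolate linearly between the two.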

Global Variance model manipulation
Idea: change the Global Variance (GV) model parameters to either reduce or amplify the range of variation in the generated feature vectors.
‒ Generation of the feature vectors c with the Global Variance term (Toda and Tokuda, 2007):
$$P(c \mid \lambda, \lambda_v) = \sum_{\text{all } Q} P(Wc, Q \mid \lambda)^{\omega}\, P(v(c) \mid \lambda_v)$$
‒ Manipulating the GV model means manipulating the range of variance values of the observation vectors
‒ Scaling factors are used to control the transformation (none for F0)
‒ This allows an increase in variance, but the mean of the observation vector still leads the feature generation
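The effect of a GV scaling factor can be imitated with a much simpler operation than GV-based parameter generation. The sketch below is an assumed post-hoc simplification, not the method used in the prototype: it rescales a one-dimensional feature track around its utterance mean so that its variance is multiplied by a chosen factor. The names `variance` and `scale_global_variance` are hypothetical.

```python
def variance(track):
    """Population variance of a one-dimensional feature track."""
    mean = sum(track) / len(track)
    return sum((x - mean) ** 2 for x in track) / len(track)


def scale_global_variance(track, factor):
    """Multiply the variance of the track by `factor`, keeping its mean fixed."""
    mean = sum(track) / len(track)
    gain = factor ** 0.5  # deviations scale with the square root of the variance
    return [mean + gain * (x - mean) for x in track]


if __name__ == "__main__":
    track = [0.2, 0.5, 0.9, 0.4]  # made-up feature values
    print(variance(track))
    print(variance(scale_global_variance(track, 2.0)))
```

A factor above 1 widens the excursions of the track and a factor below 1 flattens them, while the mean stays where the acoustic model put it.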

Dynamic- vs static-feature weight control
Idea: give more importance to dynamic vs. static features in the speech generation process.
1. Increasing (decreasing) the window weights in the generation process selects, among the possible realizations of a phoneme, the one with low (high) variation:
$$c = (W^{T} \hat{U}^{-1} W)^{-1} W^{T} \hat{U}^{-1} \hat{\mu}$$
2. A different weight is used for each dynamic feature. The transformation is defined by the vector [α1 α2 α3]:
$$\underbrace{\begin{bmatrix} \vdots \\ c_t \\ \Delta c_t \\ \Delta^2 c_t \\ \vdots \end{bmatrix}}_{o} = \underbrace{\begin{bmatrix} \cdots & \vdots & \vdots & \vdots & \cdots \\ \cdots & \alpha_1 0 & \alpha_1 I & \alpha_1 0 & \cdots \\ \cdots & -\alpha_2 I / 2 & \alpha_2 0 & \alpha_2 I / 2 & \cdots \\ \cdots & \alpha_3 I & -2 \alpha_3 I & \alpha_3 I & \cdots \\ \cdots & \vdots & \vdots & \vdots & \cdots \end{bmatrix}}_{W} \underbrace{\begin{bmatrix} \vdots \\ c_{t-1} \\ c_t \\ c_{t+1} \\ \vdots \end{bmatrix}}_{c}$$
3. α1 is usually set to 1 for F0 (pitch shifting).
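The closed-form solution above can be tried on a one-dimensional track. This is a minimal sketch under stated assumptions: the delta windows (−0.5, 0, 0.5) and (1, −2, 1) are the common HTS defaults, edge frames are repeated at the boundaries, U is diagonal, and all function names and the toy means are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def build_w(num_frames, alphas):
    """Stack weighted static, delta and delta-delta rows for every frame."""
    windows = [
        {0: 1.0},                    # static
        {-1: -0.5, 1: 0.5},          # delta (assumed window)
        {-1: 1.0, 0: -2.0, 1: 1.0},  # delta-delta (assumed window)
    ]
    rows = []
    for t in range(num_frames):
        for alpha, window in zip(alphas, windows):
            row = [0.0] * num_frames
            for offset, coeff in window.items():
                col = min(max(t + offset, 0), num_frames - 1)  # repeat edge frames
                row[col] += alpha * coeff
            rows.append(row)
    return rows


def generate(means, variances, alphas):
    """Return c = (W^T U^-1 W)^-1 W^T U^-1 mu for a diagonal U.

    `means` and `variances` are ordered frame by frame as
    (static, delta, delta-delta).
    """
    num_frames = len(means) // 3
    w = build_w(num_frames, alphas)
    rows = range(len(w))
    lhs = [[sum(w[k][i] * w[k][j] / variances[k] for k in rows)
            for j in range(num_frames)] for i in range(num_frames)]
    rhs = [sum(w[k][i] * means[k] / variances[k] for k in rows)
           for i in range(num_frames)]
    return solve(lhs, rhs)


def roughness(track):
    """Sum of squared frame-to-frame differences."""
    return sum((b - a) ** 2 for a, b in zip(track, track[1:]))


if __name__ == "__main__":
    static = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # toy step in the static means
    means = [v for s in static for v in (s, 0.0, 0.0)]
    variances = [1.0] * len(means)
    print(generate(means, variances, (1.0, 0.2, 0.2)))
    print(generate(means, variances, (1.0, 10.0, 10.0)))
```

With zero dynamic means, large α2 and α3 pull the trajectory towards a smooth, low-variation curve, while small values let it follow the static means closely.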

Dynamic- vs static-feature weight control
[Figure: F1 trajectories over the phone sequence ae l ax s in three panels: α1=1, α2=0.2, α3=0.2; α1=1, α2=1, α3=1; α1=1, α2=10, α3=10. Axes: formant frequency (Hz), 0 to 1000, against time (s), 0.1 to 0.4631.]

Audio examples
[Audio demo on the scale from hypo-articulated through HTS-Demo to hyper-articulated speech. Examples: Vowel Reduction, GV weight, Dynamic control, Dynamic + reduction, Dynamic + reduction in noise, GUI]

Outline
a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) First realizations:
   a) TTS prototype with extended Speech Intelligibility Index (SII) feedback
   b) TTS prototype with control on speech quality (towards H&H)
d) Next steps

Next steps …
• Add articulatory constraints
• Find new parameters to control feature generation
• Complete the control feedback by:
  ‒ defining an optimization function
  ‒ adding a recognition function
  ‒ real-time reactions
• Investigate a formant synthesiser as a possible vocoder
• Add more generalization in the parameter generation process:
  ‒ Multiple phonetizations activated by the same word
  ‒ Bayesian synthesiser (ref. Zen, H.)

Thank you
