S2S ASR Advanced issues - PowerPoint PPT Presentation

S2S ASR Advanced issues
- Tight coupling
  - ASR should output N-best
  - Translate all (lattice)
  - Choose best translation
  - (MT as a LM for ASR)
- Remove disfluencies/hesitations
- Add more relevant data
  - Automatically convert past tense/third person data to present tense/first+second person …
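
One way to read "choose best translation" and "MT as a LM for ASR" is N-best rescoring: each ASR hypothesis keeps its recognizer score, the MT system adds a score for its translation, and the hypothesis with the highest weighted sum wins. This is a minimal sketch under assumptions: the `Hypothesis` fields, the made-up scores, and the interpolation weight `mt_weight` are hypothetical; a real system would take them from its own ASR and MT decoders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of a hypothetical ASR N-best list, paired with its translation."""
    source_text: str    # ASR transcript
    translation: str    # MT output for that transcript
    asr_logprob: float  # ASR score (acoustic + language model), log domain
    mt_logprob: float   # MT model score of the translation, log domain


def choose_best(nbest: List[Hypothesis], mt_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the highest weighted sum of log scores.

    mt_weight = 0 ignores the translation (plain ASR 1-best);
    larger values let the MT score act like an extra language model.
    """
    if not nbest:
        raise ValueError("empty N-best list")
    return max(
        nbest,
        key=lambda h: (1.0 - mt_weight) * h.asr_logprob + mt_weight * h.mt_logprob,
    )


# Made-up scores: the acoustically preferred hypothesis translates badly.
nbest = [
    Hypothesis("I want to go to the nation", "Quiero ir a la nación", -9.5, -30.0),
    Hypothesis("I want to go to the station", "Quiero ir a la estación", -10.0, -20.0),
]
print(choose_best(nbest, mt_weight=0.0).source_text)  # ASR 1-best only
print(choose_best(nbest, mt_weight=0.5).source_text)  # rescored with MT
```

With `mt_weight=0.0` the first hypothesis wins on ASR score alone; at `0.5` its poor translation score pulls it below the second. The weight would normally be tuned on held-out data.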

S2S TTS Advanced Issues
- MT output isn't grammatical
  - TTS doesn't care and just says it
  - TTS should try to say MT output with more breaks
- TTS (unit selection)
  - As a LM on MT output
  - Choose the best translation based on what is said best

Speech Processing 15-492/18-492 Voice Conversion

Voice Conversion
- Live (or offline)
- Convert an existing voice to another
- Use only a small amount of target speech
- Uses:
  - Synthesis without collecting lots of data
  - Disguising voices
  - Emotional voices without full synthesis support
- Also called:
  - Voice transformation, voice morphing

Voice Identity
- What makes a voice identity?
  - Lexical choice: "Woo-hoo", "I pity the fool …"
  - Phonetic choice
  - Intonation and duration
  - Spectral qualities (vocal tract shape)
  - Excitation

Voice Conversion techniques
- Full ASR and TTS
  - Much too hard to do reliably
- Codebook transformation
  - ASR HMM state to HMM state transformation
- GMM-based transformation
  - Build a mapping function between frames
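
Codebook transformation can be sketched as a lookup table: each aligned source frame carries a discrete label (on the slide, an ASR HMM state), the table stores the mean target frame observed for that label, and conversion replaces every source frame with its label's codeword. This is a minimal sketch; the state labels are assumed to come from an external recognizer, the function names are hypothetical, and frames are plain lists of floats.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Frame = List[float]


def train_codebook(states: Sequence[str], target_frames: Sequence[Frame]) -> Dict[str, Frame]:
    """Map each source state label to the mean of the target frames aligned with it."""
    grouped: Dict[str, List[Frame]] = defaultdict(list)
    for state, frame in zip(states, target_frames):
        grouped[state].append(frame)
    # zip(*frames) walks the frames dimension by dimension.
    return {
        state: [sum(dim) / len(frames) for dim in zip(*frames)]
        for state, frames in grouped.items()
    }


def convert(states: Sequence[str], codebook: Dict[str, Frame]) -> List[Frame]:
    """Replace every source frame by the codeword stored for its state."""
    return [codebook[state] for state in states]


# Hypothetical training data: state labels for aligned frames, 2-D target features.
codebook = train_codebook(["s1", "s1", "s2"], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(convert(["s2", "s1"], codebook))
```

The output is piecewise constant (it can only jump between codewords at state boundaries), which is the kind of limitation a smooth frame-level mapping function such as a GMM-based one is meant to address.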

Learning VC models
- First need to get parallel speech
  - Source and target say the same thing
  - Use DTW to align (in the spectral domain)
  - Trying to learn a functional mapping
  - 20-50 utterances
- "Text-independent" VC
  - Means no parallel speech available
  - Use some form of synthesis to generate it
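
The DTW step that pairs source and target frames can be sketched as classic dynamic time warping with Euclidean distance between spectral frames (e.g. MFCC vectors) and the three standard steps (diagonal, source-only, target-only). This is a minimal illustration, not the Festvox implementation; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Spectral distance between two frames (e.g. MFCC vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(src: Sequence[Frame], tgt: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two utterances; returns the total cost and the (src, tgt) frame-index path."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = cheapest way to align src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # source advances
            (cost[i][j - 1], i, j - 1),          # target advances
        )
    path.reverse()
    return cost[n][m], path


# Toy 1-D "spectra": the target lingers on its first frame.
src = [[0.0], [1.0], [2.0]]
tgt = [[0.0], [0.0], [1.0], [2.0]]
total, path = dtw_align(src, tgt)
pairs = [(src[i], tgt[j]) for i, j in path]  # training pairs for the mapping
```

Each `(i, j)` on the path yields one source/target training pair; in the iterative training process the same routine would be re-run after passing the source frames through the current mapping.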

VC Training process
- Extract F0, power and MFCC from source and target utterances
- DTW align source and target
- Loop until convergence
  - Build GMM to map between source/target
  - DTW source/target using GMM mapping
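
A GMM mapping between source and target is commonly built as a joint-density GMM over stacked source/target vectors, with conversion given by the conditional expectation (minimum mean-square-error form) F(x) = Σ_m P(m|x) [μ_y^m + (σ_yx^m / σ_xx^m)(x − μ_x^m)]. The sketch below assumes scalar features (one cepstral coefficient, say) so it needs only the standard library, and assumes the component parameters were already estimated (e.g. by EM on DTW-aligned pairs); the parameter values are made up, and this is not necessarily what the Festvox suite does internally.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JointComponent:
    """One component of a joint GMM over scalar (source x, target y) pairs."""
    weight: float   # mixture weight
    mu_x: float     # source mean
    mu_y: float     # target mean
    var_xx: float   # source variance
    cov_yx: float   # target/source covariance


def convert_frame(x: float, gmm: Sequence[JointComponent]) -> float:
    """MMSE mapping F(x) = sum_m P(m|x) * (mu_y + cov_yx / var_xx * (x - mu_x))."""
    # Log of weight * N(x; mu_x, var_xx) for each component (source marginal only).
    log_liks = [
        math.log(c.weight)
        - 0.5 * math.log(2.0 * math.pi * c.var_xx)
        - 0.5 * (x - c.mu_x) ** 2 / c.var_xx
        for c in gmm
    ]
    peak = max(log_liks)  # subtract the max to avoid underflow
    scaled = [math.exp(l - peak) for l in log_liks]
    total = sum(scaled)
    y = 0.0
    for s, c in zip(scaled, gmm):
        posterior = s / total
        # Per-component linear regression of y on x, weighted by P(m|x).
        y += posterior * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
    return y


# Made-up two-component model: low source values map low, high map high.
gmm = [
    JointComponent(weight=0.5, mu_x=-1.0, mu_y=-10.0, var_xx=1.0, cov_yx=0.0),
    JointComponent(weight=0.5, mu_x=1.0, mu_y=10.0, var_xx=1.0, cov_yx=0.0),
]
converted = [convert_frame(x, gmm) for x in (-5.0, 0.0, 5.0)]
```

With a single component the function reduces to ordinary linear regression of target on source; extra components let the mapping change its slope and offset in different regions of the spectral space.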

VC Training process

VC Run-time
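
At run time the spectral frames go through the trained mapping, while the extracted F0 needs its own conversion. A common choice, assumed here rather than taken from the slides, is a linear transform in the log-F0 domain that matches the source speaker's log-F0 mean and standard deviation to the target's; unvoiced frames (F0 = 0) pass through unchanged. The function names and the statistics in the example are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def log_f0_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    logs = [math.log(v) for v in f0 if v > 0.0]
    return statistics.mean(logs), statistics.pstdev(logs)


def convert_f0(f0: Sequence[float], src_stats: Tuple[float, float],
               tgt_stats: Tuple[float, float]) -> List[float]:
    """Shift and scale log F0 from the source speaker's range to the target's."""
    mean_x, std_x = src_stats
    mean_y, std_y = tgt_stats
    out: List[float] = []
    for value in f0:
        if value <= 0.0:  # unvoiced frames stay unvoiced
            out.append(0.0)
            continue
        z = (math.log(value) - mean_x) / std_x
        out.append(math.exp(mean_y + std_y * z))
    return out


# Made-up statistics: a ~100 Hz source voice and a ~200 Hz target voice.
src_stats = (math.log(100.0), 0.2)
tgt_stats = (math.log(200.0), 0.2)
print(convert_f0([100.0, 0.0, 120.0], src_stats, tgt_stats))
```

In practice `log_f0_stats` would be run over all voiced frames of each speaker's training utterances; with equal standard deviations, as here, the transform is simply a constant pitch ratio.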

Voice Transformation - Festvox
- Festvox GMM transformation suite (Toda)
- [Audio examples: conversions among the speakers awb, bdl, jmk and slt]

VC in Synthesis
- Can be used as a post-filter in synthesis
  - Build kal_diphone to target VC
  - Use on all output of kal_diphone
- Can be used to convert a full DB
  - Convert a full DB and rebuild a voice

Style/Emotion Conversion
- Unit selection (or SPS)
  - Requires lots of data in the desired style/emotion
- VC technique
  - Use as a filter on the main voice (same speaker)
  - Convert neutral to angry, sad, happy …

Can you say that again?
- Voice conversion for speaking in noise
- Different quality when you repeat things
- Different quality when you speak in noise
  - Lombard effect (when very loud)
  - "Speech-in-noise" in regular noise

Speaking in Noise (Langner)
- Collect data
  - Randomly play noise in the person's ears
    - Normal
    - In noise
  - Collect 500 of each type
- Build VC model
  - Normal -> in-noise
- Actually
  - Spectral, duration, F0 and power differences

Synthesis in Noise
- For bus information task
  - Play different synthesized information utterances
    - With SIN synthesizer
    - With SWN synthesizer
    - With VC (SWN->SIN) synthesizer
  - Measure their understanding
- SIN synthesizer better (in noise)
- SIN synthesizer better (without noise, for elderly listeners)

Transterpolation
- Incrementally transform a voice X%
  - BDL-SLT by 10%
  - SLT-BDL by 10%
- Count when you think it changes from M-F
- Fun, but what are the uses …
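
Transforming a voice "by X%" can be sketched as moving each frame a fraction α of the way from the source features toward the converted features, y = x + α(F(x) − x), so stepping α by 0.1 gives the 10% increments on the slide. This assumes frame-level linear interpolation of already-converted features; the slides do not say how the Festvox examples were produced, and the function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]


def transterpolate(source: Sequence[Frame], converted: Sequence[Frame],
                   alpha: float) -> List[List[float]]:
    """Blend frame by frame: alpha=0 keeps the source voice, alpha=1 gives the converted voice."""
    if len(source) != len(converted):
        raise ValueError("source and converted must have the same number of frames")
    return [
        [x + alpha * (y - x) for x, y in zip(src_frame, conv_frame)]
        for src_frame, conv_frame in zip(source, converted)
    ]


def continuum(source: Sequence[Frame], converted: Sequence[Frame],
              steps: int = 10) -> List[List[List[float]]]:
    """Versions at 0%, 100/steps %, ..., 100% of the way to the converted voice."""
    return [transterpolate(source, converted, k / steps) for k in range(steps + 1)]


# Toy 2-D frames standing in for, e.g., BDL features and their SLT-converted version.
bdl = [[0.0, 10.0]]
bdl_to_slt = [[10.0, 20.0]]
versions = continuum(bdl, bdl_to_slt)
```

Playing `versions[0]` through `versions[10]` in order gives the listening continuum for counting where the voice seems to change from M to F.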

De-identification
- Remove speaker identity
  - But keep it still human-like
- Health records
  - HIPAA laws require this
  - Not just removing names and SSNs
- Use voice conversion to get "new" voices

VC and SPS
- Becoming closely related
  - Small amount of target speaker data
  - Use larger background models
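
One standard way to combine a small amount of target-speaker data with a larger background model is MAP adaptation of the background model's means; it is named here as an illustration, since the slide does not specify a method. With relevance factor τ and N target frames, the adapted mean is μ̂ = (τ μ₀ + Σ x_t)/(τ + N), so with little data the model stays near the background and with more data it moves toward the target. This sketch adapts a single Gaussian mean; the default τ = 16 and the function name are assumptions.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float], target_frames: Sequence[Sequence[float]],
                   tau: float = 16.0) -> List[float]:
    """MAP estimate of a Gaussian mean, using the background-model mean as the prior.

    tau (the relevance factor) acts like a count of "virtual" background frames:
    few target frames -> result stays near the prior; many -> near the target average.
    """
    n = len(target_frames)
    totals = [sum(dim) for dim in zip(*target_frames)] if n else [0.0] * len(prior_mean)
    return [(tau * mu + s) / (tau + n) for mu, s in zip(prior_mean, totals)]


print(map_adapt_mean([0.0], [[10.0]] * 16))    # equal weight: halfway
print(map_adapt_mean([0.0], [[10.0]] * 1000))  # lots of data: close to 10
```

For a full GMM the same formula would be applied per component, with N and Σ x_t replaced by posterior-weighted counts and sums.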

Cross-Lingual Voice Conversion
- Use phonetic mapping synthesis
  - Sounds like very accented speech
- Use VC to convert the output
  - Requires only a small amount of target language data
