  1. One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. Ju-Chieh Chou, Hung-yi Lee, Interspeech 2019.

  2. Outline 1. Introduction 2. Proposed Approach • Model • Experiments 3. Conclusion

  4. Voice conversion • Change the characteristics of an utterance while keeping the linguistic content the same. • Characteristics: accent, speaker identity, emotion, ... • This work focuses on speaker identity conversion. [Figure: the model converts Speaker A's "How are you" into Speaker 1's voice saying "How are you".]

  5. Conventional: supervised VC with parallel data • Parallel data: the same sentences recorded by two different speakers. • Formulated as a supervised learning problem. • Problem: requires parallel data, which is hard to collect. [Figure: parallel pairs from Speaker A and Speaker 1, e.g. "How are you", "Nice to meet you", "I am fine" spoken by both.]

  6. Recently: unsupervised VC with non-parallel data • Trained on a non-parallel corpus, which is more attainable: the speakers don't have to speak the same sentences. • Prior work utilizes deep generative models, e.g. VAE, GAN, CycleGAN. • Problem: cannot convert to speakers not in the training data. • Our goal: train a model that is able to convert to speakers not in the training data.

  7. Motivation • Intuition: speech signals inherently carry both content and speaker information. • Learn the content and speaker representations separately. • Synthesize the target voice by combining the source content representation with the target speaker representation. [Figure: an encoder extracts the content representation ("How are you") from the source utterance; a decoder combines it with the target speaker representation to synthesize "How are you" in the target voice.]

  8. Outline 1. Introduction 2. Proposed Approach • Model • Experiments 3. Conclusion

  9. Model overview • One-shot VC: use an utterance from the target speaker as a reference, and synthesize speech in that reference speaker's voice. • Idea: separately encode speaker and content information with specially designed layers.

  10. Idea • Speaker information: invariant within an utterance. • Content information: varying within an utterance. Specially designed layers, operating on a feature map M with channels c = 1, ..., C and time steps t = 1, ..., T (a minimal sketch follows this slide):
  • Instance Normalization (IN): normalizes speaker information (μ_c, σ_c) out while preserving content information: M′_c = (M_c − μ_c) / σ_c, where μ_c and σ_c are the mean and standard deviation of channel c over time. Intuition: normalize global information out (e.g. high frequency), retain changes over time.
  • Average Pooling (AVG): calculates speaker information by averaging each channel over time: M′_c = (1/T) Σ_{t=1}^{T} M_c[t].
  • Adaptive Instance Normalization (AdaIN): provides speaker information (γ_c, β_c): M′_c = γ_c · (M_c − μ_c) / σ_c + β_c.
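
A minimal NumPy sketch of these three layers (function names and shapes are illustrative, not taken from the paper's released code):

```python
# Minimal sketch of the three specially designed layers (illustrative
# names/shapes, not the paper's released code). Feature maps are
# arrays of shape (C, T): C channels, T time steps.
import numpy as np

EPS = 1e-5  # avoids division by zero on constant channels

def instance_norm(M):
    """IN: remove per-channel statistics (mu_c, sigma_c) over time,
    normalizing speaker information out while keeping how the content
    changes over time."""
    mu = M.mean(axis=1, keepdims=True)    # mu_c
    sigma = M.std(axis=1, keepdims=True)  # sigma_c
    return (M - mu) / (sigma + EPS)

def avg_pool(M):
    """AVG: average each channel over time, collapsing time-varying
    content into a per-utterance summary carrying speaker information."""
    return M.mean(axis=1)  # shape (C,)

def adaptive_instance_norm(M, gamma, beta):
    """AdaIN: normalize as IN, then re-inject speaker information via
    per-channel scale gamma_c and shift beta_c."""
    return gamma[:, None] * instance_norm(M) + beta[:, None]
```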

  11. Model - training • Problem: how to factorize the representations? [Figure: the speaker encoder E_s followed by AVG produces the speaker representation z_s; the content encoder E_c with IN produces the content representation z_c; the decoder D with AdaIN reconstructs x from z_s and z_c. AVG, IN, and AdaIN as defined on the previous slide.]

  12. Model - testing [Figure: at test time the speaker encoder E_s (with AVG) encodes the target speaker's utterance into z_s, the content encoder E_c (with IN) encodes the source speaker's utterance into z_c, and the decoder D (with AdaIN) combines them into the converted utterance. An end-to-end sketch of this wiring follows this slide.]
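
Building on the layer sketch above, a hypothetical end-to-end wiring of slides 11 and 12; the encoders and decoder here are placeholder linear maps, not the paper's actual networks:

```python
# Hypothetical wiring of slides 11-12, reusing instance_norm, avg_pool,
# and adaptive_instance_norm from the sketch above. The "networks" are
# random linear maps standing in for E_s, E_c, and D.
import numpy as np

rng = np.random.default_rng(0)
C, T = 4, 100  # toy channel/time sizes

W_s = rng.standard_normal((C, C))  # stand-in for speaker encoder E_s
W_c = rng.standard_normal((C, C))  # stand-in for content encoder E_c
W_g = rng.standard_normal((C, C))  # maps z_s to AdaIN scale gamma
W_b = rng.standard_normal((C, C))  # maps z_s to AdaIN shift beta

def speaker_encoder(x):
    return avg_pool(W_s @ x)  # E_s + AVG -> speaker representation z_s

def content_encoder(x):
    return instance_norm(W_c @ x)  # E_c + IN -> content representation z_c

def decoder(z_c, z_s):
    gamma, beta = W_g @ z_s, W_b @ z_s  # speaker info (gamma, beta)
    return adaptive_instance_norm(z_c, gamma, beta)  # D + AdaIN

# Training (slide 11): reconstruct x from its own speaker and content reps.
x = rng.standard_normal((C, T))
x_hat = decoder(content_encoder(x), speaker_encoder(x))

# Testing (slide 12): source content + target speaker -> converted voice.
x_src, x_tgt = rng.standard_normal((C, T)), rng.standard_normal((C, T))
converted = decoder(content_encoder(x_src), speaker_encoder(x_tgt))
```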

  13. Experiments – effect of IN • Train another speaker classifier on the content representations z_c to see how much speaker information they contain. • The lower the accuracy, the less speaker information the representation contains. • Content encoder + IN: less speaker information (a probing sketch follows this slide). Classifier accuracy on z_c: E_c with IN: 0.375; E_c without IN: 0.658.
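
A minimal sketch of such a probing experiment, assuming scikit-learn and random placeholder representations in place of the model's real z_c:

```python
# Sketch of the probing experiment: a classifier tries to predict the
# speaker from content representations z_c; lower accuracy means less
# speaker information leaked. Random placeholders stand in for real z_c.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_utts, dim, n_speakers = 1000, 128, 20
z_c = rng.standard_normal((n_utts, dim))       # placeholder content reps
speaker = rng.integers(0, n_speakers, n_utts)  # speaker labels

z_tr, z_te, y_tr, y_te = train_test_split(
    z_c, speaker, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(z_tr, y_tr)
# Near-chance accuracy would indicate z_c carries little speaker info.
print("probe accuracy:", probe.score(z_te, y_te))
```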

  14. Experiments – speaker embedding visualization • Does the speaker encoder learn meaningful representations? • One color represents one speaker's utterances. • The z_s from different speakers are well separated, including unseen speakers' utterances. (A visualization sketch follows this slide.) [Figure: 2-D projection of z_s produced by the speaker encoder E_s + AVG, colored by speaker.]
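
A small sketch of this kind of visualization; the slide does not name the projection method, so t-SNE is assumed here, and the embeddings are synthetic placeholders:

```python
# Sketch of the embedding visualization: project speaker embeddings z_s
# to 2-D and color by speaker. t-SNE is an assumption (the slide does
# not name the projection); embeddings are synthetic Gaussian clusters.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_speakers, utts_per_spk, dim = 8, 30, 128
centers = 5 * rng.standard_normal((n_speakers, dim))  # one center per speaker
z_s = np.concatenate(
    [c + rng.standard_normal((utts_per_spk, dim)) for c in centers])
labels = np.repeat(np.arange(n_speakers), utts_per_spk)

xy = TSNE(n_components=2, random_state=0).fit_transform(z_s)
plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10", s=10)
plt.title("Speaker embeddings z_s (one color per speaker)")
plt.show()
```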

  15. Experiments - subjective • Ask subjects to score the similarity between 2 utterances on a 4-point scale.

  16. Experiments - subjective • Ask subjects to score the similarity between 2 utterances on a 4-point scale. • Our model is able to generate voices similar to the target speaker's.

  17. Demo • Demo page: https://jjery2243542.github.io/one-shot-vc-demo/ • Demo (unseen speakers): male-to-male and female-to-male conversions; source, target, and converted audio samples are on the demo page.

  18. Conclusion • We proposed a one-shot VC model that can convert to an unseen speaker with a single reference utterance. • With IN and AdaIN, our model is able to learn factorized representations.

  19. Thank you for your attention.
