Neural Voice Cloning with a Few Samples
Sercan O. Arik, Jitong Chen, Kainan Peng*, Wei Ping, Yanqi Zhou
Motivations
• Text-to-speech (TTS) models can be conditioned on text and speaker identity.
  • Text: linguistic information, i.e., the content of the generated speech.
  • Speaker identity: speaker characteristics (accent, pitch, speech rate, …).
• Limitations:
  • Can only generate speech for speakers observed during training.
  • Require many speech samples per speaker (e.g., Deep Voice 2).
Voice Cloning
• Voice cloning: synthesize the voices of new speakers from a few speech samples (a few-shot generative model).
• Applications: personalized speech interfaces, content creation, assistive technology, …
• Challenges:
  • Generalization: learn the voice of a new speaker.
  • Efficiency: extract the speaker characteristics from only a few speech samples.
  • Computational cost: cloning with low latency and a small footprint.
• Two approaches:
  • Speaker adaptation.
  • Speaker encoding.
Speaker Adaptation
• Fine-tune a pre-trained multi-speaker model for a new speaker.
• Training data: a few text and audio pairs.
• Two options for speaker adaptation:
  • Fine-tune the whole model.
  • Fine-tune the speaker embedding only.
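The two adaptation options can be sketched with a toy model. This is a minimal illustration, not the paper's TTS architecture: the "synthesizer" is a single linear map from a 128-dim speaker embedding to a fake output, and adaptation is plain gradient descent. The dimensions and learning rate are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained multi-speaker synthesizer: output = W @ embedding.
EMB_DIM, OUT_DIM = 128, 32
W = rng.normal(size=(OUT_DIM, EMB_DIM))   # "pre-trained" weights
target = rng.normal(size=OUT_DIM)         # stand-in for the new speaker's audio

def loss(W, e):
    r = W @ e - target
    return 0.5 * float(r @ r)

lr, steps = 1e-3, 500

# Option 1: embedding-only adaptation.
# W stays frozen; only the 128-dim speaker embedding is optimized.
e = np.zeros(EMB_DIM)
for _ in range(steps):
    grad_e = W.T @ (W @ e - target)       # dL/de
    e -= lr * grad_e

# Option 2: whole-model adaptation.
# Both the model weights and the embedding are updated.
W2, e2 = W.copy(), 0.1 * rng.normal(size=EMB_DIM)
for _ in range(steps):
    r = W2 @ e2 - target
    W2 -= lr * np.outer(r, e2)            # dL/dW
    e2 -= lr * (W2.T @ r)                 # dL/de

# Per-speaker storage differs by orders of magnitude:
params_embedding_only = e.size            # 128 per speaker
params_whole_model = W2.size + e2.size    # all model weights per speaker
```

The storage contrast mirrors the analysis on the next slide: embedding-only adaptation stores 128 numbers per speaker, while whole-model adaptation stores a full copy of the weights.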
Speaker Adaptation Analysis
• Embedding-only adaptation: cloning time ~8 h; 128 parameters per speaker.
• Whole-model adaptation: cloning time ~5 min; 25 million parameters per speaker.
Speaker Encoding
• Directly predict a new speaker embedding for a multi-speaker model.
• Train a speaker encoder with audio and speaker embedding pairs.
• Cloning time: a few seconds, making it more favorable for low-resource deployment.
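A minimal sketch of the speaker-encoding idea, under strong simplifying assumptions: the encoder here is just temporal mean-pooling of mel frames followed by a linear map, trained by least squares on synthetic (audio features, embedding) pairs. The real encoder in the paper is a neural network; the dimensions and closed-form fit below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: each "clip" is a sequence of 80-dim mel frames; the encoder
# mean-pools over time and maps the result to a 128-dim speaker embedding.
MEL_DIM, EMB_DIM, N_SPEAKERS = 80, 128, 200

# Synthetic training pairs: pooled audio features X, and target embeddings Y
# (in the paper these come from the trained multi-speaker model's table).
X = rng.normal(size=(N_SPEAKERS, MEL_DIM))
true_map = rng.normal(size=(MEL_DIM, EMB_DIM))
Y = X @ true_map

# Fit the linear encoder by least squares (stand-in for gradient training).
W_enc, *_ = np.linalg.lstsq(X, Y, rcond=None)

def speaker_encoder(mel_frames):
    """Predict a speaker embedding from a short audio clip."""
    pooled = mel_frames.mean(axis=0)      # temporal mean-pooling
    return pooled @ W_enc

# Cloning a new speaker is a single forward pass -- no fine-tuning loop,
# which is why cloning takes seconds rather than minutes or hours.
new_clip = rng.normal(size=(50, MEL_DIM))
emb = speaker_encoder(new_clip)
```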
Results
• Vocoder: classical Griffin-Lim algorithm.
• Demo website: http://audiodemos.github.io
• Mean Opinion Score (MOS):
  • Naturalness (5-scale): embedding-only adaptation 2.67, whole-model adaptation 3.16, speaker encoding 2.99.
  • Similarity (4-scale): embedding-only adaptation 2.95, whole-model adaptation 3.16, speaker encoding 2.85.
Voice Morphing via Embedding Manipulation
• BritishMale + AveragedFemale - AveragedMale = BritishFemale
• BritishMale + AveragedAmerican - AveragedBritish = AmericanMale
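The arithmetic above can be made concrete with a toy sketch. It assumes, purely for illustration, that a speaker embedding decomposes additively into a speaker-specific residual plus accent and gender directions; the vectors below are random placeholders, not embeddings from the trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
EMB_DIM = 128

# Hypothetical directions in the learned speaker-embedding space.
accent_british = rng.normal(size=EMB_DIM)
accent_american = rng.normal(size=EMB_DIM)
gender_male = rng.normal(size=EMB_DIM)
gender_female = rng.normal(size=EMB_DIM)
voice = rng.normal(size=EMB_DIM)          # speaker-specific residual

# Under the additive assumption, averaging over many male (or female)
# speakers cancels the residuals and leaves the gender direction.
british_male = voice + accent_british + gender_male
averaged_male, averaged_female = gender_male, gender_female
averaged_british, averaged_american = accent_british, accent_american

# BritishMale + AveragedFemale - AveragedMale = BritishFemale
british_female = british_male + averaged_female - averaged_male

# BritishMale + AveragedAmerican - AveragedBritish = AmericanMale
american_male = british_male + averaged_american - averaged_british
```

The point of the sketch: subtracting the averaged attribute vector cancels that attribute's direction while leaving the speaker-specific part of the embedding intact.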
Thank you!
Welcome to our poster, and come listen to the samples! Today, Session B, #91.