LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

Sep 19, 2022

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction Jean-Marc Valin* (Amazon Web Services) Jan Skoglund (Google LLC) May 2019 *work performed while with Mozilla
Approaches to Speech Synthesis ● “Old” DSP approach – Source-filter model – Synthesizing the excitation is hard – Acceptable quality at very low complexity ● New deep learning approach – Data-driven – Results in large models (tens of MBs) – Very good quality at very high complexity ● Can we have the best of both worlds?
Neural Speech Synthesis ● WaveNet demonstrated impressive speech quality in 2016 – Data-driven: learns from real speech – Autoregressive: each sample is predicted from the previous samples – Based on dilated convolutions – Probabilistic: the network outputs a probability distribution over μ-law values rather than a single value ● Still some drawbacks – Very high complexity (tens to hundreds of GFLOPS) – Uses neurons to model the vocal tract
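The sketch below illustrates the probabilistic output: 8-bit μ-law companding (μ = 255, 256 levels, a common choice assumed here rather than taken from the slides) and drawing the next sample from a softmax over those levels. The function names are hypothetical and the logits are made up; a real vocoder obtains them from the network at every step.

```python
import math
import random

MU = 255  # assumption: standard 8-bit mu-law, 256 quantization levels

def mulaw_encode(x):
    """Map a sample in [-1, 1] to an integer level in 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * MU))

def mulaw_decode(q):
    """Map an integer level in 0..255 back to a sample in [-1, 1]."""
    y = 2 * q / MU - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def sample_level(logits, rng):
    """Draw one level from a softmax over the network outputs (one logit per level)."""
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]  # subtract the max for numerical stability
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Usage: one autoregressive step with a made-up, sharply peaked output distribution.
rng = random.Random(0)
logits = [0.0] * (MU + 1)
logits[mulaw_encode(0.25)] = 10.0
next_sample = mulaw_decode(sample_level(logits, rng))
```

Sampling from the distribution, instead of taking its most likely level, is what "probabilistic" refers to; the sampled level is decoded and fed back as the next input.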
WaveRNN ● Replaces dilated convolutions with an RNN ● Addresses some of WaveNet’s issues, notably its complexity
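The point of the RNN is that its state summarizes all past samples, so the work per output sample stays constant instead of depending on a stack of dilated convolutions. Below is a minimal sketch of a generic GRU cell stepped once per sample, with biases omitted and random, hypothetical weights. It is not WaveRNN's actual architecture, which includes further components.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    """Plain-Python matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gru_step(x, h, params):
    """One step of a standard GRU cell (biases omitted; reset gate applied to U_n h)."""
    Wz, Uz, Wr, Ur, Wn, Un = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(Wn, x), r, matvec(Un, h))]  # candidate
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def random_params(n_in, n_hidden, seed=0):
    """Hypothetical random weights, returned as (Wz, Uz, Wr, Ur, Wn, Un)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return tuple(mat(n_hidden, n_in) if i % 2 == 0 else mat(n_hidden, n_hidden) for i in range(6))

# Usage: the state h carries the history, so each new sample costs the same amount of work.
params = random_params(n_in=1, n_hidden=8)
h = [0.0] * 8
for s in [0.0, 0.1, 0.3, -0.2, 0.05]:  # toy input samples
    h = gru_step([s], h, params)
```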
LPCNet: Bringing Back DSP ● Adds linear prediction to WaveRNN – The linear prediction filter models the vocal tract, so the neurons no longer need to
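With linear prediction, the prediction $p_t = \sum_{k=1}^{M} a_k s_{t-k}$ comes from the LPC coefficients $a_k$ by plain DSP, and the network only has to model the excitation $e_t = s_t - p_t$. The sketch below shows that split and the inverse synthesis $s_t = p_t + e_t$. The sign convention and the toy coefficients are assumptions for illustration; how LPCNet derives its coefficients from the acoustic features is not covered here.

```python
def lpc_prediction(past, lpc):
    """p_t = sum_{k=1}^{M} a_k * s_{t-k}; past[-1] is s_{t-1}, earlier samples count as 0."""
    return sum(a * past[-k] for k, a in enumerate(lpc, start=1) if k <= len(past))

def analyze(signal, lpc):
    """Split a signal into predictions p_t and excitation e_t = s_t - p_t."""
    preds, excs = [], []
    for t, s in enumerate(signal):
        p = lpc_prediction(signal[:t], lpc)
        preds.append(p)
        excs.append(s - p)
    return preds, excs

def synthesize(excitation, lpc):
    """Invert the split: s_t = p_t + e_t, the all-pole filter 1 / (1 - sum_k a_k z^-k)."""
    out = []
    for e in excitation:
        out.append(lpc_prediction(out, lpc) + e)
    return out

# Usage with made-up 2nd-order coefficients.
lpc = [1.3, -0.6]
signal = [0.0, 0.2, 0.5, 0.3, -0.1, -0.4]
preds, excs = analyze(signal, lpc)
rebuilt = synthesize(excs, lpc)  # equal to `signal` up to rounding error
```

In a vocoder loop, the network would supply `e` at each step and `synthesize`'s update would turn it into the next output sample.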
Other Improvements ● Pre-emphasis – Boost high frequencies in the input/training data with $1 - \alpha z^{-1}$ – Apply de-emphasis, $1/(1 - \alpha z^{-1})$, after synthesis – Attenuates perceived μ-law noise for wideband speech ● Input embedding – Rather than using μ-law values directly, treat them as one-hot (categorical) inputs – Lets the network learn non-linear functions of the input values for the RNN – Can be done at no cost by pre-computing matrix products
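A minimal sketch of both tricks, assuming α = 0.85 purely as an illustrative value: pre-emphasis $1 - \alpha z^{-1}$ and its inverse de-emphasis $1/(1 - \alpha z^{-1})$, plus a check that an embedding lookup followed by a matrix product equals a lookup into the pre-computed product. The toy matrices `E` (embedding table) and `W` (input weights) are hypothetical.

```python
ALPHA = 0.85  # assumption: illustrative pre-emphasis coefficient

def pre_emphasis(x, alpha=ALPHA):
    """y[n] = x[n] - alpha * x[n-1], i.e. the filter 1 - alpha z^-1."""
    out, prev = [], 0.0
    for v in x:
        out.append(v - alpha * prev)
        prev = v
    return out

def de_emphasis(y, alpha=ALPHA):
    """x[n] = y[n] + alpha * x[n-1], i.e. the inverse filter 1 / (1 - alpha z^-1)."""
    out, prev = [], 0.0
    for v in y:
        prev = v + alpha * prev
        out.append(prev)
    return out

def matmul(a, b):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def embed_then_project(level, E, W):
    """One-hot level -> embedding row E[level] -> multiply by input weights W."""
    return matmul([E[level]], W)[0]

def precomputed_lookup(level, EW):
    """Same result read straight from the pre-computed product EW = E @ W."""
    return EW[level]

# Usage with toy values.
speech = [0.1, 0.4, -0.2, 0.0, 0.3]
restored = de_emphasis(pre_emphasis(speech))  # equals `speech` up to rounding
E = [[1, 2], [0, -1], [3, 1], [2, 2]]  # hypothetical 4-level, 2-dim embedding table
W = [[1, 0, 2], [-1, 1, 0]]            # hypothetical 2x3 input weight matrix
EW = matmul(E, W)                      # computed once, offline
```

Because a one-hot vector times a matrix just selects a row, `E @ W` can be computed once offline, so the embedding adds no per-sample cost.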
LPCNet: Complete Model
Training ● Inputs: signal(t-1), excitation(t-1), prediction(t) ● Output: probability distribution of excitation(t) ● Teacher forcing: use clean data as input – Must avoid diverging when imperfect synthesized samples do not match the (perfect) training data – Inject noise into the input data – Target excitation = (clean signal) – (noisy prediction) ● Pre-emphasis and DC rejection applied to the input ● Augmentation: varying gain and frequency response
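The noise-injection idea can be sketched as follows. This version assumes Gaussian noise added in the linear signal domain and reuses each target as the next step's excitation input; both are simplifying assumptions, not a description of the actual LPCNet training scripts. The key line is that the target excitation is the clean sample minus a prediction computed from the noisy past, so the network learns to correct the kind of errors it will see during synthesis.

```python
import random

def predict(past, lpc):
    """p_t = sum_k a_k * s_{t-k}, treating samples before the start as 0."""
    return sum(a * past[-k] for k, a in enumerate(lpc, start=1) if k <= len(past))

def make_training_examples(clean, lpc, noise_std, seed=0):
    """Return ((signal[t-1], excitation[t-1], prediction[t]), target_excitation[t]) pairs.

    Assumption: Gaussian noise in the linear domain; illustrative only.
    """
    rng = random.Random(seed)
    noisy = [s + rng.gauss(0.0, noise_std) for s in clean]  # what the network "sees"
    examples = []
    prev_exc = 0.0  # assumption: the previous target serves as the excitation input
    for t in range(len(clean)):
        pred = predict(noisy[:t], lpc)  # prediction from the noisy past, as at synthesis time
        target = clean[t] - pred        # excitation = clean signal - noisy prediction
        prev_sig = noisy[t - 1] if t > 0 else 0.0
        examples.append(((prev_sig, prev_exc, pred), target))
        prev_exc = target
    return examples

# Usage with a toy signal and made-up coefficients.
clean = [0.0, 0.2, 0.5, 0.3, -0.1, -0.4, -0.2, 0.1]
lpc = [1.3, -0.6]
examples = make_training_examples(clean, lpc, noise_std=0.05)
```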
Complexity ● Use 16×1 block-sparse matrices, as in WaveRNN – Add a diagonal component to improve efficiency – 10% non-zero coefficients – A 384-unit sparse GRU costs about as much as a 122-unit dense GRU ● Total complexity: 3 GFLOPS – No GPU needed – 20% of one 2.4 GHz Broadwell core – Real-time on modern phones using a single core
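A rough check of the sparsity numbers: keeping 10% of a 384×384 matrix leaves about 14,746 weights, close to the 14,884 of a dense 122×122 matrix, which matches the slide's equivalence. The sketch below stores a matrix as 16×1 blocks and multiplies it by a vector. It assumes a block spans 16 consecutive output rows in one input column and treats the diagonal component as a separate dense term; both layout details and all names are assumptions for illustration.

```python
BLOCK = 16  # assumption: a block is 16 consecutive output rows x 1 input column

def to_block_sparse(dense):
    """Keep only the 16x1 blocks of `dense` that contain at least one non-zero weight."""
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % BLOCK == 0, "row count must be a multiple of the block height"
    groups = []  # one list of (column, 16 weights) per group of 16 rows
    for r0 in range(0, n_rows, BLOCK):
        group = []
        for c in range(n_cols):
            w = [dense[r0 + i][c] for i in range(BLOCK)]
            if any(v != 0 for v in w):
                group.append((c, w))
        groups.append(group)
    return groups

def block_sparse_matvec(groups, x, diag=None):
    """y = S x (+ diag * x), with S in block-sparse form and an optional dense diagonal."""
    y = []
    for group in groups:
        acc = [0.0] * BLOCK
        for c, w in group:
            xc = x[c]  # one input value feeds 16 outputs, which vectorizes well
            for i in range(BLOCK):
                acc[i] += w[i] * xc
        y.extend(acc)
    if diag is not None:  # hypothetical handling of the diagonal component
        y = [yi + di * xi for yi, di, xi in zip(y, diag, x)]
    return y

def block_density(groups, n_cols):
    """Fraction of all 16x1 blocks that are actually stored."""
    return sum(len(g) for g in groups) / (len(groups) * n_cols)

# Usage: a 32x32 matrix with one non-zero block in each 16-row group.
N = 32
dense = [[0] * N for _ in range(N)]
for i in range(BLOCK):
    dense[i][3] = i + 1            # block in the first 16-row group, column 3
    dense[16 + i][20] = -(i + 1)   # block in the second 16-row group, column 20
groups = to_block_sparse(dense)
x = [float(j % 7) for j in range(N)]
diag = [0.5] * N
y = block_sparse_matvec(groups, x, diag)
```

The work is proportional to the number of stored blocks, so 10% density costs roughly 10% of the dense product.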
Results ● Demo: https://people.xiph.org/~jm/demo/lpcnet/
Applications ● Text-to-speech (TTS) ● Low bitrate speech coding ● Codec post-filtering ● Time stretching ● Packet loss concealment (PLC) ● Noise suppression
Conclusion ● Bringing back DSP in neural speech synthesis – Improvement on WaveRNN – Easily real-time on a phone ● Future improvements – Use a parametric output distribution – Add explicit pitch (as an attention model?) – Improve noise robustness
Questions? ● LPCNet source code (BSD) – https://github.com/mozilla/lpcnet/
