FloWaveNet: A Generative Flow for Raw Audio


ICML 2019. FloWaveNet: A Generative Flow for Raw Audio. Sungwon Kim (1), Sang-gil Lee (1), Jongyoon Song (1), Jaehyeon Kim (2), Sungroh Yoon (1,3). (1) Seoul National University; (2) Kakao Corporation; (3) ASRI, INMC, Institute of Engineering Research, Seoul National University.


  1. ICML 2019. FloWaveNet: A Generative Flow for Raw Audio. Sungwon Kim (1), Sang-gil Lee (1), Jongyoon Song (1), Jaehyeon Kim (2), Sungroh Yoon (1,3). (1) Seoul National University; (2) Kakao Corporation; (3) ASRI, INMC, Institute of Engineering Research, Seoul National University. Poster: 6/12, 6:30 PM @ Pacific Ballroom #2

  2. WaveNet: $\log p_X(x_{1:T}) = \sum_{t=1}^{T} \log p_X(x_t \mid x_{<t})$ https://deepmind.com/blog/wavenet-generative-model-raw-audio/

  3. WaveNet: sequential sampling. $\log p_X(x_{1:T}) = \sum_{t=1}^{T} \log p_X(x_t \mid x_{<t})$
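The factorization on slides 2 and 3 can be illustrated with a toy model. The sketch below is not WaveNet: `toy_conditional` is a hypothetical stand-in (a fixed Gaussian whose mean depends on the previous sample), chosen only to show why sampling needs T sequential steps while the log-likelihood is a plain sum over time steps.

```python
import numpy as np

def toy_conditional(history):
    """Hypothetical stand-in for p(x_t | x_<t): returns (mean, std) of a Gaussian."""
    prev = history[-1] if len(history) else 0.0
    return 0.9 * prev, 0.1

def sample_autoregressive(T, rng):
    """Draw x_1..x_T one step at a time; step t needs all earlier samples."""
    x = []
    for _ in range(T):
        mean, std = toy_conditional(x)
        x.append(rng.normal(mean, std))
    return np.array(x)

def log_likelihood(x):
    """log p(x_1:T) = sum over t of log p(x_t | x_<t)."""
    total = 0.0
    for t in range(len(x)):
        mean, std = toy_conditional(x[:t])
        total += -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x[t] - mean) / std) ** 2
    return total

if __name__ == "__main__":
    audio = sample_autoregressive(16, np.random.default_rng(0))
    print(audio.shape, log_likelihood(audio))
```

The loop in `sample_autoregressive` cannot be parallelized across time, which is the bottleneck the later slides address.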

  4. Previous parallel speech synthesis models: Pre-trained WaveNet → Inverse Autoregressive Flows (IAFs), trained with Probability Density Distillation, $D_{KL}(P_S(x) \,\|\, P_T(x))$. Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning. 2018.

  5. Previous parallel speech synthesis models: Pre-trained WaveNet → Inverse Autoregressive Flows (IAFs) with parallel sampling, trained with Probability Density Distillation, $D_{KL}(P_S(x) \,\|\, P_T(x))$.

  6. Previous parallel speech synthesis models: Pre-trained WaveNet → Inverse Autoregressive Flows (IAFs) with parallel sampling, trained with Probability Density Distillation plus auxiliary losses: $D_{KL}(P_S(x) \,\|\, P_T(x))$ + Power Loss, Perceptual Loss, Contrastive Loss, Frame Loss.
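To make the distillation objective on slides 4 to 6 concrete, here is a minimal sketch that estimates a KL divergence from samples of the first distribution. It assumes both the student and the teacher are one-dimensional Gaussians (an illustrative simplification, not the Parallel WaveNet setup), so the Monte Carlo estimate can be checked against the closed form. All function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def kl_monte_carlo(student, teacher, n, rng):
    """Estimate D_KL(P_S || P_T) as the mean of log P_S(x) - log P_T(x), x ~ P_S."""
    mu_s, sd_s = student
    mu_t, sd_t = teacher
    x = rng.normal(mu_s, sd_s, size=n)  # samples come from the student
    return np.mean(gaussian_logpdf(x, mu_s, sd_s) - gaussian_logpdf(x, mu_t, sd_t))

def kl_closed_form(student, teacher):
    """Exact KL divergence between two univariate Gaussians."""
    mu_s, sd_s = student
    mu_t, sd_t = teacher
    return np.log(sd_t / sd_s) + (sd_s**2 + (mu_s - mu_t) ** 2) / (2 * sd_t**2) - 0.5

if __name__ == "__main__":
    student, teacher = (0.0, 1.0), (0.5, 1.5)
    rng = np.random.default_rng(0)
    print(kl_monte_carlo(student, teacher, 200_000, rng), kl_closed_form(student, teacher))
```

The estimate only needs samples from the student and density evaluations from both models, which is why a student with parallel sampling can be trained against an autoregressive teacher.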

  7. Our Objectives
     • Simplify the training procedure for parallel sampling
     • Maintain the quality of speech samples

  8. Our Objectives
     • Simplify the training procedure for parallel sampling
     • Maintain the quality of speech samples
     Flow-based generative models for raw audio!

  9. FloWaveNet. Training phase (raw audio $x$ → Gaussian noise $z$ through $f$): $\log p_X(x_{1:T}) = \log p_Z(f(x_{1:T})) + \log \det \frac{\partial f(x)}{\partial x}$

  10. FloWaveNet. Training phase (raw audio $x$ → Gaussian noise $z$ through $f$): $\log p_X(x_{1:T}) = \log p_Z(f(x_{1:T})) + \log \det \frac{\partial f(x)}{\partial x}$. Sampling phase (Gaussian noise $z$ → raw audio $x$ through $f^{-1}$): $z = z_{1:T} \sim p_Z(z) = \mathcal{N}(0, I)$, $x = f^{-1}(z)$

  11. FloWaveNet. Training phase: $\log p_X(x_{1:T}) = \log p_Z(f(x_{1:T})) + \log \det \frac{\partial f(x)}{\partial x}$. Sampling phase: $z = z_{1:T} \sim p_Z(z) = \mathcal{N}(0, I)$, $x = f^{-1}(z)$. Both the transformation $f$ and its inverse $f^{-1}$ are designed to be computed efficiently → efficient training and parallel sampling
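The training and sampling phases on slides 9 to 11 can be sketched with the simplest possible flow. The example below assumes a single elementwise affine map with fixed, hypothetical parameters `LOG_SCALE` and `SHIFT`; it is far simpler than FloWaveNet, but it shows the same two operations: an exact log-likelihood through the change of variables, and sampling all T values at once through the inverse.

```python
import numpy as np

LOG_SCALE, SHIFT = 0.5, -0.2  # hypothetical fixed flow parameters

def f(x):
    """Forward map x -> z; also returns log det(df/dx) for the whole sequence."""
    z = x * np.exp(LOG_SCALE) + SHIFT
    log_det = x.size * LOG_SCALE  # diagonal Jacobian: one LOG_SCALE per element
    return z, log_det

def f_inv(z):
    """Inverse map z -> x, applied to every time step at once."""
    return (z - SHIFT) * np.exp(-LOG_SCALE)

def log_likelihood(x):
    """log p_X(x) = log p_Z(f(x)) + log det(df/dx), with p_Z = N(0, I)."""
    z, log_det = f(x)
    log_pz = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * z**2)
    return log_pz + log_det

def sample(T, rng):
    """Draw z ~ N(0, I) and map it back in a single parallel step."""
    return f_inv(rng.standard_normal(T))

if __name__ == "__main__":
    x = sample(8, np.random.default_rng(0))
    print(x.shape, log_likelihood(x))
```

Unlike an autoregressive sampler, `sample` contains no loop over time steps, and training needs only `log_likelihood`, with no teacher model.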

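  12. FloWaveNet [architecture figure: each flow $n$ applies activation normalization $f_{AN}^{n}$ followed by affine coupling $f_{AC}^{n}$]. $\log p_X(x_{1:T}) = \log p_Z(f(x_{1:T})) + \sum_{n} \log \det \frac{\partial (f_{AC}^{n} \cdot f_{AN}^{n})(x)}{\partial x}$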
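The per-flow sum of log-determinants on slide 12 can be sketched as a stack of invertible layers. This is a generic illustration, not the authors' implementation: the `ActNorm` and `AffineCoupling` classes, their tiny fixed linear "networks", the half-swap after each coupling, and the `(channels, time)` input layout are all assumptions made for the example.

```python
import numpy as np

class ActNorm:
    """Per-channel affine map; log det = T * sum(log_scale)."""
    def __init__(self, channels, rng):
        self.log_scale = 0.1 * rng.standard_normal((channels, 1))
        self.bias = 0.1 * rng.standard_normal((channels, 1))

    def forward(self, x):
        return x * np.exp(self.log_scale) + self.bias, x.shape[1] * np.sum(self.log_scale)

    def inverse(self, y):
        return (y - self.bias) * np.exp(-self.log_scale)

class AffineCoupling:
    """Transforms one half of the channels conditioned on the other half."""
    def __init__(self, channels, rng):
        self.half = channels // 2
        rest = channels - self.half
        self.w_s = 0.1 * rng.standard_normal((rest, self.half))  # toy stand-in network
        self.w_t = 0.1 * rng.standard_normal((rest, self.half))

    def _params(self, x_a):
        return np.tanh(self.w_s @ x_a), self.w_t @ x_a

    def forward(self, x):
        x_a, x_b = x[:self.half], x[self.half:]
        log_s, t = self._params(x_a)
        y_b = x_b * np.exp(log_s) + t
        # Swap halves so the next coupling transforms the other channels.
        return np.concatenate([y_b, x_a], axis=0), np.sum(log_s)

    def inverse(self, y):
        n_b = y.shape[0] - self.half
        y_b, x_a = y[:n_b], y[n_b:]
        log_s, t = self._params(x_a)
        return np.concatenate([x_a, (y_b - t) * np.exp(-log_s)], axis=0)

def flow_forward(x, layers):
    """Apply every layer and accumulate the sum of log-determinants."""
    total = 0.0
    for layer in layers:
        x, log_det = layer.forward(x)
        total += log_det
    return x, total

def flow_inverse(z, layers):
    for layer in reversed(layers):
        z = layer.inverse(z)
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = []
    for _ in range(3):
        layers += [ActNorm(4, rng), AffineCoupling(4, rng)]
    x = rng.standard_normal((4, 16))
    z, log_det = flow_forward(x, layers)
    print(np.allclose(flow_inverse(z, layers), x), log_det)
```

Each layer's Jacobian is triangular or diagonal (up to a channel permutation), so its log-determinant is a cheap sum, and the total is simply added across flows as in the slide's equation.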

  13. Mean Opinion Scores: FloWaveNet ≥ Gaussian IAF

  14. Sampling speed: FloWaveNet ≅ Gaussian IAF ≅ Parallel WaveNet >> Autoregressive WaveNet (the parallel models are thousands of times faster)

  15. Conclusion
     • FloWaveNet produces audio samples as high in quality as those of previous distilled models.
     • FloWaveNet synthesizes audio samples in parallel
       - without a well pre-trained WaveNet (no distillation!)
       - without auxiliary loss terms
     Demo page | Code | Poster: 6/12, 6:30 PM @ Pacific Ballroom #2

