Semi-Supervised Adversarial Audio Source Separation applied to - PowerPoint PPT Presentation

Outline: Motivation | State of the art | Proposed approach | Experiment: Singing voice separation | Discussion and summary

Semi-Supervised Adversarial Audio Source Separation applied to Singing Voice Extraction
Daniel Stoller (1), Sebastian Ewert (2), Simon Dixon (1)
(1) Centre for Digital Music, Queen Mary University of London
(2) Spotify
MLSP-L8: Deep Learning III, ICASSP, 19.04.2018

Audio source separation
Task: recover the individual sources from a mixture.
Example: musical instrument separation.

Current state of the art [5, 3, 1]
- Training on multitrack datasets (small ⇒ overfitting!)
- Neural network trained discriminatively with an MSE loss

Our goal
⇒ How can we also learn from unpaired mixtures and sources?
- Random mixing of sources [4, 2] ignores correlations between the sources

Theoretical framework: Intuition
[Figure: magnitude spectrograms from an unlabeled mixture database are fed to the separator network, which outputs accompaniment estimates and vocal estimates (magnitude spectrograms); an unlabeled accompaniment database and an unlabeled singing voice (vocals) database provide magnitude spectrograms of real sources for comparison]

Theoretical framework: Derivation of unsupervised loss
For an optimal separator: $q_\phi(s_k \mid m) = p(s_k \mid m)$
Taking the expectation over mixtures on both sides:
$\mathbb{E}_{m \sim p_{\text{data}}}\left[q_\phi(s_k \mid m)\right] = \mathbb{E}_{m \sim p_{\text{data}}}\left[p(s_k \mid m)\right]$
i.e. overall separator output = source distribution: $q_\phi^{\text{out},k} = p_s^k$
This is a necessary condition for an optimal separator.
Loss: minimise the divergence between the separator outputs and the source distributions:
$L_u = \sum_{k=1}^{K} D\left[q_\phi^{\text{out},k} \,\|\, p_s^k\right]$
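The marginalisation step (taking the expectation over mixtures) can be checked on a small discrete example. The sketch below assumes a hypothetical joint distribution over two mixtures and two source values; the names `JOINT`, `output_distribution`, etc. are illustrative only. It shows that the optimal separator reproduces the source distribution, and also that the condition is only necessary: a separator that ignores the mixture entirely matches the source distribution as well.

```python
from fractions import Fraction

# Hypothetical toy joint distribution p(m, s): two mixtures, two possible source values.
JOINT = {
    ("m1", "s_a"): Fraction(3, 10), ("m1", "s_b"): Fraction(1, 10),
    ("m2", "s_a"): Fraction(2, 10), ("m2", "s_b"): Fraction(4, 10),
}
MIXTURES = ["m1", "m2"]
SOURCES = ["s_a", "s_b"]


def p_mixture(m):
    # Mixture distribution p_data(m) = sum_s p(m, s)
    return sum(JOINT[(m, s)] for s in SOURCES)


def p_source(s):
    # Source distribution p(s) = sum_m p(m, s)
    return sum(JOINT[(m, s)] for m in MIXTURES)


def true_posterior(m, s):
    # p(s | m) = p(m, s) / p(m)
    return JOINT[(m, s)] / p_mixture(m)


def output_distribution(separator):
    # Overall separator output: E_{m ~ p_data}[ q(s | m) ] for every source value s
    return {s: sum(p_mixture(m) * separator(m, s) for m in MIXTURES) for s in SOURCES}


source_dist = {s: p_source(s) for s in SOURCES}

# The optimal separator q(s | m) = p(s | m) reproduces the source distribution.
assert output_distribution(true_posterior) == source_dist

# Necessary but not sufficient: a separator that ignores the mixture matches it too.
assert output_distribution(lambda m, s: p_source(s)) == source_dist
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.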

Theoretical framework: Overall approach
- Supervised loss $L_s$: MSE between estimate and ground truth
- Unsupervised loss: $L_u = \sum_{k=1}^{K} D\left[q_\phi^{\text{out},k} \,\|\, p_s^k\right]$
- $L_{\text{add}}$: MSE between the sum of the source estimates and the mixture
- Total loss: $L = L_s + \alpha L_u + \beta L_{\text{add}}$
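A minimal sketch of how the three terms could be combined, assuming spectrograms are flattened to plain Python lists and that the unsupervised term is computed elsewhere (for example by the discriminators). All function names are hypothetical, and summing the supervised MSE over sources (rather than averaging) is an assumption the slide does not settle.

```python
# Hypothetical helper names; spectrograms are flattened to plain lists for brevity.

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def supervised_loss(estimates, targets):
    # L_s: MSE between each source estimate and its ground truth, summed over the K sources
    # (summing rather than averaging over sources is an assumption).
    return sum(mse(est, tgt) for est, tgt in zip(estimates, targets))


def additivity_loss(estimates, mixture):
    # L_add: MSE between the element-wise sum of the K source estimates and the mixture.
    summed = [sum(values) for values in zip(*estimates)]
    return mse(summed, mixture)


def total_loss(l_s, l_u, l_add, alpha, beta):
    # L = L_s + alpha * L_u + beta * L_add
    return l_s + alpha * l_u + beta * l_add
```

Keeping the terms separate reflects that $L_s$ needs paired multitrack examples, whereas $L_{\text{add}}$ needs only a mixture and the separator's estimates. The weights alpha and beta are hyperparameters whose values are not given on the slide.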

Implementation using GANs: Divergence minimisation with GANs
- The discriminator estimates the divergence $D$ between the generator distribution and the real distribution
- The generator minimises the divergence $D$
- Our separator is a conditional generator ⇒ we use one discriminator per source to estimate the Wasserstein distance $W\left[q_\phi^{\text{out},k} \,\|\, p_s^k\right]$
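The slide does not spell out the discriminator objective. As a hedged illustration, the sketch below shows (a) the exact Wasserstein-1 distance between two one-dimensional empirical distributions with equally many samples, which is the kind of quantity a Wasserstein GAN critic approximates in far higher dimensions, (b) summing one distance per source to form the unsupervised loss, and (c) the standard WGAN critic and generator losses written in terms of critic scores. Scalars stand in for spectrograms, the function names are hypothetical, and the Lipschitz constraint a real critic needs (e.g. weight clipping or a gradient penalty) is omitted.

```python
from statistics import mean


def wasserstein1_1d(xs, ys):
    """Exact W1 between two 1-D empirical distributions with equally many samples.

    With uniform weights, the optimal transport plan matches sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this closed form assumes equal sample counts")
    return mean(abs(x - y) for x, y in zip(sorted(xs), sorted(ys)))


def unsupervised_loss(separator_outputs, source_samples):
    # L_u = sum_k W[separator output distribution k || source distribution k],
    # one distance (one discriminator in the GAN setting) per source k.
    return sum(wasserstein1_1d(out, real) for out, real in zip(separator_outputs, source_samples))


def critic_loss(real_scores, fake_scores):
    # A WGAN critic maximises E[f(real)] - E[f(fake)]; return the negation so it can be minimised.
    return mean(fake_scores) - mean(real_scores)


def separator_adversarial_loss(fake_scores):
    # The separator (conditional generator) minimises -E[f(fake)],
    # pushing its outputs towards regions the critic scores as real.
    return -mean(fake_scores)
```

In the neural version, `critic_loss` and `separator_adversarial_loss` would be evaluated on critic outputs for batches of real source spectrograms and separator estimates, and the two networks would be updated in alternation.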

Experimental setup
- Avoids dataset bias
- Supervised and semi-supervised training with early stopping
- U-Net as separator, DCGAN as discriminator

Results: Performance
[Figure: two plots, mean accompaniment SDR and mean vocal SDR, comparing Baseline and Ours on the test sets DSD100, MedleyDB, CCMixter and iKala]
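The results are reported as mean SDR (signal-to-distortion ratio). For orientation only, the sketch below computes a simplified SDR in dB as the ratio of reference energy to error energy. This is an assumption-laden stand-in: the BSS Eval definition typically used in such evaluations first allows certain distortions of the reference (such as filtering) before measuring the error, so this simplified ratio will generally not reproduce the reported numbers. The function name is hypothetical.

```python
import math


def simple_sdr(reference, estimate):
    """Simplified signal-to-distortion ratio in dB:
    10 * log10(||s||^2 / ||s - s_hat||^2).

    This is NOT the full BSS Eval SDR, which first allows certain
    distortions of the reference before measuring the error.
    """
    signal_energy = sum(s ** 2 for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)
```

Higher values are better; each halving of the error energy adds about 3 dB.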

Results: Qualitative
[Figure: spectrograms over time t (s) and frequency f (Hz). (a) Separator estimate $x$; (b) discriminator gradient $\nabla_x D(x)$]
⇒ The discriminator appears to work.
More perceptual loss function?

Summary
- Current SotA methods only use multitrack data
- Our approach also uses solo source recordings
- Performance improvement in the singing voice separation experiment
- More perceptual loss? (seeks posterior modes, not means)
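The remark "seeks posterior modes, not means" can be illustrated with a hypothetical toy posterior. If, for one mixture, the true source value is equally likely to be -1 or +1, the MSE-optimal point estimate is the posterior mean 0, which never actually occurs, while a mode such as +1 is a plausible output but has a higher expected squared error. This is only an intuition pump under made-up numbers, not the experiment from the slides; the names are illustrative.

```python
from statistics import mean

# Hypothetical bimodal posterior for one time-frequency bin of a single mixture:
# the true source value is either -1.0 or +1.0 with equal probability.
POSTERIOR_SAMPLES = [-1.0, 1.0] * 500


def expected_squared_error(estimate, samples):
    # Expected MSE of a single point estimate under the sampled posterior.
    return mean((estimate - s) ** 2 for s in samples)


# The MSE-optimal point estimate is the posterior mean ...
mse_estimate = mean(POSTERIOR_SAMPLES)
# ... whereas a mode is a value that actually occurs with high probability.
mode_estimate = 1.0

mse_error = expected_squared_error(mse_estimate, POSTERIOR_SAMPLES)
mode_error = expected_squared_error(mode_estimate, POSTERIOR_SAMPLES)
```

An MSE-trained separator is therefore rewarded for averaging plausible outputs, whereas a loss that matches the distribution of outputs to that of real sources penalises estimates that never occur in real data.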

End
Code available at https://github.com/f90/AdversarialAudioSeparation
Thank you for your attention!

References
[1] A. Jansson, E. J. Humphrey, N. Montecchio, R. Bittner, A. Kumar, and T. Weyde. Singing voice separation with deep U-Net convolutional networks. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 323–332, 2017.
[2] M. Miron, J. Janer Mestres, and E. Gómez Gutiérrez. Generating data to train convolutional neural networks for classical music source separation. In Proceedings of the 14th Sound and Music Computing Conference. Aalto University, 2017.
[3] A. A. Nugraha, A. Liutkus, and E. Vincent. Multichannel audio source separation with deep neural networks. PhD thesis, Inria, 2015.
[4] S. Uhlich, F. Giron, and Y. Mitsufuji. Deep neural network based instrument extraction from music. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2135–2139. IEEE, 2015.
[5] S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and Y. Mitsufuji. Improving music source separation based on deep neural networks through data augmentation and network blending. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 261–265, March 2017.
