GCT634@KAIST Invited lecture: Sound Source Separation 7 June 2018 Keunwoo Choi at QMUL.uk, Spotify.us, groovo.io
Sound Source Separation
• Let's isolate the "target" audio signal!
• The "cocktail party effect": as if we're simulating the human brain (as if we knew what's going on in there)
Sound Source Separation
problem = f(assumptions)
assumptions = {environments: {dry, wet, ..}, signal: {ch: {mono, stereo, ..}, content: {speech, music}}, target: {...}}

| Input | Target | Noise |
| Speech + ambience | Speech | Ambience (noise) |
| Mixture of speech | Speaker i | All speakers j != i |
| Music (vocal, drums, guitar, bass + ..) | Instrument i | All instruments j != i |
SSS Applications
• KA-RA-O-KE!
• Transcription
• DJing/mixing
• Many other MIR tasks
• We once called it a "chicken-and-egg" problem; solving SSS would make many other tasks much easier
BIG ASSUMPTION FOR A LONG WHILE
• Work with |STFT| (or CQT)
• 1 time-frequency bin, 1 instrument -- aka "W-disjoint orthogonality"
• Phase doesn't matter much
• It used to underlie (almost) all research
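To make the assumption concrete, here is a minimal sketch of |STFT|-domain separation with an ideal binary mask (the "1 bin, 1 instrument" idea), assuming librosa is available; `s1` and `s2` are random stand-ins, not real sources.

```python
# Ideal-binary-mask separation under the W-disjoint assumption (minimal sketch).
import numpy as np
import librosa

sr = 22050
s1 = np.random.randn(sr * 3)  # stand-ins for two source signals
s2 = np.random.randn(sr * 3)
mix = s1 + s2

S1, S2, MIX = (librosa.stft(x) for x in (s1, s2, mix))

# W-disjoint orthogonality: each time-frequency bin belongs to the dominant source.
mask1 = (np.abs(S1) > np.abs(S2)).astype(float)

# Apply the binary mask to the mixture and invert. The mixture phase is reused,
# i.e., "phase doesn't matter much".
est1 = librosa.istft(mask1 * MIX)
```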
Problem config 1 - mixing matrix A
• There is a mixing matrix A; we estimate its inverse: x = As (mixing), y = Wx (unmixing), with W ≈ A⁻¹
• s: sources (instruments); a_xx: amplitude mixing coefficients; x: stereo input signal; w: estimated unmixing coefficients; y: estimated sources (instruments)
ICA
• Independent Component Analysis (ICASSP '98)
• Based on statistics: independence and (non-)Gaussianity
• Not specific to audio; a general technique
• Example: http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html
Further study: https://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf
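A hedged sketch of ICA-style unmixing with scikit-learn's FastICA (the toy sources and mixing matrix are illustrative): mix two non-Gaussian sources with a matrix A, then estimate the unmixing matrix W ≈ A⁻¹.

```python
# Instantaneous mixing + FastICA unmixing (minimal sketch).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 44100))          # two non-Gaussian stand-in sources
A = np.array([[0.7, 0.3], [0.4, 0.6]])    # "unknown" mixing matrix
x = A @ s                                  # observed stereo mixture

ica = FastICA(n_components=2, random_state=0)
y = ica.fit_transform(x.T).T               # estimated sources (up to scale/permutation)
W = ica.components_                        # estimated unmixing matrix, W ≈ A^{-1}
```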
ADRESS
• Azimuth Discrimination and Resynthesis (DAFx 2004)
• 1-dimensional clustering; for stereo sound source separation
Problem config 2 - mixing matrix and delay
• Sources are at different angles and distances → the mixing matrix A also encodes time delays
DUET
• Location = {angle, distance}
• Each location gives one 2D cluster, i.e., one instrument
• DOA (Direction of Arrival) estimation
• Something similar runs in your phone (with 2+ microphones) to suppress non-speech sounds (but perhaps not in your earphones/headphones)
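A rough sketch of the DUET idea, assuming the stereo STFTs are already computed (random placeholders below): per-bin attenuation and delay are histogrammed, and each peak in the 2D histogram corresponds to one source location.

```python
# DUET-style attenuation/delay clustering (simplified sketch).
import numpy as np

n_fft, n_frames = 1024, 200
X1 = np.random.randn(n_fft // 2 + 1, n_frames) + 1j * np.random.randn(n_fft // 2 + 1, n_frames)
X2 = np.random.randn(n_fft // 2 + 1, n_frames) + 1j * np.random.randn(n_fft // 2 + 1, n_frames)

eps = 1e-8
R = (X2 + eps) / (X1 + eps)
alpha = np.abs(R)                                    # inter-channel attenuation ratio
freqs = np.arange(1, X1.shape[0])[:, None]           # skip DC to avoid divide-by-zero
delta = -np.angle(R[1:]) / (2 * np.pi * freqs / n_fft)  # inter-channel delay (samples)

# 2D histogram over (attenuation, delay); each peak ~ one source location.
hist, a_edges, d_edges = np.histogram2d(
    alpha[1:].ravel(), delta.ravel(), bins=50,
    range=[[0, 3], [-5, 5]],
)
peak = np.unravel_index(hist.argmax(), hist.shape)   # strongest source's cluster
```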
Problem config 3 - music - spectra of instruments http://www.physics.usyd.edu.au/teach_res/hsp/sp/mod31/m31_strings.htm
NMF
• Assumptions when using NMF for SSS:
• The spectral shapes of musical instruments are known
• NMF then separates each note (one note ≈ one basis)!
• Many applications to drum separation (it works)
https://www.slideshare.net/DaichiKitamura/robust-music-signal-separation-based-on-supervised-nonnegative-matrix-factorization-with-prevention-of-basis-sharing
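A minimal sketch of NMF-based separation with scikit-learn; the basis-to-instrument grouping (`drum_bases`) is exactly the assumed prior knowledge, hard-coded here for illustration.

```python
# NMF on a magnitude spectrogram, then reconstruct one instrument (minimal sketch).
import numpy as np
from sklearn.decomposition import NMF

V = np.abs(np.random.randn(513, 400))       # stand-in for |STFT| of the mixture

nmf = NMF(n_components=8, init="random", max_iter=300, random_state=0)
W = nmf.fit_transform(V)                     # (freq, bases): spectral shapes
H = nmf.components_                          # (bases, time): activations

drum_bases = [0, 1]                          # hypothetical: bases matched to drum templates
V_drums = W[:, drum_bases] @ H[drum_bases]   # reconstruct only the drum part
mask = V_drums / (W @ H + 1e-8)              # soft (Wiener-like) mask for resynthesis
```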
Problem config 4 - music - repeats
• "Instrumental parts repeat!" (↔ vocals)
• "Drums/beats repeat!" (↔ harmonic instruments)
• A valid assumption for modern popular music
• E.g., REPET (IEEE TASLP 2013), KAM (D. Fano Yela, ICASSP 2017), ...
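A compressed sketch of the REPET idea, assuming the repeating period p (in frames) has already been estimated (e.g., from a beat spectrum): the median across repetitions models the repeating accompaniment.

```python
# REPET-style repeating-pattern mask (compressed sketch; p is assumed known).
import numpy as np

V = np.abs(np.random.randn(513, 600))        # stand-in magnitude spectrogram
p = 100                                       # assumed repeating period in frames
n = V.shape[1] // p

segments = V[:, :n * p].reshape(513, n, p)    # stack the n repetitions
repeating = np.median(segments, axis=1)       # median over repetitions = repeating part
rep_full = np.tile(repeating, (1, n))         # back to full length

# The repeating model can't exceed the mixture; the mask keeps the repeating part.
mask = np.minimum(rep_full, V[:, :n * p]) / (V[:, :n * p] + 1e-8)
```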
Problem config 5 - music - some musical cases
• "Central" (~= vocal) source separation
• Because main vocals are almost always panned to the centre (and we all love karaoke)
• Harmonic/percussive source separation (HPSS)
• Because they behave (almost) completely differently along the spectral/temporal axes
• Median filtering for drum separation (D. Fitzgerald, DAFx 2010)
"Gaussian mixture model for singing voice separation from stereophonic music", M. Kim et al., 2011
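A minimal sketch of the median-filtering HPSS recipe: harmonic content is smooth along time, percussive content along frequency; the filter lengths here are illustrative.

```python
# Median-filtering HPSS (minimal sketch).
import numpy as np
from scipy.ndimage import median_filter

V = np.abs(np.random.randn(513, 400))         # stand-in magnitude spectrogram

H = median_filter(V, size=(1, 17))             # median across time -> harmonic estimate
P = median_filter(V, size=(17, 1))             # median across frequency -> percussive

mask_h = H / (H + P + 1e-8)                    # soft masks from the two estimates
mask_p = P / (H + P + 1e-8)
# (librosa.decompose.hpss wraps essentially this recipe.)
```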
Problem config 6 - music - ‘informed’ source separation • Exploiting the score as side information “Score-Informed Source Separation for Musical Audio Recordings”, S Ewert et al., 2013
History so far...
• As time goes by: less generality, stronger assumptions
DEEP! LEARNING!!
DL and SSS
• Fewer assumptions (keep this in mind for the following papers!)
• Data-related: trained models do NOT extrapolate; e.g., a model trained on speech probably wouldn't work on music
• Model-related: e.g., frame-based? context-free? Does it estimate the phase? Stereo input?
Frame-based DL-SS
• Because vocals are distinguishable within a frame (or a few frames)
"Deep Learning for Monaural Speech Separation", Po-Sen Huang et al., 2014
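A toy PyTorch sketch of a frame-based separator in the spirit of this approach (sizes and names are illustrative, not Huang et al.'s exact architecture): one mixture frame in, one soft mask out.

```python
# Frame-wise soft-mask prediction with an MLP (toy sketch).
import torch
import torch.nn as nn

n_freq = 513

model = nn.Sequential(
    nn.Linear(n_freq, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, n_freq), nn.Sigmoid(),    # per-bin soft mask in [0, 1]
)

mix = torch.rand(32, n_freq)                   # a batch of mixture magnitude frames
target = torch.rand(32, n_freq)                # paired clean-source frames

mask = model(mix)
loss = nn.functional.mse_loss(mask * mix, target)  # reconstruct the target via masking
loss.backward()
```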
U-Net and SS
• Because vocals are distinguishable in the |STFT| "image"
"U-Net: Convolutional Networks for Biomedical Image Segmentation", O. Ronneberger et al., 2015
"Singing Voice Separation with Deep U-Net Convolutional Networks", A. Jansson et al., ISMIR 2017
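A much-reduced PyTorch sketch of the U-Net idea for spectrogram masking: strided-conv encoder, transposed-conv decoder, and a skip connection. The depth and channel counts are illustrative, not the configuration of Jansson et al.

```python
# Tiny U-Net-style masker for |STFT| patches (illustrative sketch).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1)  # 32 = 16 + 16 (skip)

    def forward(self, x):                       # x: (batch, 1, freq, time)
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2)
        d1 = self.dec1(torch.cat([d2, e1], dim=1))  # skip connection
        return torch.sigmoid(d1) * x            # predicted mask applied to the input

net = TinyUNet()
spec = torch.rand(4, 1, 512, 128)               # |STFT| patches (power-of-two sizes help)
vocal_estimate = net(spec)
```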
A practical limitation
• Supervised learning requires a *paired* dataset for such a system
• x: [mixtures], y: [instrumental mixtures; vocal tracks] (i.e., paired Inst 1 + Vocal 1, Inst 2 + Vocal 2, Inst 3 + Vocal 3, ...)
• → not sustainable
GANs and SS
• With an *unpaired*, weakly labelled dataset: {many instrumental tracks} (aka Real) and {many vocal + instrumental mixtures} (the separator's input; its output is aka Fake)
• We alternately show a GAN-based model {real instrumental tracks / vocal-removed ("fake") instrumental estimates} and let the model learn simultaneously i) to classify real vs. fake and ii) to fake an instrumental track, i.e., to remove the vocal
"Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction", D. Stoller et al., ICASSP 2018
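A schematic PyTorch sketch of this adversarial setup (the tiny networks and all hyperparameters are placeholders, not Stoller et al.'s recipe): the separator produces "fake" instrumentals from mixtures, the discriminator sees them against real instrumental tracks, and the two are updated alternately.

```python
# Alternating GAN updates for vocal removal (schematic sketch).
import torch
import torch.nn as nn

separator = nn.Sequential(nn.Linear(513, 513), nn.Sigmoid())   # placeholder separator (mask)
disc = nn.Sequential(nn.Linear(513, 1))                        # placeholder discriminator
opt_s = torch.optim.Adam(separator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

mixtures = torch.rand(32, 513)          # unpaired: vocal + instrumental mixtures
real_inst = torch.rand(32, 513)         # unpaired: instrumental-only tracks

fake_inst = separator(mixtures) * mixtures   # separator's instrumental estimate

# i) discriminator learns to classify real vs. separator-made ("fake") instrumentals
d_loss = bce(disc(real_inst), torch.ones(32, 1)) + \
         bce(disc(fake_inst.detach()), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# ii) separator learns to fool the discriminator, i.e., to remove the vocal
g_loss = bce(disc(fake_inst), torch.ones(32, 1))
opt_s.zero_grad()
g_loss.backward()
opt_s.step()
```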
Further study
• A great SS tutorial: http://ismir2010.ismir.net/proceedings/tutorial_1_Vincent-Ono.pdf
Further me • keunwoochoi.wordpress.com • keunwoochoi.blogspot.com • groovo.io • spotify.com • http://c4dm.eecs.qmul.ac.uk