Adaptive Filtering for Music/Voice Separation Exploiting the Repeating Musical Structure

  1. Adaptive Filtering for Music/Voice Separation Exploiting the Repeating Musical Structure: Adaptive REPET. Antoine Liutkus (1), Zafar Rafii (2), Roland Badeau (1), Bryan Pardo (2), Gaël Richard (1). (1) Telecom ParisTech, CNRS LTCI, Paris, France; (2) Northwestern University, EECS Department, Evanston, USA. Outline: Time-Frequency masking, Fixed patterns, Varying patterns, Demonstration. Liutkus, Rafii et al., Adaptive REPET, ICASSP 2012, Kyoto, Japan.

  2. Notation. Source separation notation: voice signal v with voice spectrogram V; background signal b with background spectrogram B; mix signal x with mix spectrogram X. [Figure: waveforms (amplitude vs. time in s) and spectrograms (frequency in Hz vs. frame) of the voice, the background, and the mix.]

  3. Notation: separation as an adaptive filter. Separating a source = filtering the mixture. Time-varying filter w_t: different for each frame t. Element-wise weighting of the STFT. Here: W ∈ [0, 1]. [Figure: time-frequency mask W, mix spectrogram X, and weighted mix spectrogram W .* X; frequency (Hz) vs. frame.]
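The element-wise weighting W .* X can be illustrated with a minimal sketch. It assumes the spectrogram and the mask are stored as nested lists indexed [frequency bin][frame]; the function name `apply_mask` and the toy values are illustrative and do not come from the slides.

```python
def apply_mask(W, X):
    """Element-wise product W .* X of a mask and a magnitude spectrogram.

    Both arguments are nested lists indexed [frequency bin][frame]
    (an assumed layout) and must have the same shape.
    """
    if len(W) != len(X) or any(len(w) != len(x) for w, x in zip(W, X)):
        raise ValueError("mask and spectrogram must have the same shape")
    return [[w * x for w, x in zip(w_row, x_row)]
            for w_row, x_row in zip(W, X)]


# Toy mix spectrogram with 2 frequency bins and 3 frames.
X = [[4.0, 2.0, 1.0],
     [0.5, 3.0, 6.0]]
# Mask values in [0, 1]: 1 keeps a bin, 0 discards it.
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.25]]
masked = apply_mask(W, X)
```

In practice the mask would be applied to the complex STFT of the mixture, so that the filtered signal can be resynthesized with an inverse STFT.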

  4. Time-frequency masks: interpretation. W(f, t) ∈ [0, 1]: proportion of the source of interest in the mix. W(f, t) ≈ 1 ⇒ TF bin (f, t) mostly comes from the source of interest. W(f, t) ≈ 0 ⇒ TF bin (f, t) mostly comes from other sources. Comb filter: given a pitch contour f0(t), keep multiples of f0(t). [Figure: time-varying comb filter; frequency (Hz) vs. frame.]
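A time-varying comb filter of this kind can be sketched as a binary mask. The sketch below is an illustration under stated assumptions: bins are linearly spaced `bin_hz` apart, a bin is kept when it lies within `half_width_hz` of a harmonic, and a non-positive f0 marks a frame with no pitch. The function name and all parameters are hypothetical.

```python
def comb_mask(f0_per_frame, n_bins, bin_hz, half_width_hz):
    """Binary mask indexed [bin][frame] keeping bins near multiples of f0(t)."""
    n_frames = len(f0_per_frame)
    mask = [[0.0] * n_frames for _ in range(n_bins)]
    for t, f0 in enumerate(f0_per_frame):
        if f0 <= 0:  # assumed convention: no pitch in this frame, keep nothing
            continue
        for b in range(n_bins):
            freq = b * bin_hz
            harmonic = round(freq / f0)  # index of the nearest multiple of f0
            if harmonic >= 1 and abs(freq - harmonic * f0) <= half_width_hz:
                mask[b][t] = 1.0
    return mask


# Pitch contour over 3 frames: 100 Hz, unvoiced, 150 Hz.
mask = comb_mask([100.0, 0.0, 150.0], n_bins=8, bin_hz=50.0, half_width_hz=10.0)
```

With these toy values, the first frame keeps the bins at 100, 200 and 300 Hz, the second frame keeps nothing, and the third keeps the bins at 150 and 300 Hz.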

  5. Time-frequency masks: beyond the harmonic model, modeling the accompaniment. Most studies focus on harmonic voice models: the voice is assumed harmonic, its predominant pitch is estimated, and filtering is done e.g. through comb filters. Problems: breathy voices? Consonants? Loud accompaniment? We focus on a model for the background B!

  6. Time-frequency masks: filtering given the model, from B to the mask. Mask from B alone: imagine X and B are available. What is W? X(f, t) close to B(f, t) → W(f, t) ≈ 1. X(f, t) far from B(f, t) → W(f, t) ≈ 0. Binary mask: 0 or 1 based on a thresholding of B/X. Soft mask: $W(f,t) = \exp\left(-\frac{(\log X(f,t) - \log B(f,t))^2}{\lambda^2}\right)$
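Both masks can be sketched as follows. The soft mask follows the formula on the slide, with `lam` standing for λ. The binary rule (keep a bin when B/X reaches `threshold`) is only one possible reading of "thresholding", and the small `eps` guarding against log(0) and division by zero is an added assumption. Function names are illustrative.

```python
import math


def soft_mask(X, B, lam, eps=1e-12):
    """W(f,t) = exp(-(log X(f,t) - log B(f,t))^2 / lam^2), indexed [bin][frame]."""
    return [[math.exp(-((math.log(x + eps) - math.log(b + eps)) ** 2) / lam ** 2)
             for x, b in zip(x_row, b_row)]
            for x_row, b_row in zip(X, B)]


def binary_mask(X, B, threshold=0.5, eps=1e-12):
    """1 where the ratio B/X reaches `threshold` (an assumed rule), else 0."""
    return [[1.0 if b / (x + eps) >= threshold else 0.0
             for x, b in zip(x_row, b_row)]
            for x_row, b_row in zip(X, B)]


# One frequency bin, two frames: the mix equals the background in the first
# frame and is e times larger in the second (log-distance of 1).
X = [[1.0, math.e]]
B = [[1.0, 1.0]]
W_soft = soft_mask(X, B, lam=1.0)
W_bin = binary_mask(X, B, threshold=0.5)
```

A small λ makes the soft mask selective (it drops quickly as X departs from B), while a large λ lets more of the mixture through.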

  7. REPET: repeating patterns in music, modeling B. Musical background is repetitive! [Figure: background spectrogram B and its repeating pattern of period T; frequency (Hz) vs. frames.] Given several repetitions, average to estimate B and filter it out!

  8. REpeating Pattern Extraction Technique (REPET). Original REPET algorithm: (1) estimate a fixed repeating period T; (2) estimate the fixed repeating pattern through averaging; (3) compute W as a binary mask.
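Steps 2 and 3 can be sketched as below, assuming the period T (in frames) has already been estimated in step 1. Two choices are assumptions and not taken from the slides: the arithmetic mean as the averaging rule, and a binary rule that keeps a bin when the log-magnitude of the mix lies within `tol` of the pattern. All names are hypothetical.

```python
import math


def repeating_pattern(X, T):
    """Average the frames of X (indexed [bin][frame]) sharing the same phase modulo T."""
    if not 1 <= T <= len(X[0]):
        raise ValueError("T must be between 1 and the number of frames")
    return [[sum(row[phase::T]) / len(row[phase::T]) for phase in range(T)]
            for row in X]


def repet_binary_mask(X, T, tol, eps=1e-12):
    """Background mask: 1 where the mix is within `tol` (in log) of the pattern."""
    pattern = repeating_pattern(X, T)
    return [[1.0 if abs(math.log(x + eps) - math.log(pat[t % T] + eps)) <= tol
             else 0.0
             for t, x in enumerate(row)]
            for row, pat in zip(X, pattern)]


# One frequency bin, period of 2 frames; the last frame carries extra energy
# (e.g. a voice) on top of the repeating background.
X = [[1.0, 4.0, 1.0, 4.0, 1.0, 4.0, 1.0, 16.0]]
pattern = repeating_pattern(X, T=2)
mask = repet_binary_mask(X, T=2, tol=0.7)
```

In this toy case the loud last frame pulls the averaged pattern up from 4 to 7, which shows how sensitive averaging is to frames where another source is active.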

  9. Advantages and limitations of REPET. Advantages: fast; efficient for constant rhythmic patterns (electro, short excerpts). Limitations: the repeating pattern changes over time; binary masking leads to artifacts. We extend REPET to varying repeating patterns.

  10. Time-varying period: pseudo-periodic patterns. Patterns are not fixed: the period may vary; the pattern may vary. Frequency bands of B are assumed pseudo-periodic, with the same period. [Figure: background spectrogram B (frequency in Hz vs. frame) and the log-value of three of its frequency bands (bands 20, 40 and 60).]

  11. Time-varying period: beat-spectrum estimation. Estimating the period (1/2): perform a short-term analysis of each band; add them all together. Beat spectrogram: rhythmic content of the signal. [Figure: mix spectrogram X (frequency in Hz vs. frame); spectrograms of bands 20, 401 and 1001 and the resulting beat spectrogram (frequency in 1/frame vs. bag of frames).]

  12. Time-varying period: pseudo-period estimation. Compute the beat spectrogram. Estimate the time-varying repeating period. Any frequency-based pitch detector will do!
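The two steps can be sketched as follows. The sketch takes "short-term analysis" to mean a windowed power spectrum of each band's time series, computed with a naive DFT, and it uses the simplest frequency-based detector: the strongest non-zero rhythmic frequency in each window. The names `win` and `hop`, the mean removal, and the peak-picking rule are all assumptions made for illustration.

```python
import cmath
import math


def beat_spectrogram(X, win, hop):
    """Sum over bands of the short-term power spectra of each band of X.

    X is indexed [bin][frame]. The result is indexed [window][k], where k
    stands for a rhythmic frequency of k / win cycles per frame.
    """
    n_frames = len(X[0])
    n_freqs = win // 2 + 1
    beat = []
    for start in range(0, n_frames - win + 1, hop):
        spectrum = [0.0] * n_freqs
        for row in X:
            segment = row[start:start + win]
            mean = sum(segment) / win  # remove the constant offset of the band
            for k in range(n_freqs):
                coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * n / win)
                            for n, v in enumerate(segment))
                spectrum[k] += abs(coeff) ** 2
        beat.append(spectrum)
    return beat


def period_track(beat, win):
    """Repeating period (in frames) per window, from the strongest k >= 1."""
    periods = []
    for spectrum in beat:
        k_peak = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
        periods.append(win / k_peak)
    return periods


# Two bands that both repeat every 4 frames, with different phases.
frames = range(32)
X = [[1.0 + math.cos(2 * math.pi * n / 4) for n in frames],
     [1.0 + math.sin(2 * math.pi * n / 4) for n in frames]]
beat = beat_spectrogram(X, win=16, hop=8)
periods = period_track(beat, win=16)
```

Picking the single strongest peak is fragile when harmonics of the rhythmic frequency dominate; a more robust pitch detector could be substituted without changing the rest of the pipeline.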

  13. Model and estimation: background model given T0(t). Background model: ∀ t, the accompaniment is periodic for 2K periods around t: $B(f,t) = B(f, t + k\,T_0(t)), \quad k = -K, \dots, K$. [Figure: background spectrogram; frequency (Hz) vs. frame.]

  14. Model and estimation: voice model. The voice V is assumed to be sparse. [Figure: voice spectrogram V; frequency (Hz) vs. frame.]

  15. Model and estimation: background estimation. Estimation of B given X and T0(t). Sparsity of V: most of the time, V ≈ 0 ⇒ X ≈ B. Sometimes, V is active ⇒ outliers. [Figure: mix spectrogram X; frequency (Hz) vs. frame.] $\hat{B}(f,t) = \operatorname{median}_{k=-K,\dots,K}\left[X(f, t + k\,T_0(t))\right]$
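The median estimator can be sketched directly from the formula. Assumptions in this sketch: X is a nested list indexed [bin][frame], `periods[t]` holds T0(t) in frames and is rounded to an integer, and frames falling outside the recording are skipped. The function name is illustrative.

```python
import statistics


def estimate_background(X, periods, K):
    """B_hat(f,t) = median over k = -K..K of X(f, t + k*T0(t)).

    Frames outside the recording are skipped (an assumed boundary rule);
    k = 0 is always in range, so the median is always defined.
    """
    n_frames = len(X[0])
    B_hat = []
    for row in X:
        b_row = []
        for t in range(n_frames):
            T0 = max(1, int(round(periods[t])))
            values = [row[t + k * T0] for k in range(-K, K + 1)
                      if 0 <= t + k * T0 < n_frames]
            b_row.append(statistics.median(values))
        B_hat.append(b_row)
    return B_hat


# One frequency bin whose background repeats every 2 frames (1, 4, 1, 4, ...);
# a sparse voice adds energy at frame 3 only.
X = [[1.0, 4.0, 1.0, 14.0, 1.0, 4.0, 1.0, 4.0]]
periods = [2.0] * 8
B_hat = estimate_background(X, periods, K=2)
```

Because the voice is active in few frames, it appears as an outlier among the repetitions, and the median discards it where a mean would be pulled upward.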

  16. Model and estimation: Adaptive REPET. [Figure: block diagram of Adaptive REPET.]

  17. Demonstration. Demonstration on different musical genres.

  18. Conclusion. Adaptive algorithms for complete recordings. Fast (approx. reading time). Extensions: from repetitivity to self-similarity.
