 
Improving melody extraction using Probabilistic Latent Component Analysis
Jinyu Han (1), Ching-Wei Chen (2)
(1) Interactive Audio Lab, Northwestern University, USA
(2) Media Technology Lab, Gracenote, Inc
May 19, 2011
Jinyu Han (Gracenote, Inc) | Melody extraction by PLCA | May 19, 2011
Agenda
1. Introduction
2. Modeling the Spectrogram
   Multinomial Model
   Probabilistic Latent Component Analysis
3. System Description
4. Experiment Results
   Illustration Example
   System Comparison
5. Conclusion
Introduction
Pick only the singing voice as the melody.
Introduction
System Overview
[Figure: system block diagram. Block labels: Audio Signal; Singing Voice Detection; Non-Vocal Segments; Vocal Segments; Accompaniment Model Training; Accompaniment Model; Accompaniment Reduction; Accompaniment-suppressed Signal; Pitch Estimation; Melody.]
Modeling the Spectrogram
Multinomial Model
Multinomial Distribution for Spectrogram
Figure: Probability distribution underlying the t-th spectrum. [Panel labels: Magnitude Spectrogram; Spectrum of frame t. Axis labels: Time, Frequency, Amplitude.]
- Treat the spectrum in each time slice as a histogram.
- Treat the histogram as a probability distribution.
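The two steps on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `spectrogram_to_distributions`, the toy spectrogram, and the `eps` guard against silent frames are all assumptions made for the example.

```python
import numpy as np

def spectrogram_to_distributions(mag, eps=1e-12):
    """Normalize each time frame (column) of a magnitude spectrogram to sum to 1.

    Each column is first read as a histogram over frequency bins; dividing by
    its total turns that histogram into a probability distribution P_t(f).
    """
    mag = np.asarray(mag, dtype=float)
    totals = mag.sum(axis=0, keepdims=True)   # total energy per frame
    return mag / np.maximum(totals, eps)      # eps avoids division by zero

# Toy magnitude spectrogram (hypothetical): 5 frequency bins x 3 frames.
rng = np.random.default_rng(0)
V = rng.random((5, 3))
P = spectrogram_to_distributions(V)
```

Each column of `P` is non-negative and sums to one, so it can be treated as a multinomial distribution over frequency bins for that frame.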
Modeling the Spectrogram
Multinomial Model
Multinomial Distribution for Spectrogram
[Figure: Magnitude Spectrogram with the spectrum of different frames. Axis labels: Time, Frequency, Amplitude.]
Modeling the Spectrogram
Probabilistic Latent Component Analysis
[Figure: PLCA decomposition of a spectrogram. Labels: spectrum of a frame, P_t(f); a dictionary of spectral vectors P(f|z_1), P(f|z_2), P(f|z_3), P(f|z_4); mixture weights of z_1, z_2, z_3, z_4; observed data in the spectrogram; latent components. Estimated by the Expectation-Maximization algorithm.]
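A minimal sketch of PLCA fitted by Expectation-Maximization may help here. It assumes the usual PLCA factorization P_t(f) ≈ sum over z of P_t(z) P(f|z), where the columns of `W` hold the spectral vectors P(f|z) and the columns of `H` hold the per-frame mixture weights P_t(z). The function name `plca`, its parameters, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def plca(V, n_components, n_iter=100, seed=0, eps=1e-12):
    """Fit P_t(f) ~ sum_z P_t(z) P(f|z) to a non-negative spectrogram V by EM.

    Returns W (n_freq x n_components, columns are P(f|z)) and
    H (n_components x n_frames, columns are P_t(z)).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_components, n_frames))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step folded into the update: R[f, t] = V[f, t] / model[f, t],
        # so W * (R @ H.T) equals sum_t V[f, t] * P_t(z | f).
        R = V / (W @ H + eps)
        W_upd = W * (R @ H.T)
        H_upd = H * (W.T @ R)
        # M-step: renormalize so every column is a probability distribution.
        W = W_upd / np.maximum(W_upd.sum(axis=0, keepdims=True), eps)
        H = H_upd / np.maximum(H_upd.sum(axis=0, keepdims=True), eps)
    return W, H

# Toy data (hypothetical): 6 frequency bins x 8 frames, columns normalized.
rng = np.random.default_rng(1)
V = rng.random((6, 8))
V /= V.sum(axis=0, keepdims=True)
W, H = plca(V, n_components=2)
```

The product `W @ H` is the model's approximation of the normalized spectrogram; the number of latent components is a free choice in this sketch.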
System Description
System Overview
[Figure: the system block diagram from the Introduction, repeated.]
System Description
Train P_nv(f|z) from the non-vocal segment
[Figure labels: Identified Non-vocal Segment; PLCA Training; Spectral Vectors for Accompaniment, P_nv(f|z). Axis label: Frequency.]
System Description
Extract singing voice in the mixture
[Figure labels: Identified Vocal Segment; PLCA Training with P_nv(f|z) fixed; newly trained P_v(f|z); fixed P_nv(f|z). Axis label: Frequency.]
System Description
Extract singing voice in the mixture
[Figure labels: newly trained P_v(f|z); fixed P_nv(f|z). Axis label: Frequency.]
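The idea of training new components while keeping P_nv(f|z) fixed can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, and in particular the soft-mask reconstruction in `suppress_accompaniment` are hypothetical, since the slides do not spell out how the accompaniment-suppressed signal is rebuilt.

```python
import numpy as np

def plca_fixed_bases(V, W_fixed, n_new, n_iter=100, seed=0, eps=1e-12):
    """EM for PLCA where the bases in W_fixed (e.g. P_nv(f|z)) are never updated.

    Only the n_new extra bases (e.g. P_v(f|z)) and all mixture weights are learned.
    Returns W_new (n_freq x n_new) and H ((k_fixed + n_new) x n_frames).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_fixed = W_fixed.shape[1]
    W_new = rng.random((n_freq, n_new))
    W_new /= W_new.sum(axis=0, keepdims=True)
    H = rng.random((k_fixed + n_new, n_frames))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = np.hstack([W_fixed, W_new])
        R = V / (W @ H + eps)
        W_upd = W_new * (R @ H[k_fixed:].T)   # update the new bases only
        H_upd = H * (W.T @ R)                 # update weights of all bases
        W_new = W_upd / np.maximum(W_upd.sum(axis=0, keepdims=True), eps)
        H = H_upd / np.maximum(H_upd.sum(axis=0, keepdims=True), eps)
    return W_new, H

def suppress_accompaniment(V, W_fixed, W_new, H, eps=1e-12):
    """Keep the share of V explained by the new bases (assumed soft mask)."""
    k_fixed = W_fixed.shape[1]
    voice = W_new @ H[k_fixed:]
    total = np.hstack([W_fixed, W_new]) @ H
    return V * voice / (total + eps)

# Toy data (hypothetical): 6 frequency bins, 2 fixed accompaniment bases, 4 frames.
rng = np.random.default_rng(2)
W_nv = rng.random((6, 2))
W_nv /= W_nv.sum(axis=0, keepdims=True)
V = rng.random((6, 4))
W_v, H = plca_fixed_bases(V, W_nv, n_new=2)
V_voice = suppress_accompaniment(V, W_nv, W_v, H)
```

Because the fixed bases already account for the accompaniment, the newly trained bases are pushed toward whatever the accompaniment model cannot explain, which in a vocal segment is mostly the singing voice.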
Experiment Results
Illustration Example
[Figure: a clip of "Simple Man" by Lynyrd Skynyrd. Panels: Mixture; Extracted Singing Voice (Extracted Voice); Original Singing Voice (Clean Voice), each with its Melody Line. Axis labels: Time, Frequency.]
Experiment Results
System Comparison
Compare our system to DHP [1] and LW [2].

            Precision  Recall  F-measure  Accuracy
DHP         0.48       0.50    0.48       0.52
LW          0.09       0.086   0.09       0.19
Proposed    0.43       0.80    0.55       0.61

Data: parts of the MIREX 2005 dataset, 9 recordings totalling about 270 seconds of audio.

[1] Z. Duan, J. Han, and B. Pardo, “Harmonically informed pitch tracking”, in Proc. ISMIR, 2009.
[2] Y. Li and D. Wang, “Separation of singing voice from music accompaniment for monaural recordings”, IEEE Trans. Audio, Speech, and Language
Conclusion
- The Probabilistic Latent Variable Model is introduced to model the accompaniment and lead vocal adaptively.
- Experimental results show that the melody of the singing voice in mixture audio is successfully extracted to some extent.
- Future directions include improving the vocal/non-vocal segmentation module and the pitch estimation algorithm.
Conclusion
Acknowledgement
The first author performed this work with Ching-Wei Chen while at the Gracenote Media Technology Lab. We thank Markus Cremer, Bob Coover, Phillip Popp, Trista Chen, and Peter Dunker for enlightening discussions. The authors would like to thank the reviewers for their comments that helped improve the paper. We also want to thank Bryan Pardo, David Little, Zhiyao Duan, Zafar Rafii, and Mark Cartwright for their suggestions that improved the presentation.