Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals


  1. Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals
     Jean-Louis Durrieu, PhD candidate, TSI Department, Telecom ParisTech
     http://perso.telecom-paristech.fr/durrieu/en/
     07/09/09

  2. Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals (Outline)
     1. Introduction
     2. Signal Models
     3. Transcription of the Melody
     4. “Solo/Accompaniment” Separation

  3. Outline (section divider)
     1. Introduction
     2. Signal Models
     3. Transcription of the Melody
     4. “Solo/Accompaniment” Separation

  4. Introduction
     • Blind Audio Source Separation (BASS) for music and Music Information Retrieval (MIR): inter-related fields,
     • Polyphonic music recordings: a BASS/MIR hybrid approach to main-melody transcription/separation,
     • Applications.

  5. Introduction: Link between BASS and MIR
     • MIR approaches: perceptually motivated, knowledge-driven, “high-level” (semantic level); outputs: transcription, indexing,
     • BASS approaches: based on models, data-driven, “low-level” (signal level); output: separated musical sources,
     • Both “break” music into “atomic” elements.

  6. Introduction: Bridging the BASS/MIR “gap”
     • Improving BASS with MIR, and MIR with BASS,
     • Example: two-instrument transcription/separation, alternating MIR → BASS → MIR steps,
     • Hybrid approaches:
       - E. Vincent, “Musical Source Separation Using Time-Frequency Source Priors”, IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 1,
       - Singing voice signals?

  7. Introduction: Main Melody Transcription, Main Instrument Separation
     • Definitions:
       - [MIREX] “Audio Melody Extraction”: extract the main melody from polyphonic audio signals,
       - [Paiva2007]: “[The main] melody is the dominant individual pitched line in a musical ensemble”.
     • Two tasks addressed:
       - Main melody transcription: identify and transcribe the sequence of fundamental frequencies played by the main instrument in a polyphonic music signal (mono or stereo),
       - Main instrument/accompaniment separation: separate the instrument playing the main melody from the other accompaniment instruments.

  8. Introduction: Applications
     • Transcribed melody:
       - Indexing large music databases,
       - Musical transcription into a “human-readable” score,
       - ...
     • Separating the main instrument from the accompaniment:
       - Generating accompaniments for solo performers,
       - Pre-processing for MIR applications (chord detection, instrument classification, etc.),
       - ...

  9. Introduction: Presentation Outline
     • Signal models: source/filter model for the main instrument, NMF for the other instruments; estimation algorithm for the corresponding parameters,
     • Melody transcription: Viterbi smoothing of the melody sequence,
     • Main instrument/accompaniment separation (also referred to as solo/accompaniment separation): Wiener filters to estimate the separated sources,
     • Conclusion/Discussion.

  10. Introduction: System Outline (ICASSP09)

  11. Introduction: Contributors at Telecom ParisTech
      • Supervisors: Bertrand DAVID, Gaël RICHARD.
      • Team members: Nancy BERTIN, Cédric FEVOTTE, Alexey OZEROV, and all the other Audiosig project team members...

  12. Outline (section divider)
      1. Introduction
      2. Signal Models
      3. Transcription of the Melody
      4. “Solo/Accompaniment” Separation

  13. Signal Models
      • Audio signals:
        - Time-frequency representation,
        - Statistical modeling.
      • Mixture model,
      • Source/filter model for the main instrument:
        - Motivations,
        - Characterizing the main-melody instrument.
      • NMF-based model for the accompaniment:
        - Decomposition on a limited dictionary,
        - Link between NMF and our framework.
      • Parameter estimation:
        - NMF-like algorithm: multiplicative gradient approach.

  14. Signal Models: Time-Frequency Representation
      • Digital audio: waveform,
      • Time-frequency representation:
        - Evolution of the frequency content over time,
        - Human auditory system.
      • Short-Time Fourier Transform (STFT); a small computation sketch follows below.
      [Figure: spectrogram, frequency (Hz) vs. time (s)]
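
As an illustration of this observation, here is a minimal Python sketch that computes the STFT power spectrogram of a mixture. The file name, window and frame sizes are assumptions for the sketch, not the settings used in the presented system.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

# Hypothetical input file; frame length and hop are illustrative choices.
fs, x = wavfile.read("mixture.wav")
x = x.astype(np.float64)
if x.ndim > 1:          # fold stereo to mono for this sketch
    x = x.mean(axis=1)

# Short-Time Fourier Transform: frequency content as it evolves over time.
f, t, X = stft(x, fs=fs, window="hann", nperseg=2048, noverlap=1536)

# Power spectrogram |x(f, n)|^2: the observation the Gaussian model is fitted to.
S_X = np.abs(X) ** 2
```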

  15. Signal Models: Complex Proper Gaussians
      • Model for the complex spectrum: each STFT coefficient is a proper complex Gaussian random variable,
      • Independence across time frames and frequency bins,
      • For stationary processes, the variance at each bin equals the power spectral density (PSD),
      • The variances are gathered in a variance/PSD matrix.
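
In hedged notation (the symbols are mine, following the usual formulation of this kind of model), the slide's Gaussian assumption can be written as:

```latex
% Each STFT coefficient is a proper (circularly symmetric) complex Gaussian,
% independent across frames n and frequency bins f, with variance equal to
% the PSD value at (f, n).
x(f,n) \sim \mathcal{N}_c\bigl(0,\ \sigma_X^2(f,n)\bigr),
\qquad
p\bigl(\{x(f,n)\}\bigr) = \prod_{f,n} \frac{1}{\pi\,\sigma_X^2(f,n)}
  \exp\!\left(-\frac{|x(f,n)|^2}{\sigma_X^2(f,n)}\right)
```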

  16. Signal Models: Mixture Model
      • Mixture = Solo (voice, V) + Accompaniment (music, M),
      • Each signal is centered-Gaussian, with its own variances; V and M are independent, so the mixture variance is the sum of the two,
      • Main instrument variance: source/filter model; accompaniment variance: NMF decomposition of the power spectrum.
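
A worked form of this mixture model, in the same assumed notation:

```latex
% The mixture STFT is the sum of the solo (voice) and accompaniment (music)
% STFTs; with V and M independent and zero-mean Gaussian, the variances/PSDs add.
x(f,n) = v(f,n) + m(f,n),
\qquad
\sigma_X^2(f,n) = \sigma_V^2(f,n) + \sigma_M^2(f,n)
```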

  18. Signal Models: Source/Filter Model for the Main Instrument
      • Motivations:
        - The singing voice is often the main instrument,
        - The source/filter model is widely used and suitable for a wide range of other instruments,
        - It separately models pitched aspects (source) and timbre aspects (filter).
      [Figure: human vocal tract (from Wikipedia)]

  19. Signal Models: Source/Filter Principle
      [Figure: (glottal) source spectrum and (vocal tract) filter response combining into the resulting spectrum, all plotted against frequency (Hz)]

  20. Signal Models: Source/Filter Variability
      [Figure: spectrogram of a vocal signal (by Tamy, from the MTG MASS database), frequency (Hz) vs. time (s)]

  21. Signal Models: Source/Filter Variability
      • Human singer:
        - Independent evolution of pitches and filters (vowels),
        - Continuous pitch variations,
        - Limited set of vowels (smooth filters),
        - Unvoiced parts...
      • Proposed model for the main instrument (a sketch of the F0 grid follows below):
        - Discrete range of possible fundamental frequencies F0 for the voiced source component, log-spaced with 96 values per octave,
        - Limited number of “smooth” filters,
        - Unvoiced source component integrated later in the estimation process.
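
A minimal sketch of the log-spaced F0 grid described above. The frequency range chosen here (80–800 Hz) is an assumption for illustration; only the 96-steps-per-octave spacing comes from the slide.

```python
import numpy as np

# Candidate fundamental frequencies, log-spaced with 96 steps per octave.
f0_min, f0_max, steps_per_octave = 80.0, 800.0, 96
n_octaves = np.log2(f0_max / f0_min)
n_f0 = int(np.floor(n_octaves * steps_per_octave)) + 1
f0_grid = f0_min * 2.0 ** (np.arange(n_f0) / steps_per_octave)
```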

  22. Signal Models: Source Component (1/2)
      • Voiced source component:
        - KLGLOTT88 (glottal source) model [Klatt90]: a dictionary of spectral combs, one per candidate “note” (pitch),
        - For each frequency bin and candidate pitch: a fixed power spectrum (dictionary element),
        - For each candidate pitch and frame: a nonnegative activation coefficient,
        - The source power spectrum is a nonnegative combination of the elements of the dictionary (see the worked equation below).
      • Unvoiced source:
        - The dictionary also includes an “unvoiced” component,
        - Its activation coefficient is estimated only after the filter part.
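
One way to write the voiced source part, assuming the usual dictionary/activation notation (the names W_F0 and H_F0 are mine, not necessarily the presentation's):

```latex
% W_{F0}(f,u): fixed spectral-comb dictionary, one comb per candidate pitch u,
% built from the KLGLOTT88 model; H_{F0}(u,n): nonnegative activations at frame n.
\sigma_{F0}^2(f,n) = \sum_{u=1}^{U} W_{F0}(f,u)\, H_{F0}(u,n),
\qquad W_{F0} \ge 0,\ H_{F0} \ge 0
```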

  23. Signal Models: Source Component (2/2)
      [Figure: spectral-comb dictionary (frequency (Hz) vs. F0 number) and activations (F0 number vs. time (s))]

  24. Signal Models: Filter Component (1/2)
      • Filter component:
        - A dictionary of filters,
        - For each frequency bin and filter number: a frequency response (dictionary element),
        - For each filter and frame: a nonnegative activation,
        - The filter part of the power spectrum is a combination of these responses.
      • Filter smoothness:
        - Each filter response is itself decomposed on a spectral dictionary of smooth “atomic” elements, with nonnegative activations (see the formulation below).
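
A hedged formulation of the filter component, with assumed names W_Phi, H_Phi for the filter responses and activations, and W_Gamma, H_Gamma for the smooth atoms and their weights:

```latex
% K filter frequency responses W_Phi(f,k), activated per frame by H_Phi(k,n);
% each response is a nonnegative combination of P smooth "atomic" spectral
% elements W_Gamma(f,p), enforcing filter smoothness.
\sigma_{\Phi}^2(f,n) = \sum_{k=1}^{K} W_{\Phi}(f,k)\, H_{\Phi}(k,n),
\qquad
W_{\Phi}(f,k) = \sum_{p=1}^{P} W_{\Gamma}(f,p)\, H_{\Gamma}(p,k)
```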

  25. Signal Models: Filter Component (2/2)
      [Figure: smooth filter dictionary (frequency responses, frequency (Hz)) and filter activations (filter index p vs. time (s))]

  26. Signal Models: Source/Filter Summary
      • The source contribution and the filter contribution combine into the main instrument's contribution to the mixture power spectrum (see the summary equation below),
      • Parameters:
        - Fixed parameters: the source and smooth-filter dictionaries,
        - To estimate: the activation coefficients.
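
A summary equation in the same assumed notation as above:

```latex
% The main-instrument PSD is the product of the filter and source contributions;
% the dictionaries W_{F0} and W_Gamma are fixed, the activations are estimated.
\sigma_V^2(f,n) = \sigma_{\Phi}^2(f,n)\,\sigma_{F0}^2(f,n)
```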

  27. Signal Models: Mixture Model (recalled from slide 16: the solo/voice is modeled by the source/filter model, the accompaniment/music by an NMF decomposition of the power spectrum).

  28. Signal Models: Accompaniment (1/2)
      • Accompaniment/background music component:
        - A power-spectrum dictionary with a limited number of elements,
        - An activation matrix,
        - The accompaniment power spectrum is a nonnegative combination of the elements of the dictionary.
      • Equivalence [Fevotte09]:
        - Maximum-likelihood (ML) estimation of the dictionary and activations under the Gaussian model is equivalent to
        - NMF minimizing the Itakura-Saito divergence between the observed power spectrum and the matrix product dictionary × activations (see the divergence written out below).
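
The accompaniment model and the stated equivalence, in assumed notation (W_M, H_M are my names for the accompaniment dictionary and activations):

```latex
% NMF of the accompaniment PSD, and the Itakura-Saito divergence whose
% minimisation matches maximum-likelihood estimation under the Gaussian
% model [Fevotte09].
\sigma_M^2(f,n) = \sum_{r=1}^{R} W_M(f,r)\, H_M(r,n),
\qquad
d_{IS}(x \mid y) = \frac{x}{y} - \log\frac{x}{y} - 1
```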

  29. Signal Models: Accompaniment (2/2)
      [Figure: accompaniment dictionary (frequency (Hz) vs. component index r) and activations (component index r vs. time (s))]

  30. Signal Models: Mixture Model Summary
      • Mixture variance/PSD matrix: the main-instrument part (source/filter) plus the accompaniment part (NMF); see the combined expression below,
      • Parameters:
        - Fixed parameters: the source and smooth-filter dictionaries,
        - To be estimated: the activation matrices and the accompaniment dictionary.
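
The combined mixture PSD in the assumed notation used above:

```latex
% Source/filter product for the main instrument plus the NMF part for the
% accompaniment: this is the variance fitted to the observed power spectrogram.
\sigma_X^2(f,n)
  = \underbrace{\sigma_{\Phi}^2(f,n)\,\sigma_{F0}^2(f,n)}_{\text{main instrument}}
  + \underbrace{\textstyle\sum_{r} W_M(f,r)\, H_M(r,n)}_{\text{accompaniment}}
```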

  31. Signal Models: Parameter Estimation
      • Maximum-likelihood (ML) estimation:
        - Log-likelihood of the observations under the complex Gaussian model,
        - With the variance parameterized as above (source/filter plus NMF).
      • NMF-inspired algorithm:
        - Maximizing the likelihood amounts to minimizing the Itakura-Saito divergence between the observed power spectrum and the parameterized variance,
        - Multiplicative updates for parameter estimation (an illustrative sketch follows below).
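
To illustrate the kind of multiplicative updates referred to here, this is a minimal sketch of plain IS-NMF (not the full source/filter model of the presented system); the rank, iteration count and initialization are assumptions.

```python
import numpy as np

def is_nmf(V, R=30, n_iter=100, eps=1e-10, seed=None):
    """Sketch of NMF with multiplicative updates for the Itakura-Saito
    divergence, applied to a nonnegative power spectrogram V (F x N)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, R)) + eps   # power-spectrum dictionary
    H = rng.random((R, N)) + eps   # activation matrix
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # Multiplicative update for the activations H.
        H *= (W.T @ (V * V_hat ** -2)) / (W.T @ V_hat ** -1)
        V_hat = W @ H + eps
        # Multiplicative update for the dictionary W.
        W *= ((V * V_hat ** -2) @ H.T) / (V_hat ** -1 @ H.T)
    return W, H
```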
