

  1. TOWARDS MULTI-INSTRUMENT DRUM TRANSCRIPTION
     Richard Vogl (1,2), Gerhard Widmer (2), Peter Knees (1)
     richard.vogl@tuwien.ac.at, gerhard.widmer@jku.at, peter.knees@tuwien.ac.at

  2. WHAT IS DRUM TRANSCRIPTION?
     Input: popular music containing drums
     Output: symbolic representation of notes played by drum instruments

  3. STATE OF THE ART
     Current state-of-the-art systems:
     ‣ End-to-end / activation-function-based approaches
     ‣ NN-based approaches and NMF approaches
     [Figure: spectrogram and the resulting activation functions for hi-hat, snare, and bass; time axis t in ms]
     Overview article: Wu, C.-W., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., Müller, M., Lerch, A.: "An Overview of Automatic Drum Transcription," IEEE TASLP, vol. 26, no. 9, Sept. 2018.

  12. FOCUS OF THIS WORK
      State-of-the-art (SotA) work focuses on bass drum (BD), snare drum (SD), and hi-hat (HH):
      ‣ They make up the majority of notes in datasets
      ‣ Beat defining / most important
      ‣ Well-separated spectral energy distributions
      [Figure: drum kit with SD, HH, and BD marked; spectral energy distributions of bass drum, snare drum, and hi-hat]
      Other instruments are important! → Increase the number of instruments for drum transcription

  13. SYSTEM OVERVIEW
      [Figure: processing pipeline from audio to events]
      ‣ Stages: signal preprocessing, NN (feature extraction and event detection; trained beforehand on train data), peak picking (classification)
      ‣ Intermediate representations: waveform (amplitude A over t [s]), spectrogram (f [Hz] over t [s]), activation functions for hi-hat, snare, and bass, detected peaks for hi-hat, snare, and bass
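
The last stage of the pipeline, peak picking on the activation functions, can be illustrated with a minimal sketch. This is not the authors' peak-picking method: the rule used here (local maximum inside a small window, above a fixed threshold, with a minimum gap between peaks) and all parameter values are assumptions chosen for illustration, and the activation values are made up.

```python
def pick_peaks(activation, threshold=0.3, window=1, min_gap=2):
    """Return frame indices of peaks in a 1-D activation function.

    Hypothetical rule (an assumption, not taken from the slides):
    a frame is a peak if it is the maximum within +/- `window` frames,
    its value is at least `threshold`, and it lies at least `min_gap`
    frames after the previously accepted peak.
    """
    peaks = []
    last_peak = -min_gap  # lets frame 0 qualify as a peak
    for n, value in enumerate(activation):
        lo = max(0, n - window)
        hi = min(len(activation), n + window + 1)
        is_local_max = value == max(activation[lo:hi])
        if is_local_max and value >= threshold and n - last_peak >= min_gap:
            peaks.append(n)
            last_peak = n
    return peaks


# Toy activation functions, one per instrument (made-up numbers).
activations = {
    "bass":   [0.0, 0.9, 0.1, 0.0, 0.0, 0.8, 0.1, 0.0],
    "snare":  [0.0, 0.0, 0.1, 0.7, 0.2, 0.0, 0.0, 0.1],
    "hi-hat": [0.6, 0.1, 0.5, 0.1, 0.6, 0.1, 0.7, 0.2],
}

# One list of detected peak frames per instrument.
detected = {name: pick_peaks(act) for name, act in activations.items()}
print(detected)
```

Each instrument is handled independently, so adding more instruments only adds more activation functions to the dictionary.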

  16. NETWORK ARCHITECTURES
      Convolutional NN (CNN)
      ‣ Convolutions capture local correlations
      ‣ Acoustic modeling of drum sounds
      Convolutional RNN (CRNN)
      ‣ "Best of both worlds"
      ‣ Low-level CNN for acoustic modeling
      ‣ Higher-level RNN for repetitive pattern modeling
      [Figure: train data sample for the CNN and for the CRNN]

  17. NETWORK ARCHITECTURES
      Layer stacks:
      CNN                              | CRNN
      2 x conv: 32 x 3x3 (batch norm)  | 2 x conv: 32 x 3x3 (batch norm)
      max pool: 1x3                    | max pool: 1x3
      2 x conv: 64 x 3x3 (batch norm)  | 2 x conv: 64 x 3x3 (batch norm)
      max pool: 1x3                    | max pool: 1x3
      2 x dense: 256                   | 3 x RNN: 50 BD GRU (bidirectional GRU)
      Training: early stopping, batch normalization, L2 norm, dropout (30%), ADAM optimizer
      Hyperparameters:
           | frames | context | conv. layers     | rec. layers   | dense layers
      CNN  | -      | 25      | see layer stacks | -             | 2 x 256
      CRNN | 400    | 13      | see layer stacks | 3 x 50 BD GRU | -

  24. DATASETS
      ENST-Drums [Gillet and Richard 2006]
      ‣ Recordings, three drummers / drum kits
      ‣ 64 tracks, total duration: 1h
      MDB Drums [Southall et al. 2017]
      ‣ Drum annotations for a MedleyDB subset
      ‣ 23 tracks, total duration: 20m
      RBMA13-Drums [Vogl et al. 2017]
      ‣ Music from the 2013 Red Bull Music Academy, different styles
      ‣ 27 tracks, total duration: 1h 43m

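  27. DATASETS
      Number of classes and instrument names ("-" means the instrument is not covered by that labeling):
      3  | 8  | 18  | instrument name
      BD | BD | BD  | bass drum
      SD | SD | SD  | snare drum
      SD | SD | SS  | side stick
      SD | SD | CLP | hand clap
      -  | TT | HT  | high tom
      -  | TT | MT  | mid tom
      -  | TT | LT  | low tom
      HH | HH | CHH | closed hi-hat
      HH | HH | PHH | pedal hi-hat
      HH | HH | OHH | open hi-hat
      HH | HH | TB  | tambourine
      -  | RD | RD  | ride cymbal
      -  | BE | RB  | ride bell
      -  | BE | CB  | cowbell
      -  | CY | CRC | crash cymbal
      -  | CY | SPC | splash cymbal
      -  | CY | CHC | Chinese cymbal
      -  | CL | CL  | clave/sticks
      [Figure: relative frequency of instrument onsets for the 3-, 8-, and 18-class labelings]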
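
To make the three label granularities concrete, here is a small sketch that reduces 18-class annotations to the 8-class and 3-class labelings. The grouping is read off the class table on this slide; assigning side stick and hand clap to SD and tambourine to HH reflects one reading of the merged table cells, and dropping every instrument outside BD/SD/HH in the 3-class case is an assumption. The function name and the event format are hypothetical.

```python
# 18-class label -> 8-class label (grouping as read from the slide's table).
MAP_18_TO_8 = {
    "BD": "BD",
    "SD": "SD", "SS": "SD", "CLP": "SD",
    "HT": "TT", "MT": "TT", "LT": "TT",
    "CHH": "HH", "PHH": "HH", "OHH": "HH", "TB": "HH",
    "RD": "RD",
    "RB": "BE", "CB": "BE",
    "CRC": "CY", "SPC": "CY", "CHC": "CY",
    "CL": "CL",
}

# Assumption: the 3-class labeling keeps only these 8-class groups.
KEEP_3 = {"BD", "SD", "HH"}


def reduce_labels(events, num_classes):
    """Map (time, 18-class label) events to a coarser labeling.

    `events` is a list of (onset time in seconds, label) tuples.
    Events without a counterpart in the 3-class labeling are dropped.
    """
    if num_classes == 18:
        return list(events)
    coarse = [(t, MAP_18_TO_8[label]) for t, label in events]
    if num_classes == 8:
        return coarse
    if num_classes == 3:
        return [(t, label) for t, label in coarse if label in KEEP_3]
    raise ValueError("num_classes must be 3, 8, or 18")


# Made-up annotation snippet.
events_18 = [(0.0, "BD"), (0.5, "CHH"), (1.0, "SS"), (1.5, "CRC"), (2.0, "OHH")]
events_8 = reduce_labels(events_18, 8)
events_3 = reduce_labels(events_18, 3)
print(events_8)
print(events_3)
```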

  34. SYNTHETIC DATASET (NEW!)
      Synthetic dataset from MIDI songs
      ‣ Mix of different genres, full songs
      ‣ Optional accompaniment
      ‣ Diverse drum sounds (57 different drum kits, acoustic and electronic)
      ‣ Varying quality, no vocals!
      ‣ 4197 tracks, total duration: 259h

  38. SYNTHETIC DATASET
      [Figure: relative frequency of instrument onsets for the 3-, 8-, and 18-class labelings]
      Follows the same relative instrument distribution
      − Same bias for instruments, same problems during training
      + Datasets are representative samples

  43. BALANCING OF SYNTHETIC DATASET
      ‣ Swap instruments for individual tracks
      ‣ Artificial balancing of instrument distribution
      [Figure: relative frequency of instrument onsets for the 3-, 8-, and 18-class labelings]
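
The idea of swapping instruments per track can be sketched on annotation level as follows. The slides do not say how swaps are chosen, so the greedy strategy below (per track, apply the single swap that most reduces the ratio between the most and least frequent instrument) is purely an assumption, as are all function names and the toy data. For a MIDI-based dataset the swapped tracks would additionally have to be re-rendered to audio, which this sketch leaves out.

```python
from collections import Counter
from itertools import combinations


def swap_instruments(track, a, b):
    """Exchange labels a and b in one track (list of (time, label) tuples)."""
    swap = {a: b, b: a}
    return [(t, swap.get(label, label)) for t, label in track]


def onset_counts(tracks):
    """Count onsets per instrument over all tracks."""
    return Counter(label for track in tracks for _, label in track)


def imbalance(counts, labels):
    """Ratio of most to least frequent label (1.0 means perfectly balanced)."""
    values = [counts.get(label, 0) for label in labels]
    return max(values) / max(1, min(values))


def balance_by_swapping(tracks, labels):
    """Greedy, hypothetical balancing: at most one label swap per track."""
    tracks = list(tracks)
    for i, track in enumerate(tracks):
        best, best_score = track, imbalance(onset_counts(tracks), labels)
        for a, b in combinations(labels, 2):
            candidate = swap_instruments(track, a, b)
            trial = tracks[:i] + [candidate] + tracks[i + 1:]
            score = imbalance(onset_counts(trial), labels)
            if score < best_score:
                best, best_score = candidate, score
        tracks[i] = best
    return tracks


def make_track(pattern):
    """Build a toy track with one onset every 0.25 s."""
    return [(i * 0.25, label) for i, label in enumerate(pattern)]


labels = ["HH", "SD", "CB"]
tracks = [
    make_track(["HH"] * 6 + ["SD"] * 2),
    make_track(["HH"] * 6 + ["SD"] * 2),
    make_track(["HH"] * 6 + ["SD"] + ["CB"]),
]
balanced = balance_by_swapping(tracks, labels)
print(onset_counts(tracks), onset_counts(balanced))
```

Because only labels are exchanged, the total number of onsets and their timing stay the same; only the distribution over instruments changes.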
