DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING CONVOLUTIONAL RNNs
Richard Vogl, richard.vogl@tuwien.ac.at, ifs.tuwien.ac.at/~vogl
21st Vienna Deep Learning Meetup, 15th of October 2018
Institute of Computational Perception


  2. WHY IS CONTEXT RELEVANT?
     Instruments from the same class often sound quite different (♫ snare drums)
     Different instruments can sound similar (♫ crash vs. splash)
     When humans transcribe drums:
     ‣ Function in a track is equally important (snare drum vs. backbeat)
     ‣ Inaudible onsets will be filled in if expected
     Music Language Model

  3. BASS DRUM OR LOW TOM?
     Audio examples: 1: bass drum (♫), 2: floor tom (♫), 3: ?

  6. BASS DRUM OR LOW TOM?
     Audio examples: 1: bass drum (♫), 2: floor tom (♫), 3: ? (♫ context)

  8. BASS DRUM OR LOW TOM?
     Audio examples: 1: bass drum (♫), 2: floor tom (♫), 3: bass drum (♫ context)

  15. DATASETS
      IDMT-SMT-Drums [Dittmar and Gärtner 2014] ♫
      ‣ Solo drum tracks, recorded, synthesized, and sampled
      ‣ 95 tracks, total: 24 min, onsets: 8004
      ENST-Drums [Gillet and Richard 2006] ♫ ♫
      ‣ Recordings, three drummers on different drum kits, optional accompaniment
      ‣ 64 tracks, total: 1 h, onsets: 22391
      SMT (simple!), ENST solo (harder!), ENST acc. (difficult!)

  16. NETWORK MODELS
      Architecture  Frames  Context  Conv. Layers  Rec. Layers  Dense Layers
      RNN (S)       100     -        -             2x50 GRU     -
      RNN (L)       400     -        -             3x30 GRU     -
      CNN (S)       -       9        conv. stack   -            2x256
      CNN (L)       -       25       conv. stack   -            2x256
      CRNN (S)      100     9        conv. stack   2x50 GRU     -
      CRNN (L)      400     13       conv. stack   3x60 GRU     -
      Conv. stack (CNN and CRNN): 2 x 32 3x3 filt., 3x3 max pooling, 2 x 64 3x3 filt., 3x3 max pooling, all w/ batch norm.
      Baseline: tsRNN [Vogl et al. ICASSP’17]
      Training: early stopping, dropout, batch normalization, ADAM optimizer, L2 norm

  17. RESULTS
      [Figure: F-measure [%] (axis from 60 to 100) on SMT, ENST solo, and ENST with accompaniment for tsRNN, RNN (S), RNN (L), CNN (S), CNN (L), CRNN (S), CRNN (L)]
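
The F-measure shown on the results slide is computed from detected and annotated onset times. The following is a minimal sketch of such a computation, not the evaluation code behind the reported numbers: the function name `onset_f_measure`, the greedy one-to-one matching, and the 20 ms tolerance window are assumptions made for illustration.

```python
def onset_f_measure(reference, detected, tolerance=0.02):
    """Return (precision, recall, f_measure) for two lists of onset times in seconds.

    A detection counts as correct if it lies within `tolerance` seconds of a
    not yet matched reference onset. The 20 ms default is an assumption.
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = true_pos = 0
    # Greedy one-to-one matching over the two sorted lists.
    while i < len(ref) and j < len(det):
        if abs(ref[i] - det[j]) <= tolerance:
            true_pos += 1
            i += 1
            j += 1
        elif det[j] < ref[i]:
            j += 1  # detection without a nearby reference onset (false positive)
        else:
            i += 1  # reference onset without a nearby detection (false negative)
    precision = true_pos / len(det) if det else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotated = [0.5, 1.0, 1.5, 2.0]   # hypothetical snare annotations
    predicted = [0.51, 1.0, 1.7]       # hypothetical network output after peak picking
    print(onset_f_measure(annotated, predicted))
```

In practice this would be run per instrument (hi-hat, snare, bass) and the counts summed over all tracks before computing the final score.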

  18. HOW DOES IT SOUND?
      “Punk” (MEDLEY DB): ♫ hi-hat, ♫ snare, ♫ bass

  20. HOW DOES IT SOUND?
      “Hendrix” (MEDLEY DB): ♫ hi-hat, ♫ snare, ♫ bass

  22. HOW DOES IT SOUND?
      “Alexa, play some music…”: ♫ hi-hat, ♫ snare, ♫ bass

  24. PART 1: AUTOMATIC DRUM TRANSCRIPTION
      Task Definition, Problem Modeling, Architectures
      PART 2: MULTI-TASK LEARNING
      Metadata for Transcripts

  32. LIMITATIONS OF CURRENT SYSTEMS
      Do not produce additional information for transcripts (drum onset detection vs. drum transcription):
      ‣ bar lines
      ‣ tempo
      ‣ meter
      ‣ dynamics / accents
      ‣ stroke / playing technique
      Only three instrument classes
      Richard Vogl, Gerhard Widmer, and Peter Knees, “Towards multi-instrument drum transcription,” in Proc. 21st Intl. Conf. on Digital Audio Effects (DAFx18), Aveiro, Portugal, Sep. 2018.

  41. ADDITIONAL INFORMATION FOR TRANSCRIPTS
      Use beat and downbeat tracking to get:
      ‣ bar lines
      ‣ tempo
      ‣ meter (4/4 ✔)
      [Figure: HH, SD, and BD onsets over time t, with beats numbered 1 2 3 4 | 1 2 3 4]
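
How beat and downbeat positions yield bar lines, tempo, and meter can be sketched in a few lines. This is an illustrative example only: the function name `describe_beats` and the input format (beat time in seconds paired with its position in the bar, where 1 marks a downbeat) are assumptions, and the sketch recovers only the number of beats per bar, not the note value of the beat.

```python
from statistics import median


def describe_beats(beats):
    """Derive tempo, bar lines, and beats per bar from annotated beats.

    beats: list of (time_in_seconds, position_in_bar); position 1 is a downbeat.
    This input format is an assumption made for this sketch.
    """
    if len(beats) < 2:
        raise ValueError("need at least two beats to estimate a tempo")
    times = [t for t, _ in beats]
    # Tempo from the median inter-beat interval, which is robust to a few outliers.
    intervals = [b - a for a, b in zip(times, times[1:])]
    tempo_bpm = 60.0 / median(intervals)
    # Every downbeat starts a new bar.
    bar_lines = [t for t, pos in beats if pos == 1]
    # The highest beat position observed gives the number of beats per bar.
    beats_per_bar = max(pos for _, pos in beats)
    return tempo_bpm, bar_lines, beats_per_bar


if __name__ == "__main__":
    # Two bars of four beats each, half a second apart (hypothetical data).
    example = [(0.0, 1), (0.5, 2), (1.0, 3), (1.5, 4),
               (2.0, 1), (2.5, 2), (3.0, 3), (3.5, 4)]
    print(describe_beats(example))
```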

  44. LEVERAGE BEAT INFORMATION
      [Figure: HH, SD, and BD onsets over time t, with beats numbered 2 3 4 1 2 3 4 1]
      Beats are highly correlated with drum patterns (drum onset locations / repetitive patterns)
      Assume that prior knowledge of beats is helpful for drum transcription
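
One way to make beat information available to a network is to put beats and drum onsets on the same frame grid, so that beat rows can serve either as extra input features or as additional training targets. The sketch below only illustrates this encoding; the function names, the 100 frames per second rate, and the "beat" and "downbeat" row names are assumptions, not details taken from the slides.

```python
def to_frames(event_times, num_frames, fps=100):
    """Binary activation vector with a 1 in the frame nearest to each event time."""
    frames = [0] * num_frames
    for t in event_times:
        idx = int(round(t * fps))
        if 0 <= idx < num_frames:
            frames[idx] = 1
    return frames


def joint_frames(drum_onsets, beats, downbeats, duration, fps=100):
    """Encode drum onsets, beats, and downbeats on one shared frame grid.

    drum_onsets: dict mapping an instrument label (e.g. "HH", "SD", "BD")
    to a list of onset times in seconds. The frame rate is an assumption.
    """
    num_frames = int(round(duration * fps)) + 1
    rows = {name: to_frames(times, num_frames, fps)
            for name, times in drum_onsets.items()}
    rows["beat"] = to_frames(beats, num_frames, fps)
    rows["downbeat"] = to_frames(downbeats, num_frames, fps)
    return rows


if __name__ == "__main__":
    # One second of a hypothetical pattern at 120 bpm.
    rows = joint_frames(
        drum_onsets={"HH": [0.0, 0.25, 0.5, 0.75, 1.0], "SD": [0.5], "BD": [0.0, 1.0]},
        beats=[0.0, 0.5, 1.0],
        downbeats=[0.0],
        duration=1.0,
    )
    for name, row in rows.items():
        print(name, [i for i, v in enumerate(row) if v])
```

Printing the active frame indices per row makes the correlation visible: in this toy pattern every beat frame coincides with a drum onset.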
