WHY IS CONTEXT RELEVANT?
Instruments from the same class often sound quite different
‣ snare drums: ♫
Similar sound for different instruments
‣ crash vs. splash: ♫
When humans transcribe drums
‣ Function in a track is equally important (snare drum vs. backbeat)
‣ Inaudible onsets will be filled in if expected
Music Language Model
BASS DRUM OR LOW TOM?
‣ 1: bass drum ♫
‣ 2: floor tom ♫
‣ 3: ? (in isolation); bass drum (with context ♫)
DATASETS
IDMT-SMT-Drums [Dittmar and Gärtner 2014] ♫
‣ Solo drum tracks: recorded, synthesized, and sampled
‣ 95 tracks, total: 24 min, onsets: 8004
‣ Referred to as: SMT (simple!)
ENST-Drums [Gillet and Richard 2006] ♫ ♫
‣ Recordings of three drummers on different drum kits, optional accompaniment
‣ 64 tracks, total: 1 h, onsets: 22391
‣ Referred to as: ENST solo (harder!) and ENST acc. (difficult!)
NETWORK MODELS
Architecture

Model      Frames  Context  Conv. Layers  Rec. Layers  Dense Layers
RNN (S)    100     -        -             2x50 GRU     -
RNN (L)    400     -        -             3x30 GRU     -
CNN (S)    -       9        conv. stack   -            2x256
CNN (L)    -       25       conv. stack   -            2x256
CRNN (S)   100     9        conv. stack   2x50 GRU     -
CRNN (L)   400     13       conv. stack   3x60 GRU     -

Conv. stack: 2 x 32 3x3 filt., 3x3 max pooling, 2 x 64 3x3 filt., 3x3 max pooling, all w/ batch norm.
Baseline: tsRNN [Vogl et al. ICASSP’17]
‣ Early stopping
‣ Dropout
‣ Batch normalization
‣ ADAM optimizer
‣ L2 norm
RESULTS
[Figure: bar chart of F-measure [%] (axis from 60 to 100) on SMT, ENST solo, and ENST with accompaniment, for tsRNN, RNN (S), RNN (L), CNN (S), CNN (L), CRNN (S), and CRNN (L)]
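The results are reported as F-measure. As a minimal sketch of how an onset F-measure can be computed for one drum instrument, the function below matches detected onsets one-to-one with reference onsets inside a tolerance window. The function name `onset_f_measure` and the 20 ms default tolerance are assumptions made for illustration; the slides do not state the evaluation settings.

```python
def onset_f_measure(reference, detected, tolerance=0.02):
    """Return (f_measure, precision, recall) for onset times given in seconds.

    A detection counts as correct if it lies within +/- tolerance of a
    reference onset that has not been matched yet (greedy matching on
    sorted lists). The tolerance value is an assumption, not from the slides.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    matched = 0
    i = j = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            matched += 1  # true positive: consume both onsets
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection precedes every open reference: false positive
        else:
            i += 1  # reference onset has no detection: false negative
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f_measure = 2 * precision * recall / (precision + recall)
    return f_measure, precision, recall


# Hypothetical snare drum annotations and detections, in seconds.
reference_onsets = [0.5, 1.0, 1.5, 2.0]
detected_onsets = [0.51, 1.0, 1.7, 2.01]
f_measure, precision, recall = onset_f_measure(reference_onsets, detected_onsets)
```

In this made-up example three of four detections fall inside the window, so one detection is a false positive and one reference onset is missed.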
HOW DOES IT SOUND?
‣ “Punk” (MedleyDB): hi-hat, snare, bass ♫ ♫ ♫
‣ “Hendrix” (MedleyDB): hi-hat, snare, bass ♫ ♫ ♫
‣ “Alexa, play some music…”: hi-hat, snare, bass ♫ ♫ ♫
PART 1: AUTOMATIC DRUM TRANSCRIPTION
Task Definition, Problem Modeling, Architectures
PART 2: MULTI-TASK LEARNING
Metadata for Transcripts
LIMITATIONS OF CURRENT SYSTEMS
Do not produce additional information for transcripts (drum onset detection vs. drum transcription)
‣ bar lines
‣ tempo
‣ meter
‣ dynamics / accents
‣ stroke / playing technique
Only three instrument classes
‣ Richard Vogl, Gerhard Widmer, and Peter Knees, “Towards multi-instrument drum transcription,” in Proc. 21st Intl. Conf. on Digital Audio Effects (DAFx18), Aveiro, Portugal, Sep. 2018.
ADDITIONAL INFORMATION FOR TRANSCRIPTS
[Figure: HH, SD, and BD onsets over time t, with beats numbered 1 2 3 4, 1 2 3 4 and the meter 4/4]
Use beat and downbeat tracking to get:
‣ bar lines
‣ tempo
‣ meter
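As a rough sketch of how beat and downbeat tracking output can yield bar lines, tempo, and meter, the snippet below assumes the tracker returns each beat as a pair of time in seconds and position in the bar, with position 1 marking the downbeat. The function name `describe_beats` and this input format are assumptions for illustration, not the interface of a specific beat tracker. Only the number of beats per bar is recovered; the note value of the beat (the denominator of the time signature) cannot be read from beat positions alone.

```python
from statistics import median


def describe_beats(beats):
    """Derive tempo, bar lines, and beats per bar from tracked beats.

    beats: list of (time_in_seconds, position_in_bar) tuples, sorted by
    time, where position 1 is the downbeat. Needs at least two beats.
    """
    times = [time for time, _ in beats]
    # Inter-beat intervals; the median is robust to a few tracking errors.
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    tempo_bpm = 60.0 / median(intervals)
    # Every downbeat marks the start of a bar.
    bar_lines = [time for time, position in beats if position == 1]
    # The highest beat position observed gives the beats per bar.
    beats_per_bar = max(position for _, position in beats)
    return tempo_bpm, bar_lines, beats_per_bar


# Hypothetical tracker output: two bars with four beats each, 0.5 s apart.
tracked_beats = [
    (0.0, 1), (0.5, 2), (1.0, 3), (1.5, 4),
    (2.0, 1), (2.5, 2), (3.0, 3), (3.5, 4),
]
tempo_bpm, bar_lines, beats_per_bar = describe_beats(tracked_beats)
```

For the example input this gives a tempo of 120 beats per minute, bar lines at 0.0 s and 2.0 s, and four beats per bar.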
LEVERAGE BEAT INFORMATION
[Figure: HH, SD, and BD onsets over time t, aligned with beats numbered 2 3 4 1 2 3 4 1]
Beats are highly correlated with drum patterns (drum onset locations / repetitive patterns)
Assumption: prior knowledge of beats is helpful for drum transcription
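One possible way to give a transcription network prior knowledge of beats is to append frame-wise beat and downbeat indicators to its input features. The sketch below illustrates that idea only; the slides do not say how beat information is supplied to the model, and the function names, the 100 frames-per-second default, and the list-of-lists feature layout are all assumptions.

```python
def beat_feature_columns(beats, n_frames, fps=100):
    """Build frame-wise beat and downbeat indicator columns.

    beats: list of (time_in_seconds, position_in_bar), position 1 = downbeat.
    fps: feature frame rate (assumed value, not taken from the slides).
    """
    beat_column = [0.0] * n_frames
    downbeat_column = [0.0] * n_frames
    for time, position in beats:
        frame = int(round(time * fps))
        if 0 <= frame < n_frames:  # ignore beats outside the excerpt
            beat_column[frame] = 1.0
            if position == 1:
                downbeat_column[frame] = 1.0
    return beat_column, downbeat_column


def append_beat_features(spectrogram, beats, fps=100):
    """Return a copy of the frame-wise features with two extra columns."""
    beat_column, downbeat_column = beat_feature_columns(
        beats, len(spectrogram), fps
    )
    return [
        list(frame) + [beat, downbeat]
        for frame, beat, downbeat in zip(spectrogram, beat_column, downbeat_column)
    ]


# Hypothetical input: five frames with two spectral bins each, at 10 fps.
features = [[0.1, 0.2] for _ in range(5)]
augmented = append_beat_features(features, [(0.0, 1), (0.2, 2)], fps=10)
```

Each output frame keeps its original bins and gains a beat flag and a downbeat flag, so the network can condition its onset predictions on the metrical position.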