DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING CONVOLUTIONAL RNNs
Richard Vogl 1,2, Matthias Dorfer 2, Gerhard Widmer 2, Peter Knees 1
richard.vogl@tuwien.ac.at, matthias.dorfer@jku.at, gerhard.widmer@jku.at, peter.knees@tuwien.ac.at

WHAT IS DRUM TRANSCRIPTION?
Input: Western popular music containing drums
Output: symbolic representation of the notes played by the drum instruments

WHAT IS DRUM TRANSCRIPTION?
Focus on the three major drum instruments:
‣ bass or kick drum (KD)
‣ snare drum (SD)
‣ hi-hat (HH)
Reasons:
‣ Dominant instruments: most onsets
‣ Common subset for public datasets

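SYSTEM OVERVIEW
Pipeline: audio signal → preprocessing (feature extraction) → classification (NN) → event detection (peak picking) → events
‣ Feature extraction turns the signal into a spectrogram (f [Hz] over t [s])
‣ The NN, fitted during NN training, outputs activation functions over t [s]
‣ Peak picking turns the activation functions into events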
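The event detection step can be illustrated with a minimal peak-picking sketch. It is not the authors' implementation: the function names, the threshold of 0.5, the minimum gap of 2 frames, and the frame rate of 100 frames per second are all assumptions chosen for illustration.

```python
def pick_peaks(activation, threshold=0.5, min_gap=2):
    """Return frame indices of local maxima at or above `threshold`.

    `threshold` and `min_gap` (minimum number of frames between two
    peaks) are illustrative assumptions, not values from the slides.
    """
    peaks = []
    for i, value in enumerate(activation):
        left = activation[i - 1] if i > 0 else float("-inf")
        right = activation[i + 1] if i < len(activation) - 1 else float("-inf")
        is_local_max = value >= left and value > right
        if value >= threshold and is_local_max:
            # Skip peaks that follow the previous one too closely.
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def frames_to_seconds(frames, fps=100):
    """Convert frame indices to times in seconds; `fps` is an assumed frame rate."""
    return [f / fps for f in frames]


# Hypothetical activation function for one instrument (e.g. KD).
kick_activation = [0.0, 0.1, 0.9, 0.2, 0.0, 0.6, 0.7, 0.1, 0.3, 0.0]
onset_frames = pick_peaks(kick_activation)
onset_times = frames_to_seconds(onset_frames)
```

In this sketch each instrument's activation function would be processed separately, yielding one list of onset times per instrument.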

ISSUES OF CURRENT SYSTEMS
Performance not satisfying on real music
Do not produce additional information for transcripts (drum onset detection vs. drum transcription):
‣ bar lines
‣ tempo
‣ meter
‣ dynamics / accents
‣ stroke / playing technique
Only three instrument classes
etc.

ADDITIONAL INFORMATION FOR TRANSCRIPTS
✔ Use beat and downbeat tracking to get:
‣ bar lines
‣ tempo
‣ meter
[Figure: HH, SD, and KD onsets over time t, aligned with beat positions 2 3 4 1 2 3 4 1]
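How bar lines, tempo, and meter might be read off a beat and downbeat tracker's output can be sketched as follows. This is a simplified illustration, not the method from the talk: the function names and the example beat times are hypothetical, tempo is taken from the median inter-beat interval, and the meter is approximated by the largest beat position within a bar.

```python
from statistics import median


def tempo_bpm(beat_times):
    """Estimate tempo in beats per minute from the median inter-beat interval."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


def bar_lines(beat_times, beat_positions):
    """Return the times of all downbeats (beat position 1), i.e. the bar lines."""
    return [t for t, p in zip(beat_times, beat_positions) if p == 1]


def beats_per_bar(beat_positions):
    """Approximate the meter's numerator as the largest beat position seen."""
    return max(beat_positions)


# Hypothetical tracker output: beat times in seconds and the beat
# positions shown on the slide (2 3 4 1 2 3 4 1).
beat_times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
beat_positions = [2, 3, 4, 1, 2, 3, 4, 1]

tempo = tempo_bpm(beat_times)
bars = bar_lines(beat_times, beat_positions)
meter_numerator = beats_per_bar(beat_positions)
```

With beats spaced 0.5 s apart, the sketch yields 120 BPM, bar lines at the two downbeats, and four beats per bar.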

IMPROVE PERFORMANCE
Three components to reach this goal:
1. Leverage beat information
2. Better model for drum detection
3. Dataset with real music for training

1. LEVERAGE BEAT INFORMATION
[Figure: HH, SD, and KD onsets over time t, aligned with beat positions 2 3 4 1 2 3 4 1]
Beats are highly correlated with drum patterns
Assume that prior knowledge of beats is helpful for drum transcription (drum hit locations / repetitive patterns)
Use multi-task learning for beats and drums

MULTI-TASK LEARNING
[Figure: network input (spectrogram, f [Hz] over t [s]) and output (activation functions over t [s])]
Three experiments:
‣ Training on drum targets (DT)
‣ Training on drum targets with annotated beats as additional input features (BF)
‣ Training on drum and beat targets as a multi-task problem (MT)
Expected increase in performance for BF compared to DT
Expected increase in performance for MT compared to DT
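The difference between the three experiments lies in what is fed to the network and what it is trained to predict. The sketch below builds frame-wise 0/1 vectors from annotated event times under stated assumptions: a frame rate of 100 frames per second, separate "beat" and "downbeat" rows, and hypothetical names (`onsets_to_targets`, `build_rows`, the toy `annotations`). None of these details are given on the slides.

```python
def onsets_to_targets(event_times, n_frames, fps=100):
    """Turn event times (seconds) into a frame-wise 0/1 vector; `fps` is assumed."""
    target = [0.0] * n_frames
    for t in event_times:
        frame = int(round(t * fps))
        if 0 <= frame < n_frames:
            target[frame] = 1.0
    return target


def build_rows(annotations, n_frames, classes, fps=100):
    """Stack one frame-wise vector per class, in the order given by `classes`."""
    return [onsets_to_targets(annotations.get(c, []), n_frames, fps) for c in classes]


DRUMS = ["KD", "SD", "HH"]
BEATS = ["beat", "downbeat"]  # assumed split into two rows

# Toy annotations (seconds) for a 6-frame excerpt.
annotations = {
    "KD": [0.00, 0.04],
    "SD": [0.02],
    "HH": [0.00, 0.02, 0.04],
    "beat": [0.00, 0.02, 0.04],
    "downbeat": [0.00],
}
n_frames = 6

# DT: drum targets only.
dt_targets = build_rows(annotations, n_frames, DRUMS)
# BF: same drum targets, with beat rows appended to the input features.
bf_extra_inputs = build_rows(annotations, n_frames, BEATS)
# MT: drum and beat rows are all predicted jointly.
mt_targets = build_rows(annotations, n_frames, DRUMS + BEATS)
```

In this layout the MT targets extend the DT targets by the beat rows, while BF leaves the targets unchanged and moves the same beat rows to the input side.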

2. NETWORK MODELS: BASELINE MODELS
Recurrent neural networks
‣ Recurrent connections act as memory
‣ Processing of sequential data
‣ Work well for drum detection and beat tracking [Böck et al. ISMIR’16]
RNN with label time shift (tsRNN)
‣ State-of-the-art baseline [Vogl et al. ICASSP’17]
Bidirectional recurrent NN (BDRNN) [Vogl et al. ISMIR’16] [Southall et al. ISMIR’16]
‣ Similar performance to tsRNN
[Figure: RNN training data sample]

2. NETWORK MODELS: NEW FOR DT
Convolutional NN (CNN)
‣ Convolutions capture local correlations
‣ Acoustic modeling of drum sounds
Convolutional BDRNN (CRNN)
‣ "Best of both worlds"
‣ Low-level CNN for acoustic modeling
‣ Higher-level RNN for repetitive pattern modeling
[Figure: CNN and CRNN training data samples]

NETWORK MODELS
Model      Frames  Context  Conv. Layers  Rec. Layers  Dense Layers
BDRNN (S)  100     -        -             2x50 GRU     -
BDRNN (L)  400     -        -             3x30 GRU     -
CNN (S)    -       9        *             -            2x256
CNN (L)    -       25       *             -            2x256
CRNN (S)   100     9        *             2x50 GRU     -
CRNN (L)   400     13       *             3x60 GRU     -
* Conv. layers (one cell spanning the CNN and CRNN rows on the slide): 2 x 32 3x3 filt., 3x3 max pooling, 2 x 64 3x3 filt., 3x3 max pooling, all w/ batch norm.
tsRNN: state-of-the-art baseline [Vogl et al. ICASSP’17]
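The "Context" column is read here as the number of spectrogram frames a convolutional model sees around each target frame; that reading is an assumption, since the slide does not define the column. Under it, a minimal sketch of cutting a spectrogram into centered context windows could look like this. The function name `context_windows` and the zero padding at the edges are illustrative choices.

```python
def context_windows(spectrogram, context):
    """Return one window of `context` frames centered on each input frame.

    `spectrogram` is a list of frames, each a list of frequency-bin values.
    `context` is assumed to be odd; the edges are zero-padded (an assumption).
    """
    half = context // 2
    n_bins = len(spectrogram[0])
    pad = [[0.0] * n_bins for _ in range(half)]
    padded = pad + spectrogram + pad
    return [padded[i:i + context] for i in range(len(spectrogram))]


# Toy spectrogram: 5 frames with 2 frequency bins each.
toy_spectrogram = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
windows = context_windows(toy_spectrogram, context=3)
```

With `context=9`, as listed for CNN (S) and CRNN (S), each window would hold the target frame plus four frames on either side.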

CLASSIC DATASETS (ONLY DRUMS)
IDMT-SMT-Drums [Dittmar and Gärtner 2014]
‣ Solo drum tracks: recorded, synthesized, and sampled
‣ 95 tracks, total: 24 min, onsets: 8004, plus training samples
ENST-Drums [Gillet and Richard 2006]
‣ Recordings of three drummers on different drum kits, optional accompaniment
‣ 64 tracks, total: 1 h, onsets: 22391, plus training samples
