Unsupervised Piano Music Transcription
Taylor Berg-Kirkpatrick, Jacob Andreas, and Dan Klein
CMU, UC Berkeley
Piano Music Transcription (figure: piano roll, note vs. time)
Supervised Transcription: a model with parameters w
Learning to Transcribe: learn the parameters w
Prediction: the model, with learned parameters w, predicts the unknown transcription (?)
Piano Sounds (figure: spectrogram, freq vs. time)
Spectral Shape (figure: spectrogram, freq vs. time)
Temporal Shape (figure: spectrogram, freq vs. time)
Polyphony
Unsupervised Transcription: go from the audio signal to symbolic music when neither the piano parameters (?) nor the symbolic music (?) is given. Both are learned with a generative model that explains the audio signal.
Generative Model: note events M_n for each note n (figure: note events over time, with velocity)
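Generative Model
Parameters (one set per note n): note event parameters µ_n, temporal shape α_n, spectral shape σ_n.
Latent variables: note events M_n (PLAY/REST, each with a duration and velocity), activation A_n (over time), component spectrogram S_n (freq × time).
Observed: spectrogram X (freq × time).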
Note Event Model (M_n, with parameters µ_n): a sequence of events E_1, E_2, E_3, ..., each with an event type (PLAY or REST, e.g. PLAY, REST, PLAY), a duration D_1, D_2, D_3, ..., and a velocity V_1, V_2, V_3, ...
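A minimal sketch of sampling one note's PLAY/REST event sequence as a semi-Markov process, where each event holds its type for an explicitly drawn duration before switching. The parameterization stands in for µ_n and is entirely hypothetical: `p_stay` (probability that the next event repeats the current type), uniform durations up to `max_duration` frames, and uniform velocities in [0.2, 1.0] are our assumptions, not the distributions used in the talk.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    kind: str        # "PLAY" or "REST"
    duration: int    # length in frames
    velocity: float  # loudness; set to 0.0 for REST (assumption)


def sample_events(num_frames, p_stay, max_duration, rng):
    """Sample a PLAY/REST event sequence that exactly covers num_frames frames.

    Hypothetical stand-in for mu_n: p_stay[kind] is the probability that the next
    event has the same type; durations and velocities are drawn uniformly.
    """
    events = []
    kind = "REST"
    t = 0
    while t < num_frames:
        # Semi-Markov: draw how long this event lasts, clipped to the remaining frames.
        duration = min(rng.randint(1, max_duration), num_frames - t)
        velocity = rng.uniform(0.2, 1.0) if kind == "PLAY" else 0.0
        events.append(Event(kind, duration, velocity))
        t += duration
        # Switch event type unless we "stay" according to the hypothetical p_stay.
        if rng.random() >= p_stay[kind]:
            kind = "PLAY" if kind == "REST" else "REST"
    return events
```

For example, `sample_events(100, {"PLAY": 0.3, "REST": 0.3}, 20, random.Random(0))` returns a list of events whose durations sum to 100 frames.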
Activation Model: the activation A_n is built from the event durations D_1, D_2, D_3, ... and velocities V_1, V_2, V_3, ... For each PLAY event, copy the temporal shape α_n, truncate it to the event's duration, and scale it to the event's velocity; then add Gaussian noise to obtain A_n.
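The four activation steps (copy, truncate, scale, add noise) can be sketched as follows. Several details are our assumptions rather than the talk's: events are `(kind, duration, velocity)` tuples, REST events contribute zero activation before noise, α is zero-padded if it is shorter than an event, and the noisy result is clipped at zero so that it can later serve as a Poisson rate.

```python
import random


def activation_from_events(events, alpha, noise_std, rng):
    """Build one note's activation track from (kind, duration, velocity) events."""
    activation = []
    for kind, duration, velocity in events:
        if kind == "PLAY":
            # Copy the temporal shape and truncate it to the event duration
            # (zero-padding when alpha is too short is an assumption).
            shape = alpha[:duration] + [0.0] * max(0, duration - len(alpha))
            # Scale the truncated shape by the event velocity.
            activation.extend(velocity * a for a in shape)
        else:
            # Assumption: a REST event contributes zero activation before noise.
            activation.extend([0.0] * duration)
    # Add Gaussian noise, then clip at zero (assumption) to keep the activation nonnegative.
    return [max(0.0, a + rng.gauss(0.0, noise_std)) for a in activation]
```

With `noise_std=0.0`, the events `[("PLAY", 3, 0.5), ("REST", 2, 0.0), ("PLAY", 2, 1.0)]` and `alpha = [1.0, 0.8, 0.6, 0.4]` give the activation `[0.5, 0.4, 0.3, 0.0, 0.0, 1.0, 0.8]` (up to floating-point rounding).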
Component Spectrogram Model: the component spectrogram S_n is generated from the activation A_n and the spectral shape σ_n, with Poisson noise.
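A sketch of drawing one component spectrogram, assuming that "combine with Poisson noise" means each cell is Poisson-distributed with mean equal to the spectral shape at that frequency times the activation at that time, S_n[f, t] ~ Poisson(σ_n[f] · A_n[t]). The sampler uses Knuth's multiplication method from the standard library, which is adequate for the small rates used here but underflows for very large rates.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's method: count uniform draws until their product falls below exp(-rate)."""
    threshold = math.exp(-rate)  # underflows to 0.0 for rate above roughly 745
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def component_spectrogram(sigma, activation, rng):
    """Assumed model: S[f][t] ~ Poisson(sigma[f] * activation[t]) for a single note."""
    return [[sample_poisson(s * a, rng) for a in activation] for s in sigma]
```

Rows index frequency and columns index time, matching the freq × time layout of the spectrogram. Wherever either factor is zero the rate is zero and the cell is always 0.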
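Total Spectrogram Model: each note n contributes a component spectrogram S_n (from its activation A_n and spectral shape σ_n), and the total spectrogram is their sum, X = S_1 + ... + S_N.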
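Under the assumption that each component is Poisson with mean σ_n[f] · A_n[t] and that the components are drawn independently given A and σ, the sum can be written out directly. Because a sum of independent Poisson variables is again Poisson, the total spectrogram has a simple likelihood. That likelihood is the quantity the updates to A and σ below would climb.

```latex
X_{ft} = \sum_{n=1}^{N} S_{n,ft}, \qquad
S_{n,ft} \sim \mathrm{Poisson}\big(\sigma_{n,f} A_{n,t}\big)
\;\Longrightarrow\;
X_{ft} \sim \mathrm{Poisson}\big(\lambda_{ft}\big), \qquad
\lambda_{ft} = \sum_{n=1}^{N} \sigma_{n,f} A_{n,t}

\log p(X \mid A, \sigma) = \sum_{f,t} \big[ X_{ft} \log \lambda_{ft} - \lambda_{ft} - \log X_{ft}! \big]
```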
Learning and Inference: each block is updated with the others held fixed.
Note events M given A, α, µ: semi-Markov dynamic program.
Temporal shapes α given A, M: closed-form update.
Activations A given M, X, α, σ: exponentiated gradient ascent.
Spectral shapes σ given A, X: exponentiated gradient ascent.
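A sketch of one exponentiated gradient ascent step on the spectral shapes σ given A and X. It assumes the Poisson likelihood with rate λ[f][t] = Σ_n σ[n][f] · A[n][t]. The gradient with respect to σ[n][f] is Σ_t A[n][t] · (X[f][t] / λ[f][t] − 1). The step multiplies σ by exp(η · gradient), which keeps every entry positive. The unnormalized multiplicative form, the fixed step size `eta`, and the requirement that all rates be positive are our assumptions, not details from the talk.

```python
import math


def poisson_log_likelihood(X, A, sigma):
    """Sum over (f, t) of X*log(rate) - rate, dropping the constant log(X!) term."""
    total = 0.0
    for f in range(len(X)):
        for t in range(len(X[0])):
            rate = sum(sigma[n][f] * A[n][t] for n in range(len(A)))
            total += X[f][t] * math.log(rate) - rate  # assumes rate > 0
    return total


def eg_step_sigma(X, A, sigma, eta):
    """One multiplicative (exponentiated) gradient step on sigma[n][f], with A held fixed."""
    F, T, N = len(X), len(X[0]), len(A)
    rates = [[sum(sigma[n][f] * A[n][t] for n in range(N)) for t in range(T)]
             for f in range(F)]
    new_sigma = []
    for n in range(N):
        row = []
        for f in range(F):
            # d logL / d sigma[n][f] under the Poisson model.
            grad = sum(A[n][t] * (X[f][t] / rates[f][t] - 1.0) for t in range(T))
            row.append(sigma[n][f] * math.exp(eta * grad))
        new_sigma.append(row)
    return new_sigma
```

With `X = [[3.0, 6.0], [1.0, 2.0]]`, `A = [[1.0, 2.0]]`, and `sigma = [[1.0, 1.0]]`, a step with `eta = 0.01` raises the likelihood. The second frequency is already fit exactly, so its entry stays at 1.0. Updating A given σ would follow the same pattern with the roles of the two factors swapped.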
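Evaluation: onset F1, the F1 score over notes where a predicted note counts as correct if it matches a reference note's pitch and onset time (figure: notes, note vs. time)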
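A minimal sketch of onset F1, assuming notes are `(midi_pitch, onset_seconds)` pairs and a ±50 ms onset tolerance, a common convention that the talk does not specify. Matching is greedy: each predicted note claims the closest unmatched reference note of the same pitch. This simplifies the optimal one-to-one matching used by standard evaluation tools.

```python
def onset_f1(reference, predicted, tolerance=0.05):
    """F1 over notes; a match needs equal pitch and |onset difference| <= tolerance seconds."""
    unmatched = list(reference)
    true_positives = 0
    for pitch, onset in sorted(predicted, key=lambda note: note[1]):
        # Candidate reference notes: same pitch, onset within the tolerance.
        candidates = [r for r in unmatched
                      if r[0] == pitch and abs(r[1] - onset) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda r: abs(r[1] - onset))
            unmatched.remove(best)  # each reference note can be matched only once
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For reference notes `[(60, 0.0), (64, 0.5), (67, 1.0)]` and predictions `[(60, 0.02), (64, 0.6), (67, 1.03), (72, 2.0)]`, two predictions match. That gives precision 1/2, recall 2/3, and F1 = 4/7.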
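Results on the MAPS corpus (bar chart of onset F1). Systems compared: O’Hanlon (2014), Benetos (2014), Vincent (2013), Supervised, and Unsupervised*; citations shown: [Valentin et al. 2010], [Berg-Kirkpatrick et al. 2014]. Onset F1 values shown, highest to lowest: 82.1, 70.4, 69.0, 68.6, 58.3.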
Transcription example (figure: reference vs. predicted transcription)
Resynthesized Examples: Grieg (audio: input recording; resynthesized as piano; resynthesized as guitar)
Demo
Resynthesized Examples: Chopin (audio: input recording; resynthesized as piano; resynthesized as guitar)
Resynthesized Examples: Beethoven (audio: input recording; resynthesized as piano; resynthesized as guitar)