Neural Probabilistic Models for Melody Prediction, Sequence Labelling and Classification
Srikanth Cherla
https://cherla.org
September 13, 2017
1 / 47
Outline
1. Introduction: Analysis of Sequences in Music
2. Preliminaries: Restricted Boltzmann Machines, etc.
3. Contribution: The Recurrent Temporal Discriminative RBM
4. Extension: Generalising the RTDRBM
5. Contribution: Generalising the DRBM
2 / 47
Next: 1. Introduction: Analysis of Sequences in Music
3 / 47
Sequences in Notated Music
• A wealth of information in notated music
• Increasingly available
  • in different formats (MIDI, Kern, GP4, etc.)
  • for different kinds of music (classical, rock, pop, etc.)
• Analysis of sequences is key to extracting this information
• Melody: a good starting point for a broader analysis
4 / 47
Relevance
Scientific:
• Computational musicology
• Organizing music data
• Aiding acoustic models
• Music education
Creative:
• Automatic music generation
• Compositional assistance
5 / 47
Task: Melody Prediction
• Model a series of musical events s_1^T as
  p(s_1^T) = \prod_{t=1}^{T} p(s_t \mid s_{t-n+1}^{t-1})
• Conditional probabilities are learned from a corpus
• Cross entropy, an information-theoretic measure, quantifies a trained model's prediction uncertainty:
  H(p, p_m) = -\sum_{t=1}^{T} p(w_t \mid w_{t-n+1}^{t-1}) \log_2 p_m(w_t \mid w_{t-n+1}^{t-1})
• How well does a model p_m approximate p?
• Cross entropy is to be minimized
6 / 47
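To make the evaluation criterion concrete, the sketch below computes the average per-event cross entropy (in bits) that a predictive model assigns to a pitch sequence. It is a minimal illustrative sketch, not the thesis code: the add-one-smoothed bigram model, the five-note pitch alphabet and the example sequences are all invented for the purpose of the example.

import numpy as np
from collections import defaultdict

def cross_entropy(model_prob, sequence, n=2):
    """Average negative log2-probability the model assigns to each event,
    given the preceding n-1 events (lower is better)."""
    total = 0.0
    for t in range(len(sequence)):
        context = tuple(sequence[max(0, t - n + 1):t])
        total -= np.log2(model_prob(sequence[t], context))
    return total / len(sequence)

# A toy add-one-smoothed bigram model over a small pitch alphabet (illustrative only).
alphabet = [60, 62, 64, 65, 67]                          # assumed MIDI pitches
counts = defaultdict(lambda: defaultdict(lambda: 1.0))   # add-one smoothing
training = [60, 62, 64, 65, 67, 65, 64, 62, 60]
for prev, cur in zip(training[:-1], training[1:]):
    counts[(prev,)][cur] += 1.0

def bigram_prob(event, context):
    row = counts[context[-1:]] if context else counts[()]
    return row[event] / sum(row[a] for a in set(list(row) + alphabet))

test = [60, 62, 64, 62, 60]
print("cross entropy (bits/event):", cross_entropy(bigram_prob, test))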
Motivating Distributed Models
• Previous work focused on n-gram models
• No comparative results with other prediction models
• Thriving neural networks research (Bengio, 2009)
• Recent success of neural network language models (Bengio et al., 2003; Collobert et al., 2011; Mikolov et al., 2010)
Start with an evaluation of connectionist models on the melody prediction task.
7 / 47
Next: 2. Preliminaries: Restricted Boltzmann Machines, etc.
8 / 47
Restricted Boltzmann Machine (Smolensky, 1986)
• Generative, energy-based graphical model.
• Data v in the visible layer, features h in the hidden layer.
• Models the probability p(v) of the data as
  p(v) = \frac{\exp(-\mathrm{FreeEnergy}(v))}{\sum_{v^*} \exp(-\mathrm{FreeEnergy}(v^*))}
  where \mathrm{FreeEnergy}(v) = -\log \sum_{h} \exp(-\mathrm{Energy}(v, h))
• Learned using Contrastive Divergence (Hinton, 2002).
[Figure: visible layer v = s^{(t-n+1:t)} connected to hidden layer h through weights W]
9 / 47
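A minimal numpy sketch of the free energy above, and of the (unnormalised) probability an RBM assigns to a binary visible vector, assuming binary hidden units. The weights and the input below are random placeholders rather than trained parameters.

import numpy as np

def free_energy(v, W, b, c):
    """FreeEnergy(v) = -b.v - sum_j log(1 + exp(c_j + W_j . v))
    for a binary-binary RBM (the softplus form of -log sum_h exp(-Energy(v, h)))."""
    return -v @ b - np.sum(np.logaddexp(0.0, c + W @ v))

rng = np.random.default_rng(0)
n_visible, n_hidden = 12, 8
W = rng.normal(scale=0.1, size=(n_hidden, n_visible))   # hidden-to-visible weights
b = np.zeros(n_visible)                                  # visible biases
c = np.zeros(n_hidden)                                   # hidden biases

v = rng.integers(0, 2, size=n_visible).astype(float)
# p(v) is proportional to exp(-FreeEnergy(v)); the normaliser (the sum over
# all v*) is intractable for large visible layers.
print("unnormalised log p(v) =", -free_energy(v, W, b, c))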
Discriminative RBM (Larochelle & Bengio, 2008)
• Discriminative classifier based on the RBM.
• Data x and class-label y in the visible layer.
• Models the conditional probability p(y | x) as
  p(y \mid x) = \frac{\exp(-\mathrm{FreeEnergy}(x, y))}{\sum_{y^*} \exp(-\mathrm{FreeEnergy}(x, y^*))}
• Exact gradient computation is possible.
[Figure: input x = s^{(t-n+1:t-1)} and class label y = s^{(t)} connected to the hidden layer h]
10 / 47
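Because the label layer is small, the DRBM posterior can be computed exactly by enumerating the free energy for each candidate class. Below is a small numpy sketch under the usual one-hot label encoding; the parameter names (W, U, c, d) and their random values are placeholders for illustration.

import numpy as np

def drbm_posterior(x, W, U, c, d):
    """p(y|x) by enumerating free energies over all classes.
    W: hidden x input weights, U: hidden x class weights,
    c: hidden biases, d: class biases (assumed notation)."""
    n_classes = U.shape[1]
    # -FreeEnergy(x, y) for each class y, up to terms independent of y
    neg_f = np.array([
        d[y] + np.sum(np.logaddexp(0.0, c + W @ x + U[:, y]))
        for y in range(n_classes)
    ])
    neg_f -= neg_f.max()                 # numerical stability before exponentiating
    p = np.exp(neg_f)
    return p / p.sum()

rng = np.random.default_rng(1)
n_in, n_hid, n_classes = 20, 16, 5
W = rng.normal(scale=0.1, size=(n_hid, n_in))
U = rng.normal(scale=0.1, size=(n_hid, n_classes))
c, d = np.zeros(n_hid), np.zeros(n_classes)

x = rng.integers(0, 2, size=n_in).astype(float)
print("p(y|x) =", drbm_posterior(x, W, U, c, d))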
Recurrent Temporal RBM (Sutskever et al., 2009)
• Generative model for high-dimensional time-series.
• The RBM at time t is conditioned on ĥ^{(t-1)}.
• Models the joint probability of a sequence as
  p(v^{(1:T)}, h^{(1:T)}) = \prod_{t} p(v^{(t)} \mid \hat{h}^{(t-1)}) \, p(h^{(t)} \mid v^{(t)}, \hat{h}^{(t-1)})
• Learned using Contrastive Divergence and BPTT.
[Figure: RBMs unrolled in time, with visible layers v^{(t)} = s^{(t-1:t)}, hidden layers h^{(t)}, dynamic biases b^{(t)} and c^{(t)}, and recurrent weights W_hh between successive hidden layers]
11 / 47
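The mechanism behind the conditioning is a deterministic recurrence that turns the previous mean-field hidden activation into a dynamic bias for the RBM at time t. A minimal sketch of that recurrence is given below; parameter names follow the figure and their values are random placeholders.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rtrbm_hidden_step(v_t, h_prev, W, W_hh, c):
    """Mean-field hidden activation h_hat(t) = sigma(W v(t) + W_hh h_hat(t-1) + c).
    The term W_hh h_hat(t-1) + c acts as the dynamic hidden bias of the RBM at time t."""
    return sigmoid(W @ v_t + W_hh @ h_prev + c)

rng = np.random.default_rng(2)
n_vis, n_hid = 10, 6
W = rng.normal(scale=0.1, size=(n_hid, n_vis))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
c = np.zeros(n_hid)

h = np.zeros(n_hid)                        # h_hat(0)
sequence = rng.integers(0, 2, size=(5, n_vis)).astype(float)
for v_t in sequence:                       # unrolled recurrence over the sequence
    h = rtrbm_hidden_step(v_t, h, W, W_hh, c)
print("h_hat(T) =", h)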
Next: 3. Contribution: The Recurrent Temporal Discriminative RBM
12 / 47
Motivation
• Discriminative inference on the generative RTRBM
• Possible to carry out discriminative learning
• Previous work suggested potential improvements
13 / 47
Discriminative Learning in the RTRBM (Cherla et al., 2015)
Extend DRBM learning to a recurrent model:
  p(y^{(t)} \mid x^{(1:t)}) = p(y^{(t)} \mid x^{(t)}, \hat{h}^{(t-1)})
                            = \frac{\exp(-\mathrm{FreeEnergy}(x^{(t)}, y^{(t)}))}{\sum_{y^*} \exp(-\mathrm{FreeEnergy}(x^{(t)}, y^*))}
[Figure: RTRBM-like structure unrolled in time, with inputs x^{(t)} = s^{(t-1)}, labels y^{(t)} = s^{(t)}, hidden layers h^{(t)}, weights W and U, and recurrent weights W_hh]
14 / 47
Discriminative Learning in the RTRBM (Cherla et al., 2015)
Apply this to an entire sequence and optimise the log-likelihood:
  O = \log p(y^{(1:T)} \mid x^{(1:T)}) = \sum_{t=1}^{T} \log p(y^{(t)} \mid x^{(t)}, \hat{h}^{(t-1)})
[Figure: same unrolled structure as on the previous slide]
15 / 47
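Putting the two previous pieces together, here is a sketch of how this sequence objective could be evaluated for one labelled sequence: at each step the DRBM posterior is computed with the hidden bias shifted by W_hh ĥ^{(t-1)}, the log-probability of the true label is accumulated, and the recurrence is advanced using the ground-truth label. This is a simplified illustrative reading of the model with placeholder parameters, not the thesis implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step_posterior(x_t, h_prev, W, U, W_hh, c, d):
    """p(y(t) | x(t), h_hat(t-1)) via free energies, one entry per class."""
    c_dyn = c + W_hh @ h_prev                       # dynamic hidden bias
    neg_f = np.array([d[y] + np.sum(np.logaddexp(0.0, c_dyn + W @ x_t + U[:, y]))
                      for y in range(U.shape[1])])
    neg_f -= neg_f.max()
    p = np.exp(neg_f)
    return p / p.sum(), c_dyn

def sequence_log_likelihood(xs, ys, W, U, W_hh, c, d):
    """O = sum_t log p(y(t) | x(t), h_hat(t-1)), with h_hat updated
    from the ground-truth label at each step during training."""
    h = np.zeros(W.shape[0])
    ll = 0.0
    for x_t, y_t in zip(xs, ys):
        p, c_dyn = step_posterior(x_t, h, W, U, W_hh, c, d)
        ll += np.log(p[y_t])
        h = sigmoid(c_dyn + W @ x_t + U[:, y_t])    # recurrence uses the true y(t)
    return ll

rng = np.random.default_rng(3)
n_in, n_hid, n_cls, T = 8, 6, 4, 5
W = rng.normal(scale=0.1, size=(n_hid, n_in))
U = rng.normal(scale=0.1, size=(n_hid, n_cls))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
c, d = np.zeros(n_hid), np.zeros(n_cls)
xs = rng.integers(0, 2, size=(T, n_in)).astype(float)
ys = rng.integers(0, n_cls, size=T)
print("log p(y(1:T) | x(1:T)) =", sequence_log_likelihood(xs, ys, W, U, W_hh, c, d))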
Discriminative Learning in the RTRBM (Cherla et al., 2015)
• Recurrent extension of the DRBM.
• Identical in structure to the RTRBM.
• Exact gradient of the cost is computable at each time-step.
• Back-Propagation Through Time for sequence learning.
[Figure: same unrolled structure as on the previous slides]
16 / 47
Experiments: Melody Corpus
Corpus
• As used in (Pearce & Wiggins, 2004).
• A collection of 8 datasets.
• Folk songs from the Essen Folk Song Collection.
• Chorale melodies.

Dataset                  No. events   |χ|
Yugoslavian folk songs   2691         25
Alsatian folk songs      4496         32
Swiss folk songs         4586         34
Austrian folk songs      5306         35
German folk songs        8393         27
Canadian folk songs      8553         25
Chorale melodies         9227         21
Chinese folk songs       11056        41
17 / 47
Experiments: Melody Corpus
Models
• Non-recurrent: n-grams (b), n-grams (u), FNN, RBM, DRBM, with context length ∈ {1, 2, 3, 4, 5, 6, 7, 8}.
• Recurrent: RNN, RTRBM, RTDRBM, over entire sequences.
• Hidden units ∈ {25, 50, 100, 200}
• Learning rate ∈ {0.01, 0.05}
• Trained for 500 epochs.
• Best model determined over a validation set.
Evaluation criterion: cross entropy
  H_c(p_{mod}, D_{test}) = -\frac{1}{|D_{test}|} \sum_{s_1^n \in D_{test}} \log_2 p_{mod}(s_n \mid s_1^{(n-1)})
18 / 47
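The model selection procedure described above amounts to a grid search with validation cross entropy as the selection criterion. A generic sketch of that loop follows; train_and_evaluate is a hypothetical stand-in for whichever model is being tuned and is assumed to return the validation cross entropy of a trained model.

from itertools import product

def select_model(train_and_evaluate):
    """Pick the hyperparameter combination with the lowest validation cross entropy."""
    grid = product([25, 50, 100, 200],    # hidden units
                   [0.01, 0.05])          # learning rate
    best = None
    for n_hidden, lr in grid:
        h_val = train_and_evaluate(n_hidden, lr, n_epochs=500)
        if best is None or h_val < best[0]:
            best = (h_val, n_hidden, lr)
    return best

# Illustrative use with a dummy evaluation function standing in for real training:
dummy = lambda n_hidden, lr, n_epochs: abs(n_hidden - 100) / 100.0 + lr
print(select_model(dummy))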
Results
[Plot: prediction cross entropy (approximately 2.8 to 3.1 bits) versus context length (0 to 8) for the n-gram (b), n-gram (u), FNN, RBM, DRBM, RNN, RTRBM and RTDRBM models]
• In general, performance improves with context length.
• n-gram model performance worsens at lower context lengths.
• Non-recurrent connectionist models outperform n-grams.
• Recurrent connectionist models outperform non-recurrent ones.
• The RTDRBM outperforms the RTRBM.
• With a shorter context, the DRBM outperforms the RBM.
• With a longer context, the RBM outperforms the DRBM.
• More details and discussion are available in the paper.
19-26 / 47
Next: 4. Extension: Generalising the RTDRBM
27 / 47
Motivation
[Figure: RTDRBM unrolled in time, with inputs x^{(t)}, labels y^{(t)}, hidden layers h^{(t)}, weights W and U, and recurrent weights W_hh]
  \hat{h}^{(t-1)} = \sigma(W x^{(t-1)} + U y^{(t-1)} + c^{(t-1)})
                  = \sigma(W x^{(t-1)} + U y^{(t-1)} + W_{hh} \hat{h}^{(t-2)} + c)
Limitation: the dependence of h^{(t)} on y^{*(t-1)} is not suitable for general sequence-labelling problems.
28 / 47
Motivation
[Figure and recurrence as on the previous slide]
Solution: Replace y^{*(t-1)} (unavailable at test time) with the predicted output y^{(t-1)} of the previous time-step.
29 / 47
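At test time, the generalised model can therefore feed its own prediction back into the recurrence in place of the unavailable ground-truth label. A sketch of that greedy decoding loop is given below; it reuses the same free-energy scoring as the earlier sketches, and all parameter values are random placeholders for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_sequence(xs, W, U, W_hh, c, d):
    """Greedy sequence labelling: at each step predict y(t) from p(y(t) | x(t), h_hat(t-1)),
    then advance the recurrence with the *predicted* label, since the true one is unknown."""
    h = np.zeros(W.shape[0])
    preds = []
    for x_t in xs:
        c_dyn = c + W_hh @ h
        scores = np.array([d[y] + np.sum(np.logaddexp(0.0, c_dyn + W @ x_t + U[:, y]))
                           for y in range(U.shape[1])])
        y_hat = int(np.argmax(scores))               # most probable label at step t
        preds.append(y_hat)
        h = sigmoid(c_dyn + W @ x_t + U[:, y_hat])   # feed the prediction back
    return preds

rng = np.random.default_rng(4)
n_in, n_hid, n_cls, T = 8, 6, 4, 5
W = rng.normal(scale=0.1, size=(n_hid, n_in))
U = rng.normal(scale=0.1, size=(n_hid, n_cls))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
c, d = np.zeros(n_hid), np.zeros(n_cls)
xs = rng.integers(0, 2, size=(T, n_in)).astype(float)
print("predicted labels:", label_sequence(xs, W, U, W_hh, c, d))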
Experiments: OCR
Dataset (Taskar et al., 2004)
• 6,877 handwritten English words comprising 52,152 characters
• Each character is a 16 × 8 binary image
• A character label for each image (26 categories)
• 10 cross-validation folds, one hold-out test set
Method
• Grid search over model hyperparameters
• 10-fold cross-validation during model selection
• Models trained over entire sequences
Evaluation: average loss per sequence
  E(y, y^*) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L_i} \sum_{j=1}^{L_i} \mathbb{I}\left[(y_i)_j \neq (y^*_i)_j\right]
30 / 47
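The evaluation criterion above is the fraction of mislabelled positions in each sequence, averaged over all test sequences. A small sketch follows, using invented toy label sequences purely for illustration.

def average_loss_per_sequence(predicted, reference):
    """Mean over sequences of the fraction of mislabelled positions in each sequence."""
    assert len(predicted) == len(reference)
    per_seq = [sum(p != r for p, r in zip(ps, rs)) / len(rs)
               for ps, rs in zip(predicted, reference)]
    return sum(per_seq) / len(per_seq)

# Toy example: two "words" labelled character by character.
pred = [list("cat"), list("dgo")]
ref  = [list("cat"), list("dog")]
print(average_loss_per_sequence(pred, ref))   # (0/3 + 2/3) / 2 = 0.333...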