Musical Instrument Classification Using Spiking Neural Networks
Jainesh Doshi, Vishrant Tripathi, Onkar Desai, Shreyas Mangalgi
IIT Bombay
November 6, 2015
Overview
1 Introduction
2 Biological Bases of Musical Perception: Timbre; The Human Ear
3 Proposed Model: Modelling the Input; The Neural Network; Weight Training Rule
4 Observations
5 Conclusions
Introduction
The human ear is a small physical device with disproportionately rich and interesting properties.
We want to tackle a small problem: musical instrument classification.
Our ears and brain are very adept at solving this, but conventional approaches require complex signal processing and algorithms.
We propose a simple model for this classification problem and implement it using spiking neural networks.
Timbre
The ANSI definition of timbre describes it as the attribute that allows us to distinguish between sounds of the same perceptual duration, loudness, and pitch, such as the same note played on two different musical instruments.
Naively, it can be understood as the information contained in the relative amplitudes of the harmonics present in a signal.
We use timbre as the basis for classifying different instruments, much as the ear does.
Figure: The timbre of the guitar remains the same even if it is played in a different way
The Human Ear
Cochlea: The cochlea, or inner ear, constitutes the hydrodynamic part of the ear.
Basilar Membrane: The basilar membrane is a flexible gelatinous membrane that divides the cochlea longitudinally; it carries about 25,000 nerve endings attached to the numerous hair cells arranged on its surface.
How does it work? Patches of hair cells are spatially located according to how each frequency propagates along the cochlear fluid, so different patches respond to different frequency ranges, essentially converting the signal's information into the frequency domain.
Proposed Model
The input model we use is inspired by the biological structure of the inner ear, while the neural network takes its inspiration from a similar network developed for composer classification [4].
Figure: The final implementation of the proposed network
Modelling the Input
We perform an STFT on the input signal.
Each neuron in the first layer is excited by a certain range of frequencies.
For the LIF network, we use rate encoding: the number of spikes is proportional to the amplitude of the harmonic.
For the AEF network, we use temporal encoding: the time between spikes is proportional to the amplitude of the harmonic.
Figure: Simple network for classifying triangular and square waves (proof of concept)
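The input stage described above (STFT followed by rate encoding of per-band amplitudes) can be sketched roughly as follows; the band edges, the 40 ms window, and the factor of 10 converting amplitude to spike count are illustrative assumptions, not values from the slides:

```python
import numpy as np
from scipy.signal import stft

def rate_encode(signal, fs, bands, window_ms=40):
    """Map each frequency band's average STFT magnitude to a spike count
    (rate encoding: more spikes for a larger harmonic amplitude)."""
    f, t, Z = stft(signal, fs=fs, nperseg=int(fs * window_ms / 1000))
    mag = np.abs(Z).mean(axis=1)              # mean magnitude per frequency bin
    counts = []
    for lo, hi in bands:                      # one input neuron per band
        band_amp = mag[(f >= lo) & (f < hi)].sum()
        counts.append(int(round(band_amp * 10)))  # hypothetical amplitude-to-spike gain
    return counts

# toy example: a 440 Hz sine, bands centred on the first two harmonics
fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
counts = rate_encode(x, fs, bands=[(400, 480), (840, 920)])
```

A pure tone should drive the fundamental's band much harder than the (empty) second-harmonic band, so the first neuron receives many more spikes than the second.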
The Neural Network
We use the Leaky Integrate-and-Fire (LIF) model of a neuron in our network.
Figure: A basic block diagram describing the network
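A minimal discrete-time LIF neuron, assuming the standard membrane equation dv/dt = (−v + RI)/τ with reset on threshold crossing; the parameter values here are illustrative, not taken from the slides:

```python
def lif_spikes(input_current, dt=1e-3, tau=20e-3, R=1.0, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire: integrate dv/dt = (-v + R*I)/tau with Euler
    steps; emit a spike and reset the membrane when v crosses threshold."""
    v, spikes = 0.0, []
    for n, I in enumerate(input_current):
        v += dt * (-v + R * I) / tau
        if v >= v_th:
            spikes.append(n * dt)   # spike time in seconds
            v = v_reset
    return spikes

# a constant suprathreshold current yields a regular spike train;
# a subthreshold current (steady state R*I < v_th) yields none
regular = lif_spikes([1.5] * 200)
silent = lif_spikes([0.5] * 200)
```

Because the steady-state voltage is R·I, the neuron fires only when R·I exceeds the threshold, which is what lets the second layer act as a coincidence/level detector on the encoded harmonic amplitudes.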
Weight Training Rule
The rule avoids excessive charge being fed into the network when the amplitudes at the excitation frequencies are high.
Figure: This rule helps the network handle very frequent spiking patterns that may be caused by aliasing or a louder sound
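The slides do not spell the rule out, but one way to cap the charge injected per presentation is to rescale the weight vector whenever the weighted spike count would exceed a fixed budget. The function name, the q_max budget, and the global-rescaling choice are all hypothetical:

```python
import numpy as np

def limit_charge(weights, spike_counts, q_max=5.0):
    """Hypothetical normalisation: scale the whole weight vector down whenever
    the total injected charge (weights . spike_counts) would exceed q_max, so
    unusually loud or aliased inputs cannot flood the next layer."""
    w = np.asarray(weights, dtype=float)
    q = float(w @ np.asarray(spike_counts, dtype=float))
    return w * (q_max / q) if q > q_max else w
```

Under this sketch, quiet inputs pass through unchanged while loud ones are compressed so the downstream LIF neurons see a bounded total charge.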
Demonstration
The problem: distinguish a square wave from a triangular wave of the same frequency (different timbre).
Figure: Difference in timbre between a square and a triangular wave of the same tone
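The differing harmonic decay that makes this classification possible can be checked numerically: the odd harmonics of a square wave fall off as 1/n, while those of a triangular wave fall off as 1/n². A small sketch (the choices of f0 and fs are arbitrary):

```python
import numpy as np

fs, f0, N = 8000, 200, 8000           # 1 second of signal, 1 Hz bin resolution
t = np.arange(N) / fs
square = np.sign(np.sin(2 * np.pi * f0 * t))
triangle = 2 / np.pi * np.arcsin(np.sin(2 * np.pi * f0 * t))

def harmonic_amps(x, n_harmonics=5):
    """Amplitudes of the first few odd harmonics, read off the FFT bins."""
    spec = np.abs(np.fft.rfft(x)) / len(x)
    bins = [int(round((2 * k + 1) * f0 * len(x) / fs)) for k in range(n_harmonics)]
    return spec[bins]

sq = harmonic_amps(square)
tr = harmonic_amps(triangle)
# sq / sq[0] is approximately 1, 1/3, 1/5, ...  (falls off as 1/n)
# tr / tr[0] is approximately 1, 1/9, 1/25, ... (falls off as 1/n^2)
```

It is exactly this gap between the 1/n and 1/n² envelopes that the second-layer neurons exploit: the square wave's stronger upper harmonics push the output neuron over threshold while the triangular wave's do not.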
Observations
Figure: Response of the input layer neurons to a triangular wave (the harmonic amplitudes decay as 1/n², the inverse square of the harmonic number n)
Observations
Figure: Response of the second layer neurons to a triangular wave (these neurons respond to differences between harmonic amplitudes); the output neuron (last in the figure) does not spike
Observations
Figure: Response of the input layer neurons to a square wave (the harmonic amplitudes decay as 1/n, inversely with the harmonic number n)
Observations
Figure: Response of the second layer neurons to a square wave (these neurons respond to differences between harmonic amplitudes); the output neuron (last in the figure) spikes for the square wave, thus achieving classification
Observations
Figure: Output of the neurons of the auditory input layer. The stimuli are: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in octave III; at t=120, again in octave I; and finally at t=160, the same note in octave I on a wind instrument
Observations: Approach 2
Figure: Output of the AEF neurons of the auditory input layer. The stimuli are: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in octave III; at t=120, again in octave I; at t=160, the same note in octave I on a wind instrument; and at t=197 and t=201, notes in octave I and octave II
Observations: Approach 2
Figure: Output of the second layer of the AEF network. The stimuli are: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in octave III; at t=120, again in octave I; at t=160, the same note in octave I on a wind instrument; and at t=197 and t=201, notes in octave I and octave II
Tritonia Central Pattern Generator
Figure: Tritonia-inspired rhythmic pattern generator
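As a toy illustration of the idea (not the actual Tritonia model or its parameters), three LIF-like units with ring inhibition generate a rhythm: each spike suppresses the next unit for a while, so the units tend to fire in a staggered, repeating pattern. All constants here are hypothetical:

```python
import numpy as np

def cpg(steps=3000, dt=1e-3, tau=50e-3, drive=1.2, w_inh=0.8, v_th=1.0):
    """Toy three-neuron ring oscillator: each neuron receives a constant
    drive and delivers a decaying inhibitory pulse to the next in the ring."""
    v = np.array([0.6, 0.3, 0.0])       # staggered start breaks the symmetry
    inhib = np.zeros(3)                  # inhibitory conductance per neuron
    spikes = [[] for _ in range(3)]
    for n in range(steps):
        inhib *= np.exp(-dt / tau)       # inhibition decays between spikes
        v += dt * (-v + drive - w_inh * inhib) / tau
        for i in range(3):
            if v[i] >= v_th:
                spikes[i].append(n * dt)
                v[i] = 0.0
                inhib[(i + 1) % 3] += 1.0   # inhibit the next neuron in the ring
    return spikes
```

Because the drive is suprathreshold and the inhibition decays, no unit stays silent forever; the mutual suppression merely staggers the firing, which is the essence of a central pattern generator.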
Observations
Figure: Output of the Tritonia-inspired three-neuron pattern generator
Figure: Output of the Tritonia-inspired three-neuron pattern generator (low frequency oscillations)
Conclusions
A simple spiking-neural-network-based instrument classifier has been implemented.
Both the note and the musical instrument playing it can be identified.
The approach can be extended to simultaneous identification and extraction of instrument sounds from a music file.
Possible drawbacks: multiple instruments and notes cannot be detected if played at the same time.