Machine Listening in Complex Environments
Some challenges in understanding musical and environmental sounds
Mathieu Lagrange
June 25, 2013

Music Human Perception Melody Enhancement Scattering Scene Synthesis

Outline

Motivation
Let humans access audio data in a way that makes sense for them.

Means
Explore different means of representing sound to quantify the notion of resemblance between sounds as experienced by humans:
  - in musical corpora
  - for environmental sounds

Listening in a Complex Environment

semantic representations:
human perception processes: is there a need for segregating elements of interest?
mathematical representation: what models can be relevant for representing complex scenes in a generic way?
computational issues: is it meaningful to evaluate computational systems using artificial data?

Music Information Retrieval (MIR)

As in every multimedia retrieval task, the main issue is to bridge the semantic gap.
Depending on the data at hand, the difficulty of the task ranges from impossible to barely feasible:
  1. raw data (signal)
  2. metadata (tags: genre)
  3. user ratings (likes)

Content-based Similarity in Music

[Figure; labels: Fingerprint, Similarity, Cover, Tag, User, Signal, Chords]

Can we break the glass ceiling in MIR?

One way is to consider the important property that music is usually polyphonic.
Let us focus on what is important in this polyphony!
  at a low auditory level [Mesgarani & al, Nature’12]
Since multi-track recordings are usually not available, one has to resort to approximate solutions, based on enhancement of the sources of interest within mix-down versions.
The expected performance gain is far from being achieved with state-of-the-art approaches, though.

We do segregate, right?

In front of a complex scene, we are remarkably efficient at focusing on a source of interest.
But are we actually performing the segregation at a low level (i.e. at the spectrogram level)?
Apparently, we do ;-)

Mesgarani & al [Nature’12]

Cortical activity was recorded from human subjects implanted with customized high-density multi-electrode arrays as part of their clinical work-up for epilepsy surgery.
Subjects listened to speech samples from a corpus commonly used in multi-talker communication research. A typical sentence was "ready tiger go to red two now", where "tiger" is the call sign and "red two" is the colour-number combination.
The method of stimulus reconstruction was used to estimate the speech spectrogram represented by the population neural responses.

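The slide does not say how the reconstruction is computed. As a rough illustration only, the sketch below assumes a linear decoder fitted by ridge regression that maps the responses of a few electrodes to one spectrogram band, with no time lags. The data are synthetic stand-ins, and every name (fit_ridge, reconstruct, true_w, lam) is hypothetical.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(responses, target, lam):
    """Weights g minimising ||responses @ g - target||^2 + lam * ||g||^2."""
    n = len(responses[0])
    gram = [[sum(r[i] * r[j] for r in responses) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * t for r, t in zip(responses, target)) for i in range(n)]
    return solve(gram, rhs)


def reconstruct(responses, weights):
    """Estimate one spectrogram band, frame by frame, from the responses."""
    return [sum(w * x for w, x in zip(weights, r)) for r in responses]


# Synthetic stand-in data (NOT real recordings): 200 frames, 4 electrodes.
random.seed(0)
true_w = [0.5, -1.0, 2.0, 0.25]  # hypothetical ground-truth mapping
responses = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
band = [sum(w * x for w, x in zip(true_w, r)) + random.gauss(0.0, 0.01)
        for r in responses]

weights = fit_ridge(responses, band, lam=1e-3)
estimate = reconstruct(responses, weights)
```

A fuller decoder would typically fit one such mapping per frequency band and include several time lags of each electrode as extra input columns.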
Why is Source Separation for Music Analysis not trivial?

The enhancement process must operate with limited prior knowledge about the properties of the specific parts to be enhanced.
Distortions inevitably remain that propagate to the subsequent feature extraction and classification stages.
To reduce their impact, one needs to either:
  - design features that are robust, or
  - use standard features and estimate whether they are relevant or not by considering a notion of uncertainty.
Uncertainty can be:
  - binary (missing feature theory [Eggink2003]), or
  - Gaussian [Droppo2005].

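To make the two notions of uncertainty concrete, here is a minimal sketch, assuming each class is modelled by a single diagonal-covariance Gaussian. Binary uncertainty is read here as marginalisation (dimensions flagged unreliable are simply skipped), and Gaussian uncertainty as adding the per-dimension feature variance to the model variance, often called uncertainty decoding. The class names, feature values and function names are made up for illustration and are not taken from the cited works.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def loglik_binary(x, reliable, mean, var):
    """Binary uncertainty: skip (marginalise out) unreliable dimensions."""
    kept = [i for i, ok in enumerate(reliable) if ok]
    return log_gauss_diag(
        [x[i] for i in kept], [mean[i] for i in kept], [var[i] for i in kept]
    )


def loglik_gaussian(x, x_var, mean, var):
    """Gaussian uncertainty: add the feature variance to the model variance."""
    return log_gauss_diag(x, mean, [v + u for v, u in zip(var, x_var)])


# Hypothetical two-class problem: (mean, variance) per class.
classes = {
    "voice": ([0.0, 0.0], [1.0, 1.0]),
    "guitar": ([4.0, 3.0], [1.0, 1.0]),
}

x = [3.5, 0.5]            # observed feature; dimension 0 is heavily distorted
reliable = [False, True]  # binary mask: dimension 0 is declared missing
x_var = [25.0, 0.01]      # Gaussian uncertainty: variance per dimension

plain = max(classes, key=lambda c: log_gauss_diag(x, *classes[c]))
binary = max(classes, key=lambda c: loglik_binary(x, reliable, *classes[c]))
gaussian = max(classes, key=lambda c: loglik_gaussian(x, x_var, *classes[c]))
```

With these made-up numbers, ignoring uncertainty selects "guitar", while both uncertainty-aware scores select "voice". The binary mask throws the distorted dimension away entirely, whereas the Gaussian variant keeps it and down-weights it in proportion to its variance.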
Contributions

  1. Promote the use of Gaussian uncertainty instead of binary uncertainty for robust classification in the field of MIR,