Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks
DCASE 2017
Eduardo Fonseca, Rong Gong, Dmitry Bogdanov, Olga Slizovskaia, Emilia Gómez and Xavier Serra
Outline
● Introduction
● Proposed System & Results
● Summary
Introduction
● Acoustic Scene Classification (ASC): given an audio recording, the system predicts its recording environment among 15 acoustic scenes.
Introduction
● Traditionally, feature engineering: feature extraction → classifier
● Nowadays, data-driven: learning representations → classifier
How about combining both approaches for ASC?
Proposed System
● Overall pipeline (10 s segment → two parallel paths → late fusion → acoustic scene):
⇀ GBM path: splitting → Freesound Extractor → GBM → score aggregation
⇀ CNN path: splitting → pre-processing (mel-spectrogram) → CNN → score aggregation
Gradient Boosting Machine
[Pipeline: splitting (audio snippets) → Freesound Extractor (feature vectors) → GBM → score aggregation → acoustic scene]
● Freesound Extractor by Essentia: http://essentia.upf.edu/documentation/freesound_extractor.html
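A minimal sketch (not the authors' exact code) of extracting a per-snippet feature vector, assuming Essentia's Python bindings expose FreesoundExtractor analogously to MusicExtractor; the flattening of pool descriptors into one vector is an illustrative assumption.

```python
import numpy as np
import essentia.standard as es

def snippet_features(wav_path):
    # FreesoundExtractor returns (aggregated descriptors, frame-wise descriptors) as Pools
    aggregated, _frames = es.FreesoundExtractor()(wav_path)
    vector = []
    for name in sorted(aggregated.descriptorNames()):
        value = aggregated[name]
        if isinstance(value, float):                      # keep scalar statistics
            vector.append(value)
        elif isinstance(value, np.ndarray) and value.ndim == 1:
            vector.extend(value.tolist())                 # flatten small vectors (e.g. MFCC means)
        # non-numeric descriptors (strings, matrices) are skipped in this sketch
    return np.asarray(vector, dtype=np.float32)
```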
Gradient Boosting Machine
● Gradient Boosting Machine (GBM):
⇀ effective in Kaggle competitions
⇀ multiple weak learners (decision trees)
⇀ added iteratively
● Implementation: LightGBM (https://github.com/Microsoft/LightGBM), as sketched below
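A minimal training sketch with LightGBM's scikit-learn interface. The feature arrays and labels below are random placeholders, and the hyperparameters are illustrative, not the submitted system's settings.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 100))          # hypothetical per-snippet feature vectors
y_train = rng.integers(0, 15, size=400)        # hypothetical labels for the 15 acoustic scenes
X_test = rng.normal(size=(20, 100))

gbm = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
gbm.fit(X_train, y_train)                      # multiclass objective inferred from the labels
snippet_scores = gbm.predict_proba(X_test)     # one row of class scores per snippet
```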
Gradient Boosting Machine
● Score aggregation (sketched below):
⇀ averaging scores across snippets
⇀ argmax
● Results:
⇀ development set, with the provided 4-fold cross-validation
⇀ Accuracy: 80.8%
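A minimal sketch of the score-aggregation step, assuming `snippet_scores` holds the class scores of the snippets belonging to one 10 s segment (placeholder array of shape n_snippets x 15).

```python
import numpy as np

snippet_scores = np.random.rand(5, 15)             # placeholder scores for 5 snippets of one segment
segment_scores = snippet_scores.mean(axis=0)       # average across snippets
predicted_scene = int(np.argmax(segment_scores))   # argmax over the 15 scene classes
```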
Convolutional Neural Network
[Pipeline: pre-processing (log-scaled mel-spectrogram) → splitting (T-F patches) → CNN → score aggregation → acoustic scene]
● log-scaled mel-spectrogram:
⇀ 128 bands
● Time splitting:
⇀ T-F patches of 1.5 s
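A minimal pre-processing sketch, assuming a 10 s mono segment stored in "segment.wav"; the frame and hop sizes are illustrative, not necessarily the submitted system's values.

```python
import librosa

y, sr = librosa.load("segment.wav", sr=44100, mono=True)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=1024, n_mels=128)
log_mel = librosa.power_to_db(mel)                 # log-scaled mel-spectrogram, 128 bands

# Split into non-overlapping time-frequency patches of roughly 1.5 s
frames_per_patch = int(round(1.5 * sr / 1024))
patches = [log_mel[:, i:i + frames_per_patch]
           for i in range(0, log_mel.shape[1] - frames_per_patch + 1, frames_per_patch)]
```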
Convolutional Neural Network
● Global time-domain pooling (Valenti, 2016)
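A minimal Keras sketch of a CNN whose last pooling stage collapses the whole time axis, in the spirit of global time-domain pooling (Valenti, 2016). Layer sizes, filter counts and the choice of max-pooling are illustrative assumptions, not the submitted architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128, 64, 1))               # 128 mel bands x ~1.5 s of frames
x = layers.Conv2D(32, (5, 5), activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
# Global time-domain pooling: reduce over the entire remaining time axis
x = layers.Lambda(lambda t: tf.reduce_max(t, axis=2))(x)
x = layers.Flatten()(x)
outputs = layers.Dense(15, activation="softmax")(x)       # 15 acoustic scenes
model = tf.keras.Model(inputs, outputs)
```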
Convolutional Neural Network
● Design of convolutional filters:
⇀ which spectro-temporal patterns are relevant for ASC?
⇀ different rectangular filters (Pons, 2017) (Phan, 2016)
⇀ multiple vertical filter shapes (Q = 1, 2, 3, 4, 5)
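A minimal sketch of parallel branches with different vertical (frequency-wise) filter shapes, in the spirit of (Pons, 2017). The mapping from Q to a filter height and all layer sizes are hypothetical assumptions, not the submitted architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_mels, n_frames = 128, 64
inputs = tf.keras.Input(shape=(n_mels, n_frames, 1))

branches = []
for q in (1, 2, 3, 4, 5):
    height = n_mels // (2 ** q)                 # assumed mapping: taller filters for small Q
    b = layers.Conv2D(16, (height, 3), activation="relu")(inputs)
    b = layers.Lambda(lambda t: tf.reduce_max(t, axis=[1, 2]))(b)   # pool each branch to a vector
    branches.append(b)

x = layers.Concatenate()(branches)
outputs = layers.Dense(15, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```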
Recap
● Feature engineering:
⇀ Freesound Extractor → GBM
⇀ Accuracy: 80.8%
● Data-driven:
⇀ log-scaled mel-spectrogram → CNN
⇀ Accuracy: 79.9%
How differently do the two models behave?
Models’ Comparison
● Difference of confusion matrices (confusion matrix by GBM minus confusion matrix by CNN)
[Figure: difference matrix; legend distinguishes cells where GBM performs better from cells where CNN performs better]
Late Fusion
● GBM: prediction probabilities
● CNN: softmax activation values
● Late fusion approach: arithmetic mean + argmax (see the sketch below)
● System accuracy on the development set: 83.0%
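A minimal sketch of the late fusion step, assuming per-segment class scores from both models after their own score aggregation (placeholder arrays of shape n_segments x 15).

```python
import numpy as np

gbm_probabilities = np.random.rand(10, 15)         # placeholder GBM prediction probabilities
cnn_softmax = np.random.rand(10, 15)               # placeholder CNN softmax activations

fused = (gbm_probabilities + cnn_softmax) / 2.0    # arithmetic mean of the two score sets
predicted_scenes = np.argmax(fused, axis=1)        # argmax per 10 s segment
```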
Results
● residential area vs park
● tram vs train
● grocery store vs cafe/restaurant
Challenge Ranking
● accuracy drop on the evaluation set
● outperforming the baseline by an absolute 6.3%
Summary
● Ensemble of two models
● Simplicity of models:
⇀ GBM + out-of-the-box feature extractor
⇀ CNN using domain knowledge
⇀ providing complementary information
● Simple late fusion method
● Reasonable results, although room for improvement:
⇀ individual models
⇀ fusion approach
Thank you!
References
● H. Phan, L. Hertel, M. Maass, and A. Mertins, “Robust audio event recognition with 1-max pooling convolutional neural networks”, arXiv preprint arXiv:1604.06338, 2016.
● J. Pons, O. Slizovskaia, R. Gong, E. Gómez, and X. Serra, “Timbre Analysis of Music Audio Signals with Convolutional Neural Networks”, in 25th European Signal Processing Conference (EUSIPCO 2017).
● M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen, “DCASE 2016 acoustic scene classification using convolutional neural networks”, in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2016.