DCASE 2016
CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION
Michele Valenti (1) (valenti.michele.w@gmail.com), Aleksandr Diment (2), Giambattista Parascandolo (2), Stefano Squartini (1), Tuomas Virtanen (2)
(1) Università Politecnica delle Marche, Italy
(2) Tampere University of Technology, Finland

Outline • Introduction • Our system • Training modes • Results • Challenge ranking

Introduction What is “acoustic scene classification”? Audio → scene label, e.g. Home, Car, Forest path

Our system Overview: Audio → Feature extraction → Sequence splitting → CNN → Scores averaging → Label

Our system Features: Raw audio → Log-mel spectrogram
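As a rough illustration of the feature step, the sketch below computes a log-mel spectrogram in plain Python. The slides do not give the analysis parameters, so everything specific here is an assumption made for the example: the frame length `n_fft`, the hop size `hop`, the number of mel bands `n_mels`, the Hamming window, and the common 2595·log10(1 + f/700) mel formula. A naive DFT is used only to keep the example self-contained; a real system would use an FFT library.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common mel-scale formula (an assumption; the slides do not name one).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hamming-windowed power spectrum of one frame (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8, eps=1e-10):
    """Return a list of frames, each a list of n_mels log-mel energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        frames.append([math.log(sum(w * p for w, p in zip(row, power)) + eps)
                       for row in bank])
    return frames
```

Calling `log_mel_spectrogram(samples, sample_rate)` on a list of audio samples returns one row of band energies per frame, which is the time-frequency representation the later stages operate on.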

Our system Sequence splitting: Raw audio → Log-mel spectrogram → Sequences (segments of the spectrogram)
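The splitting step can be sketched as follows. This is a minimal illustration that assumes non-overlapping sequences and drops an incomplete tail; the slides do not say how the authors handle either. The helper names and the hop duration used to convert seconds to frames are hypothetical.

```python
def frames_for_duration(seconds, hop_seconds):
    """Number of spectrogram frames covering `seconds`, given the frame hop.

    hop_seconds is a hypothetical parameter; the slides do not state it.
    """
    return int(round(seconds / hop_seconds))


def split_into_sequences(spectrogram, frames_per_sequence):
    """Split a spectrogram (list of frames) into equal-length sequences.

    Assumption: sequences do not overlap and a shorter trailing remainder
    is discarded.
    """
    if frames_per_sequence <= 0:
        raise ValueError("frames_per_sequence must be positive")
    n_full = len(spectrogram) // frames_per_sequence
    return [spectrogram[i * frames_per_sequence:(i + 1) * frames_per_sequence]
            for i in range(n_full)]
```

For example, with a hypothetical 20 ms hop, a 3 s sequence spans `frames_for_duration(3, 0.02)` frames, and each returned sequence is classified by the CNN on its own.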

Our system Convolutional neural network (Sequences → CNN): Sequence → 128 feature maps → batch normalization → 128 subsampled feature maps → 256 new feature maps → time shrinking → flattening → fully-connected softmax layer

Our system Scores averaging: the class prediction scores of a file's sequences are averaged (Σ over the sequences), and the argmax of the averaged scores gives the file's class
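The averaging and argmax step can be sketched in a few lines. The function name and the example class list are illustrative only; the scores are assumed to be one softmax output vector per sequence of the file.

```python
def classify_file(sequence_scores, class_names):
    """Average per-sequence class scores and return (predicted class, mean scores).

    sequence_scores: list of score vectors, one per sequence of the file,
    each with one entry per class (for example, softmax outputs).
    """
    if not sequence_scores:
        raise ValueError("at least one sequence is required")
    n_sequences = len(sequence_scores)
    mean_scores = [sum(scores[c] for scores in sequence_scores) / n_sequences
                   for c in range(len(class_names))]
    # argmax over classes of the averaged scores
    best = max(range(len(mean_scores)), key=mean_scores.__getitem__)
    return class_names[best], mean_scores
```

Because the mean is the sum divided by a constant, taking the argmax of the sum or of the mean selects the same class.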

Training

Training Cross-validation setup: four folds (Fold 1 to Fold 4), each divided into a training + validation part and a test part

Training Fold n: the training + validation part of the fold is divided into a training subset and a validation subset; the test part is kept aside.
Non-full training: the network is trained on the training subset only. Plot: training and validation accuracies versus epochs, with the convergence time marked.
Full training: the validation subset is added to the training data, so the network is trained on the whole training + validation part.
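The two training modes can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the convergence time is taken to be the epoch with the best validation accuracy, training stops after `patience` epochs without improvement, and full training restarts from a fresh model for that number of epochs on the merged data. `make_model`, `train_epoch`, and `evaluate` are hypothetical callables supplied by the user, and the datasets are assumed to be Python lists.

```python
def find_convergence_epoch(make_model, train_epoch, evaluate,
                           train_set, val_set, max_epochs, patience):
    """Non-full training: train on train_set, monitor accuracy on val_set.

    Returns the epoch (1-based) with the best validation accuracy.
    """
    model = make_model()
    best_acc, best_epoch = -1.0, 0
    for epoch in range(1, max_epochs + 1):
        train_epoch(model, train_set)
        acc = evaluate(model, val_set)
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch


def full_training(make_model, train_epoch, train_set, val_set, n_epochs):
    """Full training: retrain from scratch on training + validation data."""
    model = make_model()
    merged = train_set + val_set  # assumes both are lists
    for _ in range(n_epochs):
        train_epoch(model, merged)
    return model
```

A typical use is to call `find_convergence_epoch` once per fold and pass its result as `n_epochs` to `full_training`, so the final model sees the validation data without needing it for stopping.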

Results Test data: the test part of each fold (Fold 1 to Fold 4)

Results Sequence length
[Figure: accuracy (%) versus sequence length (s) for non-full training and full training; sequence lengths 0.5, 1.5, 3, 5, 10 and 30 s; accuracy axis from 65 to 80 %]

Results Class accuracies
Class              Accuracy (%)
Beach              75.6
Bus                76.9
Café/Restaurant    74.4
Car                91.0
City center        93.6
Forest path        96.2
Grocery store      88.5
Home               80.8
Library            66.6
Metro station      96.2
Office             97.4
Park               59.0
Residential area   73.1
Train              46.2
Tram               78.2
Callouts on the slide: 34.6% Residential area; 29.5% Bus

Results Other classifiers
System                    Sequence length (s)   Accuracy (%), non-full training   Accuracy (%), full training
Baseline GMM (MFCC)       -                     -                                 72.6
Two-layer CNN (MFCC)      5                     67.7                              72.6
Two-layer MLP (log-mel)   -                     66.6                              69.3
One-layer CNN (log-mel)   3                     70.3                              74.8
Two-layer CNN (log-mel)   3                     75.9                              79.0

Challenge ranking Final training
Extended training set: training + validation + test. Evaluation set: secret challenge data.
The extended training set is divided into a new training set and a new validation set; training on this split converges at 400 epochs.
Final training: the network is trained on the whole extended training set for 400 epochs.

Challenge ranking
[Figure: bar chart of challenge accuracies (%), axis from 0 to 100; bar values 89.7, 88.7, 87.7, 87.2, 86.4, 86.4, 86.2, 85.9, 85.6, 85.4, 84.6, 84.1, 77.2, 62.8]

Results Feature comparison
System                    Sequence length (s)   Accuracy (%), non-full training   Accuracy (%), full training
Two-layer CNN (MFCC)      5                     67.7                              72.6
Two-layer CNN (log-mel)   5                     74.1                              78.3
