DCASE 2016: CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION


  1. DCASE 2016: CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION. Michele Valenti¹ (valenti.michele.w@gmail.com), Aleksandr Diment², Giambattista Parascandolo², Stefano Squartini¹, Tuomas Virtanen². ¹ Università Politecnica delle Marche, Italy; ² Tampere University of Technology, Finland

  3. Outline • Introduction • Our system • Training modes • Results • Challenge ranking

  4. Introduction What is “acoustic scene classification”?

  5. Introduction: What is “acoustic scene classification”? Audio is assigned a scene label, for example: Home, Car, Forest path.

  6. Our system: Overview. Audio → Feature extraction → Sequence splitting → CNN → Scores averaging → Label

  7. Our system: Features. Raw audio → Log-mel spectrogram
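The feature step on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the slide only names the log-mel spectrogram, so the frame length, hop size, number of mel bands, Hann window, and all function names below are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; the slides do not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(audio, sr, n_fft=1024, hop=512, n_mels=60, eps=1e-10):
    # Hypothetical defaults; requires len(audio) >= n_fft.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_fft // 2 + 1)
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel_energy + eps).T                         # (n_mels, n_frames)

# Usage: one second of a 440 Hz tone at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
features = log_mel_spectrogram(tone, sr)
```

The output is laid out as (mel bands, frames), so time runs along the second axis.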

  8. Our system: Sequence splitting. The log-mel spectrogram of the raw audio is split into sequences (segments).
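Sequence splitting can be illustrated with a short NumPy sketch. It assumes non-overlapping sequences and drops any trailing frames that do not fill a whole sequence; the slides do not state either choice, and the function name is hypothetical.

```python
import numpy as np

def split_into_sequences(spectrogram, frames_per_sequence):
    """Cut a (n_bands, n_frames) spectrogram into equal-length sequences.

    Returns an array of shape (n_sequences, n_bands, frames_per_sequence).
    Assumption: sequences do not overlap and leftover frames are discarded.
    """
    n_bands, n_frames = spectrogram.shape
    n_sequences = n_frames // frames_per_sequence
    usable = spectrogram[:, : n_sequences * frames_per_sequence]
    # Split the time axis into (sequence, frame), then move sequences to the front.
    return usable.reshape(n_bands, n_sequences, frames_per_sequence).transpose(1, 0, 2)

# Usage: 2 bands, 7 frames, sequences of 3 frames (the 7th frame is dropped).
spec = np.arange(14).reshape(2, 7)
sequences = split_into_sequences(spec, 3)
```

Each sequence is then classified on its own by the CNN, which is why a single file yields several score vectors.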

  9. Our system: Convolutional neural network. Input: Sequence

  10. Our system: Convolutional neural network. Sequence → Feature maps (128)

  11. Our system: Convolutional neural network. Sequence → Feature maps (128). Highlighted step: Batch normalization

  12. Our system: Convolutional neural network. Sequence → Feature maps (128) → Subsampled feature maps (128)

  13. Our system: Convolutional neural network. Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)

  14. Our system: Convolutional neural network. Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256). Highlighted step: Time shrinking

  15. Our system: Convolutional neural network. Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256). Highlighted step: Flattening

  16. Our system: Convolutional neural network. Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256). Highlighted step: Fully-connected softmax layer

  17. Our system: Convolutional neural network. Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)
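The last three stages named on the slides (time shrinking, flattening, fully-connected softmax layer) can be sketched in NumPy. The slides do not define "time shrinking"; here it is assumed to be a max over the whole time axis of each feature map. The number of frequency bands, the random weights, and the function names are purely illustrative; only the 256 feature maps and the 15 classes come from the slides.

```python
import numpy as np

def softmax(z):
    # Subtract the maximum for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cnn_head(feature_maps, weights, bias):
    """Map (n_maps, n_bands, n_frames) feature maps to class scores.

    Assumption: "time shrinking" is taken to be max-pooling over all frames.
    """
    shrunk = feature_maps.max(axis=2)        # time shrinking -> (n_maps, n_bands)
    flat = shrunk.reshape(-1)                # flattening -> (n_maps * n_bands,)
    return softmax(weights @ flat + bias)    # fully-connected softmax layer

# Usage with random stand-in values: 256 maps, 4 bands, 10 frames, 15 classes.
rng = np.random.default_rng(0)
maps = rng.standard_normal((256, 4, 10))
weights = rng.standard_normal((15, 256 * 4)) * 0.01
bias = np.zeros(15)
scores = cnn_head(maps, weights, bias)
```

Collapsing the time axis before flattening makes the size of the softmax layer's input independent of the sequence length, which is one plausible reason for such a step.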

  18. Our system: Scores averaging. Prediction scores → Scores averaging → Class prediction scores

  19. Our system: Scores averaging. Prediction scores → Scores averaging (Σ) → Class prediction scores → argmax → File’s class
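The file-level decision on these two slides (average the per-sequence scores, then take the argmax) can be written as a small NumPy sketch. The function name and the example scores are made up for illustration; a plain arithmetic mean is assumed.

```python
import numpy as np

def classify_file(sequence_scores, class_names):
    """Average per-sequence class scores and pick the best class.

    sequence_scores: array of shape (n_sequences, n_classes).
    Returns the predicted class name and the averaged scores.
    """
    mean_scores = np.mean(sequence_scores, axis=0)
    return class_names[int(np.argmax(mean_scores))], mean_scores

# Usage: three sequences from one file, three candidate classes.
class_names = ["Home", "Car", "Forest path"]
sequence_scores = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
])
label, mean_scores = classify_file(sequence_scores, class_names)
# label is "Car": the first sequence alone votes "Home", but the average favors "Car".
```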

  20. Training

  21. Training: Cross-validation setup. Folds 1 to 4, each split into Training + validation and Test.

  22. Training: Fold n (Training + validation | Test). Non-full training: Training | Validation

  23. Training: Fold n (Training + validation | Test). Non-full training: Training | Validation

  24. Training: Fold n (Training + validation | Test). Non-full training: Training | Validation. [Plot: training and validation accuracies versus epochs]

  25. Training: Fold n (Training + validation | Test). Non-full training: Training | Validation. [Plot: training and validation accuracies versus epochs, with the convergence time marked]

  26. Training: Fold n (Training + validation | Test). Non-full training: Training | Validation. Additional bar: Training

  27. Training: Fold n (Training + validation | Test). Non-full training: Training | Validation. Full training: Training
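One way to read the two training modes is sketched below: non-full training holds out a validation split to find the convergence time, and full training then reuses that epoch count on the training and validation data together. The slides do not say how convergence is detected, so taking the epoch with the highest validation accuracy is an assumption, as are the function names and the example numbers.

```python
import numpy as np

def convergence_epoch(validation_accuracies):
    # Assumption: convergence = first epoch (1-based) with the best validation accuracy.
    return int(np.argmax(validation_accuracies)) + 1

def full_training_plan(train_ids, validation_ids, validation_accuracies):
    """Describe a full-training run derived from a non-full run.

    Returns the merged data and the number of epochs to train for.
    """
    return {
        "data": list(train_ids) + list(validation_ids),
        "epochs": convergence_epoch(validation_accuracies),
    }

# Usage with made-up validation accuracies from a non-full run.
val_acc = [0.52, 0.61, 0.70, 0.68, 0.70]
plan = full_training_plan(["a", "b", "c"], ["d"], val_acc)
```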

  28. Results: Test data. Folds 1 to 4, each split into Training + validation and Test.

  29. Results: Sequence length. [Plot: accuracy (%) versus sequence length (s) for non-full training and full training; sequence lengths 0.5, 1.5, 3, 5, 10, and 30 s; accuracy axis from 65 to 80%]

  32. Results: Class accuracies.
      Class | Accuracy (%)
      Beach | 75.6
      Bus | 76.9
      Café/Restaurant | 74.4
      Car | 91.0
      City center | 93.6
      Forest path | 96.2
      Grocery store | 88.5
      Home | 80.8
      Library | 66.6
      Metro station | 96.2
      Office | 97.4
      Park | 59.0
      Residential area | 73.1
      Train | 46.2
      Tram | 78.2

  33. Results: Class accuracies (same values as on the previous slide), with two callouts: 34.6% Residential area; 29.5% Bus.

  34. Results: Other classifiers.
      System | Sequence length (s) | Accuracy (%), non-full training | Accuracy (%), full training
      Baseline GMM (MFCC) | - | - | 72.6
      Two-layer CNN (MFCC) | 5 | 67.7 | 72.6
      Two-layer MLP (log-mel) | - | 66.6 | 69.3
      One-layer CNN (log-mel) | 3 | 70.3 | 74.8
      Two-layer CNN (log-mel) | 3 | 75.9 | 79.0

  35. Challenge ranking: Final training. Extended training set: Training + validation + test. Evaluation set: Secret challenge data.

  36. Challenge ranking: Final training. Extended training set: Training + validation + test, split into New training and New validation. Evaluation set: Secret challenge data.

  37. Challenge ranking: Final training. Extended training set: Training + validation + test, split into New training and New validation; convergence at 400 epochs. Evaluation set: Secret challenge data.

  38. Challenge ranking: Final training. Extended training set: Training + validation + test; final training for 400 epochs. Evaluation set: Secret challenge data.

  39. Challenge ranking. [Bar chart, axis from 0 to 100. Bar values: 89.7, 88.7, 87.7, 87.2, 86.4, 86.4, 86.2, 85.9, 85.6, 85.4, 84.6, 84.1, 77.2, 62.8]

  41. Results: Feature comparison.
      System | Sequence length (s) | Accuracy (%), non-full training | Accuracy (%), full training
      Two-layer CNN (MFCC) | 5 | 67.7 | 72.6
      Two-layer CNN (log-mel) | 5 | 74.1 | 78.3
