1. Bag-of-Features Acoustic Event Detection for Sensor Networks
Julian Kürby, René Grzeszick, Axel Plinge, and Gernot A. Fink
Pattern Recognition, Computer Science XII, TU Dortmund University
DCASE Workshop, Budapest, Hungary, September 3, 2016

2. Motivation
Axel Plinge, BoF AED in Sensor Networks, 1/14

Acoustic sensor networks (ASNs)
◮ are increasingly available: smartphones, laptops, hearing aids, ...
◮ offer the possibility of collaborative processing

Acoustic event detection (AED)
◮ is useful for ASN applications [1]
◮ can be improved by distributed sensors [2]
◮ can we do better than heuristic fusion? [3]

[1] A. Plinge, F. Jacob, R. Haeb-Umbach, and G. A. Fink. Acoustic microphone geometry calibration: An overview and experimental evaluation of state-of-the-art algorithms. IEEE Signal Process. Mag., 33(4):14–29, July 2016.
[2] H. Phan, M. Maass, L. Hertel, R. Mazur, and A. Mertins. A multi-channel fusion framework for audio event detection. In IEEE Workshop App. Signal Process. to Audio & Acoustics, 2015.
[3] P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375–2379, Lisbon, Portugal, Sept. 2014.

3. Method Overview
Bag-of-features
◮ approach originating in text retrieval
◮ successful in AED [1]
◮ fast and online

Multi-channel fusion
◮ individual microphones or arrays as sensor nodes
◮ heuristic fusion: vote, max, product, ...
◮ learning-based fusion: classifier stacking

Processing pipeline: acoustic sensor node → features → quantization → histogram → classification → fusion

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.

4. Method (1/5): Features
◮ sliding window
◮ for each frame k, compute the feature vector $\mathbf{x}_k$ from perceptual loudness, MFCCs, and GFCCs [1]

[Figure: feature extraction. Sampling and sliding window, FFT spectrum, then three branches: loudness filter, |·|, sum(·) → loudness L; mel filterbank, log|·|, DCT → MFCCs; gammatone filterbank, log|·|, DCT → GFCCs; the features feed the quantization. Example panels show GFCCs, MFCCs, and L for silence, speech, door, steps, and chairs.]

[1] X. Zhao, Y. Shao, and D. Wang. CASA-based robust speaker identification. IEEE Trans. Audio, Speech, Language Process., 20(5):1608–1616, 2012.
[2] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.
[3] Code at http://patrec.cs.tu-dortmund.de/resources
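The sliding window and the MFCC branch of the feature extraction can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is linked as [3]): the 400-sample frames, 160-sample hop, Hamming window, 26 mel filters, and 13 coefficients are illustrative choices not given on the slide, and all function names are hypothetical. The loudness and GFCC branches are omitted; GFCCs would replace the mel filterbank with a gammatone filterbank.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Sliding window: split a 1-D signal into overlapping frames, shape (K, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters over the rfft bins, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for b in range(left, center):    # rising edge of triangle j
            fb[j - 1, b] = (b - left) / (center - left)
        for b in range(center, right):   # falling edge of triangle j
            fb[j - 1, b] = (right - b) / (right - center)
    return fb

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs coefficients."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return x @ basis.T

def mfcc(x, sr, frame_len=400, hop=160, n_filters=26, n_coeffs=13):
    """Per-frame MFCCs: window -> |FFT| -> mel filterbank -> log -> DCT."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=-1))
    mel_energy = spectrum @ mel_filterbank(n_filters, frame_len, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), n_coeffs)

# One second of noise at 16 kHz yields 98 frames of 13 coefficients each.
signal = np.random.default_rng(0).standard_normal(16000)
features = mfcc(signal, sr=16000)
```

Vectorizing the framing with a single index array keeps the sliding window online-friendly: each new hop only adds one row.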

5. Method (2/5): Quantization
◮ train a GMM for each class c by EM
◮ concatenate the class-wise GMMs into a super-codebook, where entry l = I·c + i holds component i of class c (I components per class):
$$\mathbf{v}_{l = I \cdot c + i} = (\boldsymbol{\mu}_{i,c}, \boldsymbol{\sigma}_{i,c})$$
◮ quantize each frame k with the super-codebook:
$$q_{k,l}(\mathbf{x}_k, \mathbf{v}_l) = \mathcal{N}(\mathbf{x}_k \mid \boldsymbol{\mu}_l, \boldsymbol{\sigma}_l)$$
◮ build the histogram over a window $\mathbf{Y}_n$ of K frames:
$$b_l(\mathbf{Y}_n, \mathbf{v}_l) = \frac{1}{K} \sum_{k=1}^{K} q_{k,l}(\mathbf{x}_k, \mathbf{v}_l)$$

[Figure: quantization outputs q_l for example windows of silence, speech, door, steps, and chairs]

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.
[2] Code at http://patrec.cs.tu-dortmund.de/resources
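The super-codebook, the soft quantization, and the window histogram can be sketched as follows. This follows the slide's formulas literally and assumes the class-wise GMMs have already been trained by EM with diagonal covariances (for example, scikit-learn's `GaussianMixture(covariance_type="diag")` exposes `means_` and `covariances_`), and that σ denotes per-dimension variances. The function names are hypothetical, and real features may need the densities rescaled to avoid under- or overflow.

```python
import numpy as np

def build_super_codebook(class_gmms):
    """Concatenate class-wise diagonal GMMs into one super-codebook.

    class_gmms: list over classes c of (means, variances), each of shape (I, D).
    Row l = I * c + i of the result holds component i of class c.
    """
    means = np.concatenate([m for m, _ in class_gmms], axis=0)      # (L, D)
    variances = np.concatenate([v for _, v in class_gmms], axis=0)  # (L, D)
    return means, variances

def soft_quantize(frames, means, variances):
    """q[k, l] = N(x_k | mu_l, sigma_l) for diagonal Gaussians, shape (K, L)."""
    diff = frames[:, None, :] - means[None, :, :]                    # (K, L, D)
    log_q = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    return np.exp(log_q)

def bof_histogram(frames, means, variances):
    """b_l(Y_n, v_l) = (1/K) * sum_k q[k, l] over a window of K frames, shape (L,)."""
    return soft_quantize(frames, means, variances).mean(axis=0)
```

Because each class contributes its own block of I entries, the histogram already groups evidence by the class whose GMM explains the frames best.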

6. Method (3/5): Classification
Multinomial Bayes classification
◮ train with Lidstone smoothing (smoothing parameter α, super-codebook size L):
$$P(\mathbf{v}_l \mid \Omega_c) = \frac{\alpha + \sum_{\mathbf{Y}_n \in \Omega_c} b_l(\mathbf{Y}_n, \mathbf{v}_l)}{\alpha L + \sum_{m=1}^{L} \sum_{\mathbf{Y}_n \in \Omega_c} b_m(\mathbf{Y}_n, \mathbf{v}_m)}$$
◮ all classes are equally likely, i.e., they have the same prior
◮ maximum likelihood classification:
$$P(\mathbf{Y}_n \mid \Omega_c) = \prod_{l=1}^{L} P(\mathbf{v}_l \mid \Omega_c)^{b_l(\mathbf{Y}_n, \mathbf{v}_l)}, \qquad \hat{c} = \arg\max_{c} P(\mathbf{Y}_n \mid \Omega_c)$$

[Figure: log P(Y | Ω_c) over the class index c for example windows of silence, speech, chairs, door, and steps]

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.
[2] Code at http://patrec.cs.tu-dortmund.de/resources
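A minimal sketch of the multinomial Bayes classifier, assuming the window histograms are stacked row-wise with integer class labels. The default α = 1 is an illustrative choice, not from the slide, and the function names are hypothetical. Classification is done in the log domain, where the product over codebook entries becomes a sum weighted by the histogram values.

```python
import numpy as np

def train_multinomial_bayes(histograms, labels, n_classes, alpha=1.0):
    """Lidstone-smoothed estimates of P(v_l | Omega_c), shape (C, L)."""
    labels = np.asarray(labels)
    L = histograms.shape[1]
    probs = np.empty((n_classes, L))
    for c in range(n_classes):
        counts = histograms[labels == c].sum(axis=0)   # sum over Y_n in Omega_c
        probs[c] = (alpha + counts) / (alpha * L + counts.sum())
    return probs

def log_likelihoods(histogram, probs):
    """log P(Y_n | Omega_c) = sum_l b_l * log P(v_l | Omega_c), shape (C,)."""
    return np.log(probs) @ histogram

def classify(histogram, probs):
    # Equal priors, so the decision is plain maximum likelihood.
    return int(np.argmax(log_likelihoods(histogram, probs)))
```

Each row of `probs` sums to one because the denominator adds α once per codebook entry, matching the αL term in the formula.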

7. Method (4/5): Fusion
BoF models can be trained
◮ per channel,
◮ per array, or
◮ globally

[1] P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375–2379, Lisbon, Portugal, Sept. 2014.

8. Method (4/5): Fusion, continued
Heuristic fusion [1]
◮ majority voting: the scores $P_m(\mathbf{Y}_{m,n} \mid \Omega_c)$ form a class-channel matrix (rows: classes c = 1, ..., C; columns: channels m = 1, ..., M); each channel votes for the class with the highest score in its column, and the class with the most votes wins:
$$\begin{pmatrix} P_1(\mathbf{Y}_{1,n} \mid \Omega_1) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_1) \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_2) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_2) \\ \vdots & \ddots & \vdots \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_C) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_C) \end{pmatrix}$$
$$\hat{c}(m) = \arg\max_{c} P_m(\mathbf{Y}_{m,n} \mid \Omega_c), \qquad \hat{c} = \arg\max_{c'} \left|\{\, m : \hat{c}(m) = c' \,\}\right|$$
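Majority voting over the class-channel matrix can be sketched as below, assuming the matrix is given with classes as rows and channels as columns. The scores may be likelihoods or log-likelihoods, since the logarithm does not change any column's argmax. The function name is hypothetical, and breaking ties toward the lowest class index is an arbitrary choice the slide does not specify.

```python
import numpy as np

def majority_vote(scores):
    """Majority voting; scores[c, m] is the score of class c on channel m."""
    votes = np.argmax(scores, axis=0)                       # c_hat(m) for each channel m
    counts = np.bincount(votes, minlength=scores.shape[0])  # |{m : c_hat(m) = c'}|
    return int(np.argmax(counts))                           # ties go to the lowest class index

scores = np.array([[0.7, 0.2, 0.6],
                   [0.3, 0.8, 0.4]])   # 2 classes x 3 channels
print(majority_vote(scores))           # channels vote 0, 1, 0 -> prints 0
```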

9. Method (4/5): Fusion, continued
Heuristic fusion [1]
◮ majority voting (previous slide)
◮ maximum rule: for each class, take the highest score over all channels, then pick the best class:
$$\hat{c} = \arg\max_{c} \max_{m} P_m(\mathbf{Y}_{m,n} \mid \Omega_c) = \arg\max_{c} \begin{pmatrix} \max_m \{ P_1(\mathbf{Y}_{1,n} \mid \Omega_1), \ldots, P_M(\mathbf{Y}_{M,n} \mid \Omega_1) \} \\ \max_m \{ P_1(\mathbf{Y}_{1,n} \mid \Omega_2), \ldots, P_M(\mathbf{Y}_{M,n} \mid \Omega_2) \} \\ \vdots \\ \max_m \{ P_1(\mathbf{Y}_{1,n} \mid \Omega_C), \ldots, P_M(\mathbf{Y}_{M,n} \mid \Omega_C) \} \end{pmatrix}$$
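The maximum rule reduces to a row-wise maximum followed by an argmax. A minimal sketch, with the same assumed matrix layout (classes as rows, channels as columns) and a hypothetical function name; as with voting, log-likelihoods give the same decision as likelihoods.

```python
import numpy as np

def max_rule(scores):
    """Maximum rule: c_hat = argmax_c max_m scores[c, m]."""
    return int(np.argmax(scores.max(axis=1)))

scores = np.array([[0.7, 0.2, 0.6],
                   [0.3, 0.9, 0.4]])
print(max_rule(scores))  # prints 1: the single strongest score (0.9) belongs to class 1
```

On this matrix, majority voting would pick class 0 (votes 0, 1, 0), which shows how a single confident channel can override the majority under the maximum rule.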

10. Method (4/5): Fusion, continued
Heuristic fusion [1]
◮ majority voting and maximum rule (previous slides)
◮ product rule: multiply the scores of all channels per class, then pick the best class:
$$\hat{c} = \arg\max_{c} \prod_{m=1}^{M} P_m(\mathbf{Y}_{m,n} \mid \Omega_c) = \arg\max_{c} \begin{pmatrix} P_1(\mathbf{Y}_{1,n} \mid \Omega_1) \cdot P_2(\mathbf{Y}_{2,n} \mid \Omega_1) \cdots P_M(\mathbf{Y}_{M,n} \mid \Omega_1) \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_2) \cdot P_2(\mathbf{Y}_{2,n} \mid \Omega_2) \cdots P_M(\mathbf{Y}_{M,n} \mid \Omega_2) \\ \vdots \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_C) \cdot P_2(\mathbf{Y}_{2,n} \mid \Omega_C) \cdots P_M(\mathbf{Y}_{M,n} \mid \Omega_C) \end{pmatrix}$$
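Multiplying many small likelihoods underflows quickly, so a sketch of the product rule is easier to state in the log domain, where the product over channels becomes a sum. This assumes per-channel log-likelihoods arranged with classes as rows and channels as columns; the function name is hypothetical.

```python
import numpy as np

def product_rule(log_scores):
    """Product rule in the log domain: argmax_c sum_m log P_m(Y_{m,n} | Omega_c)."""
    return int(np.argmax(log_scores.sum(axis=1)))

probs = np.array([[0.7, 0.5, 0.6],
                  [0.3, 0.9, 0.4]])
print(product_rule(np.log(probs)))  # 0.7*0.5*0.6 = 0.21 > 0.3*0.9*0.4 = 0.108 -> prints 0
```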

11. Method (5/5): Learned Fusion [1]
◮ classifier stacking: use a meta-learner instead of heuristics
◮ classify the class-channel matrix:
$$\hat{c} = F\left( \begin{pmatrix} P_1(\mathbf{Y}_{1,n} \mid \Omega_1) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_1) \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_2) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_2) \\ \vdots & \ddots & \vdots \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_C) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_C) \end{pmatrix} \right)$$
◮ train a random forest classifier F using data not used for training the BoF models
◮ invariance through channel sorting: the channels (columns) are ordered by
$$\operatorname{argsort}_{m} \max_{c} P_m(\mathbf{Y}_{m,n} \mid \Omega_c)$$

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.
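The input to the meta-learner can be sketched as the channel-sorted, flattened class-channel matrix. This is an illustration under assumptions: the slide does not say whether the channels are sorted in ascending or descending order (descending is assumed here), nor whether the entries are likelihoods or log-likelihoods, and `stacking_features` is a hypothetical name.

```python
import numpy as np

def stacking_features(scores):
    """Channel-sorted, flattened class-channel matrix for a meta-learner.

    scores[c, m]: output of the model of channel m for class c.
    Channels (columns) are ordered by max_c scores[c, m], largest first
    (the sort direction is an assumption). The resulting vector does not
    depend on the order in which the M channels are listed.
    """
    order = np.argsort(scores.max(axis=0))[::-1]
    return scores[:, order].ravel()

scores = np.array([[0.1, 0.9],
                   [0.2, 0.05]])
print(stacking_features(scores))           # values 0.9, 0.1, 0.05, 0.2
print(stacking_features(scores[:, ::-1]))  # same vector after swapping the two channels
```

Computed on data held out from BoF model training, these vectors could then be passed to a random forest such as scikit-learn's `sklearn.ensemble.RandomForestClassifier` through its `fit` and `predict` methods.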
