Bag-of-Features Acoustic Event Detection for Sensor Networks
Julian Kürby, René Grzeszick, Axel Plinge, and Gernot A. Fink
Pattern Recognition, Computer Science XII, TU Dortmund University
September 3, 2016, DCASE Workshop, Budapest, Hungary
Axel Plinge BoF AED in Sensor Networks 1/14

Motivation

Acoustic Sensor Networks (ASNs)
◮ are increasingly available: smartphones, laptops, hearing aids, ...
◮ offer the possibility of collaborative processing

Acoustic Event Detection (AED)
◮ useful for ASN applications [1]
◮ distributed sensors can improve performance [2]
◮ can we do better than heuristics? [3]

[1] A. Plinge, F. Jacob, R. Haeb-Umbach, and G. A. Fink. Acoustic microphone geometry calibration: An overview and experimental evaluation of state-of-the-art algorithms. IEEE Signal Process. Mag., 33(4):14–29, July 2016
[2] H. Phan, M. Maass, L. Hertel, R. Mazur, and A. Mertins. A multi-channel fusion framework for audio event detection. In IEEE Workshop App. Signal Process. to Audio & Acoustics, 2015
[3] P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375–2379, Lisbon, Portugal, Sept. 2014
Method Overview

Bag-of-Features
◮ approach originating in text retrieval
◮ successful in AED [1]
◮ fast and online

Multi-channel fusion
◮ individual microphones or arrays as sensor nodes
◮ heuristic fusion: vote, max, product, ...
◮ learning-based fusion: classifier stacking

Processing pipeline: Acoustic Sensor Node → Features → Quantization → Histogram → Classification → Fusion

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014
Method (1/5) Features

[Pipeline diagram: Features → Quantization → Histogram → Classification → Fusion, with Codebook Training and Fusion Training stages]

◮ sliding window
◮ for each frame $k$, compute $\mathbf{y}_k$: perceptual loudness, MFCCs, and GFCCs [1]

[Figure: feature extraction block diagram. Blocks: Sampling, Sliding Window, FFT, Spectrum; Loudness Filter, |·|, sum, Loudness; Mel Filterbank, log|·|, DCT, MFCCs; Gammatone Filterbank, log|·|, DCT, GFCCs; output to Quantization. Example panels (GFCCs, MFCCs, L) for silence, speech, door, steps, chairs.]

[1] X. Zhao, Y. Shao, and D. Wang. CASA-based robust speaker identification. IEEE Trans. Audio, Speech, Language Process., 20(5):1608–1616, 2012
[2] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014
[3] code at http://patrec.cs.tu-dortmund.de/resources
Method (2/5) Quantization

◮ compute class-wise GMM by EM
◮ concatenate to super-codebook
    $v_{l=(I \cdot c + i)} = (\mu_{i,c}, \sigma_{i,c})$
◮ quantize each frame $k$ by super-codebook
    $q_{k,l}(\mathbf{y}_k, v_l) = \mathcal{N}(\mathbf{y}_k \mid \mu_l, \sigma_l)$
◮ histogram over a window of $K$ frames
    $b_l(Y_n, v_l) = \frac{1}{K} \sum_{k=1}^{K} q_{k,l}(\mathbf{y}_k, v_l)$

[Figure: example quantization outputs $q_l$ for silence, speech, door, steps, chairs.]

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014
[2] code at http://patrec.cs.tu-dortmund.de/resources
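The quantization and histogram steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussians are diagonal and set by hand rather than trained by EM, every class contributes the same number I of codebook entries (so that the index l = I·c + i is well defined), and the function names `gaussian_pdf`, `build_super_codebook`, and `bof_histogram` are hypothetical.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of a diagonal-covariance Gaussian N(y | mu, sigma)."""
    log_p = 0.0
    for y_d, mu_d, sigma_d in zip(y, mu, sigma):
        log_p += -0.5 * math.log(2.0 * math.pi * sigma_d ** 2)
        log_p += -0.5 * ((y_d - mu_d) / sigma_d) ** 2
    return math.exp(log_p)

def build_super_codebook(class_codebooks):
    """Concatenate per-class codebooks; entry i of class c lands at l = I*c + i."""
    return [entry for codebook in class_codebooks for entry in codebook]

def bof_histogram(frames, super_codebook):
    """b_l = (1/K) * sum_k N(y_k | mu_l, sigma_l) for every codebook entry l."""
    K = len(frames)
    return [
        sum(gaussian_pdf(y, mu, sigma) for y in frames) / K
        for mu, sigma in super_codebook
    ]

# Hypothetical toy data: 2 classes, I = 2 Gaussians each, 1-D features.
class_codebooks = [
    [([0.0], [1.0]), ([1.0], [1.0])],  # class 0
    [([5.0], [1.0]), ([6.0], [1.0])],  # class 1
]
super_codebook = build_super_codebook(class_codebooks)
frames = [[0.1], [0.4], [0.9]]         # window of K = 3 frames close to class 0
histogram = bof_histogram(frames, super_codebook)
```

Because the quantization is soft, every frame contributes to every histogram bin, weighted by its density under that bin's Gaussian. In the toy data, the bins belonging to class 0 receive far more mass than those of class 1.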
Method (3/5) Classification

Multinomial Bayes classification
◮ train with Lidstone smoothing
    $P(v_l \mid \Omega_c) = \dfrac{\alpha + \sum_{Y_n \in \Omega_c} b_l(Y_n, v_l)}{\alpha L + \sum_{m=1}^{L} \sum_{Y_n \in \Omega_c} b_m(Y_n, v_m)}$
◮ all classes equally likely, i.e., have the same prior
◮ maximum likelihood classification
    $P(Y_n \mid \Omega_c) = \prod_{v_l \in \mathbf{v}} P(v_l \mid \Omega_c)^{b_l(Y_n, v_l)}$

[Figure: $\log P(Y \mid \Omega_c)$ over class index $c$, one panel each for silence, speech, chairs, door, steps.]

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014
[2] code at http://patrec.cs.tu-dortmund.de/resources
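As a rough sketch of the classifier on this slide, the snippet below estimates the Lidstone-smoothed term probabilities from training histograms and then picks the class with the highest log-likelihood. The value alpha = 0.5, the class labels, the toy histograms, and the function names are illustrative assumptions; the slides do not state the smoothing constant that was used.

```python
import math

def train_multinomial(histograms_by_class, alpha=0.5):
    """Lidstone-smoothed estimate of P(v_l | Omega_c) from training histograms."""
    model = {}
    for label, histograms in histograms_by_class.items():
        L = len(histograms[0])
        # Accumulate each histogram bin over all training windows of the class.
        totals = [sum(h[l] for h in histograms) for l in range(L)]
        denom = alpha * L + sum(totals)
        model[label] = [(alpha + t) / denom for t in totals]
    return model

def log_likelihood(histogram, probs):
    """log P(Y_n | Omega_c) = sum_l b_l * log P(v_l | Omega_c)."""
    return sum(b * math.log(p) for b, p in zip(histogram, probs))

def classify(histogram, model):
    """Maximum-likelihood class; equal priors make the prior term drop out."""
    return max(model, key=lambda label: log_likelihood(histogram, model[label]))

# Hypothetical training histograms with L = 3 bins.
train = {
    "speech": [[0.8, 0.2, 0.0], [0.6, 0.4, 0.0]],
    "door": [[0.0, 0.1, 0.9], [0.1, 0.1, 0.8]],
}
model = train_multinomial(train, alpha=0.5)
predicted = classify([0.7, 0.3, 0.0], model)
```

Working in the log domain turns the product over codebook entries into a weighted sum, which avoids numerical underflow for large codebooks. The smoothing guarantees that every probability is strictly positive, so the logarithm is always defined.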
Method (4/5) Fusion

BoF Models
◮ per channel,
◮ per array, or
◮ global

Heuristic fusion [1]
◮ majority voting
    $\hat{c}(m) = \operatorname{argmax}_{c} P_m(\mathbf{Y}_{m,n} \mid \Omega_c)$
    $\hat{c} = \operatorname{argmax}_{c'} \left| \{ m : \hat{c}(m) = c' \} \right|$
◮ maximum rule
    $\hat{c} = \operatorname{argmax}_{c} \max_{m} P_m(\mathbf{Y}_{m,n} \mid \Omega_c)$
◮ product rule
    $\hat{c} = \operatorname{argmax}_{c} \prod_{m} P_m(\mathbf{Y}_{m,n} \mid \Omega_c) = \operatorname{argmax}_{c} \; P_1(\mathbf{Y}_{1,n} \mid \Omega_c) \cdot P_2(\mathbf{Y}_{2,n} \mid \Omega_c) \cdots P_M(\mathbf{Y}_{M,n} \mid \Omega_c)$

[1] P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375–2379, Lisbon, Portugal, Sept. 2014
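The three heuristic rules can be compared on a small example. The sketch below assumes the per-channel scores are available as a list `scores[m][c]` holding P_m(Y_{m,n} | Ω_c) and that all scores are strictly positive; the function names and the toy numbers are hypothetical. The product rule is evaluated as a sum of logarithms, which selects the same class as the product while avoiding underflow.

```python
import math
from collections import Counter

def majority_vote(scores):
    """Each channel votes for its own argmax class; the most frequent vote wins."""
    votes = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return Counter(votes).most_common(1)[0][0]

def maximum_rule(scores):
    """Pick the class holding the single highest score over all channels."""
    num_classes = len(scores[0])
    return max(range(num_classes), key=lambda c: max(row[c] for row in scores))

def product_rule(scores):
    """Pick the class with the largest product of channel scores (via log-sum)."""
    num_classes = len(scores[0])
    return max(
        range(num_classes),
        key=lambda c: sum(math.log(row[c]) for row in scores),
    )

# Hypothetical scores for M = 3 channels (rows) and C = 3 classes (columns).
scores = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.15, 0.80],
]
vote_result = majority_vote(scores)     # channels vote 0, 0, 2
max_result = maximum_rule(scores)       # 0.80 is the largest single score
product_result = product_rule(scores)   # products: 0.015, 0.018, 0.008
```

On this toy input the three rules return three different classes (0, 2, and 1 respectively). Ties in the vote are resolved here by whichever class was encountered first, which is an arbitrary choice of this sketch.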
Method (5/5) Fusion

Learned Fusion [1]
◮ classifier stacking: use a meta-learner instead of heuristics
◮ classification of the class-channel matrix
    $\hat{c} = F\left( \begin{bmatrix} P_1(\mathbf{Y}_{1,n} \mid \Omega_1) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_1) \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_2) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_2) \\ \vdots & \ddots & \vdots \\ P_1(\mathbf{Y}_{1,n} \mid \Omega_C) & \cdots & P_M(\mathbf{Y}_{M,n} \mid \Omega_C) \end{bmatrix} \right)$
◮ train a random forest classifier $F$ using data not used for training the models
◮ invariance through channel-sorting
    $\operatorname{argsort}_{m} \max_{c} P_m(\mathbf{Y}_{m,n} \mid \Omega_c)$

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016
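One plausible reading of the channel-sorting step is sketched below: channels are reordered by their best class score before the class-channel matrix is flattened into the input vector of the meta-learner. The descending sort order, the channel-major layout `scores[m][c]`, and the function names are assumptions of this sketch; the slide only states that an argsort is applied. Training the random forest itself is left out; any standard implementation could consume the resulting vectors.

```python
def sort_channels(scores):
    """Reorder channels by descending best class score (argsort over m of max over c)."""
    order = sorted(range(len(scores)), key=lambda m: max(scores[m]), reverse=True)
    return [scores[m] for m in order]

def stacking_features(scores):
    """Flatten the channel-sorted scores into one feature vector for the meta-learner."""
    return [p for row in sort_channels(scores) for p in row]

# Hypothetical scores for M = 3 channels (rows) and C = 3 classes (columns).
scores = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.15, 0.80],
]
features = stacking_features(scores)

# The same channels in a different physical order give the same features.
permuted = [scores[1], scores[2], scores[0]]
features_permuted = stacking_features(permuted)
```

Sorting makes the feature vector independent of the order in which sensor nodes are enumerated, so a model trained on one network layout does not depend on fixed channel positions. Channels with exactly equal maxima would keep their input order, so the invariance holds only when the maxima are distinct.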