
Noise Robust LVCSR Feature Extraction Based on the Stabilized Weighted Linear Prediction



  1. Noise Robust LVCSR Feature Extraction Based on the Stabilized Weighted Linear Prediction
  HUT-TUT Fall DSP Seminar 2008
  Heikki Kallasjoki, Adaptive Informatics Research Centre, Helsinki University of Technology
  21.11.2008

  2. Outline
  ◮ Introduction
  ◮ Feature Extraction: MFCC, SWLP
  ◮ Experiments: Speech Recognition System, Data, Results
  ◮ Discussion: Conclusions and Future Work, Questions?

  3. Introduction
  ◮ The “standard” mel-frequency cepstral coefficient (MFCC) based feature extraction is relatively sensitive to noise
  ◮ Noise robustness can be improved by replacing the “raw” FFT spectrum with a suitable spectral envelope estimate that, ideally, models only the relevant speech characteristics
  ◮ SWLP is one method of generating such an envelope estimate, based on temporally weighted linear prediction

  4. Mel-frequency Cepstral Coefficients (MFCCs)
  ◮ The de facto standard for speech recognition features
  ◮ Easily computed (see the sketch below):
    ◮ Window the input signal into overlapping frames
    ◮ Estimate the log of the amplitude spectrum
    ◮ Warp to the mel scale using logarithmically spaced triangular filters
    ◮ Take a DCT of the result to get cepstral coefficients
  ◮ Replacing the direct FFT-based amplitude spectrum with a spectral envelope estimate leads to MFCC variants that are more noise-robust
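To make the steps above concrete, here is a rough NumPy sketch of MFCC-style features for a single frame. It is a minimal illustration rather than the recognizer's actual front end: the standard mel-scale formula, the filter count, and the FFT length are assumptions of this sketch, and details such as pre-emphasis and liftering are omitted.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (an assumption; the talk only says "logarithmically spaced").
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor(mel_to_hz(mels) * n_fft / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def dct2(x):
    # Unnormalized DCT-II, used to decorrelate the log filterbank energies.
    n = len(x)
    m = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    return x @ np.cos(np.pi * (m + 0.5) * k / n)

def mfcc_frame(frame, fb, n_ceps=12):
    # 1) window the frame, 2) amplitude spectrum, 3) mel filterbank + log, 4) DCT.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    log_energies = np.log(fb @ spectrum + 1e-10)
    return dct2(log_energies)[:n_ceps]
```

With, for example, a 256-sample frame and 23 filters (the settings mentioned later in the talk) and an assumed 16 kHz sample rate, fb = mel_filterbank(23, 256, 16000) and mfcc_frame(frame, fb) would yield 12 cepstral coefficients per frame.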

  5. Conventional Linear Prediction (LP)
  ◮ Linear prediction gives an all-pole model for predicting a signal
  ◮ Conventional LP: $\hat{x}_n = -\sum_{i=1}^{p} a_i x_{n-i}$
  ◮ The $a_i$ coefficients are found by minimizing a cost function:
    $E(a) = \sum_{n=1}^{N+p} \varepsilon_n^2(a)$, where $\varepsilon_n(a) = x_n - \hat{x}_n$
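As a minimal illustration of the formulas above, the following NumPy sketch estimates the coefficients with the autocorrelation method and evaluates the corresponding all-pole envelope; the model order and FFT length are placeholder choices, not the settings used in the experiments.

```python
import numpy as np

def lp_coefficients(x, p):
    """Conventional LP: minimize the summed squared prediction error.

    Returns a_1..a_p such that x_hat[n] = -sum_i a_i * x[n-i].
    """
    n = len(x)
    # Autocorrelation r[k] = sum_n x[n] * x[n+k]
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)])
    # Normal (Yule-Walker) equations: sum_i a_i r[|i-j|] = -r[j], j = 1..p
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, -r[1:p + 1])

def allpole_envelope(a, n_fft=512):
    # Spectral envelope 1/|A(e^{jw})| with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return 1.0 / np.abs(A)
```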

  6. Stabilized Weighted Linear Prediction (SWLP)
  ◮ In Weighted Linear Prediction (WLP), a temporal weight term is added to the LP cost function
  ◮ The weight function makes it possible to give a higher importance to particular, hopefully less noisy, regions of the signal
  ◮ SWLP is a formulation of WLP which guarantees that the generated all-pole model is stable
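For concreteness, the weighted cost can be written as below; the symbol $W_n$ for the temporal weight is an assumption of this sketch, while the rest follows the LP cost on the previous slide.

```latex
% WLP cost: each squared prediction error is scaled by a temporal weight W_n >= 0
E(a) = \sum_{n=1}^{N+p} W_n \, \varepsilon_n^2(a),
\qquad \varepsilon_n(a) = x_n - \hat{x}_n = x_n + \sum_{i=1}^{p} a_i x_{n-i}
```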

  7. SWLP Weight Function Selection
  ◮ A simple choice for the SWLP weight function is the short-time energy (STE) function (see the sketch below)
  ◮ The weight function given by the STE causes the SWLP model to emphasize strong speech regions, where the SNR is generally more favorable
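A minimal sketch of this idea, assuming the STE weight is the energy of the M previous samples: it computes the weights and solves the weighted normal equations. Note that this is plain WLP; the stabilization step that makes it SWLP is omitted, and the default p, M, and the zero-padding of the frame edges are placeholder choices.

```python
import numpy as np

def ste_weights(x, p, M):
    # Short-time energy weights: W[n] = sum of the squares of the M previous samples.
    padded = np.concatenate((np.zeros(M), x, np.zeros(p)))
    return np.array([np.sum(padded[n:n + M] ** 2) for n in range(len(x) + p)]) + 1e-10

def wlp_coefficients(x, p=20, M=8):
    """Weighted LP: minimize sum_n W[n] * (x[n] + sum_i a_i * x[n-i])^2.

    Unlike SWLP, this plain WLP solution is not guaranteed to give a
    stable all-pole model.
    """
    n = len(x)
    W = ste_weights(x, p, M)
    x_ext = np.concatenate((x, np.zeros(p)))        # error terms for n = 1..N+p
    X = np.zeros((n + p, p))                        # X[n, i-1] = x[n-i]
    for i in range(1, p + 1):
        X[i:, i - 1] = x_ext[:n + p - i]
    WX = X * W[:, None]
    # Weighted normal equations: (X^T diag(W) X) a = -X^T diag(W) x
    return np.linalg.solve(X.T @ WX, -(WX.T @ x_ext))
```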

  8. Effect of the STE Window Width
  ◮ As the STE window width is adjusted, the spectral behavior approaches that of conventional LP
  [Figure: amplitude (dB) vs. frequency (Hz, 0 to 8000) for the FFT spectrum and the LP, SWLP M=8, and SWLP M=128 envelopes]

  9. Speech Recognition System
  ◮ Acoustic model:
    ◮ Cross-word triphones
    ◮ State-clustered hidden Markov models
    ◮ Gaussian mixture models in the speech feature space
    ◮ Gamma distribution for duration modeling
  ◮ Language model: n-grams of “statistical morphs”

  10. Feature Extraction
  ◮ Pre-processing:
    ◮ Pre-emphasis filter $1 - 0.97 z^{-1}$ and Hamming-windowed frames
    ◮ 125 frames per second, 256 samples per frame
  ◮ MFCC:
    ◮ FFT-based log-magnitude spectrum
    ◮ Filterbank of 23 logarithmically spaced triangular filters
    ◮ DCT of the filterbank output to get cepstral coefficients
  ◮ SWLP-MFCC: FFT spectrum replaced with the SWLP estimate
  ◮ Post-processing (a sketch follows below):
    ◮ 39-dimensional features: frame energy, 12 cepstral coefficients, first and second derivatives
    ◮ Cepstral mean subtraction
    ◮ Normalization, MLLT
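A sketch of the post-processing stage, not the recognizer's actual implementation: regression-based first and second derivatives are stacked with the 13 static features and the per-utterance mean is subtracted. The delta window width is an assumption, the energy term is taken as given, mean subtraction is applied to all dimensions here for simplicity (the talk specifies cepstral mean subtraction), and normalization and MLLT are omitted.

```python
import numpy as np

def deltas(feats, width=2):
    # Regression-based time derivatives over a window of +/- `width` frames.
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    padded = np.pad(feats, ((width, width), (0, 0)), mode='edge')
    out = np.zeros_like(feats)
    for k in range(1, width + 1):
        out += k * (padded[width + k:width + k + len(feats)]
                    - padded[width - k:width - k + len(feats)])
    return out / denom

def postprocess(ceps, log_energy):
    # Stack frame energy + 12 cepstra with their first and second derivatives
    # (13 * 3 = 39 dimensions), then subtract the per-utterance mean.
    static = np.column_stack((log_energy, ceps[:, :12]))
    feats = np.hstack((static, deltas(static), deltas(deltas(static))))
    return feats - feats.mean(axis=0)
```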

  11. Data
  ◮ SPEECON Finnish language corpus
  ◮ Training sets:
    ◮ Clean set: 21 hours of clean speech from 293 speakers
    ◮ Multicondition set: similar length, an even split of clean and noisy speech
  ◮ Test sets:
    ◮ Car environment: 60 minutes
    ◮ Public places environment: 90 minutes
  ◮ In both test sets, each recording was captured by three separate microphones positioned at different distances

  12. Results

  13. Car environment, clean training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 4.0 / 3.9 / 3.9, 29.6 / 27.1 / 27.0, and 67.5 / 54.2 / 53.8]

  14. Car environment, noisy training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 4.0 / 4.0 / 3.8, 8.0 / 7.3 / 6.8, and 18.4 / 18.1 / 17.9]

  15. Public place environment, clean training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 3.6 / 3.4 / 3.3, 24.3 / 21.7 / 21.2, and 40.2 / 34.9 / 34.4]

  16. Public place environment, noisy training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 3.6 / 3.7 / 3.4, 7.1 / 7.1 / 6.3, and 12.5 / 12.3 / 11.9]

  17. Conclusions and Future (and Current) Work
  ◮ Spectral envelope estimation helps when the noise is something unexpected
  ◮ How can the SWLP weighting be used better?
  ◮ Adaptive control of the SWLP STE window width
  ◮ Improvements are possible if we can select the “correct” M
  ◮ Using log-probabilities given by the decoder shows some promise
  ◮ Alternative idea: on-line noise estimation
  ◮ Maybe even replacing the STE weighting with something else entirely
  ◮ Comparing SWLP against other methods, especially MVDR-based ones

  18. Questions?
