
Noise Robust LVCSR Feature Extraction Based on the Stabilized Weighted Linear Prediction



  1. Noise Robust LVCSR Feature Extraction Based on the Stabilized Weighted Linear Prediction
  HUT-TUT Fall DSP Seminar 2008
  Heikki Kallasjoki, Adaptive Informatics Research Centre, Helsinki University of Technology
  21.11.2008

  2. Outline
  ◮ Introduction
  ◮ Feature Extraction: MFCC, SWLP
  ◮ Experiments: Speech Recognition System, Data, Results
  ◮ Discussion: Conclusions and Future Work, Questions?

  3. Introduction
  ◮ The “standard” mel-frequency cepstral coefficient (MFCC) based feature extraction is relatively sensitive to noise
  ◮ Noise robustness can be improved by replacing the “raw” FFT spectrum with a suitable spectral envelope estimate that, ideally, models only the relevant speech characteristics
  ◮ SWLP is one method of generating such an envelope estimate, based on temporally weighted linear prediction

  4. Mel-frequency Cepstral Coefficients (MFCCs)
  ◮ The de facto standard for speech recognition features
  ◮ Easily computed (see the sketch below):
    ◮ Window the input signal into overlapping frames
    ◮ Estimate the log of the amplitude spectrum
    ◮ Warp to the mel scale using logarithmically spaced triangular filters
    ◮ Take a DCT of the result to get cepstral coefficients
  ◮ Replacing the direct FFT-based amplitude spectrum with a spectral envelope estimate leads to MFCC variants that are more noise-robust
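To make the steps above concrete, here is a rough NumPy sketch of MFCC-style features for a single frame. It is a minimal illustration rather than the recognizer's actual front end: the standard mel-scale formula, the filter count, and the FFT length are assumptions of this sketch, and details such as pre-emphasis and liftering are omitted.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (an assumption; the talk only says "logarithmically spaced").
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor(mel_to_hz(mels) * n_fft / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def dct2(x):
    # Unnormalized DCT-II, used to decorrelate the log filterbank energies.
    n = len(x)
    m = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    return x @ np.cos(np.pi * (m + 0.5) * k / n)

def mfcc_frame(frame, fb, n_ceps=12):
    # 1) window the frame, 2) amplitude spectrum, 3) mel filterbank + log, 4) DCT.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    log_energies = np.log(fb @ spectrum + 1e-10)
    return dct2(log_energies)[:n_ceps]
```

With, for example, a 256-sample frame and 23 filters (the settings mentioned later in the talk) and an assumed 16 kHz sample rate, fb = mel_filterbank(23, 256, 16000) and mfcc_frame(frame, fb) would yield 12 cepstral coefficients per frame.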

  5. Conventional Linear Prediction (LP)
  ◮ Linear prediction gives an all-pole model for predicting a signal
  ◮ Conventional LP: $\hat{x}_n = -\sum_{i=1}^{p} a_i x_{n-i}$
  ◮ The $a_i$ coefficients are found by minimizing a cost function:
    $E(a) = \sum_{n=1}^{N+p} \varepsilon_n^2(a)$, where $\varepsilon_n(a) = x_n - \hat{x}_n$
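As a minimal illustration of the formulas above, the following NumPy sketch estimates the coefficients with the autocorrelation method and evaluates the corresponding all-pole envelope; the model order and FFT length are placeholder choices, not the settings used in the experiments.

```python
import numpy as np

def lp_coefficients(x, p):
    """Conventional LP: minimize the summed squared prediction error.

    Returns a_1..a_p such that x_hat[n] = -sum_i a_i * x[n-i].
    """
    n = len(x)
    # Autocorrelation r[k] = sum_n x[n] * x[n+k]
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)])
    # Normal (Yule-Walker) equations: sum_i a_i r[|i-j|] = -r[j], j = 1..p
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, -r[1:p + 1])

def allpole_envelope(a, n_fft=512):
    # Spectral envelope 1/|A(e^{jw})| with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return 1.0 / np.abs(A)
```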

  6. Stabilized Weighted Linear Prediction (SWLP)
  ◮ In Weighted Linear Prediction (WLP), a temporal weight term is added to the LP cost function
  ◮ The weight function makes it possible to give a higher importance to particular, hopefully less noisy, regions of the signal
  ◮ SWLP is a formulation of WLP which guarantees that the generated all-pole model is stable
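For concreteness, the weighted cost can be written as below; the symbol $W_n$ for the temporal weight is an assumption of this sketch, while the rest follows the LP cost on the previous slide.

```latex
% WLP cost: each squared prediction error is scaled by a temporal weight W_n >= 0
E(a) = \sum_{n=1}^{N+p} W_n \, \varepsilon_n^2(a),
\qquad \varepsilon_n(a) = x_n - \hat{x}_n = x_n + \sum_{i=1}^{p} a_i x_{n-i}
```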

  7. SWLP Weight Function Selection
  ◮ A simple choice for the SWLP weight function is the short-time energy (STE) function (see the sketch below)
  ◮ The weight function given by the STE causes the SWLP model to emphasize strong speech regions, where the SNR is generally more favorable
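A minimal sketch of this idea, assuming the STE weight is the energy of the M previous samples: it computes the weights and solves the weighted normal equations. Note that this is plain WLP; the stabilization step that makes it SWLP is omitted, and the default p, M, and the zero-padding of the frame edges are placeholder choices.

```python
import numpy as np

def ste_weights(x, p, M):
    # Short-time energy weights: W[n] = sum of the squares of the M previous samples.
    padded = np.concatenate((np.zeros(M), x, np.zeros(p)))
    return np.array([np.sum(padded[n:n + M] ** 2) for n in range(len(x) + p)]) + 1e-10

def wlp_coefficients(x, p=20, M=8):
    """Weighted LP: minimize sum_n W[n] * (x[n] + sum_i a_i * x[n-i])^2.

    Unlike SWLP, this plain WLP solution is not guaranteed to give a
    stable all-pole model.
    """
    n = len(x)
    W = ste_weights(x, p, M)
    x_ext = np.concatenate((x, np.zeros(p)))        # error terms for n = 1..N+p
    X = np.zeros((n + p, p))                        # X[n, i-1] = x[n-i]
    for i in range(1, p + 1):
        X[i:, i - 1] = x_ext[:n + p - i]
    WX = X * W[:, None]
    # Weighted normal equations: (X^T diag(W) X) a = -X^T diag(W) x
    return np.linalg.solve(X.T @ WX, -(WX.T @ x_ext))
```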

  8. Effect of the STE Window Width
  ◮ As the STE window width is adjusted, the spectral behavior approaches that of conventional LP
  [Figure: amplitude (dB) vs. frequency (Hz, 0 to 8000) for the FFT spectrum and the LP, SWLP M=8, and SWLP M=128 envelopes]

  9. Speech Recognition System
  ◮ Acoustic model:
    ◮ Cross-word triphones
    ◮ State-clustered hidden Markov models
    ◮ Gaussian mixture models in the speech feature space
    ◮ Gamma distribution for duration modeling
  ◮ Language model: n-grams of “statistical morphs”

  10. Feature Extraction
  ◮ Pre-processing:
    ◮ Pre-emphasis filter $1 - 0.97 z^{-1}$ and Hamming-windowed frames
    ◮ 125 frames per second, 256 samples per frame
  ◮ MFCC:
    ◮ FFT-based log-magnitude spectrum
    ◮ Filterbank of 23 logarithmically spaced triangular filters
    ◮ DCT of the filterbank output to get cepstral coefficients
  ◮ SWLP-MFCC: FFT spectrum replaced with the SWLP estimate
  ◮ Post-processing (a sketch follows below):
    ◮ 39-dimensional features: frame energy, 12 cepstral coefficients, first and second derivatives
    ◮ Cepstral mean subtraction
    ◮ Normalization, MLLT
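A sketch of the post-processing stage, not the recognizer's actual implementation: regression-based first and second derivatives are stacked with the 13 static features and the per-utterance mean is subtracted. The delta window width is an assumption, the energy term is taken as given, mean subtraction is applied to all dimensions here for simplicity (the talk specifies cepstral mean subtraction), and normalization and MLLT are omitted.

```python
import numpy as np

def deltas(feats, width=2):
    # Regression-based time derivatives over a window of +/- `width` frames.
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    padded = np.pad(feats, ((width, width), (0, 0)), mode='edge')
    out = np.zeros_like(feats)
    for k in range(1, width + 1):
        out += k * (padded[width + k:width + k + len(feats)]
                    - padded[width - k:width - k + len(feats)])
    return out / denom

def postprocess(ceps, log_energy):
    # Stack frame energy + 12 cepstra with their first and second derivatives
    # (13 * 3 = 39 dimensions), then subtract the per-utterance mean.
    static = np.column_stack((log_energy, ceps[:, :12]))
    feats = np.hstack((static, deltas(static), deltas(deltas(static))))
    return feats - feats.mean(axis=0)
```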

  11. Data
  ◮ SPEECON Finnish language corpus
  ◮ Training sets:
    ◮ Clean set: 21 hours of clean speech from 293 speakers
    ◮ Multicondition set: similar length, an even split of clean and noisy speech
  ◮ Test sets:
    ◮ Car environment: 60 minutes
    ◮ Public places environment: 90 minutes
  ◮ In both test sets, each recording was captured by three separate microphones positioned at different distances

  12. Results

  13. Car environment, clean training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 4.0 / 3.9 / 3.9, 29.6 / 27.1 / 27.0, and 67.5 / 54.2 / 53.8]

  14. Car environment, noisy training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 4.0 / 4.0 / 3.8, 8.0 / 7.3 / 6.8, and 18.4 / 18.1 / 17.9]

  15. Public place environment, clean training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 3.6 / 3.4 / 3.3, 24.3 / 21.7 / 21.2, and 40.2 / 34.9 / 34.4]

  16. Public place environment, noisy training set
  [Bar chart: letter error rate (%) per recording channel (0, 1, 2); across the three channels the MFCC / LP-MFCC / SWLP-MFCC rates are 3.6 / 3.7 / 3.4, 7.1 / 7.1 / 6.3, and 12.5 / 12.3 / 11.9]

  17. Conclusions and Future (and Current) Work
  ◮ Spectral envelope estimation helps when the noise is something unexpected
  ◮ How can the SWLP weighting be used better?
  ◮ Adaptive control of the SWLP STE window width
  ◮ Improvements are possible if we can select the “correct” M
  ◮ Using log-probabilities given by the decoder shows some promise
  ◮ Alternative idea: on-line noise estimation
  ◮ Maybe even replacing the STE weighting with something else entirely
  ◮ Comparing SWLP against other methods, especially MVDR-based ones

  18. Questions?
