Feature Extraction Combining Spectral Noise Reduction and Cepstral Histogram Equalization For Robust ASR


  1. Feature Extraction Combining Spectral Noise Reduction and Cepstral Histogram Equalization For Robust ASR
     J.C. Segura, M.C. Benítez, A. de la Torre, A.J. Rubio
     Signal Processing and Communications Group, University of Granada (Spain)

  2. Introduction
     - Results for noisy TI-Digits presented at ICASSP'02
     - Histogram Equalization (HE) can reduce the mismatch of noisy speech better than CMS and CMVN
     - Its performance increases when applied over partially compensated speech features
     - In this work we explore HE performance in combination with Spectral Subtraction

  3. Outline
     - System description
     - Front-End: Spectral Noise Reduction
       - Speech/Non-Speech Detection
       - Spectral Subtraction
     - Back-End Processing
       - Frame-Dropping
       - Feature Normalization
     - Experimental set-up
     - Results and discussion

  4. System Description
     [Block diagram: speech signal → Front-End (FFT, speech/non-speech detection (SND), spectral subtraction (SS), MFCC + logE) → Back-End (frame dropping (FD), histogram equalization (HE)) → recognizer]

  5. Spectral Subtraction
     - Standard implementation on the magnitude spectrum:
       \hat{X}_t(w) = \max\{\, Y_t(w) - \alpha\,\hat{N}_t(w),\; \beta\, Y_t(w) \,\}
     - Recursive noise estimate:
       \hat{N}_t(w) = \begin{cases} \lambda\,\hat{N}_{t-1}(w) + (1-\lambda)\, Y_t(w) & \text{non-speech} \\ \hat{N}_{t-1}(w) & \text{speech} \end{cases}
     - Parameters: over-subtraction \alpha = 1.1, maximum attenuation \beta = 0.3, forgetting factor \lambda = 0.95
     - Y_t(w): noisy speech, \hat{N}_t(w): noise estimate, \hat{X}_t(w): clean speech estimate
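
A minimal Python sketch of this subtraction rule, assuming per-frame magnitude spectra and speech/non-speech decisions are already available; the function name, the initialization of the noise estimate, and the array handling are illustrative choices, not taken from the slides.

```python
import numpy as np

ALPHA = 1.1   # over-subtraction factor (slide 5)
BETA = 0.3    # maximum attenuation / spectral floor (slide 5)
LAM = 0.95    # forgetting factor of the noise estimate (slide 5)

def spectral_subtraction(frames_mag, is_speech):
    """frames_mag: (T, F) magnitude spectra; is_speech: (T,) boolean SND decisions."""
    noise = frames_mag[0].copy()          # assumption: initialize from the first frame
    cleaned = np.empty_like(frames_mag)
    for t, (y, speech) in enumerate(zip(frames_mag, is_speech)):
        if not speech:
            # update the noise estimate only during non-speech frames
            noise = LAM * noise + (1.0 - LAM) * y
        # subtract, with a floor of beta * Y to avoid negative magnitudes
        cleaned[t] = np.maximum(y - ALPHA * noise, BETA * y)
    return cleaned
```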

  6. Speech/Non-Speech Detection (I)
     - Based on log-energy quantile differences
     - Quantiles are estimated over a sliding window of 21 frames (at a frame rate of 100 Hz)
     - Q_0.5 (the median) is used to track the noise level B
     - Q_0.9 is used to track the speech level
     - Q_SNR = Q_0.9 − B is thresholded to detect speech
     - The noise level B is updated with Q_0.5 whenever non-speech is detected
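
A minimal sketch of the quantile-based detector, assuming per-frame log-energies as input; the decision threshold and the initialization of the noise level B are assumptions, since the slide only fixes the window length, the two quantiles, and the update rule.

```python
import numpy as np

WIN = 21          # sliding window length in frames (100 Hz frame rate, slide 6)
THRESHOLD = 6.0   # assumed Q_SNR decision threshold (log-energy units)

def detect_speech(log_energy):
    """log_energy: (T,) per-frame log-energies -> (T,) boolean speech decisions."""
    log_energy = np.asarray(log_energy, dtype=float)
    T = len(log_energy)
    decisions = np.zeros(T, dtype=bool)
    B = np.median(log_energy[:WIN])       # assumed initialization of the noise level
    for t in range(T):
        lo, hi = max(0, t - WIN // 2), min(T, t + WIN // 2 + 1)
        window = log_energy[lo:hi]
        q50 = np.quantile(window, 0.5)    # tracks the noise level
        q90 = np.quantile(window, 0.9)    # tracks the speech level
        decisions[t] = (q90 - B) > THRESHOLD
        if not decisions[t]:
            B = q50                       # update the noise level during non-speech
    return decisions
```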

  7. Speech/Non-Speech Detection (II)
     - Characteristics of the SND algorithm:
       - Easy and fast implementation
       - Fast tracking of the noise level
       - Q_SNR is smooth enough to prevent false speech detections
       - Implicit symmetric hang-over

  8. Speech/Non-Speech Detection (III)
     [Figure illustrating the SND algorithm]

  9. Frame-Dropping
     - The objective is to remove long speech pauses
     - Based on the same SND algorithm
     - It works over the noise-reduced speech
     - A frame is removed only if it lies in the middle of a non-speech segment of predefined length; this prevents over-dropping
     - Segments of 11 frames are used in this work
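
One possible reading of this dropping rule as code: a frame is kept unless it is centred in a non-speech run of at least 11 frames. The centring convention and the handling of segment edges are assumptions.

```python
import numpy as np

MIN_PAUSE = 11  # predefined non-speech segment length used in the paper (slide 9)

def frame_dropping_mask(is_speech):
    """is_speech: (T,) boolean SND decisions -> (T,) boolean keep-mask."""
    is_speech = np.asarray(is_speech, dtype=bool)
    keep = np.ones(len(is_speech), dtype=bool)
    half = MIN_PAUSE // 2
    for t in range(len(is_speech)):
        lo, hi = t - half, t + half + 1
        if lo >= 0 and hi <= len(is_speech) and not is_speech[lo:hi].any():
            keep[t] = False      # middle of a long non-speech segment: drop it
    return keep

# usage: features_kept = features[frame_dropping_mask(decisions)]
```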

  10. Feature Normalization (I)
     - CDF-matching for non-linear distortion compensation
     - Given a zero-memory one-to-one general transformation y = T[x]:
       x \sim p_X(x) \;\rightarrow\; y = T[x] \sim p_Y(y)
       C_X(x) = \int_{-\infty}^{x} p_X(u)\,du, \qquad C_Y(y) = \int_{-\infty}^{y} p_Y(u)\,du
       C_X(x) = C_Y(y) \;\Rightarrow\; x = T^{-1}[y] = C_X^{-1}\big(C_Y(y)\big)

  11. Feature Normalization (II)
     - Two ways of using CDF-matching for mismatch reduction:
     - CDF-matching for feature compensation
       - C_X(x) is estimated during training
       - During test, an estimate of C_Y(y) is used to compensate for the mismatch:
         \hat{x} = \hat{T}^{-1}[y] = \hat{C}_X^{-1}\big(\hat{C}_Y(y)\big)
     - CDF-matching for feature normalization
       - A predefined C_X(x) is selected (usually Gaussian)
       - For both training and test, features are transformed to match the reference distribution using an estimate of C_Y(y)
       - Can be viewed as an extension of CMVN
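
The "extension of CMVN" remark can be made concrete with a standard observation not spelled out on the slide: if the reference distribution is Gaussian and the test CDF is itself approximated by a Gaussian, CDF-matching collapses to a linear mean-and-variance mapping.

```latex
% Reference CDF Gaussian with mean \mu_r and deviation \sigma_r;
% test CDF approximated by a Gaussian with mean \mu_y and deviation \sigma_y:
\[
x \;=\; C_X^{-1}\big(C_Y(y)\big)
  \;=\; \mu_r + \sigma_r\,\Phi^{-1}\!\big(C_Y(y)\big)
  \;=\; \mu_r + \sigma_r\,\frac{y-\mu_y}{\sigma_y},
\]
% which for a standard-normal reference (\mu_r = 0, \sigma_r = 1) is exactly CMVN.
```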

  12. Feature Normalization (III)
     - Previous works: feature compensation
       - R. Balchandran, R. Mammone. "Non-parametric estimation and correction of non-linear distortion in speech systems" [ICASSP'98]
         - Domain: Speech samples
         - Task: Speaker ID / Sigmoid and cubic distortions
       - S. Dharanipragada, M. Padmanabhan. "A nonlinear unsupervised adaptation technique for speech recognition" [ICSLP'00]
         - Domain: Cepstrum
         - Task: Speech Recognition / Handset / Speaker-phone mismatch
       - F. Hilger, H. Ney. "Quantile based histogram equalization for noise robust speech recognition" [EUROSPEECH'01]
         - Domain: Filter-bank Energy
         - Task: Speech Recognition / AURORA task

  13. Feature Normalization (IV)
     - Previous works: feature normalization
       - J. Pelecanos, S. Sridharan. "Feature warping for robust speaker verification" [Speaker Odyssey'01]
         - Domain: Cepstrum
         - Task: NIST 1999 Speaker Recognition Evaluation database
       - B. Xiang, U.V. Chaudhari, et al. "Short-time gaussianization for robust speaker verification" [ICASSP'02]
         - Domain: Cepstrum / Short-time
         - Task: Speaker Verification
       - J.C. Segura, A. de la Torre, M.C. Benítez, et al. "Non-linear transformations of the feature space for robust speech recognition" [ICASSP'02]
         - Domain: Cepstrum
         - Task: Speech Recognition / AURORA

  14. Feature Normalization (V)
     - Example of non-linear distortion in the log domain:
       y = \log\big(\exp(x + h) + \exp(n)\big), \qquad h = 0.8, \quad n = 3.5
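
A limiting-case reading of this model (a standard observation, not spelled out on the slide) shows why the distortion is non-linear and why a purely linear method such as CMS cannot remove it.

```latex
\[
y = \log\!\big(e^{\,x+h} + e^{\,n}\big)
\;\approx\;
\begin{cases}
x + h, & x + h \gg n \quad \text{(high SNR: a constant shift, removable by CMS)}\\[2pt]
n, & x + h \ll n \quad \text{(low SNR: the feature saturates at the noise floor)}
\end{cases}
\]
```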

  15. Feature Normalization (VI)
     - Implementation details:
       - CDF-matching is applied in the cepstral domain in a feature-transformation scheme
       - Each cepstral coefficient is transformed independently to match a Gaussian reference distribution
     - Algorithm:
       - C_Y(y) is estimated for each feature of each utterance using cumulative histograms
       - The bin centers are transformed and a piecewise-linear transformation is constructed
       - The transformation is applied to the input features to obtain the transformed ones
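
A minimal sketch of this per-coefficient equalization, assuming a standard-Gaussian reference and a fixed number of histogram bins; both choices are assumptions, since the slides do not specify them.

```python
import numpy as np
from scipy.stats import norm

N_BINS = 100  # assumed number of histogram bins

def equalize_coefficient(c):
    """c: (T,) values of one cepstral coefficient over one utterance."""
    counts, edges = np.histogram(c, bins=N_BINS)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # empirical CDF at the bin centers (kept strictly inside (0, 1) for ppf stability)
    cdf = np.cumsum(counts) / counts.sum()
    cdf = np.clip(cdf, 1e-4, 1.0 - 1e-4)
    # map each bin center through the inverse Gaussian CDF: x = Phi^{-1}(C_Y(y))
    targets = norm.ppf(cdf)
    # piecewise-linear transformation between the transformed bin centers
    return np.interp(c, centers, targets)

# usage: apply independently to every cepstral coefficient, in training and test,
# e.g. equalized = np.stack([equalize_coefficient(C[:, i]) for i in range(C.shape[1])], axis=1)
```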

  16. Feature Normalization (VII)
     [Figure: feature distributions for noisy and clean speech]

  17. Experimental set-up
     - Database end-pointing:
       - Noisy TI-Digits and SpeechDat-Car databases have been automatically end-pointed
       - The SND algorithm is used on clean-speech (channel 0) utterances
       - 200 ms of silence are added at the end-points
     - Acoustic features:
       - Standard front-end: 12 MFCCs + logE
       - Delta and acceleration coefficients are appended at the recognizer, with regression lengths of 7 and 11 frames respectively
     - Acoustic modeling:
       - One left-to-right continuous HMM with 16 emitting states per digit
       - Mixtures of 3 Gaussians per state

  18. Aurora 2 results

     TI-Digits, multi-condition training
                      A       B       C     Average   Rel. Imp.
     Baseline       88.07   87.22   84.56    87.03      ----
     SS             90.94   88.69   86.29    89.11      9.43%
     SS+HE          90.72   89.74   90.03    90.19     15.42%
     SS+FD+HE       90.89   89.80   90.11    90.30     17.99%

     TI-Digits, clean-condition training
                      A       B       C     Average   Rel. Imp.
     Baseline       58.74   53.40   66.00    58.06      ----
     SS             73.71   69.35   75.63    72.35     37.71%
     SS+HE          82.08   82.61   81.73    82.22     55.59%
     SS+FD+HE       82.51   82.78   81.87    82.49     56.45%

     Overall relative improvement (average of the two training modes):
     SS 23.57%, SS+HE 35.51%, SS+FD+HE 37.22%

  19. Aurora 3 results

     Finnish
                      WM      MM      HM    Average   Rel. Imp.
     Baseline       92.74   80.51   40.53    75.41      -----
     SS             95.09   78.80   69.19    82.91     21.92%
     SS+HE          94.58   86.53   74.20    86.67     35.10%
     SS+FD+HE       94.58   86.73   73.11    86.46     35.00%

     Spanish
                      WM      MM      HM    Average   Rel. Imp.
     Baseline       92.94   83.31   51.55    79.22      -----
     SS             95.58   89.76   71.94    87.63     39.00%
     SS+HE          96.15   93.15   86.77    93.00     57.00%
     SS+FD+HE       96.65   94.10   87.03    93.35     61.95%

     German
                      WM      MM      HM    Average   Rel. Imp.
     Baseline       91.20   81.04   73.17    83.14      -----
     SS             93.41   86.60   84.32    88.75     30.70%
     SS+HE          94.79   88.58   89.32    91.25     45.29%
     SS+FD+HE       94.57   88.07   88.95    90.89     43.00%

     Overall relative improvement (average over the three languages):
     SS 30.54%, SS+HE 45.79%, SS+FD+HE 46.65%

  20. Aurora 2 results with 20 mixtures

     [Figure: word accuracy (Wacc, %) vs. SNR (Clean, 20 dB, 15 dB, 10 dB, 5 dB, 0 dB) for clean-condition and multi-condition training, comparing BL and SS+FD+HE with 3 and 20 mixtures per state]

                          Clean condition         Multi condition
     Features            Absolute  Relative     Absolute  Relative
     BL 3 mix              58.06     --.--        87.03     --.--
     BL 20 mix             58.04     4.51%        88.98    26.39%
     SS+FD+HE 3 mix        82.49    56.45%        90.30    17.99%
     SS+FD+HE 20 mix       83.22    62.67%        91.53    41.38%
