Subproject II: Robustness in Speech Recognition

Members (1/2)
• Jen-Tzung Chien (Co-PI), National Cheng Kung University
• Hsiao-Chuan Wang (PI), National Tsing Hua University
• Jeih-Weih Hung, National Chi Nan University
• Lin-shan Lee (Co-PI), National Taiwan University
• Sin-Horng Chen, National Chiao Tung University
• Hsin-min Wang, Academia Sinica

Members (2/2)
• Yih-Ru Wang, National Chiao Tung University
• Yuan-Fu Liao, National Taipei University of Technology
• Berlin Chen, National Taiwan Normal University

Research Theme
[Block diagram: Input speech → Signal Processing → Feature Extraction & Transformation → Speech Decoding including Word Graph Rescoring → Output Recognition Results. The decoder draws on Adaptive HMM Models, an Adaptive Pronunciation Lexicon, and Adaptive Language Models. The stages are labeled Signal Level, Model Level, and Lexical Level.]

Research Roadmap

Current Achievements
• Speech enhancement & wavelet processing
• Cepstral moment normalization & temporal filtering
• Microphone array and noise cancellation approaches
• Discriminative adaptation for acoustic and linguistic models
• Maximum entropy modeling & data mining algorithm
• Robust language modeling

Future Directions & Applications
• Speech recognition in different adverse environments, e.g. car, home, etc.
• Robust broadcast news transcription
• Lecture speech recognition
• Spontaneous speech recognition
• Next generation automatic speech recognition
• Powerful machine learning approaches for complicated robustness problems

Signal Level Approaches
• Speech Enhancement
– Harmonic retaining, perceptual factor analysis, etc.
• Robust Feature Representation
– Higher-order cepstral moment normalization, data-driven temporal filtering, etc.
• Microphone Array Processing
– Microphone array with post-filtering, etc.
• Missing-Feature Approach
– Sub-space missing feature imputation and environment sniffing, mismatch-aware stochastic matching, etc.

Higher-Order Cepstral Moment Normalization (HOCMN) (1/3)
• Cepstral Feature Normalization Widely Used
– CMS: normalizing the first moment
– CMVN: normalizing the first and second moments
– HEQ: normalizing the full distribution (all order moments)
– How about normalizing a few higher order moments only?
– Disturbances of larger magnitudes may be the major sources of recognition errors, which are better reflected in higher order moments
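The idea of normalizing the first and one higher-order moment can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an even order N, assumes the target is the N-th moment of a standard Gaussian, (N-1)!!, and applies a single scale factor after mean removal. The names `hocmn_1_n` and `double_factorial` are hypothetical.

```python
def double_factorial(k):
    """Return k!! for odd k >= -1, with (-1)!! defined as 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def hocmn_1_n(trajectory, order):
    """Normalize the 1st and N-th moments of one cepstral coefficient trajectory.

    Assumption (not from the slides): the N-th central moment is scaled to
    that of a standard Gaussian, which is (N-1)!! for even N.
    """
    if order < 2 or order % 2 != 0:
        raise ValueError("this sketch only handles even orders N >= 2")
    mean = sum(trajectory) / len(trajectory)
    centered = [x - mean for x in trajectory]            # first moment -> 0
    moment = sum(c ** order for c in centered) / len(centered)
    if moment == 0:
        raise ValueError("constant trajectory cannot be scaled")
    target = double_factorial(order - 1)                  # Gaussian N-th moment
    scale = (target / moment) ** (1.0 / order)
    return [scale * c for c in centered]                  # N-th moment -> target


# Usage: with order=2 the sketch reduces to CMVN (zero mean, unit variance).
features = [1.0, 2.0, 4.0, 7.0, 3.0, -2.0]
normalized = hocmn_1_n(features, 4)
```

With `order=2` the target is 1, so the result has unit variance, which matches the description of CMVN as normalizing the first and second moments.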

Higher-Order Cepstral Moment Normalization (HOCMN) (2/3)
• Experimental results: Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)
[Figure: word accuracy (%) versus N (even integer) for HOCMN[1,N], in which the 1st and N-th moments are normalized. Curves: (a) HOCMN[1,N] (full-utterance) and (b) HOCMN[1,N] (L=86), with CMVN and CMVN (L=86) as baselines.]

Higher-Order Cepstral Moment Normalization (HOCMN) (3/3)
• Experimental results: Aurora 2, clean condition training, word accuracy averaged over 0~20dB for each type of noise condition
[Figure: bar chart of word accuracy (%) for CMVN, HOCMN[1,5,100], and HEQ, per noise type and set average in Set A, Set B, and Set C.]
• HOCMN is significantly better than CMVN for all types of noise
• HOCMN is better than HEQ in most types of noise except for the “Subway” and “Street” noise

Data-Driven Temporal Filtering
• The developed filters are applied along the temporal trajectories of the original features
• These filters are derived in a data-driven manner according to the PCA, LDA, or MCE criterion
• They can be integrated with cepstral mean and variance normalization (CMVN) to achieve further performance improvement
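A data-driven temporal filter of the PCA kind can be sketched as below. This is only an illustration under stated assumptions: the filter taps are taken to be the first principal component of fixed-length segments of a feature trajectory, found here by power iteration. The LDA and MCE variants need class labels and are not shown. All function names are hypothetical.

```python
import math


def sliding_segments(trajectory, length):
    """Cut a feature trajectory into overlapping segments of the given length."""
    return [trajectory[i:i + length] for i in range(len(trajectory) - length + 1)]


def covariance(segments):
    """Covariance matrix of the segments, treated as vectors."""
    n, dim = len(segments), len(segments[0])
    mean = [sum(s[k] for s in segments) / n for k in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in segments:
        d = [s[k] - mean[k] for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / n
    return cov


def pca_filter(segments, iterations=200):
    """First principal component of the segments, used as FIR filter taps.

    Power iteration from a constant start vector; it would fail if that
    vector were orthogonal to the principal direction.
    """
    cov = covariance(segments)
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    if sum(v) < 0:                      # fix the sign ambiguity of eigenvectors
        v = [-x for x in v]
    return v


def apply_filter(trajectory, taps):
    """Filter a trajectory: inner product of the taps with each segment."""
    return [sum(t * x for t, x in zip(taps, seg))
            for seg in sliding_segments(trajectory, len(taps))]
```

In use, the taps would be estimated once from training trajectories with `pca_filter(sliding_segments(trajectory, length))` and then applied to every utterance with `apply_filter`, before or after CMVN.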

Microphone Array Processing (1/3)
• Integrated with Model Level Approaches (MLLR)
[Block diagram: Speech Input Using Microphone Array → Delay Estimator → Delay-and-Sum Beamformer → Enhanced Speech Signal → Speech Recognition → Recognition Result. The front end is labeled "Speech Enhancement Using Time Domain Coherence Measure (TDCM)". In the Model Adaptation branch, MLLR Adaptation turns the Initial HMM Parameters into Adapted HMM Parameters for the recognizer.]
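The delay estimator and delay-and-sum beamformer stages can be sketched as follows. This is a simplified illustration, not the system on the slide: delays are assumed to be whole samples, and plain cross-correlation stands in for the Time Domain Coherence Measure, whose definition the slide does not give. The function names are hypothetical.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag d in [-max_lag, max_lag] that maximizes
    sum over t of ref[t] * sig[t + d] (plain cross-correlation)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t, r in enumerate(ref):
            u = t + lag
            if 0 <= u < len(sig):
                score += r * sig[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to the first one and average the aligned samples."""
    ref = channels[0]
    # The reference channel is aligned with itself by definition.
    lags = [0] + [estimate_delay(ref, ch, max_lag) for ch in channels[1:]]
    output = []
    for t in range(len(ref)):
        total, count = 0.0, 0
        for ch, lag in zip(channels, lags):
            u = t + lag
            if 0 <= u < len(ch):        # skip samples that fall off the edge
                total += ch[u]
                count += 1
        output.append(total / count)
    return output, lags


# Usage: the second microphone receives the same signal two samples later.
source = [0, 0, 1, 3, -2, 4, 0, 0, 0, 0]
delayed = [0, 0] + source[:-2]
enhanced, lags = delay_and_sum([source, delayed], max_lag=4)
```

Averaging the aligned channels keeps the coherent speech component while uncorrelated noise on the individual microphones partly cancels, which is the motivation for the beamformer stage.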

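Microphone Array Processing (2/3)
• Further Improved with Wiener Filtering and Spectral Weighting Function (SWF)
[Block diagram: the microphone signals x1 to x4, with estimated delays τ̂1 to τ̂3, are transformed by FFT and passed through the Improved Wiener Filter and Weight Selection. In parallel, the Delay-and-Sum Beamformer output is transformed by FFT. The Spectral Weighting Function multiplies the beamformer spectrum, and an IFFT yields the enhanced signal ŝ.]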

Microphone Array Processing (3/3)
• Applications for In-Car Speech Recognition
– Power Spectral Coherence Measure (PSCM) used to estimate the time delay
[Figures: "Physical configuration" and "Configuration in car". Labels include microphone array, speaker, 45º, 90cm, wheel, air conditioner, fan noise, and personal computer.]

Model Level Approaches
• Improved Parallel Model Combination
• Bayesian Learning of Speech Duration Models
• Aggregate a Posteriori Linear Regression Adaptation

Aggregate a Posteriori Linear Regression (AAPLR) (1/3)
• Discriminative Linear Regression Adaptation
• Prior Density of Regression Matrix is Incorporated to Construct Bayesian Learning Capabilities
• Closed-form Solution Obtained for Rapid Adaptation
[Diagram: AAPLR combines a discriminative criterion with prior information of the regression matrix (Bayesian learning) and yields a closed-form solution.]

Aggregate a Posteriori Linear Regression (AAPLR) (2/3)
• MAPLR
$$J_{\mathrm{MAPLR}}(\hat{W}) = R(\hat{W} \mid W) = \sum_{m=1}^{M} \sum_{n=1}^{N_m} \log \frac{p(X_{m,n} \mid \hat{W}_r, \lambda_m)\, g(\hat{W}_r)}{p(X_{m,n})}$$
• AAPLR
$$J_{\mathrm{AAPLR}}(W) = \frac{1}{M} \sum_{m=1}^{M} \sum_{n=1}^{N_m} \frac{p(X_{m,n} \mid W_r, \lambda_m)\, P_m\, g(W_r)}{p(X_{m,n})}$$
– aggregated over all model classes $m$ with probabilities $P_m$
• Discriminative Training
$$J_{\mathrm{AAPLR}}(W) = \frac{1}{M} \sum_{m=1}^{M} \sum_{n=1}^{N_m} l\big(d_m^{\mathrm{AAPLR}}\big)$$
$$d_m^{\mathrm{AAPLR}} = g_m(X_{m,n}; \lambda_m, W_r) - \log \left\{ \frac{1}{M-1} \sum_{j \neq m} \exp\big[\eta\, g_j(X_{m,n}; \lambda_j, W_r)\big] \right\}^{1/\eta}$$
$$g_m(X_{m,n}; \lambda_m, W_r) = \log\big\{ p(X_{m,n} \mid W_r, \lambda_m)\, g(W_r) \big\}$$

Aggregate a Posteriori Linear Regression (AAPLR) (3/3)
• Comparison with Other Approaches

Method   Discriminative adaptation   Bayesian learning   Closed-form solution
MLLR     No                          No                  Yes
MAPLR    No                          Yes                 Yes
MCELR    Yes                         No                  No
CMLLR    Yes                         No                  Yes
AAPLR    Yes                         Yes                 Yes

[The slide also marks each method's estimation criterion with ○ in columns ML, MAP, MCE, MMI, and AAP; the column positions of the marks are not recoverable here.]

Lexical Level Approaches
• Pronunciation Modeling for Spontaneous Mandarin Speech
• Language Model Adaptation
– Latent Semantic Analysis and Smoothing
– Maximum Entropy Principle
• Association Pattern Language Model

Pronunciation Modeling for Spontaneous Mandarin Speech
• Automatically Constructing Multiple-pronunciation Lexicon using a Three-stage Framework to Reduce Confusion Introduced by the Added Pronunciations
– Stage 1: automatically generating possible surface forms but avoiding confusion across different words
– Stage 2: ranking the pronunciations to avoid confusion across different words
– Stage 3: keeping only the necessary pronunciations to avoid confusion across different words

Association Pattern Language Model (1/5)
• N-grams Consider only Local Relations
• Trigger Pairs Consider Long-distance Relations, but only for Two Associated Words
• Word Associations Can Be Expanded for More than Two Distant Words
• A New Algorithm to Discover Association Patterns via Data Mining Techniques

Association Pattern Language Model (2/5)
• Bigram & Trigram
[Diagram: the word sequence "... Twin Towers ... Sept. 11 ... George Bush" with bigram links between adjacent words and trigram links spanning three consecutive words.]
• Trigger Pairs
[Diagram: the same word sequence with bigram links between adjacent words and trigger-pair links connecting two distant words.]

Association Pattern Language Model (3/5)
• Association Patterns
[Diagram: the word sequence "... Twin Towers ... Sept. 11 ... George Bush" with bigram links between adjacent words and association-pattern links connecting more than two distant words.]

Association Pattern Language Model (4/5)
• Association Pattern Mining Procedure

Association Pattern Language Model (5/5)
• Association Pattern Set $\Omega_{AS}$ Covering Different Association Steps Constructed
• Merge Mutual Information of All Association Patterns
$$MI(W_a^{q-1} \rightarrow w_j) = \log \frac{p(W_a^{q-1}, w_j)}{p(W_a^{q-1})\, p(w_j)}$$
$$\log p_{AS}(W) = \sum_{q=1}^{L} \log p(w_q) + \sum_{s=1}^{S} \sum_{(W_a^{q-1} \rightarrow w_j) \in \Omega_{AS}^{s}} MI(W_a^{q-1} \rightarrow w_j)$$
• Association Pattern n-gram Estimated
$$\log \tilde{p}(W) = a_1 \log p_{AS}(W) + a_2 \log p(W)$$
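The mutual information of a single association pattern can be sketched as below. This is an illustration under assumptions that are not in the slides: probabilities are estimated as the fraction of training sentences in which the words co-occur, without smoothing, and the function name `pattern_mutual_information` is hypothetical.

```python
import math


def pattern_mutual_information(sentences, antecedent, word):
    """log p(antecedent, word) / (p(antecedent) * p(word)).

    Each probability is the fraction of sentences containing all the words
    in question (a sentence-level co-occurrence estimate, assumed here).
    """
    total = len(sentences)
    sets = [set(s) for s in sentences]
    antecedent = set(antecedent)
    n_ante = sum(1 for s in sets if antecedent <= s)
    n_word = sum(1 for s in sets if word in s)
    n_joint = sum(1 for s in sets if antecedent <= s and word in s)
    if n_ante == 0 or n_word == 0 or n_joint == 0:
        raise ValueError("pattern never observed; smoothing would be needed")
    p_joint = n_joint / total
    p_ante = n_ante / total
    p_word = n_word / total
    return math.log(p_joint / (p_ante * p_word))


# Usage on a toy corpus: how strongly does {"twin", "towers"} predict "bush"?
corpus = [
    ["twin", "towers", "sept", "11", "bush"],
    ["twin", "towers", "bush"],
    ["stock", "market", "report"],
    ["bush", "speech"],
]
mi = pattern_mutual_information(corpus, ("twin", "towers"), "bush")
```

A positive value means the antecedent words and the target word co-occur more often than independence would predict, which is what makes the pattern worth adding on top of the n-gram score.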
