THE NTU-ADSC SYSTEMS FOR REVERBERATION CHALLENGE 2014

Xiong Xiao 1, Shengkui Zhao 2, Duc Hoang Ha Nguyen 3, Xionghu Zhong 3, Douglas L. Jones 2, Eng Siong Chng 1,3, Haizhou Li 1,3,4

1 Temasek Lab@NTU, Nanyang Technological University, Singapore.
2 Advanced Digital Sciences Center, Singapore.
3 School of Computer Engineering, Nanyang Technological University, Singapore.
4 Department of Human Language Technology, Institute for Infocomm Research, Singapore.
Outline
• System Highlights
• Speech Enhancement
  – Delay and Sum + spectral subtraction
  – MVDR + DNN spectrogram enhancement
• Speech Recognition
  – Multi condition training
  – Clean condition training
• Summary
System Highlights
• Beamforming
  – Delay and Sum, MVDR
  – Classic methods that always work!
• DNN feature mapping
  – Mapping reverberant spectrograms to clean spectrograms for enhancement
  – Mapping reverberant MFCC features to clean features for ASR
• DNN acoustic modeling for ASR
  – Discriminative feature learning and modeling in a single framework
• Feature adaptation (cross-transform) for ASR
  – A generalization of the temporal filter and the fMLLR transform
  – Explicitly uses the correlation between feature frames to counter distortions that span many frames
Speech Enhancement Systems
Two speech enhancement systems are considered:
• DS beamforming + spectral subtraction (DS+SS)
• MVDR beamforming + DNN-based spectrogram enhancement (MVDR+DNN)
Speech Enhancement – DS + Spectral Subtraction
• DS beamforming
  – STFT: 64 ms Hanning window, 75% frame overlap, 1024-point STFT
  – GCC-PHAT for TDOA estimation
  – Multi-channel phase alignment and summation
• Spectral subtraction
  – Reverberation time estimation: ML method
  – Amplitude spectral subtraction
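The DS front end on this slide can be sketched in a few lines of NumPy: GCC-PHAT estimates each channel's delay against a reference, and the compensating delay is applied as a linear phase shift before summing. This is a minimal illustration under my own assumptions (function names, circular frequency-domain shifting, no STFT framing or spectral subtraction), not the authors' implementation.

```python
import numpy as np

def gcc_phat_tdoa(x, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of x relative to ref via GCC-PHAT."""
    n = len(x) + len(ref)
    X = np.fft.rfft(x, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = X * np.conj(R)
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    delay = np.argmax(np.abs(cc)) - max_shift
    return delay / fs

def delay_and_sum(channels, fs, ref_idx=0):
    """Align every channel to the reference and average (DS beamforming)."""
    ref = channels[ref_idx]
    n = len(ref)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for ch in channels:
        tau = gcc_phat_tdoa(ch, ref, fs)
        # Compensate the estimated delay as a linear phase shift (circular).
        out += np.fft.irfft(np.fft.rfft(ch) * np.exp(2j * np.pi * freqs * tau), n=n)
    return out / len(channels)
```

For integer-sample delays the phase shift is an exact circular shift; for real microphone signals one would window the signals (the slide's 64 ms Hanning frames) before this step.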
Speech Enhancement – MVDR + DNN feature mapping
• Use a DNN to map a window of reverberant feature vectors to a (central) clean feature vector, i.e., let the DNN learn to dereverberate.
• For speech enhancement, input and output are spectrum vectors; for ASR, input and output are MFCC feature vectors.
• Training data: frame-aligned clean and multi-condition data.
• DNN size: 2827 – 3x3072 – 771.
• Predict both static and dynamic spectra, then merge them to produce a smoothed static spectrum.
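The windowed input such a mapping DNN consumes can be sketched as below. Note that 11 frames × 257 spectral bins = 2827 matches the input layer size on the slide (and 3 × 257 = 771 the static+Δ+ΔΔ output), but that pairing is my reading, not stated; the function name and edge padding are illustrative assumptions.

```python
import numpy as np

def make_context_windows(frames, context=5):
    """Stack each frame with its +/- `context` neighbours (edges padded by
    repetition), producing the (2*context+1)-frame input vector per step."""
    T, D = frames.shape
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1) for t in range(T)])

# Illustration: 100 frames of 257-bin log spectra -> 2827-dim DNN inputs.
spec = np.random.randn(100, 257)
X = make_context_windows(spec, context=5)
# Targets would be the frame-aligned clean spectra (static + dynamic in the
# slide's setup), one target frame per window centre.
```

Training then pairs `X[t]` with the clean frame at time `t`, which is what "frame-aligned clean and multi-condition data" provides.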
Objective measures – CD and LLR
[Bar charts: Cepstral Distance and Log Likelihood Ratio for Unprocessed, SS (1ch), DNN (1ch), DS+SS (8ch), and MVDR+DNN (8ch), over Near/Far conditions in Rooms 1–3 and on average.]
• Both DS+SS and MVDR+DNN reduce cepstral distance and LLR significantly, especially in high-reverberation cases.
• DNN degrades LLR significantly in 8-ch low-reverberation cases.
Objective measures – fwSegSNR and SRMR
[Bar charts: fwSegSNR and SRMR for Unprocessed, SS (1ch), DNN (1ch), DS+SS (8ch), and MVDR+DNN (8ch), over Near/Far conditions in Rooms 1–3 (SimData) and Room 1 (RealData), plus averages.]
• DNN improves fwSegSNR in most cases.
• DNN yields smaller SRMR improvements on real data – a generalization problem of the DNN.
Subjective measures

Amount of Reverberation Score (mean):
                              Simulated Room 2    RealData Room 1
                               Near      Far       Near      Far
 1ch  Unprocessed              41.5      31.0      28.9      21.5
      SS        Processed      52.6      42.7      37.8      38.6
                Improvement    11.1      11.7       8.9      17.2
      DNN       Processed      59.3      51.7      63.9      63.5
                Improvement    17.8      20.7      35.0      42.0
 8ch  Unprocessed              21.5      18.9      14.6      16.6
      DS+SS     Processed      47.4      42.1      42.2      30.7
                Improvement    25.9      23.2      27.6      14.1
      MVDR+DNN  Processed      83.3      50.1      50.2      29.4
                Improvement    61.8      31.2      35.6      12.9

Overall Quality Score (mean):
                              Simulated Room 2    RealData Room 1
                               Near      Far       Near      Far
 1ch  Unprocessed              36.7      46.3      51.9      42.9
      SS        Processed      47.9      47.4      45.6      50.2
                Improvement    11.2       1.1      -6.3       7.3
      DNN       Processed      19.6      16.6      16.7      16.4
                Improvement   -17.1     -29.7     -35.3     -26.5
 8ch  Unprocessed              37.0      33.8      30.6      25.3
      DS+SS     Processed      57.8      55.8      52.0      43.9
                Improvement    20.8      22.0      21.4      18.6
      MVDR+DNN  Processed      31.9      20.7      15.5       9.3
                Improvement    -5.1     -13.2     -15.1     -16.0

• MVDR+DNN generally removes more reverberation than DS+SS, but it also introduces more speech distortion and results in poorer quality.
• Possible reasons:
  – Frame-by-frame processing of the DNN.
  – The DNN minimizes the mean square error between the predicted and clean log spectra, which is not a perceptually meaningful error.
Speech Recognition Systems
• MVDR beamforming for 2ch and 8ch.
• Clean condition training scheme
  – Cross-transform feature adaptation
  – CMLLR (256 classes) model adaptation
  – HMM/GMM model (the challenge baseline settings)
• Multi condition training scheme
  – DNN-based feature compensation
  – DNN-based acoustic modeling
ASR – Multi-condition training – results
• DNN feature mapping (585-3x2048-39)
• DNN acoustic modeling (351-7x2048-3500, RBM pretraining + cross-entropy + sMBR)
[Bar chart: WER for 1ch and 8ch, with and without DNN feature compensation, over Near/Far conditions in simulated Rooms 1–3, real Room 1, and on average.]
• DNN feature compensation and the DNN acoustic model are complementary.
• Possible reasons:
  – DNN feature compensation uses a parallel corpus and a wider context.
  – It may be better to have two concatenated DNNs than one big DNN.
ASR – Clean-condition training
• Use cross-transform for feature compensation.
• Use CMLLR for model adaptation (challenge script).
• HMM/GMM system (challenge script).
Temporal filtering processes the feature trajectories; a linear transform processes the feature vectors. How about combining them?
ASR – Cross-transform
• Cross-transform is a generalization of both temporal filtering and the linear transform.
• To adapt the feature at a time-frequency location, both the feature vector and the feature trajectory that contain that location are used in the regression.
• The cross shape is necessary to reduce the number of free parameters.
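A minimal sketch of one possible cross-transform parameterisation, assumed rather than taken from the paper: each output value combines a per-frame linear transform (the fMLLR-like spectral arm of the cross) with a per-dimension temporal filter (the trajectory arm). In a real estimator the centre tap, which both arms cover, would be constrained to avoid double counting.

```python
import numpy as np

def cross_transform(features, W_spec, w_temp, bias):
    """Adapt features[t, d] using a cross-shaped regression:
       a linear transform of the frame at time t (spectral arm)
       plus a length-L filter over dimension d's trajectory (temporal arm).
    features: (T, D); W_spec: (D, D); w_temp: (D, L) with L odd; bias: (D,)."""
    T, D = features.shape
    L = w_temp.shape[1]
    half = L // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    out = features @ W_spec.T + bias          # spectral arm (per-frame transform)
    for d in range(D):
        # temporal arm: correlate dimension d's trajectory with its filter
        out[:, d] += np.convolve(padded[:, d], w_temp[d][::-1], mode="valid")
    return out
```

With `W_spec` an identity matrix and `w_temp` all zeros the transform is a no-op, which shows how the pure linear-transform and pure temporal-filter cases fall out as special cases (the slide's 33-frame window would correspond to L = 33 here).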
ASR – Clean-condition training – results
• Cross-transform (33-frame window, batch mode)
• CMLLR (256 classes, batch mode)
• HMM/GMM system (challenge scripts)
[Bar chart: WER for 1ch and 8ch with MVN, cross-transform, CMLLR, and cross-transform+CMLLR, over Near/Far conditions in simulated Rooms 1–3, real Room 1, and on average.]
• Cross-transform and CMLLR model adaptation are complementary.
• Reasons:
  – Cross-transform uses a longer context.
  – Multi-class CMLLR is more flexible: different transforms for different classes.
Summary
• Traditional beamforming works well for both speech enhancement and recognition.
• The DNN reduces reverberation significantly, but also introduces high distortion, especially in high-reverberation cases.
• Cross-transform adapts features using both long-term temporal information and spectral information; it is complementary to CMLLR.
• Future directions
  – Analyze why the DNN distorts the speech signal and propose a solution.
  – Apply cross-transform to adaptive training of the DNN-based acoustic model in the multi-condition training scheme.
Thank you!