
Evaluation of an LSTM-RNN System in Different NIST Language Recognition Frameworks
Ruben Zazo, Alicia Lozano-Diez and Joaquin Gonzalez-Rodriguez
{ruben.zazo, alicia.lozano}@uam.es
ATVS – Biometric Recognition Group, Universidad Autónoma de Madrid
Odyssey 2016

Outline
1. Motivation
2. Long Short-Term Memory Recurrent Neural Network (LSTM)
3. System Description
4. Reference i-Vector System
5. Datasets
6. Results (LRE09, LRE15)
7. Conclusions
Ruben Zazo. Odyssey 2016.

Motivation
Language Identification: the process of automatically identifying the language of a given spoken utterance.
- Most state-of-the-art systems rely on acoustic modeling:
  - i-Vector extraction + classification stage
- Deep Neural Networks seem to outperform i-Vector based approaches when enough training data is available:
  - End-to-end
  - Bottleneck
  - Senones

Motivation: DNNs
- Deep Neural Network:
  - Input: frame + context
  - K hidden layers (sigmoid, ReLU)
  - Output layer: softmax
  - Relies on stacking several acoustic frames in order to model time context
Can we model context in a better way?
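To make the frame-stacking idea concrete, here is a minimal sketch of how a feed-forward DNN input is typically built from a frame plus its neighbours. The function name `stack_frames`, the window sizes and the edge padding (repeating the first and last frame) are illustrative assumptions, not details taken from the slides.

```python
def stack_frames(frames, left, right):
    """Concatenate each frame with `left` past and `right` future frames.

    Assumption: at utterance edges the first/last frame is repeated.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            k = min(max(k, 0), n - 1)  # clamp the index at the edges
            window.extend(frames[k])
        stacked.append(window)
    return stacked


# Toy utterance: 5 frames of 2 coefficients each.
frames = [[float(t), float(t) + 0.5] for t in range(5)]
stacked = stack_frames(frames, left=2, right=2)
print(len(stacked), len(stacked[0]))
```

The input dimension grows linearly with the context window, which is the cost that a recurrent model avoids by carrying context in its state.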

Motivation: RNNs
- Recurrent Neural Networks:
  - Input: same
  - K hidden layers with recurrent connections
  - Output layer: softmax
  - Can model temporal context and learn from previous input -> good model for sequences
Good theoretical model. In practice: vanishing gradient problem.

Motivation: LSTMs
- LSTM-RNNs:
  - Every hidden node is replaced with an LSTM block

Long Short-Term Memory Recurrent Neural Network
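The diagram for this slide did not survive extraction. As a stand-in, the sketch below shows one time step of a single-unit LSTM block with peephole connections, following the commonly used textbook formulation. It is an assumption that the presented system matches this exact variant; the parameter names (`w_ix`, `w_ic`, ...) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, m_prev, c_prev, p):
    """One step of a scalar peephole LSTM; returns (output m, cell state c)."""
    # Input and forget gates peek at the previous cell state.
    i = sigmoid(p["w_ix"] * x + p["w_im"] * m_prev + p["w_ic"] * c_prev + p["b_i"])
    f = sigmoid(p["w_fx"] * x + p["w_fm"] * m_prev + p["w_fc"] * c_prev + p["b_f"])
    # Candidate cell input.
    g = math.tanh(p["w_cx"] * x + p["w_cm"] * m_prev + p["b_c"])
    c = f * c_prev + i * g
    # Output gate peeks at the updated cell state.
    o = sigmoid(p["w_ox"] * x + p["w_om"] * m_prev + p["w_oc"] * c + p["b_o"])
    m = o * math.tanh(c)
    return m, c


names = ["w_ix", "w_im", "w_ic", "b_i", "w_fx", "w_fm", "w_fc", "b_f",
         "w_cx", "w_cm", "b_c", "w_ox", "w_om", "w_oc", "b_o"]
zero_params = {name: 0.0 for name in names}

# With all-zero weights every gate is 0.5, so the cell state is halved.
m, c = lstm_step(x=1.0, m_prev=0.0, c_prev=1.0, p=zero_params)
print(m, c)
```

The additive update `c = f * c_prev + i * g` is what lets gradients flow over many time steps, which addresses the vanishing gradient problem mentioned for plain RNNs.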

System Description
- Input layer: MFCC Shifted Delta Coefficients
- No stacking of acoustic frames
- One or two hidden layers
- Unidirectional LSTM layers with peepholes
- Output layer: softmax (one unit per target language)
- Cross-entropy error function
- Different training subset per iteration: random chunks of 2 seconds -> 6 hours of audio per language
- Last 10% of output scores averaged to obtain the final score
- Multiclass linear logistic regression calibration is applied to the output of every system (FoCal)
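The score-averaging step in the list above can be sketched as follows. This is a minimal illustration, assuming the network emits one softmax vector per frame and that "last 10%" means the final tenth of the frames, rounded to the nearest integer and never fewer than one; the function name `utterance_scores` is hypothetical.

```python
def utterance_scores(frame_scores, keep_fraction=0.10):
    """Average the last `keep_fraction` of per-frame scores, per language.

    Assumes `frame_scores` is a non-empty list of equal-length score vectors.
    """
    n_frames = len(frame_scores)
    n_keep = max(1, round(n_frames * keep_fraction))
    tail = frame_scores[-n_keep:]
    n_langs = len(tail[0])
    return [sum(frame[j] for frame in tail) / n_keep for j in range(n_langs)]


# Toy example: 20 frames, 2 target languages; only the last 2 frames count.
frame_scores = [[0.5, 0.5]] * 18 + [[0.2, 0.8], [0.4, 0.6]]
final = utterance_scores(frame_scores)
print(final)
```

Keeping only the final frames relies on the recurrent state having accumulated most of the utterance by then, so the early, less informed outputs are ignored.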

Reference System
- Input features:
  - MFCC-SDCs, configuration 7-1-3-7
  - Each frame represented by a 56-dimensional vector
  - Same features for the proposed systems
- UBM: 1024 Gaussian components
- Total Variability space from Baum-Welch statistics:
  - 400 dimensions
- Cosine-based scoring
- Implemented in Kaldi
- Same calibration technique (FoCal multiclass)
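The 56-dimensional figure follows from the 7-1-3-7 configuration (N-d-P-k): 7 static coefficients plus k = 7 blocks of 7 delta coefficients. The sketch below computes shifted delta coefficients in the usual way, as c(t + iP + d) - c(t + iP - d) for i = 0..k-1. Appending the static coefficients and clamping indices at the utterance edges are assumptions made for this illustration; the function name `sdc` is hypothetical.

```python
def sdc(cepstra, d=1, p=3, k=7):
    """Append k shifted-delta blocks to each frame of N static coefficients."""
    n_frames = len(cepstra)

    def frame(t):
        # Assumption: repeat the first/last frame beyond the utterance edges.
        return cepstra[min(max(t, 0), n_frames - 1)]

    features = []
    for t in range(n_frames):
        vec = list(cepstra[t])  # N static coefficients
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        features.append(vec)
    return features


# Toy input: 30 frames of N = 7 coefficients that grow linearly with time.
cepstra = [[float(t)] * 7 for t in range(30)]
feats = sdc(cepstra)
print(len(feats), len(feats[0]))
```

Because each feature vector already spans roughly k * P frames of context, SDCs give even a frame-level model some view of longer-term dynamics.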

Datasets
- Balanced subset of NIST 2009 LRE, 3 s task:
  - VOA only, to avoid an unbalanced mix of CTS and VOA
  - Languages with 200 or more hours available
  - 8 representative languages: US English, Spanish, Dari, French, Pashto, Russian, Urdu and Mandarin Chinese
- Dev set of NIST LRE 2015:
  - Mix of CTS and broadcast narrow-band speech
  - 20 languages grouped in 6 clusters according to similarity
  - Amount of training data ranges from 0.5 h to more than 100 h
  - 15% of the data, split into segments of 3, 10 and 30 s, used as test
- Test set of NIST LRE 2015:
  - Broad range of speech durations

Results: Discarding Initial Frame Scores
[Figure: Performance (EER, %) versus percentage of frame outputs discarded]

Results: Balanced Subset of LRE09 (I)
- Balanced subset of NIST 2009 LRE, VOA only, 8 languages, 1600 h total training data
- 4 out of 5 systems outperform the reference i-Vector system, by up to 15% in terms of Cavg
- Proposed architectures have 5 to 21 times fewer parameters
- Fusion of i-Vector and LSTM gives the best performance
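As a rough illustration of model size, the sketch below counts the parameters of a stack of peephole LSTM layers followed by a softmax layer. It assumes the standard LSTM layout (four weight blocks per layer, three peephole vectors, no projection layer), a 56-dimensional input and 8 output languages. The actual counts behind the comparison above are not given in the slides and may differ.

```python
def lstm_layer_params(n_in, n_hidden, peepholes=True):
    """Parameters of one LSTM layer: 4 gate/cell blocks plus optional peepholes."""
    per_block = n_in * n_hidden + n_hidden * n_hidden + n_hidden  # W_x, W_m, bias
    total = 4 * per_block
    if peepholes:
        total += 3 * n_hidden  # one diagonal peephole vector per gate
    return total


def network_params(n_in, hidden_sizes, n_out):
    """Parameters of stacked LSTM layers followed by a softmax output layer."""
    total = 0
    prev = n_in
    for n_hidden in hidden_sizes:
        total += lstm_layer_params(prev, n_hidden)
        prev = n_hidden
    return total + prev * n_out + n_out


# Hypothetical configuration: 56-dim SDC input, 2 x 512 LSTM units, 8 languages.
total = network_params(56, [512, 512], 8)
print(total)
```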

Results: Balanced Subset of LRE09 (II)
- Balanced subset of NIST 2009 LRE, VOA only, 8 languages, 1600 h total training data

Results: Dev set of LRE15 (I)
- Dev set of LRE15: 6 clusters, database mismatch, unbalanced sets
- One LSTM per cluster (no inter-cluster trials)
- Same architecture as the best result on LRE09: 2 hidden layers of size 512

Cavg x 100
Duration  System    Ara    Eng    Fren  Iber   Slav   Chin   Avg
3s        LSTM      13.79  18.88  2.70  17.11  15.01  10.11  12.93
3s        i-vector  15.59  13.91  5.68  19.96  19.71  22.06  16.15
3s        Fusion    11.50  12.48  2.86  13.28  13.71   9.75  10.60
30s       LSTM       8.59  18.76  1.04  14.73   8.68   9.95  10.29
30s       i-vector   3.08   1.99  0     12.78   4.23   4.93   4.50
30s       Fusion     3.06   3.87  0      9.84   3.31   4.60   4.11

- LSTM system performs better than the i-Vector system when facing short durations
- Fusion of i-Vector and LSTM gives the best and most robust performance
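The averages and the relative gain on the 3 s task can be recomputed directly from the table. The sketch below uses only the per-cluster values listed above; the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline, system):
    """Relative reduction of a cost (lower is better), in percent."""
    return 100.0 * (baseline - system) / baseline


# Cavg x 100 on the 3 s task, per cluster: Ara, Eng, Fren, Iber, Slav, Chin.
cavg_3s = {
    "LSTM": [13.79, 18.88, 2.70, 17.11, 15.01, 10.11],
    "i-vector": [15.59, 13.91, 5.68, 19.96, 19.71, 22.06],
    "Fusion": [11.50, 12.48, 2.86, 13.28, 13.71, 9.75],
}
averages = {name: sum(vals) / len(vals) for name, vals in cavg_3s.items()}

lstm_gain = relative_improvement(averages["i-vector"], averages["LSTM"])
fusion_gain = relative_improvement(averages["i-vector"], averages["Fusion"])
print(round(lstm_gain, 1), round(fusion_gain, 1))
```

On these figures the LSTM gain over the i-vector baseline comes out at roughly 20% relative, and the fusion gain at roughly a third.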

Results: Dev set of LRE15 (II)
- Dev set of LRE15: 6 clusters, database mismatch, unbalanced sets
- Results on the 3 s task
- LSTM has over 20% relative improvement over the reference i-Vector system
- Fusion is better and more robust than single systems
[Figure: Cavg per language cluster (Arabic, English, French, Iberic, Slavic, Chinese, Average) for the LSTM, i-vector and Fusion systems]

Results: Test set of LRE15
- Test set of LRE15: similar to the dev set of LRE15 but with continuous durations and a big mismatch between training and testing data
- LSTM system degrades faster in mismatched scenarios
- i-Vector handles long utterances better
- Fusion is worse than single systems (mismatch)
- 2-fold fusion shows that the systems are learning complementary information
[Figure: Cavg versus duration in seconds (3, 5, 10, 15, 20, 25, 30, All) for the LSTM, i-vector, Fusion and Fusion CV systems]

Conclusions
- Controlled/balanced scenario (e.g., LRE09):
  - 85% fewer parameters
  - Over 15% relative improvement
- Highly unbalanced scenario (e.g., LRE15):
  - Comparable results
  - Complementary information; robust fusion
- Strong dependence on mismatch: need for variability compensation
