

  1. BAT System Description for NIST LRE 2015 BUT+Agnitio+Torino Oldrich Plchot, Pavel Matejka, Radek Fer, Ondrej Glembek,Ondrej Novotny, Jan Pesan, Lukas Burget, Martin Karafiat, Karel Vesely, Lucas Ondel, Santosh Kesiraju, Frantisek Grezl, Sri Harish Mallidi (JHU), Ruizhi Li (JHU), Niko Brummer, Albert Swart, Sandro Cumani June 22, Bilbao, Odyssey 2016

  2. Data
  ● Fixed training condition
    ○ Train - 60% of the training data; short cuts generated evenly from 3 to 30 seconds
    ○ Dev - 40%; short cuts with lengths uniformly distributed between 3 and 30 seconds
  ● Open training condition
    ○ All relevant data we managed to find ;) (no Babel data for i-vectors, just for BN features)
    ○ Main additions are KALAKA-3 (European Spanish, British English) and Arabic - Al Jazeera free corpus
  ● Details in our system description / Odyssey paper
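The cut generation above can be sketched as follows. This is a minimal sketch: how cut boundaries are actually drawn from each recording is an assumption, not the exact recipe used.

```python
import random

def make_cuts(duration, rng, lo=3.0, hi=30.0):
    """Chop one recording into consecutive cuts whose lengths are drawn
    uniformly from [lo, hi] seconds; a final remainder is kept only if
    it is at least lo seconds long (an assumed edge-handling choice)."""
    cuts, t = [], 0.0
    while duration - t >= hi:
        d = rng.uniform(lo, hi)
        cuts.append((t, t + d))
        t += d
    if duration - t >= lo:
        cuts.append((t, duration))
    return cuts

rng = random.Random(0)
cuts = make_cuts(300.0, rng)
# every cut lies in the 3-30 s range required by the recipe
assert all(3.0 <= b - a <= 30.0 for a, b in cuts)
```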

  3. Stacked Bottleneck features (SBN)
  ● Based on a hierarchy of two NNs: bottlenecks from the first network are stacked in time and used as inputs to the second NN.
  ● Bottlenecks from the second NN are the final features.
  ● Fixed condition training data
    ○ Switchboard with ~7k triphone state targets
    ○ LRE15 training data with labels obtained using an acoustic unit discovery tool (200 3-state units)
  ● Open condition training data
    ○ 17 languages from the Babel project (IARPA) as multilingual BN, with ~100 phone states per language
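The stacking-in-time step above can be sketched as follows. A minimal sketch: the offset set and the clamping at utterance edges are assumptions for illustration, not the exact SBN configuration.

```python
def stack_in_time(frames, offsets=(-10, -5, 0, 5, 10)):
    """Stack first-NN bottleneck frames at the given time offsets to
    form the input for the second NN.

    frames: list of equal-length feature vectors, one per frame.
    Out-of-range offsets are clamped to the first/last frame (an
    assumed edge-handling choice).
    """
    T = len(frames)
    stacked = []
    for t in range(T):
        ctx = []
        for o in offsets:
            ctx.extend(frames[min(max(t + o, 0), T - 1)])
        stacked.append(ctx)
    return stacked

# 100 frames of 80-dim bottlenecks -> 100 frames of 5 * 80 = 400 dims
feats = [[0.0] * 80 for _ in range(100)]
out = stack_in_time(feats)
assert len(out) == 100 and len(out[0]) == 400
```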

  4. General system overview
  ● i-vector based systems:
    ○ Features:
      ■ DNN bottlenecks trained on
        ● Switchboard English (Fixed cond.)
        ● Babel data – multilingual bottleneck features (Open cond.)
      ■ MFCC-SDC + PLLR (phone LLH ratios)
    ○ 2048 Full or Diagonal GMM/UBM, 600-dimensional i-vectors
    ○ Gaussian Linear Classifier (GLC) seems sufficient
      ■ Including i-vector uncertainty in scoring helps
  ● Frame-level Sequence Summarizing NN (SSNN)
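The GLC scoring mentioned above can be sketched as follows. A minimal sketch assuming a shared diagonal covariance on toy 2-dim "i-vectors"; the real systems use full covariances, 600-dim i-vectors, and optionally i-vector uncertainty, none of which is shown here.

```python
import math

def glc_scores(x, means, var):
    """Gaussian Linear Classifier: one mean per language, one shared
    (here diagonal) covariance.  Because the covariance is shared, the
    quadratic term in x is common to all classes, so class
    log-likelihoods differ only by a linear function of x."""
    scores = {}
    for lang, mu in means.items():
        ll = 0.0
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * ((xi - mi) ** 2 / vi + math.log(2 * math.pi * vi))
        scores[lang] = ll
    return scores

# hypothetical per-language means and shared variance
means = {"spa": [1.0, 0.0], "eng": [-1.0, 0.0]}
var = [1.0, 1.0]
s = glc_scores([0.9, 0.1], means, var)
assert s["spa"] > s["eng"]  # test vector sits near the "spa" mean
```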

  5. Fusion with Prior-weighted Logistic Regression
  ● Fusion is trained on dev data in the score domain
  ● One weight per system and one bias per language
  ● Cluster prior: for the data of each cluster, we used a cluster-specific prior, with zero probabilities for out-of-cluster languages and equal weights within the cluster
  ● Alternative system to allow between-cluster analysis: uniform (flat) prior over all languages
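The fusion recipe above (one weight per system, one bias per language, then a language prior) can be sketched as follows. The weights, biases, and the three-language cluster are hypothetical toy values, not the trained parameters.

```python
import math

def fuse(system_scores, weights, biases):
    """Score-level fusion: one weight per system, one bias per language."""
    return {lang: sum(w * s[lang] for w, s in zip(weights, system_scores))
                  + biases[lang]
            for lang in biases}

def posterior(fused, prior):
    """Turn fused scores into posteriors under a given prior; a zero
    prior (out-of-cluster language) yields zero posterior."""
    post = {l: math.exp(fused[l]) * prior[l] for l in fused}
    z = sum(post.values())
    return {l: p / z for l, p in post.items()}

# two toy systems over three languages (hypothetical scores)
sysA = {"fra": 2.0, "hat": 0.5, "eng": 1.0}
sysB = {"fra": 1.5, "hat": 0.2, "eng": 2.5}
fused = fuse([sysA, sysB], [0.8, 0.6], {"fra": 0.0, "hat": 0.1, "eng": -0.2})

# cluster-specific prior: equal within the cluster, zero outside it
cluster_prior = {"fra": 0.5, "hat": 0.5, "eng": 0.0}
p = posterior(fused, cluster_prior)
assert p["eng"] == 0.0 and abs(sum(p.values()) - 1.0) < 1e-9
```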

  6. Fixed Training Condition (DEV results; EVAL columns are filled in on the next slide)

  Fusion        DEV cavg*
  Primary       1.9
  Alternate1    1.24

  Single systems                  classf    DEV cavg*
  SBN80-SWB1-KALDI--CD            GLC COV   2.41
  SBN80-SWB1--CD                  NN        2.80
  SDC-PLLR--CD                    GLC       4.72
  SBN80-AUTO600-KALDI--CD         GLC COV   5.46
  SSNN / Alternate 2              NN        10.46
  SBN80-SWB1-KALDI--CD / Alt3     GLC       2.31

  7. Fixed Training Condition

  Fusion        DEV cavg*   EVL cavg / cavg*
  Primary       1.9         18.1 / 13.5
  Alternate1    1.24        19.4 / 13.4

  Single systems                  classf    DEV cavg*   EVL cavg
  SBN80-SWB1-KALDI--CD            GLC COV   2.41        16.9
  SBN80-SWB1--CD                  NN        2.80        19.9
  SDC-PLLR--CD                    GLC       4.72        22.0
  SBN80-AUTO600-KALDI--CD         GLC COV   5.46        27.0
  SSNN / Alternate 2              NN        10.46       35.0
  SBN80-SWB1-KALDI--CD / Alt3     GLC       2.31        18.48

  ● Eval: the single best system outperforms the Primary fusion
  ● Calibration
    ○ Almost no calibration loss on Dev
    ○ Fairly large calibration loss on Eval

  8. Cluster-dependent i-vectors
  ● Score is the average of scores from 6 systems, where the UBM of each system is trained only on data from a given cluster

  Fixed Training Condition      DEV cavg*   EVAL cavg   EVAL cavg*
  SBN80-SWB1-KALDI              2.9         20.1        16.2
  SBN80-SWB1-KALDI-CD           2.5         19.7        15.4
  SBN80-SWB1-KALDI-CD diag      2.3         18.5        14.9
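The score averaging described above can be sketched as follows (two toy systems instead of the six cluster-dependent ones, with hypothetical scores):

```python
def average_cluster_scores(per_system_scores):
    """Average per-language scores over cluster-dependent systems
    (one UBM per language cluster; 6 systems in the submitted setup)."""
    n = len(per_system_scores)
    return {lang: sum(s[lang] for s in per_system_scores) / n
            for lang in per_system_scores[0]}

avg = average_cluster_scores([{"fra": 1.0, "eng": 0.0},
                              {"fra": 3.0, "eng": 1.0}])
assert avg == {"fra": 2.0, "eng": 0.5}
```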

  9. Sequence Summarizing NN

  10. Open Training Condition (DEV results; EVAL columns are filled in on the next slide)

  Fusion        DEV cavg*
  Primary       7.14
  Alternate1    7.15

  Single systems                  classf    DEV cavg*
  SSNN                            NN        30.0
  ML-17-SBN-CD                    GLC       8.8
  MultilangRDT                    GLC       10.4
  SBN80-SWB1-KALDI--CD            GLC       10.4
  SDC-PLLR-CD                     NN        12.7
  SBN80-AUTO600-KALDI             NN        15.6
  ML-17-SBN (trained on Open)     GLC COV   8.9

  11. Open Training Condition

  Fusion        DEV cavg*   EVL cavg / cavg*
  Primary       7.14        14.1 / 10.3
  Alternate1    7.15        14.1 / 10.4

  Single systems                  classf    DEV cavg*   EVL cavg
  SSNN                            NN        30.0        41.3
  ML-17-SBN-CD                    GLC       8.8         13.9
  MultilangRDT                    GLC       10.4        13.6
  SBN80-SWB1-KALDI--CD            GLC       10.4        17.6
  SDC-PLLR-CD                     NN        12.7        21.4
  SBN80-AUTO600-KALDI             NN        15.6        25.0
  ML-17-SBN (trained on Open)     GLC COV   8.9         12.0

  ● The single best system trained fully on the Open condition beats the fusion

  12. Analysis of training data
  - Analysis of using different training data for the UBM/i-vector extractor and the classifier
  - Important to train both the i-vector extractor and the classifier on the Open dataset
  - (Results shown as a chart of UBM/IVEC_Classifier combinations; legend: F = Fixed training data, O = Open training data; the submitted combination is marked)

  13. Comparison of different features
  - Fixed Training Condition
  - All systems: 2048G FullCov UBM, 600-dim i-vectors and Gaussian classifier
  - (Results shown as a chart; systems marked * violate the fixed data condition - post-eval analysis only)

  14. French cluster disaster
  ● Radio vs. telephone in DEV - most probably overtrained to the channel
  ● Channel takes over on the EVAL data
  ● Calibration on eval data cannot fix a wrong classifier

  15. Comparison of different i-vector classifiers
  - Different classifiers perform similarly:
    - Gaussian Linear Classifier (GLC)
    - Language Dependent I-vector (LDI)
    - Multiclass Multivariate Fully Bayesian Gaussian Classifier (MMFBG)
    - Neural Network
    - Logistic Regression

  16. Automatically derived acoustic units for BN training
  - Variational Bayes trained Dirichlet Process mixture of HMMs
  - Open loop over a potentially infinite number of phone-like units
  - 3-state HMMs, 2 Gaussians per state
  - 2048G FullCov UBM, 600-dim i-vectors and GLC + cuts

  Fixed data condition: Features    DEV cavg*   EVAL cavg / cavg*
  MFCC-SDC                          6.3         23.8 / 21.5
  SBN80-AUTO600-KALDI               5.4         28.9 / 24.2
  SBN80-SWB1-KALDI                  2.9         20.1 / 16.2

  - We can beat the SDC baseline on DEV even without transcriptions
  - A conventional bottleneck trained on (probably) any data is still better

  17. Conclusion - lessons learned
  ● The state-of-the-art system is an i-vector system with Bottleneck features
  ● GLC with uncertainty performs similarly to GLC trained on a lot of small cuts
  ● Phonotactic systems do not contribute to the final fusion
  ● Data engineering is always important
  ● Frame-level NN approaches
    ○ prone to overtraining
    ○ better to use the NN as a source of counts which are modelled by another classifier
  ● Other systems
    ○ Denoising/dereverberation with NN - helps on EVL but not on DEV
    ○ Phonotactic systems - with a Switchboard phoneme recognizer
    ○ Frame-level DNN

  18. THANK YOU
