The MITLL NIST LRE 2015 Language Recognition System*
Contributors in alphabetical order: Najim Dehak**, Elizabeth Godoy, Douglas Reynolds, Fred Richardson, Stephen Shum***, Elliot Singer, Doug Sturim, Pedro Torres-Carrasquillo
** Johns Hopkins University
*** Spoken Language System Group, MIT-CSAIL
* This work was sponsored by the Department of Defense under Air Force contract F19628-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Outline
• Systems
• Development Data
• Evaluation Results
• Observations
Odyssey 2016 PAT 2

LRE15 Systems - I
• Classic I-Vector systems
  – IVEC: cep + sdc features
  – PITCH1: cep + sdc + log_F0 + Δ log_F0 features
• ASR DNN / I-Vector systems
  – BNF1, BNF2: DNN bottleneck features
  – PITCH2: DNN bottleneck + log_F0 + Δ log_F0 features
  – STATS: DNN posteriors and cep + sdc features
• ASR DNN / GMM-MMI
  – MMI: GMM-MMI classifier using DNN bottleneck features
• Multilingual ASR DNN / I-Vector system (Open data task)
  – MLBNF: 5 Babel language DNN bottleneck features
All ivec systems scored with LDA+WCCN or WCCN.
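The sdc (shifted delta cepstra) features named above can be illustrated with a minimal sketch. The slides do not state the SDC configuration, so the defaults here (d=1, P=3, k=7, as in the commonly used 7-1-3-7 setup) and the edge handling (clamping to the first and last frame) are assumptions, and the function name `sdc` is hypothetical.

```python
def sdc(cep, d=1, p=3, k=7):
    """Shifted delta cepstra for a list of cepstral frames.

    cep: list of frames, each a list of N floats.
    Returns one output frame per input frame, each with N * k values:
    k delta blocks, block i being c(t + i*p + d) - c(t + i*p - d).
    Frame indices outside the utterance are clamped (an assumption).
    """
    n_frames = len(cep)

    def frame(t):
        # Clamp the index so edge frames reuse the first/last frame.
        return cep[min(max(t, 0), n_frames - 1)]

    out = []
    for t in range(n_frames):
        stacked = []
        for i in range(k):
            plus = frame(t + i * p + d)
            minus = frame(t + i * p - d)
            stacked.extend(a - b for a, b in zip(plus, minus))
        out.append(stacked)
    return out
```

In a cep + sdc front end, each SDC frame would then be concatenated with the static cepstra for the same frame.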

LRE15 Systems - II
• Unsupervised Unit Discovery DNN / I-Vector system
  – BAUD: DNN bottleneck features
• DNN Counts Subspace Multinomial Model systems
  – CNT1: Counts from ASR DNN layers
  – CNT2: Counts from LID DNN layers
  – CNT3: Joint subspace of CNT1 and CNT2 counts
• Calibration and Fusion
  – Multiclass calibration followed by linear fusion
  – Duration weighting on system scores
  – Per system calibration: MMI-trained Gaussian
  – Linear fusion optimized with logistic regression
All ivec systems scored with LDA+WCCN or WCCN.
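As an illustration of linear fusion optimized with logistic regression, here is a minimal sketch. It assumes one scalar weight per system, flat language priors, and plain batch gradient descent on the multiclass cross-entropy; the function names and hyperparameters are hypothetical, and this is not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(weights, system_lls):
    """Weighted sum across systems.

    system_lls[k][m] is the calibrated log-likelihood of language m
    from system k; the result has one fused value per language.
    """
    n_langs = len(system_lls[0])
    return [sum(w * lls[m] for w, lls in zip(weights, system_lls))
            for m in range(n_langs)]


def train_fusion(trials, labels, n_systems, lr=0.1, epochs=200):
    """Fit one weight per system by gradient descent on cross-entropy."""
    weights = [1.0 / n_systems] * n_systems
    for _ in range(epochs):
        grad = [0.0] * n_systems
        for system_lls, label in zip(trials, labels):
            post = softmax(fuse(weights, system_lls))
            for k in range(n_systems):
                # d(-log post[label]) / d w_k
                #   = sum_m post[m] * LL[k][m] - LL[k][label]
                expected = sum(p * ll for p, ll in zip(post, system_lls[k]))
                grad[k] += expected - system_lls[k][label]
        weights = [w - lr * g / len(trials) for w, g in zip(weights, grad)]
    return weights
```

With this objective, a system whose scores carry no information about the true language tends to have its weight pushed toward zero, while informative systems gain weight.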

Fixed Development Data Preparation
• Randomly divided the development data by file count
  – 60% train
  – 40% test
• Augmented both train and test sets with variable duration segmentation (uniform distribution between 3-30 secs)
  – Allowed for duration calibration in test
  – Found that duration augmentation of train data improved performance
  – Other forms of augmentation (warping pitch, spectrum, speed) did not show any appreciable gains
• For submissions, calibration and fusion were trained using scores from train+test sets
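The split and the variable-duration segmentation described above can be sketched as follows. The slides give only the 60/40 ratio and the uniform 3-30 second range; cutting each recording into consecutive segments and dropping a leftover shorter than 3 seconds are assumptions, and both function names are hypothetical.

```python
import random


def split_files(files, train_fraction=0.6, seed=0):
    """Randomly split a list of file IDs by file count."""
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def segment_durations(total_secs, min_secs=3.0, max_secs=30.0, rng=None):
    """Cut one recording into consecutive (start, end) segments.

    Segment lengths are drawn uniformly from [min_secs, max_secs].
    A final remainder shorter than min_secs is dropped (an assumption).
    """
    rng = rng or random.Random(0)
    segments = []
    start = 0.0
    while total_secs - start >= min_secs:
        length = rng.uniform(min_secs, max_secs)
        end = min(start + length, total_secs)
        segments.append((start, end))
        start = end
    return segments
```

Splitting by file before segmenting keeps all segments of one recording on the same side of the train/test boundary.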

Open Development Data Preparation
• Found other data sources for all languages
  – LRE07, 09, 11, OHSU, OGI-22, Fisher, Callfriend, Babel, Ahumada, MI5-UK, Appen, Qatar-Dialect, Kalaka
  – Types of speech: CTS, BNBS, BWBS
  – All data audited
• Extra data used for language model training
  – Used fixed data test set for performance estimation
• The multi-lingual DNN was the only system to explicitly rely on using extra data
• During development found that using all the extra data hurt performance
  – Only 3 of the languages contributed to improved performance (Brazilian Portuguese, British English, and Arabic MSA)
CTS = Conversational Telephone Speech; BNBS = Broadcast Narrow Band Speech; BWBS = Broadcast Wide Band Speech

Development Results: Primary Systems
[Bar chart: cost (axis 0 to 0.06) for the Fixed Primary and Open Primary systems, by cluster: arabic, chinese, english, french, iberian, slavic, and average.]

Fixed Primary Component Breakout
• Primary not far from oracle fusion
• Unsupervised BAUD does almost as well as single best ASR DNN BNF1
[Bar chart: cost (axis 0 to 0.3), shown as Average and Sans Français, for BAUD, CNT1, BNF1, PITCH1, STATS, PRIMARY, and Oracle, with a "Best Single System" callout. Values labeled on the chart: 0.176, 0.173, 0.093, 0.089.]

Fixed Primary Per-Cluster Breakout
• We have analysis showing that BNBS vs. CTS is a major effect in French cluster
• Arabic and Iberian clusters have the highest costs after French
  – Language / source?*
[Bar chart: cost (axis 0 to 0.25) per cluster: arabic, chinese, english, french, iberian, slavic, average, avg_nofr.]
* MSA and Portuguese are least confusable languages in their clusters (both dominated by BNBS)

French Cluster Analysis
• Type (BNBS vs. CTS) appears to be a large factor in dev/eval mismatch
[Figure: French cluster data with groups labeled WAF and HAITIAN, each split by type (CTS, BNBS).]

Slavic Cluster Analysis
• Type (BNBS vs. CTS) is a factor but does not affect language separation
[Figure: Slavic cluster data with groups labeled BNBS and CTS; RUSSIAN is the only language label that survives.]

Open Primary Component Breakout
• Minor improvement using extra data
• Multilingual BNF has slight gain over BNF1
[Bar chart: cost (axis 0 to 0.3), shown as Average and Sans Français, for CNT1, BNF1, PITCH1, MLBNF, STATS, PRIMARY, and Oracle, with a "Best Single System" callout. Values labeled on the chart: 0.169, 0.167, 0.086, 0.084.]

Open Task Adding Data to Arabic
• Looked at effect of adding extra data to Arabic languages
• Bottom line: extra data provided little gain or hurt performance on eval
• Post-eval?
Source   Languages                           Audit             Files   Speech (hrs)
Appen    Iraqi, Levantine                    Appen              2012   121.90
Fisher   Levantine                           LDC                1572   120.69
LRE11    Iraqi, Levantine, Maghrebi, MSA     LDC                2727    29.89
Qatar    Egyptian, Levantine, Maghrebi, MSA  Mechanical Turk   20056   122.91
System            Cost
Baseline          0.2292
Baseline+Appen    0.2235
Baseline+Fisher   0.2255
Baseline+LRE11    0.2155
Baseline+Qatar    0.2604

Post-eval Experiments Highlights
• Additional data
  – After revisiting open-set submission, training with all data available would have reduced “French” cluster error
• Multilingual
  – Work in progress but reductions observed for some configurations that include a more diverse set of languages

Post-eval Experiments Highlights
• Spanish errors
  – 50 samples chosen randomly
  – Main issues present in these errors
    • Cuban females (10)
    • Little speech content (5-7)
• English errors
  – 50 samples chosen randomly
  – Main issues present in these errors
    • 80% of errors do not involve Indian English
    • 5 files with little or no speech content

Observations
• DNN bottleneck features used in an i-vector system continue to be the best single system
• Fusion with count (phonotactic) systems provides moderate gains
• Possible factors affecting performance this year
  – Language confusability (amplified by short durations)
  – Source mismatch (BNBS vs. CTS)
• Adding more data did not solve the problem… on dev set
• Path forward
  – Need to better focus on robustness over wider conditions vs. incremental improvements over narrow conditions

Fixed Development Data
CODE      LANGUAGE        # Cuts   Speech (hrs)
ara-acm   Iraqi             2206    75.59
ara-apc   Levantine         4073   266.67
ara-arb   MSA                912     8.18
ara-ary   Maghrebi           919    46.91
ara-arz   Egyptian           440    97.27
eng-gbr   British Eng.       147     2.10
eng-sas   Indian Eng.       1689    25.37
eng-usg   Amer. Eng.        2448   165.92
fre-hat   Haitian Cr.       2192   110.79
fre-waf   West Afr. Fr.     1229     7.02
por-brz   Braz. Port.       1838     5.96
qsl-pol   Polish             695    32.14
qsl-rus   Russian           2021    37.80
spa-car   Carib. Spa.        194    30.59
spa-eur   Eur. Spa.          366     8.55
spa-lac   Lat. Am. Spa.      160    15.30
zho-cdo   Min                209     6.46
zho-cmn   Mandarin          4131   200.70
zho-wuu   Wu                 234    10.36
zho-yue   Cantonese         2382   123.61

Open Development Data Preparation
LANGUAGE                Sources                            Type        Cuts
Arabic.egyptian         None
Arabic.iraqi            LRE11, Appen                       CTS         1788
Arabic.levantine        LRE11, Fisher, Appen               CTS         3623
Arabic.maghrebi         LRE11                              BNBS         505
Arabic.msa              LRE11                              BNBS         506
Chinese.cantonese       LRE09, Babel                       CTS, BNBS   2359
Chinese.mandarin        LRE05-07-09-11, Callfriend, OHSU   CTS, BNBS   3693
Chinese.minnan          LRE07-09                           CTS          168
Chinese.wu              LRE07-09                           CTS          189
Spanish.caribbean       LRE07                              CTS           74
Spanish.european        Ahumada                            CTS          328
Spanish.latinamerican   OHSU (Mexican)                     CTS          130
Portuguese.brazilian    LRE09, OGI-22, VOA scrape          CTS, BNBS   1791
English.american        LRE05-07-09-11, Callfriend, OHSU   CTS         2088
English.indian          LRE07-09-11, OHSU, OGI-22          CTS         1271
English.british         UK-MI5 SID                         CTS          148
Polish                  LRE11                              CTS, BNBS    208
Russian                 LRE07-09-11, Callfriend            CTS, BNBS   1551
West African French     LRE09, VOA scrape                  BNBS        1195
Haitian Creole          Babel, VOA scrape                  CTS, BNBS   1869
Note on slide: Missing Qatar and Kalaka

Calibration and Fusion Backend
[Block diagram with stages labeled Duration Norm, System Calibration, and System Fusion. For each system k = 1..K, the detector scores s_k,1 … s_k,M pass through a duration scale (a function of the number of frames N; the slide's symbols are d, k, a, and N) and an MMI Gaussian, giving log-likelihoods LL_k,1 … LL_k,M. These are weighted by w_k and summed across systems to give LL_1 … LL_M, which are combined with the priors by Bayes' rule to give P_1 … P_M.]
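Reading the backend slide above as a weighted sum followed by Bayes' rule, the last two stages can be written as below. This is a sketch of the standard form rather than an expression taken from the slide; the symbol \pi_m for the prior of language m is an assumption, and the duration scale and Gaussian calibration stages are not reproduced.

```latex
\begin{align}
  LL_m &= \sum_{k=1}^{K} w_k \, LL_{k,m}, \qquad m = 1, \dots, M \\
  P_m  &= \frac{\pi_m \exp(LL_m)}{\sum_{j=1}^{M} \pi_j \exp(LL_j)}
\end{align}
```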
