The MITLL NIST LRE 2015 Language Recognition System*
Contributors in alphabetical order:
Najim Dehak**, Elizabeth Godoy, Douglas Reynolds, Fred Richardson, Stephen Shum***, Elliot Singer, Doug Sturim, Pedro Torres-Carrasquillo
** Johns Hopkins University
*** Spoken Language Systems Group, MIT-CSAIL
* This work was sponsored by the Department of Defense under Air Force contract F19628-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
Outline
• Systems
• Development Data
• Evaluation Results
• Observations
Odyssey 2016 PAT
LRE15 Systems - I
• Classic I-Vector systems
  – IVEC: cep + sdc features
  – PITCH1: cep + sdc + log_F0 + Δ log_F0 features
• ASR DNN / I-Vector systems
  – BNF1, BNF2: DNN bottleneck features
  – PITCH2: DNN bottleneck + log_F0 + Δ log_F0 features
  – STATS: DNN posteriors and cep + sdc features
• ASR DNN / GMM-MMI
  – MMI: GMM-MMI classifier using DNN bottleneck features
• Multilingual ASR DNN / I-Vector system (Open data task)
  – MLBNF: 5 Babel language DNN bottleneck features
All ivec systems scored with LDA+WCCN or WCCN.
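The WCCN step named in the scoring note can be sketched as follows. This is a minimal illustration, not the MITLL implementation: the slides do not say how the normalized i-vectors are scored, so cosine scoring against per-language mean i-vectors is an assumption here, and the function names are hypothetical. The LDA stage is omitted.

```python
import numpy as np

def within_class_cov(ivectors, labels):
    """Average of the per-class covariance matrices (the WCCN matrix W)."""
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        x = ivectors[labels == c]
        centered = x - x.mean(axis=0)
        W += centered.T @ centered / len(x)
    return W / len(classes)

def wccn_transform(ivectors, labels):
    """Return B with B @ B.T == inv(W); i-vectors are projected as B.T @ w."""
    W = within_class_cov(ivectors, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def cosine_scores(ivector, class_means, B):
    """Cosine similarity between a projected i-vector and projected class means.

    Cosine scoring is an assumption; the slides only name LDA+WCCN or WCCN.
    """
    x = B.T @ ivector
    m = class_means @ B  # each row is (B.T @ mean) for one language
    return (m @ x) / (np.linalg.norm(m, axis=1) * np.linalg.norm(x))
```

After the projection, the average within-class covariance of the training i-vectors becomes the identity, which is the property WCCN is designed to provide.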
LRE15 Systems - II
• Unsupervised Unit Discovery DNN / I-Vector system
  – BAUD: DNN bottleneck features
• DNN Counts Subspace Multinomial Model systems
  – CNT1: Counts from ASR DNN layers
  – CNT2: Counts from LID DNN layers
  – CNT3: Joint subspace of CNT1 and CNT2 counts
• Calibration and Fusion
  – Multiclass calibration followed by linear fusion
  – Duration weighting on system scores
  – Per system calibration: MMI-trained Gaussian
  – Linear fusion optimized with logistic regression
Fixed Development Data Preparation
• Randomly divided the development data by file count
  – 60% train
  – 40% test
• Augmented both train and test sets with variable duration segmentation (uniform distribution between 3-30 secs)
  – Allowed for duration calibration in test
  – Found that duration augmentation of train data improved performance
  – Other forms of augmentation (warping pitch, spectrum, speed) did not show any appreciable gains
• For submissions, calibration and fusion trained using scores from train+test sets
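The split and the duration augmentation described on this slide could look roughly like the sketch below. The function names, the fixed seed, and the choice to cut each file into consecutive, non-overlapping segments (discarding a leftover shorter than 3 secs) are assumptions made for illustration; the slide only states the 60/40 split by file count and the uniform 3-30 sec duration range.

```python
import random

def split_files(files, train_frac=0.6, seed=0):
    """Shuffle the file list and split it by file count into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def variable_duration_segments(total_secs, min_secs=3.0, max_secs=30.0, rng=None):
    """Cut one recording into consecutive (start, end) segments.

    Each segment length is drawn uniformly from [min_secs, max_secs] and
    clipped to the audio that remains; a tail shorter than min_secs is dropped.
    """
    rng = rng or random.Random(0)
    segments, start = [], 0.0
    while total_secs - start >= min_secs:
        length = min(rng.uniform(min_secs, max_secs), total_secs - start)
        segments.append((start, start + length))
        start += length
    return segments

# Hypothetical usage: 10 files, one of them 100 secs long.
train, test = split_files([f"file{i}.sph" for i in range(10)])
segments = variable_duration_segments(100.0, rng=random.Random(1))
```

Splitting by file before segmenting keeps all segments of a recording on the same side of the train/test boundary.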
Open Development Data Preparation
• Found other data sources for all languages
  – LRE07, 09, 11, OHSU, OGI-22, Fisher, Callfriend, Babel, Ahumada, MI5-UK, Appen, Qatar-Dialect, Kalaka
  – Types of speech: CTS, BNBS, BWBS
  – All data audited
• Extra data used for language model training
  – Used fixed data test set for performance estimation
• The multi-lingual DNN was the only system to explicitly rely on using extra data
• During development found that using all the extra data hurt performance
  – Only 3 of the languages contributed to improved performance (Brazilian Portuguese, British English, and Arabic MSA)
CTS = Conversational Telephone Speech; BNBS = Broadcast Narrow Band Speech; BWBS = Broadcast Wide Band Speech
Development Results: Primary Systems
[Figure: bar chart of cost (y-axis, 0 to 0.06) per cluster (arabic, chinese, english, french, iberian, slavic, average) for the Fixed Primary and Open Primary systems.]
Fixed Primary Component Breakout
• Primary not far from oracle fusion
• Unsupervised BAUD does almost as well as single best ASR DNN BNF1
[Figure: bar chart of cost (y-axis, 0 to 0.3) for BAUD, CNT1, BNF1, PITCH1, STATS, PRIMARY, and Oracle, with two series (Average, Sans Français) and a "Best Single System" annotation. Annotated values: 0.176, 0.173, 0.093, 0.089.]
Fixed Primary Per-Cluster Breakout
• We have analysis showing that BNBS vs. CTS is a major effect in French cluster
• Arabic and Iberian clusters have the highest costs after French
  – Language / source?*
[Figure: bar chart of cost (y-axis, 0 to 0.25) per cluster: arabic, chinese, english, french, iberian, slavic, average, avg_nofr.]
* MSA and Portuguese are least confusable languages in their clusters (both dominated by BNBS)
French Cluster Analysis
• Type (BNBS vs. CTS) appears to be a large factor in dev/eval mismatch
[Figure: plot of French cluster data, labeled by language (WAF, HAITIAN) and type (CTS, BNBS).]
Slavic Cluster Analysis
• Type (BNBS vs. CTS) is a factor but does not affect language separation
[Figure: plot of Slavic cluster data, labeled by language (RUSSIAN) and type (CTS, BNBS).]
Open Primary Component Breakout
• Minor improvement using extra data
• Multilingual BNF has slight gain over BNF1
[Figure: bar chart of cost (y-axis, 0 to 0.3) for CNT1, BNF1, PITCH1, MLBNF, STATS, PRIMARY, and Oracle, with two series (Average, Sans Français) and a "Best Single System" annotation. Annotated values: 0.169, 0.167, 0.086, 0.084.]
Open Task: Adding Data to Arabic
• Looked at effect of adding extra data to Arabic languages

Source   Languages                             Audit             Files   Speech (hrs)
Appen    Iraqi, Levantine                      Appen              2012   121.90
Fisher   Levantine                             LDC                1572   120.69
LRE11    Iraqi, Levantine, Maghrebi, MSA       LDC                2727    29.89
Qatar    Egyptian, Levantine, Maghrebi, MSA    Mechanical Turk   20056   122.91

System            Cost
Baseline          0.2292
Baseline+Appen    0.2235
Baseline+Fisher   0.2255
Baseline+LRE11    0.2155
Baseline+Qatar    0.2604

• Bottom line: extra data provided little gain or hurt performance on eval
• Post-eval?
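To put the "little gain or hurt" statement in numbers, the relative change of each configuration against the baseline can be computed from the costs listed on this slide. The helper name below is hypothetical; the cost values are copied from the slide.

```python
# Costs as listed on the slide.
costs = {
    "Baseline": 0.2292,
    "Baseline+Appen": 0.2235,
    "Baseline+Fisher": 0.2255,
    "Baseline+LRE11": 0.2155,
    "Baseline+Qatar": 0.2604,
}

def relative_change_pct(cost, baseline):
    """Relative change in percent; negative means lower (better) cost."""
    return 100.0 * (cost - baseline) / baseline

baseline = costs["Baseline"]
changes = {
    name: round(relative_change_pct(cost, baseline), 1)
    for name, cost in costs.items()
    if name != "Baseline"
}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
# Baseline+Appen: -2.5%
# Baseline+Fisher: -1.6%
# Baseline+LRE11: -6.0%
# Baseline+Qatar: +13.6%
```

The largest source by file count (Qatar) is the only one that raises the cost, while the three others lower it by 6% or less.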
Post-eval Experiments Highlights
• Additional data
  – After revisiting open-set submission, training with all data available would have reduced “French” cluster error
• Multilingual
  – Work in progress but reductions observed for some configurations that include a more diverse set of languages
Post-eval Experiments Highlights
• Spanish errors
  – 50 samples chosen randomly
  – Main issues in these errors
    • Cuban females (10)
    • Little speech content (5-7)
• English errors
  – 50 samples chosen randomly
  – Main issues in these errors
    • 80% of errors do not involve Indian English
    • 5 files with little or no speech content
Observations
• DNN bottleneck features used in an i-vector system continue to be the best single system
• Fusion with count (phonotactic) systems provides moderate gains
• Possible factors affecting performance this year
  – Language confusability (amplified by short durations)
  – Source mismatch (BNBS vs. CTS)
• Adding more data did not solve the problem… on dev set
• Path forward
  – Need to better focus on robustness over wider conditions vs. incremental improvements over narrow conditions
Fixed Development Data

CODE      LANGUAGE        # Cuts   Speech (hrs)
ara-acm   Iraqi             2206    75.59
ara-apc   Levantine         4073   266.67
ara-arb   MSA                912     8.18
ara-ary   Maghrebi           919    46.91
ara-arz   Egyptian           440    97.27
eng-gbr   British Eng.       147     2.10
eng-sas   Indian Eng.       1689    25.37
eng-usg   Amer. Eng.        2448   165.92
fre-hat   Haitian Cr.       2192   110.79
fre-waf   West Afr. Fr.     1229     7.02
por-brz   Braz. Port.       1838     5.96
qsl-pol   Polish             695    32.14
qsl-rus   Russian           2021    37.80
spa-car   Carib. Spa.        194    30.59
spa-eur   Eur. Spa.          366     8.55
spa-lac   Lat. Am. Spa.      160    15.30
zho-cdo   Min                209     6.46
zho-cmn   Mandarin          4131   200.70
zho-wuu   Wu                 234    10.36
zho-yue   Cantonese         2382   123.61
Open Development Data Preparation

LANGUAGE                Sources                             Type        Cuts
Arabic.egyptian         None
Arabic.iraqi            LRE11, Appen                        CTS         1788
Arabic.levantine        LRE11, Fisher, Appen                CTS         3623
Arabic.maghrebi         LRE11                               BNBS         505
Arabic.msa              LRE11                               BNBS         506
Chinese.cantonese       LRE09, Babel                        CTS, BNBS   2359
Chinese.mandarin        LRE05-07-09-11, Callfriend, OHSU    CTS, BNBS   3693
Chinese.minnan          LRE07-09                            CTS          168
Chinese.wu              LRE07-09                            CTS          189
Spanish.caribbean       LRE07                               CTS           74
Spanish.european        Ahumada                             CTS          328
Spanish.latinamerican   OHSU (Mexican)                      CTS          130
Portuguese.brazilian    LRE09, OGI-22, VOA scrape           CTS, BNBS   1791
English.american        LRE05-07-09-11, Callfriend, OHSU    CTS         2088
English.indian          LRE07-09-11, OHSU, OGI-22           CTS         1271
English.british         UK-MI5 SID                          CTS          148
Polish                  LRE11                               CTS, BNBS    208
Russian                 LRE07-09-11, Callfriend             CTS, BNBS   1551
West African French     LRE09, VOA scrape                   BNBS        1195
Haitian Creole          Babel, VOA scrape                   CTS, BNBS   1869

Missing: Qatar and Kalaka
Calibration and Fusion Backend
[Figure: block diagram with three stages: Duration Norm, System Calibration, System Fusion. Each of the K detectors produces M scores s_{k,1} … s_{k,M}. The scores pass through a per-system duration scale that depends on the number of frames N, then through an MMI-trained Gaussian that outputs log-likelihoods LL_{k,1} … LL_{k,M}, which are multiplied by a per-system weight w_k. The weighted log-likelihoods are summed over systems to give LL_1 … LL_M, and Bayes' rule with the priors yields P_1 … P_M.]
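The last two stages of this backend can be sketched as below, under several assumptions: the Gaussian calibration uses one mean per class and a shared covariance, its parameters are taken as given rather than MMI-trained, the duration scale is left out because its functional form is not recoverable from the slide, and the weights w_k are supplied directly instead of being optimized with logistic regression. All function names are hypothetical.

```python
import numpy as np

def gaussian_loglik(scores, means, cov):
    """Log-likelihood of one system's score vector under each class Gaussian.

    scores: (D,) score vector from one detector
    means:  (M, D) one mean per class; cov: (D, D) covariance shared by classes
    """
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    diff = scores[None, :] - means
    maha = np.einsum("md,de,me->m", diff, inv, diff)
    return -0.5 * (maha + logdet + scores.size * np.log(2 * np.pi))

def fuse(system_logliks, weights, priors):
    """Weighted sum of per-system log-likelihoods, then Bayes' rule."""
    fused = sum(w * ll for w, ll in zip(weights, system_logliks))
    log_post = fused + np.log(priors)
    log_post -= np.logaddexp.reduce(log_post)  # normalize in the log domain
    return np.exp(log_post)

# Hypothetical usage: K = 2 systems, M = 3 classes.
means, cov = np.eye(3), np.eye(3)
ll_1 = gaussian_loglik(np.array([0.9, 0.1, 0.0]), means, cov)
ll_2 = gaussian_loglik(np.array([0.6, 0.3, 0.1]), means, cov)
posteriors = fuse([ll_1, ll_2], weights=[0.7, 0.3], priors=np.full(3, 1 / 3))
```

Normalizing in the log domain avoids underflow when the fused log-likelihoods are large and negative.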