Out-of-set i-vector selection for open-set language identification

Hamid Behravan, Tomi Kinnunen, Ville Hautamäki
School of Computing, University of Eastern Finland
Odyssey 2016, June 21-24, Bilbao

Closed-set: a test segment corresponds to one of the known target (in-set) languages.
[Figure: a test utterance is matched against the target languages English, Spanish, Persian, Finnish and Swedish. Which language? Spanish.]

Open-set: the language of a test segment might not be any of the in-set languages.
[Figure: a test utterance is matched against the target languages English, Spanish and Persian plus an out-of-set model; Finnish and Swedish appear as non-target (unknown) languages. Which language? One of the target languages or out-of-set.]

One way to perform open-set LID is to train an out-of-set model.
(LID: language identification)

What are good out-of-set candidates?
- Out-of-set candidates should come from different language families.
- Some out-of-set candidates should be close to the in-set languages, while others should be far away [Zhang and Hansen, 2014].
[Figure: in-set data of languages A and B, with the good candidates for out-of-set data marked.]

Q. Zhang and J. H. L. Hansen, “Training candidate selection for effective rejection in open-set language identification,” in Proc. of SLT, 2014, pp. 384–389.

Out-of-set candidate detection methods (1)
One-class SVM. Idea: enclose the data with a hypersphere and classify new data as normal (+) if it falls within the hypersphere, and as out-of-set (-) otherwise.

Out-of-set candidate detection methods (2)
- K-nearest neighbour (kNN), illustrated with K=3 and distances d1, d2, d3
- Distance to class mean
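The two baseline scores listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how d1, d2, d3 are combined, so the sketch averages them, and Euclidean distance, the toy 2-D vectors and all function names are hypothetical choices rather than the authors' setup.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_outlier_score(x, in_set, k=3):
    """Average distance from x to its k nearest in-set vectors (assumed combination)."""
    nearest = sorted(euclidean(x, v) for v in in_set)[:k]
    return sum(nearest) / len(nearest)

def class_mean_outlier_score(x, classes):
    """Distance from x to the closest class mean; classes maps label -> vectors."""
    best = math.inf
    for vectors in classes.values():
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        best = min(best, euclidean(x, mean))
    return best

# Toy in-set data: two well-separated "languages" in 2-D.
classes = {
    "A": [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    "B": [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]],
}
in_set = [v for vectors in classes.values() for v in vectors]
near = [0.5, 0.5]    # sits inside language A
far = [10.0, -4.0]   # far from both languages

print(knn_outlier_score(near, in_set), knn_outlier_score(far, in_set))
print(class_mean_outlier_score(near, classes), class_mean_outlier_score(far, classes))
```

In both cases a larger score means the vector looks less like the in-set data, so the highest-scoring unlabeled vectors would be the out-of-set candidates.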

Proposed method: non-parametric Kolmogorov-Smirnov test
Idea: estimate whether two samples have the same underlying distribution by computing the maximum difference between their empirical cumulative distribution functions (ECDFs).
[Figure: two ECDFs with the maximum difference (KS) marked.]
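As a minimal sketch of the statistic described above, the following computes the maximum ECDF difference between two one-dimensional samples using only the standard library. The function names are illustrative, not taken from the presentation.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of a sample as a function t -> P(X <= t)."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n

def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two ECDFs.

    Both ECDFs are step functions that only change at observed values,
    so it is enough to evaluate them at the pooled sample points.
    """
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    return max(abs(f_a(t) - f_b(t)) for t in set(sample_a) | set(sample_b))

print(ks_statistic([1, 2, 3], [1, 2, 3]))        # identical samples
print(ks_statistic([1, 2, 3], [10, 11, 12]))     # disjoint samples
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # partial overlap
```

The value ranges from 0 (identical ECDFs) to 1 (no overlap between the samples).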

Adopting the Kolmogorov-Smirnov test to our open-set LID task
Goal: give each unlabeled i-vector an outlier score.
[Diagram: compute ECDFs, then take the average over all KS values.]

Computing the outlier score for an unlabeled i-vector
[Diagram: the outlier score is obtained with a Min operation over several values.]

KSE values within each language are close to zero, whereas they tend towards 1 for out-of-set data.
[Figure: distribution of in-set and OOS KSE values for two different languages, a) Dari and b) French.]
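The slides only preserve fragments of the scoring procedure ("compute ECDFs", "taking average over all KS values", "Min"), so the following is one plausible reading rather than the authors' exact method. It assumes that the ECDFs are built from distances to the i-vectors of one language, that the KS values are averaged within that language, and that the minimum is taken across languages. Euclidean distance, the toy 2-D data and every function name are assumptions made for illustration.

```python
import math
from bisect import bisect_right

def distance(a, b):
    """Euclidean distance (assumed; the slides do not name the metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the ECDFs of two samples."""
    sa, sb = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(sa, t) / len(sa) - bisect_right(sb, t) / len(sb))
        for t in set(sa) | set(sb)
    )

def kse_for_language(x, vectors):
    """Average KS value between x's distance profile and each in-set profile."""
    dist_x = [distance(x, v) for v in vectors]
    ks_values = []
    for i, w in enumerate(vectors):
        # Distances from in-set vector w to the other vectors of the same language.
        dist_w = [distance(w, v) for j, v in enumerate(vectors) if j != i]
        ks_values.append(ks_statistic(dist_x, dist_w))
    return sum(ks_values) / len(ks_values)

def outlier_score(x, languages):
    """Minimum of the per-language averages (assumed meaning of 'Min')."""
    return min(kse_for_language(x, vecs) for vecs in languages.values())

# Toy data: two "languages" laid out as 3x3 grids in 2-D.
grid = [[float(i), float(j)] for i in range(3) for j in range(3)]
languages = {
    "A": grid,
    "B": [[p[0] + 10.0, p[1] + 10.0] for p in grid],
}
near = [1.0, 1.2]      # lies inside language A
far = [100.0, 100.0]   # far from every language

print(outlier_score(near, languages), outlier_score(far, languages))
```

Under these assumptions a vector far from every language gets a score of 1, while a vector inside a language gets a lower score, which matches the qualitative behaviour described above.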

So far, four methods have been presented for out-of-set data selection: one-class SVM, kNN, distance to class mean, and the proposed KSE.

NIST language i-vector challenge 2015 corpus
[Table: distribution of training, development and test sets from the NIST 2015 language i-vector machine learning challenge.]
- 300 i-vectors for each of the 50 target languages
- i-vectors are of dimensionality 400
- i-vectors are further post-processed by within-class covariance normalization (WCCN) and linear discriminant analysis (LDA)

Segmenting the training data into three portions for out-of-set evaluation
All portions are subsets of the original NIST 2015 LRE i-vector challenge training set.

Example of test utterance labeling for evaluating the out-of-set (OOS) data detection task given multiple in-set languages.

KSE outperforms kNN and one-class SVM with relative equal error rate (EER) reductions of 14% and 16%, respectively.
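For readers unfamiliar with the metric behind these numbers, here is a rough sketch of how an EER could be computed for an OOS detector. It assumes that higher scores mean "more out-of-set" and that label 1 marks OOS trials; it approximates the EER at the threshold where the miss and false-alarm rates are closest, without interpolation. The function name and the toy scores are hypothetical and unrelated to the reported results.

```python
import math

def equal_error_rate(scores, labels):
    """Approximate EER: labels 1 = out-of-set, 0 = in-set; higher score = more OOS."""
    oos = [s for s, y in zip(scores, labels) if y == 1]
    ins = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = math.inf, None
    # Try every observed score as a threshold, plus one that rejects everything.
    for t in sorted(set(scores)) + [math.inf]:
        miss = sum(s < t for s in oos) / len(oos)          # OOS trials scored below t
        false_alarm = sum(s >= t for s in ins) / len(ins)  # in-set trials flagged as OOS
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer

print(equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # perfectly separated
print(equal_error_rate([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1]))  # overlapping scores
```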

Fusion of KSE with the baseline OOS detection methods. Fusing KSE with one-class SVM yields the best performance.

The lowest identification cost is 26.61, outperforming the NIST baseline system by 33% relative improvement.

Data selected for out-of-set modeling    Identification cost
Random (1067)                            32.11
Training (15000)                         32.61
Development (6431)                       31.23
Training + development (21431)           31.74
Proposed selection method (1067)         26.61
Closed-set (no OOS model)                37.23

- Results are reported from the NIST evaluation online system
- Numbers in parentheses indicate amounts of selected data for OOS modeling
- Back-end is based on SVM classifier
- NIST Baseline result: 39.59
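The relative improvement quoted above can be checked directly from the listed costs. The short sketch below uses only numbers from the table; the helper name is illustrative.

```python
def relative_improvement(reference_cost, new_cost):
    """Fractional reduction in cost relative to a reference system."""
    return (reference_cost - new_cost) / reference_cost

nist_baseline = 39.59   # NIST baseline result
closed_set = 37.23      # closed-set system, no OOS model
proposed = 26.61        # proposed selection method

vs_baseline = relative_improvement(nist_baseline, proposed)
vs_closed_set = relative_improvement(closed_set, proposed)

print(f"vs NIST baseline: {vs_baseline:.1%}")   # about 33%
print(f"vs closed-set:    {vs_closed_set:.1%}")
```

Against the closed-set system without an OOS model (37.23), the same calculation gives a somewhat smaller relative reduction than against the NIST baseline.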

Open-set LID results for different out-of-set data selection methods. KSE outperforms the other methods.
- The results are reported from the NIST evaluation online system.
- Of the 1500 out-of-set samples, 1012 are correctly classified as out-of-set using KSE.

Conclusions:
- A simple and effective technique to find out-of-set data in the i-vector space.
- Open-set LID: 33% relative reduction in identification cost over the NIST baseline (39.59 to 26.61).
