  1. Improving Unsupervised Acoustic Word Embeddings using Speaker and Gender Information
     Lisa van Staden, Herman Kamper
     31 January 2020

  2. Zero-Resource Speech Processing
     Popular methods for speech processing rely on transcribed speech. Obtaining transcriptions is expensive and not always possible.


  3. Tasks in Zero-Resource Processing
     We don’t always need to predict text labels:
     • Query-by-Example Search: search speech using speech.
     • Unsupervised Term Discovery: discover repeating patterns in speech.


  4. Speech Segment Comparison
     These tasks require comparing speech segments of different durations. The conventional method is Dynamic Time Warping (DTW):
     • Computationally expensive; see the sketch below.

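To make the cost concrete, here is a minimal NumPy sketch of DTW between two segments, with frame-level cosine cost as an illustrative choice; the dynamic programme is O(T1 * T2) per pair, which is what makes comparing many segments expensive:

```python
import numpy as np

def dtw_distance(a, b):
    """Align segments a (T1, d) and b (T2, d) with dynamic time warping.
    The dynamic programme costs O(T1 * T2) per pair of segments."""
    T1, T2 = len(a), len(b)
    # Frame-level cosine distances between all pairs of frames.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    cost = 1.0 - a_n @ b_n.T
    acc = np.full((T1 + 1, T2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[T1, T2]

x = np.random.randn(60, 13)   # e.g. 60 frames of 13-dimensional MFCCs
y = np.random.randn(75, 13)
print(dtw_distance(x, y))
```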

  5. Acoustic Word Embeddings
     We want to map variable-length speech segments to fixed-dimensional vector representations without using labels, so that segments can be compared cheaply (a comparison sketch follows).

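By contrast with DTW, comparing two fixed-dimensional embeddings is a single O(d) vector operation; a small sketch, with cosine distance as the usual choice in this line of work:

```python
import numpy as np

def cosine_distance(u, v):
    """O(d) comparison of two fixed-dimensional embeddings."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

e1 = np.random.randn(130)     # embeddings of two speech segments
e2 = np.random.randn(130)
print(cosine_distance(e1, e2))
```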

  6. Speaker and Gender Information
     Acoustic properties of speech from different speakers and genders differ. We want embeddings that are robust to this variation.
     [Illustration: tokens of “cat”, “pan”, “pun” and “bat” from Speaker A and Speaker B, labelled female/male.]

  7. RNN (Correspondence) Autoencoder
     [Diagram: a GRU encoder reads input frames x_1, x_2, …, x_T and produces a fixed-dimensional embedding; a GRU decoder then generates output frames x'_1, …, x'_T (autoencoder) or y'_1, …, y'_T (correspondence autoencoder).]
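A minimal PyTorch sketch of such an encoder-decoder; the layer sizes, MFCC input dimension and embedding dimension are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class RNNAutoencoder(nn.Module):
    """GRU encoder-decoder: trained to reconstruct the input x (AE), or,
    given target frames y from a matched segment, as a correspondence AE."""

    def __init__(self, input_dim=13, hidden_dim=256, embed_dim=130):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.to_embed = nn.Linear(hidden_dim, embed_dim)   # embedding layer
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_frame = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):                  # x: (batch, T, input_dim)
        _, h = self.encoder(x)             # h: (1, batch, hidden_dim)
        z = self.to_embed(h[-1])           # z: (batch, embed_dim)
        # Feed the embedding to the decoder at every time step.
        z_rep = z.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.decoder(z_rep)
        return self.to_frame(out), z       # reconstruction and embedding

model = RNNAutoencoder()
x = torch.randn(8, 50, 13)                 # 8 segments, 50 frames, 13 MFCCs
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)    # AE loss; a CAE targets y instead
```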

  8. Speaker/Gender Conditioning
     [Diagram: the same encoder-decoder, but a speaker/gender vector is supplied as an additional input, so the embedding itself need not capture that information.]
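One way to realise this conditioning is to feed a learned speaker (or gender) vector to the decoder at every step; a minimal sketch, with the lookup table and all sizes as assumptions:

```python
import torch
import torch.nn as nn

class ConditionedDecoder(nn.Module):
    """Decoder that receives a speaker/gender vector at every step,
    leaving the embedding free to drop speaker information."""

    def __init__(self, embed_dim=130, cond_dim=16, hidden_dim=256,
                 output_dim=13, n_speakers=100):
        super().__init__()
        self.cond = nn.Embedding(n_speakers, cond_dim)  # speaker lookup
        self.decoder = nn.GRU(embed_dim + cond_dim, hidden_dim,
                              batch_first=True)
        self.to_frame = nn.Linear(hidden_dim, output_dim)

    def forward(self, z, speaker_id, T):
        # Concatenate embedding and speaker vector, repeat over time.
        c = self.cond(speaker_id)                       # (batch, cond_dim)
        zc = torch.cat([z, c], dim=-1).unsqueeze(1).expand(-1, T, -1)
        out, _ = self.decoder(zc)
        return self.to_frame(out)
```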

  9. Adversarial Training
     [Diagram: training alternates between two turns. Turn A: X → Encoder → embedding → Decoder → X'/Y'. Turn B: embedding → Classifier → prediction p of the speaker/gender.]
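A schematic sketch of such alternating training, reusing the RNNAutoencoder above and the classifier sketched after the next slide; the toy loader, optimiser settings and the weight `lam` on the adversarial term are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Assumes RNNAutoencoder and make_classifier from the neighbouring sketches.
model = RNNAutoencoder()
clf = make_classifier(n_classes=5)
opt_model = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
lam = 1.0   # assumed weight on the adversarial term

# Toy batch: 8 segments of 50 frames, 5 possible speakers.
loader = [(torch.randn(8, 50, 13), torch.randn(8, 50, 13),
           torch.randint(0, 5, (8,)))]

for x, y, speaker in loader:
    # Turn B: update the classifier to predict the speaker from z.
    opt_clf.zero_grad()
    with torch.no_grad():
        _, z = model(x)
    F.cross_entropy(clf(z), speaker).backward()
    opt_clf.step()

    # Turn A: update the encoder-decoder to reconstruct the target
    # while *fooling* the classifier (its loss is subtracted).
    opt_model.zero_grad()
    x_hat, z = model(x)
    loss = F.mse_loss(x_hat, y) - lam * F.cross_entropy(clf(z), speaker)
    loss.backward()
    opt_model.step()
```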

  10. Speaker/Gender Classifier
      [Diagram: the classifier maps an embedding z through Linear, ReLU, Dropout and Softmax layers to a prediction p.]
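A minimal PyTorch version of this classifier; the hidden size, dropout rate and class count are assumptions, and the diagram's Softmax is folded into the cross-entropy loss used during training, which expects logits:

```python
import torch.nn as nn

def make_classifier(embed_dim=130, hidden_dim=256, n_classes=100,
                    p_drop=0.5):
    """Embedding z -> Linear -> ReLU -> Dropout -> class scores p."""
    return nn.Sequential(
        nn.Linear(embed_dim, hidden_dim),   # Linear
        nn.ReLU(),                          # ReLU
        nn.Dropout(p_drop),                 # Dropout
        nn.Linear(hidden_dim, n_classes),   # scores; Softmax is in the loss
    )
```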

  11. Evaluating Quality of AWEs
      Use the same-different task to evaluate AWEs:
      • Decide whether two segments are the same word by thresholding the distance between their AWEs.
      • Calculate the area under the precision-recall curve (average precision); a sketch of this computation follows.
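A minimal sketch of this evaluation using scikit-learn, assuming embeddings and word labels are given; cosine distance as the similarity measure is an assumption:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.metrics import average_precision_score

def same_different_ap(embeddings, labels):
    """Average precision on the same-different task: rank all segment
    pairs by embedding distance and score against same-word labels."""
    dists = pdist(embeddings, metric="cosine")   # condensed pairwise dists
    n = len(labels)
    same = np.array([labels[i] == labels[j]      # same order as pdist
                     for i in range(n) for j in range(i + 1, n)])
    # Smaller distance should mean "same word", so negate for ranking.
    return average_precision_score(same, -dists)

rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 130))                 # toy embeddings
words = rng.choice(["cat", "pan", "bat"], size=20)
print(same_different_ap(emb, words))
```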

  12. Results
      [Bar chart: average precision (%) on English and Xitsonga for six model types: AE-Baseline, AE-Top-1, AE-Top-2, CAE-Baseline, CAE-Top-1 and CAE-Top-2.]

  13. Evaluate Speaker and Gender Predictability
      Analyse whether the speaker and gender information in the embeddings has decreased:
      • Train the speaker/gender classifier model on the embeddings.
      • Evaluate its accuracy (a probing sketch follows).
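As a stand-in for that probe, a minimal sketch with scikit-learn; logistic regression replaces the speaker/gender classifier purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def predictability(embeddings, speaker_ids):
    """Train a probe on the embeddings and report held-out accuracy:
    high accuracy means speaker information is still present."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, speaker_ids, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 130))          # toy embeddings
spk = rng.integers(0, 5, size=200)         # toy speaker labels
print(predictability(emb, spk))
```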

  14. Average Precision vs Speaker/Gender Predictability
      [Scatter plots: average precision against speaker predictability (top row) and gender predictability (bottom row) for the AE (left) and CAE (right) models.]

  15. Conclusions
      • On English data, incorporating speaker information gives only a marginal improvement.
      • The best Xitsonga model shows a 22% improvement.
      • It’s difficult to remove speaker and gender information from the embeddings.
      • Future work ...

