Inferring phonemic classes from CNN activation maps using clustering techniques
Thomas Pellegrini, Sandrine Mouysset
Université de Toulouse; UPS; IRIT; Toulouse, France
thomas.pellegrini@irit.fr, sandrine.mouysset@irit.fr
Motivation (slide from Surya Ganguli, http://goo.gl/YmmqCg)
Related work in speech: with DNNs
Source: Nagamine et al., Exploring How Deep Neural Networks Form Phonemic Categories, INTERSPEECH 2015
Related work in speech: with DNNs
◮ Single nodes and populations of nodes in a layer are selective to phonetic features
◮ Node selectivity to phonetic features becomes more explicit in deeper layers
◮ Do these findings still hold with convolutional neural networks?
CNN model used in this study
◮ BREF corpus: 100 hours, 120 native French speakers
◮ Train / dev sets: 90% / 10% (1.8M / 150K samples)
◮ PER: 20%, accurate enough to make an analysis of the model meaningful
Study workflow
Does a CNN encode phonemic categories as a DNN does?
◮ 100 input samples per phone are fed forward through the network
◮ The outputs of each layer are extracted and fed to either k-means or spectral clustering, with optional front-end dimension reduction
◮ Remark: the 4-d activation tensors are reshaped into 2-d matrices (see the sketch below)
◮ Experiment 1: fixed number of 33 clusters (the size of the French phone set)
◮ Experiment 2: optimal number of clusters determined automatically
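As an illustration of the reshaping step, here is a minimal Python sketch; the tensor shape is hypothetical, since the slides do not give the actual layer dimensions:

import numpy as np

# Hypothetical conv-layer output: (n_samples, n_maps, height, width);
# 100 samples for each of the 33 French phones
acts = np.random.randn(3300, 64, 11, 40)
# Flatten each sample's activation maps into a single row vector,
# giving the 2-d matrix expected by k-means or spectral clustering
X = acts.reshape(acts.shape[0], -1)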
Dimension reduction
◮ Principal Component Analysis (PCA), applied to the whole activation maps: we keep the smallest number of principal components that retains at least 90% of the covariance matrix spectrum (sketched below)
PCA projections of averaged activations: http://goo.gl/bbuZn9
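A minimal sketch of this selection with scikit-learn, assuming X is the 2-d activation matrix from the previous step; passing a ratio to n_components keeps just enough components to reach that fraction of explained variance:

from sklearn.decomposition import PCA

# Keep the smallest number of components explaining >= 90% of the variance
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X)
print(pca.n_components_, "principal components retained")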
Dimension reduction
◮ t-Distributed Stochastic Neighbor Embedding (t-SNE): relies on random walks on neighborhood graphs to extract the local structure of the data while also revealing important global structure
t-SNE projections of averaged activations: http://goo.gl/4f3nZ3
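A corresponding sketch with scikit-learn's TSNE, assuming the X_pca matrix from the PCA step; the perplexity value is an illustrative default, not a setting taken from the slides:

from sklearn.manifold import TSNE

# 2-d embedding for visualising the averaged activations
X_tsne = TSNE(n_components=2, perplexity=30.0).fit_transform(X_pca)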
Clustering methods
We consider two popular clustering techniques, based on linear and non-linear separation respectively (see the sketch after this slide):
◮ k-means, computed with the Manhattan distance
◮ Spectral clustering, which selects the dominant eigenvectors of the Gaussian affinity matrix to build a low-dimensional space in which the data points are grouped into clusters
Choice of the number of clusters:
◮ k-means: within- and between-cluster sums of point-to-centroid distances
◮ Spectral clustering: within- and between-cluster affinity measure
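The following sketch illustrates both methods on the activation matrix X. Note that scikit-learn's KMeans only supports the Euclidean distance, so the Manhattan (L1) variant is written out by hand here; the slide's Gaussian affinity corresponds to the "rbf" affinity of SpectralClustering, with a hypothetical bandwidth gamma:

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

def kmeans_manhattan(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid under the L1 metric
        dists = cdist(X, centroids, metric="cityblock")
        labels = dists.argmin(axis=1)
        # The coordinate-wise median minimises the within-cluster L1 cost
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = np.median(members, axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

labels_km = kmeans_manhattan(X, k=33)
labels_sc = SpectralClustering(n_clusters=33, affinity="rbf",
                               gamma=1.0).fit_predict(X)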
Evaluation for experiment 1
Evaluate the resulting clusters with a fixed number of 33 clusters:

$$P = \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn}, \qquad F = \frac{2\,P\,R}{P + R}$$

where $tp$, $fp$ and $fn$ respectively denote the numbers of true positives, false positives and false negatives
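In code, the slide's measures amount to the following minimal sketch; the counts themselves would come from comparing the cluster assignments with the reference phone labels:

def precision_recall_f(tp, fp, fn):
    # P and R follow the definitions above; F is their harmonic mean
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f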
Experiment 1: 33 clusters
→ Phone-specific clusters become more explicit with layer depth
Experiment 2: optimal number of clusters
7 clusters with SC
◮ 3 clusters for the vowels:
1. 93% of the medium to open vowels [a], [E], [9]
2. 83% of the closed vowels [y], [i], [e]
3. 60% of the nasal vowels /a~/, /o~/, /U~/
◮ 4 clusters for the consonants:
1. 92% of the nasal consonants /n/, /m/ and /J/
2. 81% of the fricatives /S/, /s/, /f/, /Z/
3. 76% of the rounded vowels /o/, /u/, /O/ and the glide /w/
4. 68% of the plosive consonants /p/, /t/, /k/, /b/, /d/, /g/
k-means: similar clusters
→ Broad phonetic classes are learned by the network
Average activation map example of layer "conv1"
◮ Vowels
◮ This map encodes the mouth aperture (F1) but not the vowel frontness (F2)
Average activation map example of layer "conv1"
◮ Plosives
Conclusions and future work
Findings with CNNs are similar to those of the previous work by Nagamine et al. with DNNs:
1. Phone-specific clusters become more explicit with layer depth
2. Broad phonetic classes are learned by the network
Ongoing/future work:
◮ Studying the maps that do not correspond to phonemic categories
◮ What is the "gist" of the phone representations for a CNN?
Thank you! Q&A
thomas.pellegrini@irit.fr