Investigating neural representations of spoken language Grzegorz Chrupała
In collaboration with Afra Alishahi, Lieke Gelderloos, Marie Barking, Mark van der Laan
Automatic Speech Recognition: a major success story in Language Technology
Large amounts of fine-grained supervision: transcribed speech, e.g. "I can see you"
Grounded speech perception
Modeling spoken language: induce representations that mediate between the auditory signal and visual semantics. Understand: what representations emerge in the models? How much do they match linguistic analyses? Which parts of the architecture encode what?
Datasets: Flickr8K Audio Caption Corpus (8K images, five audio captions each); MS COCO Synthetic Spoken Captions (300K images, five synthetically spoken captions each); Places Audio Caption Corpus (400K spoken captions).
Project speech and image to a joint space (example captions: "a bird walks on a beam", "bears play in water").
Image retrieval: given a spoken caption such as "a bird walks on a beam", retrieve the matching image from the joint space. Grzegorz Chrupała, Lieke Gelderloos and Afra Alishahi. 2017. Representations of language in a model of visually grounded speech signal. In ACL.
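As a rough illustration of the joint-space training objective, here is a minimal sketch of a margin-based contrastive loss over a batch of speech and image embeddings (PyTorch). The encoders, dimensionalities, and margin below are illustrative placeholders, not the exact architecture or hyperparameters of the published model.

```python
# Minimal sketch of a speech-image joint embedding objective (PyTorch).
# Encoders and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Encode acoustic frames (batch, time, features) into a fixed-size embedding."""
    def __init__(self, n_features=13, hidden=256, embed=512):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, embed)

    def forward(self, x):
        _, h = self.rnn(x)                      # final hidden state: (1, batch, hidden)
        z = self.proj(h.squeeze(0))             # (batch, embed)
        return nn.functional.normalize(z, dim=1)

class ImageEncoder(nn.Module):
    """Project precomputed image features (e.g. CNN activations) into the same space."""
    def __init__(self, n_features=2048, embed=512):
        super().__init__()
        self.proj = nn.Linear(n_features, embed)

    def forward(self, x):
        return nn.functional.normalize(self.proj(x), dim=1)

def contrastive_loss(speech_emb, image_emb, margin=0.2):
    """Margin-based ranking loss over all in-batch negatives."""
    scores = speech_emb @ image_emb.t()         # (batch, batch) cosine similarities
    pos = scores.diag().unsqueeze(1)            # matching pairs on the diagonal
    cost_im = (margin + scores - pos).clamp(min=0)       # wrong image for a caption
    cost_sp = (margin + scores - pos.t()).clamp(min=0)   # wrong caption for an image
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_im.masked_fill(mask, 0).mean() + cost_sp.masked_fill(mask, 0).mean()
```

In this line of work the speech encoder is typically a recurrent (or convolutional) network over acoustic features and the image encoder projects pretrained CNN features; the ranking loss pulls matching speech-image pairs together and pushes in-batch mismatches apart by at least the margin.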
Further advances Harwath, D., Torralba, A., & Glass, J. (2016). Unsupervised learning of spoken language with visual context. In NeurIPS. Harwath, D., & Glass, J. (2017). Learning Word-Like Units from Joint Audio-Visual Analysis. In ACL. Chrupała, G. (2019). Symbolic inductive bias for visually grounded learning of spoken language. In ACL. Merkx, D., Frank, S. L., & Ernestus, M. (2019). Language learning using Speech to Image retrieval. In Interspeech. Ilharco, G., Zhang, Y., & Baldridge, J. (2019). Large-scale representation learning from visually grounded untranscribed speech. In CoNLL. Havard, W. N., Chevrot, J. P., & Besacier, L. (2019). Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech. In CoNLL.
Levels of representation What aspects of sentences are encoded? Which parts of the architecture encode what?
Homonym disambiguation: utterances containing homonyms (pair/pear, waste/waist, ...). Decide which meaning was present in an utterance: easier if meaning is represented, harder if only form is. Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. In CoNLL.
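A minimal sketch of the diagnostic-classifier setup for this task, assuming utterance-level activation vectors and meaning labels are already extracted; the arrays below are random stand-ins for real data.

```python
# Sketch of a diagnostic classifier for homonym disambiguation.
# `activations` are pooled layer activations per utterance and `labels`
# mark which meaning (e.g. pair vs. pear) the utterance contains;
# both are random placeholders here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 512))   # stand-in for real layer activations
labels = rng.integers(0, 2, size=200)       # stand-in for meaning labels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, activations, labels, cv=5)
print("Homonym disambiguation accuracy: %.3f" % scores.mean())
# High accuracy suggests the layer encodes meaning, not just acoustic form.
```

The same probe, with labels marking which synonym was uttered, carries over to the synonym discrimination task on the next slide.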
[Figure: homonym disambiguation results on Synthetic COCO]
Synonym discrimination Disentangle phonological form and semantics. Discriminate between synonyms in identical context: A girl looking at a photo. A girl looking at a picture. How invariant to phonological form is a representation? Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. In CoNLL.
[Figure: synonym discrimination results on Synthetic COCO]
Phoneme discrimination: ABX task (Schatz et al. 2013). A: /si/ B: /mi/ X: /me/. Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. In CoNLL.
ABX on Synthetic COCO: especially challenging when the target (B) and distractor (A) belong to the same phoneme class.
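A sketch of how an ABX score can be computed over extracted representations, assuming each triple pairs X with a same-category item B and a different-category distractor A; the vectors and the cosine distance below are placeholders for the actual representations and metric.

```python
# Sketch of ABX phoneme discrimination over extracted representations.
# X counts as correct if it lies closer to B (same category) than to A.
import numpy as np

def abx_accuracy(triples, distance):
    """triples: iterable of (a, b, x) representation vectors, with x and b
    drawn from the same phoneme category."""
    correct = sum(distance(x, b) < distance(x, a) for a, b, x in triples)
    return correct / len(triples)

def cosine_distance(u, v):
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
triples = [tuple(rng.normal(size=(3, 64))) for _ in range(100)]  # placeholder data
print("ABX accuracy:", abx_accuracy(triples, cosine_distance))
```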
Interim summary: bottom layers encode form, top layers meaning; even the top layers are not completely form-invariant.
Caveats
Phoneme decoding on Synthetic COCO. Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. In CoNLL.
Belinkov, Y., Ali, A., & Glass, J. (2019). Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition. In Interspeech.
Phoneme decoding from random networks (Flickr8K)
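A sketch of the random-network sanity check implied here: run the same phoneme-decoding probe on activations from a trained encoder and from an untrained copy with random weights. Everything below (GRU encoder, feature tensors, label array) is a placeholder; the point is the comparison, not the numbers.

```python
# Sketch of the random-network baseline for phoneme decoding.
import torch
import torch.nn as nn
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def frame_activations(encoder, mfcc):
    """Run a GRU encoder over acoustic frames and return per-frame activations."""
    with torch.no_grad():
        out, _ = encoder(mfcc)                # (batch, time, hidden)
    return out.reshape(-1, out.size(-1)).numpy()

rng = np.random.default_rng(0)
mfcc = torch.randn(32, 100, 13)               # stand-in for real speech features
phoneme_labels = rng.integers(0, 40, size=32 * 100)  # stand-in frame labels

trained = nn.GRU(13, 256, batch_first=True)   # placeholder: would be loaded from a checkpoint
random_init = nn.GRU(13, 256, batch_first=True)

for name, enc in [("trained", trained), ("random", random_init)]:
    feats = frame_activations(enc, mfcc)
    acc = cross_val_score(LogisticRegression(max_iter=1000), feats,
                          phoneme_labels, cv=3).mean()
    print(f"{name} encoder: phoneme decoding accuracy {acc:.3f}")
# If the gap between the two is small, decoding accuracy alone is not
# evidence that training made the representation more phonemic.
```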
Representational Similarity Analysis. Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4.
RSA: an example. The RSA score is the correlation between the pairwise similarities computed within space A (Sim_A) and within space B (Sim_B).
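A minimal sketch of the RSA score computation, assuming two sets of representations of the same items (e.g. network activations and some reference embeddings); the data below are random placeholders.

```python
# Sketch of Representational Similarity Analysis: compute pairwise
# similarities within each representation space, then correlate the two
# sets of similarities across the same item pairs.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(space_a, space_b, metric="cosine"):
    """space_a, space_b: (n_items, dim) arrays of representations of the
    same n items (the two spaces may have different dimensionality)."""
    sim_a = 1.0 - pdist(space_a, metric=metric)   # condensed pairwise similarities
    sim_b = 1.0 - pdist(space_b, metric=metric)
    rho, _ = spearmanr(sim_a, sim_b)
    return rho

rng = np.random.default_rng(0)
a = rng.normal(size=(50, 512))    # e.g. network activations
b = rng.normal(size=(50, 300))    # e.g. embeddings of the transcriptions
print("RSA score:", rsa_score(a, b))
```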
Structured spaces: RSA applies given a similarity/distance metric WITHIN spaces A and B; no metric BETWEEN A and B is needed. A can be a vector space, while B can be a space of strings/trees/graphs. For an application to syntax, see: Chrupała, G., & Alishahi, A. (2019). Correlating neural and symbolic representations of language. In ACL.
Phonemes with RSA: A – cosine distances between activation vectors; B – edit distances between phonemic transcriptions.
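The same recipe with a structured space B: a sketch correlating pairwise cosine distances over activation vectors with pairwise edit distances over phonemic transcriptions. The toy transcriptions and activations below are placeholders.

```python
# Sketch of RSA between activation space and phonemic-transcription space.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def edit_distance(s, t):
    """Plain Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def rsa_phonemes(activations, transcriptions):
    dist_a = pdist(activations, metric="cosine")   # pairwise cosine distances
    dist_b = np.array([edit_distance(transcriptions[i], transcriptions[j])
                       for i in range(len(transcriptions))
                       for j in range(i + 1, len(transcriptions))])
    rho, _ = spearmanr(dist_a, dist_b)
    return rho

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 64))                          # placeholder activations
trans = [["s", "i"], ["m", "i"], ["m", "e"], ["s", "e"]] # placeholder transcriptions
print("RSA (activations vs. phonemic edit distance):", rsa_phonemes(acts, trans))
```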
Pooling: parameters W, u optimized with respect to RSA scores.
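A sketch of what such a pooling layer with parameters W and u can look like (attention pooling over frame activations); since the pooled vectors are differentiable in W and u, they can in principle be optimized against an RSA-based objective. The shapes and the module below are illustrative, not the exact parameterization used in the work.

```python
# Sketch of attention pooling over frame activations, parameterized by W and u.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Pool (batch, time, hidden) activations into (batch, hidden) vectors."""
    def __init__(self, hidden):
        super().__init__()
        self.W = nn.Linear(hidden, hidden)
        self.u = nn.Parameter(torch.randn(hidden))

    def forward(self, frames):
        scores = torch.tanh(self.W(frames)) @ self.u        # (batch, time)
        alpha = torch.softmax(scores, dim=1).unsqueeze(-1)  # attention weights
        return (alpha * frames).sum(dim=1)                  # weighted sum over time

pool = AttentionPooling(hidden=256)
frames = torch.randn(8, 100, 256)     # placeholder frame activations
pooled = pool(frames)                 # (8, 256), differentiable w.r.t. W and u
```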
Phonemes with RSA (Flickr8K)
Conclusion, again: baselines and sanity checks are a must; diagnostic classifiers may lack sensitivity to details of the representation; use multiple analytical approaches to cross-check results.
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP https://blackboxnlp.github.io 2018: EMNLP in Brussels 2019: ACL, Florence 2020?
References Grzegorz Chrupała, Lieke Gelderloos and Afra Alishahi. 2017. Representations of language in a model of visually grounded speech signal. In ACL. Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. In CoNLL. Grzegorz Chrupała and Afra Alishahi. 2019. Correlating neural and symbolic representations of language. In ACL. Grzegorz Chrupała. 2019. Symbolic inductive bias for visually grounded learning of spoken language. In ACL.
Extras
Model settings
Representational similarity: correlations between sets of pairwise similarities according to activations vs. edit operations on text vs. human judgments (SICK dataset).
Decoding speaker attributes (Flickr8K): gender and speaker identity.
Decoding speaker attributes: a substantial amount of speaker information remains in the top layers, especially gender. Idea: disentangle semantics from speaker information?
RSA + Tree Kernels: InferSent (Conneau et al. 2017) trained on NLI; BERT (Devlin et al. 2018) trained on cloze and next-sentence classification; random versions of these.
[Figure: results per BERT layer]