Representations of language in a model of visually grounded speech signal


  1. Representations of language in a model of visually grounded speech signal Grzegorz Chrupała Lieke Gelderloos Afra Alishahi

  2. Automatic Speech Recognition A major commercial success story in Language Technology

  3. Very heavy-handed supervision (example transcription: "I can see you")

  4. Grounded speech perception

  5. Data  Flickr8K Audio (Harwath & Glass 2015)  8K images, five audio captions each  MS COCO Synthetic Spoken Captions  300K images, five synthetically spoken captions each

  6. Project speech and image to a joint space (example captions: "a bird walks on a beam", "bears play in water")
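Training pushes matching speech/image pairs together in the joint space and mismatched pairs apart. A minimal sketch of the margin-based ranking loss commonly used for this kind of model (PyTorch; the margin value and tensor names are illustrative, not the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(speech_emb, image_emb, margin=0.2):
    """Margin-based ranking loss over a batch of matching speech/image
    pairs; row i of each tensor belongs to the same caption-image pair."""
    speech_emb = F.normalize(speech_emb, dim=1)
    image_emb = F.normalize(image_emb, dim=1)
    # Cosine similarity matrix: sims[i, j] = sim(speech_i, image_j)
    sims = speech_emb @ image_emb.t()
    positives = sims.diag().unsqueeze(1)             # matching pairs
    # Penalize mismatched pairs that come within `margin` of the
    # matching pair, in both retrieval directions.
    cost_im = F.relu(margin - positives + sims)      # caption -> image
    cost_sp = F.relu(margin - positives.t() + sims)  # image -> caption
    # Zero out the diagonal (the positives themselves).
    eye = torch.eye(sims.size(0), dtype=torch.bool, device=sims.device)
    cost_im = cost_im.masked_fill(eye, 0.0)
    cost_sp = cost_sp.masked_fill(eye, 0.0)
    return (cost_im + cost_sp).mean()
```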

  7. Image model  Pre-classification layer of a pretrained CNN (example class labels: BOAT, BIRD, BOAR)
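A sketch of extracting such pre-classification image features, assuming a pretrained VGG-16 from torchvision (the specific CNN and the preprocessing constants are illustrative assumptions, not confirmed from the slides):

```python
import torch
from torchvision import models, transforms

# Pretrained object-recognition CNN; VGG-16 is an assumption here.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg.eval()
# Drop the final (classification) layer, keeping the 4096-d
# pre-classification activations as the image representation.
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def image_features(pil_image):
    with torch.no_grad():
        x = preprocess(pil_image).unsqueeze(0)  # add batch dimension
        return vgg(x).squeeze(0)                # 4096-d feature vector
```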

  8. Speech model  Input: MFCC  Subsampling CNN  Recurrent Highway Network (Zilly et al 2016)  Attention
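A simplified sketch of this pipeline (PyTorch; a GRU stands in for the Recurrent Highway Network of Zilly et al. 2016, which has no stock PyTorch module, and all layer sizes are illustrative rather than the paper's settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechEncoder(nn.Module):
    """Subsampling convolution over MFCC frames, a recurrent stack,
    and attention pooling into a single utterance embedding."""
    def __init__(self, n_mfcc=13, hidden=512, emb=512, layers=4):
        super().__init__()
        # Strided 1-d convolution subsamples the frame sequence.
        self.conv = nn.Conv1d(n_mfcc, hidden, kernel_size=6, stride=3)
        self.rnn = nn.GRU(hidden, hidden, num_layers=layers,
                          batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # scalar weight per time step
        self.proj = nn.Linear(hidden, emb)

    def forward(self, mfcc):               # mfcc: (batch, time, n_mfcc)
        x = self.conv(mfcc.transpose(1, 2)).transpose(1, 2)
        h, _ = self.rnn(x)                 # (batch, time', hidden)
        w = torch.softmax(self.attn(h), dim=1)
        pooled = (w * h).sum(dim=1)        # attention-weighted sum
        return F.normalize(self.proj(pooled), dim=1)
```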

  9. Model settings

  10. Image retrieval  Results on Flickr8K and MSCOCO  Comparison systems with a newer CNN architecture: Harwath et al 2016 (NIPS), Harwath and Glass 2017 (ACL)

  11. Levels of representation  What aspects of sentences are encoded?  Which layers encode form, which encode meaning?  Auxiliary tasks (Adi et al 2017)

  12. Form-related aspects  Use activation vectors to decode  Utterance length in words  Presence of specific words

  13. Number of words  Input: activations for utterance  Model: linear regression
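A sketch of this probe with scikit-learn, assuming precomputed per-utterance activation vectors and word counts (the array names and the train/test split are illustrative):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# acts: (n_utterances, dim) activation vectors for one layer;
# n_words: (n_utterances,) word counts. Both assumed precomputed.
def length_probe(acts, n_words, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        acts, n_words, test_size=0.2, random_state=seed)
    reg = LinearRegression().fit(X_tr, y_tr)
    # Held-out R^2: how well this layer encodes utterance length.
    return reg.score(X_te, y_te)
```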

  14. Word presence  Input: activations for utterance + MFCC for word  Model: MLP
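A sketch of the word-presence probe in the same style, assuming the utterance activations and a pooled MFCC vector for each candidate word are precomputed (all names and sizes hypothetical):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def word_presence_probe(utt_acts, word_mfcc, labels):
    """Predict whether a word occurs in an utterance from the
    utterance's activation vector concatenated with a pooled MFCC
    representation of the candidate word."""
    X = np.concatenate([utt_acts, word_mfcc], axis=1)
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500)
    clf.fit(X, labels)   # labels: 1 if the word is present, else 0
    return clf
```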

  15. Semantic aspects

  16. Representational Similarity  Correlations between pairwise similarities according to activations and pairwise similarities according to:  Edit ops on written sentences  Human judgments (SICK dataset)
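A sketch of the representational similarity computation, assuming SciPy and a precomputed reference vector of pairwise dissimilarities, e.g. edit distances between written sentences (function and argument names are illustrative):

```python
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa(acts, reference_dissim):
    """Correlate pairwise dissimilarities among activation vectors
    with a reference set of pairwise dissimilarities (edit distances,
    or inverted human relatedness judgments from SICK).
    `reference_dissim` is a condensed vector as returned by pdist."""
    act_dissim = pdist(acts, metric='cosine')
    rho, _ = spearmanr(act_dissim, reference_dissim)
    return rho
```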

  17. Homonym disambiguation

  18. Follow-up work  Afra Alishahi, Marie Barking and Grzegorz Chrupała. Encoding of phonology in a recurrent neural model of grounded speech. Friday, session #4 at CoNLL

  19. Conclusion  Encodings of form and meaning emerge and evolve in the hidden layers of a stacked RHN listening to grounded speech  Code: github.com/gchrupala/visually-grounded-speech  Data: doi.org/10.5281/zenodo.400926

  20. Error analysis  Text usually better  Speech better with long descriptions and misspellings (e.g. "a yellow and white birtd is in flight")

  21. Length

  22. Text model  Convolution replaced by word embedding lookup  No attention
