Merging language and vision modalities: Last year's work
Raffaella Bernardi
University of Trento
November 2017
Last time
Last time we introduced the first computational work on integrating language and vision. Today, we look at new tasks that have been proposed more recently.
Layout
1. Cross Modal Mapping
2. Visual Phrases
3. Tasks
4. Intermezzo
5. Find their limitations
6. New task: Visual Reasoning
7. Others
8. Conclusion
Cross-modal mapping: Generalization
Angeliki Lazaridou, Elia Bruni and Marco Baroni (ACL 2014).
Transferring knowledge acquired in one modality to the other: learn to project one space into the other, e.g., from the visual space onto the language space.
Two tasks: Zero-Shot Learning and Fast Mapping.
In both tasks, the projected vector of the unseen concept is labeled with the word associated to its cosine-based nearest neighbor vector in the corresponding semantic space.
Zero-Shot Learning: the task
[Figure: illustration of the zero-shot learning task]
Zero-Shot Learning
Learn a classifier X → Y, s.t. X are images and Y are language vectors. Label an image of an unseen concept with the word associated to its cosine-based nearest neighbor vector in the language space.
For a subset of concepts (e.g., a set of animals, a set of vehicles), we possess both their linguistic and visual representations. During training, this cross-modal vocabulary is used to induce a projection function, which intuitively represents a mapping between visual and linguistic dimensions. Thus, given a visual vector, this function returns its corresponding linguistic representation.
At test time, the system is presented with a previously unseen object (e.g., wampimuk). This object is projected onto the linguistic space and associated with the word label of the nearest neighbor in that space (which contains all the unseen and seen concepts).
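The labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's code; the function and variable names are assumptions.

```python
import numpy as np

def cosine_nearest_label(projected, lexicon):
    """Label a projected visual vector with the word whose language-space
    vector is its cosine-based nearest neighbor.

    `lexicon` maps words (for both seen and unseen concepts) to their
    language vectors.
    """
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(lexicon, key=lambda w: cos(projected, lexicon[w]))
```

Because labeling searches the full lexicon, including the seen training concepts, a good mapping must place the projected unseen vector closer to the correct unseen word than to any familiar word.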
Zero-shot learning: linear mapping
[Figure: the linear projection from visual onto linguistic space]
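A linear projection function of this kind can be estimated by least squares on the training concepts: rows of V are visual vectors, rows of T the corresponding text vectors, and we learn W minimizing ||VW − T||² plus a ridge penalty. This is a sketch under those assumptions; names are illustrative, not from the paper's code.

```python
import numpy as np

def fit_linear_map(V, T, lam=0.1):
    """Ridge-regularized least squares: W = (V^T V + lam I)^{-1} V^T T."""
    d = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(d), V.T @ T)

def project(v, W):
    """Map a visual vector (or a matrix of them) into the language space."""
    return v @ W
```

At test time, `project` is applied to the visual vector of the unseen concept, and the result is labeled by cosine nearest neighbor in the language space.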
Zero-shot learning: example
[Figure: worked example of zero-shot labeling]
Dataset
[Figure: dataset used in the experiments]
Cross Modal Mapping: Fast Mapping
Fast Mapping
Learn a word vector from just a few sentences, then associate it to the referring image via its cosine-based nearest neighbor vector in the visual space.
The fast mapping setting can be seen as a special case of the zero-shot task. Whereas in the latter the system assumes that all concepts have rich linguistic representations (i.e., representations estimated from a large corpus), in the former, new concepts are assumed to be encountered in a limited linguistic context and therefore to lack rich linguistic representations. This is operationalized by constructing the text-based vector for these concepts from a context of just a few occurrences. In this way, we simulate the first encounter of a learner with a concept that is new in both visual and linguistic terms.
New paper: Multimodal semantic learning from child-directed input. Angeliki Lazaridou, Grzegorz Chrupala, Raquel Fernandez and Marco Baroni. NAACL 2016 (short). http://clic.cimec.unitn.it/marco/publications/lazaridou-etal-multimodal-learning-from-cdi-naacl2016.pdf
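The fast-mapping procedure above can be sketched as: build a text vector for the new word from its few observed contexts (here, simply by averaging the vectors of its context words), project it into the visual space with a learned text-to-vision matrix, and pick the nearest image by cosine. All names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fast_map(contexts, word_vecs, W_tv, image_vecs):
    """Associate a newly encountered word with its referring image.

    contexts:   a few sentences (lists of tokens) containing the new word
    word_vecs:  known text vectors for context words
    W_tv:       learned text-to-vision projection matrix
    image_vecs: candidate image representations, name -> vector
    """
    # Text vector from a handful of occurrences: average the context words.
    ctx = [word_vecs[w] for sent in contexts for w in sent if w in word_vecs]
    t = np.mean(ctx, axis=0)
    v = t @ W_tv  # project onto the visual space
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(image_vecs, key=lambda name: cos(v, image_vecs[name]))
```

The only difference from the zero-shot pipeline is the impoverished text vector: the projection and the nearest-neighbor search are the same, just in the opposite direction (language onto vision).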