Multimodal Machine Translation
Lucia Specia, University of Sheffield (USFD)
l.specia@sheffield.ac.uk
MTM - Lisbon, 1 Sept 2017
A wall divided the city.
→ Eine Wand teilte die Stadt.
→ Eine Mauer teilte die Stadt.
Both are grammatical German translations, but German distinguishes "Wand" (an interior wall of a building) from "Mauer" (a free-standing wall, as in the Berlin Wall): the right choice depends on context that an accompanying image could supply.
Overview
1. Problem definition
2. Background
   - Language grounding
   - Computer Vision
3. Multimodal Machine Translation
4. General framework
5. How well do MMT systems perform?
6. On-going work
7. Examples in MMT
8. Remarks
Scope
Machine Translation, Text Summarisation, Text Simplification (Natural Language Generation)
Hypothesis
Humans use many more cues than just text when making sense of the world and performing tasks.
An image can contribute in cases of:
- Ambiguity (lexical, gender, syntactic)
- Vagueness
- Out-of-vocabulary words (OOV)
- Relevance, etc.
Vision & language is a very active area:
- Annual workshops since 2011
- Tutorials since 2013
- Summer schools since 2015, etc.
Background
Work on language grounding: images represent a model of perception of the world.
- Train a CNN on an object recognition task, e.g. [Xu et al., 2015]
- Do a forward pass given an image input
- Use one or more layers (e.g. FC7, CONV5) or the output for the language task
Image from the (Elliott et al., ACL16) tutorial on Multimodal Learning and Reasoning
Background - Language grounding
Representational grounded (lexical) semantics: multimodal semantics to represent the meaning of a word. Method: fusion.
Referential grounded (lexical) semantics: cross-modal semantics to determine the referent a word denotes. Method: mapping.
Images from the (Elliott et al., ACL16) tutorial on Multimodal Learning and Reasoning
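The fusion/mapping contrast can be made concrete with a toy NumPy sketch. All dimensions and the random "embeddings" are illustrative stand-ins (300-d text, 4096-d FC7-style image features), not from any cited system, and the projection matrix is random where a real system would train it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real embeddings.
word_vec = rng.normal(size=300)    # distributional embedding of a word
image_vec = rng.normal(size=4096)  # CNN feature of an image of its referent

# Fusion (representational grounding): build ONE multimodal meaning
# representation, here by simply concatenating the two modalities.
multimodal = np.concatenate([word_vec, image_vec])  # 4396-d word meaning

# Mapping (referential grounding): a learned cross-modal projection W takes
# the word into visual space; in practice W is trained on word/image pairs
# (e.g. by ridge regression) -- random here for illustration only.
W = rng.normal(size=(4096, 300)) * 0.01
projected = W @ word_vec  # word vector mapped into image-feature space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Referent selection: the candidate image closest to the projection.
candidates = [image_vec, rng.normal(size=4096)]
referent = max(range(len(candidates)),
               key=lambda i: cosine(projected, candidates[i]))
```

Fusion changes what the word's representation *is*; mapping leaves the two spaces separate and learns a bridge between them, which is what referent resolution needs.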
Background - Referential grounding
Idea of mapping:
Images from the (Elliott et al., ACL16) tutorial on Multimodal Learning and Reasoning
Background
Monolingual work in Computer Vision: image captioning
Images from the (Elliott et al., ACL16) tutorial on Multimodal Learning and Reasoning
Background - Computer Vision
- Visual question answering
- Video captioning
- Scene description, etc.
Images from the (Elliott et al., ACL16) tutorial on Multimodal Learning and Reasoning
Multimodal Machine Translation
1. Given a text which has one or more images associated with it,
2. find alignments (i.e. mappings), and
3. use the grounded language as part of a translation model.
Challenges
1. Object detection is not perfect and is strongly biased towards objects seen in training
2. Mapping models only work well enough in closed domains
3. No obvious way to encode sparse image information along with language models
4. No multimodal dataset large enough to train translation models

Solutions:
- Translate image description datasets
- Use dense, low-level intermediate-layer CNN features
Challenges - Object detection
ImageNet: an image database organised according to the WordNet hierarchy (nouns)
- Synsets (or object "categories"): 21,841
- Number of images: 14,197,122 (about 650 per synset on average)
- Images with bounding-box annotations: 1,034,908
In practice, models trained on the 1,000 object categories from the ILSVRC shared tasks are used [Russakovsky et al., 2015]
Challenges - Object detection
Top-10 easiest categories to predict [Russakovsky et al., 2014] from ImageNet (ILSVRC)
Challenges - Datasets
- General texts make mapping too complex
- Use sentences that are descriptions: image captioning datasets
- Evidence that image description generation is "good enough"
- Monolingual datasets exist which can be extended to other languages
Challenge - Dataset creation
32.5K images with English descriptions and professional German/French translations of the English, from Flickr30K [Elliott et al., 2016, Elliott et al., 2017]

  Split         Sentences and images
  Training              29,000
  Development            1,014
  Test2016               1,000
  Test2017               1,000
  TestCOCO                 461
Challenge - Dataset creation
Flickr30K
Challenge - Dataset creation
Ambiguous COCO (from Verse [Gella et al., 2016])
General framework
Sequence-to-sequence (encoder-decoder) neural net models
- Visual information: dense, low-level feature vectors (intermediate CNN layers)
- Less common: sparse object categories (CNN output)
- Basic method: visual information used to initialise the encoder, the decoder, or both, or concatenated with the word representations at each time step
General framework
NMT → MMT
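One variant of the basic method, initialising the decoder from the image vector, can be sketched in plain PyTorch. Layer sizes, names, and the single-GRU architecture below are illustrative assumptions, not the design of any published MMT system.

```python
import torch
import torch.nn as nn

class ImageInitDecoder(nn.Module):
    """Toy target-language decoder whose initial hidden state is computed
    from the image feature vector (sizes are illustrative only)."""

    def __init__(self, vocab=1000, emb=64, hid=128, img_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.img_to_h0 = nn.Linear(img_dim, hid)  # CNN feature -> initial state
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, prev_tokens, image_feat):
        # Ground the decoder: its first hidden state encodes the image,
        # so every generated word is conditioned on the visual input.
        h0 = torch.tanh(self.img_to_h0(image_feat)).unsqueeze(0)  # 1 x B x hid
        states, _ = self.gru(self.embed(prev_tokens), h0)
        return self.out(states)  # B x T x vocab logits
```

The other variants mentioned above follow the same pattern: project the image vector to the encoder's state size instead, or tile it across time steps and concatenate it with each word embedding before the recurrent layer.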