Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio
University of Montreal and University of Toronto
Presented by: Hannah Li, Sivaraman K S
Introduction
We can already segment, localize, and categorize objects in images fairly easily. However, interpreting the image as a whole is more difficult. Goal of this work: generate captions for images using an attention mechanism.
Related Work: Generating Image Captions
- Recurrent neural networks (Cho et al., 2014; Bahdanau et al., 2014; Sutskever et al., 2014)
- LSTMs for videos and images (Vinyals et al., 2014; Donahue et al., 2014)
- Joint CNN-RNN models with object detection (Karpathy & Li, 2014; Fang et al., 2014)
- Attention (Larochelle & Hinton, 2010)
Model Overview
The model generates a caption y as a sequence of 1-of-K encoded words.
Encoder: Convolutional Features
Goal: take a raw image as input and produce a set of feature vectors (annotation vectors). A convolutional layer produces L vectors, each a D-dimensional representation corresponding to a part of the image.
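The encoder's output can be sketched in a few lines of NumPy. The shapes below follow the paper's setup (a 14x14 grid of 512-dimensional conv features, giving L = 196 annotation vectors); the random array is a stand-in for real CNN activations.

```python
import numpy as np

# Hypothetical stand-in for a lower convolutional layer's activations:
# a 14x14 spatial grid of 512-dimensional feature vectors.
H, W, D = 14, 14, 512
feature_map = np.random.randn(H, W, D)

# Flatten the spatial grid into L = H*W annotation vectors a_1..a_L,
# each describing one region of the image.
L = H * W
annotations = feature_map.reshape(L, D)
print(annotations.shape)  # (196, 512)
```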
Decoder: Long Short-Term Memory Network
Input, forget, and output gates; memory cell; and hidden state. W, U, Z: weight matrices; b: bias vectors; E: an embedding matrix; z_t: the context vector representing the relevant part of the image at time t.
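A minimal NumPy sketch of one decoder step may help make the gate equations concrete. The dimensions, initialization, and variable names here are illustrative, not the paper's; the key point is that each gate combines the word embedding (via W), the previous hidden state (via U), and the context vector z_t (via Z).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions (hypothetical): embedding m, hidden n, context D.
m, n, D = 8, 16, 32
rng = np.random.default_rng(0)
# One weight set per gate: input (i), forget (f), output (o), candidate (g).
W = {g: rng.standard_normal((n, m)) * 0.1 for g in "ifog"}
U = {g: rng.standard_normal((n, n)) * 0.1 for g in "ifog"}
Z = {g: rng.standard_normal((n, D)) * 0.1 for g in "ifog"}
b = {g: np.zeros(n) for g in "ifog"}

def lstm_step(Ey, h_prev, c_prev, z_t):
    """One decoder step: gates conditioned on embedding, prior state, context."""
    pre = {g: W[g] @ Ey + U[g] @ h_prev + Z[g] @ z_t + b[g] for g in "ifog"}
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    g = np.tanh(pre["g"])
    c = f * c_prev + i * g   # new memory cell
    h = o * np.tanh(c)       # new hidden state
    return h, c

h, c = lstm_step(rng.standard_normal(m), np.zeros(n), np.zeros(n),
                 rng.standard_normal(D))
print(h.shape, c.shape)  # (16,) (16,)
```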
Decoder: Long Short-Term Memory Network (continued)
Gates use a logistic sigmoid activation; a deep output layer computes the output word probability. Stochastic attention: alpha_{t,i} is the probability that location i is the correct place to focus for producing the next word. Deterministic attention: alpha_{t,i} is the relative importance to give to location i when blending the a_i's together.
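The attention weights alpha_{t,i} come from a small scoring network followed by a softmax. The sketch below assumes a single-hidden-layer MLP as the scoring function f_att; the parameter names (Wa, Wh, v) are hypothetical, but the softmax normalization is exactly what makes the alphas a distribution over locations.

```python
import numpy as np

rng = np.random.default_rng(1)
L, D, n = 196, 512, 16
a = rng.standard_normal((L, D))   # annotation vectors from the encoder
h_prev = rng.standard_normal(n)   # previous decoder hidden state

# Hypothetical parameters of the attention MLP f_att.
Wa = rng.standard_normal((n, D)) * 0.01
Wh = rng.standard_normal((n, n)) * 0.01
v = rng.standard_normal(n) * 0.01

# Unnormalized scores e_{t,i} = f_att(a_i, h_{t-1}), then a softmax
# to get alpha_{t,i}: one weight per image location at step t.
e = np.tanh(a @ Wa.T + Wh @ h_prev) @ v
alpha = np.exp(e - e.max())
alpha /= alpha.sum()
print(round(alpha.sum(), 6))  # 1.0
```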
Hard Attention
The context vector z_t is computed from a one-hot location variable s_{t,i} and the extracted features a_i. s_t indicates where the model decides to focus when generating the t-th word. Stochastic: s_t is assigned a Multinoulli distribution parameterized by the attention weights. Trained with a sampling-based method (maximizing a variational lower bound on the log-likelihood).
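Sampling the attention location can be sketched as below. The Dirichlet draw is just a stand-in for the alphas a real model would produce; the point is that a single one-hot s_t is sampled, so the context equals exactly one annotation vector.

```python
import numpy as np

rng = np.random.default_rng(2)
L, D = 196, 512
a = rng.standard_normal((L, D))
alpha = rng.dirichlet(np.ones(L))   # stand-in attention distribution

# Hard attention: sample a one-hot location s_t ~ Multinoulli(alpha)
# and take the annotation at that single location as the context.
i = rng.choice(L, p=alpha)
s_t = np.zeros(L)
s_t[i] = 1.0
z_t = s_t @ a   # equals a[i]
print(np.allclose(z_t, a[i]))  # True
```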
Soft Attention
The context vector is taken as the expectation: z_t = sum_i alpha_{t,i} a_i. Trained end-to-end with standard backpropagation. Deterministic: the whole distribution is optimized rather than single choices (no s_t is sampled).
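The soft-attention context vector is a one-line weighted sum; as in the previous sketch, the Dirichlet draw stands in for learned attention weights.

```python
import numpy as np

rng = np.random.default_rng(3)
L, D = 196, 512
a = rng.standard_normal((L, D))
alpha = rng.dirichlet(np.ones(L))   # stand-in attention distribution

# Soft attention: no sampling; the context is the expectation
# z_t = sum_i alpha_i * a_i, which is differentiable end-to-end.
z_t = alpha @ a
print(z_t.shape)  # (512,)
```

Because z_t is a smooth function of the alphas, gradients flow through the attention weights directly, which is why soft attention trains with plain backpropagation while hard attention needs a sampling-based estimator.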
Training
The attention framework learns latent alignments from scratch instead of relying on explicit object detectors. This allows the model to go beyond "objectness" and learn to attend to abstract concepts.
Datasets
Flickr8k and Flickr30k: 5 reference captions per image. MS COCO: captions in excess of 5 per image were discarded. Basic tokenization was applied, with a fixed vocabulary size of 10K.
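The preprocessing described above (tokenize, keep a fixed-size vocabulary, map everything else to an unknown token) can be sketched with the standard library. The captions, vocabulary size, and token names here are toy stand-ins for the paper's pipeline.

```python
from collections import Counter

# Toy caption data (hypothetical); the paper uses 10K words, we use 6 here.
captions = ["a dog runs on the grass", "a cat sits on the mat"]
VOCAB_SIZE = 6

# Count word frequencies across all captions after basic tokenization.
counts = Counter(w for c in captions for w in c.lower().split())

# Keep the most frequent words; everything else maps to <unk> (index 0).
vocab = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(VOCAB_SIZE))}
vocab["<unk>"] = 0

def encode(caption):
    """Map a caption to a list of word indices, unknown words to 0."""
    return [vocab.get(w, 0) for w in caption.lower().split()]

print(encode("a dog sits"))
```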
Results
1. Significantly improves state-of-the-art METEOR performance on MS COCO.
2. More flexibility: the model can attend to salient regions that are not objects.
Source: http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/12/Screen-Shot-2015-12-30-at-1.42.58-PM.png
Analysis of Learning to Attend
Mistakes
References
- https://arxiv.org/pdf/1502.03044.pdf
- http://kelvinxu.github.io/projects/capgen.html
- http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
- https://blog.heuritech.com/2016/01/20/attention-mechanism/