Image Captioning Describe an image with meaningful and sensible - PowerPoint PPT Presentation

23rd International Conference on MultiMedia Modeling (MMM 2017)
What Convnets Make for Image Captioning?
Yu Liu*, Yanming Guo*, and Michael S. Lew
Leiden Institute of Advanced Computer Science, Leiden University
Presenter: Yanming Guo
Discover the world at Leiden University

Image Captioning
Describe an image with meaningful and sensible sentence-level captions covering:
- Objects
- Actions
- Descriptive words
- Relations
- …
Example: "A large bus sitting next to a very tall building"

Image Captioning
- Retrieval approaches: map images to predefined sentences
- Generative approaches: generate novel sentences
Example captions: "A white dog and a brown dog run along side each other at the beach"; "A dog running on a wet suit on the beach"
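A retrieval approach can be pictured as nearest-neighbour search: embed the image and every predefined sentence in a shared space, then return the closest sentence. The sketch below assumes such embeddings already exist; the function names, the toy three-dimensional vectors, and the captions in the bank are hypothetical, and cosine similarity is only one common choice of score.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_caption(image_embedding, caption_bank):
    """Return the predefined caption whose embedding scores highest against the image."""
    best_caption, _ = max(caption_bank, key=lambda item: cosine(image_embedding, item[1]))
    return best_caption

# Hypothetical caption bank: (sentence, embedding) pairs in a shared 3-d space.
caption_bank = [
    ("Two dogs running on the beach", [0.9, 0.1, 0.0]),
    ("A large bus sitting next to a very tall building", [0.0, 0.2, 0.95]),
]
print(retrieve_caption([0.8, 0.2, 0.1], caption_bank))  # Two dogs running on the beach
```

The limitation is built in: the output is always one of the stored sentences, which is the contrast with generative approaches.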

Advantages of generative approaches:
- The caption does not have to have been seen before
- A good language model
- More intelligent
- Better performance

General Structure
A CNN extracts high-level image features; an RNN then generates a sentence word by word. Starting from START, each generated word ("White", "Cup", …) is fed back as the next input until END is produced.
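The CNN-RNN generation loop on this slide can be sketched with greedy decoding. Here `toy_step` is a hypothetical lookup table standing in for one RNN step (a real LSTM would also carry a hidden state and condition on the image feature, which the toy ignores); `greedy_caption` shows only the control flow: start from START, pick the most probable next word, feed it back, and stop at END.

```python
START, END = "<START>", "<END>"

def toy_step(image_feature, prev_word):
    """Hypothetical stand-in for one RNN step: returns a distribution over the
    next word given the previous word (image_feature is ignored in this toy)."""
    table = {
        START: {"White": 0.7, "Cup": 0.2, END: 0.1},
        "White": {"Cup": 0.8, "White": 0.1, END: 0.1},
        "Cup": {END: 0.9, "White": 0.05, "Cup": 0.05},
    }
    return table[prev_word]

def greedy_caption(image_feature, step, max_len=20):
    """Generate a caption one word at a time, always taking the most probable word."""
    words, prev = [], START
    for _ in range(max_len):
        probs = step(image_feature, prev)
        prev = max(probs, key=probs.get)  # greedy choice
        if prev == END:
            break
        words.append(prev)  # the chosen word becomes the next input
    return words

print(" ".join(greedy_caption(image_feature=None, step=toy_step)))  # White Cup
```

Beam search is a common alternative to the greedy choice; the slides do not say which decoding strategy was used.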

General Structure
Within the same CNN-RNN structure (START, "White", "Cup", …, END), the question this work asks is: what Convnets make for image captioning?

Three types of Convnets
- Single-label Convnet (generic representation): a Convnet pre-trained on the ImageNet dataset, e.g. AlexNet, VGG, …
- Multi-label Convnet (salient objects): the Convnet fine-tuned on the 80 object categories of MS COCO
- Multi-attribute Convnet (salient objects, actions, relations, …): the Convnet fine-tuned on attributes of MS COCO (e.g. 300 attributes)
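One way to see the difference between the three Convnets is the training objective. The slides do not state the losses used; the sketch below assumes the usual choices: a softmax cross-entropy for single-label ImageNet classification, where each image has exactly one class, and an independent sigmoid (binary) cross-entropy per label for multi-label or multi-attribute fine-tuning, where several labels can be present at once. Function names are illustrative.

```python
import math

def softmax_cross_entropy(logits, target):
    """Single-label loss: exactly one correct class per image (e.g. ImageNet)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sigmoid_bce(logits, targets):
    """Multi-label loss: every label (object category or attribute) is an
    independent yes/no decision, so several can be positive at once."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Stable form of -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Single-label: one class index. Multi-label / multi-attribute: a multi-hot vector.
print(softmax_cross_entropy([2.0, 0.5, -1.0], target=0))
print(sigmoid_bce([2.0, -1.0, 0.5], targets=[1, 0, 1]))
```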

Three types of Convnets
Figure: an input image and, for the single-label, multi-label, and multi-attribute Convnets, the visualization of the most activated feature map in conv5_3.

Multi-Convnet Aggregation
The single-label, multi-label, and multi-attribute features are combined into an aggregation feature ag(x), which is fed to the LSTM at every time step together with the previous word:
$q_t = \mathrm{LSTM}\big(y_{t-1}, \mathrm{ag}(x)\big), \quad t = 1, \dots, T$
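The slides do not spell out how ag(x) combines the three features. As a hypothetical sketch only, `aggregate` below L2-normalizes each Convnet's feature vector and concatenates them, and `step_inputs` pairs ag(x) with the previous word embedding at every step, matching the diagram in which ag(x) enters each LSTM step. Both the normalization and the concatenation are assumptions, not the authors' stated method.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in vec)) or 1.0
    return [a / norm for a in vec]

def aggregate(single_label, multi_label, multi_attribute):
    """Hypothetical ag(x): unit-normalize each Convnet feature, then concatenate."""
    ag = []
    for feature in (single_label, multi_label, multi_attribute):
        ag.extend(l2_normalize(feature))
    return ag

def step_inputs(ag_x, prev_word_embeddings):
    """Hypothetical LSTM input at steps t = 1..T: ag(x) concatenated with y_{t-1}."""
    return [ag_x + y_prev for y_prev in prev_word_embeddings]

# Toy features; real Convnet features have hundreds or thousands of dimensions.
ag_x = aggregate([3.0, 4.0], [1.0], [0.0, 2.0, 0.0])
inputs = step_inputs(ag_x, [[1.0, 0.0], [0.0, 1.0]])  # embeddings of y_0, y_1
print(len(ag_x), len(inputs), len(inputs[0]))  # 6 2 8
```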

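Multi-Scale Testing
The CNN processes the test image at scale 224; its weights are transferred to a fully convolutional network (FCN), which processes the same image at scales 256 and 320. The outputs of all scales are averaged and passed to the LSTM as input x_t for caption generation.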
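Why can an FCN accept the larger test scales? Assuming a VGG-16-style network (one of the examples named earlier) whose fully connected fc6 layer is recast as a 7 × 7 convolution over stride-32 conv5 features, the output grid grows with the input size. The helper names are hypothetical, and the spatial average pooling before averaging across scales is an assumption; the slides only say the scales are averaged.

```python
def fcn_output_size(scale, stride=32, fc6_kernel=7):
    """Spatial size of the converted fc layers' output for a square input,
    assuming stride-32 conv5 features and fc6 recast as a 7x7 convolution."""
    conv5 = scale // stride
    return conv5 - fc6_kernel + 1

def average_pool(feature_map):
    """Average an H x W grid of feature vectors into a single vector."""
    cells = [vec for row in feature_map for vec in row]
    dim = len(cells[0])
    return [sum(vec[d] for vec in cells) / len(cells) for d in range(dim)]

def multi_scale_feature(feature_maps):
    """Pool each scale's map spatially, then average the pooled vectors across scales."""
    pooled = [average_pool(fm) for fm in feature_maps]
    dim = len(pooled[0])
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]

for s in (224, 256, 320):
    n = fcn_output_size(s)
    print(f"scale {s}: {n}x{n} output")
# scale 224: 1x1 output
# scale 256: 2x2 output
# scale 320: 4x4 output
```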

Experiments
Evaluation metrics:
- BLEU: measures the n-gram precision between the generated and reference sentences (B-1, B-2, B-3, B-4 for 1- to 4-grams).
- METEOR: computed from an alignment between the words of the generated and reference sentences.
- ROUGE-L: based on the longest common subsequence, i.e. the words that appear in the same order in both sentences.
- CIDEr: weights each n-gram by tf-idf when comparing the generated and reference sentences.
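Two of these metrics are simple enough to sketch directly. The code below is a simplified, illustrative version of sentence-level BLEU (clipped n-gram precision with a brevity penalty, no smoothing) and ROUGE-L (LCS-based F-score, assuming β = 1.2 as in the common COCO caption evaluation code). Reported scores come from the official MS COCO evaluation toolkit and will differ in tokenization and corpus-level aggregation; CIDEr is omitted.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it appears in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        max_ref |= ngrams(ref, n)  # element-wise max of counts
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-N: brevity penalty times the geometric mean of p_1..p_N."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision gives 0
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def lcs_length(a, b):
    """Length of the longest common subsequence (words in the same order)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score from LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(candidate), lcs / len(reference)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

cand = "a man and a dog on a small boat".split()
ref = "a man and a dog on a small yellow boat".split()
print(bleu(cand, [ref]), rouge_l(cand, ref))
```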

Experiments
- Multi-scale testing: considerable improvement
- SL-Net (single-label Convnet): largest feature dimension and worst performance
- ML-Net (multi-label Convnet): smallest feature dimension and considerable improvement
- MA-Net (multi-attribute Convnet): medium feature dimension and significant improvement

Experiments
- Multi-scale testing using the FCN is always better.
- Aggregating the different Convnets can enhance performance.

Experiments
Qualitative example:
- Single-label Convnet: "A man is sitting on the water with a surfboard."
- Multi-label Convnet: "A man sitting on a boat in front of a boat."
- Multi-attribute Convnet: "A man and a dog on a boat."
- Multi-Convnet aggregation: "A man and a dog on a small boat."
- Ground truth: "A man and a dog on a small yellow boat."


Experiments
More examples (Ours = generated caption, GT = ground truth):
- Ours: "A man riding a wave in the ocean." GT: "A man riding a wave on a surfboard in the ocean."
- Ours: "A living room with a lot of furniture." GT: "Living room with furniture with garage door at one end."
- Ours: "A man riding a horse at a horse." GT: "A man getting a kiss on the neck from an elephant's trunk"
- Ours: "A close up of an elephant with an elephant" GT: "A horse that threw a man off a horse."

Conclusion
- The multi-attribute Convnet performs best of the three Convnets for image captioning.
- Aggregating the different Convnets delivers slightly better performance than each individual Convnet.
- Efficient multi-scale test-time augmentation using FCNs.
- Results comparable with the state of the art.

Thank you for your attention! Questions?
