Deep Image-Text Embeddings: Learning Deep Structure-Preserving Image-Text Embeddings (PowerPoint presentation)

CS688 Paper Presentation 1
Deep Image-Text Embeddings: Learning Deep Structure-Preserving Image-Text Embeddings (CVPR 2016)
Woobin Im (임우빈), 2016-11-08

Sentence-to-image Retrieval
● The user sends a query text to the retrieval system, e.g. "A cat next to a blue chair and a deck"
● The system returns a result image

Image-to-sentence Retrieval
● The user sends a query image to the retrieval system
● The system returns a result text, e.g. "A black and white cat laying on the carrying case of a computer"

Image-to-sentence Retrieval
● The result text is selected from an existing list of sentences
● Example result: "A black and white cat laying on the carrying case of a computer"

Image Description Generation
● The user sends a query image to the system
● Instead of being retrieved from a list, the result text is generated with NLP techniques, e.g. "A black and white cat laying on the carrying case of a computer"

Image-Text Embeddings
● The image representation and the text representation are each projected into a shared embedding space, where matching images and sentences should lie close together
Source: Accounting for the Relative Importance of Objects in Image Retrieval
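A minimal sketch of the projection idea in plain Python: two hypothetical linear maps (`W_img`, `W_txt`) send features of different sizes into one 2-D space, where retrieval reduces to comparing Euclidean distances. The matrices and feature values are made up for illustration; in a real system the projections are learned.

```python
import math

def project(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical projections: 3-D image features and 4-D text features
# are both mapped into the same 2-D embedding space.
W_img = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
W_txt = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]

img_feat = [2.0, 0.0, 5.0]
txt_a = [1.0, 1.0, 0.0, 0.0]   # toy "matching" sentence feature
txt_b = [0.0, 0.0, 1.0, 1.0]   # toy "non-matching" sentence feature

u = project(W_img, img_feat)
dist_a = euclidean(u, project(W_txt, txt_a))
dist_b = euclidean(u, project(W_txt, txt_b))
```

Once both modalities live in the same space, either one can serve as the query: ranking by `dist_a` vs `dist_b` is all retrieval needs.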

Examples of image-to-sentence retrieval
Source: Associating neural word embeddings with deep image representations using Fisher Vectors

Datasets
● MSCOCO, Flickr 8K, Flickr 30K, Pascal 1K …
● Each image has a few captions
● MSCOCO has object segmentation information
● Flickr30K has phrase localizations
Example from the Flickr30k Entities dataset. Source: Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

Paper
● Learning Deep Structure-Preserving Image-Text Embeddings (CVPR 2016)
● Inputs: an image feature and a text feature

Paper
● Learning Deep Structure-Preserving Image-Text Embeddings (CVPR 2016)
● Image → pretrained CNN → image feature
● Sentence → Word2vec → FV-HGLMM → text feature

Paper
● Learning Deep Structure-Preserving Image-Text Embeddings (CVPR 2016)
● Image branch: image → pretrained CNN (VGG) → image feature → fc → fc → B-norm
● Text branch: sentence → Word2vec → FV-HGLMM → PCA → text feature → fc → fc → B-norm
● The outputs of both branches feed the loss
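A minimal sketch of one embedding branch in plain Python, assuming the layout on the slide: two fully connected layers followed by "B-norm", which is read here as batch normalization. The ReLU between the fc layers, the final L2 normalization, the omission of batch norm's learned scale and shift, and all toy weights are assumptions for illustration, not the authors' code. The image branch and the text branch would each have their own weights.

```python
import math

def linear(W, b, x):
    """Fully connected layer: W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def batch_norm(batch, eps=1e-5):
    """Zero mean / unit variance per dimension over the mini-batch
    (training-mode statistics; learned scale and shift omitted)."""
    n, dims = len(batch), len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dims)]
    return [[(row[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
            for row in batch]

def l2_normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x] if norm > 0 else list(x)

def branch(batch, W1, b1, W2, b2):
    """One branch: fc -> ReLU -> fc -> batch norm -> L2 normalization."""
    hidden = [relu(linear(W1, b1, x)) for x in batch]
    out = [linear(W2, b2, h) for h in hidden]
    return [l2_normalize(z) for z in batch_norm(out)]

# Toy usage: identity weights, zero biases, a mini-batch of two 2-D features.
I2 = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
emb = branch([[1.0, 2.0], [3.0, 1.0]], I2, zeros, I2, zeros)
```

Batch statistics need more than one example, so this sketch only makes sense on mini-batches, which matches how such a network is trained.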

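Image feature extraction
● Uses a pretrained VGG-VD-19 network
● The resized image is cut into 5 crops (4 corners + center), and each crop is also flipped, giving 10 crops
● Each crop yields a 4096D feature (4096D × 10); the 10 features are averaged into the final 4096D image feature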
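The 10-crop scheme above can be sketched as follows. The crop size (224) and resized image size (256 × 256) are assumptions, not values from the slide, and the CNN itself is left out: `features` stands in for the ten 4096D vectors a real network would produce.

```python
def ten_crop_boxes(width, height, crop):
    """(left, top, flipped) for the 4 corner crops, the center crop,
    and the horizontally flipped version of each."""
    positions = [
        (0, 0),                                       # top-left
        (width - crop, 0),                            # top-right
        (0, height - crop),                           # bottom-left
        (width - crop, height - crop),                # bottom-right
        ((width - crop) // 2, (height - crop) // 2),  # center
    ]
    return [(x, y, flip) for flip in (False, True) for (x, y) in positions]

def average_features(features):
    """Element-wise mean of equally sized feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

boxes = ten_crop_boxes(256, 256, 224)
# Placeholder 2-D "features", one per crop (a real CNN would give 4096D each).
features = [[float(i), 1.0] for i in range(len(boxes))]
image_feature = average_features(features)
```

Averaging keeps the output size equal to a single crop's feature size, so the image feature stays 4096D regardless of how many crops are used.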

Text feature extraction
● Word2Vec: a semantic embedding of individual words
Source: Distributed representations of words and phrases and their compositionality
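As background, the skip-gram variant of word2vec learns word vectors by predicting the words around each center word. The first step, turning a sentence into (center, context) training pairs, can be sketched as below; the window size is a free parameter here, and the training objective itself (e.g. negative sampling) is not shown.

```python
def skipgram_pairs(tokens, window):
    """(center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["a", "cat", "sat"], window=1)
```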

Text feature extraction
● Fisher Vectors of a hybrid Gaussian-Laplacian mixture model (HGLMM) and a Gaussian mixture model (GMM)
● Each word of the sentence is mapped to its Word2Vec vector; both mixture models are trained with EM
● Each model yields an 18000D Fisher Vector; the two are concatenated and reduced with PCA to the final 6000D vector
● Based on the work "Associating neural word embeddings with deep image representations using Fisher Vectors"
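A sketch of the GMM half of this encoding: the standard Fisher Vector of a diagonal-covariance GMM, i.e. the normalized gradients of the word vectors' log-likelihood with respect to each component's means and standard deviations. The HGLMM variant and any power/L2 normalization are not shown. With D-dimensional word vectors and K components the vector has 2KD entries; assuming 300-D word2vec vectors and K = 30 (values not stated on the slide), 2 × 30 × 300 = 18000, which matches the 18000D above.

```python
import math

def log_gaussian_diag(x, mu, sigma):
    """Log density of a diagonal Gaussian."""
    return sum(-0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
               for xi, m, s in zip(x, mu, sigma))

def fisher_vector(words, weights, means, sigmas):
    """Fisher Vector of word vectors `words` under a diagonal GMM.
    Layout: for each component k, D mean-gradients then D sigma-gradients."""
    K, D, T = len(weights), len(means[0]), len(words)
    g_mu = [[0.0] * D for _ in range(K)]
    g_sig = [[0.0] * D for _ in range(K)]
    for x in words:
        # Posterior (soft assignment) of this word vector to each component.
        logp = [math.log(weights[k]) + log_gaussian_diag(x, means[k], sigmas[k])
                for k in range(K)]
        top = max(logp)
        probs = [math.exp(l - top) for l in logp]
        total = sum(probs)
        gamma = [p / total for p in probs]
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / sigmas[k][d]
                g_mu[k][d] += gamma[k] * u
                g_sig[k][d] += gamma[k] * (u * u - 1.0)
    fv = []
    for k in range(K):
        fv += [g / (T * math.sqrt(weights[k])) for g in g_mu[k]]
        fv += [g / (T * math.sqrt(2 * weights[k])) for g in g_sig[k]]
    return fv
```

Because the sums average over all words, sentences of any length map to a vector of the same fixed size, which is what makes the PCA and fc layers downstream possible.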

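Loss Calculation
● Structure-preserving triplet loss, where x: image, y: sentence, i: anchor instance, j: matching instance, k: non-matching instance, d(·,·): Euclidean distance, m: margin

$$
\begin{aligned}
L(X, Y) ={} & \sum_{i,j,k} \max\bigl[0,\; m + d(x_i, y_j) - d(x_i, y_k)\bigr] && \text{(image-sentence)} \\
&+ \lambda_1 \sum_{i,j,k} \max\bigl[0,\; m + d(y_i, x_j) - d(y_i, x_k)\bigr] && \text{(sentence-image)} \\
&+ \lambda_2 \sum_{i,j,k} \max\bigl[0,\; m + d(x_i, x_j) - d(x_i, x_k)\bigr] && \text{(image structure preserving)} \\
&+ \lambda_3 \sum_{i,j,k} \max\bigl[0,\; m + d(y_i, y_j) - d(y_i, y_k)\bigr] && \text{(text structure preserving)}
\end{aligned}
$$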

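Loss Calculation
● What is a triplet loss? It pulls an anchor toward a matching instance and pushes a non-matching instance at least a margin farther away
Source: "FaceNet: A unified embedding for face recognition and clustering"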
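A single-triplet version of the hinge can be written directly. FaceNet states its triplet loss with squared L2 distances; this sketch uses the plain Euclidean distance d(·,·) instead, to match the notation of the structure-preserving loss. The example points are arbitrary.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_hinge(anchor, positive, negative, margin):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive; otherwise the size of the violation."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))

satisfied = triplet_hinge([0.0, 0.0], [1.0, 0.0], [3.0, 0.0], margin=0.5)
violated = triplet_hinge([0.0, 0.0], [1.0, 0.0], [1.25, 0.0], margin=0.5)
```

In the first call the negative is 2 units farther than the positive, so the margin of 0.5 is met and the loss is 0; in the second it is only 0.25 farther, leaving a violation of 0.25.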

Loss Calculation
● Structure-preserving (figure: squares are images, circles are sentences)
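A brute-force sketch of the structure-preserving loss over one mini-batch. `sent_to_img[j]` (a hypothetical name) gives the image that sentence j describes; sentences of the same image are treated as neighbors. The weights default to the evaluation settings λ1 = 1 and text-structure weight 0.1; the margin value is an assumption. The image-structure term is left out because its weight is 0 when no two images share a caption. This sketch sums over every triplet in the batch rather than selecting a subset of negatives, so it is not the authors' training code.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hinge(margin, d_pos, d_neg):
    return max(0.0, margin + d_pos - d_neg)

def structure_preserving_loss(images, sentences, sent_to_img,
                              margin=0.1, lam1=1.0, lam3=0.1):
    n_img, n_sent = len(images), len(sentences)
    loss = 0.0
    # Image-sentence: an image is closer to its sentences than to other sentences.
    for i in range(n_img):
        pos = [j for j in range(n_sent) if sent_to_img[j] == i]
        neg = [k for k in range(n_sent) if sent_to_img[k] != i]
        for j in pos:
            for k in neg:
                loss += hinge(margin, euclidean(images[i], sentences[j]),
                              euclidean(images[i], sentences[k]))
    # Sentence-image (lambda_1): a sentence is closer to its image than to other images.
    for j in range(n_sent):
        i = sent_to_img[j]
        for k in range(n_img):
            if k != i:
                loss += lam1 * hinge(margin, euclidean(sentences[j], images[i]),
                                     euclidean(sentences[j], images[k]))
    # Text structure (lambda_3): sentences of the same image stay closer together
    # than sentences of different images. (Image-structure term omitted: weight 0.)
    for a in range(n_sent):
        for p in range(n_sent):
            if p == a or sent_to_img[p] != sent_to_img[a]:
                continue
            for k in range(n_sent):
                if sent_to_img[k] != sent_to_img[a]:
                    loss += lam3 * hinge(margin, euclidean(sentences[a], sentences[p]),
                                         euclidean(sentences[a], sentences[k]))
    return loss
```

When every image sits near its own captions and far from the others, all hinges are zero; collapsing everything to one point makes every hinge equal to the margin.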

Evaluation
● Tasks: image-to-sentence retrieval (given an image, find the nearest K sentences) and sentence-to-image retrieval (given a sentence, find the nearest K images)
● Distance: L2
● Datasets: MSCOCO, Flickr30K
● Metric: Recall@1, 5, 10 (ground truth: 5 captions per image)
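Recall@K for both directions can be sketched with L2 distances as below. The sketch assumes, as is common for this benchmark, that an image query counts as a hit if any of its ground-truth captions appears in the top K; `sent_to_img` is a hypothetical mapping from each sentence to the image it describes, and the toy embeddings are made up.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recall_image_to_sentence(images, sentences, sent_to_img, k):
    """Fraction of images with at least one ground-truth caption in the top k."""
    hits = 0
    for i, img in enumerate(images):
        ranked = sorted(range(len(sentences)), key=lambda j: euclidean(img, sentences[j]))
        if any(sent_to_img[j] == i for j in ranked[:k]):
            hits += 1
    return hits / len(images)

def recall_sentence_to_image(images, sentences, sent_to_img, k):
    """Fraction of sentences whose own image is in the top k."""
    hits = 0
    for j, sent in enumerate(sentences):
        ranked = sorted(range(len(images)), key=lambda i: euclidean(sent, images[i]))
        if sent_to_img[j] in ranked[:k]:
            hits += 1
    return hits / len(sentences)

images = [[0.0, 0.0], [10.0, 0.0]]
sentences = [[0.0, 1.0], [10.0, 1.0], [9.0, 0.0], [1.0, 0.0]]
sent_to_img = [0, 1, 1, 1]   # the last sentence is embedded near the wrong image
```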

Evaluation settings
● Net models
● Linear: a single linear projection (one fc layer)
● Non-linear: the fc-fc model described earlier
● Training constraints
● One-directional: sentence-image weight λ1 = 0
● Bi-directional: sentence-image weight λ1 = 1
● Structure: text-structure weight λ3 = 0.1
● Image-structure weight λ2 = 0 in all cases, because no two images share the same caption

Result (Flickr30K)
● Mean vector: the mean of the word2vec vectors in a sentence
● Tf-idf: the tf-idf text representation covered in the course
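The mean-vector baseline averages the word2vec vectors of a sentence. A sketch, assuming whitespace tokenization, lowercasing, and that out-of-vocabulary words are skipped; `word_vectors` is a hypothetical dict standing in for a real word2vec model.

```python
def mean_vector(sentence, word_vectors):
    """Average of the known words' vectors, or None if no word is known."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]

word_vectors = {"cat": [1.0, 0.0], "chair": [0.0, 1.0]}   # toy 2-D vectors
v = mean_vector("A cat next to a blue chair", word_vectors)
```

Averaging discards word order and weighs every known word equally, which is one reason richer encodings such as Fisher Vectors can do better.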

Result (MSCOCO 1K test)
● Mean vector: the mean of the word2vec vectors in a sentence
● Tf-idf: the tf-idf text representation covered in the course
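For comparison, a plain tf-idf bag-of-words representation can be sketched as below. The weighting (term count divided by sentence length, idf = log(N / df) without smoothing) is one common variant and an assumption here; the slide does not say which variant the baseline used.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Return the sorted vocabulary and one tf-idf vector per sentence."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))      # document frequency
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * idf[w] for w in vocab])
    return vocab, vectors

vocab, vectors = tfidf_vectors(["a cat", "a dog"])
```

A word that appears in every sentence, like "a" here, gets an idf of 0 and drops out entirely.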

Additional application: phrase localization on Flickr30K
● Region proposals scored with the image-text embedding
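A sketch of the localization step, assuming each region proposal has already been cropped, passed through the image branch, and embedded, and the phrase has been embedded by the text branch. The proposal method is not specified on the slide; the box format (x0, y0, x1, y1) and the toy values are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def localize_phrase(phrase_embedding, regions):
    """regions: list of (box, embedding). Return the box nearest to the phrase."""
    best_box, _ = min(regions, key=lambda r: euclidean(phrase_embedding, r[1]))
    return best_box

regions = [((0, 0, 50, 50), [1.0, 0.0]),
           ((40, 10, 120, 90), [0.0, 1.0])]
box = localize_phrase([0.1, 0.9], regions)
```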

Summary
● Image-to-text and text-to-image retrieval by embedding both modalities into one space
● Image feature: pretrained CNN
● Text feature: word2vec + HGLMM + FV
● Loss: structure-preserving triplet loss
● Tests: image-to-text and text-to-image retrieval; phrase localization

Q&A

Paper: Liwei Wang, Yin Li, Svetlana Lazebnik. Learning Deep Structure-Preserving Image-Text Embeddings. CVPR 2016.
