Discriminative Bimodal Networks for Visual Localization and - PowerPoint PPT Presentation

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee
University of Michigan, Ann Arbor

Detection with natural language queries
Example queries:
• a car
• a doorway with an arched entryway
• a small domed roof
• a tree with bare branches
• large white multi level building
• light in the roof of building
Detection results from our work. Detection: Boxes with SOLID edges. Ground truth: Semi-transparent boxes with DASHED edges.

Typical previous works (based on captioning)
[Figure: an RNN unrolled over the phrase. Inputs: the image region $x$, then "start white dog with black spots". Outputs: "white dog with black spots end".]
$P(t \mid x) = P(t_1 \mid \cdots) \cdot P(t_2 \mid \cdots) \cdot P(t_3 \mid \cdots) \cdot P(t_4 \mid \cdots) \cdot P(t_5 \mid \cdots) \cdot P(t_{\mathrm{end}} \mid \cdots)$
$t = (t_1, t_2, t_3, t_4, t_5)$ = white dog with black spots
• Based on generative models for image captioning.
• The posterior probability in the huge language space is hard to model.
• Only positive training samples (matched box and text) are used,
• or a limited number of negative training samples (mismatched box and text).
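The word-by-word factorization on this slide can be illustrated with a few lines of Python. This is a minimal sketch, not the code of any captioning system: the function name `phrase_log_prob` is hypothetical, and the per-step probabilities are made-up stand-ins for what an RNN decoder would output at each step.

```python
import math

def phrase_log_prob(step_probs):
    """Sum the log of each per-word conditional probability.

    step_probs[i] stands for P(word_i | previous words, image region),
    with the last entry being the probability of the end token.
    """
    if any(p <= 0.0 or p > 1.0 for p in step_probs):
        raise ValueError("each step probability must be in (0, 1]")
    return sum(math.log(p) for p in step_probs)

# Hypothetical decoder outputs for "white dog with black spots <end>".
step_probs = [0.2, 0.5, 0.4, 0.3, 0.6, 0.9]
log_p = phrase_log_prob(step_probs)
phrase_prob = math.exp(log_p)  # product of the six conditionals
```

Because the phrase probability is a product of many factors below 1, it shrinks quickly with phrase length, which is one way to see why scores from a generative model are hard to compare across phrases.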

Discriminative Bimodal Networks (DBNet)
• Fully discriminative: matching probability
• A classifier to model a binary output
• Extensive use of negative text-box pairs
[Figure: a CNN scores text-box pairs. Example phrases: "white dog with black spots", "dog with ball in his mouth", ⋯, "black leather chair". Matched pairs have label 1 and probability $P(l = 1 \mid x, t)$; mismatched pairs have label 0 and probability $P(l = 0 \mid x, t)$. Legend: positive image region, positive phrase, negative image region, negative phrase.]

Discriminative Bimodal Networks (DBNet)
[Figure: network architecture. Image pathway: image region, Fast R-CNN, image feature. Text pathway: text phrase, text CNN, text feature, FC layer. Output: classifier, detection score. The text-box pair examples of the previous slide are shown again.]
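One way to read the two-pathway design is sketched below. The slide only names the components, so the wiring here is an assumption: an FC layer turns the text feature into classifier weights, which are applied to the image feature to give a matching probability, trained with a binary cross-entropy loss on positive and negative pairs. All function names, feature sizes, and values are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def match_probability(image_feat, text_feat, fc_weights, fc_bias):
    """Probability that a region and a phrase match (assumed wiring).

    fc_weights has one row per image-feature dimension; each row maps
    the text feature to one classifier weight.
    """
    classifier_w = [
        sum(w * t for w, t in zip(row, text_feat)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    score = sum(w * x for w, x in zip(classifier_w, image_feat))
    return sigmoid(score)

def binary_cross_entropy(prob, label):
    """Loss for one text-box pair; label is 1 (matched) or 0 (mismatched)."""
    eps = 1e-12
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1.0 - prob + eps))

# Toy features: 2-d image feature, 1-d text feature (hypothetical values).
prob = match_probability([1.0, 0.0], [1.0], [[2.0], [0.0]], [0.0, 0.0])
loss_if_positive = binary_cross_entropy(prob, 1)
loss_if_negative = binary_cross_entropy(prob, 0)
```

In this toy case the pair scores high, so treating it as a negative pair costs more than treating it as a positive one. That is the signal a discriminative model gets from negative text-box pairs.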

DBNet: Training labels for text-box pairs
• Spatial overlapping based labeling
• Text similarity based augmentation of uncertain phrases
[Figure: a training box and the phrases annotated in the image, each with its spatial overlap value. Legend: training box, positive phrase, negative phrase, uncertain phrase.]
• 0.00: waterfall into a fountain
• 0.00: yellow flowers in the plant
• 0.88: duck
• 0.32: is standing male duck
• 0.48: torso of duck
• 0.86: brown duck with orange beak
• 0.09: duck is getting in the water
Uncertain phrases:
• torso of duck
• male duck
• a male duck
• …
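Overlap-based labeling can be sketched as follows, assuming the overlap is measured as intersection over union (IoU) between the training box and the box annotated with each phrase. The thresholds `neg_max` and `pos_min` are placeholders chosen for illustration, not values taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_phrase(overlap, neg_max=0.1, pos_min=0.7):
    """Map an overlap value to a training label (placeholder thresholds)."""
    if overlap >= pos_min:
        return "positive"
    if overlap <= neg_max:
        return "negative"
    return "uncertain"

# Hypothetical boxes: the phrase box is shifted right by half its width.
training_box = (0.0, 0.0, 2.0, 2.0)
phrase_box = (1.0, 0.0, 3.0, 2.0)
overlap = iou(training_box, phrase_box)  # 2 / 6
label = label_phrase(overlap)
```

Phrases that land in the uncertain band are the ones where text similarity based augmentation applies.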

Experiments: Localization in Single Images
• Visual Genome dataset
• VGGNet is the default backbone image network

                  Accuracy/% for IoU@     Median   Mean
Method            0.3     0.5     0.7     IoU      IoU
DenseCap          25.7    10.1    2.4     0.092    0.178
SCRC              27.8    11.0    2.5     0.115    0.189
DBNet             38.3    23.7    9.9     0.152    0.258
DBNet (ResNet)    42.3    26.4    11.2    0.205    0.284
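The columns of this table can be computed from per-query IoU values along the following lines. This is a sketch under an assumption the slide does not state: each query contributes the IoU between its top-scored box and the ground-truth box, and a query counts as correct at a threshold when that IoU is at least the threshold. The function name and sample values are hypothetical.

```python
from statistics import mean, median

def localization_summary(ious, thresholds=(0.3, 0.5, 0.7)):
    """Return (accuracy in % per IoU threshold, median IoU, mean IoU)."""
    if not ious:
        raise ValueError("need at least one IoU value")
    accuracy = {
        t: 100.0 * sum(1 for v in ious if v >= t) / len(ious)
        for t in thresholds
    }
    return accuracy, median(ious), mean(ious)

# Hypothetical IoU values for four queries.
accuracy, median_iou, mean_iou = localization_summary([0.1, 0.4, 0.6, 0.8])
```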

Experiments: Detection in Multiple Images
• We propose a new evaluation protocol for detection with text queries
• 3 difficulty levels: increasing numbers of negative images per phrase
• Mean AP (mAP): each phrase has its own decision threshold
• Global AP (gAP): all phrases share the same decision threshold (requires scores to be calibrated over phrases)

Difficulty level:   0              1              2
AP / %              mAP    gAP     mAP    gAP     mAP    gAP
DenseCap            15.7   0.5     10.0   0.3     1.7    0.0
SCRC                16.5   0.5     16.3   0.4     12.8   0.2
DBNet               30.0   10.8    28.8   9.9     17.7   3.9
DBNet (ResNet)      32.6   11.5    31.2   10.7    19.8   4.3
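The difference between mAP and gAP can be seen in a small sketch. It is simplified and not the evaluation code of the paper: AP is taken as the mean precision at each correct detection, every ground-truth box is assumed to appear among the detections, and the phrase names and scores are made up.

```python
def average_precision(detections):
    """AP over (score, is_correct) pairs, ranked by descending score."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    num_pos = sum(1 for _, correct in ranked if correct)
    if num_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / num_pos

def mean_ap(per_phrase):
    """mAP: rank detections within each phrase, then average the APs."""
    return sum(average_precision(d) for d in per_phrase.values()) / len(per_phrase)

def global_ap(per_phrase):
    """gAP: pool the detections of all phrases into one ranking."""
    pooled = [d for dets in per_phrase.values() for d in dets]
    return average_precision(pooled)

# Phrase B is ranked correctly on its own but uses a lower score range.
per_phrase = {
    "phrase A": [(0.9, True), (0.8, False)],
    "phrase B": [(0.3, True), (0.2, False)],
}
m_ap = mean_ap(per_phrase)    # 1.0
g_ap = global_ap(per_phrase)  # 5/6
```

Both phrases are ranked perfectly on their own, so mAP is 1.0. In the pooled ranking, the false detection for phrase A outranks the correct one for phrase B, so gAP drops. This is why gAP requires scores that are calibrated over phrases.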

Thank you!
Example queries:
• a bright colored snow board
• a green dollar sign on a board
• a red and white sign
• a snowboarder with a red jacket
• bright white snow on a ski slope
• dark green pine trees in the snow
Detection results from our work. Detection: Boxes with SOLID edges. Ground truth: Semi-transparent boxes with DASHED edges.
Data, Code & Models: http://DBNet.link
