Segmentation from Natural Language Expressions Ronghang Hu, Marcus Rohrbach, Trevor Darrell Presenter: Tianyi Jin
Comparisons between different semantic image segmentation problems:
• (e) GrabCut: generates a pixelwise mask over the foreground (or most salient) object
• (f) Natural Language Object Retrieval: bounding box only, not a pixelwise mask
Overview
• Goal: Pixel-level segmentation of an image, based on a natural language expression
Related Work
• Localizing objects with natural language - bounding box output only
• Fully convolutional networks for segmentation - used here for feature extraction and segmentation output
• Attention and visual question answering - learn only coarse spatial outputs, and for other purposes
Our Model: A Detailed Look
Spatial feature map extraction
• Fully convolutional network
  - Input image size: W × H; spatial feature map size: w × h, with each position on the feature map containing D_im channels (a D_im-dimensional local descriptor)
• Apply L2-normalization to the D_im-dimensional local descriptor at each position
  - Extract a w × h × D_im spatial feature map as the representation of each image
• Add two extra channels for the x, y coordinates of each spatial location
  - Gives a w × h × (D_im + 2) representation containing descriptors and spatial coordinates
• In this implementation: VGG-16 with fc6, fc7 and fc8 treated as convolutional layers, outputting D_im = 1000-dimensional local descriptors
• Resulting feature map size: w = W/s and h = H/s, where s = 32 is the pixel stride of the fc8 layer output (here W = H = 512, so w = h = 16); see the sketch below
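As an illustration of this step, here is a minimal PyTorch sketch (not the authors' implementation). The class name, the use of torchvision's VGG-16, and the [-1, 1] coordinate normalization are assumptions made for this example; the dimensions follow the slide.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SpatialFeatureExtractor(nn.Module):
    """VGG-16 used fully convolutionally; outputs a (D_im + 2) x h x w feature map."""
    def __init__(self):
        super().__init__()
        # Convolutional part of VGG-16 (conv1_1 .. pool5), overall pixel stride 32
        self.backbone = torchvision.models.vgg16(weights=None).features
        # fc6, fc7, fc8 treated as convolutional layers; D_im = 1000 output channels
        self.fc6 = nn.Conv2d(512, 4096, kernel_size=7, padding=3)
        self.fc7 = nn.Conv2d(4096, 4096, kernel_size=1)
        self.fc8 = nn.Conv2d(4096, 1000, kernel_size=1)

    def forward(self, image):                      # image: (N, 3, H, W), e.g. 512 x 512
        feat = self.backbone(image)                # (N, 512, h, w) with h = H/32, w = W/32
        feat = F.relu(self.fc6(feat))
        feat = F.relu(self.fc7(feat))
        feat = self.fc8(feat)                      # (N, D_im = 1000, h, w)
        # L2-normalize each D_im-dimensional local descriptor
        feat = F.normalize(feat, p=2, dim=1)
        # Append the x, y coordinates of each spatial location as two extra channels
        # (normalized to [-1, 1] here; the exact range is an assumption)
        n, _, h, w = feat.shape
        ys = torch.linspace(-1, 1, h, device=feat.device)
        xs = torch.linspace(-1, 1, w, device=feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([gx, gy]).expand(n, -1, -1, -1)
        return torch.cat([feat, coords], dim=1)    # (N, D_im + 2, h, w)
```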
Encoding expressions with an LSTM network
• Embed each word into a vector through a word embedding matrix
• Use a recurrent Long Short-Term Memory (LSTM) network with a D_text-dimensional hidden state to scan through the embedded word sequence
• L2-normalize the final hidden state to obtain the expression encoding
• In this implementation: LSTM network with D_text = 1000-dimensional hidden state (see the sketch below)
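A minimal PyTorch sketch of the expression encoder. The class name, the integer tokenization, and the embedding size are illustrative assumptions; D_text = 1000 follows the slide.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpressionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=1000, d_text=1000):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # word embedding matrix
        self.lstm = nn.LSTM(embed_dim, d_text, batch_first=True)

    def forward(self, word_ids):                # word_ids: (N, T) integer token indices
        embedded = self.embedding(word_ids)     # (N, T, embed_dim)
        _, (h_n, _) = self.lstm(embedded)       # h_n: (1, N, D_text), state after last word
        h_last = h_n.squeeze(0)
        return F.normalize(h_last, p=2, dim=1)  # L2-normalized expression encoding
```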
Spatial classification and upsampling
• Fully convolutional classifier over the local image descriptors and the encoded expression
  - Tile and concatenate the encoded expression to the local descriptor at each spatial location in the spatial grid -> a w × h × D' spatial map, where D' = D_im + D_text + 2
  - Train a two-layer classification network (two 1 × 1 convolutional layers) with a D_cls-dimensional hidden layer, which takes the D'-dimensional representation as input -> a score indicating whether each spatial location belongs to the target image region or not
  - In this implementation: D_cls = 500
• Upsampling through deconvolution
  - A 2s × 2s deconvolution filter with stride s (here s = 32)
  - Produces a W × H high-resolution response map of the same size as the input image (see the sketch below)
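A PyTorch sketch of this stage. The class and argument names are hypothetical; the dimensions (D' = D_im + D_text + 2, D_cls = 500, a 2s × 2s deconvolution with stride s = 32) follow the slide.

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    def __init__(self, d_im=1000, d_text=1000, d_cls=500, stride=32):
        super().__init__()
        d_in = d_im + d_text + 2                    # D' = D_im + D_text + 2
        self.classifier = nn.Sequential(            # two 1 x 1 convolutional layers
            nn.Conv2d(d_in, d_cls, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(d_cls, 1, kernel_size=1),
        )
        # 2s x 2s deconvolution filter with stride s for upsampling
        self.upsample = nn.ConvTranspose2d(
            1, 1, kernel_size=2 * stride, stride=stride, padding=stride // 2)

    def forward(self, spatial_feat, text_feat):
        # spatial_feat: (N, D_im + 2, h, w); text_feat: (N, D_text)
        n, _, h, w = spatial_feat.shape
        # Tile the expression encoding over the spatial grid and concatenate
        tiled = text_feat[:, :, None, None].expand(n, -1, h, w)
        fused = torch.cat([spatial_feat, tiled], dim=1)   # (N, D', h, w)
        scores = self.classifier(fused)                    # coarse w x h response map
        return self.upsample(scores)                       # (N, 1, H, W) response map
```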
Loss Function
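The slide gives only the title; the paper trains with a per-pixel logistic regression loss between the response map and the binary ground-truth mask, averaged over all W × H pixels. A minimal sketch, with optional foreground/background weights included here as an illustrative balancing option:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(scores, mask, alpha_f=1.0, alpha_b=1.0):
    # scores: (N, 1, H, W) raw response map; mask: (N, 1, H, W) binary {0, 1} float mask
    per_pixel = F.binary_cross_entropy_with_logits(scores, mask, reduction="none")
    # Optional reweighting of foreground vs. background pixels
    weights = alpha_f * mask + alpha_b * (1.0 - mask)
    return (weights * per_pixel).mean()
```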
Experiments
• Dataset: ReferIt [1]
  - 20,000 images; 130,525 expressions annotated on 96,654 segmented image regions
  - Here: 10,000 images for training and validation, 10,000 images for testing
  - Contains both "object" regions (car, person, bottle) and "stuff" regions (sky, river, mountain)
• Baseline methods
  - Combination of per-word segmentation
  - Foreground segmentation from bounding boxes
  - Classification over segmentation proposals
  - Whole image
[1] Kazemzadeh, S., Ordonez, V., Matten, M., Berg, T.L.: ReferItGame: Referring to objects in photographs of natural scenes. EMNLP 2014
Evaluation
• Two-stage training strategy:
  - Low-resolution version: predicts the w × h = 16 × 16 coarse response map
  - High-resolution version: adds deconvolution upsampling on top of the low-resolution model and predicts the W × H high-resolution segmentation
• Overall IoU: total intersection area divided by total union area over the test set
• Precision@threshold: the percentage of test samples where the IoU between prediction and ground truth passes the threshold (see the metric sketch below)
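A small sketch of how these two metrics can be computed over a test set; the function name and the threshold list are illustrative.

```python
import numpy as np

def evaluate(predictions, ground_truths, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    # predictions, ground_truths: lists of binary (H, W) numpy arrays
    total_inter, total_union = 0, 0
    per_sample_iou = []
    for pred, gt in zip(predictions, ground_truths):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        total_inter += inter
        total_union += union
        per_sample_iou.append(inter / union if union > 0 else 0.0)
    per_sample_iou = np.array(per_sample_iou)
    overall_iou = total_inter / total_union                 # dataset-level IoU
    precision = {t: float((per_sample_iou >= t).mean())     # precision@threshold
                 for t in thresholds}
    return overall_iou, precision
```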
Results
Questions?
Thank you!