Few-shot Object Reasoning for Robot Instruction Following
Yoav Artzi
Workshop on Spatial Language Understanding, EMNLP 2020
Task
• Navigation between landmarks
• Agent: quadcopter drone
• Inputs: poses, raw RGB camera images, and natural language instructions
Task
• "go straight and stop before reaching the planter"
• "turn left towards the globe and go forward until just before it"
Mapping Instructions to Control
• The drone maintains a configuration of target velocities (v, ω): linear forward velocity v and angular yaw rate ω
• Each action updates the configuration or stops
• Goal: learn a mapping from inputs to configuration updates:
\[ f(\text{instruction}, \text{image}, \text{pose}) = (v_t, \omega_t) \;\text{or}\; \mathrm{STOP} \]
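A minimal sketch of this interface in Python; the `Observation` fields and the `drone` client methods (`observe`, `set_velocity`, `stop`) are hypothetical names for illustration, not the actual system API:

```python
# Minimal sketch of the mapping f and its control loop.
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Observation:
    image: np.ndarray    # raw first-person RGB frame
    pose: np.ndarray     # drone pose (position + orientation)


def policy_step(instruction: str, obs: Observation) -> Optional[Tuple[float, float]]:
    """f(instruction, image, pose) -> (v, omega), or None for STOP."""
    raise NotImplementedError  # stands in for the learned policy


def run_episode(instruction: str, drone) -> None:
    while True:
        action = policy_step(instruction, drone.observe())
        if action is None:       # STOP: instruction execution is complete
            drone.stop()
            return
        v, omega = action        # forward velocity, yaw rate
        drone.set_velocity(v, omega)
```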
Modular Approach
• Build/train separate components
• Symbolic meaning representation
• Complex integration
[Pipeline diagram: Instruction → Language Understanding → Planning → Control, with Perception and Mapping components]
Single-model Approach (a.k.a. end-to-end)
[Diagram: Instruction → f → Action]
How do we think about extensibility, interpretability, and modularity when packing everything into a single model?
Single-model Approach
• Extensibility: extending the model to reason about new objects after training
• Interpretability: viewing how the model reasons about object grounding and trajectories
• Modularity: re-using parts of the model
All within a representation learning framework
Representation: Design vs. Learning
• Systems that use symbolic representations are interpretable and (potentially) extensible
• However: designing representations for every possible concept is brittle and hard to scale
• Instead: design only the most general concepts and let representation learning fill them with content
• Today, two such concepts: objects and trajectories
Today
Few-shot instruction following:
• Few-shot language-conditioned object segmentation
• Object context mapping
• Integration into a visitation-prediction policy for mapping instructions to drone control
Language-conditioned Object Segmentation
• Input: instruction and observation images
• Goal: identify and align objects and references
Few-shot Version
• Input: instruction, observation images, and a database
• Goal: identify previously unseen objects and mentions, and align them
[Database exemplars: blue ball, planet earth, orange cup, plant pot]
Alignment via a Database
• Approach: align observations and references through the database
• Adding objects to the database extends the alignment ability
• Requires only adding a few image and language exemplars
[Database exemplars: blue ball, planet earth, orange cup, plant pot; extended with new objects: melon (the fruit, wedge slice, watermelon), red cube (the red lego, red brick)]
<latexit sha1_base64="63JDzvbINZs3luchNf8vAbdEYw=">ACU3icbVFNaxsxFNRu0tTd1KnbHnsRMQEbitkNKe2l4NJDe3QgtgNeYyRZa6vWxyK9LTXL/scS6KF/pJceWvkjIbEzIBhm5vGkEc2lcBDHv4Pw4PDJ0dPas+j4ef3kRePlq4EzhW8z4w09poSx6XQvA8CJL/OLSeKSj6ki8rf/idWyeMvoJlzseKzLTIBCPgpUnjWwr8BzhWfpJipqsWfYtG3/EUeoKNSlN1WtRjFMlphibdq9lNtxnorPbSJpZwso7j/oYvR+tVl67mjSacSdeA+TZEuaIvepHGTg0rFNfAJHFulMQ5jEtiQTDJqygtHM8JW5AZH3mqieJuXK47qfCZV6Y4M9YfDXit3p8oiXJuqahPKgJzt+utxMe8UQHZh3EpdF4A12yzKCskBoNXBeOpsJyBXHpCmBX+rpjNiW8I/DdEvoRk98n7ZHDeSd514suLZvfLto4aeoNOUQsl6D3qoq+oh/qIoZ/oD/oXoOBX8DcMw8NAy2M6/RA4T1/yOUr2Y=</latexit> Alignment Score go straight and stop before reaching the planter Reference turn left towards the globe and go forward until just before it Bounding box X Align ( b, r ) = P ( b | o ) P ( o | r ) o Database Object record b Bounding box orange r Reference blue ball cup plant planet pot earth o Database object
<latexit sha1_base64="uZINEPQvNpHiWOKMysbyWhr8Wg=">ACU3icbVFNaxsxFNRu0tTd1KnbHnsRMQEbitkNKe2l4NJDe3QgtgNeYyRZa6vWxyK9LTXL/scS6KF/pJceWvkjIbEzIBhm5vGkEc2lcBDHv4Pw4PDJ0dPas+j4ef3kRePlq4EzhW8z4w09poSx6XQvA8CJL/OLSeKSj6ki8rf/idWyeMvoJlzseKzLTIBCPgpUnjWwr8BzhWfpJipqsWfYtG3/E0VnqCjUpTdVrUYxTJaYm3avZTbch6LbRJpZwso7i/oUvZ+sVl67mjSacSdeA+TZEuaIvepHGTg0rFNfAJHFulMQ5jEtiQTDJqygtHM8JW5AZH3mqieJuXK47qfCZV6Y4M9YfDXit3p8oiXJuqahPKgJzt+utxMe8UQHZh3EpdF4A12yzKCskBoNXBeOpsJyBXHpCmBX+rpjNiW8I/DdEvoRk98n7ZHDeSd514suLZvfLto4aeoNOUQsl6D3qoq+oh/qIoZ/oD/oXoOBX8DcMw8NAy2M6/RA4T1/x39r2Y=</latexit> Alignment Score go straight and stop before reaching the planter Reference turn left towards the globe and go forward until just before it Bounding box P ( o | b ) P ( b ) P ( o | r ) X Align ( b, r ) = P ( o ) o Database Object record b Bounding box orange r Reference blue ball cup plant planet pot earth o Database object
Alignment Score (cont.)
• A region proposal network provides the bounding boxes b and the prior P(b)
• P(o) is uniform over the database
Alignment Score (cont.)
• P(o | b) is computed from visual similarity, using kernel density estimation with a symmetric multivariate Gaussian kernel over the database's image exemplars
• P(o | r) is computed similarly from text similarity, using pre-trained embeddings
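A sketch of how these quantities could be computed from precomputed embeddings; the bandwidth `sigma`, the uniform priors, and all names are assumptions for illustration, not the system's actual implementation:

```python
import numpy as np


def kde_likelihood(query, exemplars, sigma=1.0):
    """Density of a query embedding under one database object's exemplars,
    via KDE with a symmetric multivariate Gaussian kernel."""
    sq_dists = np.sum((exemplars - query) ** 2, axis=-1)
    return np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2)))


def posterior(query, database, sigma=1.0):
    """P(o | query) over database objects, assuming a uniform prior P(o)."""
    likelihoods = np.array([kde_likelihood(query, ex, sigma) for ex in database])
    return likelihoods / likelihoods.sum()


def alignment_score(box_emb, ref_emb, image_db, text_db, p_b=1.0):
    """Align(b, r) = sum_o P(o|b) P(b) P(o|r) / P(o), with P(o) = 1/|DB|."""
    p_o_given_b = posterior(box_emb, image_db)   # visual similarity (KDE)
    p_o_given_r = posterior(ref_emb, text_db)    # text similarity (KDE over
                                                 # pre-trained text embeddings)
    n = len(image_db)                            # dividing by P(o) = 1/n
    return float(p_b * np.sum(p_o_given_b * p_o_given_r * n))
```

Here `image_db[i]` and `text_db[i]` are assumed to hold the image and text exemplar embeddings for the i-th database object (e.g., a few crops and phrases for "blue ball").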
Mask Refinement
• Refine each bounding box with a UNet model
• Gives a tight object mask
• Each mask is paired with its alignment score to a reference in the text (e.g., Align = 0.7 for "the planter")
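A sketch of the refinement step, assuming a trained `unet` that maps a fixed-size RGB crop to a per-pixel foreground probability map; the crop size, threshold, and helper names are illustrative, not the model's actual interface:

```python
import numpy as np
from skimage.transform import resize  # assumed available for resizing


def refine_box_to_mask(image, box, unet, size=64, threshold=0.5):
    """Turn a bounding box into a tight object mask via the UNet."""
    x0, y0, x1, y1 = box
    crop = resize(image[y0:y1, x0:x1], (size, size))      # normalize crop size
    prob = unet(crop)                                     # (size, size) foreground probs
    tight = resize(prob, (y1 - y0, x1 - x0)) > threshold  # back to box resolution
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = tight                            # place mask in full frame
    return mask
```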
Learning
Learned components:
• Region proposal network parameters for bounding box proposals
• Image similarity measure for P(o | b)
• UNet parameters for mask refinement
Text similarity uses pre-trained embeddings.
Challenge: training needs large-scale, heavily annotated visual data.
Augmented Reality Training Data
[Figure: first-person view (FPV) frame + object overlay → composite image with mask labels]
Augmented Reality Training Data
• Large-scale generation with ShapeNet objects
• Learned representations generalize beyond the specific objects used, for:
  • the region proposal network for bounding boxes
  • the image similarity measure for P(o | b)
  • the UNet parameters for mask refinement
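A sketch of how one composite training example might be produced, assuming a renderer that returns an RGBA overlay of a ShapeNet object posed in the camera view (all names are illustrative):

```python
import numpy as np


def make_training_example(fpv_image, overlay_rgba):
    """Alpha-composite a rendered object onto a first-person frame;
    the mask label comes for free from the overlay's alpha channel."""
    rgb = overlay_rgba[..., :3].astype(np.float32)
    alpha = overlay_rgba[..., 3:4].astype(np.float32) / 255.0
    composite = alpha * rgb + (1.0 - alpha) * fpv_image.astype(np.float32)
    mask_label = overlay_rgba[..., 3] > 0   # pixel-level segmentation label
    return composite.astype(np.uint8), mask_label
```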
Today
Few-shot instruction following:
• Few-shot language-conditioned object segmentation
• Object context mapping
• Integration into a visitation-prediction policy for mapping instructions to drone control
Object Context Mapping
Goal: create maps that capture object locations and the instruction's behavior around objects
1. Identify and align object mentions to observations
2. Compute abstract contextual representations for object references
3. Project and aggregate masks over time
4. Combine aggregated masks with contextual representations to create a map
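A sketch of steps 3 and 4 under simplifying assumptions: `project_to_map` (a stand-in for the camera/pose projection) maps a first-person mask to top-down grid cells, and aggregation over time is a per-cell maximum; none of these choices are claimed to match the actual model:

```python
import numpy as np


def build_object_context_map(frames, context_vecs, map_hw, project_to_map):
    """Steps 3-4: project per-frame masks to a top-down grid, aggregate
    over time, then attach each reference's contextual representation.

    frames: list of (masks, pose), where masks maps a reference index
            to its first-person mask for that frame.
    context_vecs: (num_refs, d) contextual representations (step 2).
    """
    num_refs, d = context_vecs.shape
    grid = np.zeros((num_refs,) + map_hw, dtype=np.float32)
    for masks, pose in frames:
        for r, fpv_mask in masks.items():
            # accumulate evidence for reference r in its projected cells
            grid[r] = np.maximum(grid[r], project_to_map(fpv_mask, pose, map_hw))
    # each map cell gets the evidence-weighted sum of context vectors,
    # yielding an (H, W, d) object context map
    return np.einsum('rhw,rd->hwd', grid, context_vecs)
```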
Object Context Mapping
Step I: Identify and Align
• Bounding box proposals from the region proposal network
• Object references from a tagger
• Align them using the language-conditioned segmentation model and the database
• This computes first-person masks aligned to instruction references