Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”
Tong Gao
March 2020

Introduction
• Early work on grounded language learning enabled a machine to map from adjectives and nouns to objects in a scene
• But other sensory modalities, such as haptic and auditory, are also useful
• This paper proposes the first robotic system to perform natural language grounding using multi-modal sensory perception

Robot Configuration
• Kinova MICO arm mounted on top of a custom-built mobile base
• Perception:
– Sensors in each motor
– A microphone
– ASUS Xtion Pro RGB-D camera

Robot Actions
• Grasp
• Lift
• Hold
• Lower
• Drop
• Push
• Press

Objects in the Dataset
• 32 common household items
– Cups
– Bottles
– Cans
– …
• Some contain liquids or other contents
• Others are empty

Sensory Data Gathering
• Given a context $c \in C$ and an object $i \in O$:
• The robot performed the full sequence of exploratory actions on object $i$ five separate times; the set $X_i^c$ contains all five feature vectors recorded for $i$ in context $c$.

Visual Context
• The robot performs a look action, which produces:
– RGB color histogram, 8 bins per channel
– Fast point feature histogram (FPFH) shape features
– Deep visual features from the 16-layer VGG network
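As a rough illustration of the first visual feature, the sketch below builds a color histogram with 8 bins per channel. It assumes 8-bit RGB pixels and that the three per-channel histograms are simply concatenated into one 24-dimensional vector; the slides do not say how the channels are combined or normalized, and the function name `rgb_histogram` is hypothetical.

```python
def rgb_histogram(pixels, bins_per_channel=8):
    """Return a normalized, concatenated per-channel histogram of 8-bit RGB pixels.

    Assumption: channels are binned independently and concatenated (R, G, B).
    """
    hist = [0.0] * (3 * bins_per_channel)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map an intensity in 0..255 to a bin index in 0..bins_per_channel-1.
            bin_index = value * bins_per_channel // 256
            hist[channel * bins_per_channel + bin_index] += 1.0
    total = float(len(pixels)) or 1.0
    # Each channel's bins sum to 1 after normalization.
    return [count / total for count in hist]


example_pixels = [(0, 0, 0), (255, 255, 255)]
example_hist = rgb_histogram(example_pixels)
```

With two pixels, one black and one white, each channel puts half of its mass in its lowest bin and half in its highest bin.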

Multimodal Context
• Record haptic and auditory data while executing actions; record proprioceptive information during the grasp action.
- Haptics & proprioception: joint efforts & joint positions, recorded for 6 joints at 15 Hz
- Audio: Discrete Fourier Transform, 65 frequency bins
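One way to arrive at 65 frequency bins is to take the DFT of a 128-sample real-valued window and keep the non-negative frequencies (128 / 2 + 1 = 65). The window length is an assumption made here for illustration; the slides only state the number of bins. The sketch uses a plain stdlib DFT rather than any particular audio library.

```python
import cmath
import math


def dft_magnitudes(window):
    """Magnitudes of the non-negative-frequency DFT bins of a real signal."""
    n = len(window)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Direct O(n^2) DFT; fine for short windows used as an illustration.
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)
        )
        magnitudes.append(abs(total))
    return magnitudes


# Hypothetical 128-sample window: a pure tone completing 5 cycles.
window = [math.sin(2 * math.pi * 5 * t / 128) for t in range(128)]
spectrum = dft_magnitudes(window)
```

For the test tone, the energy concentrates in bin 5, which is what a feature vector built from such bins would pick up.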

“I Spy” Task
• Human and robot take turns describing objects from among 4 on a tabletop
• Participants describe objects using attributes; the robot guesses
– E.g. “black rectangle” as opposed to “whiteboard eraser”
• The robot describes a random object with up to 3 predicates; participants guess

“I Spy” Task
• Metrics: number of robot guesses and number of human guesses
• Compare performance of two playing systems: multi-modal and vision-only

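Predicate Classifier
• For each language predicate $p$, a classifier $G_p \in [-1, 1]$ is learned to decide whether objects possess the attribute $p$
• Number of contexts: 18. Each context $c_1, c_2, \dots, c_{18}$ contributes one weighted decision to $G_p$:
– $c_1$: $\kappa_{c_1} \times M_{c_1}(X_i^{c_1})$
– $c_2$: $\kappa_{c_2} \times M_{c_2}(X_i^{c_2})$
– …
– $c_{18}$: $\kappa_{c_{18}} \times M_{c_{18}}(X_i^{c_{18}})$
• $M_c(X_i^c) \in [-1, 1]$: a quadratic-kernel SVM
• $\kappa_c \in [0, 1]$: Cohen’s kappa, measuring the performance of $M_c$ on the ground truth labels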
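The confidence-weighted combination of per-context decisions can be sketched as follows. Two points are assumptions rather than details from the slides: the weighted decisions are summed and divided by the total weight so that the result stays in [-1, 1], and negative kappa values are clipped to 0. Both function names are hypothetical.

```python
def cohen_kappa(predicted, actual):
    """Cohen's kappa for two equal-length lists of binary labels (+1 / -1)."""
    n = len(actual)
    observed = sum(p == a for p, a in zip(predicted, actual)) / n
    expected = 0.0
    for label in (1, -1):
        # Chance agreement: both raters pick this label independently.
        expected += (predicted.count(label) / n) * (actual.count(label) / n)
    if expected == 1.0:
        return 0.0  # degenerate case: a single label everywhere
    return (observed - expected) / (1.0 - expected)


def predicate_decision(context_decisions, context_kappas):
    """Kappa-weighted combination of per-context decisions, each in [-1, 1]."""
    # Assumption: clip negative kappa to 0 so every weight lies in [0, 1].
    weights = {c: max(0.0, k) for c, k in context_kappas.items()}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    # Assumption: normalize by the total weight to stay within [-1, 1].
    return sum(weights[c] * context_decisions[c] for c in weights) / total


decision = predicate_decision(
    {"grasp-audio": 1.0, "look-color": -1.0},
    {"grasp-audio": 0.75, "look-color": 0.25},
)
```

Here the more reliable context outvotes the less reliable one, so the combined decision is positive but well below 1.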

For example…
• With a wide, yellow cylinder, we want to determine whether “fat” is applicable to it
• $G_{\text{fat}}(\text{wide-yellow-cylinder}) = 0.137$
– Correlated since the sign is positive
– With confidence $|0.137| = 0.137$
• $\kappa_{\text{grasp,auditory}} = 0.515$
– In $G_{\text{fat}}$, we are confident in the decision made by classifier $M_{\text{grasp,auditory}}$

Grounded Language Learning – Human Turn
• The participant picks one of the four objects and describes it in one phrase
• Robot
– Strips out stopwords; the remaining words are treated as a set $H_p$ of language predicates
– Computes the sum of scores $G_p$ for $p \in H_p$ for each object
– Guesses objects in descending order by score
   • Ties are broken randomly
– Adds a positive training example for all $p \in H_p$ after a correct guess
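The robot's guessing procedure on the human turn can be sketched roughly as below. The stopword list, the whitespace tokenizer, and the `score(predicate, obj)` callback standing in for the learned predicate classifiers are all assumptions for illustration.

```python
import random

# Hypothetical stopword list; the slides do not specify one.
STOPWORDS = {"a", "an", "the", "it", "is", "and", "of"}


def guess_order(description, objects, score, rng=random):
    """Rank objects by the summed predicate scores for a spoken description."""
    predicates = sorted(
        {word for word in description.lower().split() if word not in STOPWORDS}
    )
    totals = {obj: sum(score(p, obj) for p in predicates) for obj in objects}
    shuffled = list(objects)
    # Shuffle first: the stable sort below then breaks ties in random order.
    rng.shuffle(shuffled)
    return sorted(shuffled, key=lambda obj: totals[obj], reverse=True)


# Hypothetical classifier outputs for a four-object table.
table_scores = {
    ("black", "eraser"): 0.8,
    ("rectangle", "eraser"): 0.6,
    ("black", "cup"): 0.2,
}
order = guess_order(
    "the black rectangle",
    ["eraser", "cup", "can", "bottle"],
    lambda p, obj: table_scores.get((p, obj), 0.0),
)
```

The eraser is guessed first and the cup second; the can and the bottle tie at zero, so their relative order varies from run to run.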

Grounded Language Learning – Robot Turn
• The robot attempts to describe the chosen object unambiguously with predicates
• Let $O_T$ denote the set of objects on the table during a given game. For the chosen object $i^*$, compute the score $R$ for a predicate $p$ as: [equation not preserved]
• Choose up to 3 highest-scoring predicates $\hat{P}$
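The scoring equation itself is missing from this transcript, so the sketch below does not reproduce it. It only illustrates the stated goal, an unambiguous description, with an assumed contrast score: the classifier decision for the chosen object minus the mean decision over the other objects on the table. The function name and the `score` callback are hypothetical.

```python
def choose_predicates(target, table_objects, predicates, score, max_predicates=3):
    """Pick predicates that fit the target object but not the others on the table.

    Assumption: a contrast score stands in for the equation lost from the slide.
    """
    others = [obj for obj in table_objects if obj != target]

    def contrast(predicate):
        if not others:
            return score(predicate, target)
        mean_other = sum(score(predicate, obj) for obj in others) / len(others)
        return score(predicate, target) - mean_other

    ranked = sorted(predicates, key=contrast, reverse=True)
    return ranked[:max_predicates]


# Hypothetical classifier outputs for a two-object table.
demo_scores = {
    ("red", "a"): 0.9, ("red", "b"): -0.9,
    ("tall", "a"): 0.5, ("tall", "b"): 0.5,
    ("heavy", "a"): 0.2, ("heavy", "b"): -0.2,
    ("round", "a"): -0.5, ("round", "b"): 0.5,
}
chosen = choose_predicates(
    "a", ["a", "b"], ["red", "tall", "heavy", "round"],
    lambda p, obj: demo_scores[(p, obj)],
)
```

Note that "tall" ranks low even though it applies to the target, because it applies equally to the other object and so does not disambiguate.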

Grounded Language Learning – Follow up
• In addition to $\hat{P}$, the robot also selects $5 - |\hat{P}|$ additional predicates whose confidences are likely to be close to zero
• It asks participants whether or not each predicate $p$ can be applied to the object $i^*$
– Collects more positive/negative examples.
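A simple way to pick the follow-up predicates is to rank the remaining predicates by the absolute value of their decision on the described object and take the ones closest to zero. The slides say only that the confidences are "likely" to be close to zero, which may imply sampling; the deterministic ranking here is a simplifying assumption, and all names are hypothetical.

```python
def follow_up_predicates(target, predicates, chosen, score, total_questions=5):
    """Select the least certain predicates to ask the participant about."""
    remaining = [p for p in predicates if p not in chosen]
    # Smallest |decision| first: these are the predicates the robot is least sure of.
    remaining.sort(key=lambda p: abs(score(p, target)))
    return remaining[: max(0, total_questions - len(chosen))]


# Hypothetical decisions for the described object.
target_scores = {
    "red": 0.9, "tall": 0.05, "heavy": -0.1,
    "round": 0.5, "soft": -0.02, "full": 0.3,
}
extra = follow_up_predicates(
    "mug",
    list(target_scores),
    ["red", "round", "full"],
    lambda p, obj: target_scores[p],
)
```

With three predicates already used in the description, two follow-up questions remain, and they go to the two most uncertain predicates.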

Experiments
• 42 human participants
• Undergraduate & graduate students + some staff at our university
• Divide the 32-object dataset into 4 folds
• ≥ 10 human participants played “I Spy” on both systems for each fold
• Each participant played 4 games

Experiments
• For fold 0, the systems were undifferentiated, so only one set of 2 games was played by each participant
• For subsequent folds, the systems were incrementally trained using labels from previous folds only, such that the systems were always being tested against novel, unseen objects
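The incremental protocol can be sketched as follows: objects are split into 4 folds of 8, and the system tested on fold k is trained only on labels gathered for folds before k. The contiguous split and the function names are assumptions; the slides do not say how objects were assigned to folds.

```python
def make_folds(object_ids, n_folds=4):
    """Split object ids into equally sized contiguous folds (assumes even division)."""
    size = len(object_ids) // n_folds
    return [object_ids[k * size:(k + 1) * size] for k in range(n_folds)]


def training_objects(folds, test_fold):
    """Objects whose labels may be used when testing on `test_fold`."""
    # Only earlier folds contribute, so test objects are always unseen.
    return [obj for fold in folds[:test_fold] for obj in fold]


folds = make_folds(list(range(32)))
seen_before_fold_2 = training_objects(folds, 2)
```

For fold 0 the training set is empty, which matches the observation that the two systems are undifferentiated at that point.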

Quantitative Results – Robot Guess

Quantitative Results – Human Guess
• Human guesses hovered around 2.5 throughout all levels of training and sets of objects.
• Reflection: confidence $\kappa$ can easily reach 1 with only a few examples. Could the system perform better if $R$ favored predicates trained with many examples?

Quantitative Results – Predicate Agreement
• Train the predicate classifiers using leave-one-out cross validation over objects
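Leave-one-out cross validation over objects means each object is predicted by a classifier trained on every other object. The sketch below shows the loop with a toy majority-vote trainer standing in for the real predicate classifiers; `train_fn`, `majority`, and the labels are hypothetical.

```python
def leave_one_object_out(labels, train_fn):
    """Predict each object's label with a model trained on all other objects."""
    predictions = {}
    for held_out in labels:
        training = {obj: y for obj, y in labels.items() if obj != held_out}
        model = train_fn(training)
        predictions[held_out] = model(held_out)
    return predictions


def majority(training):
    """Toy trainer: always predict the majority label of the training set."""
    vote = 1 if sum(training.values()) >= 0 else -1
    return lambda obj: vote


# Hypothetical labels for one predicate (+1 applies, -1 does not).
labels = {"cup": 1, "can": 1, "bottle": 1, "box": -1}
predictions = leave_one_object_out(labels, majority)
agreement = sum(predictions[obj] == labels[obj] for obj in labels) / len(labels)
```

The toy trainer gets the three positive objects right and the single negative one wrong, giving 75% agreement with the labels.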

Qualitative Results – When Does Multi-Modal Help?

Qualitative Results – Correlations to physical properties
• Compute Pearson’s correlation $r$ between the decision and the object’s weight, height and width
• Vision only: no predicates correlated with these physical object features
• Multi-modal:
– Tall: height (0.521)
– Small: width (-0.665)
– Water: weight (0.814)
– Blue: weight (0.549, spurious)
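For reference, Pearson's correlation between a predicate's decisions and a physical property can be computed as in the sketch below. The numbers are made up for illustration and are not taken from the experiments.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical decisions for one predicate and object heights in centimeters.
decisions = [-0.6, -0.1, 0.2, 0.7]
heights_cm = [8.0, 12.0, 15.0, 21.0]
r_height = pearson_r(decisions, heights_cm)
```

A value near +1 would indicate that the predicate's decision rises with height, a value near -1 that it falls, and a value near 0 that the two are unrelated.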

Critique
• Do pre-defined actions really produce informative audio?
• To what extent should the robot apply force, so that sound is produced without destroying the object?
• While some features might be too complicated for an SVM, others might be redundant; an ablation study would help
• Lack of generalization ability to new predicates?
– Classify new predicates by their distance to learned predicates in WordNet or word embeddings?
• Qualitative results only considered weight, height and width
– If that is all they want, why not measure these properties with a ruler and a scale?
– Should select some physical properties that can only be obtained by a multi-modal system.

Thank you!
