SLIDE 1

COMP 150: Probabilistic Robotics for Human-Robot Interaction

Instructor: Jivko Sinapov www.cs.tufts.edu/~jsinapov

SLIDE 2

Language Acquisition

Robot: “How would you describe this object?”
Human: “It is a small orange spray can.”
Robot: “My model of the word ‘orange’ has improved!”

SLIDE 3

Something fun...

SLIDE 4

Announcements

SLIDE 5

Project Deadlines

Project Presentations: Apr 23 and 25
Final Report + Deliverables: May 10
Deliverables:

– Presentation slides + videos
– Final Report (PDF)
– Source code (link to GitHub repositories)

SLIDE 6

Presentation Guidelines

Length:

– Individual projects: 5-minute talk + 2 min for questions
– Team projects: 8-minute talk + 3 min for questions

Practice! Time your presentation when you practice, and use a timer during the actual presentation as well

My advice: find another group and practice with each other

Format: Google Slides (so that we don’t have to switch computers)

SLIDE 7

Language Acquisition

Robot: “How would you describe this object?”
Human: “It is a small orange spray can.”
Robot: “My model of the word ‘orange’ has improved!”

SLIDE 8

The Turing Test

SLIDE 9

The Turing Test

SLIDE 10

The Turing Test

SLIDE 11

The First ChatBot (~1966)

SLIDE 12

ELIZA

http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm

SLIDE 13

Discussion: what is missing from programs like ELIZA?

SLIDE 14

Natural Language Processing

The study of algorithms and data structures used to manipulate text and text-like data

Applications in information retrieval, web search, dialogue agents, text mining, etc.

Traditionally, not concerned with connecting semantic representations to the real world

SLIDE 15

Example: Computing Parse Trees

SLIDE 16

Example: Document Classification

https://abbyy.technology/_media/en:features:classification-scheme.png

SLIDE 17

Example: Word Embeddings

https://image.slidesharecdn.com/introductiontowordembeddings-160405062343/95/a-simple-introduction-to-word-embeddings-5-638.jpg?cb=1494520542

SLIDE 18

The Symbol Grounding Problem

“How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads? How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their (arbitrary) shapes, be grounded in anything but other meaningless symbols?”

Stevan Harnad, 1990

SLIDE 19

Deb Roy, “Grounding Language in the World: Schema Theory Meets Semiotics” (2005)

SLIDE 20

Circular Definitions

SLIDE 21

Grounding

SLIDE 22

Sensor Projections

SLIDE 23

Sensor Projections

Input Image → Color Histogram

SLIDE 24

Transformer Projection

SLIDE 25

Transformer Projection

Color Histogram → Entropy of Histogram

SLIDE 26

Categorizer

Entropy of Histogram → “Multicolored”
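The chain on the last few slides (color histogram, then entropy of the histogram, then the category “Multicolored”) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the 2-bit threshold are assumptions made for the example, not values from the slides or from Roy's paper.

```python
import math


def histogram_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as raw bin counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def categorize(counts, threshold_bits=2.0):
    """Hypothetical categorizer: a high-entropy histogram counts as multicolored.

    The threshold is an assumed value chosen only for illustration.
    """
    if histogram_entropy(counts) >= threshold_bits:
        return "multicolored"
    return "not multicolored"


# Color mass spread evenly over 8 bins versus concentrated in one bin.
evenly_spread = [10] * 8
single_color = [80, 0, 0, 0, 0, 0, 0, 0]
print(categorize(evenly_spread), "/", categorize(single_color))
```

The point of the sketch is that the label is computed from the robot's own sensor data, with no human interpreting the image on its behalf.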

SLIDE 27

Action Projector

SLIDE 28

SLIDE 29

SLIDE 30

Schemas for Actions

SLIDE 31

Schemas for Objects

SLIDE 32

Spatial Relations

SLIDE 33

Deb Roy’s Definition of Grounding

“I define grounding as a causal-predictive cycle by which an agent maintains beliefs about its world.” (p. 8)

“An agent’s basic grounding cycle cannot require mediation by another agent.” (p. 9)

“An autonomous robot simply cannot afford to have a human in the loop interpreting sensory data on its behalf.” (p. 9)

SLIDE 34

“Cyclic interactions between robots and their environment, when well designed, enable a robot to learn, verify, and use world knowledge to pursue goals. I believe we should extend this design philosophy to the domain of language and intentional communication.” (p. 5)

SLIDE 35

“causality alone is not a sufficient basis for grounding beliefs. Grounding also requires prediction of the future with respect to the agent’s own actions.” (p. 10)

“The problem with ignoring the predictive part of the grounding cycle has sometimes been called the ‘homunculus problem’.”

SLIDE 36

SLIDE 37

Take Home Message

Language should be grounded in terms of the robot’s own perceptual and sensorimotor capabilities

SLIDE 38

Thomason, J., Sinapov, J., Svetlik, M., Stone, P., and Mooney, R. (2016). Learning Multi-Modal Grounded Linguistic Semantics by Playing I, Spy. In Proceedings of the 2016 International Joint Conference on Artificial Intelligence (IJCAI).

SLIDE 39

Motivation: Grounded Language Learning

Robot, fetch me the green empty bottle

SLIDE 40

Vision-Based Approaches to Word Grounding

SLIDE 41

Vision-Based Approaches to Word Grounding

SLIDE 42

Exploratory Behaviors in our Robot

SLIDE 43

Video

SLIDE 44

Video

SLIDE 45

Video

SLIDE 46

Sensorimotor Feature Extraction

[Figure: joint efforts (haptics) plotted over time]

SLIDE 47

Sensorimotor Contexts

Behaviors: grasp, lift, hold, lower, drop, push, press, look

Sensory modalities: proprioception, haptics, audio, shape, color, VGG

SLIDE 48

Sensorimotor Contexts

Behaviors: grasp, lift, hold, lower, drop, push, press, look

Sensory modalities: proprioception, haptics, audio, shape, color, VGG

SLIDE 49

Feature Extraction: Color

Color Histogram (4 x 4 x 4 = 64 bins)

Object Segmentation
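A 4 x 4 x 4 color histogram can be computed by quantizing each RGB channel of the segmented object's pixels into 4 levels and counting pixels per (R, G, B) bin. The sketch below assumes 8-bit RGB pixels that have already been segmented from the background; the function name and the normalization step are choices made for this example, not details taken from the slides.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram with bins_per_channel**3 bins.

    pixels: iterable of (r, g, b) tuples with values in 0..255, assumed to
    contain only the pixels of the segmented object.
    """
    n_bins = bins_per_channel ** 3
    hist = [0.0] * n_bins
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]
    return hist


# Two red pixels, one blue pixel, one black pixel.
object_pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
features = color_histogram(object_pixels)
print(len(features))
```

Normalizing by the pixel count makes the feature independent of how large the object appears in the image.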

SLIDE 50

Feature Extraction: Shape

3D Object Point Cloud → Histogram of Shape Features

SLIDE 51

Feature Extraction: Haptics

Joint-Torque Values for All Joints → Joint-Torque Features

SLIDE 52

Feature Extraction: Audio

Audio Spectrogram → Spectro-temporal Features

SLIDE 53

Feature Extraction: VGG

SLIDE 54

Feature Extraction: VGG

SLIDE 55

Data from a single exploratory trial

Behaviors: grasp, lift, hold, lower, drop, push, press, look

Sensory modalities: proprioception, haptics, audio, shape, color, VGG

x 5 per object

SLIDE 56

Category Recognition Overview

Interaction with Object → Sensorimotor Feature Extraction → Category Recognition Models → Category Estimates (Empty? Red? Container? . . .)

Sinapov, J., Schenck, C., and Stoytchev, A. (2014). Learning Relational Object Categories Using Behavioral Exploration and Multimodal Perception. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA).
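One plausible way to turn per-context classifier outputs into a single category estimate is a reliability-weighted vote across sensorimotor contexts (behavior and modality pairs). The sketch below assumes each context's classifier returns +1 or -1 for a category such as “empty” and that each context has a nonnegative reliability weight; the function name, the example contexts, and the weights are hypothetical and are not taken from the cited papers.

```python
def combine_context_votes(votes, weights):
    """Reliability-weighted vote over sensorimotor contexts.

    votes:   dict mapping context -> +1 (category applies) or -1 (does not)
    weights: dict mapping context -> nonnegative reliability (assumed given,
             e.g. estimated on held-out labeled objects)
    Returns a score in [-1, 1]; a positive score means the category applies.
    """
    total_weight = sum(weights.get(c, 0.0) for c in votes)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights.get(c, 0.0) * v for c, v in votes.items())
    return weighted / total_weight


# Hypothetical outputs for the category "empty" on one object.
votes = {
    ("lift", "haptics"): +1,
    ("drop", "audio"): +1,
    ("look", "color"): -1,
}
weights = {
    ("lift", "haptics"): 0.8,
    ("drop", "audio"): 0.6,
    ("look", "color"): 0.1,
}
score = combine_context_votes(votes, weights)
print("empty" if score > 0 else "not empty")
```

With weights like these, an unreliable context (here, color for “empty”) is outvoted by contexts that are informative for the category.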

SLIDE 57

Key Questions

How can the robot learn object-related words from everyday human users?

Do human users use non-visual object descriptors when referring to objects?

SLIDE 58

Object Exploration Dataset

32 common household and office items

Each object was explored a total of 5 times with 7 different behaviors

The robot perceived objects using the visual, auditory, and haptic sensory modalities

Thomason, J., Sinapov, J., Svetlik, M., Stone, P., and Mooney, R. (2016). Learning Multi-Modal Grounded Linguistic Semantics by Playing I, Spy. In Proceedings of the 2016 International Joint Conference on Artificial Intelligence (IJCAI).

SLIDE 59

Our attempt: I-Spy game

SLIDE 60

Learning Words via Game-play

Human: “an empty metallic aluminum container”

SLIDE 61

Speech Recognition

[https://miro.medium.com/max/3200/1*nLdHrhd5TjqdS4mO7ANPLA.jpeg]

SLIDE 62

Semantic Parsing

SLIDE 63

Semantic Parsing

SLIDE 64

Semantic Parsing

TEXT: Go to Alice’s office
MEANING: [logical form shown as a figure on the original slide]

SLIDE 65

Combinatory Categorial Grammar (CCG) Parser Resources

Tutorial:

https://yoavartzi.com/tutorial/

Code:

https://github.com/lil-lab/spf

SLIDE 66

Example Words for an Object

SLIDE 67

Learning Words via Game-play

SLIDE 68

Learning Words via Game-play

Human: “a tall blue cylindrical container”

SLIDE 69

Learning Words via Game-play

Robot: “open half-full container”

SLIDE 70

Asking Verification Questions

SLIDE 71

Results

SLIDE 72

WORD          F-measure improvement as a result of adding non-visual modalities
“can”         0.857
“tall”        0.516
“half-full”   0.463
. . .         . . .
“pink”        [value not recoverable]
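For reference, the F-measure reported here is the harmonic mean of precision and recall. The sketch below shows the computation and how an improvement between two classifiers for the same word would be obtained; the confusion counts are made up for illustration and are not data from the experiment.

```python
def f_measure(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one word, with and without non-visual modalities.
vision_only = f_measure(tp=2, fp=4, fn=6)
multimodal = f_measure(tp=6, fp=2, fn=4)
improvement = multimodal - vision_only
print(round(improvement, 3))
```

A positive improvement means the word's classifier benefits from haptic, auditory, or proprioceptive data; a color word such as “pink” would not be expected to benefit much.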

SLIDE 73

Summary of Experiment

The robot learned over 80 words through interactive game play

The robot's word representations were grounded in multiple behaviors and sensory modalities

Future Work:

– Active action selection when classifying a new object
– Active action selection when learning new words
– Actively seeking humans out for help with learning about objects

SLIDE 74

“Opportunistic” Active Learning

Thomason, J., Padmakumar, A., Sinapov, J., Hart, J., Stone, P., and Mooney, R. (2017). Opportunistic Active Learning for Grounding Natural Language Descriptions. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL 2017).

SLIDE 75

“Opportunistic” Active Learning

Thomason, J., Padmakumar, A., Sinapov, J., Hart, J., Stone, P., and Mooney, R. (2017). Opportunistic Active Learning for Grounding Natural Language Descriptions. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL 2017).

SLIDE 76

What actions should the robot perform when learning a new word?

Baseline: perform all actions on a set of labeled objects and estimate which ones work well

But can we do better?
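The baseline above can be sketched as follows: perform every behavior on each labeled object, score each behavior by how well its features predict the word's labels, and rank the behaviors. The sketch uses leave-one-out 1-nearest-neighbor accuracy as the scoring rule, which is an assumption made for this example rather than the method used in the cited work; the feature values and the word “heavy” are likewise hypothetical.

```python
def loo_accuracy(features, labels):
    """Leave-one-out 1-nearest-neighbor accuracy (assumed scoring rule)."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        best_dist, best_label = None, None
        for j, (x2, y2) in enumerate(zip(features, labels)):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(x, x2))
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y2
        correct += int(best_label == y)
    return correct / len(labels)


def rank_behaviors(data, labels):
    """data: dict mapping behavior -> list of feature vectors, one per object."""
    scores = {b: loo_accuracy(feats, labels) for b, feats in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical labels for the word "heavy" on four labeled objects.
labels = [False, False, True, True]
data = {
    "lift": [[0.1], [0.2], [0.9], [1.0]],  # effort separates heavy from light
    "look": [[0.5], [0.1], [0.4], [0.2]],  # appearance does not
}
ranking = rank_behaviors(data, labels)
print(ranking[0][0])
```

The cost of this baseline is that every behavior must be executed on every labeled object before the robot knows which ones matter, which is what motivates the question on the slide.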

SLIDE 77

Sensorimotor Word Embeddings

Sinapov, J., Schenck, C., and Stoytchev, A. (2014). Learning Relational Object Categories Using Behavioral Exploration and Multimodal Perception. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA).

SLIDE 78

SLIDE 79

Sensorimotor Word Embeddings

Sinapov, J., Schenck, C., and Stoytchev, A. (2014). Learning Relational Object Categories Using Behavioral Exploration and Multimodal Perception. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA).

SLIDE 80

Behavior Scores for Words

SLIDE 81

Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. In Proceedings of the 32nd Conference of the Association for the Advancement of Artificial Intelligence (AAAI).

SLIDE 82

Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. In Proceedings of the 32nd Conference of the Association for the Advancement of Artificial Intelligence (AAAI).

SLIDE 83

Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. In Proceedings of the 32nd Conference of the Association for the Advancement of Artificial Intelligence (AAAI).

SLIDE 84

Results

SLIDE 85

Results

SLIDE 86

Putting it all together...

Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., Hart, J., Stone, P., and Mooney, R. (2019). Improving Grounded Natural Language Understanding through Human-Robot Dialog. Accepted to the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019.

SLIDE 87

Putting it all together...

Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., Hart, J., Stone, P., and Mooney, R. (2019). Improving Grounded Natural Language Understanding through Human-Robot Dialog. Accepted to the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019.

SLIDE 88

Additional Reading

Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., Hart, J., Stone, P., and Mooney, R.J. (2020). Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog. Journal of Artificial Intelligence Research 67.

SLIDE 89

Discussion

What are some of the limitations of these approaches?

When will they fail?

SLIDE 90

Student Paper Presentation

SLIDE 91

SLIDE 92