Walk the Talk: Connecting Language, Knowledge, and Action in Route - PowerPoint PPT Presentation

Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions Clara Cannon ccannon@cs.utexas.edu

AGENDA
● INTRODUCTION
● MARCO ARCHITECTURE
  ○ Modeling Route Instructions
  ○ Representing Expected Views and Actions
  ○ Interleaving Action, Perception, and Modeling
  ○ Robustness to Errors and Ambiguities
  ○ Inferring Actions Implicit in Instructions
● EVALUATION
  ○ Implicit Action Inference Experiment
● CONCLUSION
● DISCUSSION

Introduction
Scenario: A host has given you instructions for finding an office in a building you have never visited. Most likely, the instructions will be incomplete, and you will have to infer certain steps to arrive at the correct destination. To resolve ambiguities, you draw on your knowledge of language, your understanding of spatial actions (e.g. “turn so that your back is facing the pink wall”), and a model of the environment.

What is MARCO?
● An agent that follows free-form natural language route instructions
● Represents and executes a sequence of compound action specifications
● Infers implicit actions
● Can perform implicit, explicit, and exploratory actions in its environment
● Manually built and hand-tuned

MARCO Architecture
6 Modules:
1. Syntax Parser
2. Content Framer
3. Instruction Modeler
4. Executor
5. Robot Controller
6. View Description Matcher
[Figure labels: Linguistic Grounding, Spatial Grounding]

MARCO Architecture
● An example parse tree shows the transformation of text into an imperative model
● Syntax Parser: models the surface structure of a word/statement
● Content Framer: interprets the surface meaning of a word/statement
● Instruction Modeler: applies spatial and linguistic knowledge to combine information across phrases and sentences
● Executor: interleaves action and perception; acts to gain knowledge of the environment and to execute instructions in the context of the spatial model
● Robot Controller: interface to the particular follower's motor and sensory functions
● View Description Matcher: checks symbolic view descriptions against sensory observations and world models (expected model against world model)

Modeling Route Instructions
● The syntax parser parses the raw route instruction text
● The content framer translates the surface structure of a word/statement into a model of its surface meaning, represented as a nested attribute-value matrix
● The content frame models the nested structure and sense of a word/statement, discarding punctuation, arbitrary text ordering, inflectional suffixes, and spelling variations
● The syntax parser uses a probabilistic context-free grammar that directly models verb-argument structure
● The content framer gets word senses from WordNet
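As a rough illustration of what a nested attribute-value matrix might look like, the sketch below hand-writes a frame for the instruction "Go down the hall to the chair". The attribute names (`verb`, `path`, `until`, `prep`, `object`, `head`, `det`) and the `flatten` helper are hypothetical choices made for this example, not MARCO's actual frame vocabulary.

```python
# Hypothetical content frame: a nested attribute-value matrix as plain dicts.
# Attribute names are illustrative assumptions, not MARCO's real schema.
frame = {
    "verb": "go",
    "path": {"prep": "down", "object": {"head": "hall", "det": "the"}},
    "until": {"prep": "to", "object": {"head": "chair", "det": "the"}},
}

def flatten(avm, prefix=""):
    """Return a dict mapping dotted attribute paths to leaf values."""
    flat = {}
    for key, value in avm.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested matrix
        else:
            flat[path] = value
    return flat

flat = flatten(frame)
```

Flattening makes the nesting explicit: the head noun of the path phrase is reachable as `flat["path.object.head"]`, independent of how the words were ordered or punctuated in the raw text.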

Modeling Route Instructions
● The instruction modeler translates the content frame’s representation of the surface meaning of an instruction element into an imperative model containing compound action specifications
● It infers the model by applying knowledge of verbs and prepositions in route instructions, and knowledge of how perception and action depend on the local spatial configuration in similar environments

Representing Expected Views and Actions
● A view description represents what the follower expects to see at a pose/orientation in the environment, given the descriptions in the instructions
● Instructions tend to describe some distinctive attribute of some scenes along the route
● For each expected object, it models the object’s type, its location within the observer's relative view, and a description of its appearance and attributes

Representing Expected Views and Actions
● Route instructions require at least four low-level simple actions
  ○ TURN: changes an agent’s pose/orientation but preserves its location
  ○ TRAVEL: changes an agent’s location but preserves its pose
  ○ VERIFY: checks an observation against a description of an expected view
  ○ DECLARE-GOAL: terminates instruction following by asserting that the agent is at the desired destination
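The four simple actions can be sketched on a toy grid world. This is a minimal illustration under assumptions of our own: the `Pose` class, the four-heading grid, and the dictionary-based observations are hypothetical and are not taken from MARCO's implementation.

```python
from dataclasses import dataclass

# Headings as (dx, dy) unit vectors, listed clockwise: N, E, S, W.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

@dataclass(frozen=True)
class Pose:
    x: int
    y: int
    heading: int  # index into HEADINGS

def turn(pose, quarter_turns):
    """TURN: change orientation, keep location. Positive is clockwise."""
    return Pose(pose.x, pose.y, (pose.heading + quarter_turns) % 4)

def travel(pose, steps=1):
    """TRAVEL: change location, keep orientation."""
    dx, dy = HEADINGS[pose.heading]
    return Pose(pose.x + dx * steps, pose.y + dy * steps, pose.heading)

def verify(observation, expected):
    """VERIFY: every expected attribute must match the observation."""
    return all(observation.get(k) == v for k, v in expected.items())

def declare_goal(pose, goal_xy):
    """DECLARE-GOAL: stop; here we also report whether the claim is correct."""
    return (pose.x, pose.y) == goal_xy

start = Pose(0, 0, 0)              # at the origin, facing north
end = travel(turn(start, 1), 2)    # face east, then move two cells
```

Note that `turn` never touches `x` or `y` and `travel` never touches `heading`, which mirrors the slide's distinction between the two actions.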

Representing Expected Views and Actions
● Compound action specifications capture the commands in route instructions by modeling which simple actions to take under which perceptual or cognitive conditions
● Each clause is interpreted as a compound action specification
● Adverbs, verb objects, and prepositional phrases translate to pre-conditions, while-conditions, and post-conditions
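One way to picture a compound action specification is as a simple action bundled with three lists of conditions. The sketch below is an assumption-laden illustration: the `CompoundAction` class, its field names, and the observation keys (`front`, `on`, `here`) are hypothetical and do not reproduce MARCO's internal representation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Condition = Callable[[dict], bool]  # a test on the current observation

@dataclass
class CompoundAction:
    action: str                                   # e.g. "TRAVEL" or "TURN"
    pre: List[Condition] = field(default_factory=list)
    while_: List[Condition] = field(default_factory=list)
    post: List[Condition] = field(default_factory=list)

    def ready(self, obs):
        """All pre-conditions hold, so the action may start."""
        return all(c(obs) for c in self.pre)

    def may_continue(self, obs):
        """All while-conditions hold, so the action may keep going."""
        return all(c(obs) for c in self.while_)

    def done(self, obs):
        """All post-conditions hold, so the action is complete."""
        return all(c(obs) for c in self.post)

# Hypothetical reading of "Go down the hall to the chair".
go = CompoundAction(
    action="TRAVEL",
    pre=[lambda o: o.get("front") == "hall"],
    while_=[lambda o: o.get("on") == "hall"],
    post=[lambda o: o.get("here") == "chair"],
)
```

In this reading, the prepositional phrase "down the hall" contributes the pre- and while-conditions, and "to the chair" contributes the post-condition.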

Interleaving Action, Perception, and Modeling
● The executor sequences simple actions given the environment context and the state of the route instruction being followed
● It executes each compound action specification before moving to the next
● The robot controller executes simple actions
● The view description matcher checks symbolic view descriptions against sensory observations. It treats a view description as constraints that the observation stream must meet.
● It defers handling ambiguity until the environment can provide enough context to disambiguate
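The interleaving of acting and perceiving can be sketched as a loop that checks the expected view after every step. This is a simplified illustration, not MARCO's executor: the one-dimensional `corridor`, the `matches` helper, and the `execute_travel` function are all hypothetical.

```python
def matches(view_description, observation):
    """Treat the description as constraints the observation must satisfy."""
    return all(observation.get(slot) == value
               for slot, value in view_description.items())

def execute_travel(corridor, start, until, max_steps=20):
    """Step forward along a 1-D corridor, checking the view at each position.

    corridor: list of observations (dicts), one per position.
    Returns the position where `until` first matches, or None if it never does.
    """
    pos = start
    for _ in range(max_steps):
        if matches(until, corridor[pos]):   # perceive
            return pos
        if pos + 1 >= len(corridor):
            return None                     # ran out of hallway
        pos += 1                            # act, then perceive again
    return None

corridor = [{"here": "empty"}, {"here": "empty"}, {"here": "chair"}]
stop_at = execute_travel(corridor, 0, {"here": "chair"})
```

Because the match is tested against each new observation rather than planned in advance, the agent does not need to know how far away the chair is before it starts moving.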

Robustness to Errors and Ambiguities
● When MARCO does not know a word, it searches for the nearest known synonym or more abstract hypernym using WordNet
● If the content framer encounters a constituent it cannot model, the constituent is ignored and the rest of the clause is modeled
● A similar strategy applies to the parser: if it cannot parse one sentence in a set of instructions, it still parses the others
● Why these techniques work:
  1. Route instructions contain redundant information
  2. Essential information in route instructions is stated using a small variety of content frames for direction and movement
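The unknown-word fallback can be sketched with a toy taxonomy standing in for WordNet. Everything here is an assumption made for illustration: the `HYPERNYM`, `SYNONYMS`, and `KNOWN` tables and the `resolve` function are hypothetical, and a real system would query WordNet rather than hand-written dictionaries.

```python
# Toy stand-in for WordNet: each word maps to its parent (hypernym).
HYPERNYM = {
    "sofa": "seat", "stool": "seat", "chair": "seat",
    "seat": "furniture", "furniture": "object",
}
SYNONYMS = {"couch": "sofa", "corridor": "hall", "hallway": "hall"}

# Words the (hypothetical) follower already has models for.
KNOWN = {"hall", "chair", "furniture", "wall"}

def resolve(word):
    """Map a word to the nearest known synonym or more abstract hypernym."""
    word = SYNONYMS.get(word, word)
    seen = set()
    while word not in KNOWN:
        if word in seen or word not in HYPERNYM:
            return None          # no known ancestor: caller ignores the word
        seen.add(word)
        word = HYPERNYM[word]    # climb one level toward a more abstract term
    return word
```

With these tables, `resolve("hallway")` lands on the known synonym `"hall"`, while `resolve("couch")` has to climb through `"sofa"` and `"seat"` before reaching the known concept `"furniture"`.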

Inferring Actions Implicit in Instructions
● Implicit actions are inferred using linguistic and spatial knowledge
● Ex: “Go down the hall to the chair”
● The language model interprets the phrase structure as the along and until parameters of a TRAVEL action
● MARCO infers the conditions of the TRAVEL action as:
  ○ Pre: the path should be immediately in front, and the chair should be in front in the distance
  ○ Post: the chair will be local to the agent

Inferring Actions Implicit in Instructions
● If the pre- and post-conditions are not met, the executor may take exploratory actions to gain information or to determine the location of a reference object
● The figure shows how an instruction is applied to navigate with different maps and starting poses
● In some scenarios, the agent must perform an exploratory action to orient itself in the environment according to the instruction
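An implicit TURN can be pictured as a search over headings for one in which the TRAVEL pre-condition holds. The sketch below is a hedged illustration only: `infer_turn`, the four-heading layout, and the observation keys (`front`, `distant`) are hypothetical and are not how MARCO is documented to implement exploration.

```python
def infer_turn(views_by_heading, heading, precondition):
    """Return how many quarter turns are needed before the precondition holds.

    views_by_heading: list of 4 observations, one per heading at this spot.
    heading: the agent's current heading index (0 to 3).
    Returns None if no heading satisfies the precondition.
    """
    for turns in range(4):
        view = views_by_heading[(heading + turns) % 4]
        if all(view.get(k) == v for k, v in precondition.items()):
            return turns
    return None

# Hypothetical starting spot: the hall with the chair is behind the agent.
views = [
    {"front": "wall"},
    {"front": "wall"},
    {"front": "hall", "distant": "chair"},
    {"front": "wall"},
]
pre = {"front": "hall", "distant": "chair"}
needed = infer_turn(views, 0, pre)
```

The instruction "Go down the hall to the chair" never mentions turning, but the unmet pre-condition is what prompts the extra action. The same instruction yields zero turns when the agent already faces the hall.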

Evaluation
● Evaluated MARCO in 3 environments with a corpus of route instructions written by 6 human directors
● The corpus consists of 786 route instruction texts (682 with omissions)
● 36 human subjects followed the route instructions to establish a baseline
● Used a desktop virtual reality environment
● Route instruction texts were rated 1-6 (1: vague, hard to follow; 6: detailed, easy to follow)

Evaluation
Benefits of using a VR environment:
1. All route directors had similar exposure to the environment
2. Pertinent aspects of the environment were known and repeatable
3. Directors learned the environment through a first-person perspective, as followers do
4. MARCO can navigate the same environment as people

Evaluation
● For testing, the agent was given route instructions as text from a starting location
● Success was determined by whether the agent reached the desired destination
● Did not account for speed of arrival
● No explanation of what happens when an agent gets lost
● No account of what happens when an agent makes an incorrect inference (e.g. incorrectly identifying the color blue)

Implicit Action Inference Experiment
● Results for 5 types of follower: (1) human participants, (2) full MARCO model, (3) MARCO w/o TURN inference, (4) MARCO w/o TRAVEL inference, (5) MARCO w/o TURN or TRAVEL inference
● Humans found the destination with an overall mean success rate of 69%
● MARCO successfully followed 61% of route instructions
● TURN inference is more valuable than TRAVEL inference

Conclusion
● The authors claim this experiment is easier and less expensive to replicate than similar work
● The paper uses knowledge of language and space to infer implicit actions when following natural language route instructions
● Contributes an assessment of human performance in communicating route information through unfamiliar large-scale spaces
● Future work includes: replacing the executor algorithm with a full action sequencer or an algorithm that reasons on inferred route topology; generalizing the methods to larger domains (e.g. cooking, first aid); and using similar evaluation techniques for other large-scale instructional tasks

Discussion!
