Linguistic scaffolds for policy learning Jacob Andreas Berkeley

Linguistic scaffolds for policy learning Jacob Andreas Berkeley → Microsoft Semantic Machines → MIT

Linguistic scaffolds for policy learning Work on language! Jacob Andreas Berkeley → Microsoft Semantic Machines → MIT

What RL can do for language / What language can do for RL
Crafting environment (task: sketch):
make plank: get wood, use toolshed
make stick: get wood, use workbench
make cloth: get grass, use factory
make rope: get grass, use toolshed
make bridge: get iron, get wood
make bed ∗: get wood, use toolshed
make axe ∗: get wood, use workbench
make shears: get wood, use workbench
get gold: get iron, get wood
get gem: get wood, use workbench
String-editing instructions:
replace the last letter of the word
drop head
change the final letter to t i
add a z if the last character is a
every vowel becomes y
change only the first consonant to
first & last 3 letters
delete every vowel
replace all n s with c

What RL can do for language Daniel Fried, Ronghang Hu, Volkan Cirik w/ Anja Rohrbach, L.P. Morency, Taylor Berg-Kirkpatrick, Trevor Darrell and Dan Klein

Generation & understanding Turn right and walk through the kitchen. Go right into the living room and stop by the rug. [Anderson et al. 18]

A reference game [Frank & Goodman 12]

“glasses” [Frank & Goodman 12]

The rational speech acts model L_0(· | glasses) = (1/2, 1/2) L_0(· | hat) = (0, 1) [Frank & Goodman 12]
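The literal-listener values on this slide can be reproduced with a small sketch. It assumes a two-referent game in which one face wears only glasses and the other wears glasses and a hat; the names `face_a` and `face_b`, the uniform prior, and the unweighted normalization are all assumptions made for illustration, not details taken from the slides.

```python
# Hypothetical two-face reference game: face_a wears only glasses,
# face_b wears glasses and a hat. All names here are assumptions.
LEXICON = {
    "glasses": {"face_a": 1.0, "face_b": 1.0},
    "hat": {"face_a": 0.0, "face_b": 1.0},
}
REFERENTS = ["face_a", "face_b"]
UTTERANCES = list(LEXICON)


def normalize(scores):
    """Rescale non-negative scores so they sum to one."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance):
    """L0(referent | utterance): uniform over referents the word is true of."""
    return normalize(LEXICON[utterance])


def pragmatic_speaker(referent):
    """S1(utterance | referent), proportional to L0(referent | utterance)."""
    return normalize({u: literal_listener(u)[referent] for u in UTTERANCES})


def pragmatic_listener(utterance):
    """L1(referent | utterance), proportional to S1(utterance | referent)."""
    return normalize({r: pragmatic_speaker(r)[utterance] for r in REFERENTS})
```

Under these assumptions `literal_listener("glasses")` splits evenly between the two faces, while `pragmatic_listener("glasses")` favors `face_a`, because a speaker describing `face_b` would more likely have said "hat".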

Pragmatics Q: Do you know what time it is?

Pragmatics Q: Do you know what time it is? A: Yes

Pragmatics Q: Do you know what time it is? A: Yes I find his cooking very interesting. [Grice 70]

RSA game tree speaker hat glasses

RSA game tree: as speaker speaker (listener) +1 hat hat -1 +1 glasses glasses -1

RSA game tree: as listener (speaker) listener ? ? glasses glasses ?

RSA game tree: as listener (speaker) listener ? Language use is gameplay! ? glasses glasses ?

A recipe for pragmatic text generation 1. Train a base listener model smiley glasses   plain hat &   glasses man glasses glasses

A recipe for pragmatic text generation 1. Train a base listener model 2. Train a reasoning speaker to win when   playing with the listener +1 hat hat -1 +1 glasses glasses -1

Application: image captioning 1. Train an image retrieval / gen model a snake is slithering away from Jenny

Application: image captioning 2. Describe images using the listener model   for search at inference time +1 -1 a snake is slithering away -1 the sun is in the sky -1 [A & Klein 16, Vedantam et al. 17]
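Inference-time search with a listener can be sketched as reranking: propose candidate captions, then keep the one a listener is most likely to resolve to the target image. The sketch below is a toy under stated assumptions. Scenes are represented as sets of words, the listener is a smoothed word-overlap scorer, and the candidates are given by hand; in the cited work both the listener and the candidate generator are learned models.

```python
# Toy scenes as sets of attribute words (hypothetical stand-ins for images).
TARGET = {"snake", "slithering", "jenny", "sun", "sky"}
DISTRACTOR = {"jenny", "sun", "sky", "ball"}


def listener_prob(caption, scene, other, smoothing=1.0):
    """Probability that a word-overlap listener picks `scene` over `other`."""
    words = set(caption.lower().split())
    score_scene = len(words & scene) + smoothing
    score_other = len(words & other) + smoothing
    return score_scene / (score_scene + score_other)


def pragmatic_caption(candidates, target, distractor):
    """Return the candidate the listener is most likely to resolve to `target`."""
    return max(candidates, key=lambda c: listener_prob(c, target, distractor))


candidates = ["the sun is in the sky", "a snake is slithering away"]
best = pragmatic_caption(candidates, TARGET, DISTRACTOR)
```

Both captions are true of the target scene, but only the snake caption separates it from the distractor, so the listener-based score prefers it.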

Application: image captioning 2. Describe images using the listener model   as a training-time reward (“self-play”) -1 the sun is in the sky captioner model retrieval loss [Yu et al. 16, Mao et al. 16]

Descriptive captions [Vedantam et al. 17] seq2seq captioner: this bird has a yellow breast with a short pointy bill pragmatic captioner: a small yellow bird with black stripes on its body and black stripe on the wings.

Contrastive captions without contrastive data! (a) (b) (c) Mike is holding a baseball bat. The snake is slithering away from Mike & Jenny. [A & Klein 16]

Application: instruction generation 1. Train a base instruction following model 2. Train an instruction generation model to   get the follower to goal states

Application: instruction generation seq2seq: Walk past the dining room table and chairs and wait there. speaker-listener : Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug. human : Turn right and walk through the kitchen. Go right into the living room and stop by the rug. [Fried, Hu, Cirik et al. 18]

Listener mode human instruction: Go through the door on the right and continue straight. Stop in the next room in front of the bed. [Figure: top-down overview of seq-to-seq and speaker-listener trajectories; (a) orange: trajectory without pragmatic inference; (b) green: trajectory with pragmatic inference] [Fried, Hu, Cirik et al. 18]

The rules of the game +1 glasses glasses

The rules of the game +1 hat hat

Killer robots [Lewis et al. 17] Bob : i can i i everything else . . . . . . . . . . . . . . Alice : balls have zero to me to me to me to me to me to me to me to me to Bob : you i everything else . . . . . . . . . . . . . . Alice : balls have a ball to me to me to me to me to me to me to me

Problems to work on How do we use tools like self-play and tree search while remaining within the rules of natural language? How do we do efficient search in string-valued action spaces?

What language can do for RL w/ Dan Klein and Sergey Levine

A crafting game make planks make sticks

Learning with sketches get wood get wood use saw use axe

The options framework [Sutton et al. 99]

Unsupervised option learning +r [Bacon & Precup 16]

Learning with intermediate rewards +r +r [Kearns & Singh 02, Kulkarni et al. 16]

Segmenting demonstrations [Stolle & Precup 02, Fox & Krishnan et al. 16]

Learning from sketches get wood use saw [A, Klein & Levine 17]

Modular policies get wood → π_1, use saw → π_2; get wood → π_1, use axe → π_3
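The sharing pattern on this slide, one subpolicy per sketch symbol reused across tasks, can be sketched as follows. This is a minimal illustration only: `make_subpolicy`, `run_task`, the `STOP` action, and the one-step toy dynamics are hypothetical, and a real subpolicy would be a learned network rather than a hand-written function.

```python
from typing import Callable, Dict, List

Policy = Callable[[dict], str]  # maps an observation to a low-level action


def make_subpolicy(name: str) -> Policy:
    """Placeholder subpolicy; a real one would be a learned network."""
    def act(observation: dict) -> str:
        # Emit STOP once the toy environment reports this subtask as done.
        return "STOP" if name in observation["done"] else f"act[{name}]"
    return act


# Sketches taken from the crafting example: each task is a symbol sequence.
SKETCHES: Dict[str, List[str]] = {
    "make planks": ["get wood", "use saw"],
    "make sticks": ["get wood", "use axe"],
}

# One subpolicy per symbol, shared by every sketch that mentions it.
SUBPOLICIES: Dict[str, Policy] = {
    symbol: make_subpolicy(symbol)
    for sketch in SKETCHES.values()
    for symbol in sketch
}


def run_task(task: str, max_steps: int = 10) -> List[str]:
    """Execute the sketch for `task`, switching subpolicy on each STOP."""
    observation = {"done": set()}
    trace: List[str] = []
    for symbol in SKETCHES[task]:
        policy = SUBPOLICIES[symbol]
        for _ in range(max_steps):
            action = policy(observation)
            if action == "STOP":
                break
            trace.append(action)
            observation["done"].add(symbol)  # toy dynamics: one step completes it
    return trace
```

Both tasks index the same `SUBPOLICIES["get wood"]` entry, which is the point of the modular design: experience gathered on one task updates a component that the other task also uses.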

Modular policies TURN LEFT π 1 get wood

Results: crafting game

Results: crafting game [Plot: reward vs. episodes, 0 to 3 × 10^6; curves: Sketches: modular, Sketches: joint, Unsupervised]

Results: locomotion

Results: locomotion [Plot: reward vs. timesteps, 0 to 3 × 10^8; curves: Sketches: modular, Sketches: joint, Unsupervised]

Generalization What if I don’t get a sketch at test time? ???

Generalization What if I don’t get a sketch at test time? [Bar chart: Unsupervised vs. Sketches, grouped by Training and Adaptation; y-axis 0 to 100; bar values 89, 76, 47, 42]

Moral A little bit of (structured) language goes a long way!

Beyond structured sketches Language learning Learning from demonstrations emboldens emboldecs itch itctch dogtrot dogtrot first & last 3 letters loneliness locelicess vein ??? [A, Klein & Levine 17]

Pretraining via language learning f(· ; η, "first & last 3 letters"): wonderful → wonful [Branavan et al., 09]

Concept learning L(f(· ; η, description), ·) on examples emboldens → emboldecs, vein → veic, loneliness → locelicess

Concept learning L(f(· ; η, description), ·) on examples emboldens → emboldecs, vein → veic, loneliness → locelicess; candidate description: "every vowel becomes i"

Concept learning L(f(· ; η, description), ·) on examples emboldens → emboldecs, vein → veic, loneliness → locelicess; loss 128.6: "every vowel becomes i"

Concept learning L(f(· ; η, description), ·) on examples emboldens → emboldecs, vein → veic, loneliness → locelicess; loss 128.6: "every vowel becomes i"; loss 52.3: "change consonants to c"

Concept learning L(f(· ; η, description), ·) on examples emboldens → emboldecs, vein → veic, loneliness → locelicess; loss 128.6: "every vowel becomes i"; loss 52.3: "change consonants to c"; loss 8.3: "replace n with c"

Prediction L(f(· ; η, description), ·) → "replace n with c"

Evaluation L(f(· ; η, "replace n with c"), ·) on input loonies

Evaluation f(loonies ; η, "replace n with c") = loocies
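The concept-learning loop in these slides (score candidate descriptions against a few input/output examples, keep the lowest-loss one, then apply it to a new input) can be sketched as below. This is a simplified stand-in under loud assumptions: the learned model f(· ; η, description) is replaced by hand-written string functions, the candidate set is fixed to the three descriptions shown, and the loss is a plain character-mismatch count, so its values will not match the 128.6 / 52.3 / 8.3 figures on the slides.

```python
VOWELS = "aeiou"

# Hand-written interpreters standing in for the learned model f(.; eta, description).
CANDIDATES = {
    "every vowel becomes i": lambda w: "".join("i" if c in VOWELS else c for c in w),
    "change consonants to c": lambda w: "".join(c if c in VOWELS else "c" for c in w),
    "replace n with c": lambda w: w.replace("n", "c"),
}

# Demonstrations from the slides: (input, target) pairs.
EXAMPLES = [
    ("emboldens", "emboldecs"),
    ("vein", "veic"),
    ("loneliness", "locelicess"),
]


def loss(description, examples):
    """Count mismatched characters between predictions and targets."""
    f = CANDIDATES[description]
    total = 0
    for source, target in examples:
        predicted = f(source)
        total += sum(a != b for a, b in zip(predicted, target))
        total += abs(len(predicted) - len(target))
    return total


def infer_description(examples):
    """Pick the candidate description with the lowest loss on the examples."""
    return min(CANDIDATES, key=lambda d: loss(d, examples))


best = infer_description(EXAMPLES)
prediction = CANDIDATES[best]("loonies")
```

With these toy interpreters the selected description is "replace n with c", and applying it to the held-out input gives the same output as the evaluation slide.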
