Linguistic scaffolds for policy learning (what can language do for RL?) Jacob Andreas, Berkeley → Microsoft Semantic Machines → MIT

An NLPer’s view of RL: given one (environment, R) pair, memorize 1 reward fn.

An NLPer’s view of RL: given (environment, R 1), (environment, R 2), ..., memorize k reward fns. [e.g. Taylor & Stone 09]

An NLPer’s view of RL: learn to accomplish new goals! The task is specified by a goal such as (-2, 3) or (-2, -2). [e.g. Schaul et al. 15]

An NLPer’s view of RL: learn to follow instructions! The goal (-2, 3) becomes "run northwest"; the goal (-2, -2) becomes "go southwest".

Instructions as observations: (-2, 3) → "run northwest"; (-2, -2) → "go southwest".
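
As a rough illustration of the "instructions as observations" view, the sketch below appends an encoding of the instruction to the agent's state features and feeds the result to an ordinary policy. The vocabulary, the action set, the bag-of-words encoding and the linear-softmax policy with random weights are all assumptions made for this example; none of them come from the talk.

```python
import math
import random

# Hypothetical vocabulary and action set, chosen only for this example.
VOCAB = ["run", "go", "northwest", "southwest"]
ACTIONS = ["NORTH", "SOUTH", "EAST", "WEST"]


def encode_instruction(text):
    """Bag-of-words encoding of the instruction over the toy vocabulary."""
    words = text.lower().split()
    return [1.0 if w in words else 0.0 for w in VOCAB]


def policy_input(position, instruction):
    """Treat the instruction as extra observation features."""
    return list(position) + encode_instruction(instruction)


def action_probs(weights, features):
    """Linear-softmax policy with one weight row per action."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
features = policy_input((0.0, 0.0), "run northwest")
# Untrained (random) weights: the point is the input format, not the behavior.
weights = [[random.uniform(-1, 1) for _ in features] for _ in ACTIONS]
probs = action_probs(weights, features)
```

Under this view the policy is trained exactly as in the goal-conditioned case; only the goal vector is swapped for a text encoding.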

Beyond observations: (1) Instructions are moves in a game, not observations of an environment.

Beyond goals: (2) There’s more to language learning than instruction following! [Figure: utterances "run northwest", "not so fast", "go southwest"; the corresponding goal is shown as ???]

Language use as gameplay

Generation & understanding: "Turn right and walk through the kitchen. Go right into the living room and stop by the rug." [Anderson et al. 18]

A reference game [Frank & Goodman 12]

“glasses” [Frank & Goodman 12]

The rational speech acts model: L_0( · | glasses) = (1/2, 1/2); L_0( · | hat) = (0, 1). [Frank & Goodman 12, Degen 13]
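
The literal-listener values on this slide can be reproduced with a few lines of code. The sketch below assumes two referents (one wearing only glasses, one wearing a hat and glasses), a uniform prior, and a rationality parameter of 1; the referent names and function names are invented for this illustration. It then applies the usual RSA recursion: a pragmatic speaker that reasons about the literal listener, and a pragmatic listener that reasons about that speaker.

```python
from fractions import Fraction

# Assumed setup: two referents; "glasses" is true of both, "hat" of only one.
REFERENTS = ["glasses_only", "hat_and_glasses"]
UTTERANCES = ["glasses", "hat"]
TRUE_OF = {
    "glasses": {"glasses_only", "hat_and_glasses"},
    "hat": {"hat_and_glasses"},
}


def normalize(scores):
    """Rescale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance):
    """L_0(referent | utterance): uniform over referents the utterance fits."""
    return normalize(
        {r: Fraction(int(r in TRUE_OF[utterance])) for r in REFERENTS}
    )


def pragmatic_speaker(referent):
    """S_1(utterance | referent), proportional to L_0(referent | utterance)."""
    return normalize({u: literal_listener(u)[referent] for u in UTTERANCES})


def pragmatic_listener(utterance):
    """L_1(referent | utterance), proportional to S_1 under a uniform prior."""
    return normalize({r: pragmatic_speaker(r)[utterance] for r in REFERENTS})
```

With these assumptions, `literal_listener("glasses")` is (1/2, 1/2) as on the slide, while `pragmatic_listener("glasses")` shifts to (3/4, 1/4) in favor of the referent without a hat: a speaker who meant the other referent would more likely have said "hat".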

Pragmatics. Q: Do you know what time it is? A: Yes. / "I find his cooking very interesting." [Grice 70]

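RSA game tree: the speaker moves first, saying "hat" or "glasses".

RSA game tree, as speaker: after the speaker says "hat" or "glasses", the listener moves, and each outcome pays +1 or -1.

RSA game tree, as listener: the listener hears "glasses" but does not know which speaker node it is at (?).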

A recipe for pragmatic language understanding
1. Train a base speaker model [Figure: example descriptions such as "hat & glasses" and "guy with glasses"]
2. Solve this POMDP: the hat / glasses game tree with payoffs +1 / -1
With Ronghang Hu, Volkan Cirik and Daniel Fried: Speaker-follower models for vision-and-language navigation. NeurIPS 18.
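
One simple way to approximate step 2 is to rescore a handful of candidate interpretations with the base speaker, preferring the interpretation that the speaker would most plausibly have described with the given utterance. The sketch below is a toy version of that idea, not the method from the cited paper: the candidate names, the log-probabilities and the interpolation `weight` are made-up values for illustration.

```python
import math


def pragmatic_choice(candidates, follower_logp, speaker_logp, weight=0.5):
    """Pick the candidate with the best mix of speaker and follower scores.

    follower_logp[c]: log-probability that a base follower assigns to
        candidate c given the instruction.
    speaker_logp[c]: log-probability that the base speaker assigns to the
        instruction given candidate c.
    weight: how much to trust the speaker (0 = follower only).
    """
    def score(candidate):
        return (weight * speaker_logp[candidate]
                + (1 - weight) * follower_logp[candidate])

    return max(candidates, key=score)


# Toy numbers: the follower slightly prefers route_a, but the speaker says
# the instruction is a much better description of route_b.
candidates = ["route_a", "route_b"]
follower_logp = {"route_a": math.log(0.6), "route_b": math.log(0.4)}
speaker_logp = {"route_a": math.log(0.1), "route_b": math.log(0.7)}

best = pragmatic_choice(candidates, follower_logp, speaker_logp)
baseline = pragmatic_choice(candidates, follower_logp, speaker_logp, weight=0.0)
```

Here `baseline` is the follower's own first choice, and `best` shows how the speaker's score can overturn it.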

Application: instruction following. Human instruction: "Go through the door on the right and continue straight. Stop in the next room in front of the bed." [Figure: top-down overview of trajectories. (a) orange: trajectory of the baseline policy without pragmatic inference; (b) green: trajectory with pragmatic inference.]

Application: instruction generation
seq2seq: Walk past the dining room table and chairs and wait there.
reasoning: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug.
human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

Lesson: Utterances are chosen to facilitate correct interpretation in context. (This makes the learning problem easier!)

Language as a scaffold for learning

What else is an instruction follower good for? Language learning ("go east of the heart"); reinforcement learning. Learning with latent language. Andreas, Klein & Levine. NAACL 18.

Pretraining via language learning: the policy π = f( · ; η, instruction) maps the instruction "go east of the heart" to the action NORTH. [Branavan et al., 09]

(Standard) reinforcement learning: L( f( · ; η, ???), · ). The policy π receives a reward R, but the instruction is unknown (???).

Concept learning: L( f( · ; η, "find the horse"), · ). With the candidate instruction "find the horse", the policy π takes actions NORTH, … and receives reward R = -0.52.
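
The concept-learning slide can be read as a search over instructions: keep the pretrained follower fixed, try candidate instructions, and keep the one whose rollout earns the most reward. The sketch below shows that loop on a toy grid. The follower, the candidate list (enumerated here rather than sampled from a learned proposal model), the start state, the goal and the distance-based reward are all assumptions made for this example.

```python
# Hypothetical instruction set for a toy grid world.
MOVES = {
    "go north": (0, 1),
    "go south": (0, -1),
    "go east": (1, 0),
    "go west": (-1, 0),
}


def follower(state, instruction):
    """Stand-in for a pretrained, frozen follower f(state; eta, instruction)."""
    dx, dy = MOVES[instruction]
    return (state[0] + dx, state[1] + dy)


def rollout_reward(instruction, goal, steps=3):
    """Run the follower from the origin; reward is minus the final distance."""
    state = (0, 0)
    for _ in range(steps):
        state = follower(state, instruction)
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def concept_learning(candidates, goal):
    """Return the candidate instruction with the highest rollout reward."""
    return max(candidates, key=lambda w: rollout_reward(w, goal))


# The learner never sees the goal directly, only the reward per candidate.
best = concept_learning(list(MOVES), goal=(3, 0))
```

Only the instruction is optimized here; the follower's parameters stay untouched, which is what lets language act as the search space for the new task.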
