NMNs and strong generalization [ARDK16a]
TRAIN: Is there anything left of a circle? Is there anything above a circle?
TEST: Is there anything above and left of a circle?
[Figure: bar chart comparing CNN + RNN and NMN; bar values shown: 90.6 and 76.5]
NMNs for other tasks
[Andreas, Rohrbach, Darrell, Klein; 16b] Is Key Largo an island?
name        type   coastal  island
Columbia    city   no       no
Cooper      river  yes      no
Charleston  city   yes      no
[Hu, Rohrbach, Andreas, Darrell, Saenko; 17] [Cirik, Berg-Kirkpatrick, Morency; 18] [Yu, Lin, Shen, Yang, Lu, Bansal, Berg; 18] man in sunglasses walking towards two men
[Suhr, Lewis, Artzi; 17] There is exactly one black triangle not touching any edge.
Lessons
Linguistic structure lets us learn composable neural modules from weak supervision.
These modules allow us to more accurately interpret new statements, questions and references.
[Diagram: modules exists, above, red, circle ↦ yes]
LANGUAGE, LEARNING, REASONING & BELIEF
Andreas, Klein & Levine. Modular Multitask Reinforcement Learning […]. ICML 17.
Learning classifiers
Is there a red shape above a circle? ↦ yes
What color is the shape right of a circle? ↦ blue
[Diagram: module trees built from exists, and, above, red, circle and from color, right, circle]
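The two questions on this slide can be read as two different compositions of the same small set of modules. The following is only a symbolic toy sketch of that idea: the real modules are neural networks trained from question/answer pairs, whereas here every module is a hand-written set operation. The `Obj` class, the scene, the module signatures, and the simplified meaning of "above" and "right of" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    shape: str
    color: str
    x: int  # column; larger means further right (assumed convention)
    y: int  # row; larger means higher up (assumed convention)


def find(scene, word):
    """Indices of objects whose shape or color matches the word."""
    return {i for i, o in enumerate(scene) if word in (o.shape, o.color)}


def above(scene, targets):
    """Objects that sit higher than at least one target object."""
    return {i for i, o in enumerate(scene) if any(o.y > scene[t].y for t in targets)}


def right_of(scene, targets):
    """Objects that sit further right than at least one target object."""
    return {i for i, o in enumerate(scene) if any(o.x > scene[t].x for t in targets)}


def and_(a, b):
    """Intersection of two selections."""
    return a & b


def exists(selected):
    return "yes" if selected else "no"


def color(scene, selected):
    """Color of the selection, or None if it is empty or ambiguous."""
    colors = {scene[i].color for i in selected}
    return colors.pop() if len(colors) == 1 else None


# Hypothetical scene: a red square above a green circle, a blue triangle to its right.
scene = [Obj("square", "red", 0, 1), Obj("circle", "green", 0, 0), Obj("triangle", "blue", 1, 0)]

# "Is there a red shape above a circle?"
answer_1 = exists(and_(find(scene, "red"), above(scene, find(scene, "circle"))))
# "What color is the shape right of a circle?"
answer_2 = color(scene, right_of(scene, find(scene, "circle")))
print(answer_1, answer_2)  # prints: yes blue
```

The point of the sketch is that `find` and the spatial modules are reused unchanged across both questions; only the layout that wires them together differs.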
Learning behaviors
Make planks: get wood, then use a saw. [get wood, use saw]
Make sticks: get wood, then use an axe. [get wood, use axe]
Learning from intermediate rewards [Kearns & Singh 02, Kulkarni et al. 16]
[Diagram: two intermediate rewards labeled r]
Learning from demonstrations [Stolle & Precup 02, Fox & Krishnan et al. 16]
Learning from intermediate rewards [e.g. Sacerdoti 75, Hauskrecht et al. 98]
-has(wood), +has(wood), +has(plank), +at(saw)
Learning from sketches
get wood, use saw (policy π)
Learning from sketches
Make planks: get wood, use saw
Learning from sketches
Make sticks: get wood, use axe
Learning from sketches
get wood, use saw: π1, π2
get wood, use axe: π1, π3
Each subpolicy ends by emitting STOP.
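The slide shows each sketch symbol tied to one subpolicy, with "get wood" (π1) shared by both tasks and control passing to the next symbol when a subpolicy emits STOP. A minimal sketch of that control flow follows. It makes strong simplifying assumptions: the `SubPolicy` class is a hypothetical stand-in that replays a fixed action list instead of a learned network conditioned on observations, and the low-level action names are invented.

```python
STOP = "STOP"

# Task name -> sketch (sequence of symbols), as on the slides.
SKETCHES = {
    "make planks": ["get wood", "use saw"],
    "make sticks": ["get wood", "use axe"],
}


class SubPolicy:
    """Hypothetical stand-in for a learned subpolicy: fixed actions, then STOP."""

    def __init__(self, actions):
        self.actions = actions

    def act(self, step):
        return self.actions[step] if step < len(self.actions) else STOP


# One subpolicy per symbol, shared by every task whose sketch uses that symbol.
SUBPOLICIES = {
    "get wood": SubPolicy(["move_to_tree", "collect"]),  # pi_1
    "use saw": SubPolicy(["move_to_saw", "use"]),        # pi_2
    "use axe": SubPolicy(["move_to_axe", "use"]),        # pi_3
}


def run_sketch(task):
    """Run each subpolicy in sketch order, switching to the next one on STOP."""
    trace = []
    for symbol in SKETCHES[task]:
        policy = SUBPOLICIES[symbol]
        step = 0
        while (action := policy.act(step)) != STOP:
            trace.append((symbol, action))
            step += 1
    return trace


for task in SKETCHES:
    print(task, run_sketch(task))
```

Because both sketches begin with "get wood", the first part of both traces comes from the same `SubPolicy` object, which is the sharing the π1 label on the slide indicates.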
Experiments: crafting game
Experiments: crafting game
[Figure: reward vs. episodes (0 to 3 × 10^6); curves labeled Sketches / Modular, Instruction following, Unsupervised]
Experiments: locomotion
Experiments: locomotion
[Figure: log reward vs. timesteps (0 to 3 × 10^8); curves labeled Sketches / Modular, Instruction following, Unsupervised]
Fast adaptation
What if I don’t get a sketch at test time?
Sketches shown: get iron, use axe; get iron, use saw
Fast adaptation
What if I don’t get a sketch at test time?
[Figure: bar chart of Avg. Reward (axis 0 to 100) for Ordinary RL, Unsup. / Modular, and Sketches / Modular; bar values shown: 1, 42, 76]
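One simple way to picture test-time adaptation without a sketch is to treat the already-learned module names as a vocabulary and search over short sequences of them, keeping whichever sequence earns the most reward. The sketch below uses plain exhaustive search, which is a swapped-in simplification and not necessarily the procedure behind the results on this slide. The symbol list, the `episode_reward` function, and the hidden target plan are all hypothetical.

```python
from itertools import product

# Assumed vocabulary of module names learned during training.
SYMBOLS = ["get wood", "get iron", "use saw", "use axe"]


def episode_reward(sketch):
    """Hypothetical environment: reward 1.0 only for the hidden target plan."""
    return 1.0 if sketch == ("get iron", "use axe") else 0.0


def search_sketch(symbols, max_len, reward_fn):
    """Try every symbol sequence up to max_len and keep the best-scoring one."""
    best, best_reward = None, float("-inf")
    for length in range(1, max_len + 1):
        for candidate in product(symbols, repeat=length):
            reward = reward_fn(candidate)
            if reward > best_reward:
                best, best_reward = candidate, reward
    return best, best_reward


best_sketch, best_reward = search_sketch(SYMBOLS, max_len=2, reward_fn=episode_reward)
print(best_sketch, best_reward)
```

Exhaustive search grows exponentially with sketch length, so anything beyond a toy vocabulary would need a learned or sampled search instead; the example is only meant to show that the modules themselves stay fixed while the sequence over them is chosen at test time.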
Lessons
We can also learn modular behaviors from ungrounded “sketches” of abstract plans.
We can use these modules to help reinforcement learning even when sketches are not available.
[Diagram: modules get iron, get wood, use axe, use saw]
Beyond “tasks”
LOCALIZATION: Man in glasses near two men.
Q&A: How many men?
POLICY SEARCH: go near the corner
Toward a model of everything
LANGUAGE LEARNING
Man in glasses near two men. How many men? go near the corner
LANGUAGE, LEARNING, REASONING & BELIEF
Andreas & Klein. Reasoning about Pragmatics with Neural Listeners and Speakers. EMNLP 16.
Andreas, Drăgan & Klein. Translating Neuralese. ACL 17.
Andreas & Klein. Analogs of Linguistic Structure in Deep Representations. EMNLP 17.
Fried, Andreas & Klein. Unified Pragmatic Models for Generating and Following […]. NAACL 18.