Linguistic scaffolds for policy learning
Linguistic scaffolds for policy learning - Jacob Andreas, Berkeley


  1. Linguistic scaffolds for policy learning. Jacob Andreas, Berkeley → Microsoft Semantic Machines → MIT

  2. Linguistic scaffolds for policy learning. Work on language! Jacob Andreas, Berkeley → Microsoft Semantic Machines → MIT

  3. What RL can do for language / What language can do for RL. Crafting environment (goal: subtasks): make plank: get wood, use toolshed; make stick: get wood, use workbench; make cloth: get grass, use factory; make rope: get grass, use toolshed; make bridge: get iron, get wood; make bed∗: get wood, use toolshed; make axe∗: get wood, use workbench; make shears: get wood, use workbench; get gold: get iron, get wood; get gem: get wood, use workbench. String-edit descriptions: replace the last letter of the word; drop head; change the final letter to t i; add a z if the last character is a; every vowel becomes y; change only the first consonant to; first & last 3 letters; delete every vowel; replace all n s with c.

  4. What RL can do for language. Daniel Fried, Ronghang Hu, Volkan Cirik, w/ Anja Rohrbach, L.P. Morency, Taylor Berg-Kirkpatrick, Trevor Darrell and Dan Klein

  5. Generation & understanding Turn right and walk through the kitchen. Go right into the living room and stop by the rug. [Anderson et al. 18]

  6. A reference game [Frank & Goodman 12]

  7. “glasses” [Frank & Goodman 12]

  8. “glasses” [Frank & Goodman 12]

  9. “glasses” [Frank & Goodman 12]

  10. “glasses” [Frank & Goodman 12]

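  11. The rational speech acts model. Literal listener (one entry per candidate referent): $L_0(\cdot \mid \text{glasses}) = (1/2,\ 1/2)$, $L_0(\cdot \mid \text{hat}) = (0,\ 1)$ [Frank & Goodman 12]

  12. The rational speech acts model. Literal listener: $L_0(\cdot \mid \text{glasses}) = (1/2,\ 1/2)$, $L_0(\cdot \mid \text{hat}) = (0,\ 1)$. Speaker: $S_1(\text{glasses} \mid \cdot) \propto L_0(\cdot \mid \text{glasses})$, giving $S_1(\text{glasses} \mid \cdot) = (1,\ 1/3)$ and $S_1(\text{hat} \mid \cdot) = (0,\ 2/3)$ [Frank & Goodman 12]

  13. The rational speech acts model. Pragmatic listener: $L_1(\cdot \mid \text{glasses}) \propto S_1(\text{glasses} \mid \cdot)$, giving $L_1(\cdot \mid \text{glasses}) = (3/4,\ 1/4)$ and $L_1(\cdot \mid \text{hat}) = (0,\ 1)$. Speaker: $S_1(\text{glasses} \mid \cdot) \propto L_0(\cdot \mid \text{glasses})$, giving $S_1(\text{glasses} \mid \cdot) = (1,\ 1/3)$ and $S_1(\text{hat} \mid \cdot) = (0,\ 2/3)$ [Frank & Goodman 12]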
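The numbers on slides 11 to 13 can be reproduced with a few lines of Python. This is a minimal sketch, assuming a uniform prior over referents and a lexicon in which "glasses" is true of both referents and "hat" of only one; the referent names `glasses_only` and `hat_and_glasses` are hypothetical labels introduced here, not names from the talk.

```python
from fractions import Fraction

# Hypothetical referent labels; the lexicon says which utterances are true of which referent.
REFERENTS = ["glasses_only", "hat_and_glasses"]
UTTERANCES = ["glasses", "hat"]
LEXICON = {
    "glasses": {"glasses_only": 1, "hat_and_glasses": 1},
    "hat": {"glasses_only": 0, "hat_and_glasses": 1},
}


def normalize(weights):
    """Rescale non-negative weights so they sum to one (exact arithmetic)."""
    total = sum(weights.values())
    return {key: Fraction(value) / total for key, value in weights.items()}


def literal_listener():
    """L0(referent | utterance): uniform over referents the utterance is true of."""
    return {u: normalize({r: LEXICON[u][r] for r in REFERENTS}) for u in UTTERANCES}


def speaker(listener):
    """S1(utterance | referent), proportional to L0(referent | utterance)."""
    return {r: normalize({u: listener[u][r] for u in UTTERANCES}) for r in REFERENTS}


def pragmatic_listener(spk):
    """L1(referent | utterance), proportional to S1(utterance | referent)."""
    return {u: normalize({r: spk[r][u] for r in REFERENTS}) for u in UTTERANCES}


l0 = literal_listener()
s1 = speaker(l0)
l1 = pragmatic_listener(s1)

for name, table in [("L0", l0), ("S1", s1), ("L1", l1)]:
    for condition, dist in table.items():
        print(name, condition, {k: str(v) for k, v in dist.items()})
```

Using `Fraction` keeps the values exact, so they can be compared directly with the fractions shown on the slides.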

  14. Pragmatics Q: Do you know what time it is?

  15. Pragmatics Q: Do you know what time it is? A: Yes

  16. Pragmatics Q: Do you know what time it is? A: Yes. “I find his cooking very interesting.” [Grice 70]

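  17. RSA game tree [figure: speaker node with moves “hat” and “glasses”]

  18. RSA game tree: as speaker [figure: speaker moves “hat” / “glasses”, followed by (listener) moves with payoffs +1 / -1]

  19. RSA game tree: as speaker [figure: speaker moves “hat” / “glasses”, followed by (listener) moves with payoffs +1 / -1]

  20. RSA game tree: as speaker [figure: speaker moves “hat” / “glasses”, followed by (listener) moves with payoffs +1 / -1]

  21. RSA game tree: as listener [figure: (speaker) move “glasses”, followed by listener moves with payoffs marked “?”]

  22. RSA game tree: as listener. Language use is gameplay! [figure: (speaker) move “glasses”, followed by listener moves with payoffs marked “?”]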

  23. A recipe for pragmatic text generation. 1. Train a base listener model [figure labels: smiley, glasses, plain, hat & glasses, glasses man]

  24. A recipe for pragmatic text generation. 1. Train a base listener model. 2. Train a reasoning speaker to win when playing with the listener [figure: game tree with moves “hat” / “glasses” and payoffs +1 / -1]

  25. Application: image captioning. 1. Train an image retrieval / gen model. Example: “a snake is slithering away from Jenny”

  26. Application: image captioning. 2. Describe images using the listener model for search at inference time [figure: candidate captions “a snake is slithering away” and “the sun is in the sky” with listener payoffs +1 / -1] [A & Klein 16, Vedantam et al. 17]

  27. Application: image captioning. 2. Describe images using the listener model as a training-time reward (“self-play”) [figure: captioner model trained with a retrieval loss; caption “the sun is in the sky” with payoff -1] [Yu et al. 16, Mao et al. 16]
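The inference-time search on slide 26 amounts to reranking candidate captions by how reliably a listener picks out the target image. Below is a minimal sketch of that idea; `toy_listener` is a hypothetical word-overlap stand-in for a trained retrieval model, and the attribute sets for the two scenes are made-up toy data rather than anything from the talk.

```python
import math


def toy_listener(caption, images):
    """Return P(image | caption) via a softmax over word overlap.

    Hypothetical stand-in for a trained image retrieval (listener) model.
    """
    words = set(caption.lower().split())
    scores = [math.exp(len(words & attributes)) for attributes in images]
    total = sum(scores)
    return [score / total for score in scores]


def pragmatic_caption(candidates, images, target_index, listener=toy_listener):
    """Pick the candidate caption that the listener most reliably maps to the target."""
    return max(candidates, key=lambda caption: listener(caption, images)[target_index])


# Toy scenes described as attribute sets (assumed data for illustration only).
images = [
    {"snake", "jenny", "sun", "sky"},  # target scene
    {"mike", "bat", "sun", "sky"},     # distractor scene
]
candidates = ["the sun is in the sky", "a snake is slithering away"]

best = pragmatic_caption(candidates, images, target_index=0)
print(best)
```

The caption about the sun is true of both scenes, so the listener cannot tell them apart; the caption about the snake only matches the target, so it wins the reranking.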

  28. Descriptive captions [Vedantam et al. 17]. seq2seq captioner: this bird has a yellow breast with a short pointy bill. pragmatic captioner: a small yellow bird with black stripes on its body and black stripe on the wings.

  29. Contrastive captions without contrastive data! [figure: three scenes (a), (b), (c)] Mike is holding a baseball bat. The snake is slithering away from Mike & Jenny. [A & Klein 16]

  30. Application: instruction generation. 1. Train a base instruction following model. 2. Train an instruction generation model to get the follower to goal states

  31. Application: instruction generation. seq2seq: Walk past the dining room table and chairs and wait there. speaker-listener: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug. human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug. [Fried, Hu, Cirik et al. 18]

  32. Listener mode. human instruction: Go through the door on the right and continue straight. Stop in the next room in front of the bed. [figure: top-down overview of seq-to-seq and speaker-listener trajectories; (a) orange: trajectory without pragmatic inference; (b) green: trajectory with pragmatic inference] [Fried, Hu, Cirik et al. 18]

  33. The rules of the game +1 glasses glasses

  34. The rules of the game +1 hat hat

  35. Killer robots [Lewis et al. 17] Bob : i can i i everything else . . . . . . . . . . . . . . Alice : balls have zero to me to me to me to me to me to me to me to me to Bob : you i everything else . . . . . . . . . . . . . . Alice : balls have a ball to me to me to me to me to me to me to me

  36. Killer robots [Lewis et al. 17] Bob : i can i i everything else . . . . . . . . . . . . . . Alice : balls have zero to me to me to me to me to me to me to me to me to Bob : you i everything else . . . . . . . . . . . . . . Alice : balls have a ball to me to me to me to me to me to me to me

  37. Problems to work on. How do we use tools like self-play and tree search while remaining within the rules of natural language? How do we do efficient search in string-valued action spaces?

  38. Problems to work on. How do we use tools like self-play and tree search while remaining within the rules of natural language? How do we do efficient search in string-valued action spaces?

  39. What language can do for RL w/ Dan Klein and Sergey Levine

  40. A crafting game make planks make sticks

  41. Learning with sketches: get wood, use saw; get wood, use axe

  42. The options framework [Sutton et al. 99]

  43. Unsupervised option learning +r [Bacon & Precup 16]

  44. Learning with intermediate rewards +r +r [Kearns & Singh 02, Kulkarni et al. 16]

  45. Segmenting demonstrations π [Stolle & Precup 02, Fox & Krishnan et al. 16]

  46. Learning from sketches: get wood, use saw π [A, Klein & Levine 17]

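  47. Modular policies: get wood → $\pi_1$, use saw → $\pi_2$; get wood → $\pi_1$, use axe → $\pi_3$

  48. Modular policies: get wood → $\pi_1$, use saw → $\pi_2$; get wood → $\pi_1$, use axe → $\pi_3$

  49. Modular policies [figure: subpolicy $\pi_1$ for “get wood”, with the action TURN LEFT]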
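The sharing pattern on slides 47 and 48 (one subpolicy per sketch symbol, reused across tasks) can be sketched as follows. Everything here is an illustrative assumption: the toy inventory environment in `step`, the action names, and the hand-written subpolicies, which stand in for learned subpolicies and signal completion by returning `None`.

```python
# Sketches for the two tasks on the slides: each task maps to a sequence of symbols.
SKETCHES = {
    "make planks": ["get wood", "use saw"],
    "make sticks": ["get wood", "use axe"],
}


def step(state, action):
    """Toy inventory environment (assumed for illustration)."""
    state = dict(state)
    if action == "COLLECT_WOOD":
        state["wood"] += 1
    elif action == "USE_SAW" and state["wood"] > 0:
        state["wood"] -= 1
        state["plank"] += 1
    elif action == "USE_AXE" and state["wood"] > 0:
        state["wood"] -= 1
        state["stick"] += 1
    return state


# Hand-written stand-ins for learned subpolicies; None means "done, hand over control".
def pi_get_wood(state):
    return None if state["wood"] > 0 else "COLLECT_WOOD"


def pi_use_saw(state):
    return None if state["plank"] > 0 else "USE_SAW"


def pi_use_axe(state):
    return None if state["stick"] > 0 else "USE_AXE"


# One subpolicy per symbol: "get wood" is shared by both tasks.
SUBPOLICIES = {"get wood": pi_get_wood, "use saw": pi_use_saw, "use axe": pi_use_axe}


def run_sketch(task, state, max_steps=20):
    """Execute the subpolicies named by the task's sketch, one after another."""
    trace = []
    for symbol in SKETCHES[task]:
        policy = SUBPOLICIES[symbol]
        for _ in range(max_steps):
            action = policy(state)
            if action is None:
                break
            trace.append(action)
            state = step(state, action)
    return state, trace


empty = {"wood": 0, "plank": 0, "stick": 0}
planks_state, planks_trace = run_sketch("make planks", empty)
sticks_state, sticks_trace = run_sketch("make sticks", empty)
print(planks_trace, planks_state)
print(sticks_trace, sticks_state)
```

Because both sketches begin with "get wood", any improvement to that one subpolicy benefits both tasks, which is the point of tying parameters to sketch symbols.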

  50. Results: crafting game

  51. Results: crafting game [plot: reward vs. episodes, 0 to 3 × 10^6; curves: Sketches: modular, Sketches: joint, Unsupervised]

  52. Results: locomotion

  53. Results: locomotion [plot: reward vs. timesteps, 0 to 3 × 10^8; curves: Sketches: modular, Sketches: joint, Unsupervised]

  54. Generalization. What if I don’t get a sketch at test time? ???

  55. Generalization. What if I don’t get a sketch at test time? [bar chart: Unsupervised vs. Sketches on Training and Adaptation; axis 0 to 100; bar values 89, 76, 47, 42]

  56. Moral A little bit of (structured) language goes a long way!

  57. Beyond structured sketches. Language learning; learning from demonstrations. Examples: emboldens → emboldecs; itch → itctch; dogtrot → dogtrot; loneliness → locelicess; vein → ???. Description: first & last 3 letters [A, Klein & Levine 17]

  58. Beyond structured sketches. Language learning; learning from demonstrations. Examples: emboldens → emboldecs; itch → itctch; dogtrot → dogtrot; loneliness → locelicess; vein → ???. Description: first & last 3 letters

  59. Pretraining via language learning: $f(\cdot\,; \eta, \text{“first \& last 3 letters”})$ maps wonderful → wonful [Branavan et al., 09]

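  60. Concept learning. Examples: emboldens → emboldecs; vein → veic; loneliness → locelicess. Loss: $L(f(\cdot\,; \eta, \langle\text{description}\rangle), \cdot)$

  61. Concept learning. Examples: emboldens → emboldecs; vein → veic; loneliness → locelicess. Loss: $L(f(\cdot\,; \eta, \langle\text{description}\rangle), \cdot)$. Candidate descriptions: every vowel becomes i

  62. Concept learning. Examples: emboldens → emboldecs; vein → veic; loneliness → locelicess. Loss: $L(f(\cdot\,; \eta, \langle\text{description}\rangle), \cdot)$. Candidate descriptions with loss: 128.6 every vowel becomes i

  63. Concept learning. Examples: emboldens → emboldecs; vein → veic; loneliness → locelicess. Loss: $L(f(\cdot\,; \eta, \langle\text{description}\rangle), \cdot)$. Candidate descriptions with loss: 128.6 every vowel becomes i; 52.3 change consonants to c

  64. Concept learning. Examples: emboldens → emboldecs; vein → veic; loneliness → locelicess. Loss: $L(f(\cdot\,; \eta, \langle\text{description}\rangle), \cdot)$. Candidate descriptions with loss: 128.6 every vowel becomes i; 52.3 change consonants to c; 8.3 replace n with c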

  65. Prediction: $L(f(\cdot\,; \eta, \text{“replace n with c”}), \cdot)$

  66. Evaluation: $L(f(\cdot\,; \eta, \text{“replace n with c”}), \cdot)$ on the input loonies

  67. Evaluation: $f(\cdot\,; \eta, \text{“replace n with c”})$ maps loonies → loocies
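The concept-learning loop on slides 60 to 67 (score each candidate description on the examples, keep the lowest-loss one, then apply it to a new input) can be sketched as below. This is only an illustration under stated assumptions: the hand-coded regular-expression rules in `RULES` stand in for the learned interpreter $f(\cdot\,; \eta, \cdot)$, and `mismatch_loss` is a simple character-mismatch count, so its values will not match the losses (128.6, 52.3, 8.3) reported on the slides.

```python
import re

# Hand-coded stand-ins for a learned interpreter of descriptions (assumption).
RULES = {
    "every vowel becomes i": lambda s: re.sub(r"[aeiou]", "i", s),
    "change consonants to c": lambda s: re.sub(r"[b-df-hj-np-tv-z]", "c", s),
    "replace n with c": lambda s: s.replace("n", "c"),
}

# Demonstration pairs taken from the slides: (input, output).
EXAMPLES = [
    ("emboldens", "emboldecs"),
    ("vein", "veic"),
    ("loneliness", "locelicess"),
]


def mismatch_loss(predicted, target):
    """Count differing characters, plus any difference in length."""
    diffs = sum(a != b for a, b in zip(predicted, target))
    return diffs + abs(len(predicted) - len(target))


def description_loss(description, examples):
    """Total loss of one candidate description over all demonstration pairs."""
    rule = RULES[description]
    return sum(mismatch_loss(rule(source), target) for source, target in examples)


# Concept learning: pick the description that best explains the examples.
losses = {description: description_loss(description, EXAMPLES) for description in RULES}
best = min(losses, key=losses.get)

# Evaluation: apply the chosen description to a held-out input.
prediction = RULES[best]("loonies")
print(losses)
print(best, "->", prediction)
```

The search here is over three fixed candidates; in the setting the slides describe, candidate descriptions would instead be proposed by a model trained during the language-learning phase.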
