Opening the pod bay doors: building intelligent agents that can interpret, generate and learn from natural language. Jacob Andreas, MIT / Microsoft. web.mit.edu/jda/www / @jacobandreas. Following natural language instructions


  1. Augmented state spaces States: S = S_e × S_r. Actions: A = A_e ∪ A_r. Transitions: T : S × A → S. Goal: find a policy S × X → A
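
A minimal sketch of the augmented state space on this slide, assuming a grid world and a reading state that is just a cursor into the instruction. Only S = S_e × S_r, A = A_e ∪ A_r and T : S × A → S come from the slide; the names `State`, `transition`, `advance` and the grid dynamics are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical action sets; the slide only states A = A_e ∪ A_r.
ENV_ACTIONS = {"go_forward", "turn_left", "turn_right"}  # A_e
READ_ACTIONS = {"advance"}                               # A_r: move to the next sentence
ACTIONS = ENV_ACTIONS | READ_ACTIONS                     # A = A_e ∪ A_r

HEADINGS = ["N", "E", "S", "W"]
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


@dataclass(frozen=True)
class State:
    x: int         # environment state S_e: position ...
    y: int
    heading: str   # ... and orientation
    cursor: int    # reading state S_r: index of the sentence being read


def transition(state, action):
    """T : S × A → S for the toy world assumed above."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "advance":
        # Reading actions change only the reading state.
        return State(state.x, state.y, state.heading, state.cursor + 1)
    if action == "go_forward":
        dx, dy = STEP[state.heading]
        return State(state.x + dx, state.y + dy, state.heading, state.cursor)
    # Turning changes only the heading.
    delta = 1 if action == "turn_right" else -1
    i = HEADINGS.index(state.heading)
    return State(state.x, state.y, HEADINGS[(i + delta) % 4], state.cursor)
```

A policy over this space chooses among all of `ACTIONS` at each step, so deciding when to move on to the next sentence is learned alongside deciding how to move.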

  2. Augmented state spaces: training Training: max p(action | text, state; θ); max E_{state | θ} R(action | state). Evaluation: max_{action} p(action | text, state; θ). [Branavan et al., ACL ’09]

  3. clear the two long columns, and then the row

  4. Augmented state spaces: better training Training: max p(action | text, state; θ); max E_{state | θ} R(action | state). Evaluation: max_{action} p(action | text, state; θ)

  5. Learning the reading state Move into the living room. Go forward then face the sofa. go_forward turn_left turn_left go_forward turn_right

  7. Learning the reading state Key idea: move “reading state” into the hidden state of an RNN. [Mei et al., AAAI ’16]
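
A minimal sketch of the key idea, assuming a plain tanh recurrent cell with random, untrained weights. This is not the architecture of Mei et al.; the class `TinyRNNPolicy` and its feature vectors are hypothetical. The point is only that no explicit reading cursor is stored: whatever the agent needs to remember about its progress through the instruction has to live in `hidden`.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRNNPolicy:
    """Toy recurrent policy: hidden state replaces the explicit reading state."""

    def __init__(self, input_size, hidden_size, actions, seed=0):
        rng = random.Random(seed)
        self.actions = list(actions)
        self.hidden_size = hidden_size
        self.w_in = make_matrix(hidden_size, input_size, rng)
        self.w_rec = make_matrix(hidden_size, hidden_size, rng)
        self.w_out = make_matrix(len(self.actions), hidden_size, rng)

    def initial_state(self):
        return [0.0] * self.hidden_size

    def step(self, hidden, features):
        """Update the hidden state and return a distribution over actions."""
        pre = [a + b for a, b in zip(matvec(self.w_in, features),
                                     matvec(self.w_rec, hidden))]
        new_hidden = [math.tanh(p) for p in pre]
        scores = matvec(self.w_out, new_hidden)
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = {a: e / total for a, e in zip(self.actions, exps)}
        return new_hidden, probs


policy = TinyRNNPolicy(3, 4, ["go_forward", "turn_left", "turn_right"])
hidden = policy.initial_state()
for features in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):  # made-up text/observation features
    hidden, probs = policy.step(hidden, features)
```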

  8. Learning the reading state Training: max p(action | text, state; θ); max E_{state | θ} R(action | state). Evaluation: max_{action} p(action | text, state; θ)

  9. human: Walk past hall table. Walk into bedroom. Make left at table clock. Wait at bathroom door threshold.

  10. Approach 2: predicting constraints

  11. Actions, goals, constraints Find a table next to a chair. go_forward go_forward turn_left go_forward turn_left

  12. Actions, goals, constraints [Find] [a table] [next to] [a chair]. go_forward go_forward turn_left go_forward turn_left

  13. Actions, goals, constraints [Find] [a table] [next to] [a chair].

  15. Actions, goals, constraints Key idea: predict constraints rather than action sequences, and let a planner do the rest of the work.
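
A minimal sketch of "let a planner do the rest", assuming the predicted constraint has already been turned into a goal test over locations. The planner here is plain breadth-first search, and the toy map (`neighbors`, `contents`) and the node names are made up for illustration; the slides do not say which planner is used.

```python
from collections import deque


def plan(start, neighbors, is_goal):
    """Breadth-first search: shortest path from start to any node passing is_goal."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        node = frontier.popleft()
        if is_goal(node):
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors[node]:
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None  # no reachable node satisfies the constraint


# Hypothetical map: four locations in a line, with the objects found at each.
neighbors = {"x1": ["x2"], "x2": ["x1", "x3"], "x3": ["x2", "x4"], "x4": ["x3"]}
contents = {"x1": set(), "x2": {"chair"}, "x3": {"table"}, "x4": {"table"}}


def table_next_to_chair(node):
    """Goal test for 'Find a table next to a chair.'"""
    return "table" in contents[node] and any(
        "chair" in contents[n] for n in neighbors[node])


path = plan("x1", neighbors, table_next_to_chair)
```

The language model only has to get the goal test right; the route itself never appears in its output.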

  16. Predicting constraints [Find] [a table] [next to] [a chair]. [Figure: graph with nodes x1, x2, x3, x5, x6]

  17. Predicting constraints [Find] [a table] [next to] [a chair]. x1? x3? x4? [Figure: graph with nodes x1, x2, x3, x5, x6]

  18. Predicting constraints [Find] [a table] [next to] [a chair]. x6? x5? [Figure: graph with nodes x1, x2, x3, x5, x6]

  19. Predicting constraints [Find] [a table] [next to] [a chair]. x6? adj x5? [Figure: graph with nodes x1, x2, x3, x5, x6]

  21. Predicting constraints [Find] [a table] [next to] [a chair]. ? ? ? [Figure: graph with nodes x1, x2, x3, x5, x6]

  23. Predicting constraints [Find] [a table] [next to] [a chair]. obj? rel? obj? [Figure: graph with nodes x1, x2, x3, x5, x6]

  24. Learning a constraint parser max_{θ} p(labels | text, graph; θ) [Find] [a table] [next to] [a chair]. x6? adj x5? obj? rel? obj? [Figure: graph with nodes x1, x2, x3, x5, x6]

  25. Inferring constraints max_{labels} p(labels | text, graph; θ) [Find] [a table] [next to] [a chair]. x6? adj x5? obj? rel? obj? [Figure: graph with nodes x1, x2, x3, x5, x6]

  26. Inferring constraints max_{labels} p(labels | text, graph; θ) [Put] [the cup] [on] [the table]. x6? adj x5? obj? rel? obj? [Figure: graph with nodes x1, x2, x3, x5, x6] [Tellex et al., NCAI ’11]

  27. Logical constraint languages max_{θ} p(constraint | text; θ); max_{constraint} p(constraint | text; θ). Find a table next to a chair. at(x1), table(x1), next_to(x1, x2), chair(x2)
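
A minimal sketch of how a logical constraint like the one on this slide could be checked against a world model, assuming the world is a set of ground facts and the constraint is a conjunction of predicates. The representation, the function `satisfying_bindings`, and the entity names are hypothetical. The `at(x1)` term is left out of the check because it describes where the agent should end up rather than a fact about the world; the binding found for x1 would be handed to a planner as the destination.

```python
from itertools import product

# table(x1), next_to(x1, x2), chair(x2), written as (predicate, argument names).
CONSTRAINT = [
    ("table", ("x1",)),
    ("next_to", ("x1", "x2")),
    ("chair", ("x2",)),
]


def satisfying_bindings(constraint, facts, entities):
    """Brute-force search for variable assignments that make every predicate true."""
    variables = sorted({v for _, args in constraint for v in args})
    results = []
    for values in product(entities, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded = [(pred, tuple(binding[a] for a in args))
                    for pred, args in constraint]
        if all(fact in facts for fact in grounded):
            results.append(binding)
    return results


# Made-up world: two tables, one chair, and only table t2 is next to the chair.
facts = {
    ("table", ("t1",)),
    ("table", ("t2",)),
    ("chair", ("c1",)),
    ("next_to", ("t2", "c1")),
}
bindings = satisfying_bindings(CONSTRAINT, facts, ["t1", "t2", "c1"])
```

Brute-force enumeration is exponential in the number of variables, which is acceptable for a two-variable toy constraint but not meant as a practical solver.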

  29. Logical constraint languages (a) chair: λx.chair(x); (b) hall: λx.hall(x); (c) the chair: ιx.chair(x); (d) you: you; (e) blue hall: λx.hall(x) ∧ blue(x); (f) chair in the intersection: λx.chair(x) ∧ intersect(ιy.junction(y), x); (g) in front of you: λx.in_front_of(you, x) [Artzi et al., TACL ’13]

  30. Constraints without logic Find a table next to a chair. go_forward turn_left turn_left go_forward turn_right [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  31. Constraints without logic Key idea: use freeform learned potential functions rather than symbolic constraints [Andreas & Klein, EMNLP ’16]

  32. Constraints without logic Find a table next to a chair. go_forward turn_left turn_left go_forward turn_right [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  34. Constraints without logic Find a table next to a chair. ∑ f(plan′, alignment′ | text; θ); max_{θ, alignment} f(plan, alignment | text; θ); max_{plan, alignment} f(plan, alignment | text; θ) [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  35. Constraints without logic Clear the columns, then the row

  36. Constraints without logic Clear the columns, then the row (no “column”!)

  37. [Janner et al., TACL ’18]

  38. Our toolkit so far

  39. Instruction following. Act in complex environments: with expressive policies that condition on instructions and observations. Track progress over time: in the underlying state space or RNN state. Plan ahead and reason about outcomes: with a symbolic planner or learned cost function.

  40. What else can we do?

  41. Application: instruction generation

  42. Instruction following Move into the living room. Go forward then face the sofa. go_forward turn_left turn_left go_forward turn_right [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  43. Instruction following generation Move into the living room. Go forward then face the sofa. go_forward turn_left turn_left go_forward turn_right [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  44. Prediction action sequences find a sofa go_forward turn_left turn_left go_forward turn_right [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  45. Instruction generation Key idea: a good instruction gets readers to their goal with high probability (whatever the training data says!)

  46. Instruction generation Max posterior probability: max_{text} p(text | plan; θ) (“how do people describe this?”)

  47. Instruction generation Max posterior probability: max_{text} p(text | plan; θ) (“how do people describe this?”). Min Bayes risk: max_{text} p(plan | text; θ) (“how do I make people do this?”)
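
A minimal sketch contrasting the two selection rules on this slide, assuming a fixed list of candidate instructions and lookup tables standing in for the learned models. All probabilities, plan names and function names below are made up for illustration and do not come from the talk.

```python
def map_instruction(candidates, speaker_prob, plan):
    """max_text p(text | plan): the description people most often give."""
    return max(candidates, key=lambda text: speaker_prob[plan][text])


def listener_aware_instruction(candidates, listener_prob, plan):
    """max_text p(plan | text): the description most likely to produce the plan."""
    return max(candidates, key=lambda text: listener_prob[text][plan])


candidates = ["make a turn", "turn left at the brick intersection"]

# Toy speaker model p(text | plan): the vague phrasing is more frequent.
speaker_prob = {
    "left_at_brick": {
        "make a turn": 0.6,
        "turn left at the brick intersection": 0.4,
    }
}

# Toy listener model p(plan | text): the vague phrasing is often misread.
listener_prob = {
    "make a turn": {"left_at_brick": 0.3, "right_at_wood": 0.7},
    "turn left at the brick intersection": {"left_at_brick": 0.9, "right_at_wood": 0.1},
}

frequent = map_instruction(candidates, speaker_prob, "left_at_brick")
effective = listener_aware_instruction(candidates, listener_prob, "left_at_brick")
```

With these toy numbers the first rule picks the vague instruction and the second picks the specific one, which is the contrast the slide draws between describing a plan and getting someone to carry it out.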

  48. Reasoning about outcomes max_{text} p(plan | text; θ) I will make a turn. Instruction follower

  49. Reasoning about outcomes max_{text} p(plan | text; θ) I will make a turn. Listener

  50. Reasoning about outcomes max_{text} p(plan | text; θ) I will go straight through. Listener

  51. Reasoning about outcomes max_{text} p(plan | text; θ) I will turn left at the brick intersection. Listener [Fried et al., NAACL ’18]

  52. Reasoning about belief I will turn left at the brick intersection. [Frank & Goodman, Trends in Cog. Sci. ’12]

  53. speaker: Walk past the dining room table and chairs and wait there. listener: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug. human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug.

  54. Application: machine teaching

  55. Instructions as scaffolds for RL

  56. Instructions as parameter-tying schemes

  57. Instructions as parameter-tying schemes Environment states S_e, reading states S_r. Environment actions A_e, reading actions A_r. Go forward then face the sofa. s_1 s_3 turn_right [Figure: panoramic action space ("go towards this direction!") and low-level visuomotor space (go forward, turn left) for the instruction "… Turn left and go towards the sofa ..."]

  58. Instructions as parameter-tying schemes go north, go east, go south go north, go east, go north, … go north, go north, go west

  59. Go north. Go east. Go north. [Andreas et al., ICML ’17]
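
A minimal sketch of what "parameter tying" can mean here, assuming each instruction step names a subpolicy and that every task mentioning the same step reuses the same parameters. The classes `SubPolicy` and `ModularPolicy` and the two-number weight vectors are hypothetical stand-ins, not the model of Andreas et al.

```python
class SubPolicy:
    """Stand-in for a learned subpolicy; the weights are placeholders."""

    def __init__(self, name):
        self.name = name
        self.weights = [0.0, 0.0]


class ModularPolicy:
    """Maps each instruction step to a single shared SubPolicy object."""

    def __init__(self):
        self.subpolicies = {}

    def for_instruction(self, steps):
        """Return the subpolicies for a task, creating each one only once."""
        modules = []
        for step in steps:
            if step not in self.subpolicies:
                self.subpolicies[step] = SubPolicy(step)
            modules.append(self.subpolicies[step])
        return modules


policy = ModularPolicy()
task_a = policy.for_instruction(["go north", "go east", "go south"])
task_b = policy.for_instruction(["go north", "go north", "go west"])

# An update made while training on task A is visible to task B,
# because both tasks hold the same "go north" object.
task_a[0].weights[0] += 1.0
```

Two tasks with different instructions still share whatever steps they have in common, so experience on one task transfers to the other through those shared parameters.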

  60. Learning interactively from corrections

  61. Supervision s_1 s_3 turn_right go_forward s_4 s_0

  62. Conditioning on the past Push the chair against the wall. go_forward grasp turn_left go_forward release
