Augmented state spaces States: $S = S_e \times S_r$ Actions: $A = A_e \cup A_r$ Transitions: $T : S \times A \to S$ Goal: find a policy $S \times X \to A$
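The product-and-union construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy one-dimensional corridor as the environment and a sentence pointer as the reading state; the action names `go_back` and `next_sentence` and the `State` class are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

# Hypothetical toy setup: a 1-D corridor and a multi-sentence instruction.
ENV_ACTIONS = ("go_forward", "go_back")   # A_e (assumed)
READ_ACTIONS = ("next_sentence",)         # A_r (assumed)
ACTIONS = ENV_ACTIONS + READ_ACTIONS      # A = A_e united with A_r


@dataclass(frozen=True)
class State:
    position: int   # environment component, an element of S_e
    sentence: int   # reading component, an element of S_r


def transition(state: State, action: str, n_sentences: int) -> State:
    """T : S x A -> S. Each action changes exactly one component."""
    if action == "go_forward":
        return State(state.position + 1, state.sentence)
    if action == "go_back":
        return State(state.position - 1, state.sentence)
    if action == "next_sentence":
        # Advance the reading pointer, stopping at the last sentence.
        return State(state.position, min(state.sentence + 1, n_sentences - 1))
    raise ValueError(f"unknown action: {action}")


s = State(position=0, sentence=0)
for a in ("go_forward", "next_sentence", "go_forward"):
    s = transition(s, a, n_sentences=2)
print(s)
```

Keeping the two components in one state lets a single policy decide both where to move and when to read on.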
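Augmented state spaces: training Training: $\max_\theta p(\text{action} \mid \text{text}, \text{state}; \theta)$, $\max_\theta \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$ Evaluation: $\max_{\text{action}} p(\text{action} \mid \text{text}, \text{state}; \theta)$ [Branavan et al., ACL ’09]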
clear the two long columns, and then the row
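Augmented state spaces: better training Training: $\max_\theta p(\text{action} \mid \text{text}, \text{state}; \theta)$, $\max_\theta \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$ Evaluation: $\max_{\text{action}} p(\text{action} \mid \text{text}, \text{state}; \theta)$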
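The expected-reward objective on the slide above can be illustrated with a tiny softmax policy. This is a minimal sketch, assuming a single fixed (text, state) pair, two actions, and a made-up reward table; it uses the exact gradient of the expected reward instead of sampled rollouts, which is a simplification and not the procedure of the cited paper.

```python
import math

ACTIONS = ["go_forward", "turn_left"]
# Hypothetical reward table standing in for R(action | state).
REWARD = {"go_forward": 1.0, "turn_left": 0.0}


def policy(theta):
    """Softmax p(action; theta) for one fixed (text, state) pair."""
    z = [math.exp(t) for t in theta]
    total = sum(z)
    return [v / total for v in z]


def expected_reward(theta):
    return sum(p * REWARD[a] for p, a in zip(policy(theta), ACTIONS))


def reward_gradient(theta):
    """Exact gradient of E[R]: sum_a p(a) * R(a) * d log p(a) / d theta."""
    probs = policy(theta)
    grad = [0.0] * len(theta)
    for i, a in enumerate(ACTIONS):
        for j in range(len(theta)):
            dlogp = (1.0 if i == j else 0.0) - probs[j]
            grad[j] += probs[i] * REWARD[a] * dlogp
    return grad


theta = [0.0, 0.0]
before = expected_reward(theta)
for _ in range(100):
    # Plain gradient ascent with an arbitrary step size of 0.5.
    theta = [t + 0.5 * g for t, g in zip(theta, reward_gradient(theta))]
after = expected_reward(theta)

# Evaluation: pick the most probable action under the trained policy.
probs = policy(theta)
best = ACTIONS[max(range(len(ACTIONS)), key=lambda i: probs[i])]
```

Training changes only `theta`; evaluation then takes the argmax over actions, matching the two columns of the slide.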
Learning the reading state Move into the living room. Go forward then face the sofa. go_forward turn_left turn_left go_forward turn_right
Learning the reading state Key idea: move “reading state” into the hidden state of an RNN. [Mei et al., AAAI ’16]
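A rough picture of what "reading state in the hidden state" means: the instruction is fed token by token into a recurrent update, and the action distribution is read off the resulting vector. The sketch below assumes a plain Elman-style recurrence with untrained random weights and a six-word vocabulary, all hypothetical; it is a stand-in for illustration and not the architecture of the cited paper.

```python
import math
import random

rng = random.Random(0)
HIDDEN = 4
# Hypothetical vocabulary and action set.
VOCAB = {"go": 0, "forward": 1, "then": 2, "face": 3, "the": 4, "sofa": 5}
ACTIONS = ["go_forward", "turn_left", "turn_right"]


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = rand_matrix(HIDDEN, len(VOCAB))      # token -> hidden
W_hh = rand_matrix(HIDDEN, HIDDEN)          # hidden -> hidden
W_out = rand_matrix(len(ACTIONS), HIDDEN)   # hidden -> action scores


def step(hidden, token):
    """One recurrent update: h' = tanh(W_in[:, token] + W_hh h)."""
    col = VOCAB[token]
    return [
        math.tanh(W_in[i][col] + sum(W_hh[i][j] * hidden[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]


def action_distribution(hidden):
    """Softmax over action scores computed from the hidden state."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in W_out]
    z = [math.exp(s) for s in scores]
    total = sum(z)
    return [v / total for v in z]


hidden = [0.0] * HIDDEN
for token in "go forward then face the sofa".split():
    hidden = step(hidden, token)   # the reading state lives in `hidden`
probs = action_distribution(hidden)
```

No explicit sentence pointer appears anywhere: whatever the model needs to remember about its progress through the text has to be encoded in `hidden`.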
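Learning the reading state Training: $\max_\theta p(\text{action} \mid \text{text}, \text{state}; \theta)$, $\max_\theta \mathbb{E}_{\text{state} \mid \theta}\, R(\text{action} \mid \text{state})$ Evaluation: $\max_{\text{action}} p(\text{action} \mid \text{text}, \text{state}; \theta)$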
human: Walk past hall table. Walk into bedroom. Make left at table clock. Wait at bathroom door threshold.
Approach 2: predicting constraints
Actions, goals, constraints Find a table next to a chair. go_forward go_forward turn_left go_forward turn_left
Actions, goals, constraints [Find] [a table] [next to] [a chair]. go_forward go_forward turn_left go_forward turn_left
Actions, goals, constraints [Find] [a table] [next to] [a chair].
Actions, goals, constraints Key idea: predict constraints rather than action sequences, and let a planner do the rest of the work.
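The division of labor in the key idea above can be sketched as a goal test plus an off-the-shelf search. This is a minimal illustration, assuming a hypothetical five-node map and a hand-written constraint for "a table next to a chair"; in the approach described on the slides the constraint would be predicted from text, which is not shown here.

```python
from collections import deque

# Hypothetical map: undirected edges between locations, and object labels.
EDGES = {("x1", "x2"), ("x2", "x3"), ("x3", "x5"), ("x5", "x6")}
OBJECTS = {"x2": "table", "x5": "chair", "x6": "table"}


def neighbors(node):
    return sorted({b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node})


def satisfies(node):
    """Constraint: the agent is at a table that is adjacent to a chair."""
    return OBJECTS.get(node) == "table" and any(
        OBJECTS.get(n) == "chair" for n in neighbors(node)
    )


def plan(start):
    """Breadth-first search for a shortest path to a satisfying node."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if satisfies(path[-1]):
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = plan("x1")
print(path)
```

The table at `x2` is skipped because no chair is adjacent to it, so the planner continues to `x6`. The language component only has to get the constraint right; the route comes from search.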
Predicting constraints [Find] [a table] [next to] [a chair]. (Figure: environment graph with nodes x1, x2, x3, x5, x6.)
Predicting constraints [Find] [a table] [next to] [a chair]. Labels: x1? x3? x4?
Predicting constraints [Find] [a table] [next to] [a chair]. Labels: x6? x5?
Predicting constraints [Find] [a table] [next to] [a chair]. Labels: x6? adj x5?
Predicting constraints [Find] [a table] [next to] [a chair]. Labels: ? ? ?
Predicting constraints [Find] [a table] [next to] [a chair]. Labels: obj? rel? obj?
Learning a constraint parser $\max_\theta p(\text{labels} \mid \text{text}, \text{graph}; \theta)$ [Find] [a table] [next to] [a chair]. Labels: x6? adj x5? (obj? rel? obj?)
Inferring constraints $\max_{\text{labels}} p(\text{labels} \mid \text{text}, \text{graph}; \theta)$ [Find] [a table] [next to] [a chair]. Labels: x6? adj x5? (obj? rel? obj?)
Inferring constraints $\max_{\text{labels}} p(\text{labels} \mid \text{text}, \text{graph}; \theta)$ [Put] [the cup] [on] [the table]. Labels: x6? adj x5? (obj? rel? obj?) [Tellex et al., NCAI ’11]
Logical constraint languages $\max_\theta p(\text{constraint} \mid \text{text}; \theta)$ $\max_{\text{constraint}} p(\text{constraint} \mid \text{text}; \theta)$ Find a table next to a chair. at(x1) table(x1) next_to(x1, x2) chair(x2)
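One way to see what a logical constraint buys is to ground it against a world model. The sketch below treats the conjunction from the slide as data and enumerates the variable bindings that make it true. The entity names and the fact set are hypothetical, and `at(x1)` is treated as the goal to be achieved, not as a fact to be checked, which is an assumption about how the constraint is meant to be used.

```python
from itertools import product

# Hypothetical world model: unary and binary facts over entities.
ENTITIES = ["e1", "e2", "e3"]
FACTS = {
    ("table", "e1"),
    ("chair", "e2"),
    ("table", "e3"),
    ("next_to", "e1", "e2"),
}

# The logical form from the slide, as (predicate, variables...) tuples.
CONSTRAINT = [
    ("at", "x1"),
    ("table", "x1"),
    ("next_to", "x1", "x2"),
    ("chair", "x2"),
]


def groundings(constraint, variables=("x1", "x2")):
    """Yield bindings under which every atom except at(...) is a known fact."""
    for values in product(ENTITIES, repeat=len(variables)):
        binding = dict(zip(variables, values))
        atoms = [(pred, *[binding[v] for v in args]) for pred, *args in constraint]
        if all(atom in FACTS for atom in atoms if atom[0] != "at"):
            yield binding


solutions = list(groundings(CONSTRAINT))
# at(x1) names where the agent should end up.
goal_locations = [b["x1"] for b in solutions]
print(solutions, goal_locations)
```

Brute-force enumeration is fine at this scale; a larger world would call for a proper constraint solver or planner.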
Logical constraint languages
(a) chair: λx.chair(x)
(b) hall: λx.hall(x)
(c) the chair: ιx.chair(x)
(d) you: you
(e) blue hall: λx.hall(x) ∧ blue(x)
(f) chair in the intersection: λx.chair(x) ∧ intersect(ιy.junction(y), x)
(g) in front of you: λx.in_front_of(you, x)
[Artzi et al., TACL ’13]
Constraints without logic Find a table next to a chair. go_forward turn_left turn_left go_forward turn_right (Figure: panoramic action space, “go towards this direction!”, vs. low-level visuomotor space, “turn left” and “go forward”, for the instruction “… Turn left and go towards the sofa ...”)
Constraints without logic Key idea: use freeform learned potential functions rather than symbolic constraints [Andreas & Klein, EMNLP ’16]
Constraints without logic Find a table next to a chair. go_forward turn_left turn_left go_forward turn_right
Constraints without logic Find a table next to a chair. $\max_{\theta,\,\text{alignment}} \frac{f(\text{plan}, \text{alignment} \mid \text{text}; \theta)}{\sum_{\text{plan}',\,\text{alignment}'} f(\text{plan}', \text{alignment}' \mid \text{text}; \theta)}$ $\max_{\text{plan},\,\text{alignment}} f(\text{plan}, \text{alignment} \mid \text{text}; \theta)$
Constraints without logic Clear the columns, then the row
Constraints without logic Clear the columns, then the row (no “column”!)
[Janner et al., TACL ’18]
Our toolkit so far
Instruction following
Act in complex environments: with expressive policies that condition on instructions and observations
Track progress over time: in the underlying state space or RNN state
Plan ahead and reason about outcomes: with a symbolic planner or learned cost function
What else can we do?
Application: instruction generation
Instruction following Move into the living room. Go forward then face the sofa. go_forward turn_left turn_left go_forward turn_right
Instruction following generation Move into the living room. Go forward then face the sofa. go_forward turn_left turn_left go_forward turn_right
Prediction action sequences find a sofa go_forward turn_left turn_left go_forward turn_right
Instruction generation Key idea: a good instruction gets readers to their goal with high probability (whatever the training data says!)
Instruction generation Max posterior probability: $\max_{\text{text}} p(\text{text} \mid \text{plan}; \theta)$ (“how do people describe this?”)
Instruction generation Max posterior probability: $\max_{\text{text}} p(\text{text} \mid \text{plan}; \theta)$ (“how do people describe this?”) Min Bayes risk: $\max_{\text{text}} p(\text{plan} \mid \text{text}; \theta)$ (“how do I make people do this?”)
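The two selection rules above can disagree, which a small lookup-table example makes concrete. The probabilities below are invented for illustration, as are the plan names `left_at_brick` and `other`; in practice both tables would come from learned models.

```python
# Hypothetical scores for one target plan.
target = "left_at_brick"

# speaker[text] stands in for p(text | plan): how people describe the plan.
speaker = {
    "make a turn": 0.5,
    "go straight through": 0.1,
    "turn left at the brick intersection": 0.4,
}

# listener[text][plan] stands in for p(plan | text): what a follower would do.
listener = {
    "make a turn": {"left_at_brick": 0.3, "other": 0.7},
    "go straight through": {"left_at_brick": 0.05, "other": 0.95},
    "turn left at the brick intersection": {"left_at_brick": 0.9, "other": 0.1},
}

# "How do people describe this?"
max_posterior = max(speaker, key=speaker.get)

# "How do I make people do this?"
listener_best = max(listener, key=lambda text: listener[text][target])

print(max_posterior)
print(listener_best)
```

With these numbers the most typical description is the vague one, while the description most likely to produce the target plan is the specific one.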
Reasoning about outcomes $\max_{\text{text}} p(\text{plan} \mid \text{text}; \theta)$ I will make a turn. Instruction follower
Reasoning about outcomes $\max_{\text{text}} p(\text{plan} \mid \text{text}; \theta)$ I will make a turn. Listener
Reasoning about outcomes $\max_{\text{text}} p(\text{plan} \mid \text{text}; \theta)$ I will go straight through. Listener
Reasoning about outcomes $\max_{\text{text}} p(\text{plan} \mid \text{text}; \theta)$ I will turn left at the brick intersection. Listener [Fried et al., NAACL ’18]
Reasoning about belief I will turn left at the brick intersection. [Frank & Goodman, Trends in Cog. Sci. ’12]
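Assuming the citation above points at rational-speech-acts style reasoning, the recursion between listener and speaker can be sketched with a small truth table. The referents, the four-word vocabulary, the uniform prior, and the rationality exponent of 1 are all assumptions made for this illustration.

```python
# Hypothetical referents and one-word utterances.
REFERENTS = ["brick_left", "brick_right", "wood_left"]
# TRUTH[utterance][i] is 1 when the utterance is literally true of referent i.
TRUTH = {
    "left": [1, 0, 1],
    "brick": [1, 1, 0],
    "right": [0, 1, 0],
    "wood": [0, 0, 1],
}


def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]


def literal_listener(utterance):
    """L0(referent | utterance): uniform over referents the utterance fits."""
    return normalize(TRUTH[utterance])


def speaker(ref_idx):
    """S1(utterance | referent), proportional to L0(referent | utterance)."""
    scores = {u: literal_listener(u)[ref_idx] for u in TRUTH}
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}


def pragmatic_listener(utterance):
    """L1(referent | utterance), proportional to S1(utterance | referent)."""
    return normalize([speaker(i)[utterance] for i in range(len(REFERENTS))])


l0 = literal_listener("brick")
l1 = pragmatic_listener("brick")
print(l0, l1)
```

The literal listener splits "brick" evenly between the two brick referents. The pragmatic listener shifts toward `brick_left`, because a speaker who meant `brick_right` had the unambiguous word "right" available.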
speaker: Walk past the dining room table and chairs and wait there.
listener: Walk past the dining room table and chairs and take a right into the living room. Stop once you are on the rug.
human: Turn right and walk through the kitchen. Go right into the living room and stop by the rug.
Application: machine teaching
Instructions as scaffolds for RL
Instructions as parameter-tying schemes
Instructions as parameter-tying schemes Environment states $S_e$ Reading states $S_r$ Environment actions $A_e$ Reading actions $A_r$ Go forward then face the sofa. s1 s3 turn_right
Instructions as parameter-tying schemes go north, go east, go south / go north, go east, go north, … / go north, go north, go west
Go north. Go east. Go north. [Andreas et al., ICML ’17]
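The parameter-tying reading of instructions can be sketched as a lookup: each instruction symbol owns one block of parameters, and every task that mentions the symbol reuses that block. In the sketch below the "parameters" are just fixed grid displacements, the task names are hypothetical, and the second task uses only the visible prefix of its sequence from the slide; a real system would learn a subpolicy per symbol.

```python
# One shared parameter block per instruction symbol (hypothetical contents).
SUBPOLICY_PARAMS = {
    "go north": {"dx": 0, "dy": 1},
    "go south": {"dx": 0, "dy": -1},
    "go east": {"dx": 1, "dy": 0},
    "go west": {"dx": -1, "dy": 0},
}

# Tasks are sequences of symbols; task names are made up for this sketch.
TASKS = {
    "task_a": ["go north", "go east", "go south"],
    "task_b": ["go north", "go east", "go north"],   # visible prefix only
    "task_c": ["go north", "go north", "go west"],
}


def run(task, start=(0, 0)):
    """Execute a task by chaining the shared subpolicies in order."""
    x, y = start
    for symbol in TASKS[task]:
        params = SUBPOLICY_PARAMS[symbol]   # same block whichever task asks
        x, y = x + params["dx"], y + params["dy"]
    return (x, y)


results = {name: run(name) for name in TASKS}
n_steps = sum(len(symbols) for symbols in TASKS.values())
print(results, len(SUBPOLICY_PARAMS), n_steps)
```

Nine task steps are covered by four parameter blocks, so experience gathered for "go north" in one task carries over to every other task that contains it.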
Learning interactively from corrections
Supervision s1 s3 turn_right go_forward s4 s0
Conditioning on the past Push the chair against the wall. go_forward grasp turn_left go_forward release