Image sources: octomap.github.io, pirobot.org/blog/0015/
• Map directly from first-person images to actions
• Need to learn how to reason about changing observations
• Add explicit camera projection and differentiable mapping
• Reason about the instruction on a static map
• Automatically handle changing first-person observations
Each pixel in the feature map encodes an image neighbourhood.
[Figure: input image → feature map]
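How large that neighbourhood is follows from the standard receptive-field recurrence for stacked convolutions. The sketch below is illustrative only: the `(kernel_size, stride)` list is a hypothetical stack, not the feature extractor used here.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) and input-pixel stride of one output feature pixel.

    layers: list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # a raw input pixel sees only itself
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) input-space steps
        jump *= stride             # spacing between adjacent outputs, in input pixels
    return rf, jump


# Hypothetical conv stack, NOT the actual feature extractor from the slides.
layers = [(3, 2), (3, 2), (3, 2), (3, 1)]
print(receptive_field(layers))  # (31, 8)
```

With these assumed layers, each feature pixel summarizes a 31 x 31 image patch, and neighbouring feature pixels are 8 input pixels apart, so their patches overlap.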
[Figure: the feature map (image plane, camera frame) is projected to features in the map frame]
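The projection step can be sketched as a ray-ground intersection, assuming a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known camera pose (`R`, `t`) in the map frame, and a flat ground plane at z = 0. These are illustrative assumptions, the function names are hypothetical, and the slides do not specify how the projection is implemented.

```python
import math


def pixel_to_ground(u, v, fx, fy, cx, cy, R, t):
    """Intersect the viewing ray of pixel (u, v) with the ground plane z = 0.

    R: 3x3 rotation whose columns are the camera axes expressed in the map frame.
    t: camera position (x, y, z) in the map frame.
    Returns (x, y) in the map frame, or None if the ray never hits the ground.
    """
    # Ray direction in the camera frame (x right, y down, z forward).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray into the map frame.
    d = [sum(R[i][k] * d_cam[k] for k in range(3)) for i in range(3)]
    if d[2] >= 0:  # ray is parallel to or points away from the ground
        return None
    s = -t[2] / d[2]  # ray parameter at which z reaches 0
    return (t[0] + s * d[0], t[1] + s * d[1])


def to_cell(x, y, resolution):
    """Map-frame point -> integer grid cell for a map with the given cell size."""
    return (math.floor(x / resolution), math.floor(y / resolution))


# Assumed example: camera 10 m above the origin, looking straight down.
# Columns: camera x -> map +x, camera y -> map -y, optical axis -> map -z.
R_down = [[1, 0, 0],
          [0, -1, 0],
          [0, 0, -1]]
point = pixel_to_ground(150, 50, fx=100, fy=100, cx=50, cy=50, R=R_down, t=(0.0, 0.0, 10.0))
print(point, to_cell(*point, resolution=0.5))  # (10.0, 0.0) (20, 0)
```

Applying this to every feature-map pixel and writing its feature vector into the resulting cell gives features in the map frame.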
[Figure: the semantic map at time t is computed from the semantic map at time t-1 and the features projected at time t]
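The slides do not state how projected features are merged into the map over time. One simple possibility, assumed here purely for illustration, is a masked running average: cells observed at time t blend in the new features with a hypothetical weight `lam`, and unobserved cells keep their value from time t-1.

```python
def update_semantic_map(prev_map, projected, observed, lam=0.5):
    """Semantic map at time t from the map at t-1 and the features projected at t.

    prev_map, projected: H x W grids of C-channel feature lists.
    observed: H x W grid of booleans, True where projected features exist at time t.
    lam: hypothetical blending weight in [0, 1] for new observations.
    """
    new_map = []
    for prev_row, feat_row, obs_row in zip(prev_map, projected, observed):
        row = []
        for prev_cell, feat_cell, seen in zip(prev_row, feat_row, obs_row):
            if seen:
                # Blend old and new features where the camera currently sees the cell.
                row.append([(1 - lam) * p + lam * f for p, f in zip(prev_cell, feat_cell)])
            else:
                # Keep the remembered features of cells outside the current view.
                row.append(list(prev_cell))
        new_map.append(row)
    return new_map


prev = [[[0.0], [1.0]]]
feat = [[[1.0], [9.0]]]
seen = [[True, False]]
print(update_semantic_map(prev, feat, seen))  # [[[0.5], [1.0]]]
```

Because unobserved cells are copied rather than overwritten, landmarks seen earlier stay in the map after they leave the camera's view.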
[Figure: the instruction "go to the left side of plane" is encoded by an LSTM; a 1x1 filter over the semantic map produces the grounding map (recognized airplane), and a 9x9 filter produces the goal map (inferred goal location)]
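A minimal sketch of the two filtering steps, assuming the instruction embedding has already been turned into filter weights (that step is not shown, and the function names are hypothetical). A 1x1 filter reduces to a per-cell dot product over feature channels; a larger k x k filter such as the 9x9 one slides over a 2D map. Applying it to the grounding map is an assumption made for illustration.

```python
def conv1x1(semantic_map, weights):
    """1x1 filter: per-cell dot product of a C-channel feature with a C-channel weight vector."""
    return [[sum(f * w for f, w in zip(cell, weights)) for cell in row] for row in semantic_map]


def conv2d_same(grid, kernel):
    """k x k cross-correlation (as in CNN layers) with zero padding; output matches input size."""
    h, w, k = len(grid), len(grid[0]), len(kernel)
    r = k // 2  # assumes odd k
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * grid[y][x]
            out[i][j] = acc
    return out


# Toy 3x3 semantic map with 2 channels; channel 0 plays a hypothetical "landmark" role.
semantic = [[[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
            [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
            [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]]
grounding = conv1x1(semantic, [1.0, 0.0])
# This 3x3 kernel moves the peak one column toward lower indices ("left" in array layout).
shift = [[0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
goal = conv2d_same(grounding, shift)
print(goal)  # [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

A 9x9 kernel works the same way and can place the goal several cells away from the recognized landmark.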
• Output the velocity command (forward velocity and yaw rate), given the grounding and goal maps
• Sent to the quadcopter's flight controller
[Figure: grounding map + goal map → perceptron → forward velocity, yaw rate]
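The control step can be sketched as a small multilayer perceptron that reads the flattened grounding and goal maps and outputs the two velocity components. The single ReLU hidden layer, the layer sizes, and the `params` dictionary are assumptions for illustration, not the trained network.

```python
def relu(x):
    return x if x > 0 else 0.0


def dense(inputs, weights, biases):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def velocity_command(grounding_map, goal_map, params):
    """Return (forward_velocity, yaw_rate) from the two H x W maps.

    params: hypothetical trained weights with keys W1, b1 (hidden layer) and W2, b2 (2 outputs).
    """
    # Flatten both maps into one input vector.
    x = [v for row in grounding_map for v in row] + [v for row in goal_map for v in row]
    hidden = [relu(h) for h in dense(x, params["W1"], params["b1"])]
    forward_velocity, yaw_rate = dense(hidden, params["W2"], params["b2"])
    return forward_velocity, yaw_rate


# Toy 1x1 maps and hand-picked weights, for illustration only.
params = {
    "W1": [[1.0, 1.0], [1.0, -1.0]], "b1": [0.0, 0.0],
    "W2": [[0.5, 0.0], [0.0, 1.0]], "b2": [0.0, 0.1],
}
print(velocity_command([[1.0]], [[2.0]], params))  # (1.5, 0.1)
```

The returned pair is what would be passed on to the flight controller as the velocity command.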
• Modified variant of DAgger
• Trades convergence guarantees for speed and memory efficiency
[Figure: the agent maps the instruction and image to an action; the oracle derives the ground-truth action from the ground-truth trajectory]
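For reference, plain DAgger alternates between rolling out a mixture of the oracle and the current policy, labelling every visited state with the oracle's action, and retraining on the aggregated data. The toy sketch below runs that loop on a hypothetical 1D task with a lookup-table "policy". The optional `max_rollouts` buffer is one hypothetical way to save memory and is not necessarily the modification used here; it does illustrate the trade-off on the slide, since DAgger's guarantees rely on aggregating all past data.

```python
import random
from collections import Counter, deque

GOAL = 5  # hypothetical toy task: walk along a line to position 5


def oracle(state):
    """Ground-truth action from the known goal: step toward it (or stay)."""
    return (GOAL > state) - (GOAL < state)


def fit(dataset):
    """Toy 'training': majority-vote oracle action per visited state."""
    votes = {}
    for rollout in dataset:
        for state, action in rollout:
            votes.setdefault(state, Counter())[action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 0)


def dagger(iterations=5, horizon=10, max_rollouts=None, seed=0):
    rng = random.Random(seed)
    dataset = deque(maxlen=max_rollouts)  # unbounded if max_rollouts is None
    policy = lambda s: 0  # untrained policy: stay in place
    for i in range(iterations):
        beta = 0.5 ** i  # probability of executing the oracle's action
        state, rollout = 0, []
        for _ in range(horizon):
            expert = oracle(state)
            rollout.append((state, expert))  # label every visited state with the oracle action
            action = expert if rng.random() < beta else policy(state)
            state += action
        dataset.append(rollout)
        policy = fit(dataset)
    return policy


policy = dagger(max_rollouts=2)
print([policy(s) for s in range(6)])  # [1, 1, 1, 1, 1, 0]
```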
• 3500 instructions + environments, with ground-truth trajectories
• 63 landmarks
• 252 possible tasks
• Total number of rollouts: 3500 oracle, 2000 policy
Example instruction: "Go to right side of mushroom"
[Figure: bar chart, 0 to 100 scale: Oracle 87.87, GSMN (ours) 83.47, NN with no mapping 28.67]
• Outperforms a standard NN with no mapping
• Very close to oracle performance
[Figure: model architecture: image → feature extraction → image features → mapping → semantic map; instruction "Go to the left side of plane" → LSTM → instruction embedding → 1x1 filter (grounding map) and 9x9 filter (goal map) → MLP → action]