  1. Neural Map: Structured Memory for Deep RL. Emilio Parisotto (eparisot@andrew.cmu.edu), PhD Student, Machine Learning Department, Carnegie Mellon University

  2. Supervised Learning
  • Most deep learning problems are posed as supervised learning problems.
  • The model is trained to map from an input to an action: e.g. describe what is in an image.
  • The environment is typically static: it does not change over time.
  • Actions are assumed to be independent of one another: e.g. labelling one image does not affect the next one.

  3. Environments are not always well-behaved
  • Environments are dynamic and change over time: an autonomous agent has to handle new environments.
  • Actions can affect the environment with arbitrary time lags: buying a stock can lose all your money years in the future.
  • Labels can be expensive or difficult to obtain: e.g. the optimal actuations of a swimming octopus robot.

  4. Reinforcement Learning: Closing the Loop
  • Instead of a label, the agent is provided with a reward signal: high reward == good behaviour.
  • Reinforcement learning produces policies: behaviours that map observations to actions and maximize long-term reward.
  • Allows learning purposeful behaviours in dynamic environments.

  5. Deep Reinforcement Learning
  • Use a deep network to parameterize the policy.
  • Adapt the parameters to maximize reward using:
    • Q-learning (Mnih et al., 2013)
    • Actor-Critic (Mnih et al., 2016)
  • How well does it work?

  6. Deep Reinforcement Learning (Chaplot & Lample, AAAI 2017)

  7. Current Memory for RL Agents
  • Deep RL does extremely well on reactive tasks, but typically has an effective memory horizon of less than 1 second.
  • Almost all interesting problems are partially observable:
    • 3D games (with long-term objectives)
    • Self-driving cars (partial occlusion)
  • Memory structures will be crucial to scale up to partially observable tasks.

  8. External Memory?
  • Current memory structures are usually simple: add an LSTM layer to the network.
  • Can we learn an agent with a more advanced external memory?
    • Neural Turing Machines (Graves et al., 2014)
    • Differentiable Neural Computers (Graves et al., 2016)
  • Challenge: memory systems are difficult to train, especially using RL.

  9-11. Why Memory is Challenging: Write Operations
  Suppose an agent is in a simple maze:
  • The agent starts at the top of the map.
  • The agent is shown a color near its initial state.
  • This color determines which goal is the correct one.

  12. Why Writing to Memory is Challenging
  At the start, the agent has no a priori knowledge that it should store the color into memory. The following must all hold:
  1. Write the color to memory at the start of the maze.
  2. Never overwrite the memory of the color over 'T' time steps.
  3. Find and enter the correct goal.
  If any condition fails, the episode is useless: it provides little new information to the agent.
  Solution: write everything into memory!

  13. Memory Network
  • A class of architectures recently shown to learn difficult maze-based memory tasks (Oh et al., 2016).
  • These systems simply store (key, value) representations of the last M frames.
  • At each time step, they:
    1. Perform a read operation over their memory database.
    2. Write the latest percept into memory.
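
A minimal sketch of the kind of (key, value) episodic memory described above, assuming a soft-attention read over the last M stored frames. Class and variable names are illustrative, not the exact architecture of Oh et al. (2016):

```python
import torch
import torch.nn.functional as F

class KeyValueMemory:
    """Stores (key, value) representations of the last M percepts."""
    def __init__(self, max_size: int):
        self.max_size = max_size  # M: number of past percepts kept
        self.keys = []            # one key vector per stored percept
        self.values = []          # one value vector per stored percept

    def write(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # Store the latest percept; drop the oldest once M entries are kept.
        self.keys.append(key)
        self.values.append(value)
        if len(self.keys) > self.max_size:
            self.keys.pop(0)
            self.values.pop(0)

    def read(self, query: torch.Tensor) -> torch.Tensor:
        # Soft attention: score every stored key against the query, then
        # return the score-weighted sum of the stored values.
        K = torch.stack(self.keys)             # (n, key_dim)
        V = torch.stack(self.values)           # (n, value_dim)
        weights = F.softmax(K @ query, dim=0)  # (n,)
        return weights @ V                     # (value_dim,)
```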

  14. Memory Network Difficulties
  • Easy to learn: the agent never needs to guess what to store in memory, it just stores as much as possible!
  • But this can be inefficient:
    • We need M > the time horizon of the task (which cannot be known a priori).
    • We may store a lot of useless/redundant data.
    • Time/space requirements increase with M.

  15. Neural Map (Location-Aware Memory)
  • Writeable memory with a specific inductive bias:
    • We structure the memory into a WxW grid of K-dim cells.
    • For every position (x, y) in the environment, we write to a cell (x', y') in the WxW grid.

  16-23. [Figure-only slides: step-by-step build-up of the neural map M_t, a WxW grid of K-dim cells.]

  24. Neural Map (Location-Aware Memory)
  • Writeable memory with a specific inductive bias:
    • We structure the memory into a WxW grid of K-dim cells.
    • For every position (x, y) in the environment, we write to a cell (x', y') in the WxW grid.
  • Acts as a map that the agent fills out as it explores.
  • Sparse write: the inductive bias prevents the agent from overwriting its memory too often, allowing easier credit assignment over time.
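
A minimal sketch of the location-aware addressing, assuming the environment's bounds are known and cells are assigned by uniform binning; the helper name and the binning scheme are illustrative assumptions:

```python
def world_to_cell(x: float, y: float,
                  x_min: float, x_max: float,
                  y_min: float, y_max: float,
                  W: int) -> tuple[int, int]:
    # Normalize the environment position into [0, 1) per axis and quantize it
    # into one of W bins, giving the neural-map cell (x', y') to write to.
    xp = int((x - x_min) / (x_max - x_min) * W)
    yp = int((y - y_min) / (y_max - y_min) * W)
    # Clamp so positions exactly on the far boundary still land inside the grid.
    return min(xp, W - 1), min(yp, W - 1)

# e.g. a 10x10 room mapped onto a 15x15 neural map:
# world_to_cell(3.7, 1.2, 0.0, 10.0, 0.0, 10.0, W=15) -> (5, 1)
```

At each step only the single cell (x', y') is overwritten, which is the sparse write the slide refers to.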

  25. Neural Map: Operations
  • Two read operations:
    • Global summarization
    • Context-based retrieval
  • Sparse write, only at the agent's position.
  • Both the read and write vectors are used to compute the policy.

  26. Neural Map: Global Read
  • Reads from the entire neural map M_t (the WxW grid of K-dim cells) using a deep convolutional network.
  • Produces a vector r_t that provides a global summary.
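
A minimal sketch of a global read of this kind in PyTorch: a small convolutional network followed by a linear layer summarizes the K x W x W map into one vector. Layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GlobalRead(nn.Module):
    def __init__(self, K: int, W: int, out_dim: int = 256):
        super().__init__()
        # Two convolutions over the map followed by a linear summary layer.
        self.conv = nn.Sequential(
            nn.Conv2d(K, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * W * W, out_dim)

    def forward(self, M_t: torch.Tensor) -> torch.Tensor:
        # M_t: (batch, K, W, W)  ->  global read r_t: (batch, out_dim)
        h = self.conv(M_t)
        return self.fc(h.flatten(start_dim=1))
```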

  27. Neural Map: Context Read
  • An associative read operation.

  28. Neural Map: Context Read
  • Example: a simple 2x2 memory M_t.
  • Color represents the memory the agent wrote.

  29. Neural Map: Context Read
  • A query vector q_t is computed from the state s_t and the global read r_t.

  30-33. Neural Map: Context Read
  • Take the dot product between the query q_t and every memory cell.
  • This produces a similarity α_t for each cell.

  34. Neural Map: Context Read
  • Take the element-wise product α_t ⊛ M_t between the query similarities α_t and the memory cells M_t.

  35. Neural Map: Context Read
  • Sum over all 4 positions to get the context read vector c_t.

  36. Neural Map: Context Read
  • Intuitively: return the vector c_t from memory M_t that is closest to the query q_t.
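
Putting the preceding slides together, a minimal sketch of the context read: a query q_t is computed from the state s_t and the global read r_t, scored by dot product against every cell, the scores are normalized with a softmax, and the cells are summed with those weights. Shapes and layer sizes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextRead(nn.Module):
    def __init__(self, state_dim: int, read_dim: int, K: int):
        super().__init__()
        # q_t is a linear function of the concatenated state s_t and global read r_t.
        self.query = nn.Linear(state_dim + read_dim, K)

    def forward(self, s_t, r_t, M_t):
        # s_t: (B, state_dim), r_t: (B, read_dim), M_t: (B, K, W, W)
        q_t = self.query(torch.cat([s_t, r_t], dim=1))      # (B, K)
        cells = M_t.flatten(start_dim=2)                     # (B, K, W*W)
        scores = torch.einsum('bk,bkn->bn', q_t, cells)      # dot product with every cell
        alpha = F.softmax(scores, dim=1)                     # similarity weights over cells
        c_t = torch.einsum('bn,bkn->bk', alpha, cells)       # weighted sum -> context read
        return c_t
```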

  37. Neural Map: Write
  • Creates a new K-dim vector to write to the current position in the neural map.
  • Updates the neural map at the current position with this new vector.

  38. Neural Map: Update
  [Figure: the map M_t is updated with the new write vector w_{t+1} to produce M_{t+1}.]

  39. Neural Map: GRU Write Update
  [Figure: a GRU-style gated update (Chung et al., 2014) combines the old cell content of M_t with the write vector w_{t+1} to produce M_{t+1}.]
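
A minimal sketch of the gated sparse write, assuming PyTorch's GRUCell as the gating unit (in the spirit of Chung et al., 2014): only the cell at the agent's grid position (x', y') is replaced. The feature vector fed to the gate and all names are illustrative:

```python
import torch
import torch.nn as nn

class GRUWrite(nn.Module):
    def __init__(self, feat_dim: int, K: int):
        super().__init__()
        # The GRU gate decides how much of the old cell content to keep.
        self.gate = nn.GRUCell(input_size=feat_dim, hidden_size=K)

    def forward(self, M_t: torch.Tensor, feat: torch.Tensor, xp: int, yp: int):
        # M_t: (B, K, W, W); feat: (B, feat_dim) summarizing s_t, r_t and c_t.
        old = M_t[:, :, yp, xp]          # previous content of the agent's cell
        w_next = self.gate(feat, old)    # gated new content w_{t+1}
        M_next = M_t.clone()             # sparse write: every other cell is untouched
        M_next[:, :, yp, xp] = w_next
        return M_next, w_next
```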

  40. Neural Map: Output
  • Output the read vectors and the vector we wrote.
  • Use those features to compute the policy.
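
A minimal sketch of the output step, assuming the global read r_t, the context read c_t, and the newly written vector w_{t+1} are simply concatenated and fed to a small policy head; layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class PolicyHead(nn.Module):
    def __init__(self, read_dim: int, K: int, n_actions: int):
        super().__init__()
        # Input: global read (read_dim) + context read (K) + written vector (K).
        self.net = nn.Sequential(
            nn.Linear(read_dim + 2 * K, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, r_t, c_t, w_next):
        o_t = torch.cat([r_t, c_t, w_next], dim=1)  # features produced by the neural map
        return self.net(o_t)                        # action logits for the policy
```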

  41. Results: Random Maze with Indicator
  [Figure: input state (partially observable), 3x15x3; real map (not visible to the agent), 3xKxK.]

  42. Random Maze Results

  43. 2D Maze Visualization

  44. Task: Doom Maze ViZDoom (Kempka et al., 2016)

  45. Doom Maze Results

  46. Egocentric Neural Map
  • Problem with the Neural Map: it requires a mapping from (x, y) to (x', y'), which means localization must already be solved.
  • An alternative is a map that is egocentric:
    • The agent always writes to the center of the map.
    • When the agent moves, the entire map shifts by the opposite amount.
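
A minimal sketch of the egocentric variant: the agent always writes to the center cell, and the map shifts opposite to the agent's displacement. torch.roll wraps cells around the edges for brevity, whereas a real implementation would pad with empty cells; the sign conventions for dx and dy are assumptions:

```python
import torch

def egocentric_shift(M_t: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    # M_t: (B, K, W, W). If the agent moved by (dx, dy) grid cells (dx > 0 = right,
    # dy > 0 = down), the whole map shifts by the opposite amount so that the
    # agent stays at the center. torch.roll wraps around the edges for simplicity.
    return torch.roll(M_t, shifts=(-dy, -dx), dims=(2, 3))

def write_center(M_t: torch.Tensor, w_next: torch.Tensor) -> torch.Tensor:
    # Egocentric write: the agent always writes to the central cell of the map.
    B, K, W, _ = M_t.shape
    M_next = M_t.clone()
    M_next[:, :, W // 2, W // 2] = w_next
    return M_next
```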

  47. Conclusion
  • Designed a novel memory architecture suited for deep RL agents.
  • Spatially structured memory is useful for navigation tasks.
  • Sparse writes simplify credit assignment.
  • Demonstrated its ability to store information over long time lags.
  • Surpassed the performance of several previous memory-based agents.

  48. Future Directions
  • Can we extend to multi-agent domains? Multiple agents communicating through a shared memory.
  • Can we train an agent to learn how to simultaneously localize and map its environment using the Neural Map? This would remove the need for an oracle to supply the (x, y) position.
  • Can we structure neural maps into a multi-scale hierarchy? Each scale would incorporate longer-range information.

  49. Thank you Contact Information: Emilio Parisotto (eparisot@andrew.cmu.edu)

  50. Extra Slides

  51. What does the Neural Map learn to store?
  [Figure panels: observations, true state, context read distribution.]

  52. Neural Map: Summary
