
Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning. Oliver Richter, Manuel Fritsche, Gino Brunner, Roger Wattenhofer. ETH Zurich, Distributed Computing.


  1. Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning. Oliver Richter, Manuel Fritsche, Gino Brunner, Roger Wattenhofer. ETH Zurich, Distributed Computing, www.disco.ethz.ch

  2. Base actions on predictions

  3. Reinforcement learning (diagram: Agent, Environment)

  4. Reinforcement learning

  5. How to choose the action?

  6. Return value

  7. Value function

  8. Reinforcement learning (diagram: Agent, Environment)

  9. Sparse reward settings? (diagram: Agent, Environment)

  10. (diagram: Agent, Environment)

  11. Reward the exploration of novel states

  12. Reward the exploration of novel states

  13. How to find novel states? Make predictions

  14. How to find novel states? Make predictions, get surprised

  15. Curiosity (figure labels: prediction, reality)

  16. Asynchronous Advantage Actor-Critic architecture (A3C). Diagram: Feature Extractor, A3C Network. Label: A3C

  17. Adding curiosity. Diagram: Feature Extractor 1, A3C Network, Forward Model, Feature Extractor 2

  18. Learning good features. Diagram: Feature Extractor 1, A3C Network, Forward Model, Feature Extractor 2, Inverse Model, Feature Extractor 2. Pathak et al., ICML 2017. Label: A3C + ICM

  19. Good features for all. Diagram: Feature Extractor, A3C Network, Forward Model, Inverse Model, Feature Extractor. Label: A3C + Pred

  20. Adding Value Prediction. Diagram: Feature Extractor, A3C Network, Forward Model, A3C Network, Inverse Model, Feature Extractor. Label: A3C + Pred + VPC

  21. Value Prediction Consistency

  22. Value Prediction Consistency

  23. Value Prediction Consistency

  24. Let’s see how it works in practice

  25. Rewards per episode

  26. Rewards per episode

  27. Rewards per episode

  28. Rewards per episode

  29. Thinking bigger

  30. Rewards per episode

  31. Rewards per episode

  32. Rewards per episode

  33. Rewards per episode

  34. Doom environment

  35. Doom Setup

  36. Rewards per episode

  37. Rewards per episode

  38. Rewards per episode

  39. Rewards per episode

  40. Questions & Answers
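
The slide titles name three quantities without giving formulas: the return, the curiosity signal ("make predictions, get surprised"), and Value Prediction Consistency. The sketch below illustrates one plausible reading of each, under stated assumptions. The discounted return is the standard definition. The curiosity reward is written as a forward-model prediction error in feature space, the form used by the ICM of Pathak et al. The VPC term is an assumption: it penalizes disagreement between the value estimate on predicted next-state features and on actual next-state features. All function and parameter names (`discounted_return`, `curiosity_reward`, `vpc_loss`, `gamma`, `scale`) are hypothetical and do not come from the slides.

```python
import numpy as np


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over one episode (standard discounted return)."""
    g = 0.0
    # Accumulate backwards so each step is r_k + gamma * (return from k+1).
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def curiosity_reward(predicted_next_features, actual_next_features, scale=1.0):
    """Intrinsic reward: half the squared forward-model error in feature space.

    Assumption: the 'surprise' is measured as a squared L2 distance.
    """
    predicted = np.asarray(predicted_next_features, dtype=float)
    actual = np.asarray(actual_next_features, dtype=float)
    return scale * 0.5 * float(np.sum((predicted - actual) ** 2))


def vpc_loss(value_of_predicted_next, value_of_actual_next):
    """Assumed form of a value prediction consistency penalty.

    Penalizes the value head for disagreeing with itself when fed the
    predicted versus the actual next-state features.
    """
    return 0.5 * (value_of_predicted_next - value_of_actual_next) ** 2


if __name__ == "__main__":
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
    print(curiosity_reward([1.0, 2.0], [1.0, 0.0]))
    print(vpc_loss(1.0, 0.5))
```

In a sparse reward setting, the curiosity term would be added to the (mostly zero) environment reward, and the consistency penalty would be added to the training loss as a regularizer on the value head; the weighting of both is left open here.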
