
Per-Decision Option Discounting: Anna Harutyunyan, Peter Vrancx, Philippe Hamel, Ann Nowé, Doina Precup (PowerPoint presentation)



  1. Per-Decision Option Discounting. Anna Harutyunyan, Peter Vrancx, Philippe Hamel, Ann Nowé, Doina Precup

  2-9. Motivation: agents that reason over long temporal horizons. The horizon depends on the discount γ, and a larger grid requires a larger γ, but large γ values are inefficient in practice :( Temporal abstraction? Options are still tied to γ! Contribution: generalize the options framework so that it can extend the agent's horizon.

  10-11. The Options Framework: the reward model and the transition model of an option.

  12-15. Options with Time Dilation: the reward model and the transition model, annotated with two changes: (1) decouple; (2) per-decision. γ_p controls how much we care about option duration; when γ_p = 1 the option is pseudo-primitive. Key intuition: insulate option time from global time.

  16. Primitive Timestep Invariance: ours vs. classical.

  17-19. Bias-Variance Tradeoff: empirical error (Four Rooms) and an analytical variance bound. Larger γ_p can induce less variance! Thanks! More at poster #114 :)
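The motivation slides tie the agent's horizon to the discount γ. A common rule of thumb (an assumption here, not a formula taken from the slides) puts the effective horizon at roughly 1/(1 - γ) primitive steps, and a reward d steps away is weighted by γ^d. The minimal sketch below shows why a larger grid, with goals further away, pushes γ toward 1. The names `effective_horizon` and `min_gamma_for_distance` and the 0.1 weight threshold are hypothetical choices made for illustration.

```python
# Illustrative sketch: how the discount gamma limits the agent's horizon.
# The 1 / (1 - gamma) rule of thumb and the 0.1 threshold are assumptions.


def effective_horizon(gamma: float) -> float:
    """Approximate number of steps over which rewards still matter."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discount_at_distance(gamma: float, steps: int) -> float:
    """Weight given to a reward received `steps` primitive steps in the future."""
    return gamma ** steps


def min_gamma_for_distance(steps: int, min_weight: float = 0.1) -> float:
    """Smallest gamma such that a reward `steps` away keeps at least `min_weight`.

    Solves gamma ** steps >= min_weight, i.e. gamma >= min_weight ** (1 / steps).
    """
    return min_weight ** (1.0 / steps)


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        print(f"gamma={gamma}: horizon ~ {effective_horizon(gamma):.0f} steps")
    # A goal that is further away (e.g. in a larger grid) needs a larger gamma.
    for distance in (10, 100, 1000):
        print(f"distance={distance}: gamma >= {min_gamma_for_distance(distance):.4f}")
```

The required γ climbs toward 1 as the distance grows, which is the regime the slides describe as inefficient in practice.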
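The Options Framework slides show a reward model and a transition model. In the usual SMDP formulation, the reward model is the expected discounted reward collected until the option terminates, E[Σ_{t<K} γ^t r], and the transition model weights each outcome by γ^K, where K is the option's duration. This means the option's length in global time is built into its discount. The sketch below illustrates both quantities on a toy option that earns a constant reward r per step and stops after each step with a constant probability β. Both constants are simplifying assumptions, and all function names are hypothetical. It computes each quantity once as a truncated series and once in closed form.

```python
# Classical option models for a toy option: constant reward r per primitive
# step, constant termination probability beta after each step (assumptions).


def reward_model_series(r: float, beta: float, gamma: float, max_steps: int = 10_000) -> float:
    """E[sum_{t=0}^{K-1} gamma**t * r] = sum_t gamma**t * P(K > t) * r."""
    # P(K > t) = (1 - beta) ** t: the option survived its first t steps.
    return sum(r * (gamma * (1.0 - beta)) ** t for t in range(max_steps))


def transition_mass_series(beta: float, gamma: float, max_steps: int = 10_000) -> float:
    """E[gamma**K]: discounted probability of terminating, summed over next states."""
    return sum(beta * (1.0 - beta) ** (k - 1) * gamma ** k for k in range(1, max_steps + 1))


def reward_model_closed(r: float, beta: float, gamma: float) -> float:
    return r / (1.0 - gamma * (1.0 - beta))


def transition_mass_closed(beta: float, gamma: float) -> float:
    return gamma * beta / (1.0 - gamma * (1.0 - beta))


if __name__ == "__main__":
    r, beta, gamma = 1.0, 0.1, 0.99
    print(reward_model_series(r, beta, gamma), reward_model_closed(r, beta, gamma))
    print(transition_mass_series(beta, gamma), transition_mass_closed(beta, gamma))
    # The transition mass falls below gamma: a multi-step option is discounted
    # more heavily than one primitive action, because gamma**K counts global time.
```

With β = 1 the option always lasts one step and its transition mass equals γ, the same as a primitive action. Longer options receive less, which is the sense in which options are "still tied to γ".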
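The time-dilation slides introduce γ_p, which controls how much option duration matters, and say an option becomes pseudo-primitive at γ_p = 1. The slides here do not spell out the formulas. The sketch below therefore uses one hypothetical per-decision weighting, chosen only to illustrate that intuition and not taken from the authors. Each decision costs one factor of γ, and each additional primitive step inside the option costs a factor of γ_p, so a k-step option is weighted by γ · γ_p^(k-1). Under this assumed weighting, γ_p = γ recovers the classical γ^k. With γ_p = 1, the weight no longer depends on the option's duration or on how finely primitive time is sliced, which loosely mirrors the "primitive timestep invariance" slide. The helper `refine` is a hypothetical name.

```python
# Hypothetical per-decision weighting used only to illustrate the
# "time dilation" intuition; NOT the authors' published formulas.


def classical_weight(k: int, gamma: float) -> float:
    """Classical option discount for an option lasting k primitive steps."""
    return gamma ** k


def per_decision_weight(k: int, gamma: float, gamma_p: float) -> float:
    """Assumed weighting: one gamma per decision, one gamma_p per extra step.

    gamma_p == gamma reproduces classical_weight; gamma_p == 1 makes the
    option look like a single primitive action ("pseudo-primitive").
    """
    if k < 1:
        raise ValueError("an option lasts at least one primitive step")
    return gamma * gamma_p ** (k - 1)


def refine(k: int, substeps: int) -> int:
    """Duration in primitive steps after splitting every step into `substeps`."""
    return k * substeps


if __name__ == "__main__":
    gamma = 0.9
    for k in (1, 5, 20):
        print(
            f"k={k:2d}  classical={classical_weight(k, gamma):.4f}  "
            f"gamma_p=1: {per_decision_weight(k, gamma, 1.0):.4f}"
        )
    # Slicing time 4x finer changes the classical weight but not the
    # gamma_p = 1 weight, because the number of decisions is unchanged.
    k, m = 5, 4
    print(classical_weight(k, gamma), classical_weight(refine(k, m), gamma))
    print(per_decision_weight(k, gamma, 1.0), per_decision_weight(refine(k, m), gamma, 1.0))
```

Values of γ_p between γ and 1 would interpolate between the two behaviours in this toy model. How the paper actually sets and uses γ_p should be checked against the full text.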
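The bias-variance slides report that a larger γ_p can induce less variance. One way this can happen is shown by a toy calculation, which is not the analytical bound from the slides. Consider a hypothetical per-decision weight γ · γ_p^(K-1) for an option whose duration K is random. If K is geometric with stop probability β, then E[x^(K-1)] = β / (1 - x(1 - β)), which gives the variance of the weight in closed form. A seeded Monte Carlo estimate is included as a check. All names here are hypothetical.

```python
import random
import statistics

# Toy variance calculation for a hypothetical per-decision weight
# gamma * gamma_p ** (K - 1), where the option duration K is geometric:
# after every primitive step the option stops with probability beta.
# This is an illustration, not the analytical bound from the slides.


def expected_power(x: float, beta: float) -> float:
    """E[x ** (K - 1)] for K ~ Geometric(beta) on {1, 2, ...}."""
    return beta / (1.0 - x * (1.0 - beta))


def weight_variance(gamma: float, gamma_p: float, beta: float) -> float:
    """Exact Var[gamma * gamma_p ** (K - 1)] = gamma**2 * (E[x**2...] - E[x...]**2)."""
    second = expected_power(gamma_p ** 2, beta)
    first = expected_power(gamma_p, beta)
    return gamma ** 2 * (second - first ** 2)


def sample_duration(beta: float, rng: random.Random) -> int:
    """Run the option one step at a time until it terminates."""
    k = 1
    while rng.random() >= beta:
        k += 1
    return k


def monte_carlo_variance(gamma: float, gamma_p: float, beta: float,
                         n: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    samples = [gamma * gamma_p ** (sample_duration(beta, rng) - 1) for _ in range(n)]
    return statistics.pvariance(samples)


if __name__ == "__main__":
    gamma, beta = 0.9, 0.5
    for gamma_p in (0.0, 0.5, 0.9, 1.0):
        exact = weight_variance(gamma, gamma_p, beta)
        approx = monte_carlo_variance(gamma, gamma_p, beta)
        print(f"gamma_p={gamma_p}: exact={exact:.5f}  monte carlo={approx:.5f}")
```

At γ_p = 1 the weight no longer depends on K, so its variance is zero. For intermediate values the variance need not fall monotonically. With β = 0.01, for example, γ_p = 0.5 gives a larger variance than γ_p = 0. This is consistent with the slide's hedged "can".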
