Making Deep Q-learning Approaches Robust to Time Discretization


  1. Making Deep Q-learning Approaches Robust to Time Discretization. Corentin Tallec, Léonard Blier, Yann Ollivier (Université Paris-Sud, Facebook AI Research). June 4, 2019. C. Tallec et al. (UPSUD, FAIR), Framerate Robust DQ Learning, June 4, 2019, slide 1 / 4

  2. Reinforcement Learning in Near-Continuous Time. What happens when standard RL methods are used with a small time discretization, i.e. a high framerate? Usual RL algorithm + high framerate → failure. Scalability is limited by the algorithms: better hardware, sensors, and actuators lead to worse performance. This contributes to the lack of robustness of deep RL: a new environment means a different framerate, which means new hyperparameters. [Figure: agent behavior at low FPS vs. high FPS]

  3. Why does near-continuous Q-learning fail? There is no continuous-time Q-learning: as δt → 0, Q^π(s, a) → V^π(s), so Q^π no longer depends on the action in the limit, and therefore cannot be used to select actions. Likewise, there is no continuous-time ε-greedy exploration. [Figure: ε-greedy trajectories with ε = 1 on the pendulum, for δt = 0.05 and δt = 0.0001]
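The exploration half of this slide can be illustrated numerically. With ε = 1, the agent takes i.i.d. random actions, so over a fixed physical horizon the state performs a random walk whose per-step displacement shrinks with δt: the standard deviation of the final state scales as √(horizon · δt) and vanishes as δt → 0. The sketch below is a hypothetical 1-D toy model (not the paper's pendulum), with made-up parameters, written only to show the scaling:

```python
import math
import random

def explore_spread(dt, horizon=1.0, n_runs=200, seed=0):
    """Simulate epsilon = 1 exploration on a 1-D toy state: each step
    applies a random action a in {-1, +1} for duration dt, moving the
    state by a * dt. Returns the empirical standard deviation of the
    final state after a fixed physical time `horizon`."""
    rng = random.Random(seed)
    n_steps = int(horizon / dt)
    finals = []
    for _ in range(n_runs):
        x = 0.0
        for _ in range(n_steps):
            x += rng.choice((-1.0, 1.0)) * dt
        finals.append(x)
    mean = sum(finals) / n_runs
    return math.sqrt(sum((x - mean) ** 2 for x in finals) / n_runs)

# Theoretical spread is sqrt(horizon * dt): coarse time steps explore,
# fine time steps barely move the state at all.
coarse = explore_spread(dt=0.05)    # theory: sqrt(0.05) ~ 0.22
fine = explore_spread(dt=0.0001)    # theory: sqrt(0.0001) = 0.01
print(coarse, fine)
```

The same collapse explains why Q-values stop discriminating between actions: a single action held for a duration δt changes the return by an O(δt) amount, so the gap Q^π(s, a) − V^π(s) shrinks to zero along with the exploration spread.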

  4. Can we solve this? YES. To find out how: Poster #32 this evening. [Figure: agent behavior at low FPS vs. high FPS]
