
Projections for Approximate Policy Iteration Algorithms, Riad Akrour



  1. Projections for Approximate Policy Iteration Algorithms. Riad Akrour, Joni Pajarinen, Gerhard Neumann, Jan Peters. IAS, TU Darmstadt, Germany. ICML19

  2. Entropy Regularization in RL ● Widely used with actor-critic methods

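  3. Hard vs Soft Constraints ● Soft constraint (bonus term): $\max_\pi J(\pi) + \alpha\,\mathcal{H}(\pi)$, i.e. policy return plus entropy regularization ● Hard constraint: $\max_\pi J(\pi)$ subject to $\mathcal{H}(\pi) \ge \beta$ – Harder to optimize, easier to interpret and tune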

  4. Contributions ● Projections that hard-constrain the Shannon entropy of Gaussian or soft-max policies ● Projections that outperform other KL-constrained optimizers used in deep RL

  5. Results ● Optimizing vs – Deep RL – Projected gradient – Direct policy search ● Poster #34
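The soft vs hard distinction from the "Hard vs Soft Constraints" slide can be made concrete on a discrete soft-max policy. This is an illustrative sketch under our own assumptions, not code from the paper: the names `alpha` (bonus weight), `beta` (entropy lower bound in nats) and the toy action values are hypothetical. The soft version adds a weighted entropy bonus to the return, so the entropy it ends up with depends on `alpha` only indirectly; the hard version states the admissible entropy level directly, which is why it is easier to interpret and tune.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of action logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expected_return(probs, q_values):
    """Expected action value under the policy for a single state."""
    return sum(p * q for p, q in zip(probs, q_values))


def soft_objective(probs, q_values, alpha):
    """Soft constraint: return plus an entropy bonus weighted by alpha."""
    return expected_return(probs, q_values) + alpha * entropy(probs)


def is_feasible(probs, beta):
    """Hard constraint: a policy is admissible only if its entropy >= beta."""
    return entropy(probs) >= beta


if __name__ == "__main__":
    q_values = [1.0, 2.0, 0.5]         # hypothetical action values for one state
    probs = softmax([0.0, 2.0, -1.0])  # hypothetical policy logits
    print("soft objective:", soft_objective(probs, q_values, alpha=0.1))
    print("entropy:", entropy(probs), "feasible:", is_feasible(probs, beta=0.5))
```

With the soft form, checking whether a given `alpha` keeps enough exploration requires optimizing and then measuring the entropy; with the hard form, `beta` is the entropy level itself.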
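The first contribution, hard-constraining the Shannon entropy of Gaussian or soft-max policies, can be illustrated with two simple projections. This is a hedged sketch under our own assumptions, not the projection derived in the paper: the Gaussian is assumed to have diagonal covariance and is made feasible by scaling all standard deviations by one common factor (mean unchanged), and the soft-max policy is made feasible by raising its temperature, found by bisection. Both rely on entropy increasing monotonically with the scale factor and with the temperature. All function names and the bound `beta` are hypothetical.

```python
import math


def gaussian_entropy(stds):
    """Entropy (nats) of a d-dimensional Gaussian with covariance diag(stds**2)."""
    d = len(stds)
    return 0.5 * d * math.log(2.0 * math.pi * math.e) + sum(math.log(s) for s in stds)


def project_gaussian_entropy(mean, stds, beta):
    """Return (mean, stds) whose entropy is at least beta.

    Multiplying every std by k adds d * log(k) to the entropy, so
    k = exp((beta - H) / d) moves an infeasible policy exactly onto the bound.
    """
    h = gaussian_entropy(stds)
    if h >= beta:
        return list(mean), list(stds)  # already feasible: leave unchanged
    k = math.exp((beta - h) / len(stds))
    return list(mean), [k * s for s in stds]


def softmax(logits, temperature=1.0):
    """Numerically stable soft-max of logits / temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def project_softmax_entropy(logits, beta, tol=1e-10, max_iter=200):
    """Return soft-max probabilities whose entropy is at least beta.

    Entropy of softmax(logits / T) is non-decreasing in T and tends to log(n),
    so a feasible temperature exists whenever beta < log(n).
    """
    n = len(logits)
    if beta >= math.log(n):
        return [1.0 / n] * n  # only the uniform policy reaches log(n)
    if entropy(softmax(logits)) >= beta:
        return softmax(logits)  # already feasible at T = 1
    lo, hi = 1.0, 2.0
    for _ in range(100):  # double hi until it is feasible
        if entropy(softmax(logits, hi)) >= beta:
            break
        lo, hi = hi, 2.0 * hi
    else:
        return [1.0 / n] * n  # numerically out of reach: fall back to uniform
    for _ in range(max_iter):  # bisection keeps hi feasible and lo infeasible
        mid = 0.5 * (lo + hi)
        if entropy(softmax(logits, mid)) >= beta:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return softmax(logits, hi)
```

Neither function is claimed to be the minimum-KL projection onto the entropy constraint; the sketch only shows that a hard entropy bound can be enforced in closed form for the Gaussian and with a one-dimensional search for the soft-max, while preserving the action ranking.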
