Kernel-based Reinforcement Learning in Robust Markov Decision Processes
Shiau Hong Lim, Arnaud Autef
Motivation
• Robust Markov Decision Process (MDP) framework
– Tackles model mismatch and parameter uncertainty
– Previously, for state aggregation, the performance bound on the learned policy was improved via robust policies:
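The bound itself did not survive extraction. As a hedged reconstruction based on the standard state-aggregation result this work builds on (not a verbatim copy of the slide), robust policies improve the dependence on the effective horizon roughly as:

\[
\underbrace{\;\|V^{*} - V^{\pi}\|_{\infty} = O\!\left(\frac{\epsilon}{(1-\gamma)^{2}}\right)\;}_{\text{standard greedy policy}}
\qquad\longrightarrow\qquad
\underbrace{\;\|V^{*} - V^{\hat{\pi}}\|_{\infty} = O\!\left(\frac{\epsilon}{1-\gamma}\right)\;}_{\text{robust policy}}
\]

where \(\epsilon\) measures the aggregation (approximation) error and \(\gamma\) is the discount factor.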
Contributions
1. The robust performance bound improvement is extended from state aggregation to the general kernel averager setting
2. Formulation of a practical kernel-based robust algorithm, with empirical results on benchmark tasks
Kernel-based approach
1. The MDP to solve
2. A kernel averager over representative states approximates the value function: V(s) ≈ Σ_i k(s, x_i) V(x_i), with {x_i} the representative states and k a normalized (non-negative, summing to one) kernel — a minimal sketch follows below
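As a concrete illustration of the kernel averager above, the sketch below computes normalized kernel weights over a set of representative states and the resulting value estimate. The Gaussian kernel, bandwidth, and all names are illustrative assumptions, not the paper's exact choices; the method only needs non-negative weights that sum to one.

```python
import numpy as np

def kernel_weights(s, rep_states, bandwidth=0.5):
    """Normalized kernel weights of state s over the representative states.

    Gaussian kernel is an illustrative assumption; any normalized averager
    (non-negative weights summing to 1) fits the kernel-based setting.
    """
    sq_dists = np.sum((rep_states - s) ** 2, axis=1)   # squared distance to each x_i
    w = np.exp(-sq_dists / (2.0 * bandwidth ** 2))     # unnormalized Gaussian kernel
    return w / w.sum()                                  # normalize so weights sum to 1

def approx_value(s, rep_states, rep_values, bandwidth=0.5):
    """Kernel-averaged value estimate: V(s) ~= sum_i k(s, x_i) * v_i."""
    return kernel_weights(s, rep_states, bandwidth) @ rep_values
```

With rep_states of shape (n, d) and rep_values of shape (n,), approx_value returns a scalar value estimate at any query state s.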
Kernel-based approach (continued)
2. Define a non-trivial robust MDP whose states are the representative states
3. Obtain the optimal robust value in this robust MDP
4. Derive a policy in the original MDP that is greedy w.r.t. this robust value (see the sketch below)
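A minimal sketch of steps 2–4, assuming the robust MDP is given by an explicit uncertainty set of transition distributions over the representative states; this construction and all names are illustrative, not the paper's exact formulation.

```python
import numpy as np

def robust_value_iteration(P_sets, R, gamma=0.99, iters=500):
    """Worst-case (robust) value iteration over the representative states.

    P_sets[a][i]: list of candidate transition distributions (arrays over the
                  representative states) for action a in representative state i
                  -- an illustrative uncertainty set.
    R[a][i]:      reward for taking action a in representative state i.
    """
    n_actions, n_states = len(P_sets), len(P_sets[0])
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.empty((n_actions, n_states))
        for a in range(n_actions):
            for i in range(n_states):
                worst = min(p @ V for p in P_sets[a][i])   # adversarial transition choice
                Q[a, i] = R[a][i] + gamma * worst
        V = Q.max(axis=0)                                  # greedy over actions
    return V
```

A policy for the original MDP then acts greedily with respect to the kernel-averaged extension of the returned V (e.g. via approx_value in the previous sketch).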
Theoretical result
Theorem: relates the optimal robust value in the robust MDP, the policy greedy with respect to it, and the optimal value in the original MDP. The resulting performance loss is bounded by two terms:
– Function approximator limitations
– Smoothness of the MDP
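The precise statement and constants did not survive extraction; schematically (our reading of the slide's bullets, not the paper's exact theorem), the bound on the loss of the greedy robust policy \(\hat{\pi}\) against the optimal value \(V^{*}\) of the original MDP decomposes as:

\[
\| V^{*} - V^{\hat{\pi}} \|_{\infty}
\;\le\;
\underbrace{\;\epsilon_{\text{approx}}\;}_{\text{function approximator limitations}}
\;+\;
\underbrace{\;\epsilon_{\text{smooth}}\;}_{\text{smoothness of the MDP}}
\]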
Practical algorithm
1. A second kernel averager approximates the MDP model from data
2. Solve the robust MDP with the approximate robust Bellman operator, with a robustness parameter controlling the degree of robustness
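A minimal sketch of one way such an approximate robust backup could look, assuming the robustness parameter interpolates between the sample-averaged backup and the worst case over sampled next states; the paper's exact operator and uncertainty set may differ, and all names below are illustrative.

```python
import numpy as np

def robust_backup(rewards, next_weights, rep_values, gamma=0.99, beta=0.2):
    """One approximate robust Bellman backup for a single (state, action) pair.

    rewards      : (m,) sampled one-step rewards for this (state, action)
    next_weights : (m, n) kernel weights of the m sampled next states over
                   the n representative states (each row sums to 1)
    rep_values   : (n,) current value estimates at the representative states
    beta         : robustness parameter in [0, 1]; 0 gives the nominal
                   sample-averaged backup, 1 the worst case over samples
                   (an illustrative interpolation, not necessarily the
                   paper's exact uncertainty set).
    """
    sample_returns = rewards + gamma * (next_weights @ rep_values)
    nominal = sample_returns.mean()   # standard kernel-based backup
    worst = sample_returns.min()      # adversarial (worst-case) backup
    return (1.0 - beta) * nominal + beta * worst
```

Applying this backup at every representative state and action, and iterating to a fixed point, yields the robust value estimates used by the practical algorithm.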
Experiments: Acrobot
Acrobot (results plot)
Experiments: Double Pole Balancing
Double Pole Balancing (results plot)
Conclusion
• Theoretical performance guarantees for robust kernel-based reinforcement learning in the original MDP
• Significant empirical benefits from robustness, even stronger under model mismatch (as in real-world settings)
Thank you! Please come see our poster tonight.
Shiau Hong Lim, Arnaud Autef