Kernel-based Reinforcement Learning in Robust Markov Decision Processes


Kernel-based Reinforcement Learning in Robust Markov Decision Processes Shiau Hong Lim, Arnaud Autef

Motivation
• Robust Markov Decision Process (MDP) framework
  – Tackles model mismatch and parameter uncertainty
  – Previously, for state aggregation, the performance bound was improved via robust policies
12/6/2019 Arnaud Autef - ICML 2019

Contribution
1. The performance-bound improvement from robustness is extended to the general kernel averager setting
2. Formulation of a practical kernel-based robust algorithm, with empirical results on benchmark tasks

Kernel-based approach
1. MDP to solve
2. Kernel averager and representative states to approximate the value function

Kernel-based approach (continued)
2. Define a non-trivial robust MDP with states = representative states
3. Obtain the optimal robust value in the robust MDP
4. Derive the policy in the original MDP, greedy w.r.t. the optimal robust value
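
As a rough illustration of the kernel averager step, the value of any state can be estimated as a convex combination of the values stored at the representative states. The sketch below assumes a normalized Gaussian kernel; the kernel choice, the `bandwidth` parameter, and all function names are illustrative assumptions, not taken from the slides.

```python
import math


def gaussian_kernel_weights(state, rep_states, bandwidth=1.0):
    """Normalized Gaussian kernel weights of `state` over the representative states.

    Assumption: a Gaussian kernel with a shared bandwidth; the slides do not
    say which kernel is used.
    """
    raw = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(state, rep)) / (2.0 * bandwidth ** 2))
        for rep in rep_states
    ]
    total = sum(raw)
    # Normalizing makes the weights non-negative and sum to one (an averager).
    return [w / total for w in raw]


def kernel_value(state, rep_states, rep_values, bandwidth=1.0):
    """Kernel-averager estimate: weighted average of representative-state values."""
    weights = gaussian_kernel_weights(state, rep_states, bandwidth)
    return sum(w * v for w, v in zip(weights, rep_values))


# Hypothetical one-dimensional example with three representative states.
rep_states = [(0.0,), (1.0,), (2.0,)]
rep_values = [0.0, 1.0, 4.0]
estimate = kernel_value((1.0,), rep_states, rep_values, bandwidth=0.5)
```

Because the weights form a convex combination, the estimate always lies between the smallest and largest representative value, which is the property that makes averagers well behaved under repeated Bellman backups.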

Theoretical Result
Theorem: a performance bound relating the optimal robust value in the robust MDP, the greedy policy w.r.t. that value, and the optimal value in the original MDP. The terms of the bound reflect:
  – Function approximator limitations
  – Smoothness

Practical algorithm
1. Second kernel averager to approximate the MDP model from data
2. Solve with the approximate robust Bellman operator, which has a robustness parameter
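
The exact operator is not reproduced here. Purely as a hypothetical sketch of what a robust backup with a robustness parameter can look like, the code below blends the nominal kernel-weighted average of next values with the worst case over representative states that carry positive weight. The blending form, the parameter name `beta`, and the data layout are assumptions made for illustration and should not be read as the authors' operator.

```python
def robust_backup(weights, next_values, reward, gamma, beta):
    """One illustrative robust backup for a single (state, action) pair.

    weights     : kernel weights over representative states (non-negative, sum to 1)
    next_values : current value estimates at the representative states
    beta        : assumed robustness parameter in [0, 1]; 0 gives the nominal
                  backup, 1 gives the worst case over the kernel's support
    """
    nominal = sum(w * v for w, v in zip(weights, next_values))
    worst = min(v for w, v in zip(weights, next_values) if w > 0.0)
    return reward + gamma * ((1.0 - beta) * nominal + beta * worst)


def robust_value_iteration(weights, rewards, gamma, beta, sweeps=200):
    """Repeat the backup over all representative states, maximizing over actions.

    weights[i][a] holds the kernel weights of the next state for state i, action a;
    rewards[i][a] holds the corresponding reward.
    """
    values = [0.0] * len(weights)
    for _ in range(sweeps):
        values = [
            max(
                robust_backup(weights[i][a], values, rewards[i][a], gamma, beta)
                for a in range(len(weights[i]))
            )
            for i in range(len(weights))
        ]
    return values


# Hypothetical model with two representative states and two actions each.
weights = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
]
rewards = [[0.0, 0.1], [1.0, 0.1]]
nominal_values = robust_value_iteration(weights, rewards, gamma=0.9, beta=0.0)
robust_values = robust_value_iteration(weights, rewards, gamma=0.9, beta=1.0)
```

In this sketch a larger `beta` can only lower the computed values, since the worst case never exceeds the weighted average; the greedy policy is then read off from the robust values rather than the nominal ones.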

Experiments: Acrobot

Experiments: Double Pole Balancing

Conclusion
• Theoretical performance guarantees for robust kernel-based reinforcement learning
• Significant empirical benefits from robustness, even stronger with model mismatch (real-world settings)

Thank you! Please come to see our poster tonight.
Shiau Hong Lim, Arnaud Autef
