Lipschitz Continuity in Model-based Reinforcement Learning
Kavosh Asadi*, Dipendra Misra*, Michael L. Littman
* denotes equal contribution
Model-based RL
[diagram: the model-based RL loop — acting produces experience, model learning fits a model, planning with the model produces a value/policy]
model learning: $T(s' \mid s, a) \approx \hat{T}(s' \mid s, a)$ and $R(s, a) \approx \hat{R}(s, a)$
planning: roll out the learned model $\hat{T}$ from $s_0$ to get predicted states $\hat{s}_1, \hat{s}_2, \hat{s}_3, \ldots$, compared against the true states $s_1, s_2, s_3, \ldots$
Compounding Error [Talvitie 2014, Venkatraman et al. 2015]
‣ occurs when the model is imperfect, which is almost always the case
‣ sources include estimation error and partial observability
‣ we work in the agnostic setting
[video: true dynamics ("truth") versus model rollouts ("model"); credit to Matt Cooper, github.com/dyelax]
Main Takeaway
Lipschitz continuity plays a key role in compounding errors and, more generally, in the theory of model-based RL.
Given two metric spaces $(M_1, d_1)$ and $(M_2, d_2)$, a function $f : M_1 \mapsto M_2$ is Lipschitz if the Lipschitz constant defined below is finite:
$$K_{d_1, d_2}(f) := \sup_{s_1 \in M_1,\, s_2 \in M_1} \frac{d_2\big(f(s_1), f(s_2)\big)}{d_1(s_1, s_2)}$$
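As a quick illustration of the definition (not part of the original slides), here is a minimal sketch that estimates a lower bound on the Lipschitz constant of a function by taking the largest difference quotient over sampled pairs; the example function and the Euclidean metrics are assumptions chosen for the illustration.

```python
import numpy as np

def empirical_lipschitz_lower_bound(f, samples):
    """Lower-bound K(f) = sup d2(f(s1), f(s2)) / d1(s1, s2)
    using Euclidean metrics over a finite set of sampled points."""
    bound = 0.0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            s1, s2 = samples[i], samples[j]
            d1 = np.linalg.norm(s1 - s2)
            if d1 > 0:
                d2 = np.linalg.norm(f(s1) - f(s2))
                bound = max(bound, d2 / d1)
    return bound

# Example: f(s) = sin(3s) has Lipschitz constant 3; the estimate approaches it from below.
samples = np.random.uniform(-2, 2, size=(200, 1))
print(empirical_lipschitz_lower_bound(lambda s: np.sin(3 * s), samples))
```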
Wasserstein Metric [Villani, 2008]
in stochastic domains, we need to quantify the difference between two distributions $\mu_1$ and $\mu_2$
$$W(\mu_1, \mu_2) := \inf_{j \in \Lambda} \int\!\!\int j(s_1, s_2)\, d(s_1, s_2)\, ds_2\, ds_1$$
where $\Lambda$ is the set of joint distributions (couplings) with marginals $\mu_1$ and $\mu_2$
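For one-dimensional empirical distributions, the Wasserstein-1 distance can be computed directly; the sketch below uses SciPy's `wasserstein_distance` on sampled data as an illustration (the two Gaussians are assumptions, not from the slides).

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
mu1_samples = rng.normal(loc=0.0, scale=1.0, size=5000)  # samples from mu_1
mu2_samples = rng.normal(loc=0.5, scale=1.0, size=5000)  # samples from mu_2

# For these two Gaussians, W_1 is exactly 0.5 (the shift in the mean);
# the empirical estimate should be close to that value.
print(wasserstein_distance(mu1_samples, mu2_samples))
```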
Three Theorems
‣ multi-step prediction error
‣ value function estimation error
‣ Lipschitz continuity of value function
Multi-step Prediction Error
assume a $\Delta$-accurate model: $W\big(\hat{T}(\cdot \mid s, a),\, T(\cdot \mid s, a)\big) \le \Delta \quad \forall s\, \forall a$
given a $\Delta$-accurate model with Lipschitz constant $K(\hat{T})$, a true model with Lipschitz constant $K(T)$, and a state distribution $\mu(s)$:
$$\delta(n) := W\big(\hat{T}^n(\cdot \mid \mu),\, T^n(\cdot \mid \mu)\big) \le \Delta \sum_{i=0}^{n-1} k^{\,i}$$
$\delta$: error, $n$: prediction horizon, $k := \min\big(K(T), K(\hat{T})\big)$
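To see what the bound implies, the short sketch below evaluates $\Delta \sum_{i=0}^{n-1} k^i$ for a contractive model ($k < 1$) and an expansive one ($k > 1$); the constants are assumptions chosen for illustration, not results from the paper.

```python
def prediction_error_bound(delta, k, n):
    """Upper bound on the n-step Wasserstein prediction error:
    delta * sum_{i=0}^{n-1} k**i."""
    return delta * sum(k**i for i in range(n))

delta = 0.1
for k in (0.9, 1.1):  # contractive vs. expansive transition model
    print(k, [round(prediction_error_bound(delta, k, n), 3) for n in (1, 5, 10, 20)])
# With k < 1 the bound saturates near delta / (1 - k); with k > 1 it grows geometrically.
```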
Value Function Estimation Error
[diagram: the model-based RL loop, highlighting the planning step]
$K(R)$: Lipschitz constant of the reward
how inaccurate can the value function be?
$$\big| V_T(s) - V_{\hat{T}}(s) \big| \le \frac{\gamma\, K(R)\, \Delta}{(1 - \gamma)(1 - \gamma k)} \quad \forall s$$
$k := \min\big(K(T), K(\hat{T})\big)$
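A minimal numerical reading of this bound (the constants below are assumptions for illustration): the error blows up as $\gamma k \to 1$, which is one reason to control the model's Lipschitz constant.

```python
def value_error_bound(gamma, K_R, delta, k):
    """Upper bound on |V_T(s) - V_That(s)| for all s, assuming gamma * k < 1."""
    assert gamma * k < 1, "bound requires gamma * k < 1"
    return gamma * K_R * delta / ((1 - gamma) * (1 - gamma * k))

gamma, K_R, delta = 0.95, 1.0, 0.05
for k in (0.5, 0.9, 1.0, 1.04):
    print(k, round(value_error_bound(gamma, K_R, delta, k), 2))
# The bound grows sharply as gamma * k approaches 1.
```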
Lipschitz Continuity of Value Function
‣ Generalized VI [Littman and Szepesvári, 96]: a Lipschitz operator
‣ repeat until convergence:
$$Q(s, a) \leftarrow R(s, a) + \gamma \int T(s' \mid s, a)\, f\big(Q(s', \cdot)\big)\, ds'$$
‣ the value function is Lipschitz at every iteration (including the fixed point):
$$K(Q) \le \frac{K(R)}{1 - \gamma K(T)}$$
‣ one implication: value-aware model learning [Farahmand et al, 2017] is equivalent to Wasserstein (will appear in the PGMRL workshop later in the conference)
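To make the operator concrete, here is a minimal tabular sketch of Generalized VI on a made-up 2-state, 2-action MDP; the transition matrix, rewards, and the choice of `f = max` (which recovers standard value iteration) are assumptions for illustration, not from the paper.

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
# T[s, a, s'] = probability of s' given (s, a); R[s, a] = reward (made-up MDP).
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def generalized_vi(T, R, gamma, f=lambda q: q.max(axis=-1), iters=500):
    """Q(s,a) <- R(s,a) + gamma * sum_s' T(s'|s,a) * f(Q(s', .));
    with f = max this is standard value iteration."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * T @ f(Q)
    return Q

print(generalized_vi(T, R, gamma))
```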
Controlling the Lipschitz Constant with Neural Nets
for each layer, ensure the weights lie in a desired norm ball: the Lipschitz constant of the entire net is bounded by the product of the Lipschitz constants of its layers
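One minimal way to realize this, sketched in plain NumPy under the assumption of fully connected layers with 1-Lipschitz activations (e.g., ReLU): after each update, project every weight matrix so its spectral norm stays below a chosen constant `c`; the product of the per-layer constants then bounds the constant of the whole network. This is an illustration, not the paper's exact training procedure.

```python
import numpy as np

def project_to_norm_ball(W, c):
    """Rescale W so its spectral norm (largest singular value) is at most c."""
    spec = np.linalg.norm(W, 2)  # spectral norm of a 2-D array
    return W if spec <= c else W * (c / spec)

rng = np.random.default_rng(0)
c = 1.2
layers = [rng.normal(size=(32, 16)), rng.normal(size=(16, 16)), rng.normal(size=(1, 16))]

# Apply the projection after each (hypothetical) gradient step on every layer.
layers = [project_to_norm_ball(W, c) for W in layers]

# With 1-Lipschitz activations, the network's Lipschitz constant is at most
# the product of the layers' spectral norms.
bound = np.prod([np.linalg.norm(W, 2) for W in layers])
print(bound)  # <= c ** len(layers)
```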
Is Controlling the Lipschitz Constant of Transition Models Useful?
‣ Cartpole (left) and Pendulum (right)
‣ learn a model offline using random samples
‣ perform policy gradient using the model
‣ test the policy in the environment
‣ reward (higher is better) is improved by an intermediate Lipschitz value
[plots: average return per episode versus the Lipschitz constraint, for Cartpole and Pendulum]
more experiments (including on stochastic domains) in the paper
Contributions:
‣ key role of the Lipschitz constant in model-based RL:
  ‣ compounding error
  ‣ value function estimation error
  ‣ Lipschitz continuity of the value function
‣ learning stochastic models using EM (skipped, details in the paper)
‣ quantifying the Lipschitz constant of neural nets (skipped, details in the paper)
‣ model regularization by controlling the Lipschitz constant
‣ usefulness of Wasserstein for model-based RL (skipped, details in the paper)
Questions?
References:
‣ Littman and Szepesvári, "A Generalized Reinforcement-Learning Model: Convergence and Applications", 1996
‣ Villani, "Optimal Transport: Old and New", 2008
‣ Talvitie, "Model Regularization for Stable Sample Rollouts", 2014
‣ Venkatraman, Hebert, and Bagnell, "Improving Multi-Step Prediction of Learned Time Series Models", 2015
‣ Farahmand, Barreto, and Nikovski, "Value-Aware Loss Function for Model-based Reinforcement Learning", 2017