Cautious Adaptation For RL in Safety-Critical Settings - PowerPoint PPT Presentation


  1. Cautious Adaptation For RL in Safety-Critical Settings. International Conference on Machine Learning 2020. Jesse Zhang, Brian Cheung, Chelsea Finn, Sergey Levine, Dinesh Jayaraman

  2. Outline ● Short overview (4 minutes) ● In-depth talk (11 minutes)

  3. Introduction ● Real-world RL is hazardous in safety-critical settings ● It is hard to reset after real-life failures ● How can an agent adapt to unseen environments safely?

  4. Motivation ● How do humans adapt?

  5. Motivation ● Safety-Critical Adaptation (SCA): ○ Pretraining: sandbox environments ○ Adaptation: safety-critical target environment

  6. Methodology ● Transfer risk knowledge from prior experience ● Safety-Critical Adaptation (SCA) ● Cautious Adaptation in RL (CARL)

  7. Cautious Adaptation in RL (CARL) ● Approach (model-based): ○ Pretraining: probabilistic models capture state-transition uncertainty 1 ○ Adaptation: use the captured uncertainty to adapt safely to the new environment, by modifying the planning cost function. 1 PETS (Chua et al., 2018)

  8. Environments Tested ● Cartpole (varying pole lengths) ● Duckietown 1 (varying car width) ● Half Cheetah (varying disabled joint). 1 Duckietown (Chevalier-Boisvert et al., 2018)

  9. Results (Cartpole)

  10. Results (Duckietown Driving)

  11. Results (Half-Cheetah)

  12. Short Summary ● Capture environment risk with prior experience ○ Probabilistic dynamics models ● Plan with risk in mind for safety-critical adaptation

  13. Outline ● Discussion of related work ● Detailed discussion of CARL methodology ● Further analysis of results ○ Comparison to other methods ○ Average reward and number of catastrophic events

  14. Related Work ● Risk-Averse RL ○ Conditional Value at Risk: Rockafellar et al. (2000); Morimura et al. (2010); Borkar & Jain (2010); Chow & Ghavamzadeh (2014); Tamar et al. (2015); Chow et al. (2015) ● Model-Based RL for Safety ○ Explicit safety constraints: Fisac et al. (2017); Sadigh & Kapoor (2017); Berkenkamp et al. (2017); Ostafew et al. (2016); Hakobyan et al. (2019); Hanssen & Foss (2015); Hewing et al. (2019); Aswani et al. (2013) ● Capturing Uncertainty ○ Meta-learning: Nagabandi et al. (2018); Sæmundsson et al. (2018); Finn et al. (2017); Rajeswaran et al. (2016)

  15. Model-Based RL Preliminaries: PETS ● Ensemble of probabilistic dynamics models ● Trajectory sampling for candidate action selection ● Action score over predicted trajectories for a candidate action sequence A: score(A) = (1/K) Σ_i R_i(A), where R_i(A) is the reward of the i-th sampled trajectory ● The action sequence with the highest score is executed
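
A minimal Python sketch of this trajectory-sampling score, assuming an ensemble given as a list of callables that map (state, action) to a Gaussian over next states; all function and variable names here are illustrative, not the authors' implementation:

```python
import numpy as np

def score_action_sequence(action_seq, models, reward_fn, s0,
                          n_particles=20, seed=0):
    """PETS-style trajectory sampling: roll each particle through randomly
    chosen ensemble members and return the mean predicted return,
    score(A) = (1/K) sum_i R_i(A). Hypothetical interfaces:
    models[j](s, a) -> (mean, std) of the next state; reward_fn(s, a) -> float."""
    rng = np.random.default_rng(seed)
    returns = np.zeros(n_particles)
    for i in range(n_particles):
        s = np.array(s0, dtype=float)
        for a in action_seq:
            model = models[rng.integers(len(models))]  # sample an ensemble member
            mean, std = model(s, a)
            s = rng.normal(mean, std)                  # sample the stochastic transition
            returns[i] += reward_fn(s, a)
    return returns.mean()
```

At control time the planner would score many candidate sequences (e.g., from random shooting or CEM), execute the first action of the highest-scoring sequence, and replan at every step (model-predictive control).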

  16. CARL for Safety-Critical Adaptation ● PETS: the ensemble captures stochasticity within a single environment ○ CARL: the ensemble also captures the uncertainty induced by variations across environments ● Pretraining: train PETS ○ Randomly sample a source domain ○ The dynamics model captures uncertainty about state transitions, reward, and risk ● Adaptation: unseen domain ○ Risk-averse action selection (see the sketch below)
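
A schematic sketch of that pretraining loop; `make_env`, `sample_domain_params`, and `train_ensemble` are hypothetical helpers (a gym-style environment API is assumed), not the authors' code:

```python
def pretrain_carl(make_env, sample_domain_params, train_ensemble,
                  n_domains=10, steps_per_domain=1000):
    """Pool transitions from many randomly sampled source domains so that
    ensemble disagreement reflects cross-domain uncertainty, not just
    per-environment noise. All helper names are hypothetical."""
    data = []
    for _ in range(n_domains):
        env = make_env(sample_domain_params())  # e.g., sample a pole length
        s = env.reset()
        for _ in range(steps_per_domain):
            a = env.action_space.sample()       # or the current planner's action
            s_next, r, done, _ = env.step(a)
            data.append((s, a, r, s_next))
            s = env.reset() if done else s_next
    return train_ensemble(data)                 # fit the probabilistic ensemble
```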

  17. Risk-Averse Action Selection, Case 1: Low-Reward Risk Aversion, CARL (Reward) ● Select actions that do well even under worst-case outcomes ● Score an action sequence by its average reward over the worst δ-percentile of predicted trajectories
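
A minimal sketch of this CVaR-style score, assuming per-trajectory returns from PETS rollouts like the ones above (names are illustrative):

```python
import numpy as np

def carl_reward_score(trajectory_returns, delta=0.1):
    """CARL (Reward) sketch: mean return over the worst delta-fraction
    of predicted trajectories. delta=1.0 recovers the plain PETS mean."""
    returns = np.sort(np.asarray(trajectory_returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(delta * len(returns))))                  # size of the worst-case tail
    return returns[:k].mean()
```

For example, carl_reward_score([5.0, 1.0, 3.0, -2.0], delta=0.5) averages the two worst returns and gives -0.5, so a sequence with a high mean but a bad tail scores poorly.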

  18. Risk-Averse Action Selection, Case 2: Catastrophic-State Risk Aversion, CARL (State) ● Avoid catastrophic states directly ● Build a state safety cost g(A) ● Maximize: r(A) - λ·g(A) ● This is a Lagrangian relaxation of a constraint minimizing the probability of encountering states in a catastrophic set
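
A minimal sketch of that penalized score under the same assumptions, where `catastrophe_flags` marks which predicted trajectories enter the catastrophic set and `lam` is an assumed penalty multiplier:

```python
import numpy as np

def carl_state_score(trajectory_returns, catastrophe_flags, lam=10.0):
    """CARL (State) sketch: expected return minus a penalty on the
    estimated probability of catastrophe, r(A) - lam * g(A)."""
    r = np.mean(trajectory_returns)                    # predicted return r(A)
    g = np.mean(np.asarray(catastrophe_flags, float))  # catastrophe probability estimate g(A)
    return r - lam * g
```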

  19. Risk-Averse Action Selection, Case 2: Catastrophic-State Risk Aversion

  20. CARL System Overview

  21. Environments Tested ● Cartpole (varying pole lengths) ● Duckietown 1 (varying car width) ● Half Cheetah (varying disabled joint). 1 Duckietown (Chevalier-Boisvert et al., 2018)

  22. Experiment Setup ● MB + Finetune: PETS, finetuned on the test environment ● RARL: Robust Adversarial Reinforcement Learning 1 ● PPO-MAML: Model-Agnostic Meta-Learning 2 ● CARL (Reward): reward-based CARL ● CARL (State): state-based CARL. 1 (Pinto et al., 2017) 2 (Finn et al., 2017)

  23. [Results figure: comparison including CARL (State)]

  24. [Results figure: comparison including CARL (State)]

  25. [Results figure: comparison including CARL (State)]

  26. Summary ● Safety-Critical Adaptation (SCA) ○ Train in sandbox environments, adapt to safety-critical environments ● CARL (Reward) and CARL (State) ○ Capture uncertainty from the source domains and perform risk-averse planning. Thank you!
