Policy Consolidation for Continual Reinforcement Learning
Christos Kaplanis¹, Murray Shanahan¹,² and Claudia Clopath¹
¹Imperial College London, ²DeepMind
11th June 2019
Motivation

◮ Catastrophic forgetting in artificial neural networks
◮ Agents should cope with:
  ◮ Both discrete and continuous changes to the data distribution
  ◮ No prior knowledge of when/how changes occur
◮ Test beds: alternating-task, single-task and multi-agent RL
Policy Consolidation

[Figure: Policy Consolidation architecture. A cascade of policies ρ1, ρ2, ..., ρN and their stored copies ρ1^old, ..., ρN^old. The visible policy ρ1 plays the game and is trained with the agent loss; KL distillation losses couple neighbouring policies, storing the current policy deeper into the cascade and recalling older behaviour back towards ρ1.]
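The cascade in the figure can be expressed as a single training objective: the visible policy ρ1 receives the RL loss, and KL distillation terms tie each policy to its neighbours and to its stored ("old") copy. Below is a minimal PyTorch-style sketch of such an objective, assuming discrete actions and precomputed logits for a shared batch of states; the function names, the geometric ω schedule and the single β coefficient are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a policy-consolidation-style objective (assumptions:
# discrete actions; logits for all cascade policies computed on one batch;
# omega/beta schedules are illustrative).
import torch
import torch.nn.functional as F

def kl(p_logits, q_logits):
    """KL(p || q) between two categorical policies, averaged over the batch."""
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return (p_log.exp() * (p_log - q_log)).sum(-1).mean()

def policy_cascade_loss(agent_loss, logits, old_logits, beta=1.0, omega_scale=4.0):
    """
    agent_loss : RL surrogate loss for the visible policy rho_1 (e.g. PPO).
    logits     : list of length N, current logits of rho_1 ... rho_N.
    old_logits : list of length N, logits of the stored copies rho_k^old
                 (detached, so no gradient flows through them).
    """
    n = len(logits)
    loss = agent_loss
    for k in range(n):
        # keep each policy close to its own previous self
        loss = loss + beta * kl(old_logits[k], logits[k])
        if k + 1 < n:
            omega = omega_scale ** k  # deeper policies change more slowly
            # "store": distil the faster policy k into the slower policy k+1
            loss = loss + omega * kl(old_logits[k], logits[k + 1])
            # "recall": the slower policy k+1 pulls the faster policy k back
            loss = loss + omega * kl(old_logits[k + 1], logits[k])
    return loss
```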
Alternating task experiments

[Plots: reward vs. training steps (up to 2e7) while alternating between the task pairs [Walker2d-v2, Walker2dBigLeg-v0], [HalfCheetah-v2, HalfCheetahBigLeg-v0] and [HumanoidSmallLeg-v0, HumanoidBigLeg-v0]. PC is compared against PPO baselines with fixed KL coefficients β ∈ {1, 5, 10, 20, 50}, clipping thresholds {0.2, 0.1, 0.03} and, for the humanoid pair, adaptive β.]
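The alternating-task setting can be reproduced with a simple wrapper that switches between two environment variants during training. The sketch below is a minimal illustration: the wrapper name and switch period are assumptions, and the *BigLeg/SmallLeg variants are custom environments from the paper that are not registered in stock Gym.

```python
# Minimal sketch of an alternating-task schedule (assumptions: both task
# variants are registered Gym environments; switch period is illustrative).
import gym

class AlternatingTask:
    """Alternates between task variants, switching at episode boundaries
    once `switch_steps` environment steps have elapsed in the current phase."""

    def __init__(self, env_ids, switch_steps=1_000_000):
        self.envs = [gym.make(eid) for eid in env_ids]
        self.switch_steps = switch_steps
        self.total_steps = 0
        self.active = 0

    def reset(self):
        # only change the active task at an episode boundary
        self.active = (self.total_steps // self.switch_steps) % len(self.envs)
        return self.envs[self.active].reset()

    def step(self, action):
        self.total_steps += 1
        return self.envs[self.active].step(action)

# e.g. AlternatingTask(["Walker2d-v2", "Walker2dBigLeg-v0"])  # the second ID
# is a custom variant from the paper and would need to be registered first
```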
Single task experiments

[Plots: reward vs. training steps on single tasks Walker2d-v2 and HalfCheetahBigLeg-v0 (2e7 steps) and RoboschoolHumanoid-v1 (5e7 steps). PC is compared against PPO baselines with fixed KL coefficients β ∈ {1, 5, 10, 20, 50}, adaptive β and clipping thresholds {0.2, 0.1, 0.03}.]
Multi-agent self-play experiments

[Plots: (a) mean score of each final model (PC1, PC2, PC3, Clip=0.2, Clip=0.1, β = 0.5, 1.0, 2.0, 5.0, adaptive β) against its own history of past selves; (b) mean score of PC against each baseline over 6e8 training steps.]
Future work

◮ Prioritised consolidation
◮ Adapt for off-policy learning