The Reinforcement Learning Problem
Robert Platt, Northeastern University

Jan 24, 2023

[Diagram: Agent sends Action to World; World returns Observation and Reward to Agent]

On a single time step, the agent does the following:
1. observe some information
2. select an action to execute
3. take note of any reward

Goal of agent: select actions that maximize the sum of expected future rewards.
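
The three steps above can be sketched as a loop in Python. This is only an illustration under stated assumptions: `CorridorWorld`, `run_episode`, and `always_right` are hypothetical names that do not appear in the slides, and the five-cell corridor is a simplified, one-dimensional stand-in for a maze.

```python
class CorridorWorld:
    """Hypothetical toy world: positions 0..4 in a row, cheese at position 4."""

    def __init__(self):
        self.position = 0

    def observe(self):
        """Return what the agent can see: its current position."""
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return the reward."""
        self.position = min(4, max(0, self.position + action))
        return 1.0 if self.position == 4 else 0.0


def run_episode(world, select_action, max_steps=20):
    """Run the observe / act / note-reward loop and return the summed reward."""
    total_reward = 0.0
    for _ in range(max_steps):
        observation = world.observe()        # 1. observe some information
        action = select_action(observation)  # 2. select an action to execute
        reward = world.step(action)          # 3. take note of any reward
        total_reward += reward
        if reward > 0:                       # assumption: episode ends at the cheese
            break
    return total_reward


def always_right(observation):
    """A fixed rule for selecting actions: ignore the observation, move right."""
    return +1


print(run_episode(CorridorWorld(), always_right))
```

Here the rule for selecting actions is fixed by hand; the sketch shows only the interaction loop, not how an agent would learn such a rule.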

Example: rat in a maze
– Action: move left/right/up/down
– Observation: position in maze
– Reward: +1 if the rat gets the cheese

Example: robot makes coffee
– Action: move robot joints
– Observation: camera image
– Reward: +1 if coffee is in the cup

Example: agent plays Pong
– Action: joystick command
– Observation: screen pixels
– Reward: game score

Reinforcement Learning
[Diagram: Agent sends Action to World; World returns Observation and Reward to Agent]
Goal of agent: select actions that maximize the sum of expected future rewards.
– the agent computes a rule for selecting actions to execute
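
One common way to write this goal down is given here as a sketch. The notation is an assumption and does not come from the slides: \pi is the rule (policy) mapping an observation o_t to an action a_t, r_t is the reward noted at step t, and T is the final time step.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{T} r_{t} \;\middle|\; a_{t} = \pi(o_{t}) \right]
```

Many treatments also weight later rewards by a discount factor; the slides do not mention one, so it is left out of this sketch.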

Reinforcement Learning (Pong example)
– Action: joystick command
– Observation: screen pixels
– Reward: game score
Goal of agent: select actions that maximize the sum of expected future rewards.
– the agent computes a rule for selecting actions to execute

Model-Free Reinforcement Learning
– Action: joystick command
– Observation: screen pixels
– Reward: game score
The agent learns a strategy for selecting actions based on experience:
– no prior model of system dynamics, i.e. no prior knowledge of "how the world works"
– no prior model of reward, i.e. no prior knowledge of which actions lead to reward
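
To make "learning from experience" concrete, here is a minimal sketch using tabular Q-learning, a well-known model-free method that the slides themselves do not name. Everything in it is an assumption made for illustration: the five-cell corridor world, the function names, the discount factor `gamma`, and the other hyperparameters. The learner calls `step` only to collect experience and never inspects how it works.

```python
import random

N_STATES = 5          # corridor positions 0..4; the cheese is at position 4
ACTIONS = (-1, +1)    # move left or right


def step(state, action):
    """World dynamics, hidden from the learner, which only sees the results."""
    next_state = min(N_STATES - 1, max(0, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from sampled (state, action, reward) experience."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                action = greedy_action(q, state, rng)   # exploit
            next_state, reward, done = step(state, action)
            # Target: observed reward plus discounted value of the next state.
            target = reward if done else reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action in each non-terminal position.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right (+1) from every position, even though the learner was never told where the cheese is or how movement works.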

Distinction Relative to Planning
– Action: joystick command
– Observation: screen pixels
– Reward: game score
In planning, the agent computes a strategy for selecting actions based on a prior model:
– the agent is given a model of system dynamics in advance
– the agent is "told" which states/actions are rewarding or not
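
For contrast, here is a minimal planning sketch using value iteration, a standard planning method that the slides do not name. The corridor world, the function names, and the discount factor `gamma` are assumptions made for illustration. The key difference from learning is that the planner may query `model` directly for any state and action, without ever acting in the world.

```python
N_STATES = 5          # corridor positions 0..4; the cheese is at position 4
ACTIONS = (-1, +1)    # move left or right
GOAL = N_STATES - 1


def model(state, action):
    """Known dynamics and reward, handed to the planner in advance."""
    next_state = min(GOAL, max(0, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def backup(values, state, action, gamma):
    """One-step lookahead through the model."""
    next_state, reward = model(state, action)
    return reward + gamma * values[next_state]


def value_iteration(gamma=0.9, tolerance=1e-9):
    """Sweep over all states until the value estimates stop changing."""
    values = [0.0] * N_STATES   # the goal's value stays 0: the episode is over
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(backup(values, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values


def plan(values, gamma=0.9):
    """Read off the best action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: backup(values, s, a, gamma))
            for s in range(GOAL)]


values = value_iteration()
print(plan(values))
```

No experience is collected here: the strategy comes entirely from the model that was supplied in advance.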

RL vs Planning
When to use RL:
– hard-to-model systems
– stochastic systems
When to use planning:
– when the system is easily modeled
– when the system is deterministic
Ultimately, RL and planning are closely related...
