  1. Anatomy of an RL agent: model, policy, value function Robert Platt Northeastern University

  2. Running example: gridworld Gridworld: – agent lives on grid – always occupies a single cell – can move left, right, up, down – gets zero reward unless in “+1” or “-1” cells

  3. States and actions State set: S = {cells of the grid} – one state per cell. Action set: A = {left, right, up, down}.
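A minimal sketch of how these sets might be written down in code. The grid dimensions below are an assumption for illustration, since the slides show the grid only as a figure:

```python
# Sketch of the gridworld's state and action sets.
# WIDTH and HEIGHT are assumed values, not taken from the slides.
WIDTH, HEIGHT = 4, 3

# One state per cell, identified by its (column, row) coordinates.
states = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]

# The agent can move left, right, up, or down.
actions = ["left", "right", "up", "down"]

print(len(states), "states,", len(actions), "actions")
```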

  4. Reward function Reward function: r(s) = +1 if the agent is in the “+1” cell, r(s) = -1 if it is in the “-1” cell. Otherwise: r(s) = 0.

  5. Reward function Reward function: r(s) = +1 in the “+1” cell, -1 in the “-1” cell. Otherwise: r(s) = 0. In general: the reward may depend on the action too, written r(s, a).

  6. Reward function Reward function: r(s) = +1 in the “+1” cell, -1 in the “-1” cell. Otherwise: r(s) = 0. In general: r(s, a) = E[ R_t | S_t = s, A_t = a ] – the expected reward on this time step given that the agent takes action a from state s.
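A sketch of this reward function, assuming the same gridworld; the coordinates chosen for the “+1” and “-1” cells are placeholders, not taken from the slides:

```python
# Sketch of the gridworld reward function described above.
# The cell coordinates below are hypothetical; the slides mark the
# "+1" and "-1" cells only on the grid figure.
PLUS_CELL = (3, 2)   # assumed location of the "+1" cell
MINUS_CELL = (3, 1)  # assumed location of the "-1" cell

def reward(state, action=None):
    """Expected reward for taking `action` in `state`.
    Here the reward depends only on the cell the agent occupies."""
    if state == PLUS_CELL:
        return +1.0
    if state == MINUS_CELL:
        return -1.0
    return 0.0
```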

  7. Agent Model Transition model: T(s' | s, a) = Pr(S_{t+1} = s' | S_t = s, A_t = a). For example: T(cell above | current cell, up), the probability that taking “up” moves the agent to the cell above.

  8. Agent Model Transition model: T(s' | s, a) – the probability of this transition, i.e. of landing in s' after taking action a from state s. This entire probability distribution can be written as a table over (state, action, next state).
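One way such a table might look in code. The deterministic “move, or stay put at the grid edge” dynamics below are an assumption for illustration, not the transition model used in the slides:

```python
# The transition distribution T(s' | s, a) written as a table, as the
# slide suggests: one probability per (state, action, next state).
# Deterministic dynamics are assumed here purely for illustration.
WIDTH, HEIGHT = 4, 3
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def next_cell(state, action):
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
        return (nx, ny)
    return state  # bumping into the edge leaves the agent in place

# T[(s, a)] maps each possible next state s' to its probability.
T = {}
for s in [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]:
    for a in MOVES:
        T[(s, a)] = {next_cell(s, a): 1.0}

print(T[((0, 0), "right")])  # {(1, 0): 1.0}
```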

  9. Agent Model: Summary State set: S. Action set: A. Reward function: r(s, a). Transition model: T(s' | s, a).

  10. Agent Model: Frozen Lake Example Frozen Lake is this 4x4 grid. State set: the 16 cells of the grid. Action set: {left, down, right, up}. Reward function: r(s) = 1 if s is the goal cell, 0 otherwise. Transition model: only a one-third chance of going in the specified direction – one-third chance of moving +90 deg – one-third chance of moving -90 deg.
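For reference, these same pieces can be inspected programmatically. The sketch below assumes the gymnasium package and its FrozenLake-v1 environment, which the slides describe but do not name:

```python
# Sketch using the FrozenLake-v1 environment from the gymnasium package
# (an assumption: the slides describe Frozen Lake but cite no library).
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=True)  # slippery = stochastic moves
print(env.observation_space.n)   # 16 states: the cells of the 4x4 grid
print(env.action_space.n)        # 4 actions: left, down, right, up

# The transition model is exposed as a table: for each (state, action),
# a list of (probability, next_state, reward, terminated) tuples.
# With is_slippery=True, each intended move has a 1/3 chance of going as
# asked and a 1/3 chance of slipping to either perpendicular direction.
for prob, next_state, reward, terminated in env.unwrapped.P[0][2]:  # state 0, action "right"
    print(prob, next_state, reward, terminated)
```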

  11. Agent Model: Recycling Robot Example Example 3.4 in SB (Sutton & Barto), 2nd Ed.

  12. Policy A policy is a rule for selecting actions: π(s) = a – if the agent is in this state, then take this action.

  13. Policy A policy is a rule for selecting actions: π(s) = a – if the agent is in this state, then take this action.

  14. Policy A policy is a rule for selecting actions: π(s) = a – if the agent is in this state, then take this action. A policy can also be stochastic: π(a | s), a probability distribution over actions for each state.
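A sketch of both kinds of policy; the states and probabilities below are made up for illustration:

```python
import random

# A deterministic policy is just a lookup: state -> action.
deterministic_policy = {
    (0, 0): "right",
    (1, 0): "right",
    (2, 0): "up",
}

# A stochastic policy gives a probability for each action in each state;
# acting means sampling from that distribution.
stochastic_policy = {
    (0, 0): {"right": 0.8, "up": 0.2},
}

def act(policy, state):
    rule = policy[state]
    if isinstance(rule, str):           # deterministic: take the stored action
        return rule
    actions = list(rule.keys())         # stochastic: sample according to the weights
    weights = list(rule.values())
    return random.choices(actions, weights=weights)[0]

print(act(deterministic_policy, (0, 0)))
print(act(stochastic_policy, (0, 0)))
```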

  15. Episodic vs Continuing Process Episodic process: execution ends at some point and starts over – after a fixed number of time steps, or upon reaching a terminal state. Example of an episodic task: execution ends upon reaching the terminal state OR after 15 time steps.

  16. Episodic vs Continuing Process Continuing process: execution goes on forever. Example of a continuing task: the process doesn’t stop – the agent keeps getting rewards.
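A sketch of what an episodic rollout loop might look like, using the gymnasium interface as an assumed stand-in for the agent's environment and the 15-step cap from the example above:

```python
# Sketch of an episodic rollout: the episode ends when a terminal state
# is reached OR after a fixed number of time steps (15, as in the example).
# The gymnasium FrozenLake-v1 environment is an assumed stand-in.
import gymnasium as gym

env = gym.make("FrozenLake-v1")
state, info = env.reset()
for t in range(15):                      # hard cap of 15 time steps
    action = env.action_space.sample()   # random actions stand in for a real policy
    state, reward, terminated, truncated, info = env.step(action)
    if terminated:                       # reached a terminal state
        break
env.close()
```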

  17. Value Function Value of state s when acting according to policy π: the expected discounted future reward starting at state s and acting according to π, written V^π(s).

  18. Value Function Value of state s when acting according to policy π: the expected discounted future reward starting at s and acting according to π. This quantity, V^π(s), is called the value function.

  19. Value Function Value of state s when acting according to policy π: the expected discounted future reward starting at s and acting according to π. Why we care about the value function: because it helps us calculate a good policy – we’ll see how shortly.

  20. Value Function Value of state s when acting according to policy π: the expected discounted future reward starting at s and acting according to π, written V^π(s).

  21. Value Function Value of state s when acting according to policy π. A naive definition would be the plain sum of future rewards, V^π(s) = E[ r_t + r_{t+1} + r_{t+2} + … | S_t = s, π ] – what’s wrong with this? (For a continuing task, this sum can grow without bound.)

  22. Value Function Value of state s when acting according to policy π. Two viable alternatives: 1. maximize expected future reward over the next T time steps (finite horizon): V^π(s) = E[ r_t + r_{t+1} + … + r_{t+T-1} | S_t = s, π ]. 2. maximize expected discounted future rewards: V^π(s) = E[ r_t + γ r_{t+1} + γ^2 r_{t+2} + … | S_t = s, π ].

  23. Value Function Value of state s when acting according to policy π. Two viable alternatives: 1. maximize expected future reward over the next T time steps (finite horizon): V^π(s) = E[ r_t + r_{t+1} + … + r_{t+T-1} | S_t = s, π ]. 2. maximize expected discounted future rewards: V^π(s) = E[ r_t + γ r_{t+1} + γ^2 r_{t+2} + … | S_t = s, π ], where γ is the discount factor – 0.9 is a typical value.

  24. Value Function Value of state s when acting according to policy π. Two viable alternatives: 1. maximize expected future reward over the next T time steps (finite horizon). 2. maximize expected discounted future rewards: V^π(s) = E[ r_t + γ r_{t+1} + γ^2 r_{t+2} + … | S_t = s, π ]. The second is the standard formulation for the value function – notice this is a function over state.
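A small sketch of the discounted return whose expectation defines this value function; the reward sequence below is made up for illustration:

```python
# The discounted return behind the value function:
# G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# A made-up trajectory with zero reward until a final +10:
print(discounted_return([0, 0, 0, 10], gamma=0.9))  # 10 * 0.9**3 = 7.29
```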

  25. Optimal policy Value of state s when acting according to policy π: the expected discounted future reward starting at s and acting according to π. Why we care about the value function: because V^π can be used to calculate a good policy.
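One common way a value function yields a policy is to act greedily with respect to it using a one-step lookahead. The sketch below illustrates that idea on a made-up three-state chain; it is not the specific construction the slides develop later:

```python
# Greedy action selection from a value function (one-step lookahead).
def greedy_action(state, actions, transition, reward, value, gamma=0.9):
    """Pick the action maximizing expected reward plus discounted next-state value."""
    def q(a):
        return sum(p * (reward(state, a) + gamma * value[s2])
                   for s2, p in transition(state, a).items())
    return max(actions, key=q)

# Tiny made-up 3-state chain: moving "right" walks toward state 2.
value = {0: 8.1, 1: 9.0, 2: 10.0}
transition = lambda s, a: {min(s + 1, 2): 1.0} if a == "right" else {max(s - 1, 0): 1.0}
reward = lambda s, a: 0.0
print(greedy_action(1, ["left", "right"], transition, reward, value))  # "right"
```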

  26. Value function example 1 Policy: (shown on the grid). Discount factor: 0.9 (inferred from the cell values 10, 9, 8.1, …). Value fn: 6.9, 6.6, 7.3, 8.1, 9, 10 (one value per cell).

  27. Value function example 1 Notice that the value function can help us compare two different policies – how? Policy: (shown on the grid). Discount factor: 0.9. Value fn: 6.9, 6.6, 7.3, 8.1, 9, 10.

  28. Value function example 1 Policy: (shown on the grid). Discount factor: 0.9. Value fn: 1, 0.9, 0.81, 0.73, 0.66, 10.66 (one value per cell).

  29. Value function example 1 Policy: (shown on the grid). Discount factor: 0.9. Value fn: 6.9, 6.6, 7.3, 8.1, 9, 10 (one value per cell).

  30. Value function example 2 Policy: (shown on the grid). Discount factor: (shown on the slide). Value fn: 11, 10, 10, 10, 10, 10 (one value per cell).

  31. Value function example 3 Policy: (shown on the grid). Discount factor: (shown on the slide). Value fn: 7, 6, 7, 8, 9, 10 (one value per cell).
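A small sketch of where numbers like those in example 1 come from: with a discount factor of 0.9 and zero reward until the goal cell, each step away from the goal multiplies the value by 0.9. The chain below is a simplification of the grid and reproduces the slide's values only approximately:

```python
# With gamma = 0.9, the value one step before the goal is 0.9 times the
# goal-cell value, two steps before it is 0.9**2 times, and so on.
gamma = 0.9
terminal_value = 10.0   # the value shown in the goal cell on the slide

values = [terminal_value]
for _ in range(5):                       # walk backwards from the goal, cell by cell
    values.append(round(gamma * values[-1], 2))

print(list(reversed(values)))  # [5.9, 6.56, 7.29, 8.1, 9.0, 10.0]
```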
