Reinforcement learning Yifeng Tao School of Computer Science

Introduction to Machine Learning Reinforcement learning Yifeng Tao School of Computer Science Carnegie Mellon University Slides adapted from Matt Gormley, Eric Xing Yifeng Tao Carnegie Mellon University 1

Learning Paradigms [Slide from Matt Gormley]

Examples of Reinforcement Learning [Slide from Matt Gormley]

Robot in a room [Slide from Eric Xing]

History of Reinforcement Learning
o Roots in the psychology of animal learning (Thorndike, 1911).
o Another independent thread was the problem of optimal control, and its solution using dynamic programming (Bellman, 1957).
o Idea of temporal difference learning (on-line method), e.g., playing board games (Samuel, 1959).
o A major breakthrough was the discovery of Q-learning (Watkins, 1989).
[Slide from Eric Xing]

What is special about RL? [Slide from Eric Xing]

Elements of RL [Slide from Eric Xing]

Policy
o Reward for each step: -0.1
o Reward for each step: -2
[Slide from Eric Xing]

The Precise Goal [Slide from Eric Xing]

Reinforcement Learning
o Train a policy to maximize the discounted, cumulative reward R_{t_0}:
o γ: should be a constant between 0 and 1
o Bellman equation (deterministic):
o Bellman equation (stochastic):
[Slide from Matt Gormley]
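The quantities named on this slide are commonly written as below. This is a sketch in standard textbook notation, not a copy of the slide's own formulas: the symbols r_t (reward at step t), R(s, a) (immediate reward), δ(s, a) (deterministic next state), p(s' | s, a) (transition probability) and V* (optimal value function) are assumptions introduced here.

```latex
% Discounted, cumulative reward from time t_0, with 0 <= \gamma <= 1
R_{t_0} = \sum_{t=t_0}^{\infty} \gamma^{\,t - t_0} \, r_t

% Bellman equation, deterministic transitions s' = \delta(s, a)
V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \, V^*\big(\delta(s, a)\big) \Big]

% Bellman equation, stochastic transitions s' \sim p(\cdot \mid s, a)
V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \, V^*(s') \Big]
```

The deterministic form is the special case of the stochastic one in which p(s' | s, a) puts all its mass on a single next state.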

Value Iteration [Slide from Matt Gormley]

Value Iteration Convergence [Slide from Matt Gormley]

Example: Robot Localization [Slide from Matt Gormley]

Value Iteration Variants
o Variant 1: w/ Q(s,a) table
o Variant 2: w/o Q(s,a) table
[Slide from Matt Gormley]
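A minimal sketch of Variant 1 (with an explicit Q(s,a) table), assuming a tiny hand-made MDP. The MDP, the names `TOY_MDP`, `value_iteration` and `greedy_policy`, and the stopping tolerance are all hypothetical and not taken from the slides. Variant 2 would compute the same maximum without storing the table.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (an assumption for illustration, not from the slides).
# mdp[s][a] is a list of (probability, next_state, reward) outcomes.
Outcome = Tuple[float, int, float]
MDP = Dict[int, Dict[str, List[Outcome]]]

TOY_MDP: MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {},  # terminal state: no actions, value fixed at 0
}


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-8):
    """Synchronous value iteration that keeps an explicit Q(s, a) table."""
    V = {s: 0.0 for s in mdp}
    while True:
        # Bellman backup: Q(s, a) = sum_s' p(s'|s,a) * (r + gamma * V(s'))
        Q = {
            s: {
                a: sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for a, outcomes in actions.items()
            }
            for s, actions in mdp.items()
        }
        # V(s) = max_a Q(s, a); terminal states keep value 0
        new_V = {s: max(q.values()) if q else 0.0 for s, q in Q.items()}
        delta = max(abs(new_V[s] - V[s]) for s in mdp)
        V = new_V
        if delta < tol:  # stop once no state value moved by more than tol
            return V, Q


def greedy_policy(Q):
    """Pick the action with the largest Q-value in every non-terminal state."""
    return {s: max(q, key=q.get) for s, q in Q.items() if q}


if __name__ == "__main__":
    V, Q = value_iteration(TOY_MDP)
    print(V, greedy_policy(Q))
```

On this toy problem the only reward is earned by moving from state 1 to state 2, so the greedy policy chooses "go" in both non-terminal states.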

Synchronous vs. Asynchronous Value Iteration [Slide from Matt Gormley]

Value Iteration Convergence [Slide from Matt Gormley]

Policy Iteration [Slide from Matt Gormley]

Policy Iteration [Slide from Matt Gormley]
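As a rough illustration of policy iteration, the sketch below alternates policy evaluation and greedy policy improvement on a small hand-made MDP. The MDP and all names (`TOY_MDP`, `policy_evaluation`, `policy_iteration`, `q_value`) are hypothetical, and the iterative evaluation step is one common choice assumed here rather than the slides' exact procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (an assumption for illustration, not from the slides).
# mdp[s][a] is a list of (probability, next_state, reward) outcomes.
Outcome = Tuple[float, int, float]
MDP = Dict[int, Dict[str, List[Outcome]]]

TOY_MDP: MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {},  # terminal state: no actions, value fixed at 0
}


def q_value(mdp: MDP, V, s: int, a: str, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def policy_evaluation(mdp: MDP, policy, gamma: float, tol: float = 1e-10):
    """Iteratively compute V^pi for a fixed policy (in-place sweeps)."""
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, a in policy.items():
            v = q_value(mdp, V, s, a, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(mdp: MDP, gamma: float = 0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    # Start from an arbitrary policy: the first listed action in each state.
    policy = {s: next(iter(actions)) for s, actions in mdp.items() if actions}
    while True:
        V = policy_evaluation(mdp, policy, gamma)
        stable = True
        for s in policy:
            best = max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
            # Switch only on strict improvement so ties cannot cause cycling.
            if q_value(mdp, V, s, best, gamma) > q_value(mdp, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


if __name__ == "__main__":
    print(policy_iteration(TOY_MDP))
```

Each outer loop fully evaluates the current policy before improving it, whereas value iteration folds the maximization into every backup.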

Value Iteration vs. Policy Iteration [Slide from Matt Gormley]

Deep Q-Learning [Slide from Matt Gormley]

TD Gammon → Alpha Go [Slide from Matt Gormley]

Playing Atari with Deep RL [Slide from Matt Gormley]

Deep Q-Network (DQN) algorithm
o Goal: train Q(s, a) to approximate the unknown optimal action-value (Q) function.
o Then, best policy:
o Bellman equation:
o Temporal difference error:
o Huber loss:
o B: a batch of transitions, sampled from the replay memory
[Slide from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html]
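The sketch below illustrates the temporal difference error and Huber loss named on this slide, in the form commonly used with DQN: δ = Q(s, a) − (r + γ max_a' Q(s', a')), with the Huber loss equal to ½δ² for |δ| ≤ 1 and |δ| − ½ otherwise, averaged over the batch B. It is plain Python with a lookup table standing in for the neural network; the names `td_error`, `huber`, `batch_loss` and the toy numbers are hypothetical. It also uses a single Q-function for both the prediction and the target, a simplification relative to implementations that keep a separate target network.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (state, action, reward, next_state); next_state is None when the episode ends.
Transition = Tuple[int, int, float, Optional[int]]
QFunction = Callable[[int], Sequence[float]]  # state -> one value per action


def td_error(q: QFunction, transition: Transition, gamma: float) -> float:
    """delta = Q(s, a) - (r + gamma * max_a' Q(s', a'))."""
    s, a, r, s_next = transition
    # Terminal transitions have no bootstrapped future value.
    target = r if s_next is None else r + gamma * max(q(s_next))
    return q(s)[a] - target


def huber(delta: float, kappa: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones (robust to outliers)."""
    if abs(delta) <= kappa:
        return 0.5 * delta ** 2
    return kappa * (abs(delta) - 0.5 * kappa)


def batch_loss(q: QFunction, batch: List[Transition], gamma: float = 0.99) -> float:
    """Mean Huber loss of the TD errors over a batch B of transitions."""
    return sum(huber(td_error(q, t, gamma)) for t in batch) / len(batch)


if __name__ == "__main__":
    # Hypothetical Q-values for two states with two actions each.
    q_table = {0: [0.0, 2.0], 1: [1.0, 3.0]}
    batch = [(0, 1, 1.0, 1), (1, 0, 5.0, None)]
    print(batch_loss(lambda s: q_table[s], batch, gamma=0.5))
```

In a real DQN the loss would be minimized by gradient descent on the network parameters; here the table is fixed and only the loss value is computed.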

Experience Replay [Slide from Matt Gormley]
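One common way to realize a replay memory is a fixed-size buffer from which batches of transitions are drawn uniformly at random. The sketch below assumes that design; the class name `ReplayMemory`, the field names of `Transition`, and the capacity used in the example are illustrative choices rather than details from the slides.

```python
import random
from collections import deque, namedtuple

# One stored experience; the field names are an assumption for illustration.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity: int):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward) -> None:
        """Store one transition observed while interacting with the environment."""
        self.memory.append(Transition(state, action, next_state, reward))

    def sample(self, batch_size: int):
        """Draw a batch uniformly at random, without replacement."""
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3)
    for step in range(5):
        memory.push(step, 0, step + 1, 1.0)
    print(len(memory), memory.sample(2))
```

Sampling at random from past experience, instead of training on consecutive steps, reduces the correlation between the transitions in a batch.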

Alpha Go [Slide from Matt Gormley]

Constructing Genetic Association Database [Slide from Wang et al.]

Constructing Genetic Association Database [Slide from Wang et al.]

Take home message
o Reward, value, and policy in reinforcement learning
o Value iteration and convergence guarantee
o Policy iteration
o Deep Q-learning uses a neural network to approximate the Q-function

References
o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/
o Adam Paszke. Reinforcement Learning (DQN) Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
o Haohan Wang et al. 2019: Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning
