Introduction to Machine Learning: Reinforcement Learning
Yifeng Tao, School of Computer Science, Carnegie Mellon University
Slides adapted from Matt Gormley, Eric Xing
Learning Paradigms [Slide from Matt Gormley]
Examples of Reinforcement Learning [Slide from Matt Gormley]
Robot in a room [Slide from Eric Xing]
History of Reinforcement Learning
o Roots in the psychology of animal learning (Thorndike, 1911).
o Another independent thread was the problem of optimal control and its solution using dynamic programming (Bellman, 1957).
o The idea of temporal difference learning (an on-line method), e.g., for playing board games (Samuel, 1959).
o A major breakthrough was the discovery of Q-learning (Watkins, 1989).
[Slide from Eric Xing]
What is special about RL? [Slide from Eric Xing]
Elements of RL [Slide from Eric Xing]
Policy
o Reward for each step: -0.1
o Reward for each step: -2
[Slide from Eric Xing]
The Precise Goal [Slide from Eric Xing]
Reinforcement Learning
o Train a policy to maximize the discounted, cumulative reward R_{t_0}.
o γ: the discount factor, which should be a constant between 0 and 1.
o The optimal value function satisfies the Bellman equation, which has a deterministic form and a stochastic form.
[Slide from Matt Gormley]
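For reference, these quantities are commonly written as follows. This is a sketch in standard textbook notation, not necessarily the notation of the original slide. The symbols are assumptions: r_t is the reward received at time t, R(s,a) the immediate reward for action a in state s, T(s,a) a deterministic transition function, and p(s' | s,a) a transition probability.

```latex
\begin{align*}
R_{t_0} &= \sum_{t=t_0}^{\infty} \gamma^{\,t-t_0}\, r_t \\
V^*(s) &= \max_a \bigl[ R(s,a) + \gamma\, V^*(T(s,a)) \bigr]
  && \text{(deterministic)} \\
V^*(s) &= \max_a \Bigl[ R(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V^*(s') \Bigr]
  && \text{(stochastic)}
\end{align*}
```

With γ < 1, rewards far in the future contribute less to R_{t_0}, which keeps the infinite sum finite when rewards are bounded.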
Value Iteration [Slide from Matt Gormley]
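A minimal sketch of value iteration may help make the idea concrete. It assumes a small, hypothetical 3-state MDP invented for illustration (the dictionary `P`, the action names, and the helper names are not from the slides), with rewards attached to transitions. Each sweep applies the Bellman backup to every state until the values stop changing.

```python
# Hypothetical 3-state chain MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in P}  # initial value estimates
    while True:
        # Bellman backup for every state, computed from the previous estimates.
        new_V = {s: max(q_value(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

In this toy problem the only reward is earned on the transition from state 1 to state 2, so the greedy policy chooses "go" in states 0 and 1.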
Value Iteration Convergence [Slide from Matt Gormley]
Example: Robot Localization [Slide from Matt Gormley]
Value Iteration Variants
o Variant 1: with a Q(s,a) table
o Variant 2: without a Q(s,a) table
[Slide from Matt Gormley]
Synchronous vs. Asynchronous Value Iteration [Slide from Matt Gormley]
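The distinction can be sketched on a toy problem. The example below assumes a hypothetical 4-state deterministic chain (the constants `N` and `GAMMA`, the reward, and the reverse sweep order are all illustrative choices, not from the slides). A synchronous sweep computes every new value from the old table; an asynchronous sweep updates the table in place, so later updates in the same sweep already see the new values.

```python
# Hypothetical chain: states 0..3, state 3 is terminal.
# Actions: "stay" (reward 0) or "go" to s+1 (reward 1 only when entering state 3).
N = 4
GAMMA = 0.9


def backup(V, s):
    """Bellman backup for state s using the value table V."""
    if s == N - 1:
        return 0.0  # terminal state has value 0
    stay = GAMMA * V[s]
    reward = 1.0 if s + 1 == N - 1 else 0.0
    go = reward + GAMMA * V[s + 1]
    return max(stay, go)


def synchronous_sweep(V):
    # Every backup reads the old table V; a new table is returned.
    return [backup(V, s) for s in range(N)]


def asynchronous_sweep(V):
    # Backups are written in place, so states updated later in the sweep
    # already see the new values (here: sweeping from the goal backwards).
    V = list(V)
    for s in reversed(range(N)):
        V[s] = backup(V, s)
    return V


def sweeps_until_converged(sweep, tol=1e-10):
    V = [0.0] * N
    count = 0
    while True:
        new_V = sweep(V)
        count += 1
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V, count
        V = new_V


V_sync, n_sync = sweeps_until_converged(synchronous_sweep)
V_async, n_async = sweeps_until_converged(asynchronous_sweep)
print(n_sync, n_async)
```

Both variants reach the same values here; the in-place version needs fewer sweeps on this chain because the sweep order happens to follow the direction in which reward information propagates. That advantage depends on the ordering and is not guaranteed in general.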
Value Iteration Convergence [Slide from Matt Gormley]
Policy Iteration [Slide from Matt Gormley]
Policy Iteration [Slide from Matt Gormley]
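Policy iteration alternates between evaluating the current policy and improving it greedily. The sketch below assumes a hypothetical 3-state MDP invented for illustration (the dictionary `P`, the action names, and the helper names are not from the slides) and uses iterative policy evaluation; solving the linear system exactly would be an alternative.

```python
# Hypothetical 3-state chain MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(P, V, s, a, gamma):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Iteratively compute the value of a fixed policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def improve_policy(P, V, gamma):
    """Greedy policy with respect to V (ties go to the first listed action)."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma=0.9):
    # Arbitrary initial policy: the first listed action in every state.
    policy = {s: next(iter(P[s])) for s in P}
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:  # policy is stable, so stop
            return V, policy
        policy = new_policy


V, policy = policy_iteration(P)
print(V, policy)
```

The loop stops as soon as the greedy improvement step leaves the policy unchanged.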
Value Iteration vs. Policy Iteration [Slide from Matt Gormley]
Deep Q-Learning [Slide from Matt Gormley]
TD-Gammon → AlphaGo [Slide from Matt Gormley]
Playing Atari with Deep RL [Slide from Matt Gormley]
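Deep Q-Network (DQN) algorithm
o Goal: train Q(s, a) to approximate the unknown optimal action-value (Q) function.
o Then, the best policy picks the action with the highest Q-value in each state.
o Bellman equation: the Q-function of a policy equals the immediate reward plus the discounted Q-value of the next state.
o Temporal difference error: the gap between the two sides of the Bellman equation under the current estimate.
o Huber loss: applied to the temporal difference error and averaged over a batch B.
o B: a batch of transitions, sampled from the replay memory
[Slide from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html]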
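The quantities named on this slide are commonly written as follows. This is a sketch in standard DQN notation, in the style of the tutorial cited on the slide, and may differ in detail from the original formulas. Here (s, a, s', r) denotes one transition and the Huber threshold of 1 is the usual default; both are assumptions.

```latex
\begin{align*}
\pi^*(s) &= \arg\max_a Q^*(s, a) \\
Q^{\pi}(s, a) &= r + \gamma\, Q^{\pi}\bigl(s', \pi(s')\bigr) \\
\delta &= Q(s, a) - \Bigl( r + \gamma \max_{a'} Q(s', a') \Bigr) \\
\mathcal{L} &= \frac{1}{|B|} \sum_{(s, a, s', r) \in B} \mathcal{L}(\delta),
\qquad
\mathcal{L}(\delta) =
\begin{cases}
\tfrac{1}{2}\,\delta^2 & \text{if } |\delta| \le 1, \\
|\delta| - \tfrac{1}{2} & \text{otherwise.}
\end{cases}
\end{align*}
```

The Huber loss behaves like squared error for small δ and like absolute error for large δ, which makes training less sensitive to outliers when the Q estimates are noisy.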
Experience Replay [Slide from Matt Gormley]
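A minimal sketch of a replay memory, assuming a fixed-capacity buffer that discards the oldest transitions first. The class name `ReplayMemory`, the `Transition` fields, and the capacity used here are illustrative choices, not a definitive implementation. Sampling uniformly at random from the buffer breaks the correlation between consecutive transitions.

```python
import random
from collections import deque, namedtuple

# One stored interaction with the environment.
Transition = namedtuple("Transition", ["state", "action", "next_state", "reward"])


class ReplayMemory:
    """Fixed-size buffer of past transitions (oldest entries are dropped first)."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward):
        self.memory.append(Transition(state, action, next_state, reward))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage: store five dummy transitions in a buffer of capacity 3.
memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(state=t, action=0, next_state=t + 1, reward=1.0)

batch = memory.sample(batch_size=2)
print(len(memory), batch)
```

After five pushes only the three most recent transitions remain, and each sampled batch is drawn from those.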
AlphaGo [Slide from Matt Gormley]
Constructing Genetic Association Database [Slide from Wang et al.]
Constructing Genetic Association Database [Slide from Wang et al.]
Take home message
o Reward, value, and policy in reinforcement learning
o Value iteration and its convergence guarantee
o Policy iteration
o Deep Q-learning uses a neural network to approximate the Q-function
References
o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/
o Adam Paszke. Reinforcement Learning (DQN) Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
o Haohan Wang et al. 2019. Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning