Reinforcement learning: Yifeng Tao, School of Computer Science
Introduction to Machine Learning
Reinforcement learning
Yifeng Tao
School of Computer Science, Carnegie Mellon University
Slides adapted from Matt Gormley, Eric Xing
Learning Paradigms [Slide from Matt Gormley]
Examples of Reinforcement Learning [Slide from Matt Gormley]
Robot in a room [Slide from Eric Xing]
History of Reinforcement Learning
o Roots in the psychology of animal learning (Thorndike, 1911).
o Another independent thread was the problem of optimal control, and its solution using dynamic programming (Bellman, 1957).
o Idea of temporal difference learning (on-line method), e.g., playing board games (Samuel, 1959).
o A major breakthrough was the discovery of Q-learning (Watkins, 1989).
[Slide from Eric Xing]
What is special about RL? [Slide from Eric Xing]
Elements of RL [Slide from Eric Xing]
Policy
o Reward for each step: -0.1
o Reward for each step: -2
[Slide from Eric Xing]
The Precise Goal [Slide from Eric Xing]
Reinforcement Learning
o Train a policy to maximize the discounted, cumulative reward $R_{t_0} = \sum_{t=t_0}^{\infty} \gamma^{t-t_0} r_t$
o γ: should be a constant between 0 and 1
o Bellman equation (deterministic):
o Bellman equation (stochastic):
[Slide from Matt Gormley]
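The two Bellman equations named on this slide can be written in one common textbook form as follows. This is a sketch under assumed notation, not necessarily the exact form shown on the original slide: $V^*$ is the optimal value function, $R(s, a)$ the immediate reward, $\delta(s, a)$ a deterministic transition function, and $p(s' \mid s, a)$ a transition probability.

```latex
% Deterministic transitions: the next state is s' = \delta(s, a)
V^*(s) = \max_{a} \left[ R(s, a) + \gamma \, V^*\big(\delta(s, a)\big) \right]

% Stochastic transitions: the next state is drawn from p(s' | s, a)
V^*(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \, V^*(s') \right]
```

The deterministic form is the special case of the stochastic one in which $p(s' \mid s, a)$ puts all its mass on a single next state.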
Value Iteration [Slide from Matt Gormley]
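A minimal sketch of value iteration follows, assuming a tabular MDP with known transitions. The three-state chain `P`, the action names, and the function `value_iteration` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical 3-state chain MDP (illustration only).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def backup(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s under values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8, max_iters=10_000):
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        # Synchronous Bellman backup: every new value is computed from the old V.
        new_V = {s: max(backup(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:  # stop once the values no longer change
            break
    # Read off a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

In this toy chain the only reward is collected on the move from state 1 to state 2, so the greedy policy moves right from states 0 and 1.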
Value Iteration Convergence [Slide from Matt Gormley]
Example: Robot Localization [Slide from Matt Gormley]
Value Iteration Variants
o Variant 1: with Q(s,a) table
o Variant 2: without Q(s,a) table
[Slide from Matt Gormley]
Synchronous vs. Asynchronous Value Iteration [Slide from Matt Gormley]
Value Iteration Convergence [Slide from Matt Gormley]
Policy Iteration [Slide from Matt Gormley]
Policy Iteration [Slide from Matt Gormley]
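Policy iteration alternates between evaluating the current policy and improving it greedily. The sketch below assumes a tabular MDP with known transitions and uses iterative policy evaluation; the chain `P`, the action names, and the function `policy_iteration` are hypothetical and not taken from the slides.

```python
# Hypothetical 3-state chain MDP (illustration only).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s under values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_iteration(P, gamma=0.9, eval_tol=1e-10):
    # Start from an arbitrary policy: the first listed action in each state.
    policy = {s: next(iter(P[s])) for s in P}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        V = {s: 0.0 for s in P}
        while True:
            delta = 0.0
            for s in P:
                v = q_value(P, V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v  # in-place (asynchronous) update
            if delta < eval_tol:
                break
        # Policy improvement: switch only when an action is strictly better.
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


V, policy = policy_iteration(P)
print(V, policy)
```

Requiring a strict improvement before switching actions avoids cycling between equally good actions, so the loop terminates once the policy is greedy with respect to its own value function.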
Value Iteration vs. Policy Iteration [Slide from Matt Gormley]
Deep Q-Learning [Slide from Matt Gormley]
TD Gammon → Alpha Go [Slide from Matt Gormley]
Playing Atari with Deep RL [Slide from Matt Gormley]
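Deep Q-Network (DQN) algorithm
o Goal: train Q(s, a) to approximate the unknown optimal action-value function Q*(s, a), which gives the expected return.
o Then, best policy: $\pi^*(s) = \arg\max_a Q^*(s, a)$
o Bellman equation: $Q^{\pi}(s, a) = r + \gamma Q^{\pi}(s', \pi(s'))$
o Temporal difference error: $\delta = Q(s, a) - (r + \gamma \max_{a'} Q(s', a'))$
o Huber loss: $\mathcal{L} = \frac{1}{|B|} \sum_{(s, a, s', r) \in B} \mathcal{L}(\delta)$, where $\mathcal{L}(\delta) = \frac{1}{2}\delta^2$ for $|\delta| \le 1$ and $|\delta| - \frac{1}{2}$ otherwise
o B: a batch of transitions, sampled from the replay memory
[Slide from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html]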
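The temporal difference error and Huber loss listed on this slide can be illustrated with a small sketch. To keep it dependency-free, a nested dictionary `q` stands in for the neural network; that table, the `done` flag for terminal transitions, the Huber threshold `kappa=1.0`, and all function names are assumptions made for illustration and are not part of the slide.

```python
def huber(delta, kappa=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    if abs(delta) <= kappa:
        return 0.5 * delta ** 2
    return kappa * (abs(delta) - 0.5 * kappa)


def td_error(q, transition, gamma):
    """delta = Q(s, a) - (r + gamma * max_a' Q(s', a'))."""
    s, a, r, s_next, done = transition
    # Assumption: no bootstrapping from a terminal next state.
    target = r if done else r + gamma * max(q[s_next].values())
    return q[s][a] - target


def batch_loss(q, batch, gamma):
    """Mean Huber loss over a batch B of transitions."""
    return sum(huber(td_error(q, t, gamma)) for t in batch) / len(batch)


# Hypothetical Q-values standing in for the network's outputs.
q = {
    "s0": {"left": 0.5, "right": 2.0},
    "s1": {"left": 1.0, "right": 3.0},
}
batch = [
    ("s0", "right", 1.0, "s1", False),  # target = 1.0 + 0.5 * 3.0 = 2.5
    ("s0", "left", 0.0, "s1", True),    # terminal: target = 0.0
]
loss = batch_loss(q, batch, gamma=0.5)
print(loss)
```

In an actual DQN the table lookup would be replaced by a forward pass through the network, and the loss would be minimized by gradient descent on the network parameters.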
Experience Replay [Slide from Matt Gormley]
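Experience replay stores past transitions and samples random batches from them for training. A minimal sketch of such a buffer is shown below, assuming a fixed capacity with oldest-first eviction and uniform sampling; the names `Transition` and `ReplayMemory` are illustrative choices, not definitions taken from the slide.

```python
import random
from collections import deque, namedtuple

# One stored step of interaction with the environment.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-size buffer of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward):
        self.memory.append(Transition(state, action, next_state, reward))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=0, next_state=step + 1, reward=1.0)

batch = memory.sample(2)
print(len(memory), batch)
```

Sampling uniformly at random breaks up the correlation between consecutive transitions, which is the usual motivation for training on batches drawn from the buffer rather than on the most recent steps.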
Alpha Go [Slide from Matt Gormley]
Constructing Genetic Association Database [Slide from Wang et al.]
Constructing Genetic Association Database [Slide from Wang et al.]
Take home message
o Reward, value, and policy in reinforcement learning
o Value iteration and its convergence guarantee
o Policy iteration
o Deep Q-learning uses a neural network to approximate the Q-function
References
o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/
o Adam Paszke. Reinforcement Learning (DQN) Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
o Haohan Wang et al. 2019. Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning