Reinforcement learning (Yifeng Tao, School of Computer Science)

  1. Introduction to Machine Learning: Reinforcement Learning. Yifeng Tao, School of Computer Science, Carnegie Mellon University. Slides adapted from Matt Gormley and Eric Xing. Yifeng Tao Carnegie Mellon University 1

  2. Learning Paradigms [Slide from Matt Gormley]

  3. Examples of Reinforcement Learning [Slide from Matt Gormley]

  4. Robot in a room [Slide from Eric Xing]

  5. History of Reinforcement Learning [Slide from Eric Xing]
     • Roots in the psychology of animal learning (Thorndike, 1911).
     • Another, independent thread was the problem of optimal control and its solution using dynamic programming (Bellman, 1957).
     • The idea of temporal-difference learning (an online method), e.g., for playing board games (Samuel, 1959).
     • A major breakthrough was the discovery of Q-learning (Watkins, 1989).

  6. What is special about RL? [Slide from Eric Xing]

  7. Elements of RL [Slide from Eric Xing]

  8. Policy [Slide from Eric Xing]
     • Reward for each step: -0.1
     • Reward for each step: -2

  9. The Precise Goal [Slide from Eric Xing]

  10. Reinforcement Learning [Slide from Matt Gormley]
     • Train a policy to maximize the discounted, cumulative reward $R_{t_0} = \sum_{t=t_0}^{\infty} \gamma^{t-t_0} r_t$.
     • The discount factor $\gamma$ should be a constant between 0 and 1.
     • Bellman equation (deterministic)
     • Bellman equation (stochastic)
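The two Bellman equations did not survive extraction. One common textbook way to write them is sketched here as an assumption, not a copy of the slide: $V^*(s)$ is the largest expected discounted return $R_{t_0}$ obtainable from state $s$, $R(s, a)$ is the immediate reward, $T(s, a)$ is a hypothetical next-state function for the deterministic case, and $p(s' \mid s, a)$ gives the transition probabilities in the stochastic case.

```latex
% Deterministic transitions: action a in state s always leads to T(s, a)
V^*(s) = \max_a \left[ R(s, a) + \gamma \, V^*\big(T(s, a)\big) \right]

% Stochastic transitions: the next state s' is drawn from p(s' | s, a)
V^*(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \, V^*(s') \right]

% Equivalent action-value form, with V^*(s) = \max_a Q^*(s, a)
Q^*(s, a) = R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \max_{a'} Q^*(s', a')
```

The deterministic equation is the special case of the stochastic one in which $p(s' \mid s, a) = 1$ for $s' = T(s, a)$ and 0 otherwise.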

  11. Value Iteration [Slide from Matt Gormley]
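A minimal sketch of synchronous value iteration that keeps an explicit Q(s,a) table (the first of the two variants listed on slide 14). The 3-state chain MDP, the `P[s][a] = [(prob, next_state, reward), ...]` transition format, and all function names are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical transition format: P[s][a] = [(probability, next_state, reward), ...]
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def make_chain_mdp() -> MDP:
    """Illustrative 3-state chain 0 - 1 - 2; state 2 is an absorbing goal."""
    P: MDP = {}
    for s in (0, 1):
        nxt = s + 1
        P[s] = {
            "left": [(1.0, max(s - 1, 0), 0.0)],
            # Reward 1 only for the transition that reaches the goal.
            "right": [(1.0, nxt, 1.0 if nxt == 2 else 0.0)],
        }
    P[2] = {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]}
    return P


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8, max_iters: int = 10_000):
    V = {s: 0.0 for s in P}
    Q: Dict[int, Dict[str, float]] = {}
    for _ in range(max_iters):
        # Fill the Q table from the previous value estimates (synchronous update).
        Q = {
            s: {
                a: sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for a, outcomes in P[s].items()
            }
            for s in P
        }
        V_new = {s: max(Q[s].values()) for s in P}
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final Q table.
    policy = {s: max(Q[s], key=Q[s].get) for s in P}
    return V, policy


V, policy = value_iteration(make_chain_mdp())
print(V)  # {0: 0.9, 1: 1.0, 2: 0.0}
print(policy)  # "right" in states 0 and 1; state 2 is absorbing, so its action is arbitrary
```

With $\gamma = 0.9$ the goal is worth 1 from state 1 and $0.9 \times 1$ from state 0, which is what the loop converges to after three sweeps.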

  12. Value Iteration Convergence [Slide from Matt Gormley]
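Convergence of value iteration rests on the Bellman optimality backup being a $\gamma$-contraction in the max norm, so consecutive sweeps satisfy $\|V_{k+1} - V_k\|_\infty \le \gamma \|V_k - V_{k-1}\|_\infty$. The sketch below checks this numerically on a randomly generated MDP; the generator, its sizes, and the seed are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical transition format: P[s][a] = [(probability, next_state, reward), ...]
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def random_mdp(n_states: int = 5, n_actions: int = 2, seed: int = 0) -> MDP:
    """Random MDP with rewards in [-1, 1] and strictly positive transition probabilities."""
    rng = random.Random(seed)
    P: MDP = {}
    for s in range(n_states):
        P[s] = {}
        for a in range(n_actions):
            weights = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(weights)
            reward = rng.uniform(-1.0, 1.0)
            P[s][a] = [(w / total, s2, reward) for s2, w in enumerate(weights)]
    return P


def bellman_backup(P: MDP, V: Dict[int, float], gamma: float) -> Dict[int, float]:
    """One synchronous sweep of the Bellman optimality operator."""
    return {
        s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs) for outs in P[s].values())
        for s in P
    }


gamma = 0.9
P = random_mdp()
V = {s: 0.0 for s in P}
diffs = []  # max-norm change between consecutive sweeps
for _ in range(50):
    V_next = bellman_backup(P, V, gamma)
    diffs.append(max(abs(V_next[s] - V[s]) for s in P))
    V = V_next

# Each sweep shrinks the change by at least a factor gamma (up to floating-point error).
contracts = all(diffs[k] <= gamma * diffs[k - 1] + 1e-12 for k in range(1, len(diffs)))
print(contracts, diffs[-1])
```

The same bound gives the usual stopping rule: after a sweep that changes $V$ by at most $\epsilon$ in the max norm, the remaining error satisfies $\|V_{k+1} - V^*\|_\infty \le \frac{\gamma}{1-\gamma}\,\epsilon$.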

  13. Example: Robot Localization [Slide from Matt Gormley]

  14. Value Iteration Variants [Slide from Matt Gormley]
     • Variant 1: with a Q(s,a) table
     • Variant 2: without a Q(s,a) table

  15. Synchronous vs. Asynchronous Value Iteration [Slide from Matt Gormley]
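A minimal sketch contrasting the two update schemes on a hypothetical 3-state chain MDP (states 0 and 1 lead to an absorbing goal state 2 that pays reward 1 on arrival; the MDP, transition format, and names are assumptions for illustration). Synchronous sweeps compute every new value from the previous sweep's table, while asynchronous (in-place) sweeps overwrite V[s] immediately, so later backups in the same sweep already use the fresh values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical transition format: P[s][a] = [(probability, next_state, reward), ...]
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def make_chain_mdp() -> MDP:
    """Illustrative 3-state chain 0 - 1 - 2; state 2 is an absorbing goal."""
    P: MDP = {}
    for s in (0, 1):
        nxt = s + 1
        P[s] = {
            "left": [(1.0, max(s - 1, 0), 0.0)],
            "right": [(1.0, nxt, 1.0 if nxt == 2 else 0.0)],
        }
    P[2] = {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]}
    return P


def backup(P: MDP, V: Dict[int, float], s: int, gamma: float) -> float:
    """Bellman optimality backup for a single state."""
    return max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs) for outs in P[s].values())


def synchronous_vi(P: MDP, gamma: float = 0.9, tol: float = 1e-10):
    V = {s: 0.0 for s in P}
    sweeps = 0
    while True:
        sweeps += 1
        # Every state is updated from the previous sweep's values.
        V_new = {s: backup(P, V, s, gamma) for s in P}
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        if delta < tol:
            return V, sweeps


def asynchronous_vi(P: MDP, order: Sequence[int], gamma: float = 0.9, tol: float = 1e-10):
    V = {s: 0.0 for s in P}
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s in order:
            # In-place update: states later in this sweep see the new V[s].
            old = V[s]
            V[s] = backup(P, V, s, gamma)
            delta = max(delta, abs(V[s] - old))
        if delta < tol:
            return V, sweeps


P = make_chain_mdp()
V_sync, n_sync = synchronous_vi(P)
V_async, n_async = asynchronous_vi(P, order=[2, 1, 0])
print(n_sync, n_async)  # 3 2
```

Both versions reach the same fixed point. On this toy chain, visiting states backwards from the goal lets the reward propagate in a single sweep (2 sweeps including the final check, versus 3 for the synchronous version); other orderings still converge but need not save sweeps.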

  16. Value Iteration Convergence [Slide from Matt Gormley]

  17. Policy Iteration [Slide from Matt Gormley]

  18. Policy Iteration [Slide from Matt Gormley]
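A minimal sketch of policy iteration on a hypothetical 3-state chain MDP (an assumption for illustration, with the same transition format `P[s][a] = [(prob, next_state, reward), ...]`). Each round evaluates the current policy by repeated sweeps, then improves it greedily, and the loop stops once no state's action changes.

```python
from typing import Dict, List, Tuple

# Hypothetical transition format: P[s][a] = [(probability, next_state, reward), ...]
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def make_chain_mdp() -> MDP:
    """Illustrative 3-state chain 0 - 1 - 2; state 2 is an absorbing goal."""
    P: MDP = {}
    for s in (0, 1):
        nxt = s + 1
        P[s] = {
            "left": [(1.0, max(s - 1, 0), 0.0)],
            "right": [(1.0, nxt, 1.0 if nxt == 2 else 0.0)],
        }
    P[2] = {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]}
    return P


def q_value(P: MDP, V: Dict[int, float], s: int, a: str, gamma: float) -> float:
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: MDP, policy: Dict[int, str], V: Dict[int, float],
                    gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Iterative policy evaluation: sweep V(s) <- Q(s, policy(s)) until it stops changing."""
    while True:
        delta = 0.0
        for s in P:
            old = V[s]
            V[s] = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(V[s] - old))
        if delta < tol:
            return V


def policy_iteration(P: MDP, gamma: float = 0.9):
    policy = {s: "left" for s in P}  # arbitrary initial policy
    V = {s: 0.0 for s in P}
    rounds = 0
    while True:
        rounds += 1
        V = evaluate_policy(P, policy, V, gamma)
        stable = True
        for s in P:
            q = {a: q_value(P, V, s, a, gamma) for a in P[s]}
            best = max(q, key=q.get)
            # Switch only on a strict improvement, so ties cannot make the loop cycle.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy, rounds


V, policy, rounds = policy_iteration(make_chain_mdp())
print(policy, rounds)
```

Keeping the current action on ties is a common safeguard: without it, two equally good actions could keep swapping and the "policy unchanged" test would never fire.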

  19. Value Iteration vs. Policy Iteration [Slide from Matt Gormley]

  20. Deep Q-Learning [Slide from Matt Gormley]
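Deep Q-learning replaces the table of ordinary Q-learning with a neural network. As a point of reference, here is a minimal sketch of the tabular update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ with $\epsilon$-greedy exploration. The chain environment, `step` function, and the values of $\alpha$, $\epsilon$, and the episode count are illustrative assumptions.

```python
import random
from typing import Dict, Tuple

ACTIONS = ("left", "right")
GOAL = 2


def step(s: int, a: str) -> Tuple[int, float, bool]:
    """Hypothetical chain 0 - 1 - 2: reaching state 2 pays 1 and ends the episode."""
    s2 = min(s + 1, GOAL) if a == "right" else max(s - 1, 0)
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.3, max_steps: int = 100, seed: int = 0):
    rng = random.Random(seed)
    Q: Dict[int, Dict[str, float]] = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL + 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy behaviour policy
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(Q[s], key=Q[s].get)
            s2, r, done = step(s, a)
            # Off-policy TD target: greedy value of the next state, or just r if terminal.
            target = r if done else r + gamma * max(Q[s2].values())
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    return Q


Q = q_learning()
print({s: max(Q[s], key=Q[s].get) for s in (0, 1)})  # greedy action per non-terminal state
```

The learned values approach $Q^*(1, \text{right}) = 1$ and $Q^*(0, \text{right}) = 0.9$; in DQN the dictionary lookup `Q[s][a]` becomes a forward pass of a network.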

  21. TD-Gammon → AlphaGo [Slide from Matt Gormley]

  22. Playing Atari with Deep RL [Slide from Matt Gormley]

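  23. Deep Q-Network (DQN) algorithm [Slide from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html]
     • Goal: train a network Q(s, a) to approximate the unknown optimal action-value function $Q^*$.
     • Then, the best policy is $\pi^*(s) = \arg\max_a Q^*(s, a)$.
     • Bellman equation: $Q^{\pi}(s, a) = r + \gamma Q^{\pi}(s', \pi(s'))$
     • Temporal difference error: $\delta = Q(s, a) - \left(r + \gamma \max_{a'} Q(s', a')\right)$
     • Huber loss: $\mathcal{L} = \frac{1}{|B|} \sum_{(s, a, s', r) \in B} \mathcal{L}(\delta)$, where $\mathcal{L}(\delta) = \frac{1}{2}\delta^2$ if $|\delta| \le 1$ and $|\delta| - \frac{1}{2}$ otherwise
     • $B$: a batch of transitions sampled from the replay memory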
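The DQN loss can be computed directly from its definition. A minimal pure-Python sketch, assuming the Q-values for the batch have already been produced by some network (the numbers below are made up): it forms the TD error $\delta$ for each transition, uses only $r$ as the target when the next state is terminal (a common convention, assumed here), and averages the Huber loss over the batch $B$. In PyTorch, `torch.nn.SmoothL1Loss` with its default `beta=1.0` computes the same per-element loss.

```python
from typing import List


def huber(delta: float) -> float:
    """Huber loss with threshold 1: quadratic near zero, linear for large errors."""
    return 0.5 * delta * delta if abs(delta) <= 1.0 else abs(delta) - 0.5


def dqn_batch_loss(
    q_sa: List[float],        # Q(s, a) for the action actually taken
    rewards: List[float],     # r
    next_q_max: List[float],  # max_a' Q(s', a') from the network used for targets
    dones: List[bool],        # True if s' is terminal
    gamma: float = 0.9,
) -> float:
    losses = []
    for q, r, nq, done in zip(q_sa, rewards, next_q_max, dones):
        target = r if done else r + gamma * nq
        delta = q - target  # temporal difference error
        losses.append(huber(delta))
    return sum(losses) / len(losses)  # average over the batch B


# A made-up batch of three transitions.
loss = dqn_batch_loss(
    q_sa=[1.0, 0.0, 2.0],
    rewards=[1.0, 0.0, -1.0],
    next_q_max=[0.5, 3.0, 10.0],
    dones=[False, False, True],
)
print(round(loss, 6))  # 1.600417
```

Here the TD errors are -0.45, -2.7, and 3.0; only the first falls in the quadratic region, so the two large errors contribute linearly instead of dominating the batch loss as they would under a squared loss.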

  24. Experience Replay [Slide from Matt Gormley]
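A minimal sketch of a replay memory, modelled loosely on the `ReplayMemory` class in the PyTorch DQN tutorial cited on slide 23; the `Transition` field names, the capacity, and the seeded RNG are assumptions for illustration. New transitions push out the oldest ones once the buffer is full, and training batches are drawn uniformly at random.

```python
import random
from collections import deque, namedtuple
from typing import List

# One stored step of experience (field names are illustrative).
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are discarded once it is full."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, *args) -> None:
        self.memory.append(Transition(*args))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(t, 0, t + 1, 0.0)  # made-up transitions
batch = memory.sample(2)
print(len(memory), [tr.state for tr in memory.memory])  # 3 [2, 3, 4]
```

Sampling from a buffer rather than learning from the latest step lets each transition be reused in many updates and keeps a batch from consisting of highly correlated consecutive states.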

  25. AlphaGo [Slide from Matt Gormley]

  26. Constructing Genetic Association Database [Slide from Wang et al.]

  27. Constructing Genetic Association Database [Slide from Wang et al.]

  28. Take-home message
     • Reward, value, and policy in reinforcement learning
     • Value iteration and its convergence guarantee
     • Policy iteration
     • Deep Q-learning uses a neural network to approximate the Q-function

  29. References
     • Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
     • Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/
     • Adam Paszke. Reinforcement Learning (DQN) Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
     • Haohan Wang et al. (2019). Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning.