DS595/CS525: Reinforcement Learning - Introduction & Logistics


  1. This lecture will be recorded! Welcome to DS595/CS525: Reinforcement Learning - Introduction & Logistics. Prof. Yanhua Li. Time: Thursday, 6:00pm-8:50pm. Zoom lecture, Fall 2020.

  2. Who am I? Yanhua Li, PhD. Assistant Professor, Computer Science & Data Science. PhD, Computer Science, U of Minnesota, 2013; PhD, Electrical Engineering, BUPT, 2009. Research interests: big data analytics, artificial intelligence, spatio-temporal data mining, smart cities. Industrial experience: Bell Labs, Microsoft Research. http://users.wpi.edu/~yli15/index.html

  3. Teaching Assistant Yingxue Zhang PhD Student with WPI Data Science Program

  4. What is this course about?
     • An advanced DS/CS course, primarily for graduate students:
       – CS/DS Ph.D. students in AI, DM, ML, and related areas;
       – then, other Ph.D. or MS students with experience in machine learning, or equivalent knowledge.
     • Sufficient programming experience in Python is expected, so that you are comfortable undertaking the course projects.

  5. Topics for today
     • What is reinforcement learning?
     • How does it differ from supervised and unsupervised machine learning?
     • Application stories.
     • (Break)
     • Topics to be covered in this course.
     • Course logistics.

  6. Reinforcement Learning: What is it? Let’s see some more examples.

  7. Why (Deep) Reinforcement Learning? AlphaGo (Mar. 2016).

  8. Why (Deep) Reinforcement Learning? AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (Apr. 2019).

  9. Why (Deep) Reinforcement Learning? MineRL competition: Minecraft ObtainDiamond task (Jun.-Oct. 2019). http://minerl.io/competition/

  10. Beyond Games -> Intelligent Agents: autonomous vehicles. Intelligent and autonomous agents perform as well as or better than humans.

  11. Beyond Games -> Robot Control: unmanned aircraft (drone control). Intelligent and autonomous agents perform as well as or better than humans.

  12. Beyond Games -> Robot Control: industrial robots. Intelligent and autonomous agents perform as well as or better than humans.

  13. Reinforcement Learning: What is it? Training intelligent agents?

  14. Reinforcement Learning What is it? Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment to maximize some notion of cumulative reward. (From Wikipedia)

  15. Scenario of Reinforcement Learning. [Diagram: the Agent receives an Observation (the State) from the Environment, takes an Action that changes the environment, and receives a Reward.]

  16. Scenario of Reinforcement Learning. [Same diagram; after the Agent's Action, the Environment returns the Reward "Don’t do that".]

  17. Scenario of Reinforcement Learning. The agent learns to take actions maximizing expected reward. [Same diagram; after the Agent's Action, the Environment returns the Reward "Thank you". Image source: https://yoast.com/how-to-clean-site-structure/]
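
The interaction loop in the scenario slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CoinFlipEnv`, its `reset`/`step` methods, and `run_episode` are hypothetical names invented here, not part of the course material or of any library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin flip."""

    def __init__(self, p_heads=0.8, horizon=10, seed=0):
        self.p_heads = p_heads          # probability that the coin lands heads
        self.horizon = horizon          # number of steps in one episode
        self.rng = random.Random(seed)  # randomness lives in the environment
        self.t = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        return self.t  # the observation is simply the current time step

    def step(self, action):
        """Apply an action (1 = guess heads, 0 = guess tails)."""
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run the observation -> action -> reward loop until the episode ends."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)                  # agent acts
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
    return total_reward
```

Calling `run_episode(CoinFlipEnv(), lambda observation: 1)` plays one episode with an agent that always guesses heads and returns the total reward it collected.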

  18. Reinforcement Learning ≈ Looking for a Function
      • Actor/Policy: Action = π(Observation). The observation is the function's input and the action is its output.
      • The Reward from the Environment is used to pick the best function.
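
To make "looking for a function" concrete, here is a minimal sketch that treats each policy as an ordinary Python function and uses average reward to pick the best one. The task (the observation says which side pays off) and all names are assumptions made up for this illustration. Real RL methods search a much larger function space than three hand-written candidates.

```python
import random


def always_left(observation):
    return "left"


def always_right(observation):
    return "right"


def follow_observation(observation):
    # Uses its input: move to the side the observation points to.
    return observation


def average_reward(policy, episodes=1000, seed=0):
    """Estimate how much reward a policy earns per one-step episode."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        observation = rng.choice(["left", "right"])  # side that pays off
        action = policy(observation)
        total += 1.0 if action == observation else 0.0
    return total / episodes


candidates = {
    "always_left": always_left,
    "always_right": always_right,
    "follow_observation": follow_observation,
}

# The reward signal is what ranks the candidate functions.
scores = {name: average_reward(policy) for name, policy in candidates.items()}
best = max(scores, key=scores.get)
```

Here `best` ends up as `"follow_observation"`, because it is the only candidate whose output depends on the observation.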

  19. Learning to play Go. [Diagram: the agent receives an Observation, its Action is the next move, and the Environment returns a Reward.]

  20. Learning to play Go. The agent learns to take actions maximizing expected reward. Reward = 0 in most cases; if win, reward = 1; if loss, reward = -1.

  21. Example: Playing Video Games
      • Space Invaders

  22. Example: Playing Video Game
      • Start with observation s_1.
      • Action a_1: "right". Obtain reward r_1 = 0, then observation s_2.
      • Action a_2: "fire" (kill an alien). Obtain reward r_2 = 5, then observation s_3.
      • Usually there is some randomness in the environment.

  23. Example: Playing Video Game
      • Start with observation s_1, then observation s_2, observation s_3, and so on.
      • After many turns: action a_T, obtain reward r_T, then Game Over (spaceship destroyed).
      • This is an episode.
      • Learn to maximize the expected cumulative reward per episode.
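
As a small worked example of "expected cumulative reward per episode", the sketch below sums the rewards of each episode and averages over episodes. The function names and the reward lists are hypothetical; only the first two rewards of the first episode (0, then 5) echo the video game example.

```python
def episode_return(rewards):
    """Cumulative reward of one episode: the sum of its per-step rewards."""
    return sum(rewards)


def average_return(episodes):
    """Sample estimate of the expected cumulative reward per episode."""
    returns = [episode_return(rewards) for rewards in episodes]
    return sum(returns) / len(returns)


# Three made-up episodes of different lengths (illustrative values only).
episodes = [
    [0, 5, 0],     # move right, shoot an alien, then game over
    [0, 0, 5, 5],  # a longer episode with two hits
    [0],           # destroyed immediately
]

returns = [episode_return(rewards) for rewards in episodes]
mean_return = average_return(episodes)
```

With these values the three returns are 5, 10, and 0, so the average return is 5.0. Because the environment is random, a single episode says little; averaging over many episodes is what approximates the expectation.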

  24. Reinforcement Learning vs Machine Learning. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

  25. Branches of Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning. (From David Silver’s slides)

  26. Discussion. Topics for today
      • What is reinforcement learning?
      • How does it differ from other machine learning paradigms?
      • Application stories.
      • Topics to be covered in this course.
      • Course logistics.

  27. Other AI problems?
      • AI planning
      • Supervised learning
      • Unsupervised learning
      • Imitation learning (inverse reinforcement learning)

  28. RL involves 4 key aspects
      • 1. Optimization: the goal is to find an optimal way to make decisions, with maximized total cumulative reward.
      • 2. Exploration.
      • 3. Generalization: programming all possibilities is not possible.
      • 4. Delayed consequences.

  29. AI planning vs RL
      • AI planning:
        – Optimization
        – Generalization
        – No exploration
        – Delayed consequences
      • Computes a good sequence of decisions
      • But is given a model of how decisions impact the world

  30. AI planning vs RL
      • AI planning:
        – Optimization: the objective is the reward (e.g., likelihood of winning the game)
        – Generalization: applies to all possible scenarios
        – No exploration
        – Delayed consequences: a good move may lead to winning the game only after multiple steps
      • Computes a good sequence of decisions
      • But is given a model of how decisions impact the world

  31. Supervised Learning vs RL
      • Supervised learning:
        – Optimization
        – Generalization
        – No exploration
        – No delayed consequences
      • Learns from experience
      • But is provided with correct labels

  32. Supervised Learning vs RL
      • Supervised learning:
        – Optimization: the objective is to minimize the classification loss
        – Generalization: from training data to testing data
        – No exploration
        – No delayed consequences
      • Learns from experience
      • But is provided with correct labels

  33. Unsupervised Learning vs RL
      • Unsupervised learning:
        – Optimization
        – Generalization
        – No exploration
        – No delayed consequences
      • Learns from experience
      • But gets no labels from the world

  34. Unsupervised Learning vs RL
      • Unsupervised learning:
        – Optimization: e.g., in k-means, the objective is to minimize the within-cluster distance
        – Generalization: e.g., in k-means, new data have the same clusters (centroids)
        – No exploration
        – No delayed consequences
      • Learns from experience
      • But gets no labels from the world
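
The two k-means points on this slide can be illustrated with a short sketch: the optimization objective (within-cluster squared distance) and generalization (a new point is assigned to the nearest existing centroid). The data points and centroids are made-up values, and the helper names are hypothetical. This only evaluates the objective for given centroids; it does not run the k-means algorithm itself.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def within_cluster_cost(points, centroids):
    """k-means objective: each point's squared distance to its nearest centroid, summed."""
    return sum(
        squared_distance(p, centroids[nearest_centroid(p, centroids)])
        for p in points
    )


# Illustrative data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_centroids = [(0.0, 0.5), (10.0, 10.5)]  # one centroid per group
poor_centroids = [(5.0, 5.0), (5.0, 6.0)]    # both stuck in the middle

good_cost = within_cluster_cost(points, good_centroids)
poor_cost = within_cluster_cost(points, poor_centroids)

# Generalization: a new point reuses the learned centroids.
new_point_cluster = nearest_centroid((9.0, 9.0), good_centroids)
```

The well-placed centroids give a cost of 1.0 against 182.0 for the poorly placed ones, and the new point (9, 9) falls into the second cluster. No exploration is involved: the data set is fixed in advance.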

  35. Imitation Learning vs RL
      • Imitation learning:
        – Optimization
        – Generalization
        – No exploration
        – Delayed consequences
      • Learns from the experience of others
      • Assumes input demonstrations of good policies

  36. Taxi driver passenger-seeking strategy
      Expert drivers’ decision-making strategy leads to high hourly income.

      Weather   Day of week   Traffic along path   Path
      Rainy     Weekday       Moderate             #1
      Clear     Weekday       Light                #2
      Rainy     Weekend       Light                #3

      Imitation learning. Input: expert drivers’ trajectories. Output: reward function R(path) = f(weather, day of week, traffic).

  37. Imitation Learning vs RL
      • Imitation learning (inverse reinforcement learning): given experts’ demonstrations, inversely infer the experts’ reward function.
        – Optimization: the objective is to maximize the likelihood of the observed data
        – Generalization: new data from the expert matches the learned reward function
        – No exploration
        – Delayed consequences: the same as in RL
      • Learns from the experience of others
      • Assumes input demonstrations of good policies

  38. Reinforcement Learning
      • Reinforcement learning:
        – Optimization: cumulative reward
        – Generalization: to all scenarios
        – Exploration: evaluate the reward of different choices/actions
        – Delayed consequences: sparse reward
      • No data is collected initially.
      • Learning means collecting data through exploration.
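
One common way to "evaluate the reward of different choices/actions" when no data exists up front is epsilon-greedy exploration on a multi-armed bandit. The sketch below is an illustration under assumptions, not a method named in the slides: the payoffs, the Gaussian noise, and the function name `epsilon_greedy_bandit` are all hypothetical.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn action values from scratch by mixing exploration and exploitation."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions       # how often each action was tried
    estimates = [0.0] * n_actions  # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to collect new data.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Noisy reward drawn around the action's (unknown to the agent) mean.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Two hypothetical actions with average payoffs of 5 and 20.
estimates, counts = epsilon_greedy_bandit([5.0, 20.0])
```

The agent starts with no data and initially favors the first action it tried. Only the occasional random action reveals that the second one pays more, after which most of the remaining steps go to it. With `epsilon=0` the agent would never discover the better action in this setup.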

  39. Branches of Machine Learning: AI planning, Supervised Learning, Unsupervised Learning, Reinforcement Learning, Imitation learning. (From David Silver’s slides)

  40. Topics for today
      • What is reinforcement learning?
      • How does it differ from supervised and unsupervised machine learning?
      • Application stories.
      • Topics to be covered in this course.
      • Course logistics.

  41. Many Faces of Reinforcement Learning (From David Silver’s slides)
      • Computer Science: Machine Learning
      • Engineering: Optimal Control
      • Neuroscience: Reward System
      • Psychology: Classical/Operant Conditioning
      • Economics: Bounded Rationality
      • Mathematics: Operations Research

  42. Why Now? Intelligent Agents

  43. Why Now? AI Challenges
