
CS885 Reinforcement Learning
Lecture 1a: May 2, 2018
Course Introduction
[SutBar] Chapter 1, [Sze] Chapter 1
University of Waterloo CS885 Spring 2018 Pascal Poupart

Outline
• Introduction to Reinforcement Learning
• Course website and logistics

Machine Learning
• Traditional computer science
  – Program computer for every task
• New paradigm
  – Provide examples to machine
  – Machine learns to accomplish a task based on the examples

Machine Learning
• Success mostly due to supervised learning
  – Bottleneck: need lots of labeled data
• Alternatives
  – Unsupervised learning, semi-supervised learning
  – Reinforcement Learning

What is Reinforcement Learning?
• Reinforcement learning is also known as
  – Optimal control
  – Approximate dynamic programming
  – Neuro-dynamic programming
• Wikipedia: reinforcement learning is an area of machine learning inspired by behavioural psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
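
The phrase "some notion of cumulative reward" can be made concrete. The slide does not fix a definition, so the two forms below are only common choices, written in assumed notation: r_t for the reward received at step t, T for a finite horizon, and γ for a discount factor. None of these symbols appear in the slides.

```latex
% Assumed notation (not from the slides):
%   r_t    reward received at time step t
%   T      finite horizon
%   \gamma discount factor, 0 \le \gamma < 1
\begin{align*}
  \text{finite-horizon total reward:} \quad & \sum_{t=0}^{T} r_t \\
  \text{discounted total reward:}     \quad & \sum_{t=0}^{\infty} \gamma^{t} r_t
\end{align*}
```

With bounded rewards and γ < 1 the discounted sum is finite, which is one reason it is a popular choice when the interaction has no natural end.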

Animal Psychology
• Negative reinforcements:
  – Pain and hunger
• Positive reinforcements:
  – Pleasure and food
• Reinforcements used to train animals
• Let’s do the same with computers!

Reinforcement Learning Problem
[Diagram: Agent and Environment, linked by State, Action and Reward]
Goal: Learn to choose actions that maximize rewards
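
The agent/environment loop on this slide can be sketched in a few lines of Python. Everything here is a hypothetical illustration and not course code: `CoinFlipEnv` is a toy environment with a single dummy state in which the agent is rewarded for guessing a biased coin, and `AveragingAgent` keeps a running average of the reward for each action and usually picks the best one, exploring at random with probability `epsilon`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: reward 1 for guessing a biased coin."""

    def __init__(self, p_heads=0.8, seed=0):
        self.p_heads = p_heads
        self.rng = random.Random(seed)
        self.state = 0  # single dummy state; a real task would have many

    def step(self, action):
        # The environment reacts to the action with a new state and a reward.
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        return self.state, reward


class AveragingAgent:
    """Hypothetical agent: running average of reward per action."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=1):
        self.values = [0.0] * n_actions  # estimated reward of each action
        self.counts = [0] * n_actions    # how often each action was tried
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental update of the average reward observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(n_steps=2000):
    env = CoinFlipEnv()
    agent = AveragingAgent()
    state, total_reward = env.state, 0.0
    for _ in range(n_steps):
        action = agent.act(state)         # agent chooses an action
        state, reward = env.step(action)  # environment returns state, reward
        agent.learn(action, reward)       # agent improves from the feedback
        total_reward += reward
    return agent, total_reward
```

Calling `run()` returns the trained agent and the total reward it collected. Since the coin lands heads 80% of the time, the agent's estimate for action 1 (guess heads) should end up higher than its estimate for action 0.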

RL Examples
• Game playing (Go, Atari, backgammon)
• Operations research (pricing, vehicle routing)
• Elevator scheduling
• Helicopter control
• Spoken dialog systems
• Data center energy optimization
• Self-managing network systems
• Autonomous vehicles
• Computational finance

Operations Research
• Example: vehicle routing
• Agent: vehicle routing software
• Environment: stochastic demand
• State: vehicle location, capacity and depot requests
• Action: vehicle route
• Reward: negative travel costs

Robotic Control
• Example: helicopter control
• Agent: controller
• Environment: helicopter
• State: position, orientation, velocity and angular velocity
• Action: collective pitch, cyclic pitch, tail rotor control
• Reward: negative deviation from desired trajectory
• 2008 (Andrew Ng): automated helicopter wins acrobatic competition against humans
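
A reward defined as minus the deviation from a desired trajectory could look like the sketch below. It is a deliberately simplified assumption: only position is compared (the slide's state also includes orientation and velocities), the deviation is measured as Euclidean distance, and the function names `tracking_reward` and `trajectory_return` are invented for this illustration.

```python
import math


def tracking_reward(position, desired_position):
    """Reward for one time step: minus the Euclidean distance between the
    actual and the desired position (hypothetical choice of deviation)."""
    return -math.dist(position, desired_position)


def trajectory_return(actual, desired):
    """Sum of per-step rewards along two equally long position sequences."""
    return sum(tracking_reward(p, q) for p, q in zip(actual, desired))
```

For example, `tracking_reward((0, 0, 0), (3, 4, 0))` evaluates to `-5.0`, and a controller that stays exactly on the desired trajectory collects a total of zero, the largest value this reward allows.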

Game Playing
• Example: Go (one of the oldest and hardest board games)
• Agent: player
• Environment: opponent
• State: board configuration
• Action: next stone location
• Reward: +1 win / -1 lose
• 2016: AlphaGo defeats top player Lee Sedol (4-1)
  – Game 2 move 37: AlphaGo plays unexpected move (odds 1/10,000)

Conversational Agent
• Agent: virtual assistant
• Environment: user
• State: conversation history
• Action: next utterance
• Reward: points based on task completion, user satisfaction, etc.
• Today: active area of research

Computational Finance
• Automated trading
• Agent: trading software
• Environment: other traders
• State: price history
• Action: buy/sell/hold
• Reward: amount of profit
Example: how to purchase a large number of shares in a short period of time without affecting the price

Reinforcement Learning
• Comprehensive, but challenging form of machine learning
  – Stochastic environment
  – Incomplete model
  – Interdependent sequence of decisions
  – No supervision
  – Partial and delayed feedback
• Long-term goal: lifelong machine learning
