A Quick Look at the "Reinforcement Learning" Course
A. LAZARIC (SequeL Team @ INRIA-Lille)
ENS Cachan - Master 2 MVA
SequeL – INRIA Lille
MVA-RL Course
Why

A. LAZARIC – Introduction to Reinforcement Learning, Sept 27, 2013
Why: Important Problems
- Autonomous robotics
  - Elder care
  - Exploration of unknown/dangerous environments
  - Robotics for entertainment
- Financial applications
  - Trading execution algorithms
  - Portfolio management
  - Option pricing
- Energy management
  - Energy grid integration
  - Maintenance scheduling
  - Energy market regulation
  - Energy production management
- Recommender systems
  - Web advertising
  - Product recommendation
  - Date matching
- Social applications
  - Bike sharing optimization
  - Election campaign
  - ER service optimization
  - Resource distribution optimization
- And many more...
What
What: Decision-Making under Uncertainty

[Figure: agent-environment interaction loop: the agent acts on the environment through actions/actuation; the environment returns states/perceptions to the agent]
How: Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them (trial-and-error). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards (delayed reward)."

"Reinforcement Learning: An Introduction", Sutton and Barto (1998).
How: the Course

[Figure: agent-environment interaction loop]

A formal and rigorous approach to the RL way of tackling decision-making under uncertainty.
What: the Highlights of the Course

How do we formalize the agent-environment interaction?

Markov Decision Process and Policy
A Markov decision process (MDP) is represented by the tuple M = ⟨X, A, r, p⟩, where X is the state space, A is the action space, r : X × A → [0, B] is the reward function, and p is the dynamics (p(y | x, a) is the probability of moving to state y when taking action a in state x). At time t ∈ ℕ a decision rule π_t : X → A is a mapping from states to actions, and a policy (strategy, plan) is a sequence of decision rules π = (π_0, π_1, π_2, ...).

The Bellman equations
V^π(x) = r(x, π(x)) + γ Σ_y p(y | x, π(x)) V^π(y),
V*(x) = max_{a ∈ A} [ r(x, a) + γ Σ_y p(y | x, a) V*(y) ].
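As a concrete illustration of the first Bellman equation, here is a minimal policy-evaluation sketch in Python. The two-state, two-action MDP below is invented for illustration, not taken from the course:

```python
# Policy evaluation for the Bellman equation of a fixed policy, on a toy
# 2-state, 2-action MDP. All transition probabilities and rewards are invented.

gamma = 0.9

# p[x][a][y] = p(y | x, a); r[x][a] = r(x, a), with values in [0, B]
p = [[[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
     [[0.5, 0.5], [0.0, 1.0]]]   # transitions from state 1
r = [[0.0, 0.0],
     [0.0, 1.0]]

pi = [1, 1]  # stationary decision rule: take action 1 in both states

def evaluate(pi, p, r, gamma, n_iter=1000):
    """Iterate the Bellman operator of policy pi until numerical convergence."""
    n = len(p)
    V = [0.0] * n
    for _ in range(n_iter):
        V = [r[x][pi[x]] + gamma * sum(p[x][pi[x]][y] * V[y] for y in range(n))
             for x in range(n)]
    return V

V = evaluate(pi, p, r, gamma)
# State 1 under action 1 is absorbing with reward 1, so V[1] -> 1/(1-gamma) = 10
```

Since the state space is finite, the same V^π could also be computed exactly by solving the linear system (I − γ P^π) V = r^π.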
What: the Highlights of the Course

How do we solve an MDP?

Dynamic Programming
Value Iteration: V_{k+1} = T V_k
Policy Iteration:
- Evaluate: given π_k, compute V^{π_k}.
- Improve: given V^{π_k}, compute π_{k+1} = greedy(V^{π_k}).
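To make the two schemes concrete, here is a rough sketch of value iteration, plus the greedy extraction step used in policy iteration's improve phase. The two-state MDP is invented for illustration:

```python
# Value iteration V_{k+1} = T V_k, with greedy policy extraction, on a toy
# 2-state MDP (all transition probabilities and rewards are invented).

gamma = 0.9
p = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[0.0, 0.0],
     [0.0, 1.0]]

def bellman(V, p, r, gamma):
    """One application of the optimal Bellman operator T."""
    n = len(p)
    return [max(r[x][a] + gamma * sum(p[x][a][y] * V[y] for y in range(n))
                for a in range(len(r[x])))
            for x in range(n)]

def greedy(V, p, r, gamma):
    """Greedy decision rule w.r.t. V (the 'improve' step of policy iteration)."""
    n = len(p)
    return [max(range(len(r[x])),
                key=lambda a: r[x][a] + gamma * sum(p[x][a][y] * V[y] for y in range(n)))
            for x in range(n)]

V = [0.0, 0.0]
for _ in range(1000):    # ||V_k - V*||_inf <= gamma^k ||V_0 - V*||_inf
    V = bellman(V, p, r, gamma)
pi_star = greedy(V, p, r, gamma)
```

Because T is a γ-contraction in sup norm, the loop converges geometrically to V*, and the greedy policy w.r.t. V* is optimal.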
What: the Highlights of the Course

How do we solve an MDP "online"?

Q-learning
Given an observed transition (x, a, x', r), update
Q_{k+1}(x, a) = (1 − α) Q_k(x, a) + α ( r + γ max_{a'} Q_k(x', a') ).
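A minimal tabular Q-learning sketch of the update above, on an invented two-state MDP with uniform exploration (the step size and horizon are arbitrary illustrative choices):

```python
import random

# Tabular Q-learning with uniform exploration on a toy 2-state MDP
# (transitions, rewards, step size, and horizon are all invented).

random.seed(0)
gamma = 0.9
alpha = 0.1
p = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[0.0, 0.0],
     [0.0, 1.0]]

Q = [[0.0, 0.0], [0.0, 0.0]]
x = 0
for _ in range(200_000):
    a = random.randrange(2)                         # explore uniformly
    x_next = random.choices([0, 1], weights=p[x][a])[0]
    # Q_{k+1}(x,a) = (1-alpha) Q_k(x,a) + alpha (r + gamma max_{a'} Q_k(x',a'))
    Q[x][a] = (1 - alpha) * Q[x][a] + alpha * (r[x][a] + gamma * max(Q[x_next]))
    x = x_next
```

With a constant step size the estimates fluctuate around Q*; with a suitably decaying α (and every pair visited infinitely often) Q-learning converges to Q* even though no model of p is ever built.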
What: the Highlights of the Course

How do we effectively trade off exploration and exploitation?

Multi-arm Bandit
Given K arms, we define the regret over n rounds of a bandit strategy as
R_n = Σ_{t=1}^n X_{i*, t} − Σ_{t=1}^n X_{I_t, t}.
For the UCB strategy we can prove
R_n ≤ Σ_{i ≠ i*} (b² / Δ_i) log(n).
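A rough UCB sketch on Bernoulli arms. The arm means are invented, and the index uses the classic sqrt(2 log t / T_i) exploration bonus rather than the exact constant b of the bound above:

```python
import math
import random

# UCB1 on K Bernoulli arms (arm means are illustrative, not from the course).

random.seed(0)
means = [0.2, 0.5, 0.8]        # arm 2 is optimal
K, n = len(means), 20_000

counts = [0] * K               # T_i(t): number of pulls of arm i
sums = [0.0] * K               # cumulative reward of arm i
regret = 0.0                   # pseudo-regret: sum_t (mu* - mu_{I_t})

for t in range(1, n + 1):
    if t <= K:
        i = t - 1              # initialization: pull each arm once
    else:
        # index = empirical mean + exploration bonus sqrt(2 log t / T_i)
        i = max(range(K), key=lambda j: sums[j] / counts[j]
                + math.sqrt(2 * math.log(t) / counts[j]))
    reward = 1.0 if random.random() < means[i] else 0.0
    counts[i] += 1
    sums[i] += reward
    regret += max(means) - means[i]
```

Running this, the optimal arm absorbs almost all pulls while the regret grows only logarithmically in n, matching the shape of the bound.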
What: the Highlights of the Course

How do we solve a "huge" MDP?

Approximate Dynamic Programming
Approximate Value Iteration: V̂_{k+1} = Π T̂ V̂_k
Approximate Policy Iteration:
- Evaluate: given π_k, compute V̂^{π_k}.
- Improve: given V̂^{π_k}, compute π̂_{k+1} ≈ greedy(V̂^{π_k}).
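A toy sketch of approximate value iteration: each Bellman update is projected by least squares onto a one-dimensional linear feature space. The MDP and the (deliberately crude) constant feature are invented for illustration; with a badly chosen feature space the same loop can even diverge:

```python
# Approximate value iteration: each Bellman update is projected (least squares)
# onto span{phi}, a 1-dimensional linear space. MDP and feature are invented.

gamma = 0.9
p = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[0.0, 0.0],
     [0.0, 1.0]]
phi = [1.0, 1.0]               # phi(0), phi(1): a single constant feature

theta = 0.0
for _ in range(500):
    V = [theta * phi[x] for x in range(2)]          # current approximation
    TV = [max(r[x][a] + gamma * sum(p[x][a][y] * V[y] for y in range(2))
              for a in range(2))
          for x in range(2)]                        # exact Bellman update
    # least-squares projection of TV onto span{phi}
    theta = (sum(phi[x] * TV[x] for x in range(2))
             / sum(phi[x] ** 2 for x in range(2)))
# Here the composed update reduces to theta = 0.9*theta + 0.5, so theta -> 5
```

The fixed point V̂ = [5, 5] is far from the true V* ≈ [8.78, 10]: the gap is exactly the kind of approximation error (distance from V* to the feature space) that the theory of ADP quantifies.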
What: the Highlights of the Course

How "sample-efficient" are these algorithms?

Sample Complexity of LSPI
||V^{π_K} − V*||_{2,ρ} ≤ (C_ρ / (1 − γ)) [ inf_{f ∈ F} ||V* − f||_{2,ρ} + √( log(1/δ) / n ) ].
See you on Tue at 11h in C103!

[Figure: campus map showing the location of classroom C103]
Who

Lectures: Alessandro LAZARIC
SequeL Team, INRIA-Lille Nord Europe
alessandro.lazaric@inria.fr
researchers.lille.inria.fr/~lazaric/

Practical Sessions: Emilie KAUFMANN
Telecom ParisTech
emilie.kaufmann@telecom-paristech.fr
perso.telecom-paristech.fr/~kaufmann/
When/What/Where

Date   Topic                          Classroom
01/10  Intro/MDP                      C103
08/10  Dynamic Programming            C103
15/10  RL Algorithms                  C103
22/10  TP on DP and RL                C109
29/10  Multi-arm Bandit (1)           C103
05/11  TP on Bandit                   C109
12/11  Multi-arm Bandit (2)           C103
19/11  TP on Bandit                   C109
26/11  Approximate DP                 C103
03/12  Sample Complexity of ADP       C103
10/12  TP on ADP                      C109
17/12  Guest lectures + Internships   C103 (TBC)
14/01  Evaluation                     C103 (TBC)

Lectures are from 11am to 1pm; TPs should run from 11am to 1:15pm.
Evaluation
- Papers review + oral presentation
- Projects
- Internships (stages)
- PhD
Reinforcement Learning Alessandro Lazaric alessandro.lazaric@inria.fr sequel.lille.inria.fr