Reinforcement Learning: a gentle introduction & industrial application
Christian Hidber

Learning: learning from children (Slide 2, Reinforcement Learning)

The game: demo (Slide 3)

The game: setup. Goal: maximize the sum of rewards. [Diagram: game engine and learner, connected by game state, step reward, and actions.] Photo by unknown author, licensed under CC BY-NC. (Slide 4)
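The setup can be sketched as a loop between the game engine and the learner. The following is a minimal illustration, not the game from the demo: `ToyGame`, its corridor of positions 0 to 4, and `play_episode` are hypothetical names, and only the reward values (-1, 100, -50) are borrowed from the later slides.

```python
import random


class ToyGame:
    """Hypothetical stand-in for the game engine: a walk on positions 0..4."""

    def __init__(self):
        self.position = 2  # start in the middle

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position += action
        if self.position == 4:           # goal reached
            return self.position, 100, True
        if self.position == 0:           # fell off the edge
            return self.position, -50, True
        return self.position, -1, False  # every ordinary step costs a little


def play_episode(game, choose_action, max_steps=50):
    """Run the learner/engine loop once and return the sum of rewards."""
    total_reward = 0
    state = game.position
    for _ in range(max_steps):
        action = choose_action(state)            # learner -> game engine
        state, reward, done = game.step(action)  # game engine -> learner
        total_reward += reward
        if done:
            break
    return total_reward


random.seed(0)
print(play_episode(ToyGame(), lambda state: random.choice([-1, 1])))  # random play
print(play_episode(ToyGame(), lambda state: 1))  # always right: -1 + 100 = 99
```

The goal from the slide corresponds to making the value returned by `play_episode` as large as possible.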

The game: positive feedback. [Diagram: game engine and learner, connected by game state, step reward, and actions.] (Slide 5)

The game: negative feedback. [Diagram: game engine and learner, connected by game state, step reward, and actions.] (Slide 6)

The learned stuff => policy (the rules learned: how to play the game). [Diagram: game engine and learner with its policy, connected by game state, step reward, and actions.] (Slide 7)

Policy improvement => learning. [Diagram: game engine and learner with its policy (the rules learned: how to play the game), connected by game state, step reward, and actions.] (Slides 8-9)

Reinforcement learning. Key idea: continuously improve the policy to increase the total reward. [Diagram: game engine and RL algorithm with its policy (the rules learned: how to play the game), connected by game state, step reward, and actions.] (Slide 10)

Episode 1: play with 1st policy (random). Step 1: reward -1. [Figure: state, action from policy, next state, and reward for each step; axis: step #, 1 to 7.] (Slide 11)

Episode 1: play with 1st policy (random). Step 2: reward 100. (Slide 12)

Episode 1: play with 1st policy (random). Step 3: reward -50. Episode over. (Slide 13)

Episode 1: improve 1st policy for the state in step 3. Future reward (sum of all rewards from the current state until 'game over'): -50. (Slide 14)

Episode 1: improve 1st policy for the state in step 2. Future reward: 50 (= +100 - 50). (Slide 15)

Episode 1: improve 1st policy for the state in step 1. Future reward: 49 (= -1 + 100 - 50). (Slide 16; M3, October 2018)
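The future rewards of episode 1 can be reproduced with a few lines of code. This is a small sketch; the function name `future_rewards` is an assumption, while the reward values are the ones shown on the slides.

```python
def future_rewards(rewards):
    """Return, for each step, the sum of rewards from that step to the end."""
    result = []
    running_sum = 0
    # Walk backwards so each step reuses the sum of the steps after it.
    for reward in reversed(rewards):
        running_sum += reward
        result.append(running_sum)
    result.reverse()
    return result


episode_1_rewards = [-1, 100, -50]  # rewards of steps 1, 2, 3
print(future_rewards(episode_1_rewards))  # [49, 50, -50]
```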

Episode 2: play with 2nd policy. Already learned: go left is ok. Step 1: reward -1. [Figure: state, action from policy, next state, and reward for each step; axis: step #, 1 to 7.] (Slide 17)

Episode 2: play with 2nd policy. Already learned: go left is ok. Step 2: reward 100. (Slide 18)

Episode 2: play with 2nd policy. Already learned: don't go up. Step 3: reward 100. (Slide 19)

Episode 2: play with 2nd policy. Step 4: reward -1. (Slide 20)

Episode 2: play with 2nd policy. Step 5: reward -50. Episode over. (Slide 21)

Episode 2: improve 2nd policy for the state in step 5. Future reward (sum of all rewards from the current state until 'game over'): -50 (= -50). (Slide 22)

Episode 2: improve 2nd policy for the state in step 4. Future reward: -51 (= -1 - 50). (Slide 23)

Episode 2: improve 2nd policy for the state in step 3. Future reward: 49 (= +100 - 1 - 50). (Slide 24)

Episode 2: improve 2nd policy for the state in step 2. Future reward: 149 (= +100 + 100 - 1 - 50). (Slide 25)

Episode 2: improve 2nd policy for the state in step 2. Future reward: 149 (= +100 + 100 - 1 - 50). The stored value becomes some running average of the old and the new value. (Slide 26)

Episode 2: improve 2nd policy for the state in step 1. Future reward: 148 (= -1 + 100 + 100 - 1 - 50). (Slide 27)
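The slides leave open which running average is used. One common choice, shown here purely as an assumption, moves the old value a fixed fraction of the way toward the new one. The rate of 0.5 and the old estimate of 50 (the value computed for step 2 in episode 1) are illustrative; the slides do not state that both episodes visit the same state in step 2.

```python
def running_average(old_value, new_value, rate=0.5):
    """Move the stored value a fraction `rate` of the way toward the new value."""
    return old_value + rate * (new_value - old_value)


episode_2_rewards = [-1, 100, 100, -1, -50]  # rewards of steps 1 to 5
# Future reward of step i = sum of all rewards from step i until 'game over'.
future = [sum(episode_2_rewards[i:]) for i in range(len(episode_2_rewards))]
print(future)  # [148, 149, 49, -51, -50]

old_estimate = 50  # assumption: value stored after episode 1
print(running_average(old_estimate, future[1]))  # 99.5
```

A smaller rate makes the stored value change more slowly, which smooths out lucky or unlucky episodes.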

So far: a policy is a map from states to action probabilities. (Slide 28)

A policy is a map from states to action probabilities, updated by the reinforcement learning algorithm. (Slides 29-30)
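As a data structure, such a map can be as simple as a nested dictionary. The state names, action names, and probabilities below are made up for illustration; `choose_action` is a hypothetical helper that samples an action according to the stored probabilities.

```python
import random

# Hypothetical states and actions; the probabilities in each row sum to 1.
policy = {
    "start":     {"left": 0.25, "right": 0.25, "up": 0.25, "down": 0.25},
    "near_goal": {"left": 0.70, "right": 0.10, "up": 0.05, "down": 0.15},
}


def choose_action(policy, state, rng=random):
    """Sample one action for `state` according to its action probabilities."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
samples = [choose_action(policy, "near_goal", rng) for _ in range(1000)]
print(samples.count("left") / len(samples))  # close to 0.70
```

A freshly initialized policy looks like the "start" row (no preference); learning reshapes rows toward something like "near_goal".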

After many, many episodes, for each state… (Slide 31)

Algorithm sketch (Slide 32)

Initialize table with random action probabilities for each state
Repeat
    play episode with policy given by table
    record (state_1, action_1, reward_1), (state_2, action_2, reward_2), … for the episode
    For each step i
        compute FutureReward_i = reward_i + reward_(i+1) + …
        update table[state_i] s.t.
        • action_i becomes more likely for state_i if FutureReward_i is "high"
        • action_i becomes less likely for state_i if FutureReward_i is "low"
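The sketch above can be turned into runnable code in several ways; the following is one possibility under stated assumptions, not necessarily the variant used in the talk. Assumptions: the table stores one preference per state and action, which a softmax turns into probabilities; "high" and "low" are judged relative to zero; the game is a hypothetical corridor (positions 0 to 3) in which "left" ends the episode with -50 and reaching position 3 pays 100; the rate 0.01 is arbitrary.

```python
import math
import random

ACTIONS = ["left", "right"]
GOAL = 3  # hypothetical corridor: positions 0..3, start at 0


def step(state, action):
    """Toy game engine: return (next_state, reward, done)."""
    if action == "left":
        return state, -50, True      # falls into a pit, episode over
    if state + 1 == GOAL:
        return state + 1, 100, True  # reaches the goal
    return state + 1, -1, False      # ordinary step


def probabilities(table, state):
    """Softmax over the preferences stored for this state."""
    prefs = [table[state][a] for a in ACTIONS]
    top = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def play_episode(table, rng):
    """Play one episode; record (state, action, reward) for every step."""
    state, done, history = 0, False, []
    while not done:
        action = rng.choices(ACTIONS, weights=probabilities(table, state))[0]
        next_state, reward, done = step(state, action)
        history.append((state, action, reward))
        state = next_state
    return history


def update(table, history, rate=0.01):
    """Make actions followed by high future reward more likely, others less."""
    future_reward = 0
    for state, action, reward in reversed(history):
        future_reward += reward  # reward_i + reward_(i+1) + ...
        probs = probabilities(table, state)
        for a, p in zip(ACTIONS, probs):
            indicator = 1.0 if a == action else 0.0
            table[state][a] += rate * future_reward * (indicator - p)


rng = random.Random(0)
table = {s: {a: rng.uniform(-0.1, 0.1) for a in ACTIONS} for s in range(GOAL)}
for _ in range(2000):
    update(table, play_episode(table, rng))
for s in range(GOAL):
    print(s, [round(p, 2) for p in probabilities(table, s)])  # [P(left), P(right)]
```

Storing preferences instead of raw probabilities keeps every row a valid probability distribution after each update, without explicit renormalization.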

The game: demo (Slide 35)

The bad news: nice idea, but… Image licensed under CC BY-SA. (Slide 36)

The bad news: nice idea, but… too many states, too many actions:
• Too much memory needed
• Too much time
(Slide 37)
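A rough count shows how quickly a lookup table grows. The numbers are assumptions chosen for illustration only: a board of cells where each cell can take one of three values, and four possible actions per state.

```python
def table_entries(cells, values_per_cell, actions):
    """Number of probabilities in a lookup table with one row per state."""
    states = values_per_cell ** cells  # every combination of cell values
    return states * actions


small = table_entries(cells=9, values_per_cell=3, actions=4)    # 3x3 board
large = table_entries(cells=100, values_per_cell=3, actions=4)  # 10x10 board
print(small)            # 78732
print(len(str(large)))  # number of decimal digits of the large table size
```

The 3x3 board fits in memory easily; the 10x10 board needs a table whose size has dozens of digits, which no machine can store or visit entry by entry.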

The solution. Idea: replace the lookup table with a neural network that approximates the action probabilities contained in the table.
Instead of: Table[state] = action probabilities
Do: NeuralNet(state) ~ action probabilities
In the algorithm sketch, change "play episode with policy given by table" to "play episode with policy given by NeuralNet", and change "update table" to "update weights of NeuralNet". (Slide 38)
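To illustrate the change from table entries to weights, here is a deliberately tiny sketch in plain Python. `TinyPolicyNet` is a hypothetical single-layer network with a softmax output; a real application would typically use a deeper network and a deep learning library. The feature encoding of the state and the rate 0.01 are assumptions.

```python
import math
import random


class TinyPolicyNet:
    """Hypothetical one-layer network: state features -> action probabilities."""

    def __init__(self, n_features, n_actions, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                        for _ in range(n_actions)]

    def probabilities(self, features):
        """NeuralNet(state) ~ action probabilities (softmax over linear scores)."""
        scores = [sum(w * x for w, x in zip(row, features))
                  for row in self.weights]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def update(self, features, action, future_reward, rate=0.01):
        """Shift weights so `action` gets more likely when future_reward is high."""
        probs = self.probabilities(features)
        for a, row in enumerate(self.weights):
            indicator = 1.0 if a == action else 0.0
            for i, x in enumerate(features):
                row[i] += rate * future_reward * (indicator - probs[a]) * x


net = TinyPolicyNet(n_features=3, n_actions=2, rng=random.Random(0))
state = [1.0, 0.0, 0.5]  # hypothetical numeric encoding of a game state
before = net.probabilities(state)[1]
net.update(state, action=1, future_reward=100)
after = net.probabilities(state)[1]
print(before < after)  # True
```

The memory needed now depends on the number of weights, not on the number of states, and similar states share what has been learned.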
