ZURICH R-USER GROUP
REINFORCEMENT LEARNING USING R
STATWORX GmbH
Sebastian Heinz, CEO
Oliver Guggenbühl, Consultant
Zürich, 18th June 2019

AGENDA
R-Users Zurich meetup, 18.06.19
1 COMPANY PROFILE
2 INTRODUCTION TO REINFORCEMENT LEARNING
3 THEORETICAL OVERVIEW
4 IMPLEMENTATION IN R
5 SUPER MARIO AI USE CASE
6 QUESTIONS

STATWORX COMPANY PROFILE
• Information
• Services
• Project approach

STATWORX Facts and figures
COMPANY PROFILE
STATWORX is a consulting company for data science, machine learning, and AI located in Frankfurt, Vienna and Zurich. We support our customers in the development and implementation of data science and machine learning projects as well as data driven products.
• Founded: 2011
• Offices: 3
• Employees: 40
• Data science projects: 200+
• Industry customers: 50+
• Data academy participants: 1000+
[Logo panels: CLIENT EXCERPT; TOOL STACK & PARTNERS]

END-2-END DATA CONSULTING
We support our customers along the whole process of data driven decision making.
• DATA STRATEGY: Ideation, communication and steering of data products and strategies. Defining the company's overall data strategy.
• DATA ENGINEERING: Design and implementation of data pipelines, storages and architectures. Building data pipelines and putting models into production.
• DATA SCIENCE: Development, training and evaluation of machine learning and AI models. Building predictive models that drive and create value.
• DATA OPERATIONS: Implementation and operation of machine learning models. Efficient operations of models and products.
• DATA ACADEMY: Planning and execution of customer specific data science trainings. Knowledge transfer and expertise building.

INTRODUCTION REINFORCEMENT LEARNING
• Where is Reinforcement Learning being used?
• A brief history of Data Science
• What distinguishes Reinforcement Learning from Supervised & Unsupervised Learning?

INTRODUCTION
Reinforcement Learning is currently one of the hottest ML topics

A BRIEF HISTORY OF DATA SCIENCE
The history of Data Science and AI
[Timeline figure. Years marked: 1957, 1962, 1987, 1989, 1997, 1998, 1999, 2006, 2010, 2012, 2014, 2015, 2018. Milestones labelled: Rosenblatt Perceptron, Data Science vs. Statistics (Tukey), First NIPS Conference, CNN Networks, First RNN Networks, Reinforcement Learning, Nvidia GPU, Amazon AWS, Kaggle Platform, Deep Learning, Google AI TensorFlow, Google AlphaGo, OpenAI's DotA-2. Eras: AI "Hope", AI "Winter", AI "Rise". The pairing of years and milestones is not recoverable from the extracted text.]

MACHINE LEARNING OVERVIEW
Machine Learning Applications
• SUPERVISED LEARNING: Classification, Regression
• UNSUPERVISED LEARNING: Clustering, Anomaly Detection
• REINFORCEMENT LEARNING: Dynamic Environments (Agent, Environment)

INTRODUCTION
What is Reinforcement Learning?
"Instead of relying on a set of (labelled or unlabelled) training data, Reinforcement Learning relies on being able to monitor the response of the actions taken by the agent."

THEORY REINFORCEMENT LEARNING
• How does Reinforcement Learning work?
• The Gridworld problem as an example

THEORY
How does Reinforcement Learning work?
[Diagram: the Agent receives the state s_t and the reward r_t from the Environment and responds with an action a_t. The Environment then returns the next reward r_{t+1} and the next state s_{t+1}.]
The agent tries to maximize its reward by choosing appropriate actions in a given state of the environment.
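The loop in the diagram can be sketched in a few lines of base R. This is only an illustration under assumptions that are not on the slide: a hypothetical 4x4 grid with states numbered 0 to 15 row by row, a reward of -1 per move, and an agent that picks actions at random (so nothing is learnt yet). The names `step_env`, `choose_action` and `run_episode` are made up for this sketch.

```r
# Minimal agent-environment loop on a hypothetical 4x4 gridworld (base R only).
# Assumptions: states 0..15 numbered row by row, reward -1 for every move.
set.seed(42)

n_rows <- 4
n_cols <- 4
goal_state <- 15

# Environment: maps (state, action) to next state, reward and a done flag.
step_env <- function(state, action) {
  row <- state %/% n_cols
  col <- state %% n_cols
  if (action == "left")  col <- max(col - 1, 0)
  if (action == "right") col <- min(col + 1, n_cols - 1)
  if (action == "up")    row <- max(row - 1, 0)
  if (action == "down")  row <- min(row + 1, n_rows - 1)
  next_state <- row * n_cols + col
  list(state = next_state, reward = -1, done = next_state == goal_state)
}

# Agent: a purely random policy that ignores the state.
choose_action <- function(state) {
  sample(c("left", "right", "up", "down"), 1)
}

run_episode <- function(start_state = 0, max_steps = 10000) {
  state <- start_state
  total_reward <- 0
  n_steps <- 0
  done <- FALSE
  while (!done && n_steps < max_steps) {
    action <- choose_action(state)      # agent picks a_t given s_t
    result <- step_env(state, action)   # environment returns r_{t+1}, s_{t+1}
    total_reward <- total_reward + result$reward
    state <- result$state
    done <- result$done
    n_steps <- n_steps + 1
  }
  list(return = total_reward, steps = n_steps, final_state = state)
}

episode <- run_episode()
```

Because every move costs 1 in this sketch, `episode$return` is simply the negative number of steps the random agent needed.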

EXAMPLE: GRIDWORLD
Reinforcement Learning Use Case
[Grid figure with the Starting Field and the Goal Field marked]

EXAMPLE: GRIDWORLD
Reinforcement Learning Use Case
Ideal return: -6 (6 steps to complete the episode)
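The ideal return can be checked with a short calculation. The sketch below assumes the 4x4 grid used later in the talk (start state 0, goal state 15, states numbered row by row), no obstacles, and a reward of -1 per move; the helper `to_coords` is a hypothetical name.

```r
# Return of a shortest path in a hypothetical 4x4 gridworld.
n_cols <- 4
start_state <- 0
goal_state <- 15

# Convert a state index (0..15, numbered row by row) to grid coordinates.
to_coords <- function(state) c(row = state %/% n_cols, col = state %% n_cols)

# Without obstacles, the shortest path length is the Manhattan distance.
shortest_steps <- sum(abs(to_coords(goal_state) - to_coords(start_state)))

step_reward <- -1   # assumption: every move costs 1
ideal_return <- shortest_steps * step_reward
print(ideal_return)
```

With these assumptions the shortest path takes three moves in each direction, which gives the six steps and the return of -6 stated above.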

Q-LEARNING
Determining the optimal policy for an environment
$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$
where Q(s, a) is the Q-Value, r the immediate reward, γ the discount factor and max_{a'} Q(s', a') the future reward.
Immediate reward
• the immediate reward of a possible action taken, as defined by the environment
• there might be no immediate rewards, but only delayed rewards
Discount factor
• Steers the rate of considering future rewards
• Small values promote decisions that generate immediate reward
• Larger values favor decisions that generate future reward
Future reward
• the maximum possible future reward after transitioning to the next state s' by choosing action a
• does so for each possible action a for the current state s
Q-Value
• represents the maximum possible reward at the end of the game for action a in state s
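As a rough illustration of how this formula drives learning, the following base R sketch runs tabular Q-learning on a hypothetical 4x4 gridworld. Several parts are assumptions that do not appear on the slide: the learning rate `alpha`, the epsilon-greedy exploration with `epsilon = 0.1`, the reward of -1 per move, and the action encoding. The slide's right-hand side is used as the update target; with `alpha = 1` the update would reduce to the formula as written.

```r
# Tabular Q-learning sketch on a hypothetical 4x4 gridworld (base R only).
set.seed(1)
n_states <- 16
n_actions <- 4
n_cols <- 4
goal_state <- 15
gamma <- 0.9     # discount factor
alpha <- 0.5     # learning rate (assumption, not on the slide)
epsilon <- 0.1   # exploration rate (assumption)

# Actions 1..4 stand for left, right, up, down (hypothetical encoding).
step_env <- function(state, action) {
  row <- state %/% n_cols
  col <- state %% n_cols
  if (action == 1) col <- max(col - 1, 0)
  if (action == 2) col <- min(col + 1, n_cols - 1)
  if (action == 3) row <- max(row - 1, 0)
  if (action == 4) row <- min(row + 1, n_cols - 1)
  next_state <- row * n_cols + col
  list(state = next_state, reward = -1, done = next_state == goal_state)
}

# One row per state (0-based state s lives in row s + 1), one column per action.
Q <- matrix(0, nrow = n_states, ncol = n_actions)

for (episode in 1:500) {
  state <- 0
  done <- FALSE
  while (!done) {
    # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
    if (runif(1) < epsilon) {
      action <- sample(n_actions, 1)
    } else {
      best <- which(Q[state + 1, ] == max(Q[state + 1, ]))
      action <- best[sample(length(best), 1)]   # break ties at random
    }
    result <- step_env(state, action)
    # Target from the slide: r + gamma * max_a' Q(s', a').
    target <- result$reward + gamma * max(Q[result$state + 1, ])
    old_value <- Q[state + 1, action]
    Q[state + 1, action] <- old_value + alpha * (target - old_value)
    state <- result$state
    done <- result$done
  }
}

# Follow the greedy policy from the start state and count the steps.
state <- 0
greedy_steps <- 0
while (state != goal_state && greedy_steps < 100) {
  state <- step_env(state, which.max(Q[state + 1, ]))$state
  greedy_steps <- greedy_steps + 1
}
```

After training, `greedy_steps` shows how many moves the learnt greedy policy needs; in a grid this small it would typically match the shortest path, although the sketch does not guarantee it.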

Q-LEARNING
Determining the optimal policy for an environment
Assuming that the discount factor γ = 0.9 and the final reward r = 1:
$Q(14, \text{right}) = 1 + 0.9 \cdot 0$
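The same calculation can be carried backwards from the goal to see how the final reward is discounted along a path. The sketch below assumes, beyond what the slide states, that every move other than the last one gives an immediate reward of 0; `q_best` is a hypothetical name for the Q-Value of the best action when the goal is k moves away.

```r
# Propagating the slide's example backwards along a shortest path.
gamma <- 0.9
final_reward <- 1   # reward for the move into the goal, as on the slide
step_reward <- 0    # assumption: all other moves give no immediate reward

# q_best[k] is the Q-Value of the best action when the goal is k moves away.
q_best <- numeric(6)
q_best[1] <- final_reward + gamma * 0   # Q(14, right) = 1 + 0.9 * 0
for (k in 2:6) {
  q_best[k] <- step_reward + gamma * q_best[k - 1]
}
print(round(q_best, 5))
```

Each additional move away from the goal multiplies the value by γ, so the best action in the start state, six moves away, is worth 0.9^5 under these assumptions.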

IMPLEMENTATION IN R
THE reinforcelearn PACKAGE
• The reinforcelearn Package
• Live Demo
• Advanced Functionalities

THE reinforcelearn PACKAGE
Reinforcement Learning implementation in R
reinforcelearn package: The reinforcelearn package offers easy tools to create environments and agents and to let them interact.

    library(reinforcelearn)

    # Create an environment (arguments omitted on this slide)
    env <- makeEnvironment()

    # Create an agent (arguments omitted on this slide)
    agent <- makeAgent()

    # Let the agent interact with the environment
    interact(env, agent)

THE reinforcelearn PACKAGE
Reinforcement Learning implementation in R
Gridworld in reinforcelearn: The Gridworld environment can be easily created with only a few lines of code:

    library(reinforcelearn)

    # Create an environment
    env <- makeEnvironment("gridworld",
                           shape = c(4, 4),
                           goal.states = 15,
                           initial.state = 0)

THE reinforcelearn PACKAGE
Reinforcement Learning implementation in R
Gridworld in reinforcelearn: The agent consists of several parts:
• The policy defines the type of decision rules.
• The value function determines how the current state of the agent is to be evaluated.
• The algorithm determines how the optimal policy is to be found and learnt.

    library(reinforcelearn)

    # Set up the agent
    policy <- makePolicy("epsilon.greedy", epsilon = 0.1)
    val.fun <- makeValueFunction("table")
    algorithm <- makeAlgorithm("qlearning")

    # Create the agent
    agent <- makeAgent(policy = policy,
                       val.fun = val.fun,
                       algorithm = algorithm)

THE reinforcelearn PACKAGE
Reinforcement Learning implementation in R
Interaction: Once the environment and agent have been created we can let them interact with the reinforcelearn::interact() function.

    library(reinforcelearn)

    # Let the agent interact with the environment
    interact(env, agent,
             n.episodes = 500,
             visualize = TRUE,
             learn = TRUE)

LIVE DEMO
Reinforcement Learning in action

OPENAI GYMS
Advanced functionalities with OpenAI gyms
Using OpenAI gyms in reinforcelearn: reinforcelearn allows for easy access to gym environments created by OpenAI.

    library(reinforcelearn)
    library(reticulate)

    # Create an environment
    env <- makeEnvironment("gym",
                           gym.name = "SpaceInvaders-v0")

NEURAL NETWORKS
Advanced functionalities with Keras
Using neural networks in reinforcelearn: reinforcelearn allows for easy integration of neural networks made in keras into your value function.

    library(reinforcelearn)
    library(keras)

    env <- makeEnvironment("gridworld",
                           shape = c(4, 4),
                           goal.states = 15)

    model <- keras_model_sequential() %>%
      layer_dense(units = 4,
                  input_shape = 1,
                  activation = "linear") %>%
      compile(optimizer = optimizer_sgd(lr = 0.1),
              loss = "mae")

    policy <- makePolicy("epsilon.greedy", epsilon = 0.2)
    algorithm <- makeAlgorithm("qlearning")
    val.fun <- makeValueFunction("neural.network", model = model)

    agent <- makeAgent(policy, val.fun, algorithm)

    interact(env, agent, n.episodes = 100)

USE CASE DEMONSTRATION
SUPER MARIO BROS. AI
• Overview
• States, Actions & Rewards
• Training and Results

GYM-SUPER-MARIO-BROS
There is a great gym for Super Mario Bros (NES)
An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on the Nintendo Entertainment System (NES) using the nes-py emulator.
GAME MODES
• Standard
• Downsample
• Pixel
• Rectangle
