Zurich R-User Group: Reinforcement Learning Using R


  1. ZURICH R-USER GROUP: REINFORCEMENT LEARNING USING R. STATWORX GmbH. Sebastian Heinz, CEO; Oliver Guggenbühl, Consultant. Zürich, 18th June 2019

  2. AGENDA (R-Users Zurich meetup, 18.06.19): 1 COMPANY PROFILE; 2 INTRODUCTION TO REINFORCEMENT LEARNING; 3 THEORETICAL OVERVIEW; 4 IMPLEMENTATION IN R; 5 SUPER MARIO AI USE CASE; 6 QUESTIONS

  3. STATWORX COMPANY PROFILE: Information; Services; Project approach

  4. STATWORX: Facts and figures. COMPANY PROFILE: STATWORX is a consulting company for data science, machine learning, and AI located in Frankfurt, Vienna and Zurich. We support our customers in the development and implementation of data science and machine learning projects as well as data driven products.
     • Founded: 2011
     • Offices: 3
     • Employees: 40
     • Data science projects: 200+
     • Industry customers: 50+
     • Data academy participants: 1000+
     [Logo panels: client excerpt; tool stack & partners]

  5. END-2-END DATA CONSULTING: We support our customers along the whole process of data driven decision making.
     • DATA STRATEGY: Ideation, communication and steering of data products and strategies. Defining the company's overall data strategy.
     • DATA ENGINEERING: Design and implementation of data pipelines, storages and architectures. Building data pipelines and putting models into production.
     • DATA SCIENCE: Development, training and evaluation of machine learning and AI models. Building predictive models that drive and create value.
     • DATA OPERATIONS: Implementation and operation of machine learning models. Efficient operations of models and products.
     • DATA ACADEMY: Planning and execution of customer specific data science trainings. Knowledge transfer and expertise building.

  6. INTRODUCTION: REINFORCEMENT LEARNING. Where is Reinforcement Learning being used? A brief history of Data Science. What distinguishes Reinforcement Learning from Supervised & Unsupervised Learning?

  7. INTRODUCTION: Reinforcement Learning is currently one of the hottest ML topics

  8. A BRIEF HISTORY OF DATA SCIENCE: The history of Data Science and AI. [Timeline figure from 1957 to 2018, divided into three eras: AI "Hope", AI "Winter", AI "Rise". Years marked: 1957, 1962, 1987, 1989, 1997, 1998, 1999, 2006, 2010, 2012, 2014, 2015, 2018. Milestones named: Rosenblatt Perceptron, Data Science vs. Statistics (Tukey), First NIPS Conference, First RNN Networks, CNN Networks, Nvidia GPU, Amazon AWS, Kaggle Platform, Deep Learning, Reinforcement Learning, Google TensorFlow, Google AlphaGo, OpenAI's DotA-2 AI.]

  9. MACHINE LEARNING OVERVIEW: Machine Learning Applications.
     • SUPERVISED LEARNING: Classification, Regression
     • UNSUPERVISED LEARNING: Clustering, Anomaly Detection
     • REINFORCEMENT LEARNING: Dynamic Environments (Agent, Environment)

  10. INTRODUCTION: What is Reinforcement Learning? "Instead of relying on a set of (labelled or unlabelled) training data, Reinforcement Learning relies on being able to monitor the response of the actions taken by the agent."

  11. THEORY: REINFORCEMENT LEARNING. How does Reinforcement Learning work? The Gridworld problem as an example.

  12. THEORY: How does Reinforcement Learning work? [Diagram: Agent and Environment]

  13. THEORY: How does Reinforcement Learning work? [Diagram: the Agent observes state s_t and sends action a_t to the Environment]

  14. THEORY: How does Reinforcement Learning work? [Diagram: the Agent receives state s_t and reward r_t and sends action a_t to the Environment, which returns reward r_{t+1} and state s_{t+1}]

  15. THEORY: How does Reinforcement Learning work? [Diagram: the Agent receives state s_t and reward r_t and sends action a_t to the Environment, which returns reward r_{t+1} and state s_{t+1}] The agent tries to maximize its reward by choosing appropriate actions in a given state of the environment.
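
The loop in the diagram can be sketched in a few lines of base R. This is only an illustration under stated assumptions: a 4x4 grid with states 0 to 15 numbered row by row, a reward of -1 per step, and a purely random agent that does not learn yet. The names `step_env` and `choose_action` are hypothetical helpers, not part of any package.

```r
# Assumed 4x4 gridworld: states 0..15 numbered row by row, goal at state 15.
n_rows <- 4
n_cols <- 4
goal <- 15
actions <- c("left", "right", "up", "down")

# Environment: maps (s_t, a_t) to the next state s_{t+1} and reward r_{t+1}.
step_env <- function(state, action) {
  row <- state %/% n_cols
  col <- state %% n_cols
  if (action == "left")  col <- max(col - 1, 0)
  if (action == "right") col <- min(col + 1, n_cols - 1)
  if (action == "up")    row <- max(row - 1, 0)
  if (action == "down")  row <- min(row + 1, n_rows - 1)
  next_state <- row * n_cols + col
  list(state = next_state, reward = -1, done = next_state == goal)
}

# Agent: a random policy that ignores the state (no learning involved).
choose_action <- function(state) sample(actions, 1)

set.seed(1)
state <- 0
total_reward <- 0
done <- FALSE
while (!done) {
  action <- choose_action(state)           # agent picks a_t given s_t
  result <- step_env(state, action)        # environment responds
  total_reward <- total_reward + result$reward
  state <- result$state                    # s_{t+1} becomes the new s_t
  done <- result$done
}
total_reward
```

A learning agent would differ only in `choose_action`, which would use the rewards observed so far instead of sampling blindly.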

  16. EXAMPLE: GRIDWORLD. Reinforcement Learning Use Case. [Figure: grid with the Starting Field and the Goal Field marked]

  17. EXAMPLE: GRIDWORLD. Reinforcement Learning Use Case. [Figure: grid with the Starting Field and the Goal Field marked]

  18. EXAMPLE: GRIDWORLD. Reinforcement Learning Use Case. Ideal return: -6 (6 steps to complete the episode)
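
The figure of -6 can be checked with a short calculation. The sketch below assumes the layout used later in the deck (4x4 grid, start at state 0, goal at state 15, states numbered row by row) and a reward of -1 for every step, which is what "6 steps, return -6" implies. `to_row_col` is a hypothetical helper.

```r
n_cols <- 4
start <- 0
goal <- 15

# Convert a state index into (row, column) coordinates on the grid.
to_row_col <- function(state) c(state %/% n_cols, state %% n_cols)

# Without obstacles, the shortest path length is the Manhattan distance.
min_steps <- sum(abs(to_row_col(goal) - to_row_col(start)))

# Each step is assumed to cost -1, so the best achievable return is:
ideal_return <- sum(rep(-1, min_steps))
ideal_return
```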

  19. Q-LEARNING: Determining the optimal policy for an environment.
      $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$
      where Q(s, a) is the Q-Value, r the immediate reward, γ the discount factor and max_{a'} Q(s', a') the future reward.
      Immediate reward
      • the immediate reward of a possible action taken, as defined by the environment
      • there might be no immediate rewards, but only delayed rewards
      Discount factor
      • Steers the rate of considering future rewards
      • Small values promote decisions that generate immediate reward
      • Larger values favor decisions that generate future reward
      Future reward
      • the maximum possible future reward after transitioning to the next state s' by choosing action a
      Q-Value
      • represents the maximum possible reward at the end of the game for action a in state s
      • does so for each possible action a for the current state s
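
The slide states the target value Q(s, a) = r + γ max_{a'} Q(s', a'). In tabular Q-learning this target is usually approached step by step rather than assigned directly. A common form of the update is shown below; the learning rate α is an addition here and does not appear on the slide.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

With α = 1 the update reduces to the formula on the slide; smaller values of α average the target over many visits to the same state-action pair.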

  20. Q-LEARNING: Determining the optimal policy for an environment. Assuming that the discount factor γ = 0.9 and the final reward r = 1:
      $Q(14, \text{right}) = 1 + 0.9 \cdot 0$
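
The same calculation can be carried one or two states further back from the goal. The sketch below uses γ = 0.9 and a final reward of 1 as on the slide; the reward of 0 for every non-final step is an assumption made here for illustration, as is the row-by-row state numbering (so that moving right from 12 leads to 13, then 14, then the goal 15).

```r
gamma <- 0.9
final_reward <- 1

# Assumption (not on the slide): every step that does not reach the goal has reward 0.
q_14_right <- final_reward + gamma * 0   # goal is terminal, so no future reward
q_13_right <- 0 + gamma * q_14_right     # best next value is Q(14, right)
q_12_right <- 0 + gamma * q_13_right     # best next value is Q(13, right)

print(c(q_14_right, q_13_right, q_12_right))
```

Under these assumptions the values are 1, 0.9 and 0.81: the discount factor shrinks the value of the final reward by a factor of γ for every additional step needed to reach it.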

  21. IMPLEMENTATION IN R: THE reinforcelearn PACKAGE. The reinforcelearn Package; Live Demo; Advanced Functionalities

  22. THE reinforcelearn PACKAGE: Reinforcement Learning implementation in R. The reinforcelearn package offers easy tools to create environments and agents and to let them interact.

          library(reinforcelearn)

          # Create an environment
          env <- makeEnvironment()

          # Create an agent
          agent <- makeAgent()

          # Let the agent interact with the environment
          interact(env, agent)

  23. THE reinforcelearn PACKAGE: Reinforcement Learning implementation in R. Gridworld in reinforcelearn: The Gridworld environment can be created with only a few lines of code:

          library(reinforcelearn)

          # Create an environment
          env <- makeEnvironment("gridworld",
                                 shape = c(4, 4),
                                 goal.states = 15,
                                 initial.state = 0)

  24. THE reinforcelearn PACKAGE: Reinforcement Learning implementation in R. Gridworld in reinforcelearn: The agent consists of several parts:
      • The policy defines the type of decision rules.
      • The value function determines how the current state of the agent is to be evaluated.
      • The algorithm determines how the optimal policy is to be found and learnt.

          library(reinforcelearn)

          # Set up the agent
          policy <- makePolicy("epsilon.greedy", epsilon = 0.1)
          val.fun <- makeValueFunction("table")
          algorithm <- makeAlgorithm("qlearning")

          # Create the agent
          agent <- makeAgent(policy = policy,
                             val.fun = val.fun,
                             algorithm = algorithm)
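
To make the "epsilon.greedy" decision rule concrete, here is a minimal base-R sketch of the idea. It is not the package's implementation; `epsilon_greedy` is a hypothetical helper written for illustration.

```r
# With probability epsilon pick a random action (explore),
# otherwise pick an action with the highest Q-value (exploit).
epsilon_greedy <- function(q_values, epsilon = 0.1) {
  if (runif(1) < epsilon) {
    sample(seq_along(q_values), 1)
  } else {
    best <- which(q_values == max(q_values))
    best[sample(length(best), 1)]   # break ties at random
  }
}

# Q-values of four actions in one state (made-up numbers).
q_values <- c(0, 0.5, 0.2, 0.1)
greedy_choice <- epsilon_greedy(q_values, epsilon = 0)   # always exploits
random_choice <- epsilon_greedy(q_values, epsilon = 1)   # always explores
```

With `epsilon = 0.1` as on the slide, roughly one decision in ten would be random, which keeps the agent trying actions whose value it has not yet learned.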

  25. THE reinforcelearn PACKAGE: Reinforcement Learning implementation in R. Interaction: Once the environment and agent have been created, we can let them interact with the reinforcelearn::interact() function.

          library(reinforcelearn)

          # Let the agent interact with the environment
          interact(env, agent,
                   n.episodes = 500,
                   visualize = TRUE,
                   learn = TRUE)
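
For readers who want to see what such a training run involves, the following base-R sketch trains a tabular Q-learning agent on a 4x4 gridworld for 500 episodes. It does not use or reproduce reinforcelearn. The reward of -1 per step, the learning rate `alpha`, the discount `gamma = 1` and the helper `step_env` are all assumptions chosen for illustration.

```r
set.seed(42)
n_rows <- 4; n_cols <- 4; goal <- 15
moves <- list(left = c(0, -1), right = c(0, 1), up = c(-1, 0), down = c(1, 0))

# Hypothetical environment: move on the grid, stay in place at walls, reward -1 per step.
step_env <- function(state, action) {
  pos <- c(state %/% n_cols, state %% n_cols) + moves[[action]]
  pos <- pmin(pmax(pos, 0), c(n_rows - 1, n_cols - 1))
  next_state <- pos[1] * n_cols + pos[2]
  list(state = next_state, reward = -1, done = next_state == goal)
}

# Q-table: row i holds state i - 1, one column per action, initialised to 0.
Q <- matrix(0, nrow = n_rows * n_cols, ncol = length(moves),
            dimnames = list(NULL, names(moves)))
alpha <- 0.1; gamma <- 1; epsilon <- 0.1   # assumed hyperparameters

for (episode in 1:500) {
  state <- 0
  done <- FALSE
  while (!done) {
    # Epsilon-greedy action selection
    q_s <- Q[state + 1, ]
    if (runif(1) < epsilon) {
      action <- sample(length(moves), 1)
    } else {
      best <- which(q_s == max(q_s))
      action <- best[sample(length(best), 1)]
    }
    result <- step_env(state, action)
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    future <- if (result$done) 0 else max(Q[result$state + 1, ])
    target <- result$reward + gamma * future
    Q[state + 1, action] <- Q[state + 1, action] +
      alpha * (target - Q[state + 1, action])
    state <- result$state
  }
}

# Follow the learned greedy policy from the start (capped at 100 steps).
state <- 0; greedy_return <- 0; steps <- 0
while (state != goal && steps < 100) {
  result <- step_env(state, which.max(Q[state + 1, ]))
  greedy_return <- greedy_return + result$reward
  state <- result$state
  steps <- steps + 1
}
greedy_return
```

If training has converged, the greedy rollout should follow a shortest path and return -6, the ideal return for this grid; the step cap guards against a policy that has not converged.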

  26. LIVE DEMO: Reinforcement Learning in action

  27. OPENAI GYMS: Advanced functionalities with OpenAI gyms. Using OpenAI gyms in reinforcelearn: reinforcelearn allows for easy access to gym environments created by OpenAI.

          library(reinforcelearn)
          library(reticulate)

          # Create an environment
          env <- makeEnvironment("gym",
                                 gym.name = "SpaceInvaders-v0")

  28. NEURAL NETWORKS: Advanced functionalities with Keras. Using neural networks in reinforcelearn: reinforcelearn allows for easy integration of neural networks made in keras into your value function.

          library(reinforcelearn)
          library(keras)

          env <- makeEnvironment("gridworld",
                                 shape = c(4, 4),
                                 goal.states = 15)

          model <- keras_model_sequential() %>%
            layer_dense(units = 4, input_shape = 1, activation = "linear") %>%
            compile(optimizer = optimizer_sgd(lr = 0.1), loss = "mae")

          policy <- makePolicy("epsilon.greedy", epsilon = 0.2)
          algorithm <- makeAlgorithm("qlearning")
          val.fun <- makeValueFunction("neural.network", model = model)
          agent <- makeAgent(policy, val.fun, algorithm)

          interact(env, agent, n.episodes = 100)

  29. USE CASE DEMONSTRATION: SUPER MARIO BROS. AI. Overview; States, Actions & Rewards; Training and Results

  30. GYM-SUPER-MARIO-BROS: There is a great gym for Super Mario Bros (NES). GYM-SUPER-MARIO-BROS: An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) using the nes-py emulator. GAME MODES: Standard, Downsample, Pixel, Rectangle.
