  1. Learning in Robotic Systems. Robotic Agents @ Allegheny College. Janyl Jumadinova, November 27, 2019

  2. Reinforcement Learning
     - Basic idea: receive feedback in the form of rewards.
     - Agent's utility is defined by the reward function.
     - Must (learn to) act so as to maximize expected rewards.

  3. Reinforcement Learning
     Agents can use:
     - Model-based learning: model the other agents and compute the optimal action based on this model and knowledge of the reward structure (the agent attempts to learn a model of its environment), or
     - Model-free learning: directly learn the expected utility (probability · payoff) of actions in a given state.

  4. Model-free reinforcement learning
     - Idea: learn how to act without explicitly learning the transition probabilities P(s' | s, a).
     - Q-learning: learn an action-utility function Q(s, a) that tells us the value of doing action a in state s:
       V(s) = max_a Q(s, a)
       Q(s, a) ← Q(s, a) + α (R(s) + γ max_{a'} Q(s', a') - Q(s, a))
     - α (alpha), the learning rate: the extent to which the Q-values are updated in every iteration.
     - γ (gamma), the discount rate: how much importance we give to future rewards.
     - Selected action (policy): π(s) = argmax_a Q(s, a)
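A minimal sketch of this update rule in Python (the variable names, function name, and hyperparameter values here are illustrative, not from the slides; Q is a NumPy array indexed by state and action):

    import numpy as np

    n_states, n_actions = 500, 6          # Gym's Taxi sizes (see the later slides)
    Q = np.zeros((n_states, n_actions))   # Q-table, initialized to zero

    def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.6):
        """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
        td_target = r + gamma * np.max(Q[s_next])  # best utility of the resulting state
        Q[s, a] += alpha * (td_target - Q[s, a])   # update by a fraction alpha of the error
        return Q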

  5. Q-learning
     - At each step s, choose the action a which maximizes the function Q(s, a).
     - Q is the estimated utility function: it tells us how good an action is in a given state.
     - Q(s, a) = immediate reward for taking the action + best utility Q of the resulting state (discounted by γ).

  6. Gym’s Taxi Environment
     https://github.com/openai/gym/blob/master/gym/envs/toy_text/taxi.py
     - 5x5 grid = 25 possible taxi locations
     - Four locations for pick up and drop off: R (0,0): 0, G (0,4): 1, Y (4,0): 2, B (4,3): 3
     - The passenger can be at any of the four locations or inside the taxi: 5 passenger states
     - 5 x 5 x 5 x 4 = 500 possible states (the state space)
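As a sketch of how these 500 states are indexed (using env.encode from the taxi.py file linked above, and assuming the Gym API of this era, where the environment is named Taxi-v2; newer gym releases call it Taxi-v3):

    import gym

    env = gym.make("Taxi-v2").env   # .env removes the default step limit (see slide 12)
    # (taxi_row, taxi_col, passenger_location, destination) -> flat state index
    state = env.encode(3, 1, 2, 0)  # taxi at (3, 1), passenger at Y, destination R
    print(state)                    # 328, the state inspected on slide 10
    env.s = state                   # put the environment into this state
    env.render()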

  7. Gym’s Taxi Environment
     - Filled square: the taxi
     - Yellow square: taxi without a passenger
     - Green square: taxi with a passenger
     - Pipe (|): a wall
     - Blue letter: current passenger pick-up location
     - Purple letter: current destination

  8. Gym’s Taxi Environment
     - Six possible actions (the action space): 0 = south, 1 = north, 2 = east, 3 = west, 4 = pickup, 5 = dropoff
     - Penalty of -1 for hitting walls
     - Reward of +20 for a successful drop off
     - Reward (penalty) of -1 for every time-step taken
     - Reward (penalty) of -10 for wrong pick up and drop off actions
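For example, one interaction step with this action space (a sketch assuming the pre-0.26 Gym API, where reset returns a plain state and step returns a 4-tuple):

    import gym

    env = gym.make("Taxi-v2").env
    state = env.reset()                 # random initial state
    action = env.action_space.sample()  # one of the six actions, chosen at random
    next_state, reward, done, info = env.step(action)
    print(reward)                       # -1 for a move, -10 for an illegal pickup/dropoff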

  9. Gym’s Taxi Environment
     Problem statement (from the gym documentation): There are 4 locations (labeled by different letters). The task of the taxi robot is to pick up the passenger at one location and drop the passenger off at another. The taxi robot receives +20 points for a successful drop-off and loses 1 point for every time-step it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions.

  10. The Reward Table
     - An initial reward table, called P, is created when the Taxi environment is initialized.
     - P is a matrix with rows = number of states and columns = number of actions, giving a states x actions matrix.
     - Each row is a dictionary with the structure {action: [(probability, nextstate, reward, done)]}.
     env.P[328]:
     {0: [(1.0, 428, -1, False)],
      1: [(1.0, 228, -1, False)],
      2: [(1.0, 348, -1, False)],
      3: [(1.0, 328, -1, False)],
      4: [(1.0, 328, -10, False)],
      5: [(1.0, 328, -10, False)]}

  11. Gym’s Taxi Environment
     - In this environment, the probability is always 1.0.
     - The nextstate is the state the taxi would be in if it took the action at this index of the dictionary.
     - All the movement actions have a -1 reward.
     - Each successful dropoff is the end of an episode: the done flag indicates when the taxi has dropped off a passenger at the right location.
     - Wall (|): the taxi can't pass through a wall; it remains in the same position if it tries to move through one.
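A minimal episode loop (sketch) driven by the done flag, with a random policy just for illustration:

    import gym

    env = gym.make("Taxi-v2").env
    state = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()            # random policy, for illustration
        state, reward, done, info = env.step(action)  # done=True after a correct dropoff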

  12. Gym’s Taxi Environment
     - The current newest version of gym forcefully stops the environment after 200 steps (https://github.com/openai/gym/wiki/FAQ). To avoid this, use the unwrapped environment, e.g. env = gym.make("Taxi-v2").env
     - In state 328, the pickup/dropoff actions have a -10 reward.
     - If the taxi were in a state where it has the passenger on board and is on top of the right destination, we would see a reward of 20 at the dropoff action (5).

  13. Exploration vs. Exploitation
     - Exploration: change to a different random strategy.
     - Exploitation: keep selecting the best strategy so far.
     - epsilon: probability of selecting a random action instead of the 'optimal' action.
     - TODO 1: How do changes to epsilon influence the performance of reinforcement learning? (See the sketch below.)
     - TODO 2: What about alpha, gamma, episodes?
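A sketch of epsilon-greedy Q-learning on Taxi that ties these pieces together (the hyperparameter values are illustrative starting points for the TODO questions, not prescribed by the slides):

    import random
    import numpy as np
    import gym

    env = gym.make("Taxi-v2").env
    Q = np.zeros((env.observation_space.n, env.action_space.n))

    alpha, gamma, epsilon = 0.1, 0.6, 0.1  # learning rate, discount rate, exploration rate
    episodes = 10000

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                action = env.action_space.sample()  # explore: random action
            else:
                action = int(np.argmax(Q[state]))   # exploit: best action so far
            next_state, reward, done, _ = env.step(action)
            # Q-learning update from slide 4
            Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
            state = next_state

Varying epsilon trades off how often the agent tries new actions against how often it uses what it already knows; varying alpha, gamma, and episodes changes how quickly and how reliably the Q-table converges.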
