Grounded Action Transformation for Robot Learning in Simulation
Josiah Hanna and Peter Stone
Reinforcement Learning for Physical Robots

Learning on physical robots:
- Not data-efficient.
- Requires supervision.
- Manual resets.
- Robots break.
- Wear and tear make learning non-stationary.
- Not an exhaustive list...
Reinforcement Learning in Simulation

Learning in simulation:
- Thousands of trials in parallel.
- No supervision and automatic resets.
- Robots never break or wear out.

But: policies learned in simulation often fail in the real world.
Notation

- Environment $E = \langle \mathcal{S}, \mathcal{A}, c, P \rangle$.
- The robot in state $s \in \mathcal{S}$ chooses action $a \in \mathcal{A}$ according to policy $\pi$. A parameterized policy $\pi_\theta$ is denoted by $\theta$.
- The environment $E$ responds with a new state $S_{t+1} \sim P(\cdot \mid s, a)$.
- The cost function $c$ assigns a scalar cost to each $(s, a)$.
- Goal: find $\theta$ that minimizes
  $$J(\theta) := \mathbb{E}_{S_1, A_1, \ldots, S_L, A_L}\left[\sum_{t=1}^{L} c(S_t, A_t)\right]$$
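As a concrete illustration of the objective above, here is a minimal Monte Carlo sketch of estimating $J(\theta)$. The `env.reset`, `env.step`, and `policy` interfaces are hypothetical placeholders for a generic environment, not an API used in this work.

```python
import numpy as np

def estimate_J(env, policy, num_episodes=10, horizon=200):
    """Monte Carlo estimate of J(theta): expected sum of costs along a trajectory."""
    returns = []
    for _ in range(num_episodes):
        s = env.reset()
        total_cost = 0.0
        for _ in range(horizon):
            a = policy(s)                  # A_t ~ pi_theta(. | S_t)
            s, cost, done = env.step(a)    # S_{t+1} ~ P(. | s, a), cost = c(s, a)
            total_cost += cost
            if done:
                break
        returns.append(total_cost)
    return np.mean(returns)
```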
Learning in Simulation

- Simulator $E_{sim} = \langle \mathcal{S}, \mathcal{A}, c, P_{sim} \rangle$.
- Identical to $E$ but with different dynamics (transition function).
- $J_{sim}(\theta') < J_{sim}(\theta_0) \;\not\Rightarrow\; J(\theta') < J(\theta_0)$: improving the objective in simulation does not guarantee improvement on the physical robot.
- Goal: learn $\theta$ in simulation that also works on the physical robot.
Grounded Simulation Learning

Grounded Simulation Learning (GSL) is a framework for robot learning in simulation: modify the simulator with real-world data so that policies learned in simulation work in the real world [Farchy et al., 2013].
1. Execute $\theta_0$ on the physical robot.
2. Ground the simulator so that $\theta_0$ produces similar trajectories in simulation.
3. Optimize $J_{sim}(\theta)$ to find a better $\theta'$.
4. Test $\theta'$ on the physical robot.
5. Set $\theta_0 := \theta'$ and repeat.
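A minimal sketch of the GSL loop enumerated above. The `robot.execute`, `robot.evaluate`, `sim.ground`, and `optimize_in_sim` calls are hypothetical stand-ins for the five steps on this slide, not a specific implementation.

```python
def grounded_simulation_learning(theta0, sim, robot, num_iterations=2):
    """Outer GSL loop: ground the simulator with real data, then re-optimize."""
    theta = theta0
    for _ in range(num_iterations):
        # 1. Execute the current policy on the physical robot, collect transitions.
        real_data = robot.execute(theta)
        # 2. Ground the simulator so theta produces similar trajectories in sim.
        sim.ground(real_data)
        # 3. Optimize the policy in the grounded simulator.
        theta_new = optimize_in_sim(sim, theta)
        # 4. Test the candidate policy on the physical robot (lower cost is better).
        if robot.evaluate(theta_new) < robot.evaluate(theta):
            # 5. Accept the improvement and repeat.
            theta = theta_new
    return theta
```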
Grounded Simulation Learning

[Figure]
Grounding the Simulator

- Assume $P_{sim}$ is parameterized by $\phi$.
- $d$: any measure of similarity between state transition distributions.
- The robot executes $\theta_0$ and records a dataset $\mathcal{D}$ of $(S_t, A_t, S_{t+1})$ transitions.
$$\phi^\star = \operatorname*{argmin}_{\phi} \sum_{(S_t, A_t, S_{t+1}) \in \mathcal{D}} d\big(P(\cdot \mid S_t, A_t),\, P_\phi(\cdot \mid S_t, A_t)\big)$$
How should $\phi$ be defined?
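One way to make the argmin above concrete, as a hedged sketch: if the simulator exposes a parameterized one-step prediction `sim_step(s, a, phi)` and $d$ is taken to be squared error between predicted and observed next states, the grounding objective can be minimized with an off-the-shelf optimizer. Both `sim_step` and this choice of $d$ are illustrative assumptions, not the method pursued in this talk (GAT instead modifies actions).

```python
import numpy as np
from scipy.optimize import minimize

def grounding_loss(phi, dataset, sim_step):
    """Surrogate for sum_t d(P(.|S_t,A_t), P_phi(.|S_t,A_t)) using squared error."""
    loss = 0.0
    for s, a, s_next in dataset:
        s_pred = sim_step(s, a, phi)            # simulator's next state under parameters phi
        loss += np.sum((s_pred - s_next) ** 2)  # distance to the observed real next state
    return loss

def ground_simulator(phi0, dataset, sim_step):
    """Search over simulator parameters phi to best match the real transition data."""
    result = minimize(grounding_loss, phi0, args=(dataset, sim_step), method="Nelder-Mead")
    return result.x  # phi_star
```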
Advantages of GSL

1. No random-access simulation modification required.
2. Leaves the underlying policy optimization unchanged.
3. Efficient simulator modification.
Guided Grounded Simulation Learning

Farchy et al. presented a GSL algorithm and demonstrated a 26.7% improvement in walk speed on a Nao.

Two limitations of the existing approach:
1. The simulator modification relied on the assumption that desired joint positions are achieved instantaneously in simulation.
2. Expert knowledge was used to select which components of $\theta$ could be learned.
Grounded Action Transformations

Goal: eliminate the simulator-dependent assumption of earlier work.
$$\phi^\star = \operatorname*{argmin}_{\phi} \sum_{(S_t, A_t, S_{t+1}) \in \mathcal{D}} d\big(P(\cdot \mid S_t, A_t),\, P_\phi(\cdot \mid S_t, A_t)\big)$$
- Replace the robot's action $a_t$ with an action that produces a more "realistic" transition.
- Learn this action as a function $g_\phi(s_t, a_t)$.
Grounded Action Transformation

[Figure: Modifiable simulator induced by GAT]
Grounded Action Transformation

- $\mathcal{X}$: the set of robot joint configurations.
- Learn two functions:
  - The robot's forward dynamics: $f : \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{X}$
  - The simulator's inverse dynamics: $f^{-1}_{sim} : \mathcal{S} \times \mathcal{X} \rightarrow \mathcal{A}$
- Replace the robot's action $a_t$ with $\hat{a}_t := f^{-1}_{sim}(s_t, f(s_t, a_t))$.
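A minimal sketch of the transformation just described, assuming `f` and `f_sim_inv` are already-learned function approximators (callables mapping arrays to arrays); the names are illustrative.

```python
def grounded_action(s_t, a_t, f, f_sim_inv):
    """Replace a_t with an action that reproduces the real robot's joint response in sim.

    f:          forward model of the real robot, (s, a) -> predicted joint configuration x
    f_sim_inv:  inverse dynamics of the simulator, (s, x) -> action that reaches x in sim
    """
    x_pred = f(s_t, a_t)             # joint configuration the real robot would reach
    a_hat = f_sim_inv(s_t, x_pred)   # simulator action that produces that configuration
    return a_hat
```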
GAT Implementation

- $f$ and $f^{-1}_{sim}$ are learned with supervised learning.
- Record sequences $S_t, A_t, \ldots$ on the robot and in simulation.
- Supervised learning of $g$:
  - $f : (S_t, A_t) \rightarrow X_{t+1}$ (from real-robot data)
  - $f^{-1}_{sim} : (S_t, X_{t+1}) \rightarrow A_t$ (from simulated data)
- Smooth the modified actions:
  $$g(s_t, a_t) := \alpha f^{-1}_{sim}(s_t, f(s_t, a_t)) + (1 - \alpha)\, a_t$$
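A hedged sketch of the supervised-learning step and the smoothed transformation, assuming the logged transitions are available as NumPy arrays of states, actions, and next joint configurations. The use of scikit-learn regressors here is an illustrative choice, not necessarily what the authors used.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_models(real_S, real_A, real_X_next, sim_S, sim_X_next, sim_A):
    # Forward model of the real robot: (s_t, a_t) -> x_{t+1}, fit on real-robot data.
    f = MLPRegressor(hidden_layer_sizes=(200,), max_iter=1000)
    f.fit(np.hstack([real_S, real_A]), real_X_next)

    # Inverse dynamics of the simulator: (s_t, x_{t+1}) -> a_t, fit on simulated data.
    f_sim_inv = MLPRegressor(hidden_layer_sizes=(200,), max_iter=1000)
    f_sim_inv.fit(np.hstack([sim_S, sim_X_next]), sim_A)
    return f, f_sim_inv

def g(s_t, a_t, f, f_sim_inv, alpha=0.5):
    """Smoothed grounded action: alpha * transformed action + (1 - alpha) * original."""
    sa = np.hstack([s_t, a_t]).reshape(1, -1)
    x_pred = np.atleast_2d(f.predict(sa))
    a_hat = np.atleast_2d(f_sim_inv.predict(np.hstack([s_t.reshape(1, -1), x_pred])))
    return alpha * a_hat.ravel() + (1 - alpha) * a_t
```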
Supervised Implementation

- The forward model was trained with 15 real-world trajectories of 2000 time-steps each.
- The inverse model was trained with 50 simulated trajectories of 1000 time-steps each.
Empirical Results

- Applied GAT to learning fast bipedal walks for the Nao robot.
- Task: walk forward towards a target.
- $\theta_0$: University of New South Wales walk engine.
- Simulators: SimSpark RoboCup 3D simulator and OSRF Gazebo simulator.
- Policy optimization with the CMA-ES stochastic search method.
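A minimal sketch of the policy-optimization step with CMA-ES, using the open-source `cma` package. The `rollout_cost` helper (cost of running parameters theta in the grounded simulator), the initial step size, and the iteration budget are assumptions for illustration.

```python
import cma  # pip install cma

def optimize_in_sim(theta0, rollout_cost, sigma0=0.1, max_iters=50):
    """Minimize simulated walk cost over policy parameters with CMA-ES."""
    es = cma.CMAEvolutionStrategy(theta0, sigma0)
    for _ in range(max_iters):
        candidates = es.ask()                              # sample a population of parameter vectors
        costs = [rollout_cost(theta) for theta in candidates]
        es.tell(candidates, costs)                         # update the search distribution
        if es.stop():
            break
    return es.result.xbest                                 # best parameters found
```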
Empirical Results

[Figure: (a) SoftBank Nao, (b) Gazebo Nao, (c) SimSpark Nao]
Empirical Results

Simulation to Nao:

Method                     | Velocity (cm/s) | % Improvement
Initial policy             | 19.52           | 0.0
SimSpark, first iteration  | 26.27           | 34.58
SimSpark, second iteration | 27.97           | 43.27
Gazebo, first iteration    | 26.89           | 37.76

SimSpark to Gazebo:

Method         | % Improvement | Failures | Best Gen.
No Ground      | 11.094        | 7        | 1.33
Noise-Envelope | 18.93         | 5        | 6.6
GAT            | 22.48         | 1        | 2.67
Conclusion

Contributions:
1. Introduced the Grounded Action Transformation algorithm for simulation transfer.
2. Improved the walk speed of a Nao robot by over 40% compared to a state-of-the-art walk engine.

Future Work:
- Extending to other robotics tasks and platforms.
- When does grounding actions work, and when does it not?
- Reformulating the learning of $g$: $f$ and $f^{-1}_{sim}$ minimize one-step error, but we actually care about error over sequences of states and actions.
Thanks for your attention! Questions?
References

Alon Farchy, Samuel Barrett, Patrick MacAlpine, and Peter Stone. Humanoid robots learning to walk faster: From the real world to simulation and back. In Twelfth International Conference on Autonomous Agents and Multiagent Systems, 2013.