Progressive Nets for Simulation to Robot Transfer Raia Hadsell

Skepticism
Let’s acknowledge a few difficulties with deep learning and robotics:
1. Robot-domain data does not present itself in this form:
Complex Environments - RAIA HADSELL

Deep RL to the rescue?
● Continuous Deep Q-Learning with Model-based Acceleration. Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine. ICML 2016.
● Asynchronous Methods for Deep Reinforcement Learning. Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu.
● Control of Memory, Active Perception, and Action in Minecraft. Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee.
However, deep RL is very data inefficient.

Skepticism
Let’s acknowledge a few difficulties with deep learning and robotics:
2. Robot-domain data does not present itself in this quantity:

Simulation to the rescue?
https://www.youtube.com/watch?v=3WXd4vC3lbQ

Simulation to the rescue?
Deep learning and deep RL like simulators:
● Training
● Algorithms
● Hyperparameters
● Speed
However… There is a Reality Gap! We aren’t interested in simulation unless learning can transfer to the target domain, and transfer is hard, especially for deep learning.

Transfer + continual learning
● Continual + transfer learning can bridge the reality gap and ameliorate data inefficiency
● Unfortunately, neural networks are not well-suited to continual learning
  ■ Catastrophic forgetting from fine-tuning
  ■ Policy interference from multi-task learning

Progressive Neural Networks
In collaboration with: Andrei Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu
arxiv.org/abs/1606.04671

Progressive Neural Networks
[Figure: network diagram built up over three slides, growing from one column to three (columns 1, 2, 3), with connections labeled "a" appearing once the second column is added.]

Progressive Neural Networks
Advantages
1. No catastrophic forgetting of previous tasks, by design.
2. Deep, compositional feature transfer from all previous tasks and layers
3. Added capacity for learning task-specific features
4. Provides a framework for analysis of transferred features
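The "no forgetting by design" and "transfer from all previous layers" points can be illustrated with a minimal NumPy sketch. This is not the implementation from the talk: the `Column` class, the layer sizes, the plain linear lateral weights `U` (with no adapter nonlinearity), and the assumption that all columns read the same input are illustrative choices made here.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class Column:
    """One column of a progressive network (hypothetical helper for illustration)."""

    def __init__(self, sizes, prev_columns, rng):
        # sizes = [input_dim, hidden_1, ..., output_dim]
        self.sizes = sizes
        self.prev = list(prev_columns)  # earlier columns, treated as frozen
        self.W = [rng.normal(0.0, 0.1, (sizes[i + 1], sizes[i]))
                  for i in range(len(sizes) - 1)]
        # U[i][j]: lateral weights from layer-i activations of earlier column j
        # into layer i+1 of this column. The first layer has no laterals.
        self.U = [
            [rng.normal(0.0, 0.1, (sizes[i + 1], p.sizes[i])) for p in self.prev]
            if i > 0 else []
            for i in range(len(sizes) - 1)
        ]

    def forward(self, x):
        """Return activations [h_0 = x, h_1, ..., h_L] for this column."""
        prev_acts = [p.forward(x) for p in self.prev]
        h = [x]
        for i, W in enumerate(self.W):
            z = W @ h[i]
            for U, acts in zip(self.U[i], prev_acts):
                z = z + U @ acts[i]  # add features from the earlier column
            is_last = i == len(self.W) - 1
            h.append(z if is_last else relu(z))
        return h
```

Training a new column would update only that column's `W` and `U`; the earlier columns' parameters are never touched, so their outputs on old tasks stay exactly as they were.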

Progressive Neural Networks
Disadvantages
1. Requires knowledge of task boundaries
2. Quadratic parameter growth! However, sensitivity analysis shows that successive columns use much less capacity.
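The quadratic growth can be checked with a small count. The sketch below assumes every column has the same layer sizes, receives one lateral weight matrix per earlier column at every layer above the first, and has no biases; the sizes used are hypothetical and not taken from the talk.

```python
def column_params(sizes, n_prev):
    """Weights in one column: its own layers plus laterals from n_prev earlier columns."""
    own = sum(sizes[i] * sizes[i + 1] for i in range(len(sizes) - 1))
    # One lateral matrix per earlier column for every layer above the first.
    lateral_per_prev = sum(sizes[i] * sizes[i + 1] for i in range(1, len(sizes) - 1))
    return own + n_prev * lateral_per_prev

def total_params(sizes, n_columns):
    """Total weights in a progressive network with n_columns identical columns."""
    return sum(column_params(sizes, k) for k in range(n_columns))

sizes = [4, 8, 3]  # hypothetical layer sizes
totals = [total_params(sizes, k) for k in range(1, 5)]
```

Each new column costs more than the one before it by a fixed amount (the lateral weights for one extra predecessor), so the total grows with the square of the number of columns.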

Experimental setup
All training is with Asynchronous Advantage Actor-Critic (A3C) [Mnih et al., 2016]
Progressive Net: column 1 trained on A, column 2 on task B
Baseline 1: column trained on task B
Baseline 2: column trained on A, top layer fine-tuned on B
Baseline 3: column trained on A, all layers fine-tuned on B
Baseline 4: column 1 random, column 2 trained on task B

Pong Soup
Pong → white Pong
Pong → horiz-flip Pong

Analysis, 2 methods
1. Average Perturbation Sensitivity
Inject Gaussian noise and measure drop in performance
[Figure: Pong to Noisy Pong; noise injected at column 1 (blue) or column 2 (green)]
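A minimal sketch of the perturbation idea, under stated assumptions: the `evaluate(noise)` interface, the site keys `"col1"` and `"col2"`, and the toy model are all hypothetical, and the score here is a negative squared error standing in for the episode return that an RL agent would report.

```python
import numpy as np

def perturbation_sensitivity(evaluate, site_keys, noise_std, n_trials, rng):
    """Average drop in score when Gaussian noise is injected at each site.

    evaluate(noise) -> float, where noise maps a site key to a function
    that perturbs the activations at that site (hypothetical interface).
    """
    baseline = evaluate({})
    drops = {}
    for key in site_keys:
        def inject(h):
            return h + rng.normal(0.0, noise_std, size=h.shape)
        scores = [evaluate({key: inject}) for _ in range(n_trials)]
        drops[key] = baseline - float(np.mean(scores))
    return drops

def make_toy_evaluate(rng):
    """Toy two-column model whose output reads only column 1 features."""
    x = rng.normal(size=(256, 5))
    w1 = rng.normal(size=(5, 4))  # stand-in for column 1
    w2 = rng.normal(size=(5, 4))  # stand-in for column 2
    target = np.tanh(x @ w1).sum(axis=1)

    def evaluate(noise):
        h1 = noise.get("col1", lambda h: h)(np.tanh(x @ w1))
        h2 = noise.get("col2", lambda h: h)(np.tanh(x @ w2))
        y = 1.0 * h1.sum(axis=1) + 0.0 * h2.sum(axis=1)
        return -float(np.mean((y - target) ** 2))

    return evaluate
```

In this toy, noise at `"col1"` lowers the score while noise at `"col2"` changes nothing, which is the kind of contrast the analysis looks for when asking which column a task actually relies on.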

Analysis
2. Average Fisher Sensitivity
● Compute the modified diagonal Fisher matrix of the network policy with respect to the normalized activations of each layer
● AFS is computed for layer i, column k, and feature m.
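The slide's equations did not survive extraction, so the following is only a sketch of one way to compute such a score. It assumes the diagonal Fisher entry for a feature is the mean squared gradient of the log-policy with respect to that feature's normalized activation, and that the score for column k is that entry divided by the sum over all columns. The normalization across columns is an assumption based on a reading of the Progressive Neural Networks paper (arxiv.org/abs/1606.04671), not something stated on the slide; the function name and input layout are hypothetical.

```python
import numpy as np

def average_fisher_sensitivity(grads_by_column):
    """Sketch of AFS for a single layer i.

    grads_by_column[k] has shape (n_samples, n_features) and holds gradients of
    log pi with respect to the normalized activations of layer i in column k.
    Assumes every column has the same number of features at this layer.
    Returns an array of shape (n_columns, n_features) indexed by [k, m].
    """
    # Diagonal of the Fisher matrix: mean squared gradient per feature.
    diag_fisher = np.stack([np.mean(g ** 2, axis=0) for g in grads_by_column])
    # Assumed normalization: share of each column in the total over columns.
    total = diag_fisher.sum(axis=0, keepdims=True)
    return diag_fisher / np.maximum(total, 1e-12)
```

Under this normalization the scores for a given layer and feature sum to one across columns, so a value near one for column 1 would indicate heavy reliance on the transferred features.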

Pong Soup - Analysis
[Figure: two panels labeled "pong / h-flip" and "pong / zoom"; layers shown: conv 1, conv 2, fc]

Pong Soup - Analysis
[Figure: two panels labeled "pong / noisy" and "noisy / pong"; layers shown: conv 1, conv 2, fc]

Progressive nets from simulation to robot
[Diagram: column 1 (label: 128)]
Column 1: Reacher task with random start, fixed target, trained with MuJoCo model of Jaco arm.
Input: RGB only
Output: joint velocities (6 DOF)
Network: ConvNet + LSTM + softmax output
Learning: Asynchronous advantage actor-critic (A3C); 16 threads

Progressive nets from simulation to robot
[Diagram: columns 1 and 2 (labels: 128, 16)]
Reacher task: random start, fixed target
Input: RGB images
Output: joint velocities (6 DOF)

Progressive nets from simulation to robot
[Diagram: columns 1 and 2 (labels: 128, 16)]
Column 2: Reacher task with random start, random target, trained with real Jaco arm.
Input: proprioception + target XYZ
Output: joint velocities (6 DOF)
Network: MLP + LSTM + softmax output
Learning: Asynchronous advantage actor-critic (A3C); 1 thread

Progressive nets from simulation to robot
[Diagram: column 1 (label: 128)]
https://www.youtube.com/watch?v=tXISbTOesMY

Progressive nets from simulation to robot
[Diagram: columns 1 and 2 (labels: 128, 16)]
https://www.youtube.com/watch?v=YZz5Io_ipi8

Progressive nets from simulation to robot
Column 3: ‘Catch’, trained with real Jaco arm.
[Diagram: columns 1 to 3 (labels: 128, 16, 16)]
https://www.youtube.com/watch?v=qzMTPzbPV0c

Progressive nets from simulation to robot
Column 4: ‘Catch the bee’, trained with real Jaco arm.
[Diagram: columns 1 to 4 (labels: 128, 16, 16, 16)]
https://www.youtube.com/watch?v=JkXhlIWsUA0

What’s next?
● Scaling up Progressive Networks
  ○ Compression / Brain Damage / Complementary Learning
  ○ Limiting model growth with sharing of lateral connections
● Automating the progression
  ○ Eliminating the need for manual switch points while keeping model growth in check
● Meta-controller making use of old policies in new situations
  ○ Fast adaptation to new tasks using the fact that old policies are NOT forgotten.
Thank you
