Object-Level Reinforcement Learning
William Agnew, Pedro Domingos
Outline > Motivation – Deep RL Algorithms are Incredibly Sample Inefficient > Our Approach – Unsupervised Learning of Object-Level Representations – Object Dynamics Modelling – Estimating Reward and Value Function with a Linear Regressor – Planning with UCT > Results – Atari Games – Interpretability – Transfer Learning
Motivation > Deep RL algorithms achieve great performance > But at the cost of many samples: – DQN[1] trained on 50 million frames, or 38 days of gameplay > Samples can be very expensive, e.g. for robots > Humans learn to play these games in mere minutes > Can we do better?
Motivation > How do humans learn and act in physical environments? – View the world in terms of objects – Model object dynamics – Use dynamics to plan > Current state of the art: a neural network outputs actions from the current state – Reinventing objects, modelling, and planning all inside a neural network > Can we make an RL agent that does this explicitly?
Object-Level Reinforcement Learning > We use a small number of powerful yet general principles to develop an object-level agent: – The world may be represented and modelled in terms of objects – All else equal, simpler representations are more accurate (Ockham’s razor)
Object-Level Reinforcement Learning > Our agent uses these principles to: – Learn an object-level world representation from pixels with no supervision – Learn predictive object dynamics models – Learn state-action values and rewards with a simple and interpretable approximator – Plan the best actions using the predictive models and value/reward approximators
Unsupervised Learning of Objects > How do we learn a mapping from pixels to objects with no supervision? > Segmentation algorithms can over- or undersegment
Unsupervised Learning of Objects > We can oversegment without losing information > But the state representation may be very large
Unsupervised Learning of Objects > How can we tell if two segments are part of the same object? > Two segments are part of the same object if they behave in the same way
Unsupervised Learning of Objects > Use a simple algorithm to produce an oversegmentation > Simple and fallible object tracking > Train models on segment dynamics > Combine segments whose dynamics agree (sketched below) > Represent state as object absolute and relative positions, velocities, accelerations, and estimated contacts
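The segment-merging step can be sketched in a few lines. This is an illustrative fragment, not the paper's implementation: the per-segment dynamics models and the cross-prediction merge test are assumptions made for the example.

```python
# A minimal sketch of the segment-merging idea (illustrative, not the paper's
# exact algorithm): fit a small dynamics model per tracked segment and merge
# two segments when one's model predicts the other's motion about as well as
# that segment's own model does.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_segment_dynamics(positions):
    """Fit a linear model predicting a segment's next velocity from its current one.

    positions: array of shape (T, 2), the segment's tracked (x, y) positions."""
    vel = np.diff(positions, axis=0)   # per-step velocities, shape (T-1, 2)
    X, y = vel[:-1], vel[1:]           # predict the next step from the current one
    return LinearRegression().fit(X, y)

def segments_behave_alike(traj_a, traj_b, tol=1e-2):
    """Hypothetical merge test: segment a's model explains segment b's motion
    about as well as b's own model does (one direction only, for brevity)."""
    model_a = fit_segment_dynamics(traj_a)
    model_b = fit_segment_dynamics(traj_b)
    vel_b = np.diff(traj_b, axis=0)
    err_cross = np.mean((model_a.predict(vel_b[:-1]) - vel_b[1:]) ** 2)
    err_self = np.mean((model_b.predict(vel_b[:-1]) - vel_b[1:]) ** 2)
    return err_cross <= err_self + tol
```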
Modelling Object Dynamics > Model object acceleration at each timestep > One model per object, per dimension > Discretize accelerations, use multiclass classifiers to output a probability distribution over possible accelerations > Also learn when and where objects will appear/disappear
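A minimal sketch of the per-object, per-dimension acceleration classifiers described above, assuming a fixed set of discretized acceleration bins and a multinomial logistic-regression classifier; the bin values, classifier choice, and feature layout are illustrative, not the paper's.

```python
# One multiclass classifier per (object, dimension) pair, predicting a
# probability distribution over discretized accelerations.
import numpy as np
from sklearn.linear_model import LogisticRegression

ACCEL_BINS = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # accelerations in pixels/step^2 (assumed)

def discretize(accels):
    """Map continuous accelerations to the index of the nearest bin."""
    return np.abs(accels[:, None] - ACCEL_BINS[None, :]).argmin(axis=1)

class ObjectDimensionDynamics:
    """Dynamics model for one object along one dimension."""

    def __init__(self):
        self.clf = LogisticRegression(max_iter=1000)

    def fit(self, features, accels):
        # features: object-level state (positions, velocities, contacts) per step
        # accels: this object's acceleration along one dimension per step
        self.clf.fit(features, discretize(accels))
        return self

    def predict_distribution(self, features):
        # probability distribution over the acceleration bins seen in training,
        # suitable for sampling transitions during planning
        return self.clf.predict_proba(features)
```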
Estimating State-Action Value and Reward > Monte Carlo estimation of state-action values from experiences > Fit a linear regressor from the object-level representation to state-action value or reward (sketched below) > Value functions for most tasks in the physical world can be easily represented as a linear combination of collisions and relative positions – Go to an object – Touch an object – Avoid an object
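A minimal sketch of the Monte Carlo value targets and the linear regressor over object-level features; the discount factor and feature layout are assumptions made for the example.

```python
# Monte Carlo returns per episode, then a linear fit from object-level
# features to those returns.
import numpy as np
from sklearn.linear_model import LinearRegression

def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return observed from each step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def fit_value_regressor(episode_features, episode_rewards, gamma=0.99):
    """Fit a linear value estimator from object-level features to returns.

    episode_features: list of arrays, one (T_i, D) feature matrix per episode
    episode_rewards:  list of arrays, one length-T_i reward sequence per episode"""
    X = np.vstack(episode_features)
    y = np.concatenate([monte_carlo_returns(r, gamma) for r in episode_rewards])
    return LinearRegression().fit(X, y)
```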
Estimating State-Action Value and Reward > A linear regression on object-level features is readily interpretable: we can easily understand the learned policy > The object-level representation is also easy to reason about > The reinforcement learner won't exhibit bad edge-case behavior on known states (known objects) and recognizes when it has encountered new states (new objects)
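Because the estimator is linear, the learned policy can be read directly off its weights. The snippet below is a hypothetical illustration using a fitted regressor such as the one in the previous sketch; the feature names are made up.

```python
# Each weight of the linear value estimator is tied to a named object-level
# feature, so the learned preferences can be listed directly.
FEATURE_NAMES = [                      # hypothetical feature names
    "ball_paddle_rel_dist_vertical",
    "ball_paddle_rel_dist_horizontal",
    "ball_paddle_contact",
]

def describe_value_weights(value_regressor, feature_names=FEATURE_NAMES):
    """Print each object-level feature alongside its learned weight."""
    for name, weight in zip(feature_names, value_regressor.coef_):
        print(f"{name:40s} {weight:+.3f}")
```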
Planning > Given a model of the world and an estimator for state-action value and reward, how do we choose the best next action to take? > Use UCT to find the best future action, approximating playouts with the value estimator > A similar approach to that used in AlphaGo[2]
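A compact UCT sketch in which playouts are truncated with the learned value estimator, as described above. The `model.step(state, action) -> (next_state, reward)` interface, hashable states, and the discount factor are assumptions for the example, not the paper's API.

```python
# UCT search over the learned dynamics model, with leaf evaluations supplied
# by the learned value estimator instead of full random playouts.
import math
from collections import defaultdict

def uct_plan(root_state, model, value_fn, actions, n_sims=200, depth=10,
             c=1.4, gamma=0.99):
    N = defaultdict(int)    # visit counts per (state, action)
    Q = defaultdict(float)  # running mean value per (state, action)

    def simulate(state, d):
        if d == 0:
            return value_fn(state)  # truncate the playout with the value estimate
        total = sum(N[(state, a)] for a in actions) + 1

        def ucb(a):  # UCB1 score; untried actions are explored first
            n = N[(state, a)]
            return float("inf") if n == 0 else Q[(state, a)] + c * math.sqrt(math.log(total) / n)

        a = max(actions, key=ucb)
        next_state, reward = model.step(state, a)  # sample the learned dynamics model
        ret = reward + gamma * simulate(next_state, d - 1)
        N[(state, a)] += 1
        Q[(state, a)] += (ret - Q[(state, a)]) / N[(state, a)]
        return ret

    for _ in range(n_sims):
        simulate(root_state, depth)
    return max(actions, key=lambda a: Q[(root_state, a)])
```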
Results: Learning Object-Level Representations
Results: Learning Curves (evaluation methodology and DQN data from [3])
Results: Learning Curves
Results: Interpretability
Linear state value estimator weights on object-pair relative distance

Object Pair           Vertical Dimension    Horizontal Dimension
Ball-Paddle                 0.020                -0.057
Ball-Background            -0.040                 0.003
Paddle-Background           0.101                 0.029
Conclusion > We developed a model-based reinforcement learner that is over two orders of magnitude more sample efficient than DQN while achieving comparable performance > Our agent achieves vastly better results than DQN on transfer learning tasks > Our agent is also interpretable > Modelling and planning are well-studied fields: we believe this paradigm will extend well to more complex domains
References
1. Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
2. Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
3. Machado, Marlos C., et al. "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents." arXiv preprint arXiv:1709.06009 (2017).