Evaluating the Performance of Reinforcement Learning Algorithms Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas
Why do we care? Performance evaluations: 1. Justify novel algorithms or enhancements 2. Tell us what algorithms to use If done correctly: • Can identify solved problems • Place emphasis on areas that need more research
RL Algorithms for the Real World Want: 1. High levels of performance 2. No expert knowledge required As a result: 1. Less time tuning algorithms 2. More time solving harder problems
Algorithm Performance Evaluations Typical evaluation procedure: 1. Tune each algorithm's hyperparameters (e.g., policy structure, learning rate) 2. Run several trials using the tuned hyperparameters 3. Report performance (metrics, learning curves, etc.) This does not fit our needs: it ignores the difficulty of applying algorithms. We need a new evaluation procedure!
Evaluation Pipeline [Pipeline diagram. Annotations: account for the difficulty of applying an algorithm; balance the importance of each environment]
A General Evaluation Question Which algorithm(s) perform well across a wide variety of environments with little or no environment-specific tuning? Existing evaluation procedures cannot answer this question. We develop techniques for: 1. Sampling performance metrics that reflect knowledge of how to use the algorithm 2. Normalizing scores to account for the intrinsic difficulty of each environment 3. Balancing the importance of each environment in the aggregate measure 4. Computing uncertainty over the whole process
Sampling Performance Without Tuning • Formalize the knowledge needed to use an algorithm as part of the algorithm itself: a complete algorithm definition
Sampling Performance Without Tuning An algorithm is complete on an environment when it is defined such that the only required input to the algorithm is the environment: $Y \sim \text{alg}(N)$, where $Y$ is a performance sample, alg is the complete algorithm, and $N$ is the environment. No hyperparameters!
Making Complete Algorithm Definitions • Open research question! • Methods: manual specification, random sampling, smart heuristics, adaptive methods (see the sketch below)
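As a rough illustration of the random-sampling approach, a complete algorithm can be built as a wrapper that draws its hyperparameters internally, so its only required input is the environment. The names `make_complete`, `sample_sarsa_params`, and the specific hyperparameter ranges below are my own hypothetical choices, not from the talk:

```python
import numpy as np

def make_complete(run_algorithm, sample_hyperparams):
    """Turn a tunable algorithm into a 'complete' one: hyperparameters are
    drawn internally, so the only required input is the environment."""
    def complete_alg(env, seed=None):
        rng = np.random.default_rng(seed)
        params = sample_hyperparams(rng)     # encodes knowledge of how to use the algorithm
        return run_algorithm(env, **params)  # one performance sample Y ~ alg(env)
    return complete_alg

# Hypothetical sampler: log-uniform step size, uniform eligibility-trace decay.
def sample_sarsa_params(rng):
    return {"step_size": 10 ** rng.uniform(-4, 0), "lam": rng.uniform(0.0, 1.0)}
```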
Performance of Complete Algorithms [Figure: performance distributions of complete algorithms; annotations: well tuned, diverging runs, better algorithm] Can measure improvements in usability!
Comparisons Over Multiple Environments Problem: no common measure of performance across environments. Desired normalization properties: same scale and center; capture intrinsic difficulty. Solution: use the cumulative distribution function (CDF), as sketched below.
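A minimal sketch of the CDF idea (my own illustration, not code from the talk): the empirical CDF built from performance samples maps every raw score into [0, 1], giving every environment the same scale and center while reflecting how hard high scores are to achieve.

```python
import numpy as np

def ecdf(samples):
    """Empirical CDF of performance samples on one environment."""
    xs = np.sort(np.asarray(samples, dtype=float))
    def F(y):
        # Fraction of samples <= y: scores on every environment land in [0, 1].
        return np.searchsorted(xs, y, side="right") / len(xs)
    return F
```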
Normalizing Scores
Normalizing Scores Which algorithm's CDF should we normalize against? The choice matters: depending on the reference algorithm, an environment can appear to have a large or a small change in difficulty. Solution: use a weighted combination of all algorithms' CDFs (sketched below).
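A sketch of the weighted combination, assuming per-algorithm CDFs $F_k$ and weights $r$ (illustrative, not the paper's code):

```python
def weighted_cdf(cdfs, r):
    """G_r(y) = sum_k r_k F_k(y): normalize against a weighted mixture of all
    algorithms' performance CDFs instead of any single algorithm's."""
    def G(y):
        return sum(rk * Fk(y) for rk, Fk in zip(r, cdfs))
    return G
```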
Aggregating Performance Measures • Need to weight the normalization functions • Need to weight the environments • Avoid unintentional bias in the weightings. Aggregate performance of algorithm $x$: $a_x = \sum_{n \in \mathcal{M}} q_n \, \mathbb{E}[G_r(Y_{x,n})]$, where $q_n$ are environment weights, $r$ are normalization weights, and $G_r$ is the normalization function. How to set the weights without bias? Use game theory!
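Given normalization functions and weights, the aggregate is an environment-weighted mean of normalized scores, with the expectation estimated by a sample mean. A minimal sketch under those assumptions (variable names are mine):

```python
import numpy as np

def aggregate_performance(samples_per_env, G_per_env, q):
    """a_x = sum_n q_n * E[G_n(Y_{x,n})], with each expectation estimated
    from the performance samples of algorithm x on environment n."""
    return sum(q_n * np.mean([G(y) for y in ys])
               for q_n, G, ys in zip(q, G_per_env, samples_per_env))
```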
Normalization Two-Player Game • Player $p$ chooses which algorithm to execute on which environment (the executing algorithm and executing environment) • Player $q$ chooses the normalizing algorithm and environment distribution, i.e., the weights $r$ in $\mathbb{E}[G_r(Y_N)]$ • Solve the resulting max min game and use $r$ from the equilibrium solution to evaluate each algorithm. Environments: Gridworld, Chain, Cart-Pole, Mountain Car, Acrobot, Bicycle
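The talk does not spell out the solver; one standard way to find an equilibrium of a finite two-player zero-sum game is linear programming. A minimal sketch of that standard formulation (the paper's payoff-matrix construction is not reproduced here):

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Equilibrium strategy for the row player of a zero-sum game with
    payoff matrix A (row player maximizes): max_p min_j (A^T p)_j."""
    m, n = A.shape
    # Variables: strategy p (m entries) and game value v; minimize -v.
    c = np.zeros(m + 1); c[-1] = -1.0
    # Constraints: v - (A^T p)_j <= 0 for every column j.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # p is a probability distribution: sum(p) = 1, p >= 0.
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]  # equilibrium strategy, game value
```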
Quantifying Uncertainty [Pipeline diagram annotated with the sources of uncertainty at each stage; confidence intervals are computed over the whole process]
Quantifying Uncertainty [Results figure comparing interval estimators: valid for any distribution vs. assumptions of normality, no guarantee; algorithm annotations: adapt step sizes, lots of hyperparameters]
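The paper gives the exact interval construction; as an illustration of what "valid for any distribution" can mean, here is a sketch of a DKW-inequality bound on the mean, which holds for any distribution supported on a known interval [a, b] (the bounded-support assumption and function name are mine):

```python
import numpy as np

def dkw_mean_ci(samples, a, b, delta=0.05):
    """Nonparametric (1 - delta) CI on the mean of a random variable bounded
    in [a, b], via the DKW inequality; requires a <= min(samples) and
    b >= max(samples)."""
    z = np.sort(np.asarray(samples, dtype=float))
    n = len(z)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    # E[X] = b - integral of the CDF over [a, b]; bound the true CDF by the
    # empirical CDF shifted up/down by eps and clipped to [0, 1].
    grid = np.concatenate(([a], z, [b]))
    widths = np.diff(grid)              # lengths of the ECDF's step intervals
    Fhat = np.arange(n + 1) / n         # ECDF value on each interval
    lower = b - np.sum(np.minimum(Fhat + eps, 1.0) * widths)
    upper = b - np.sum(np.maximum(Fhat - eps, 0.0) * widths)
    return lower, upper
```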
Takeaways • No need to tune hyperparameters • Can measure improvement in usability • Reliable estimates of uncertainty
Acknowledgements Yash Chandak Daniel Cohen Mengxue Zhang Prof. Philip S. Thomas
Questions? Scott Jordan | sjordan@cs.umass.edu | http://cics.umass.edu/sjordan | @UMassScott