Evaluating the Performance of Reinforcement Learning Algorithms (PowerPoint PPT Presentation)


  1. Evaluating the Performance of Reinforcement Learning Algorithms
     Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas

  2. Why do we care?
     Performance evaluations:
     1. Justify novel algorithms or enhancements
     2. Tell us which algorithms to use
     If done correctly, they can:
     • Identify solved problems
     • Place emphasis on areas that need more research

  3. RL Algorithms for the Real World
     Want:
     1. High levels of performance
     2. No expert knowledge required
     As a result:
     1. Less time tuning algorithms
     2. More time solving harder problems

  4. Algorithm Performance Evaluations
     Typical evaluation procedure:
     1. Tune each algorithm's hyperparameters (e.g., policy structure, learning rate)
     2. Run several trials using the tuned parameters
     3. Report performance (metrics, learning curve, etc.)
     This does not fit our needs: it ignores the difficulty of applying algorithms. We need a new evaluation procedure! A sketch of the typical procedure follows.
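For concreteness, here is a minimal sketch of that typical procedure; `run_trial` and `hyperparam_grid` are hypothetical stand-ins for training one agent and for the candidate configurations, not anything from the talk:

```python
import numpy as np

# Hypothetical sketch of the typical procedure; run_trial(env, h) is an
# assumed stand-in that trains one agent with hyperparameters h and
# returns a scalar score, and hyperparam_grid is a list of candidates.
def typical_evaluation(env, run_trial, hyperparam_grid, n_trials=30):
    # Step 1: tune -- keep the configuration with the best tuning run.
    best = max(hyperparam_grid, key=lambda h: run_trial(env, h))
    # Step 2: run several fresh trials using only the tuned parameters.
    scores = np.array([run_trial(env, best) for _ in range(n_trials)])
    # Step 3: report performance; the cost of step 1 is never reported.
    return scores.mean(), scores.std()
```

Note that the tuning runs in step 1 are discarded from the report, which is exactly the difficulty of applying the algorithm that this procedure ignores.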

  5. Evaluation Pipeline
     [Pipeline diagram] The pipeline must account for the difficulty of applying an algorithm and balance the importance of each environment.

  6. A General Evaluation Question
     Which algorithm(s) perform well across a wide variety of environments with little or no environment-specific tuning?
     Existing evaluation procedures cannot answer this question. We develop techniques for:
     1. Sampling performance metrics that reflect knowledge of how to use the algorithm
     2. Normalizing scores to account for the intrinsic difficulties of each environment
     3. Balancing the importance of each environment in the aggregate measure
     4. Computing uncertainty over the whole process

  7. Sampling Performance Without Tuning
     • Formalize the knowledge needed to use an algorithm: the complete algorithm definition

  8. Sampling Performance Without Tuning
     An algorithm is complete on an environment when it is defined such that the only required input to the algorithm is the environment. No hyperparameters!
     Y ∼ alg(N), where Y is a performance sample, alg is the complete algorithm, and N is the environment.
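A minimal sketch of what a complete algorithm definition might look like, assuming a hypothetical `train_and_score(env, hyperparams)` helper that runs one training session and returns a scalar performance:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_and_score(env, hyperparams):
    # Hypothetical stand-in for one full training run returning a
    # scalar performance; replace with a real agent/environment loop.
    return rng.normal()

def complete_alg(env):
    """A 'complete' algorithm: the environment is its only input."""
    # The definition itself fixes how hyperparameters are chosen, here
    # by random sampling from fixed ranges (one option from slide 9).
    hyperparams = {
        "step_size": 10 ** rng.uniform(-5, -1),  # log-uniform step size
        "discount": rng.uniform(0.9, 1.0),
    }
    return train_and_score(env, hyperparams)

# Repeated calls give i.i.d. performance samples Y ~ alg(N):
# samples = [complete_alg(env) for _ in range(100)]
```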

  9. Making Complete Algorithm Definitions
     • Open research question! Methods range from manual specification to random sampling, smart heuristics, and adaptive methods.

  10. Performance of Complete Algorithms
     [Plot of performance distributions; labels: well tuned, better algorithm, diverging runs]
     Can measure improvements in usability!

  11. Comparisons Over Multiple Environments
     Problem: no common measure of performance across environments.
     Desired normalization properties:
     • Same scale and center
     • Capture intrinsic difficulty
     Solution: use the cumulative distribution function (CDF) of performance, as sketched below.
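A hedged sketch of CDF-based normalization: the normalized score of an algorithm is the average value of a reference empirical CDF evaluated at that algorithm's performance samples. The symbols and helper names here are illustrative, not the talk's exact notation:

```python
import numpy as np

def empirical_cdf(reference_scores):
    """Return F where F(x) = fraction of reference scores <= x."""
    ref = np.sort(np.asarray(reference_scores))
    return lambda x: np.searchsorted(ref, x, side="right") / len(ref)

def normalized_score(scores_i, reference_scores):
    # Normalized score: the mean of the reference CDF evaluated at
    # algorithm i's performance samples, i.e., an estimate of E[F(Y_i)].
    F = empirical_cdf(reference_scores)
    return float(np.mean(F(np.asarray(scores_i))))

# Example: an algorithm that stochastically dominates the reference
# gets a normalized score above 0.5, regardless of the reward scale.
ref = np.random.default_rng(1).normal(0.0, 1.0, size=1000)
new = np.random.default_rng(2).normal(0.5, 1.0, size=1000)
print(normalized_score(new, ref))  # roughly 0.64
```

Because the CDF maps every environment's raw returns onto [0, 1], scores share a common scale and center, and a hard environment (where the reference rarely does well) is not unfairly penalized.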

  12–14. Normalizing Scores [figure-only slides]

  15. Normalizing Scores
     Which algorithm should we normalize against? Normalizing against a single algorithm can imply a large or a small change in apparent difficulty. Solution: use a weighted combination of all algorithms' CDFs.

  16. Aggregating Performance Measures
     • Need to weight the normalization functions
     • Need to weight the environments
     • Avoid unintentional bias in the weightings: use game theory!
     The aggregate performance of algorithm i is

         y_i = Σ_j d_j Σ_k q_{j,k} E[ F_{j,k}(X_{i,j}) ]

     where d_j are the environment weights, q_{j,k} are the normalization weights, F_{j,k} is the normalization function (the CDF used for algorithm k on environment j), and X_{i,j} is the performance of algorithm i on environment j.
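Under that reconstruction, computing the aggregate score is a weighted double sum; a sketch assuming a precomputed array `mean_norm[i, j, k]` of estimates of E[F_{j,k}(X_{i,j})]:

```python
import numpy as np

def aggregate_performance(mean_norm, d, q):
    # mean_norm[i, j, k]: estimate of E[F_{j,k}(X_{i,j})] for algorithm i,
    # environment j, normalizing algorithm k (assumed precomputed).
    # d[j]: environment weights; q[j, k]: normalization weights.
    # y_i = sum_j d_j * sum_k q_{j,k} * E[F_{j,k}(X_{i,j})]
    return np.einsum("j,jk,ijk->i", d, q, mean_norm)
```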

  17. Normalization: A Two-Player Game
     Treat the choice of weights as a zero-sum game: player p chooses a distribution over what to execute (algorithm and environment), while player q chooses a distribution over what to normalize against (algorithm and environment), giving a max-min problem over the expected normalized performance E[F(Y)]. Use the weights q from the equilibrium solution to evaluate each algorithm.
     Environments: Gridworld, Chain, Cart-Pole, Mountain Car, Acrobot, Bicycle.
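Equilibria of finite zero-sum games can be found with a linear program. Here is a generic sketch, not necessarily the talk's exact formulation, where `A[x, z]` is an assumed payoff matrix holding the expected normalized performance when player p picks option x and player q picks option z:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum_game(A):
    """Solve max_p min_q p^T A q for a payoff matrix A via an LP."""
    n_rows, n_cols = A.shape
    # Variables: p (n_rows entries) and the game value v.
    # Maximize v subject to (A^T p)_z >= v for every column z, sum(p) = 1.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                                    # linprog minimizes -v
    A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])  # v - (A^T p)_z <= 0
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    p, value = res.x[:n_rows], res.x[-1]
    # With the default HiGHS solver, the opposing equilibrium strategy
    # can be read from the duals: res.ineqlin.marginals (up to sign).
    return p, value
```

Solving the game rather than hand-picking weights is what avoids the unintentional bias mentioned on the previous slide.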

  18. Quantifying Uncertainty
     [Pipeline diagram] Every stage of the pipeline is a source of uncertainty; report confidence intervals over the whole process.

  19. Quantifying Uncertainty
     [Comparison slide; labels: valid for any distribution, assumptions of normality, no guarantee, adapt step sizes, lots of hyperparameters]
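One standard nonparametric way to get intervals that do not assume normality is the percentile bootstrap; a sketch of that common choice, not necessarily the exact method used in the talk:

```python
import numpy as np

def bootstrap_ci(samples, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    # Resample with replacement and recompute the statistic each time.
    boot = np.array([
        stat(rng.choice(samples, size=len(samples), replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```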

  20. Takeaways
     • No need to tune hyperparameters
     • Can measure improvement in usability
     • Reliable estimates of uncertainty

  21. Acknowledgements Yash Chandak Daniel Cohen Mengxue Zhang Prof. Philip S. Thomas

  22. Questions? Scott Jordan, sjordan@cs.umass.edu, http://cics.umass.edu/sjordan | @UMassScott
