Policy Certificates: Towards Accountable Reinforcement Learning - PowerPoint PPT Presentation

Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
CMU, Google Research, Stanford University

Minimax-Optimal PAC Bounds
Key contribution: new algorithm for episodic tabular MDPs with
• PAC Bound: first minimax-optimal! (for small ϵ) Prior work: [DLB ‘17] [DB ‘15]
• High-prob. Regret Bound: matches existing + improves for large H. Prior work: $\sqrt{SAH^2T} + \sqrt{H^3T} + S^2AH^2$ [AOM ‘17]
S: #states, A: #actions, H: episode length, T: #episodes, ϵ: accuracy
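A rough way to see what "improves for large H" could mean is to compare the terms of the prior-work regret expression. The sketch below assumes the slide's formula for [AOM ‘17] reads sqrt(S A H^2 T) + sqrt(H^3 T) + S^2 A H^2, and it drops constants and logarithmic factors. The function name and the example numbers are hypothetical and not taken from the slides.

```python
import math

def prior_regret_terms(S, A, H, T):
    """Terms of sqrt(S A H^2 T) + sqrt(H^3 T) + S^2 A H^2 (constants and logs dropped)."""
    return {
        "sqrt(SAH^2T)": math.sqrt(S * A * H**2 * T),
        "sqrt(H^3T)": math.sqrt(H**3 * T),
        "S^2AH^2": S**2 * A * H**2,
    }

# Illustrative sizes only: S * A = 50 in both cases.
small_h = prior_regret_terms(S=10, A=5, H=20, T=10**6)   # H < S * A
large_h = prior_regret_terms(S=10, A=5, H=200, T=10**6)  # H > S * A
```

Since sqrt(H^3 T) exceeds sqrt(S A H^2 T) exactly when H > S A, the second term dominates the leading term for long episodes. That is presumably the large-H regime in which a bound without such a term would improve on this one.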

Motivation: Need for Accountability in Online RL
• Even with PAC and regret bounds, the expected return in the next episode during learning is unknown.
• How good will my treatment be? Is it the best possible?

Our Proposal: Algorithms output policy certificates before each episode

Algorithms with Policy Certificates
Natural extension of model-based optimistic algorithms:
1. UCB on optimal value function
2. Greedy policy
3. LCB on value function of current policy
4. Output certificate
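The four steps can be sketched for a tabular episodic MDP as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name, the inputs (empirical rewards `r_hat`, empirical transitions `p_hat`, visit counts `counts`), and the Hoeffding-style confidence width are all hypothetical. The sketch also assumes the certificate is the gap between the upper and lower bound at the start state, which the slide text does not state explicitly.

```python
import numpy as np

def plan_with_certificate(r_hat, p_hat, counts, horizon, s0, delta=0.1):
    """r_hat: (S, A) mean rewards in [0, 1]; p_hat: (S, A, S); counts: (S, A)."""
    S, A = r_hat.shape
    # Hypothetical Hoeffding-style width (illustrative, not a validated bound).
    log_term = np.log(2 * S * A * horizon / delta)
    width = horizon * np.sqrt(log_term / np.maximum(counts, 1))
    width = np.where(counts > 0, width, horizon)  # unvisited pairs: maximal width

    v_up = np.zeros((horizon + 1, S))  # upper bound on the optimal value
    v_lo = np.zeros((horizon + 1, S))  # lower bound on the greedy policy's value
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        max_return = horizon - h
        # Step 1: optimistic backup (UCB on the optimal value function).
        q_up = np.clip(r_hat + p_hat @ v_up[h + 1] + width, 0, max_return)
        # Step 2: act greedily with respect to the upper bound.
        policy[h] = q_up.argmax(axis=1)
        v_up[h] = q_up.max(axis=1)
        # Step 3: pessimistic backup for the chosen policy (LCB on its value).
        q_lo = np.clip(r_hat + p_hat @ v_lo[h + 1] - width, 0, max_return)
        v_lo[h] = q_lo[np.arange(S), policy[h]]
    # Step 4: certificate, assumed here to be the upper/lower gap at the start state.
    certificate = v_up[0, s0] - v_lo[0, s0]
    return policy, certificate

# Toy model with 2 states, 2 actions, and uniform transitions (made-up numbers).
r_hat = np.array([[0.0, 1.0], [0.5, 0.2]])
p_hat = np.full((2, 2, 2), 0.5)
_, cert_no_data = plan_with_certificate(r_hat, p_hat, np.zeros((2, 2)), horizon=3, s0=0)
policy, cert_much_data = plan_with_certificate(r_hat, p_hat, np.full((2, 2), 1e9), horizon=3, s0=0)
```

With no data the certificate equals the maximal return (nothing can be promised), and it shrinks toward zero as visit counts grow, which is the behavior a certificate reported before each episode is meant to convey.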


Symbiosis of Optimism and Certificates
Certificates:
• Challenge: [formula] random
• Insight from optimism: [formula] at known rate
Optimism:
• Challenge: exploration bonus depends on [formula]
• Insight from certificates: bound by [formula]
More accountable algorithms through accurate policy certificates
Better exploration bonuses yield minimax-optimal PAC & regret bounds
