Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Daniel Brown*, Wonjoon Goo*, Prabhat Nagarajan, and Scott Niekum
Inverse Reinforcement Learning
Current approaches:
1. Can't do better than the demonstrator.
2. Are hard to scale to complex problems.

IRL via Ranked Demonstrations
We find a reward function that explains the ranking, allowing for extrapolation. Inverse reinforcement learning becomes standard binary classification.
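(A sketch of the pairwise ranking model behind this claim, following the Bradley-Terry formulation used in the paper; $\hat{r}_\theta$ denotes the learned reward.) For a pair of demonstrations with $\tau_i$ ranked below $\tau_j$:

$$P(\tau_i \prec \tau_j) \approx \frac{\exp \sum_{s \in \tau_j} \hat{r}_\theta(s)}{\exp \sum_{s \in \tau_i} \hat{r}_\theta(s) + \exp \sum_{s \in \tau_j} \hat{r}_\theta(s)}, \qquad \mathcal{L}(\theta) = -\sum_{\tau_i \prec \tau_j} \log P(\tau_i \prec \tau_j).$$

Maximizing the likelihood of the observed rankings is exactly a cross-entropy (binary classification) loss over trajectory pairs.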
Trajectory-ranked Reward Extrapolation (T-REX)
Given ranked demonstrations, how do we train the reward function?
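A minimal sketch of one way to implement this, not the authors' exact code or architecture: a small state-only reward network and the pairwise ranking loss above. The names RewardNet and trex_loss and the hidden size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Per-state reward r_theta(s); a small MLP over raw observations."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states):           # states: (T, obs_dim)
        return self.net(states)          # per-state reward: (T, 1)

def trex_loss(reward_net, traj_lo, traj_hi):
    """Cross-entropy loss classifying which trajectory is ranked higher.

    traj_lo is ranked below traj_hi; each is a (T, obs_dim) float tensor.
    """
    returns = torch.stack([reward_net(traj_lo).sum(),
                           reward_net(traj_hi).sum()])
    # Label 1 = the second trajectory (traj_hi) is the preferred one.
    label = torch.tensor(1)
    return nn.functional.cross_entropy(returns.unsqueeze(0), label.unsqueeze(0))
```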
Trajectory-ranked Reward Extrapolation (T-REX)
We subsample trajectories to create a large dataset of weakly labeled pairs!
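A minimal sketch of the subsampling idea, assuming demos is a list of trajectories (each a list of observations) sorted from worst to best. The snippet length and pair count are placeholder choices, and the paper additionally constrains where snippets may be drawn from; those details are simplified here.

```python
import random

def sample_snippet_pairs(demos, num_pairs=5000, snippet_len=50):
    """Create weakly labeled (worse, better) snippet pairs from ranked demos."""
    pairs = []
    for _ in range(num_pairs):
        i, j = sorted(random.sample(range(len(demos)), 2))   # i is ranked below j
        lo, hi = demos[i], demos[j]
        s_lo = random.randint(0, max(len(lo) - snippet_len, 0))
        s_hi = random.randint(0, max(len(hi) - snippet_len, 0))
        pairs.append((lo[s_lo:s_lo + snippet_len],
                      hi[s_hi:s_hi + snippet_len]))
    return pairs
```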
Trajectory-ranked Reward Extrapolation (T-REX)
• Simple:
  • IRL as binary classification (see the training sketch below).
  • No human supervision during policy learning.
  • No inner-loop MDP solver.
  • No inference-time data collection (e.g., GAIL).
  • No action labels required.
• Scales to high-dimensional tasks (e.g., Atari games).
• Can produce policies much better than the demonstrator.
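A minimal training-loop sketch tying the pieces together. It assumes the hypothetical RewardNet, trex_loss, and sample_snippet_pairs from the sketches above, and the hyperparameters are placeholders. Once trained, the learned reward is handed to any off-the-shelf RL algorithm, with no further human supervision.

```python
import torch

def train_reward(demos, obs_dim, epochs=5, lr=1e-4):
    """Fit the reward network by binary classification over ranked snippet pairs."""
    reward_net = RewardNet(obs_dim)
    optimizer = torch.optim.Adam(reward_net.parameters(), lr=lr)
    pairs = sample_snippet_pairs(demos)
    for _ in range(epochs):
        for lo, hi in pairs:
            loss = trex_loss(reward_net,
                             torch.as_tensor(lo, dtype=torch.float32),
                             torch.as_tensor(hi, dtype=torch.float32))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return reward_net
```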
T-REX Policy Performance
T-REX on HalfCheetah
Best demo (88.97) vs. T-REX (143.40)
Reward Extrapolation
T-REX can extrapolate beyond the performance of the best demo (extrapolation plots for HalfCheetah, Hopper, and Ant).
Results: Atari Games
T-REX outperforms the best demonstration on 7 out of 8 games!
T-REX on Enduro
Best demo (84) vs. T-REX (520)
Come see our poster @ Pacific Ballroom #47
• Human demos / ranking labels
• Robust to noisy ranking labels
• Automatic ranking by watching a learner improve at a task
• Reward function visualization