Random Expert Distillation For Imitation Learning

Mar 06, 2024


Random Expert Distillation For Imitation Learning
Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris
ICML 2019

Imitation Learning
⁃ Policy learning from a limited set of expert demonstrations
⁃ Intuitive & efficient skills transfer
⁃ Captures styles & preferences
(Figure labels: Teacher, Student)

Inverse Reinforcement Learning
- Generative Adversarial Imitation Learning (Ho et al., 2015)
- Optimization challenges
- Training instability
- Sample inefficiency
(Diagram labels: Expert Trajectories, Agent Trajectories, Reward Function, RL Algorithm, Agent Policy)

Random Expert Distillation (RED)
- Directly learns a reward function with Random Network Distillation (RND) (Burda et al., 2018)
- Considers how "similar" the agent is to the expert, instead of how "different"
(Diagram labels: Expert Trajectories, Reward Function, RL Algorithm, Agent Policy)

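The pipeline on this slide (a reward function learned from expert trajectories, then handed to an RL algorithm) can be sketched as a relabeling step. This is only an illustration: the transition layout, the `relabel` helper, and the toy reward are hypothetical and do not come from the slides or the RED code.

```python
from typing import Callable, List, Tuple

# Hypothetical transition layout: (state, action, reward).
Transition = Tuple[List[float], List[float], float]

def relabel(trajectory: List[Transition],
            reward_fn: Callable[[List[float], List[float]], float]) -> List[Transition]:
    """Replace each transition's reward with the learned reward r(s, a)."""
    return [(s, a, reward_fn(s, a)) for (s, a, _) in trajectory]

# Toy stand-in for a learned reward: highest when the action is 0.0 (made up).
def toy_reward(s: List[float], a: List[float]) -> float:
    return 1.0 / (1.0 + a[0] ** 2)

# Agent trajectory whose environment rewards are ignored and overwritten.
traj = [([0.0], [0.0], 0.0), ([1.0], [2.0], 0.0)]
relabeled = relabel(traj, toy_reward)
```

Any standard RL algorithm could then be trained on `relabeled` instead of the environment rewards; because the reward is fixed after learning, it is not updated during policy training in this sketch.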
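Reward Function
Given a fixed, randomly initialized network $f_{\mathrm{rand}}$ and a trainable network $f_\theta : \mathbb{R}^{d} \to \mathbb{R}^{K}$.
Over expert trajectories $D = \{(s_i, a_i)\}_{i=1}^{N}$:
$$\theta^* = \arg\min_\theta \| f_\theta(s, a) - f_{\mathrm{rand}}(s, a) \|^2$$
Define the reward as
$$r(s, a) = \exp\left(-\sigma \| f_{\theta^*}(s, a) - f_{\mathrm{rand}}(s, a) \|^2\right)$$
The reward asymptotically estimates the support of the expert policy.
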
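A minimal NumPy sketch of this reward, under loud assumptions: the fixed random network is called `f_rand` here (a name chosen for this sketch), the predictor is a linear map fit in closed form by least squares rather than a neural network trained by gradient descent, and the dimensions, `sigma`, and the synthetic "expert" data are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, hidden = 4, 8, 32  # input dim, output dim, hidden width (all assumed)

# Fixed, randomly initialized target network f_rand; it is never trained.
W1 = rng.normal(size=(d, hidden))
W2 = rng.normal(size=(hidden, K)) / np.sqrt(hidden)

def f_rand(x):
    return np.tanh(x @ W1) @ W2

def add_bias(x):
    return np.hstack([x, np.ones((len(x), 1))])

def fit_predictor(expert_sa):
    """Fit a linear predictor f_theta to f_rand on expert (s, a) pairs only."""
    theta, *_ = np.linalg.lstsq(add_bias(expert_sa), f_rand(expert_sa), rcond=None)
    return theta

def red_reward(sa, theta, sigma=1.0):
    """r(s, a) = exp(-sigma * ||f_theta(s, a) - f_rand(s, a)||^2)."""
    err = np.sum((add_bias(sa) @ theta - f_rand(sa)) ** 2, axis=1)
    return np.exp(-sigma * err)

# Synthetic stand-in for expert (s, a) pairs: a tight cluster near the origin.
expert_sa = 0.1 * rng.normal(size=(500, d))
far_sa = expert_sa + 5.0  # pairs far outside the expert data

theta_star = fit_predictor(expert_sa)
r_expert = red_reward(expert_sa, theta_star).mean()
r_far = red_reward(far_sa, theta_star).mean()
```

The predictor matches `f_rand` only where it saw expert data, so the prediction error is small on expert-like pairs and large elsewhere; the reward is therefore close to 1 on `expert_sa` and close to 0 on `far_sa`.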
MuJoCo Experiments
Method  Hopper          HalfCheetah      Walker2d          Reacher         Ant
GAIL    3614.2 ± 7.2    4515.7 ± 549.5   4878.0 ± 2848.3   -32.4 ± 39.8    3186.8 ± 903.6
GMMIL   3309.3 ± 26.3   3464.2 ± 476.5   2967.1 ± 702.0    -11.89 ± 5.27   991 ± 2.6
RED     3626.0 ± 4.3    3072.0 ± 84.7    4481.4 ± 20.9     -10.43 ± 5.2    3552.8 ± 348.7
Image ref: https://creativestudio2019spring.files.wordpress.com/2019/02/openaigym.png

Training Stability & Sample Efficiency
(Figure panels: Hopper, Reacher)

Driving Task
Method  Average       Best
BC      1033 ± 474    1956
GAIL    795 ± 395     1576
GMMIL   2024 ± 981    3624
RED     4825 ± 1552   7485
Expert  7485 ± 0      7485

Reward function penalizes dangerous driving

Summary
⁃ Random Expert Distillation is a new framework for imitation learning, using the estimated support of the expert policy as reward.
⁃ Our results suggest that RED is viable, robust and attains good performance.
⁃ Future work: combining different sources of expert information for more robust algorithms.

Thank you
⁃ Code: https://github.com/RuohanW/RED
⁃ Check out our poster: Pacific Ballroom #39, 6:30 to 9:00 pm today
