Supervised Learning of Behaviors CS 285 Instructor: Sergey Levine UC Berkeley
Terminology & notation 1. run away 2. ignore 3. pet
Aside: notation. u for controls comes from управление ("control" in Russian), from Lev Pontryagin's optimal control literature; a for actions comes from Richard Bellman's dynamic programming literature.
Imitation Learning: supervised learning on training data, a.k.a. behavioral cloning. Images: Bojarski et al. '16, NVIDIA
The original deep imitation learning system. ALVINN: Autonomous Land Vehicle In a Neural Network, 1989
Does it work? No!
Does it work? Yes! Video: Bojarski et al. ‘16, NVIDIA
Why did that work? Bojarski et al. ‘16, NVIDIA
Can we make it work more often? Stability (more on this later)
Can we make it work more often?
Can we make it work more often? DAgger: Dataset Aggregation. Ross et al. '11
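In pseudocode, DAgger is a short loop around supervised learning. Below is a minimal Python sketch; train_policy, run_policy, and query_expert are hypothetical callables (supervised fitting, rolling out the current policy to collect observations, and asking the human expert for the correct action), not an API from the lecture.

```python
def dagger(expert_data, train_policy, run_policy, query_expert, num_iters):
    """Minimal sketch of Dataset Aggregation (Ross et al. '11)."""
    dataset = list(expert_data)            # 1. start from the human demonstration data D
    policy = train_policy(dataset)         # 2. train pi_theta(a_t | o_t) on D
    for _ in range(num_iters):
        observations = run_policy(policy)  # 3. run pi_theta to get observations it actually visits
        dataset += [(obs, query_expert(obs))   # 4. ask a human to label each o_t with an action a_t
                    for obs in observations]
        policy = train_policy(dataset)     # 5. aggregate and retrain, so the training and test
    return policy                          #    state distributions converge over iterations
```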
DAgger Example Ross et al. ‘11
What’s the problem? Ross et al. ‘11
Deep imitation learning in practice
Can we make it work without more data? • DAgger addresses the problem of distributional “drift” • What if our model is so good that it doesn’t drift? • Need to mimic expert behavior very accurately • But don’t overfit!
Why might we fail to fit the expert? 1. Non-Markovian behavior: a Markovian policy depends only on the current observation, so if we see the same thing twice, we do the same thing twice, regardless of what happened before. That is often very unnatural for human demonstrators, whose behavior depends on all past observations. 2. Multimodal behavior
How can we use the whole history? Concatenating all past frames does not scale: variable number of frames, too many weights
How can we use the whole history? Feed the observations through an RNN with weights shared across time steps, carrying the RNN state forward. Typically, LSTM cells work better here
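As a concrete (hypothetical) example of this architecture, here is a minimal PyTorch sketch of a policy that reads the whole observation history through an LSTM with shared per-step weights; the class and layer sizes are illustrative, not from the lecture.

```python
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Policy conditioned on the whole observation history via an LSTM."""
    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)   # per-step encoder (a CNN for image observations)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)      # maps the RNN state to a predicted action

    def forward(self, obs_seq):                         # obs_seq: (batch, T, obs_dim)
        feats = torch.relu(self.encoder(obs_seq))       # same encoder weights applied at every step
        states, _ = self.lstm(feats)                    # RNN state carried forward across the sequence
        return self.head(states[:, -1])                 # action prediction from the final RNN state
```

Training is still behavioral cloning: regress the output onto the expert's action (or maximize its likelihood) at each step of the demonstrated sequences.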
Aside: why might this work poorly? “causal confusion” see: de Haan et al., “Causal Confusion in Imitation Learning” Question 1: Does including history mitigate causal confusion? Question 2: Can DAgger mitigate causal confusion?
Why might we fail to fit the expert? 1. Non-Markovian behavior 2. Multimodal behavior. For multimodal behavior: 1. Output mixture of Gaussians 2. Latent variable models 3. Autoregressive discretization
Why might we fail to fit the expert? 1. Output mixture of Gaussians 2. Latent variable models 3. Autoregressive discretization
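For option 1, the policy's output layer can parameterize a mixture of Gaussians rather than a single Gaussian. A minimal PyTorch sketch (names and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MixtureOfGaussiansHead(nn.Module):
    """Outputs pi(a|o) as a mixture of Gaussians so multimodal behavior can be represented."""
    def __init__(self, feat_dim, act_dim, num_modes=5):
        super().__init__()
        self.num_modes = num_modes
        self.logits = nn.Linear(feat_dim, num_modes)              # mixture weights w_i
        self.means = nn.Linear(feat_dim, num_modes * act_dim)     # component means mu_i
        self.log_stds = nn.Linear(feat_dim, num_modes * act_dim)  # component log standard deviations

    def forward(self, feats):
        B = feats.shape[0]
        mix = torch.distributions.Categorical(logits=self.logits(feats))
        comp = torch.distributions.Independent(
            torch.distributions.Normal(
                self.means(feats).view(B, self.num_modes, -1),
                self.log_stds(feats).view(B, self.num_modes, -1).exp()), 1)
        return torch.distributions.MixtureSameFamily(mix, comp)
```

Training maximizes dist.log_prob(expert_action), so probability mass can sit on several distinct modes (e.g. swerve left and swerve right) instead of being averaged into a single bad action.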
Why might we fail to fit the expert? 1. Output mixture of Gaussians 2. Latent variable models 3. Autoregressive discretization Look up some of these: • Conditional variational autoencoder • Normalizing flow/realNVP • Stein variational gradient descent
Why might we fail to fit the expert? 1. Output mixture of Gaussians 2. Latent variable models 3. Autoregressive discretization: discretize one action dimension at a time; sample a discrete value from a (discretized) distribution over dimension 1 only, then condition on it when sampling dimension 2, and so on.
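For option 3, discretizing the full action space is exponential in its dimension, but discretizing one dimension at a time stays tractable. A minimal PyTorch sketch, with hypothetical names and a crude encoding of previously sampled bins, just to show the sampling order:

```python
import torch
import torch.nn as nn

class AutoregressiveDiscretizedPolicy(nn.Module):
    """Samples each discretized action dimension conditioned on the previously sampled ones."""
    def __init__(self, feat_dim, act_dim, num_bins=51):
        super().__init__()
        self.num_bins = num_bins
        # one classifier per action dimension; dimension d also sees the d bins sampled before it
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim + d, num_bins) for d in range(act_dim))

    def sample(self, feats):                # feats: (batch, feat_dim), e.g. an image embedding
        context, bins = feats, []
        for head in self.heads:
            logits = head(context)                                    # distribution over bins for this dim only
            b = torch.distributions.Categorical(logits=logits).sample()
            bins.append(b)
            context = torch.cat(                                      # feed the sampled value back in
                [context, b.unsqueeze(-1).float() / self.num_bins], dim=-1)
        return torch.stack(bins, dim=-1)    # bin indices per dimension; map back to continuous actions
```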
Imitation learning: recap (training data, supervised learning) • Often (but not always) insufficient by itself: distribution mismatch problem • Sometimes works well: hacks (e.g. left/right images), samples from a stable trajectory distribution, add more on-policy data (e.g. using DAgger), better models that fit more accurately
A case study: trail following from human demonstration data
Case study 1: trail following as classification
Cost functions, reward functions, and a bit of theory
Imitation learning: what’s the problem? • Humans need to provide data, which is typically finite • Deep learning works best when data is plentiful • Humans are not good at providing some kinds of actions • Humans can learn autonomously; can our machines do the same? • Unlimited data from own experience • Continuous self-improvement
A cost function for imitation? (training data, supervised learning) Ross et al. '11
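To make the objective concrete, the analysis in this part of the lecture uses the standard 0-1 imitation cost from Ross et al. '11:

$$c(\mathbf{s}, \mathbf{a}) = \begin{cases} 0 & \text{if } \mathbf{a} = \pi^\star(\mathbf{s}) \\ 1 & \text{otherwise,} \end{cases}$$

and asks how large the expected total cost $\mathbb{E}\big[\sum_t c(\mathbf{s}_t, \mathbf{a}_t)\big]$ can get under the distribution of states the learned policy actually visits.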
Some analysis
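In condensed form, the argument: suppose supervised learning guarantees that $\pi_\theta$ makes a mistake with probability at most $\epsilon$ on states from the training distribution. After its first mistake the policy can drift into states the expert never visited, where nothing bounds its error, so in the worst case it pays cost 1 for the rest of the episode:

$$\mathbb{E}\Big[\sum_{t=1}^{T} c(\mathbf{s}_t, \mathbf{a}_t)\Big] \le \epsilon T + (1-\epsilon)\big(\epsilon (T-1) + (1-\epsilon)(\cdots)\big) = O(\epsilon T^2),$$

i.e. the expected cost of naive behavioral cloning grows quadratically in the horizon $T$.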
More general analysis. For more analysis, see Ross et al., "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning"
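A condensed version of that more general bound: assuming $\pi_\theta(\mathbf{a} \neq \pi^\star(\mathbf{s}) \mid \mathbf{s}) \le \epsilon$ for $\mathbf{s} \sim p_{\mathrm{train}}(\mathbf{s})$, we can write

$$p_\theta(\mathbf{s}_t) = (1-\epsilon)^t\, p_{\mathrm{train}}(\mathbf{s}_t) + \big(1 - (1-\epsilon)^t\big)\, p_{\mathrm{mistake}}(\mathbf{s}_t),$$

so $\sum_{\mathbf{s}_t} \big|p_\theta(\mathbf{s}_t) - p_{\mathrm{train}}(\mathbf{s}_t)\big| \le 2\big(1-(1-\epsilon)^t\big) \le 2\epsilon t$, and the expected total cost is bounded by $\sum_t (\epsilon + 2\epsilon t) = O(\epsilon T^2)$. DAgger removes the mismatch by making $p_{\mathrm{train}}(\mathbf{s}) = p_\theta(\mathbf{s})$, which brings the bound down to $O(\epsilon T)$.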
Another way to imitate
Another imitation idea
Goal-conditioned behavioral cloning
1. Collect data 2. Train goal-conditioned policy
3. Reach goals
Going beyond just imitation? ➢ Start with a random policy ➢ Collect data with random goals ➢ Treat this data as “demonstrations” for the goals that were reached ➢ Use this to improve the policy ➢ Repeat
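A minimal Python sketch of that loop; sample_goal, collect_rollout, and train_supervised are hypothetical callables standing in for goal sampling, running the policy in the environment, and supervised (behavioral cloning) updates.

```python
def goal_relabeled_imitation(policy, sample_goal, collect_rollout, train_supervised, num_iters):
    """Self-improvement by relabeling reached states as goals (sketch of the loop above)."""
    dataset = []
    for _ in range(num_iters):
        goal = sample_goal()                                  # command a random goal
        states, actions = collect_rollout(policy, goal)       # run the (initially random) policy
        reached = states[-1]                                  # whatever state was actually reached...
        dataset += [(s, reached, a)                           # ...is treated as the intended goal, so the
                    for s, a in zip(states, actions)]         #    rollout becomes a valid "demonstration"
        policy = train_supervised(policy, dataset)            # behavioral cloning on pi(a | s, g)
    return policy
```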