Trajectory Optimization, Imitation Learning Lecture 14
What will you take home today?
● Recap LQR
● Trajectory Optimization Paper
● Imitation Learning
● Supervised Learning
● DAgger
How to solve Optimal Control Problems?
Sequential Quadratic Programming
Example – Newton-Raphson Method
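The Newton-Raphson method named on this slide can be sketched in a few lines; the root-finding problem below (computing √2) is an illustrative example, not necessarily the one worked in the lecture:

```python
# Newton-Raphson iteration: x <- x - f(x) / f'(x), repeated until the
# step size falls below a tolerance.

def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Illustrative example: solve x^2 - 2 = 0, i.e. compute sqrt(2).
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

SQP/SLQ apply the same idea one level up: each outer iteration solves a local quadratic (or linear-quadratic) approximation of the nonlinear problem.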
Sequential Linear Quadratic Programming
SLQ Algorithm
Linear Dynamical Systems, Quadratic Cost – Linear Quadratic Regulator (LQR)
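The finite-horizon LQR solution can be sketched as a backward Riccati recursion that produces time-varying feedback gains u_t = -K_t x_t; the double-integrator system and weights below are illustrative assumptions, not values from the lecture:

```python
# Finite-horizon discrete-time LQR, assuming linear dynamics
# x_{t+1} = A x_t + B u_t and quadratic cost sum_t (x'Qx + u'Ru) + x_T' Qf x_T.
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    P = Qf
    gains = []
    for _ in range(T):
        # K_t = (R + B'PB)^{-1} B'PA, then Riccati update for P
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains[t] is the feedback matrix at time t

# Illustrative example: double integrator (position, velocity).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2); R = np.array([[1.0]]); Qf = 10.0 * np.eye(2)
Ks = lqr_gains(A, B, Q, R, Qf, T=50)
```

Rolling the closed loop x_{t+1} = (A - B K_t) x_t forward drives the state toward the origin, which is the regulator behavior the slide refers to.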
Trajectory Optimization
What will you take home today?
● Recap LQR
● Trajectory Optimization Paper
● Imitation Learning
● Supervised Learning
● DAgger
Assumptions in Optimal Control
1. Known and/or simple system dynamics
2. Known cost function
What are approaches for unknown dynamics and/or cost?
1. Learning approaches
   a. Reinforcement learning
      i. Model-based
      ii. Model-free
   b. Imitation learning
      i. Imitate an expert policy
Learning to make single predictions versus a sequence of predictions
Running Example: Super Tux Cart from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AISTATS. 2011. https://www.youtube.com/watch?feature=oembed&v=V00npNnWzSU
Imitation Learning
1. Useful when dynamics and/or cost are unknown/complex
   a. We don't know what the next state will look like. Hard to model.
   b. We don't know the cost-to-go for an action.
      ○ C(s, a): the expected immediate cost of taking action a in state s
      ○ C_π(s) = E_{a∼π(s)}[C(s, a)]: the expected immediate cost of executing policy π in state s
      ○ J(π): the cost-to-go, i.e. the total cost of executing π over T steps
Imitation learning – core idea
1. Idea: imitate expert trajectories!
   a. Bound J(π) for any cost function C based on how well π mimics the expert's policy
Imitation Learning by Classification Algorithm from - A Course in Machine Learning by Hal Daumé III. Ch. 18
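The classification-based imitation learning on this slide (fit a classifier to expert state-action pairs, then run it as a policy) can be sketched as follows; the 1-nearest-neighbor classifier and the toy steering data are illustrative assumptions, not Daumé's algorithm verbatim:

```python
# Behavior cloning sketch: treat expert demonstrations as an i.i.d.
# supervised dataset and fit a classifier from states to actions.
import numpy as np

def fit_bc_policy(states, actions):
    """Return a 1-nearest-neighbor policy trained on expert (state, action) pairs."""
    states = np.asarray(states, dtype=float)
    actions = list(actions)
    def policy(s):
        dists = np.linalg.norm(states - np.asarray(s, dtype=float), axis=1)
        return actions[int(np.argmin(dists))]
    return policy

# Toy demonstration (illustrative): expert steers toward the track center x = 0.
demo_states = [[-2.0], [-1.0], [1.0], [2.0]]
demo_actions = ["right", "right", "left", "left"]
pi = fit_bc_policy(demo_states, demo_actions)
```

The next slides show why this simple reduction degrades: once π is executed, it visits states the expert never labeled.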
How well does Imitation Learning by Classification work? 1. Depends on a. How good the expert is. b. How much error the classifier makes.
Running Example: Super Tux Cart Figure from ‘Interactive Learning for Sequential Decisions and Predictions’ by Stephane Ross.
Learned behavior influences states and observations
Challenge:
● System dynamics are assumed both unknown and complex: we cannot compute d_π and can only sample it by executing π in the system.
● Non-i.i.d. supervised learning problem, due to the dependence of the input distribution on the policy π itself.
● Difficult optimization: this dependence makes the problem non-convex.
When are data samples i.i.d.?
● Typical assumption in statistics and machine learning: observations in a sample are independent and identically distributed.
● This simplifies many methods, although it is not true in many practical settings.
● Examples of i.i.d. samples: coin flips, roulette spins.
Running Example: Super Tux Cart from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AISTATS. 2011. https://www.youtube.com/watch?feature=oembed&v=V00npNnWzSU
Another example – Super Mario from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AISTATS. 2011. https://www.youtube.com/watch?v=anOI0xZ3kGM
How do we train a policy that can deal with any possible situation?
1. This is impossible: the state/observation space may be prohibitively large, and we cannot train on all possible configurations. If we could, we might as well just memorize them.
2. Goal: train f to do well on the configurations that it encounters itself.
3. Chicken-and-egg problem:
   a. We want a policy that does well in a set of world configurations.
   b. Which configurations? The ones it encounters.
4. Solve by iteration: roll out f, collect data, retrain.
Dataset Aggregation Algorithm (Dagger) Figure and Algorithm from - A Course in Machine Learning by Hal Daumé III. Ch. 18
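The DAgger loop on this slide (roll out the current policy, query the expert on the visited states, aggregate, retrain) can be sketched as follows; the toy 1-D environment, expert, and 1-nearest-neighbor `fit` are illustrative stand-ins, not the lecture's implementation:

```python
# DAgger (Dataset Aggregation) sketch: the learner's own actions drive the
# rollout, but the expert labels every state that is visited.

def dagger(env_reset, env_step, expert, fit, n_iters=5, horizon=20):
    dataset = []               # aggregated (state, expert_action) pairs
    policy = expert            # iteration 0 effectively rolls out the expert
    for _ in range(n_iters):
        s = env_reset()
        for _ in range(horizon):
            dataset.append((s, expert(s)))  # expert labels the visited state
            s = env_step(s, policy(s))      # learner's action drives the rollout
        policy = fit(dataset)               # retrain on all aggregated data
    return policy

# Toy 1-D example (illustrative): the expert always steers the state toward 0.
expert = lambda s: -1.0 if s > 0 else 1.0
fit = lambda data: (lambda s: min(data, key=lambda p: abs(p[0] - s))[1])  # 1-NN
pi = dagger(env_reset=lambda: 3.0,
            env_step=lambda s, a: s + 0.5 * a,
            expert=expert, fit=fit)
```

Because the expert labels states induced by the learner's own behavior, the training distribution matches the test distribution, which is what the no-regret guarantee on the next slide exploits.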
How well does Dagger work? Theorem from - A Course in Machine Learning by Hal Daumé III. Ch. 18
Running Example: Super Tux Cart from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AISTATS. 2011.
Requirements on the Expert
1. Human demonstrations
2. An expensive but exact algorithm that is too slow to run in real time