

  1. Trajectory Optimization, Imitation Learning Lecture 14

  2. What will you take home today? Recap LQR Trajectory Optimization Paper Imitation Learning Supervised Learning Dagger

  3. How to solve Optimal Control Problems?

  4. Sequential Quadratic Programming

  5. Example – Newton-Raphson Method
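The Newton-Raphson iteration on this slide can be sketched as follows. The scalar example (finding the root of f(x) = x^2 - 2) is an illustrative assumption, not necessarily the example used in the lecture:

```python
# Newton-Raphson: repeatedly apply x_{k+1} = x_k - f(x_k) / f'(x_k),
# i.e., follow the local linearization of f down to its zero crossing.

def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Return an approximate root of f starting from the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:   # stop once the update is negligibly small
            break
    return x

# Hypothetical example: solve x^2 - 2 = 0, i.e., compute sqrt(2).
root = newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # converges quadratically to ~1.41421356
```

The same idea, applied to the optimality conditions of a nonlinear program rather than to a scalar equation, is what sequential quadratic programming iterates.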

  6. Sequential Linear Quadratic Programming

  7. SLQ Algorithm

  8. Linear Dynamical Systems, Quadratic Cost – Linear Quadratic Regulator (LQR)

  9. Linear Dynamical Systems, Quadratic Cost – Linear Quadratic Regulator (LQR)
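The finite-horizon LQR recap on these slides can be summarized as a backward Riccati recursion. The sketch below applies it to a hypothetical double-integrator system; the dynamics, cost weights, and horizon are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, T):
    """Backward Riccati recursion for finite-horizon discrete-time LQR.

    Returns time-varying gains K_t such that u_t = -K_t x_t minimizes
    sum_t (x_t' Q x_t + u_t' R u_t) under x_{t+1} = A x_t + B u_t.
    """
    P = Q.copy()          # terminal cost-to-go (assumed Q_f = Q here)
    gains = []
    for _ in range(T):
        # K_t = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_t = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]    # reorder so gains[t] is the gain at time t

# Hypothetical double integrator: position/velocity state, force input.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
Ks = lqr_finite_horizon(A, B, Q, R, T=100)

# Closed-loop rollout: the regulator drives the state toward the origin.
x = np.array([[1.0], [0.0]])
for K in Ks:
    x = A @ x - B @ (K @ x)
```

SLQ reuses exactly this recursion, but on linear-quadratic approximations of a nonlinear problem around the current trajectory.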

  10. Trajectory Optimization

  11. What will you take home today? Recap LQR Trajectory Optimization Paper Imitation Learning Supervised Learning Dagger

  12. Assumptions in Optimal Control 1. Known and/or simple system dynamics 2. Known cost function

  13. What are approaches for unknown dynamics and/or cost? 1. Learning approaches: a. Reinforcement learning: i. Model-based ii. Model-free b. Imitation learning: i. Imitate an expert policy

  14. Learning to make single predictions versus a sequence of predictions

  15. Running Example: Super Tux Cart from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AIStats. 2011. https://www.youtube.com/watch?feature=oembed&v=V00npNnWzSU

  16. Imitation Learning 1. Useful when dynamics and/or cost are unknown or complex: a. We don’t know what the next state will look like; it is hard to model. b. We don’t know the cost-to-go of an action. Definitions: the expected immediate cost of taking action a in state s; the expected immediate cost of executing policy π in state s; cost-to-go = total cost of executing π over T steps.
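The quantities named on this slide can be written out explicitly. This is a reconstruction in the notation of Ross, Gordon, Bagnell (2011), since the slide's formulas did not survive extraction:

```latex
% C(s, a): expected immediate cost of taking action a in state s.
% Expected immediate cost of executing policy \pi in state s:
C_\pi(s) = \mathbb{E}_{a \sim \pi(s)}\!\left[ C(s, a) \right]

% Cost-to-go: total expected cost of executing \pi for T steps,
% where d_\pi^t is the state distribution at time t induced by \pi:
J(\pi) = \sum_{t=1}^{T} \mathbb{E}_{s \sim d_\pi^t}\!\left[ C_\pi(s) \right]
```

The dependence of d_π^t on π itself is what later makes this harder than ordinary supervised learning.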

  17. Imitation learning – core idea 1. Idea: imitate expert trajectories! a. Bound the cost-to-go J(π) for any cost function C, based on how well π mimics the expert’s policy.

  18. Imitation Learning by Classification Algorithm from A Course in Machine Learning by Hal Daumé III, Ch. 18.
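The reduction on this slide — learn a policy by supervised classification on (state, expert action) pairs — can be sketched as below. The 1-nearest-neighbor classifier and the steering data are illustrative stand-ins, not the learner or dataset from Daumé's algorithm:

```python
import numpy as np

class NearestNeighborPolicy:
    """Behavior cloning via 1-nearest-neighbor classification:
    predict the expert action recorded at the most similar state."""

    def fit(self, states, expert_actions):
        self.states = np.asarray(states, dtype=float)
        self.actions = np.asarray(expert_actions)
        return self

    def predict(self, state):
        i = np.argmin(np.linalg.norm(self.states - state, axis=1))
        return self.actions[i]

# Hypothetical demonstrations: the expert steers toward the track
# center (x = 0), as in the Super Tux Cart running example.
demo_states = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
demo_actions = np.array(["right", "right", "straight", "left", "left"])

policy = NearestNeighborPolicy().fit(demo_states, demo_actions)
print(policy.predict(np.array([1.5])))  # "left"
```

The policy is trained only on states the expert visited — which is exactly the weakness the following slides analyze.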

  19. How well does Imitation Learning by Classification work? 1. Depends on a. How good the expert is. b. How much error the classifier makes.
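The two factors on this slide can be made precise with the known compounding-error bound for behavior cloning (Ross & Bagnell, 2010), stated here from memory rather than from the slides: if the learned policy π̂ makes a mistake with probability at most ε under the expert's own state distribution, then

```latex
J(\hat{\pi}) \;\le\; J(\pi^{*}) \;+\; T^{2}\,\epsilon
```

so error compounds quadratically in the horizon T: a single mistake can put the learner in states the expert never visited, where it was never trained and keeps erring.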

  20. Running Example: Super Tux Cart Figure from ‘Interactive Learning for Sequential Decisions and Predictions’ by Stephane Ross.

  21. Learned behavior influences states and observations Challenges: ● The system dynamics are assumed unknown and complex, so we cannot compute dπ; we can only sample it by executing π in the system. ● It is a non-i.i.d. supervised learning problem, because the input distribution depends on the policy π itself. ● Optimization is difficult: this dependence makes the problem non-convex. When are data samples i.i.d.? Typical assumption in statistics and machine learning: observations in a sample are independent and identically distributed. This simplifies many methods, although it is not true in many practical settings. Examples of i.i.d. samples: coin flips, roulette spins.

  22. Running Example: Super Tux Cart from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AIStats. 2011. https://www.youtube.com/watch?feature=oembed&v=V00npNnWzSU

  23. Another example – Super Mario from A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Ross, Gordon, Bagnell. AIStats. 2011. https://www.youtube.com/watch?v=anOI0xZ3kGM

  24. How do we train a policy that can deal with any possible situation? 1. This is impossible: the state/observation space may be prohibitively large, and we cannot train on all possible configurations. If we could, we might just memorize anyway. 2. Goal: train f to do well on the configurations that it encounters itself. 3. Chicken-and-egg problem: a. We want a policy that does well in a set of world configurations. b. Which configurations? The ones it encounters. 4. Solve by iteration: roll out f, collect data, retrain.

  25. Dataset Aggregation Algorithm (DAgger) Figure and algorithm from A Course in Machine Learning by Hal Daumé III, Ch. 18.

  26. How well does DAgger work? Theorem from A Course in Machine Learning by Hal Daumé III, Ch. 18.

  27. Running Example: Super Tux Cart from A Reduction of Imitation Learning and Structured Predictionto No-Regret Online Learning. Ross, Gordon, Bagnell. AIStats. 2011.

  28. Requirements on the Expert 1. Human demonstrations 2. An expensive but exact algorithm that is too slow to run in real time
