Optimization-Based Control: Direct Collocation Methods for Trajectory and Policy Optimization

Optimization-Based Control: Direct Collocation Methods for Trajectory and Policy Optimization CS 287: Advanced Robotics, Fall 2019 Guest Lecture Igor Mordatch Overview Previously: Locally optimal control (shooting vs. collocation)


  1. Optimization-Based Control: Direct Collocation Methods for Trajectory and Policy Optimization CS 287: Advanced Robotics, Fall 2019 Guest Lecture Igor Mordatch

  2. Overview • Previously: • Locally optimal control (shooting vs. collocation) • Forward dynamics models and shooting (LQR, DDP) • Today: • Direct collocation in detail (open-loop and policies) • Inverse dynamics models • Solution methods for collocation problems • Optimization with contacts

  3. Outline • Trajectory optimization and direct collocation • Inverse dynamics model • Numerical optimization for collocation • Optimizing dynamics with contact • Collocation methods for policy learning

  4. [Figure: shooting vs. collocation]

  5. [Figure: shooting vs. collocation]

  6. [Figure: shooting vs. collocation]

  7. Outline • Trajectory optimization and direct collocation • Inverse dynamics model • Numerical optimization for collocation • Optimizing dynamics with contact • Collocation methods for policy learning

  8. Outline • Trajectory optimization and direct collocation • Inverse dynamics model • Numerical optimization for collocation • Optimizing dynamics with contact • Collocation methods for policy learning

  9. (recall Natural Gradient from lec. 6)

  10. Recall Natural Gradient (Lec. 6). Can you see the commonalities? Natural Gradient • Consider a standard maximum likelihood problem: $\max_\theta f(\theta) = \sum_i \log p(x^{(i)}; \theta)$ • Gradient: $\nabla f(\theta) = \sum_i \frac{\nabla p(x^{(i)}; \theta)}{p(x^{(i)}; \theta)}$ • Hessian: $\nabla^2 f(\theta) = \sum_i \left[ \frac{\nabla^2 p(x^{(i)}; \theta)}{p(x^{(i)}; \theta)} - \left( \nabla \log p(x^{(i)}; \theta) \right) \left( \nabla \log p(x^{(i)}; \theta) \right)^\top \right]$ • Natural gradient: only keeps the 2nd term in the Hessian. Benefits: (1) faster to compute (only gradients needed); (2) guaranteed to be negative semidefinite; (3) found to be superior in some experiments; (4) invariant to re-parameterization
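
The natural-gradient recap on slide 10 can be illustrated with a small numerical sketch. The model below (a scalar Gaussian with unknown mean `theta` and unit variance) and the function names are assumptions chosen for illustration and are not taken from the lecture. The sketch splits the log-likelihood Hessian into its two terms and takes one Newton-like ascent step that keeps only the second (outer-product) term.

```python
import math


def log_p(x, theta):
    """Log-density of a unit-variance Gaussian with mean theta (assumed model)."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)


def score(x, theta):
    """Derivative of log p(x; theta) with respect to theta."""
    return x - theta


def hessian_terms(xs, theta):
    """Return the two terms of the log-likelihood Hessian (scalars here).

    first  = sum_i  p''(x_i; theta) / p(x_i; theta)
    second = -sum_i (d/dtheta log p(x_i; theta))**2
    """
    first = sum((x - theta) ** 2 - 1.0 for x in xs)
    second = -sum(score(x, theta) ** 2 for x in xs)
    return first, second


def natural_gradient_step(xs, theta, damping=1e-8):
    """One Newton-like ascent step using only the second Hessian term.

    The small damping keeps the denominator nonzero when all scores vanish.
    """
    grad = sum(score(x, theta) for x in xs)
    _, second = hessian_terms(xs, theta)
    return theta - grad / (second - damping)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    first, second = hessian_terms(data, 0.0)
    print("Hessian terms:", first, second, "sum:", first + second)
    print("theta after one step:", natural_gradient_step(data, 0.0))
```

For this model the two terms sum to minus the number of samples, which is the exact Hessian, while the second term alone is never positive and needs only first derivatives.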

  11. Outline • Trajectory optimization and direct collocation • Inverse dynamics model • Numerical optimization for collocation • Optimizing dynamics with contact • Collocation methods for policy learning

  12. Direct Trajectory Optimization of Rigid Body Dynamical Systems Through Contact, Posa and Tedrake, 2012

  13. Outline • Trajectory optimization and direct collocation • Inverse dynamics model • Numerical optimization for collocation • Optimizing dynamics with contact • Collocation methods for policy learning

  14. Recall from Last Lecture: Optimal Control -- Approaches • Return open-loop controls u_0, u_1, …, u_H • Return feedback policy (e.g. linear or neural net) • shooting • collocation
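
The shooting vs. collocation distinction recalled on the slides can be sketched in a few lines. The system (a double integrator), the explicit-Euler discretization, and all function names below are assumptions made for illustration and do not come from the lecture. In shooting, only the controls are free and the states follow from simulating forward. In collocation, states and controls are both decision variables, and the dynamics appear as equality constraints ("defects") that a solver drives to zero.

```python
def step(x, u, dt=0.1):
    """Explicit-Euler double integrator: x = (position, velocity), u = acceleration."""
    pos, vel = x
    return (pos + dt * vel, vel + dt * u)


def shooting_rollout(x0, controls, dt=0.1):
    """Shooting: states are not free variables; they come from forward simulation."""
    states = [x0]
    for u in controls:
        states.append(step(states[-1], u, dt))
    return states


def collocation_defects(states, controls, dt=0.1):
    """Collocation: one defect x_{t+1} - f(x_t, u_t) per time step.

    A constrained optimizer would treat each defect as an equality constraint.
    """
    defects = []
    for t, u in enumerate(controls):
        predicted = step(states[t], u, dt)
        defects.append(tuple(a - b for a, b in zip(states[t + 1], predicted)))
    return defects


if __name__ == "__main__":
    controls = [1.0, 0.0, -1.0]

    # A shooting rollout is dynamically consistent by construction.
    rolled = shooting_rollout((0.0, 0.0), controls)
    print(collocation_defects(rolled, controls))

    # A collocation initial guess may violate the dynamics (nonzero defects).
    guess = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
    print(collocation_defects(guess, controls))
```

The second call shows why collocation can start from a state trajectory that is not yet physically consistent: the defects measure the violation, and the optimizer removes it while also minimizing the cost.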
