Safe model-based learning for robot control


  1. Safe model-based learning for robot control. Felix Berkenkamp, Andreas Krause, Angela P. Schoellig. @CDC Workshop on Learning for Control, 16th December 2018

  2. The future of automation

  3. The future of automation: large prior uncertainties, active decision making. Need safe and high-performance behavior.

  4. Control approach: system identification (data collection in controlled environments) yields a system model, which is used for controller design with robustness towards errors and safety constraints.

  5. Two approaches. Control (systems): + models, + feedback, + safety, + worst-case; - learning (systems must learn and adapt), - data. Performance limited by system understanding.

  6. Reinforcement learning approach: agent-environment loop (state, action, reward); data samples for system identification; controller optimization for controller design. Collecting relevant data for the task (in controlled environments); performance typically in expectation.

  7. Two approaches. Control (systems): + models, + feedback, + safety, + worst-case; - learning, - data; performance limited by system understanding. Machine learning (data): + learning, + data collection, + explore/exploit, + average case; - worst-case, - safety; safety limited by lack of system understanding. Systems must learn and adapt, with safety and data efficiency: model-based reinforcement learning.

  8. Prerequisites for safe reinforcement learning: understand model errors and learning dynamics; define safety, analyze a model for safety; an algorithm to safely acquire data and optimize the task. Together: safe model-based reinforcement learning.

  9. Overview: understand model errors and learning dynamics; define safety, analyze a model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.

  10. Learning a model: dynamics. Need to quantify the model error, and the model error must decrease with data (sub-Gaussian noise assumption).
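
The slides only show figures here, so a small sketch of the setup may help: a system x_{t+1} = f(x_t, u_t) + noise is observed through noisy transitions, and the model error has to be quantified from this data. The one-dimensional dynamics, noise scale, and grid of inputs below are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Unknown dynamics to be learned: x_{t+1} = f(x_t, u_t) + noise.
# The pendulum-like f below is purely illustrative.
def f(x, u):
    return x + 0.1 * (np.sin(x) + u)

rng = np.random.default_rng(0)

def observe_transition(x, u, noise_std=0.01):
    # Gaussian noise is sub-Gaussian, matching the model-error assumption.
    return f(x, u) + noise_std * rng.standard_normal()

# Collect a small dataset of (state, action) -> next-state observations.
X = np.array([(x, u) for x in np.linspace(-2, 2, 10) for u in (-1.0, 0.0, 1.0)])
y = np.array([observe_transition(x, u) for x, u in X])
```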

  16. Gaussian process. Theorem (informally): the model error is contained in the scaled Gaussian process confidence intervals with probability at least 1 - δ, jointly for all states, time steps, and actively selected measurements. [Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, N. Srinivas, A. Krause, S. Kakade, M. Seeger, ICML 2010]
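
A minimal Gaussian-process regression sketch for the dynamics data from the previous snippet: the posterior mean plus/minus a scaled standard deviation gives the confidence intervals the theorem refers to. The RBF kernel hyperparameters and the fixed scaling beta are hand-picked for illustration; the theoretically valid scaling comes from Srinivas et al. (2010).

```python
import numpy as np

# Minimal GP regression sketch (RBF kernel), assuming the illustrative
# dataset (X, y) from the previous snippet.
def rbf(A, B, lengthscale=0.5, signal_var=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=1e-4):
    K = rbf(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf(X_train, X_test)
    K_ss = rbf(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - (v**2).sum(0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Scaled confidence intervals mean +/- beta * std; beta is a hand-picked
# scaling here, not the theoretically derived one.
X_test = np.array([(x, 0.0) for x in np.linspace(-2, 2, 50)])
mean, std = gp_posterior(X, y, X_test)
beta = 2.0
lower, upper = mean - beta * std, mean + beta * std
```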

  21. A Bayesian dynamics model.
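
Continuing the previous sketch, the GP posterior can be read as a Bayesian one-step dynamics model: a mean prediction of the next state together with a confidence interval. The helper below reuses gp_posterior, X, and y from the GP snippet; beta is again an illustrative scaling.

```python
# One-step Bayesian dynamics prediction with the GP posterior above:
# the model returns a mean next state and a confidence interval that,
# under the sub-Gaussian assumption, is intended to contain the true f(x, u).
def predict_next_state(x, u, beta=2.0):
    query = np.array([[x, u]])
    mean, std = gp_posterior(X, y, query)
    return mean[0], (mean[0] - beta * std[0], mean[0] + beta * std[0])

mu, (lo, hi) = predict_next_state(0.5, 0.0)
print(mu, lo, hi)
```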

  25. Overview: understand model errors and learning dynamics; define safety, analyze a model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.

  26. Safety definition: a robust, control-invariant set, specified from prior knowledge; states outside it are unsafe.

  27. Safety for learned models: dynamics + policy. Stability? Region of attraction?

  28. Lyapunov functions [A. M. Lyapunov, 1892]
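
Since the Lyapunov slides are mostly figures, a small worked example: a candidate V that is positive definite and decreases along closed-loop trajectories certifies stability, and a sublevel set on which the decrease holds is an estimate of the region of attraction. The quadratic V and the toy closed-loop map below are illustrative assumptions.

```python
import numpy as np

# Toy closed-loop dynamics x_{t+1} = pi(x_t) under some stabilizing policy;
# purely illustrative, not the system from the talk.
def closed_loop(x):
    return 0.9 * x + 0.05 * np.sin(x)

def V(x):
    return x ** 2  # candidate Lyapunov function, positive definite

# Check the Lyapunov decrease condition V(f(x)) - V(x) < 0 on a sublevel set
# {x : V(x) <= c}; if it holds everywhere on that set, the set is an
# estimate of the region of attraction.
c = 4.0
xs = np.linspace(-2.0, 2.0, 401)          # states with V(x) <= c
decrease = V(closed_loop(xs)) - V(xs)
print("Lyapunov decrease holds on the level set:", np.all(decrease[xs != 0] < 0))
```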

  30. Learning Lyapunov functions: finding the right Lyapunov function is difficult! Weights: positive definite; nonlinearities: trivial nullspace; trained as a classification problem. [The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems, S. M. Richards, F. Berkenkamp, A. Krause, CoRL 2018]
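
A minimal sketch in the spirit of the Lyapunov neural network: the candidate is V(x) = phi(x)^T phi(x), where phi is a network whose weight matrices are constructed to have trivial nullspace and whose nonlinearity vanishes only at zero, so V is positive definite by construction. Layer sizes, the eps regularizer, and the omitted classification-style training are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_layer(n_in, n_out, eps=0.1):
    # Weight with trivial nullspace: W = [G1^T G1 + eps*I ; G2] has full
    # column rank, so W x = 0 only when x = 0.
    G1 = rng.standard_normal((n_in, n_in))
    G2 = rng.standard_normal((max(n_out - n_in, 0), n_in))
    return np.vstack([G1.T @ G1 + eps * np.eye(n_in), G2])

layers = [make_layer(2, 16), make_layer(16, 16)]

def phi(x):
    h = x
    for W in layers:
        h = np.tanh(W @ h)   # tanh(z) = 0 only at z = 0: trivial nullspace
    return h

def V(x):
    z = phi(np.asarray(x, dtype=float))
    return float(z @ z)      # V(x) = phi(x)^T phi(x) >= 0, and V(0) = 0

print(V([0.0, 0.0]), V([0.3, -0.2]))
```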

  31. Overview: understand model errors and learning dynamics; define safety, analyze a model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.

  32. Safety definition: start from an initial safe policy; outside the safe set is unsafe. Theorem (informally): under suitable conditions, we can identify a (near-)maximal subset of X on which π is stable, while never leaving the safe set. [Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A. P. Schoellig, A. Krause, NIPS 2017]
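
A self-contained toy stand-in for the safe learning loop: measurements are only taken inside the current certified level set of V, confidence bounds on the dynamics shrink with data, and the level set is grown only where the worst-case Lyapunov decrease is certified. The Lipschitz-based confidence intervals below replace the paper's Gaussian-process model, and the 1-D system, constants, and grid are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D closed-loop system under a fixed policy; unknown to the learner.
xs = np.linspace(-1.0, 1.0, 201)
f_true = lambda x: 0.7 * x + 0.05 * np.sin(3 * x)
V = lambda x: x ** 2                                  # given Lyapunov function
L_f, noise_std, beta = 2.0, 0.002, 2.0                # assumed Lipschitz / noise bounds

data = []                                             # measurements (x, noisy f(x))

def f_interval(x):
    # Confidence interval on f(x): nearest measurement + Lipschitz bound.
    if not data:
        return -1.0, 1.0                              # conservative prior bound
    pts = np.array(data)
    j = int(np.argmin(np.abs(pts[:, 0] - x)))
    xj, yj = pts[j]
    half = beta * noise_std + L_f * abs(x - xj)
    return yj - half, yj + half

def certified(x):
    lo, hi = f_interval(x)
    return max(V(lo), V(hi)) < V(x)                   # decrease for worst-case next state

c = 0.01                                              # initial safe level set {V <= c}
for _ in range(100):
    in_set = xs[V(xs) <= c]
    # Measure the most uncertain state inside the current certified safe set.
    widths = [f_interval(x)[1] - f_interval(x)[0] for x in in_set]
    x_star = float(in_set[int(np.argmax(widths))])
    data.append((x_star, f_true(x_star) + noise_std * rng.standard_normal()))
    # Grow the level set as far as the decrease condition stays certified.
    for level in np.unique(V(xs)):
        shell = xs[(V(xs) > c) & (V(xs) <= level)]
        if level > c and all(certified(x) for x in shell):
            c = level
        elif level > c:
            break

print("certified region-of-attraction estimate: V(x) <=", round(c, 4))
```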

  38. Illustration of safe learning: need to safely explore! (Figure: policy, low to high.) [Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A. P. Schoellig, A. Krause, NIPS 2017]

  40. Model predictive control: makes decisions based on predictions about the future; includes input and state constraints.
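
A minimal receding-horizon MPC sketch: optimize a finite-horizon input sequence under input bounds and a state constraint, apply only the first input, and re-plan. The scalar linear model, horizon, and weights are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Known model, quadratic cost, input bounds, and a state constraint |x| <= x_max.
A, B = 1.0, 0.5          # scalar linear model x_{t+1} = A x_t + B u_t
H, x_max, u_max = 10, 2.0, 1.0

def rollout(x0, u_seq):
    xs = [x0]
    for u in u_seq:
        xs.append(A * xs[-1] + B * u)
    return np.array(xs[1:])

def mpc_action(x0, x_ref=0.0):
    cost = lambda u: np.sum((rollout(x0, u) - x_ref) ** 2) + 0.1 * np.sum(u ** 2)
    cons = [{"type": "ineq", "fun": lambda u: x_max - np.abs(rollout(x0, u))}]
    res = minimize(cost, np.zeros(H), bounds=[(-u_max, u_max)] * H, constraints=cons)
    return res.x[0]          # apply only the first input, then re-plan

x = 1.8
for t in range(5):
    u = mpc_action(x)
    x = A * x + B * u
    print(f"t={t}  u={u:+.2f}  x={x:+.2f}")
```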

  41. Model predictive control on a robot: https://youtu.be/3xRNmNv5Efk [Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C. J. Ostafew, A. P. Schoellig, T. D. Barfoot, IJRR 2016]

  42. Model predictive control. Problem: the true dynamics are unknown!

  43. Prediction under uncertainty: the outer approximation contains the true dynamics for all time steps with probability at least 1 - δ. [Learning-based Model Predictive Control for Safe Exploration, T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC 2018]
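
A simplified sketch of multi-step prediction under uncertainty: propagate an interval that outer-approximates the reachable states, using a one-step confidence bound on the learned model. Interval arithmetic on a monotone toy model stands in for the ellipsoidal over-approximations of Koller et al.; the mean model and error bound are illustrative.

```python
import numpy as np

# Propagate an interval [lo, hi] that contains the true next state whenever
# the one-step confidence interval on the dynamics is valid.
def one_step_bounds(x_lo, x_hi, u, model_err=0.05):
    mean = lambda x: 0.9 * x + 0.2 * u    # learned mean model (illustrative)
    # The mean model is monotone in x, so extremes occur at the endpoints.
    return mean(x_lo) - model_err, mean(x_hi) + model_err

x_lo = x_hi = 1.0                      # known initial state
u_plan = [-0.5, -0.5, 0.0, 0.0, 0.0]   # candidate input sequence
for t, u in enumerate(u_plan):
    x_lo, x_hi = one_step_bounds(x_lo, x_hi, u)
    print(f"t={t + 1}: state in [{x_lo:+.3f}, {x_hi:+.3f}]")

# A state constraint x <= 1.2 can now be checked against the upper bound x_hi
# at every step, which is the safety check used inside the MPC problem.
```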

  44. Safe model-based learning framework: plan an exploration trajectory and a safety trajectory that share the same first step; outside the safe set is unsafe. Theorem (informally): under suitable conditions, we can always guarantee that we are able to return to the safe set.
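
A schematic, runnable sketch of the exploration/safety-trajectory idea: before applying an exploratory input, verify with the outer approximation that a known backup controller brings the whole uncertainty interval back into the safe set within a few steps, and otherwise fall back to the backup input. The system, backup gain, safe set, and horizon are illustrative assumptions, and the fallback is simplified compared to re-using a previously verified safety trajectory.

```python
import numpy as np

true_step = lambda x, u: 0.9 * x + 0.2 * u + 0.02 * np.sin(5 * x)  # unknown
backup = lambda x: -2.0 * x                                        # safe backup policy
SAFE, H, ERR = 0.5, 5, 0.05                                        # safe set {|x| <= SAFE}

def bounds_step(lo, hi, u):
    mean = lambda x: 0.9 * x + 0.2 * u            # learned mean model
    return mean(lo) - ERR, mean(hi) + ERR

def returns_to_safe_set(x, u_first):
    lo, hi = bounds_step(x, x, u_first)           # shared first step
    for _ in range(H):                            # then follow the backup policy
        u = float(np.clip(backup(0.5 * (lo + hi)), -1.0, 1.0))
        lo, hi = bounds_step(lo, hi, u)
    return abs(lo) <= SAFE and abs(hi) <= SAFE    # worst case ends in the safe set

x = 1.8
for t in range(8):
    u_explore = 1.0                               # aggressive exploratory input
    if returns_to_safe_set(x, u_explore):
        u = u_explore
    else:
        u = float(np.clip(backup(x), -1.0, 1.0))  # fall back to the safety plan
    x = true_step(x, u)
    print(f"t={t}  applied u={u:+.2f}  x={x:+.3f}")
```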

  45. Exploration via expected performance: we design our cost functions to be helpful for optimization; the exploration objective is optimized subject to safety constraints. Example: driving too fast, slowing down for safety, faster driving after learning.
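
Continuing the previous sketch (and reusing its returns_to_safe_set and backup), one illustrative way to trade off performance and safety: among candidate inputs, keep only those for which the return to the safe set can still be certified, and pick the one with the best predicted performance. The progress-toward-a-goal objective here is a stand-in for the exploration objective on the slide.

```python
import numpy as np

def safe_greedy_action(x, x_goal=2.0, candidates=np.linspace(-1.0, 1.0, 21)):
    # Keep only inputs for which a safe return is still guaranteed.
    feasible = [u for u in candidates if returns_to_safe_set(x, u)]
    if not feasible:                       # nothing certifiable: use the backup
        return float(np.clip(backup(x), -1.0, 1.0))
    mean_next = lambda u: 0.9 * x + 0.2 * u
    # Best expected performance: predicted progress toward x_goal.
    return min(feasible, key=lambda u: abs(mean_next(u) - x_goal))

print(safe_greedy_action(0.3))
```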

  46. Example: https://youtu.be/3xRNmNv5Efk [Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C. J. Ostafew, A. P. Schoellig, T. D. Barfoot, IJRR 2016]

  48. Summary. Understand model and learning dynamics: RKHS / Gaussian processes, reliable confidence intervals. Define safety, analyze a model for safety: Lyapunov stability, stability of learned models. Algorithm to safely acquire data and optimize the task: model predictive control, uncertainty propagation, safe active learning. Safe model-based reinforcement learning. https://berkenkamp.me | www.las.inf.ethz.ch | www.dynsyslab.org
