Safe model-based learning for robot control. Felix Berkenkamp, Andreas Krause, Angela P. Schoellig. CDC Workshop on Learning for Control, 16th December 2018.
The future of automation Felix Berkenkamp 2
The future of automation: large prior uncertainties, active decision making. We need safe and high-performance behavior.
Control approach: system identification (data collection) → system model → controller design. Data collection in controlled environments; robustness towards errors; safety constraints.
Two approaches. Control (systems): + models, + feedback, + safety, + worst-case; − learning, − data. But systems must learn and adapt, and performance is limited by system understanding.
Reinforcement learning approach: the agent applies actions to the environment and observes state and reward; data samples drive controller optimization (system identification and controller design merged into one loop). Collecting relevant data for the task (in controlled environments); performance typically holds in expectation.
Two approaches. Control (systems): + models, + feedback, + safety, + worst-case; − learning, − data. Machine learning (data): + learning, + data collection, + explore/exploit, + average case; − worst-case, − safety. Control: systems must learn and adapt, performance limited by system understanding. Machine learning: safety limited by lack of system understanding. Combining both for safety and data efficiency: model-based reinforcement learning.
Prerequisites for safe reinforcement learning: (1) understand model errors and learning; (2) define safety, analyze a dynamics model for safety; (3) an algorithm to safely acquire data and optimize the task. Together: safe model-based reinforcement learning.
Overview: understand model errors and learning; define safety, analyze a dynamics model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.
Learning a model of the dynamics. We need to quantify the model error, and the model error must decrease with data; the observation noise is assumed sub-Gaussian.
Gaussian process. Theorem (informal): the model error is contained in the scaled Gaussian process confidence intervals with probability at least 1 − δ, jointly for all states, time steps, and actively selected measurements. [Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. N. Srinivas, A. Krause, S. Kakade, M. Seeger, ICML 2010]
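A minimal NumPy sketch of the confidence intervals the theorem refers to, assuming a squared-exponential kernel and a fixed scaling factor `beta` (in Srinivas et al. the scaling grows with the information gain; the constant here is purely illustrative):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(a, b) for 1-D inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_confidence(x_train, y_train, x_test, noise=0.01, beta=2.0):
    """Posterior mean and scaled confidence interval [mu - beta*sigma, mu + beta*sigma]."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v ** 2, axis=0)
    sigma = np.sqrt(np.clip(var, 0.0, None))
    return mu, mu - beta * sigma, mu + beta * sigma
```

The interval width shrinks near observed data and reverts to the prior far from it, which is exactly the "model error decreases with data" property needed above.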
A Bayesian dynamics model: model the unknown dynamics with a Gaussian process.
Overview: understand model errors and learning; define safety, analyze a dynamics model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.
Safety definition: a robust, control-invariant set from prior knowledge; everything outside is unsafe.
Safety for learned models: dynamics + policy. Is the closed loop stable? What is its region of attraction?
Lyapunov functions [A.M. Lyapunov, 1892]
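For a discrete-time system, the Lyapunov condition can be checked numerically: V must strictly decrease along trajectories, and the largest level set on which the sampled decrease condition holds gives an estimate of the region of attraction. A sketch, assuming a known dynamics function `f` and a candidate `V` (the grid-based check is illustrative, not a formal certificate):

```python
import numpy as np

def largest_safe_level(f, V, states, tol=1e-6):
    """Largest c such that every sampled state with V(x) <= c satisfies the
    discrete-time decrease condition V(f(x)) < V(x) -- a sampled estimate of
    the region of attraction {x : V(x) <= c}."""
    v = np.array([V(x) for x in states])
    decreases = np.array([V(f(x)) < V(x) for x in states])
    ok = decreases | (v < tol)  # exempt the equilibrium itself
    bad_v = v[~ok]
    # The level set must stay below the smallest failing level.
    return float(v.max()) if bad_v.size == 0 else float(bad_v.min()) - 1e-9
```

For example, the scalar system x⁺ = x + 0.1·(−x + x³) with V(x) = x² is stable near the origin but diverges for |x| > 1, so the recovered level is about c ≈ 1.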
Learning Lyapunov functions. Finding the right Lyapunov function is difficult! Parameterize V with a neural network whose weights are positive definite and whose nonlinearities have a trivial nullspace, and train it via a classification problem. [The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems. S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018]
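One way to get positive semidefiniteness by construction, in the spirit of the slide: set V(x) = ‖φ(x)‖² with a bias-free network φ, so φ(0) = 0 and hence V(0) = 0 and V(x) ≥ 0 everywhere. This is a hypothetical simplification of the Lyapunov Neural Network, which additionally structures the weight matrices so that the nullspace is trivial (V(x) > 0 for all x ≠ 0):

```python
import numpy as np

rng = np.random.default_rng(0)

class LyapunovNet:
    """Sketch: V(x) = ||phi(x)||^2 with phi(0) = 0 by construction."""

    def __init__(self, dim, hidden):
        # No bias terms: tanh(0) = 0 propagates the origin through every layer.
        self.W1 = rng.normal(size=(hidden, dim))
        self.W2 = rng.normal(size=(hidden, hidden))

    def phi(self, x):
        h = np.tanh(self.W1 @ x)
        return np.tanh(self.W2 @ h)

    def V(self, x):
        z = self.phi(x)
        return float(z @ z)  # >= 0, and exactly 0 at x = 0
```

The remaining (hard) part, which the paper handles with its weight structure and a classification loss, is making V decrease along the learned dynamics.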
Overview: understand model errors and learning; define safety, analyze a dynamics model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.
Safety definition. Start from an initial safe policy; everything outside the safe set is unsafe. Theorem (informal): under suitable conditions, the algorithm identifies a (near-)maximal subset of the state space on which the policy is stable, while never leaving the safe set. [Safe Model-based Reinforcement Learning with Stability Guarantees. F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS 2017]
Illustration of safe learning: policy performance varies from low to high across the policy space, so we need to safely explore! [Safe Model-based Reinforcement Learning with Stability Guarantees. F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS 2017]
Model predictive control: makes decisions based on predictions about the future and includes input/state constraints.
Model predictive control on a robot. Video: https://youtu.be/3xRNmNv5Efk [Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR 2016]
Model predictive control. Problem: the true dynamics are unknown!
Prediction under uncertainty: the outer approximation contains the true dynamics for all time steps with probability at least 1 − δ. [Learning-based Model Predictive Control for Safe Exploration. T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC 2018]
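The outer approximation can be caricatured with a simple interval recursion: propagate the box center through the mean dynamics, then inflate the radius by a Lipschitz constant times the current radius plus the scaled model-error bound β·σ. Everything here is an illustrative simplification (the paper propagates ellipsoidal sets with state-dependent GP error bounds):

```python
import numpy as np

def propagate_intervals(mu_f, lipschitz, beta_sigma, x_lo, x_hi, horizon):
    """Outer-approximate the reachable intervals of a scalar system over a horizon.

    mu_f       : mean dynamics (assumed model)
    lipschitz  : Lipschitz bound on the dynamics
    beta_sigma : uniform bound beta * sigma on the model error (assumption)
    """
    boxes = [(x_lo, x_hi)]
    for _ in range(horizon):
        lo, hi = boxes[-1]
        center = 0.5 * (lo + hi)
        radius = 0.5 * (hi - lo)
        c_next = mu_f(center)
        # Conservative inflation: Lipschitz growth of the box plus model error.
        r_next = lipschitz * radius + beta_sigma
        boxes.append((c_next - r_next, c_next + r_next))
    return boxes
```

With contractive mean dynamics (Lipschitz constant below 1) the radii stay bounded, which is what makes multi-step safety certificates possible.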
Safe model-based learning framework: an exploration trajectory and a safety trajectory share the same first step; everything outside the safe set is unsafe. Theorem (informal): under suitable conditions we can always guarantee that we are able to return to the safe set.
Exploration via expected performance. We design our cost functions to be helpful for optimization. Exploration objective: optimize expected performance subject to safety constraints. Example: driving too fast → slow down for safety → faster driving after learning.
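The "explore subject to safety constraints" selection step can be sketched in a few lines: among candidate actions that pass the safety check, pick the one the model is most uncertain about. The interface is hypothetical (`sigma[i]` is the model uncertainty of action i, `safe[i]` the result of the safety certificate above):

```python
import numpy as np

def pick_exploration_action(sigma, safe):
    """Index of the most informative action among those certified safe.

    sigma : array of model uncertainties per candidate action
    safe  : boolean array, True where the safety constraint holds
    """
    # Mask unsafe actions out of the argmax entirely.
    masked = np.where(safe, sigma, -np.inf)
    return int(np.argmax(masked))
```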
Example: https://youtu.be/3xRNmNv5Efk [Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR 2016]
Summary. Understand model errors and learning: RKHS / Gaussian processes, reliable confidence intervals. Define safety, analyze a dynamics model for safety: Lyapunov stability, stability of learned models. Algorithm to safely acquire data and optimize the task: model predictive control, uncertainty propagation, safe active learning. Safe model-based reinforcement learning. https://berkenkamp.me, www.las.inf.ethz.ch, www.dynsyslab.org