Safe model-based learning for robot control. Breaking your robot is only fun in simulation. Felix Berkenkamp, Andreas Krause, Angela P. Schoellig. LCCC Workshop on Learning and Adaptation for Sensorimotor Control, Lund University, October 2018.
The Promise of Robotics = Physical Interaction: connecting the virtual world of data & information with the real world. Exponential increase in complexity!
The Real World Is Complex | Robots Today… and Tomorrow
Dedicated environments (today): manually programmed, based on a-priori knowledge. Robots are limited by our understanding of the system/environment.
Human-centered environments (tomorrow): unknown, unpredictable and changing. Robots need safe and high-performance behavior, and must safely learn and adapt.
Characteristics of Robot Learning
Robots are feedback systems: agent, action, state, reward, environment (cf. Reinforcement Learning: An Introduction, R. Sutton, A.G. Barto, 1998).
Strict safety requirements; resource constraints (data, payload, communication).
Results to date have been limited to learning single tasks, and demonstrated in simulation or lab settings.
NEXT CHALLENGE: realistic application scenarios: safety, data efficiency, online learning.
Work at the Dynamic Systems Lab (Prof. Schoellig)
Approach: combine control theory, the science of feedback (stability, performance, robustness), with machine learning.
Research characteristics: algorithms that run on real robots; data efficiency; online adaptation and learning; safety guarantees during learning in a closed-loop system.
Performance and Safety: Fast Swarm Flight
Safety: Off-Road Driving
Prerequisites for safe reinforcement learning:
1. Understand model and learning dynamics
2. Define safety, analyze a model for safety
3. Algorithm to safely acquire data
Safe Model-based Reinforcement Learning
Overview:
1. Understand model and learning dynamics
2. Define safety, analyze a model for safety
3. Algorithm to safely acquire data
Safe Model-based Reinforcement Learning
Learning a model: the dynamics model error must decrease with measurements, so we need to quantify the model error.
Gaussian process
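The GP figures from the slides are not reproduced here. As a minimal stand-alone sketch (the RBF kernel, its hyperparameters, and the toy data are illustrative assumptions, not from the talk), the standard GP regression equations yield both a mean prediction and a variance that quantifies model error:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential kernel k(a, b) = s^2 exp(-(a-b)^2 / (2 l^2))
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xs, noise_var=1e-4):
    # Standard GP regression equations (Rasmussen & Williams, ch. 2)
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(Xs, X)
    Kss = rbf_kernel(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha                      # posterior mean
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss - v.T @ v)           # posterior variance
    return mean, var

# Toy observations of a scalar transition function
X = np.array([-1.0, 0.0, 1.0])
y = np.sin(X)
mean, var = gp_posterior(X, y, np.array([0.0, 2.0]))
# var shrinks at observed inputs and grows far from the data
```

The posterior variance is exactly the quantified, shrinking model error that the safe-learning framework relies on.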
A Bayesian dynamics model.
On Kernelized Multi-armed Bandits, S.R. Chowdhury, A. Gopalan, ICML 2017.
Online Learning of Linearly Parameterized Control Problems, Y. Abbasi-Yadkori, PhD thesis, 2012.
Samples from the Gaussian process prior (plot: state vs. time). The transition dynamics are correlated!
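The plotted prior samples can be reproduced with a short sketch (kernel choice and lengthscale are illustrative assumptions): jointly Gaussian function values are drawn via a Cholesky factor of the kernel matrix, and the correlation makes each sample smooth.

```python
import numpy as np

def rbf_kernel(t, lengthscale=0.5):
    # Squared-exponential kernel over a time grid
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
K = rbf_kernel(t) + 1e-8 * np.eye(len(t))   # jitter for numerical stability
L = np.linalg.cholesky(K)
samples = L @ rng.standard_normal((len(t), 3))  # three samples from the prior
# Nearby time points are strongly correlated, so the sampled dynamics vary smoothly
```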
Overview:
1. Understand model and learning dynamics
2. Define safety, analyze a model for safety
3. Algorithm to safely acquire data
Safe Model-based Reinforcement Learning
Safety definition (figure: a robust, control-invariant safe set based on prior knowledge; states outside it are unsafe).
Safety for learned models: dynamics + policy → stability?
Lyapunov functions [A.M. Lyapunov 1892]
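As a concrete illustration of the definition (toy linear system, not from the talk): for a stable discrete-time linear system, a quadratic Lyapunov function V(x) = xᵀPx can be obtained by solving the discrete Lyapunov equation, and V then provably decreases along trajectories.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Stable discrete-time linear system x_{k+1} = A x_k (illustrative example)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])

# Solve A^T P A - P = -Q with Q = I; then V(x) = x^T P x is a Lyapunov function
P = solve_discrete_lyapunov(A.T, np.eye(2))

def V(x):
    return x @ P @ x

rng = np.random.default_rng(1)
x = rng.standard_normal(2)
# V is positive away from the origin and decreases along the dynamics
assert V(x) > 0 and V(A @ x) < V(x)
```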
Region of attraction.
Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017.
Start from an initial safe policy (figure: safe set inside an unsafe region).
Theorem (informally): under suitable conditions, we can identify a (near-)maximal subset of the state space on which the policy is stable, while never leaving the safe set.
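The level-set idea behind this theorem can be illustrated on a toy 1-D system (this sketch is not the paper's algorithm, which additionally uses GP confidence bounds): estimate the region of attraction as the largest Lyapunov level set on which V decreases under the dynamics.

```python
import numpy as np

# 1-D system x_{k+1} = x + dt * (-x + x^3); the origin is stable and the
# true region of attraction is (-1, 1)  (toy example, not from the paper)
dt = 0.1
f = lambda x: x + dt * (-x + x**3)
V = lambda x: x**2                 # candidate Lyapunov function

# Largest level set {V(x) <= c} on which V decreases -> region-of-attraction estimate
xs = np.linspace(-2.0, 2.0, 2001)
decrease = V(f(xs)) < V(xs)
decrease[np.abs(xs) < 1e-9] = True          # the equilibrium itself counts as safe
levels = np.sort(V(xs))
c_max = max(c for c in levels if np.all(decrease[V(xs) <= c]))
# c_max is close to 1.0, the value of V at the unstable equilibria x = +/-1
```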
Illustration of safe learning: we need to safely explore! (Policy plot.)
Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017.
Lyapunov function: finding the right Lyapunov function is difficult! Parameterize it with positive-definite weights and nonlinearities with a trivial nullspace; the learned function yields a decision boundary.
The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamic Systems, S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018.
Overview:
1. Understand model and learning dynamics
2. Define safety, analyze a model for safety
3. Algorithm to safely acquire data
Safe Model-based Reinforcement Learning
Model predictive control: makes decisions based on predictions about the future; includes input/state constraints.
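A minimal receding-horizon sketch of these two properties (double-integrator model; the weights, horizon, and solver are illustrative assumptions; the papers cited below use robust nonlinear MPC):

```python
import numpy as np
from scipy.optimize import minimize

# Double-integrator model (position, velocity): toy stand-in for robot dynamics
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
H = 10  # prediction horizon

def rollout(x0, u):
    # Predicted cost of an input sequence over the horizon
    x, cost = x0, 0.0
    for uk in u:
        cost += x @ x + 0.1 * uk**2
        x = A @ x + B[:, 0] * uk
    return cost + 10.0 * (x @ x)  # terminal cost

def mpc_step(x0, u_max=1.0):
    # Optimize the sequence under input constraints, apply only the first input
    res = minimize(lambda u: rollout(x0, u), np.zeros(H),
                   bounds=[(-u_max, u_max)] * H)
    return res.x[0]

x = np.array([1.0, 0.0])
for _ in range(30):
    u = mpc_step(x)
    assert abs(u) <= 1.0 + 1e-9   # input constraint satisfied at every step
    x = A @ x + B[:, 0] * u
# The controller regulates the state toward the origin
```

Re-solving at every step is what lets MPC react to model errors, which is why it pairs naturally with learned dynamics.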
Model predictive control on a robot. Video at https://youtu.be/3xRNmNv5Efk
Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016.
Model predictive control. Problem: the true dynamics are unknown!
Forward-propagating uncertainty: the outer approximation contains the true dynamics, jointly for all time steps, with at least a specified probability.
Learning-based Model Predictive Control for Safe Exploration, T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018.
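The idea can be sketched with simple interval arithmetic (a scalar linear system with a bounded model error standing in for the GP confidence bound; all constants are illustrative assumptions): propagate an interval through the nominal model and inflate it by the error bound at each step, so any admissible true trajectory stays inside the tube.

```python
import numpy as np

# Scalar system x_{k+1} = 0.8 x_k + w with model error bounded by |w| <= eps
a, eps = 0.8, 0.05

def propagate_interval(lo, hi):
    # Image of an interval under the monotone nominal model, inflated by eps
    return a * lo - eps, a * hi + eps

lo, hi = 1.0, 1.0               # start from a known state
bounds = [(lo, hi)]
for _ in range(20):
    lo, hi = propagate_interval(lo, hi)
    bounds.append((lo, hi))

# Any true trajectory with an admissible disturbance stays inside the tube
x = 1.0
rng = np.random.default_rng(2)
for k in range(20):
    x = a * x + rng.uniform(-eps, eps)
    assert bounds[k + 1][0] <= x <= bounds[k + 1][1]
```

For a stable system the tube width converges (here to 2·eps/(1-a)) rather than blowing up, which is what makes multi-step safety certificates possible.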
Safe model-based learning framework (figure: exploration trajectory and safety trajectory share the same first step; unsafe region shown).
Theorem (informally): under suitable conditions, we can always guarantee that we are able to return to the safe set.
Safe model-based learning framework (figure: exploration and safety trajectories with the same first step; unsafe region shown). Exploration is limited by the size of the safe set!
How should we collect data for a control task?
Optimizing expected performance: we design our cost functions to be helpful for optimization. Exploration objective example: driving too fast, then slowing down for safety, then faster driving after learning.
Example. Video at https://youtu.be/3xRNmNv5Efk
Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016.
Summary and Outlook
Understand model and learning dynamics: Gaussian processes
Define safety, analyze a model for safety: Lyapunov stability
Algorithm to safely acquire data: model predictive control
Safe Model-based Reinforcement Learning
https://berkenkamp.me | www.dynsyslab.org
Thanks To… My Team, Industrial Partners, Funding Agencies. www.dynsyslab.org
My outstanding collaborators at U of T (Tim Barfoot) and ETH (Andreas Krause, Raffaello D'Andrea and the whole FMA team).