Safe Learning of Regions of Attraction for Uncertain, Nonlinear Systems with Gaussian Processes Felix Berkenkamp, Riccardo Moriconi, Angela P. Schoellig, Andreas Krause @CDC, December 2016
What is control? Modelling Model Control theory Implement Felix Berkenkamp 2
One small assumption… Model Degraded performance Instability
What is control? Modelling Model Control theory Implement
Why is learning not commonly used? Because safety matters!
What can go wrong? Modelling Model Control theory Feedback Implement Excitation? Stability?
Problem definition: Can we learn about the dynamics while remaining stable? Assumptions: Lipschitz continuous; bounded RKHS norm. Where is this control policy safe to use? You can experiment, but no system failures!
Challenges with Bayesian learning Exploration (excitation) Stability certificates (robustness) ✓ ✓ Linear systems [L. Ljung, SAP’98] Linear controllers [F. Berkenkamp et al., ECC’15] ✓ ? Finite domains [R.I. Brafman et al., JMLR’02] Nonlinear systems [A.K. Akametalu et al., CDC’14] ? Nonlinear, continuous This paper: Use ideas from sensor placement Lyapunov stability (nonlinear, uncertain systems) with high probability
Region of attraction
Lyapunov functions [A.M. Lyapunov 1966]
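For reference, the standard Lyapunov argument behind this slide can be written out as below. The notation is an assumption made for illustration (continuous-time dynamics $f$ with an equilibrium at the origin, level set $\mathcal{V}(c)$); the slide itself does not show these symbols.

```latex
\begin{align}
  \dot{x} &= f(x), \qquad f(0) = 0, \\
  V(0) &= 0, \qquad V(x) > 0 \quad \forall x \neq 0, \\
  \mathcal{V}(c) &= \{\, x : V(x) \le c \,\}, \\
  \dot{V}(x) &= \nabla V(x)^{\top} f(x) < 0
      \quad \forall x \in \mathcal{V}(c) \setminus \{0\}
  \;\Longrightarrow\;
  \mathcal{V}(c) \subseteq \text{region of attraction of the origin},
\end{align}
```

provided the level set $\mathcal{V}(c)$ is bounded. Every trajectory that starts inside such a level set stays inside it and converges to the origin.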
What about unknown dynamics? known systems: [R. Bobiti, M. Lazar, CDC 2016]
Gaussian process models high probability confidence intervals Lipschitz continuous
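A minimal sketch of how a Gaussian process yields confidence intervals, using only the Python standard library. The squared-exponential kernel, the noise variance, the scaling factor `beta`, and all function names are assumptions chosen for illustration; they are not taken from the talk or its code release.

```python
import math

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice of kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(x_query, a) for a in x_train]
    alpha = solve(K, list(y_train))           # K^{-1} y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                      # K^{-1} k_*
    var = rbf(x_query, x_query) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

def confidence_interval(x_train, y_train, x_query, beta=2.0):
    """Interval mean +/- beta * std; beta is an assumed scaling factor."""
    mean, std = gp_posterior(x_train, y_train, x_query)
    return mean - beta * std, mean + beta * std

# The interval is tight near data and wide far away from it.
print(confidence_interval([0.0, 1.0], [0.5, -0.2], 0.0))
print(confidence_interval([0.0, 1.0], [0.5, -0.2], 10.0))
```

The interval collapses around measured points and reverts to the prior width away from the data, which is what lets the model say where the dynamics are already known well enough.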
What about unknown dynamics? True system is stable within the level set with high probability!
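One way to turn confidence bounds into a stability statement is to check an upper bound on the Lyapunov derivative at finitely many grid points and demand a margin that covers the gaps between them. The sketch below assumes a margin of `lipschitz * tau` (a Lipschitz constant of the derivative times the grid resolution) and skips the equilibrium itself; the function name, the toy dynamics, and the numbers are hypothetical.

```python
def largest_safe_level(v_values, vdot_upper, lipschitz, tau):
    """Largest level c such that every grid point with 0 < V(x) <= c has
    an upper bound on dV/dt below -lipschitz * tau.

    v_values   -- V(x) at each grid point
    vdot_upper -- high-probability upper bound on dV/dt at each grid point
    lipschitz  -- assumed Lipschitz constant of dV/dt
    tau        -- assumed grid resolution
    """
    threshold = -lipschitz * tau
    c_safe = 0.0
    # Walk outwards through the level sets of V until the first violation.
    for v, u in sorted(zip(v_values, vdot_upper)):
        if v == 0.0:
            continue  # simplification: the equilibrium itself is skipped
        if u >= threshold:
            break
        c_safe = v
    return c_safe

# Toy example: V(x) = x^2, nominal dynamics dx/dt = -x + x^3, and a constant
# uncertainty margin of 0.1 added to dV/dt (all assumed values).
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
v_values = [x * x for x in states]
vdot_upper = [2 * x * (-x + x ** 3) + 0.1 for x in states]
print(largest_safe_level(v_values, vdot_upper, lipschitz=0.5, tau=0.5))
```

With these toy values the points at ±0.5 pass the check and the points at ±1.0 fail it, so the certified level is V = 0.25.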
Exploring the safe set
Challenges with Bayesian learning Exploration (excitation) Stability certificates (robustness) ✓ ✓ Linear systems [L. Ljung, SAP’98] Linear controllers [F. Berkenkamp et al., ECC’15] ✓ ? Finite domains [R.I. Brafman et al., JMLR’02] Nonlinear systems [A.K. Akametalu et al., CDC’14] ? Nonlinear, continuous This paper: Use ideas from sensor placement Lyapunov stability (nonlinear, uncertain systems) with high probability
How to explore? How to actively explore? Do we converge to maximum safe set? The policy is safe: keeps us in Apply
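A simple active-exploration rule in the spirit of sensor placement is to measure where the model is most uncertain, restricted to states already certified as safe. The sketch below is an assumed simplification (states only, predictive standard deviation as the uncertainty measure); the function name and the numbers are hypothetical.

```python
def next_measurement(states, v_values, std_devs, c_safe):
    """Return the state inside the level set V(x) <= c_safe with the
    largest predictive standard deviation."""
    candidates = [i for i, v in enumerate(v_values) if v <= c_safe]
    if not candidates:
        raise ValueError("safe set is empty")
    best = max(candidates, key=lambda i: std_devs[i])
    return states[best]

# Toy example with V(x) = x^2 and hand-picked (assumed) uncertainties.
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
v_values = [x * x for x in states]
std_devs = [0.9, 0.2, 0.05, 0.4, 0.8]
print(next_measurement(states, v_values, std_devs, c_safe=0.25))
```

The most uncertain states overall (±1.0) lie outside the certified level set, so the rule picks 0.5 instead; exploration never leaves the region where stability has been established.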
Theoretical result Close-to-optimal measurements: [A. Krause, C. Guestrin, UAI’05]
Theorem: Guaranteed to converge to the maximum safe levelset up to a certain accuracy after a finite number of data points, without leaving this safe levelset with high probability.
Bound depends on
• Size of the maximum safe levelset
• Information capacity of the Gaussian process model
• Accuracy
Inverted pendulum Maximum torque limited! Safe exploration so that the pendulum doesn’t fall. Controller: LQR with prior mean model Quadratic Lyapunov function
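A common way to obtain a quadratic Lyapunov function together with an LQR controller is to reuse the LQR cost-to-go. The derivation below is a sketch under the assumption of a linearized model $\dot{x} = Ax + Bu$ with weights $Q \succ 0$ and $R \succ 0$; the slide does not state whether the talk constructs its Lyapunov function exactly this way.

```latex
\begin{align}
  0 &= A^{\top} P + P A - P B R^{-1} B^{\top} P + Q,
      \qquad K = R^{-1} B^{\top} P, \qquad u = -Kx, \\
  V(x) &= x^{\top} P x, \\
  \dot{V}(x) &= x^{\top}\!\left[ (A - BK)^{\top} P + P (A - BK) \right] x
             = -\,x^{\top}\!\left( Q + K^{\top} R K \right) x < 0
      \quad \forall x \neq 0 .
\end{align}
```

For the linearized closed loop the decrease holds everywhere. For the true nonlinear pendulum with limited torque it can only hold on some level set of $V$, which is the region that has to be estimated.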
Safe learning for an inverted pendulum
Conclusion
• Can simultaneously learn system dynamics and give stability guarantees
• Lyapunov stability for nonlinear, uncertain systems (with high probability, discretization)
• Convergence guarantees
• There is hope for safe reinforcement learning!
Code is open source Example notebooks More safe learning at http://berkenkamp.me