Safe model-based learning for robot control. Felix Berkenkamp, Andreas Krause, Angela P. Schoellig. CDC Workshop on Learning for Control, 16th December 2018.
The future of automation Felix Berkenkamp 2
The future of automation: large prior uncertainties, active decision making. We need safe and high-performance behavior.
Control approach: system identification (data collection) → system model → controller design. Data collection in controlled environments; robustness towards errors; safety constraints.
Two approaches. Control (systems): + models, + feedback, + safety, + worst-case; − learning, − data. But systems must learn and adapt, and performance is limited by system understanding.
Reinforcement learning approach: the agent applies actions to the environment and observes state and reward; data samples drive controller optimization (system identification and controller design merged into one loop). Collecting relevant data for the task (in controlled environments); performance typically holds in expectation.
Two approaches. Control (systems): + models, + feedback, + safety, + worst-case; − learning, − data. Machine learning (data): + learning, + data collection, + explore/exploit, + average case; − worst-case, − safety. Control: systems must learn and adapt, performance limited by system understanding. Machine learning: safety limited by lack of system understanding. Combining both for safety and data efficiency: model-based reinforcement learning.
Prerequisites for safe reinforcement learning: (1) understand model errors and learning; (2) define safety, analyze a dynamics model for safety; (3) an algorithm to safely acquire data and optimize the task. Together: safe model-based reinforcement learning.
Overview: understand model errors and learning; define safety, analyze a dynamics model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.
Learning a model of the dynamics. We need to quantify the model error, and the model error must decrease with data; the observation noise is assumed sub-Gaussian.
Gaussian process. Theorem (informal): the model error is contained in the scaled Gaussian process confidence intervals with probability at least 1 − δ, jointly for all states, time steps, and actively selected measurements. [Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. N. Srinivas, A. Krause, S. Kakade, M. Seeger, ICML 2010]
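A minimal NumPy sketch of the confidence intervals the theorem refers to, assuming a squared-exponential kernel and a fixed scaling factor `beta` (in Srinivas et al. the scaling grows with the information gain; the constant here is purely illustrative):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(a, b) for 1-D inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_confidence(x_train, y_train, x_test, noise=0.01, beta=2.0):
    """Posterior mean and scaled confidence interval [mu - beta*sigma, mu + beta*sigma]."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v ** 2, axis=0)
    sigma = np.sqrt(np.clip(var, 0.0, None))
    return mu, mu - beta * sigma, mu + beta * sigma
```

The interval width shrinks near observed data and reverts to the prior far from it, which is exactly the "model error decreases with data" property needed above.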
A Bayesian dynamics model: model the unknown dynamics with a Gaussian process.
Overview: understand model errors and learning; define safety, analyze a dynamics model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.
Safety definition: a robust, control-invariant set from prior knowledge; everything outside is unsafe.
Safety for learned models: dynamics + policy. Is the closed loop stable? What is its region of attraction?
Lyapunov functions [A.M. Lyapunov, 1892]
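For a discrete-time system, the Lyapunov condition can be checked numerically: V must strictly decrease along trajectories, and the largest level set on which the sampled decrease condition holds gives an estimate of the region of attraction. A sketch, assuming a known dynamics function `f` and a candidate `V` (the grid-based check is illustrative, not a formal certificate):

```python
import numpy as np

def largest_safe_level(f, V, states, tol=1e-6):
    """Largest c such that every sampled state with V(x) <= c satisfies the
    discrete-time decrease condition V(f(x)) < V(x) -- a sampled estimate of
    the region of attraction {x : V(x) <= c}."""
    v = np.array([V(x) for x in states])
    decreases = np.array([V(f(x)) < V(x) for x in states])
    ok = decreases | (v < tol)  # exempt the equilibrium itself
    bad_v = v[~ok]
    # The level set must stay below the smallest failing level.
    return float(v.max()) if bad_v.size == 0 else float(bad_v.min()) - 1e-9
```

For example, the scalar system x⁺ = x + 0.1·(−x + x³) with V(x) = x² is stable near the origin but diverges for |x| > 1, so the recovered level is about c ≈ 1.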
Learning Lyapunov functions. Finding the right Lyapunov function is difficult! Parameterize V with a neural network whose weights are positive definite and whose nonlinearities have a trivial nullspace, and train it via a classification problem. [The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems. S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018]
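One way to get positive semidefiniteness by construction, in the spirit of the slide: set V(x) = ‖φ(x)‖² with a bias-free network φ, so φ(0) = 0 and hence V(0) = 0 and V(x) ≥ 0 everywhere. This is a hypothetical simplification of the Lyapunov Neural Network, which additionally structures the weight matrices so that the nullspace is trivial (V(x) > 0 for all x ≠ 0):

```python
import numpy as np

rng = np.random.default_rng(0)

class LyapunovNet:
    """Sketch: V(x) = ||phi(x)||^2 with phi(0) = 0 by construction."""

    def __init__(self, dim, hidden):
        # No bias terms: tanh(0) = 0 propagates the origin through every layer.
        self.W1 = rng.normal(size=(hidden, dim))
        self.W2 = rng.normal(size=(hidden, hidden))

    def phi(self, x):
        h = np.tanh(self.W1 @ x)
        return np.tanh(self.W2 @ h)

    def V(self, x):
        z = self.phi(x)
        return float(z @ z)  # >= 0, and exactly 0 at x = 0
```

The remaining (hard) part, which the paper handles with its weight structure and a classification loss, is making V decrease along the learned dynamics.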
Overview: understand model errors and learning; define safety, analyze a dynamics model for safety; an algorithm to safely acquire data and optimize the task. Safe model-based reinforcement learning.
Safety definition. Start from an initial safe policy; everything outside the safe set is unsafe. Theorem (informal): under suitable conditions, the algorithm identifies a (near-)maximal subset of the state space on which the policy is stable, while never leaving the safe set. [Safe Model-based Reinforcement Learning with Stability Guarantees. F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS 2017]
Illustration of safe learning: policy performance varies from low to high across the policy space, so we need to safely explore! [Safe Model-based Reinforcement Learning with Stability Guarantees. F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS 2017]
Model predictive control: makes decisions based on predictions about the future and includes input/state constraints.
Model predictive control on a robot. Video: https://youtu.be/3xRNmNv5Efk [Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR 2016]
Model predictive control. Problem: the true dynamics are unknown!
Prediction under uncertainty: the outer approximation contains the true dynamics for all time steps with probability at least 1 − δ. [Learning-based Model Predictive Control for Safe Exploration. T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC 2018]
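The outer approximation can be caricatured with a simple interval recursion: propagate the box center through the mean dynamics, then inflate the radius by a Lipschitz constant times the current radius plus the scaled model-error bound β·σ. Everything here is an illustrative simplification (the paper propagates ellipsoidal sets with state-dependent GP error bounds):

```python
import numpy as np

def propagate_intervals(mu_f, lipschitz, beta_sigma, x_lo, x_hi, horizon):
    """Outer-approximate the reachable intervals of a scalar system over a horizon.

    mu_f       : mean dynamics (assumed model)
    lipschitz  : Lipschitz bound on the dynamics
    beta_sigma : uniform bound beta * sigma on the model error (assumption)
    """
    boxes = [(x_lo, x_hi)]
    for _ in range(horizon):
        lo, hi = boxes[-1]
        center = 0.5 * (lo + hi)
        radius = 0.5 * (hi - lo)
        c_next = mu_f(center)
        # Conservative inflation: Lipschitz growth of the box plus model error.
        r_next = lipschitz * radius + beta_sigma
        boxes.append((c_next - r_next, c_next + r_next))
    return boxes
```

With contractive mean dynamics (Lipschitz constant below 1) the radii stay bounded, which is what makes multi-step safety certificates possible.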
Safe model-based learning framework: an exploration trajectory and a safety trajectory share the same first step; everything outside the safe set is unsafe. Theorem (informal): under suitable conditions we can always guarantee that we are able to return to the safe set.
Exploration via expected performance. We design our cost functions to be helpful for optimization. Exploration objective: optimize expected performance subject to safety constraints. Example: driving too fast → slow down for safety → faster driving after learning.
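The "explore subject to safety constraints" selection step can be sketched in a few lines: among candidate actions that pass the safety check, pick the one the model is most uncertain about. The interface is hypothetical (`sigma[i]` is the model uncertainty of action i, `safe[i]` the result of the safety certificate above):

```python
import numpy as np

def pick_exploration_action(sigma, safe):
    """Index of the most informative action among those certified safe.

    sigma : array of model uncertainties per candidate action
    safe  : boolean array, True where the safety constraint holds
    """
    # Mask unsafe actions out of the argmax entirely.
    masked = np.where(safe, sigma, -np.inf)
    return int(np.argmax(masked))
```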
Example: https://youtu.be/3xRNmNv5Efk [Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR 2016]
Summary. Understand model errors and learning: RKHS / Gaussian processes, reliable confidence intervals. Define safety, analyze a dynamics model for safety: Lyapunov stability, stability of learned models. Algorithm to safely acquire data and optimize the task: model predictive control, uncertainty propagation, safe active learning. Safe model-based reinforcement learning. https://berkenkamp.me, www.las.inf.ethz.ch, www.dynsyslab.org