Safe Reinforcement Learning in Robotics with Bayesian Models
Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
Workshop on Reliable AI, October 2017
A new era of autonomy. Images: Rethink Robotics, Waymo, iRobot
Reinforcement learning: exploration, policy, policy update. Image: Plainicon, https://flaticon.com
Dangers of autonomous learning: safety despite uncertainty, safe exploration. Image: Freepik, https://flaticon.com
Safe reinforcement learning: Bayesian models for safety, model-free and model-based. Exploration, policy, policy update. Image: Plainicon, https://flaticon.com
Model-free reinforcement learning: tracking performance with few experiments; safety constraint with safety for all experiments
Gaussian process
Constrained Bayesian optimization
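To make the idea behind the two slides above concrete, here is a minimal sketch of one safety-constrained Bayesian optimization step. It is not the method from the talk: it assumes a zero-mean Gaussian process with a squared-exponential kernel, a single toy objective `performance` that doubles as the safety constraint (safe means value at or above `threshold`), and a confidence scaling `beta = 2`. All names and parameter values are hypothetical. A candidate counts as safe only if the lower confidence bound of the GP stays above the threshold; among the safe candidates, the one with the highest upper confidence bound is evaluated next.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    # Prior variance is 1.0 (the kernel's default variance).
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)
    return mean, np.sqrt(var)


def safe_step(x_train, y_train, candidates, threshold=0.0, beta=2.0):
    """Return the next candidate to evaluate and the mask of safe candidates."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    lower = mean - beta * std
    upper = mean + beta * std
    safe = lower >= threshold  # safe only if even the pessimistic estimate is OK
    if not safe.any():
        return None, safe
    safe_idx = np.flatnonzero(safe)
    best = safe_idx[np.argmax(upper[safe_idx])]  # optimistic choice within safe set
    return candidates[best], safe


def performance(x):
    """Toy objective (assumption): non-negative, hence safe, near x = 0.6."""
    return 1.0 - 10.0 * (x - 0.6) ** 2


candidates = np.linspace(0.0, 1.0, 101)
x_data = np.array([0.5])  # one known safe starting parameter
y_data = performance(x_data)
x_next, safe_mask = safe_step(x_data, y_data, candidates)
print("next parameter:", x_next, "| safe candidates:", int(safe_mask.sum()))
```

In a full loop, `x_next` would be evaluated on the system, appended to the data, and the step repeated, so that the safe set grows only as the model becomes more certain.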
Video available at http://tiny.cc/icra16_video
Safe reinforcement learning: Bayesian models for safety, model-free and model-based. Exploration, policy, policy update. Image: Plainicon, https://flaticon.com
Model-based reinforcement learning: modelling, model, control theory, implement
Approximate dynamic programming: dynamics, expected cost, policy update
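The loop named on this slide (a dynamics model, a cost estimate, a policy update) can be illustrated with exact tabular policy iteration, which is the simplest member of this family. The sketch below is an illustration under stated assumptions, not the scheme from the talk: the five-state chain, the deterministic `dynamics`, the `stage_cost`, and the discount `GAMMA` are all hypothetical, and because the toy dynamics are deterministic the expected cost reduces to a plain cost-to-go.

```python
import numpy as np

N_STATES = 5                      # states 0..4 on a line (assumption)
ACTIONS = np.array([-1, 0, 1])    # move left, stay, move right
GOAL = 2
GAMMA = 0.9                       # discount factor (assumption)


def dynamics(state, action):
    """Deterministic toy dynamics: move and clip to the state range."""
    return int(np.clip(state + action, 0, N_STATES - 1))


def stage_cost(state, action):
    """Distance to the goal plus a small actuation penalty."""
    return abs(state - GOAL) + 0.1 * abs(action)


def evaluate_policy(policy):
    """Solve (I - GAMMA * P) J = c for the cost-to-go J of a fixed policy."""
    transition = np.zeros((N_STATES, N_STATES))
    cost = np.zeros(N_STATES)
    for s in range(N_STATES):
        a = ACTIONS[policy[s]]
        transition[s, dynamics(s, a)] = 1.0
        cost[s] = stage_cost(s, a)
    return np.linalg.solve(np.eye(N_STATES) - GAMMA * transition, cost)


def improve_policy(cost_to_go):
    """Policy update: pick the action index with the lowest one-step lookahead cost."""
    policy = np.zeros(N_STATES, dtype=int)
    for s in range(N_STATES):
        q = [stage_cost(s, a) + GAMMA * cost_to_go[dynamics(s, a)] for a in ACTIONS]
        policy[s] = int(np.argmin(q))
    return policy


policy = np.ones(N_STATES, dtype=int)  # index 1 means "stay" everywhere
for _ in range(20):
    cost_to_go = evaluate_policy(policy)
    new_policy = improve_policy(cost_to_go)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

optimal_actions = ACTIONS[policy]
print("actions per state:", optimal_actions)
print("cost-to-go:", np.round(cost_to_go, 3))
```

With continuous states the table is replaced by a function approximator and the evaluation step becomes approximate, which is where the word "approximate" comes from.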
Uncertain dynamics: dynamics model, safety-critical
Approximate dynamic programming: dynamics
Reinforcement learning: safe exploration, exploration, policy, safe policy update, policy update. Image: Plainicon, https://flaticon.com
Region of attraction
Lyapunov functions [A.M. Lyapunov 1892]
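A Lyapunov function V certifies a region of attraction when V strictly decreases along closed-loop trajectories inside a level set {x : V(x) < c}. The sketch below estimates the largest such level on a grid. It makes simplifying assumptions that the talk does not: the closed-loop dynamics are known exactly (a hypothetical scalar system with step `DT`), the Lyapunov candidate is V(x) = x^2, and the decrease condition is checked only at grid points rather than with confidence intervals from a statistical model.

```python
import numpy as np

DT = 0.1  # discretization step (assumption)


def closed_loop(x):
    """Hypothetical closed-loop dynamics x_next = x + DT * (-x + x**3)."""
    return x + DT * (-x + x ** 3)


def lyapunov(x):
    """Quadratic Lyapunov candidate V(x) = x**2."""
    return x ** 2


def largest_safe_level(states):
    """Smallest V value at which the decrease condition fails on the grid.

    Every grid state with V(x) below the returned level satisfies
    V(x_next) < V(x) (the origin, where V = 0, is exempt).
    """
    v_now = lyapunov(states)
    v_next = lyapunov(closed_loop(states))
    decreasing = (v_next < v_now) | (v_now == 0.0)
    if decreasing.all():
        return np.inf
    return v_now[~decreasing].min()


states = np.linspace(-1.5, 1.5, 400)
c_max = largest_safe_level(states)
safe_states = states[lyapunov(states) < c_max]
print("level c:", round(float(c_max), 4))
print("estimated region of attraction: |x| <=", round(float(np.abs(safe_states).max()), 4))
```

For this toy system the true region of attraction is |x| < 1, and the grid estimate recovers it up to the grid resolution.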
Safe policy optimization (NIPS 2017): optimize policy for performance, determine safe region, policy update
Policy optimization: policy
Policy optimization: need to explore!
Obtaining data
Experimental results
Policy performance
Conclusion: Safe reinforcement learning! Can use statistical models to give high-probability safety guarantees. Theoretical guarantees in the paper. Code at github.com/befelix. More safe learning at http://berkenkamp.me