Deep Reinforcement Learning for Robotics Using DIANNE
Tim Verbelen, Steven Bohez, Elias De Coninck, Sam Leroux, Pieter Van Molle, Bert VanKeirsbilck, Pieter Simoens, Bart Dhoedt
sam.leroux@ugent.be
PUBLIC
How can we build robots that can execute complex tasks without being explicitly programmed?
Kuka Youbot
• Gripper
• 5-axis arm
• Length: 66 cm
• Battery operated
• Embedded PC
• Omnidirectional wheels
• Max speed: 0.8 m/s
• Kuka soft gripper
• Hokuyo laser rangefinder
Reinforcement learning
• The agent and the environment interact in a loop.
• Observation: the environment reports its current state to the agent.
• Action: the agent acts on the environment.
• Reward: the environment returns a score for the action taken.
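The observation, action, reward cycle can be sketched as a short loop. This is a minimal illustration only: `ToyEnvironment`, `RandomAgent`, and `run_episode` are hypothetical names invented here, and the one-dimensional corridor stands in for the robot and its simulator.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor: the agent starts at 0 and must reach `goal`."""

    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position 0 is a wall
        self.position = max(0, self.position + action)
        done = self.position == self.goal
        reward = 1.0 if done else -0.1  # small penalty per step, bonus at the goal
        return self.position, reward, done


class RandomAgent:
    """Placeholder policy that ignores the observation and acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=1000):
    """Run one episode and return (total reward, steps taken, goal reached)."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    done = False
    while steps < max_steps and not done:
        action = agent.act(observation)                # agent -> environment
        observation, reward, done = env.step(action)   # environment -> agent
        total_reward += reward
        steps += 1
    return total_reward, steps, done
```

A learning agent would replace `RandomAgent.act` with a policy that is improved using the rewards it collects.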
Deep reinforcement learning
● The actor needs to process high-dimensional observations to determine the next action.
● Our favorite processing block: deep neural networks
Observation → deep neural network → Action
How can we train without destroying our robot?
V-REP simulator
• Multiple simulator instances gathering experience on CPU
• GPU system training the model
Abstraction layer with ROS: base, arm, sensor
How can we evaluate our models on the robot?
Brain transplantation!
How can we connect the different components?
DIANNE
• Modular software framework for designing, training and evaluating neural networks
• Distributed training and evaluation
• Java-based
• Easy integration (service-based architecture)
• GUI
• Open source (AGPL 3)
[Diagram: two deployed agents, experience pool, training, repository]
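To illustrate the role of the experience pool, here is a minimal sketch of a bounded buffer that several agents write to and a trainer samples from. It does not show DIANNE's actual API: `ExperiencePool`, `Transition`, and their methods are hypothetical names, and a real distributed pool would also need network access and thread safety.

```python
import random
from collections import deque, namedtuple

# One step of interaction, as stored by an agent (hypothetical field names).
Transition = namedtuple(
    "Transition", "observation action reward next_observation done"
)


class ExperiencePool:
    """Fixed-capacity buffer: the oldest transitions are dropped when full."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        # Called by every agent (simulated or deployed) after each step.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Called by the trainer: a uniform random minibatch without replacement.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling at random rather than in arrival order decorrelates consecutive steps, which is why the trainer can run independently of the agents that fill the pool.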
Deep reinforcement learning algorithms
DQN
“Playing Atari with Deep Reinforcement Learning” (Mnih et al., 2013)
• Input: raw laser scanner measurements (512 values)
• Output: Q values (expected future return for each possible action)
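Two pieces of DQN can be sketched without a neural network library: choosing an action from the Q values, and computing the target that the network is trained towards. This is a simplified illustration; the function names and the default values of `epsilon` and `gamma` are assumptions, and `q_values` stands for the output of the network for one laser scan.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """Target for Q(s, a): the reward plus the discounted best Q value of the next state."""
    if done:
        return reward  # no future return after a terminal step
    return reward + gamma * max(next_q_values)
```

During training, the network output for the action that was taken is regressed towards `td_target`, typically with a squared-error loss.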
DDPG
“Continuous control with deep reinforcement learning” (Lillicrap et al., 2015)
• Actor network: raw laser scanner measurements (512 values) → continuous action
• Critic network: measurements and action → expected future return
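The interplay of actor and critic can be sketched through two ingredients of DDPG: the critic's training target and the slow update of the target networks. This is a rough illustration under stated assumptions: networks are represented as plain Python callables and flat weight lists, and the function names are hypothetical.

```python
def critic_target(reward, next_observation, done,
                  target_actor, target_critic, gamma=0.99):
    """Target for the critic: reward plus the discounted value of the target actor's action."""
    if done:
        return reward
    next_action = target_actor(next_observation)
    return reward + gamma * target_critic(next_observation, next_action)


def soft_update(target_params, online_params, tau=0.001):
    """Move each target weight a small step (tau) towards the online weight."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_params, target_params)
    ]
```

The actor itself is trained to output actions that the critic rates highly, which requires gradients through both networks and is therefore left to a neural network library.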
Visit dianne.intec.ugent.be for more information