Human-Robot Social Interactions through Multimodal Deep Attention Recurrent Q-Network


  1. Human-Robot Social Interactions through Multimodal Deep Attention Recurrent Q-Network
     Nana Baah
     University of Hamburg, MIN Faculty (Mathematics, Informatics and Natural Sciences), Department of Informatics
     Technical Aspects of Multimodal Systems, 14 December 2018

  2. Outline
     1. Introduction and Motivation
     2. The Proposed MDARQN
     3. Robot Actions with Attention
     4. Training Phase
     5. Results and Discussion
     6. Conclusion

  3. Why the Need for HRSI?
     - Human-robot social interaction (HRSI): humans and robots coexisting
     - Explicitly programming an interpreter for complex human behavior is infeasible
     - A self-learning architecture is therefore needed to acquire social interaction skills

  4. Introduction and Motivation
     - Reinforcement Learning (RL) + Deep Learning = Deep Q-Network (DQN)
     - Multimodal Deep Q-Network (MDQN)
     - Robots augmented with MDQN learned to choose appropriate actions
     - However, the interaction lacked perceivability because MDQN has no attention mechanism

  5. The Proposed MDARQN
     Each stream consists of:
     1. Convolutional Network (ConvNet)
     2. Long Short-Term Memory (LSTM)
     3. Attention Mechanism Network (G)
     Figure: Multimodal Deep Attention Recurrent Q-Network architecture [2]

  6. Convolutional Network (ConvNet)
     1. The MDARQN input is two streams of pre-processed visual frames:
        - Y-channel for grayscale images
        - depth channel for depth images
     2. Each stream consists of 4 convolutional layers, each followed by a rectified linear (ReLU) non-linearity
     3. The resulting D-dimensional feature vectors are fed to the attention network
     Figure: Convolutional network for grayscale images as input [2]
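A minimal sketch of one such convolutional stream, assuming PyTorch; the kernel sizes, channel counts and the input resolution below are illustrative placeholders, not the exact values used in [2]:

```python
# One convolutional stream (grayscale or depth) of the MDARQN.
# Layer sizes are illustrative, not the exact ones from the paper.
import torch
import torch.nn as nn

class ConvStream(nn.Module):
    def __init__(self, in_channels=1, d=256):
        super().__init__()
        # Four convolutional layers, each followed by a ReLU non-linearity.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Conv2d(64, d, kernel_size=3, stride=1), nn.ReLU(),
        )

    def forward(self, x):
        # x: (batch, 1, H, W) pre-processed frame (Y-channel or depth).
        fmap = self.features(x)                              # (batch, D, h, w)
        batch, d, h, w = fmap.shape
        # Flatten the spatial grid into L = h*w feature vectors of size D,
        # which are then passed to the attention network.
        return fmap.view(batch, d, h * w).permute(0, 2, 1)   # (batch, L, D)

frame = torch.zeros(1, 1, 198, 198)   # illustrative input resolution
vectors = ConvStream()(frame)
print(vectors.shape)                  # (1, L, D)
```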

  7. Long Short-Term Memory (LSTM)
     1. A recurrent neural network (RNN) that allows information to persist
     2. Input gate: combines the previous hidden state h_{t-1}, the previous memory state c_{t-1} and the annotation vector z_t
     3. Forget gate: old or irrelevant information is discarded
     4. Output gate: the output h_t is computed from the memory state c_t
     Figure: Long Short-Term Memory network (LSTM) [2]
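For reference, the standard LSTM gate equations, written with the annotation vector z_t as the per-step input; this is a generic reconstruction based on the gate description above, and the notation may differ from the exact symbols in [2]:

```latex
\begin{aligned}
i_t &= \sigma(W_i z_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f z_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o z_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c z_t + U_c h_{t-1} + b_c) && \text{(candidate memory)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```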

  8. Attention Mechanism Network
     1. Generates the annotation vector z_t
     2. Input: the L feature vectors (each D-dimensional) and the previous hidden state of the LSTM
     3. Soft attention network:
        - differentiable
        - deterministic
     Figure: Soft attention network [4]
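A minimal sketch of a soft attention network of this kind, assuming PyTorch; the projection layers, their sizes and the scoring form are assumptions in the style of [4], not the exact network G from [2]:

```python
# Soft attention: score the L feature vectors against the previous LSTM
# hidden state and return their weighted average as the annotation vector z_t.
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, d=256, hidden=256, attn=256):
        super().__init__()
        self.proj_v = nn.Linear(d, attn)        # projects each feature vector
        self.proj_h = nn.Linear(hidden, attn)   # projects the previous hidden state
        self.score = nn.Linear(attn, 1)         # scalar score per image location

    def forward(self, v, h_prev):
        # v: (batch, L, D) feature vectors, h_prev: (batch, hidden)
        e = self.score(torch.tanh(self.proj_v(v) + self.proj_h(h_prev).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)          # (batch, L, 1), sums to 1 over L
        z_t = (alpha * v).sum(dim=1)             # (batch, D) annotation vector
        return z_t, alpha

z, weights = SoftAttention()(torch.zeros(1, 361, 256), torch.zeros(1, 256))
```

Because the attention weights are differentiable and deterministic, the whole stream can be trained end to end with ordinary backpropagation, and the weights themselves indicate where the robot is attending.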

  9. Q-Network
     1. The Q-values of the two streams are normalized for the fusion
     2. The normalized values are averaged to generate the output Q-values
     3. The greedy action (highest Q-value) is taken
     Figure: Deep Q-learning for human-robot interaction [2]
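A minimal sketch of the fusion step, assuming NumPy; normalizing each stream by its largest absolute Q-value is one plausible reading of "normalized" here, not necessarily the scheme used in [2]:

```python
# Fuse the per-stream Q-values (grayscale and depth) over the four actions.
import numpy as np

ACTIONS = ["wait", "look_towards_human", "wave_hand", "handshake"]

def fuse_q_values(q_gray, q_depth):
    q_gray = q_gray / np.max(np.abs(q_gray))     # normalize each stream
    q_depth = q_depth / np.max(np.abs(q_depth))
    return (q_gray + q_depth) / 2.0              # average into the output Q-values

q = fuse_q_values(np.array([0.2, 0.5, 0.1, 0.9]),
                  np.array([0.3, 0.4, 0.2, 0.8]))
greedy_action = ACTIONS[int(np.argmax(q))]        # action with the highest Q-value
print(greedy_action)                              # "handshake" in this example
```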

  10. Modified Robotic System
     - A Pepper robot was used for the project
     - Visual sensors: a 2-D camera and a 3-D sensor (320x240 resolution at 10 fps)
     - Modified with a Force Sensing Resistor (FSR) touch sensor
        - forms the basis of the reward function
     Figure: Pepper robot [1]

  11. Attention Steering for Non-Greedy Actions
     - Ensures a perceivable HRSI
     - For a non-greedy action, the robot randomly picks one action from the set of legal actions
     - Attention steering function:
        - awareness
        - sensitivity to real-world stimuli (sound and movement detection)
     - The robot's set of legal actions comprises 4 actions (see the sketch after this slide):
        1. Wait
        2. Look towards human
        3. Wave hand
        4. Handshake
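A minimal sketch of how a non-greedy step could combine attention steering with a random legal action; sound_direction and movement_direction are hypothetical placeholders for the robot's stimulus detectors and are not part of any real Pepper/NAOqi API:

```python
# Non-greedy (exploratory) step: steer the robot's attention toward a
# detected stimulus, then pick a random legal action.
import random

LEGAL_ACTIONS = ["wait", "look_towards_human", "wave_hand", "handshake"]

def steer_attention(sound_direction, movement_direction):
    # Turn the robot's gaze toward whichever stimulus was detected, so that
    # the chosen action remains perceivable to the human.
    target = sound_direction or movement_direction
    if target is not None:
        print(f"steering gaze toward {target}")

def non_greedy_step(sound_direction=None, movement_direction=None):
    steer_attention(sound_direction, movement_direction)
    return random.choice(LEGAL_ACTIONS)   # random pick among the legal actions

print(non_greedy_step(sound_direction="left"))
```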

  12. Reward Function
     - Handshake detection through the touch sensor forms the basis of the reward signal
     - Rewards:
        - successful handshake: 1
        - unsuccessful handshake: -0.1
        - all other actions: 0
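A minimal sketch of the reward function as stated on this slide, with handshake_detected standing in for the FSR touch-sensor reading:

```python
# Reward as described on the slide: only handshake attempts are scored,
# based on whether the touch sensor detected an accepted handshake.
def reward(action, handshake_detected):
    if action == "handshake":
        return 1.0 if handshake_detected else -0.1   # successful vs. unsuccessful handshake
    return 0.0                                       # all other actions

assert reward("handshake", True) == 1.0
assert reward("handshake", False) == -0.1
assert reward("wave_hand", False) == 0.0
```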

  13. Training Phase
     1. The agent was trained for 14 days
     2. by interacting with people in an uncontrolled environment
     3. At each time step:
        - the environment provides an observation state o_t
        - the agent takes an action using an ε-greedy policy
        - the environment provides a scalar reward r_t and the next state s_{t+1}
     Figure: Reinforcement model of interaction [3]
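A minimal sketch of the action-selection part of one training step under an ε-greedy policy; the value of ε and the Q-values are illustrative:

```python
# Epsilon-greedy action selection over the fused Q-values.
import random
import numpy as np

ACTIONS = ["wait", "look_towards_human", "wave_hand", "handshake"]

def epsilon_greedy(q_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))   # explore: random legal action
    return int(np.argmax(q_values))             # exploit: greedy action

a_t = epsilon_greedy(np.array([0.1, 0.3, 0.2, 0.7]))
# The environment then returns the scalar reward r_t and the next state s_{t+1}.
```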

  14. Data Generation Phase
     1. Each interaction experience e_t = {s_t, a_t, r_t, s_{t+1}} is stored in a replay buffer M
     2. A data generation cycle ends when the terminal state T is reached
     3. The replay buffer stores the N most recent experiences
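A minimal sketch of such a replay buffer, assuming Python's collections.deque; the capacity N and the mini-batch size are illustrative, and mini-batch sampling for the Q-network update is assumed in line with standard DQN practice:

```python
# Replay buffer M holding the N most recent experiences e_t = (s_t, a_t, r_t, s_{t+1}).
from collections import deque
import random

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are dropped first

    def store(self, s_t, a_t, r_t, s_next):
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, batch_size=32):
        # Random mini-batches break the correlation between consecutive
        # interaction steps during Q-network training.
        return random.sample(self.buffer, batch_size)

M = ReplayBuffer()
M.store("s0", 3, 1.0, "s1")
```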

  15. Evaluation Procedure
     Evaluate the MDARQN's decisions and the impact of the attention model on HRI.
     1. MDARQN decisions on a data set
        - more than one feasible action per scenario
        - 3 volunteers suggested the best action for each scenario
     2. Impact of the attention mechanism
        - the robot interacted with the public under the trained Q-networks' policy
        - MDARQN's performance was compared to MDQN's

  16. Results
     Figures [2]: robot waits; robot looks towards human; robot offers a handshake; robot waves

  17. Discussion
     - Interpreting the human walking trajectory
     - Level of human engagement
     - Human's body orientation and distance
     - People's willingness to interact with a robot
     - Precise, selective interaction attention
     - A high penalty results in rude behavior
     - A low penalty results in repeated handshakes

  18. Conclusion
     - The proposed MDARQN was trained for 14 days
     - by interpreting human behavior and executing a responsive action
     - It learned to infer human intention
     - The attention indication adds perceivability to the robot's actions
     - It learned to make appropriate decisions in diverse interaction scenarios

  19. Thank You
     Any questions?

  20. References
     [1] SoftBank Robotics. Pepper robot. https://www.softbankrobotics.com/emea/en/pepper. Accessed: 2018-12-14.
     [2] Ahmed Hussain Qureshi et al. "Show, attend and interact: Perceivable human-robot social interaction through neural attention Q-network". In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 1639–1645.
     [3] Hado van Hasselt, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning". In: AAAI. Vol. 2. Phoenix, AZ, 2016, p. 5.
     [4] Shiyang Yan et al. "Hierarchical Multi-scale Attention Networks for action recognition". In: Signal Processing: Image Communication 61 (2018), pp. 73–84.
