
Deep Affordance-Grounded Sensorimotor Object Recognition



  1. Deep Affordance-Grounded Sensorimotor Object Recognition ● Authors: Spyridon Thermos, Georgios Th. Papadopoulos, Petros Daras, Gerasimos Potamianos ● Presented By: Thomas Crosley, UT CS 381V Autumn 2017

  2. Problem ● Integrate visual appearance and visual affordance information ● Object + Affordance Classification (example: "Hit Using Hammer")

  3. Affordances: “the types of actions that humans typically perform when interacting with an object.” ● Examples: Sit, Throw, Workout ● Video sources: https://www.youtube.com/watch?v=V4XW74W9t4o https://www.youtube.com/watch?v=7Qxu5cvW-ds https://www.youtube.com/watch?v=1xS864zYIo8

  4. Related Work ● Simpler Methods ○ Factorial Conditional Random Fields and Binary SVMs [1] ○ Gaussian Processes [2] ○ SVMs + Clustering [3] ● Smaller Data ○ Few objects [1, 2, 3] ○ Small number of affordances [1, 2, 3] ○ Ex: 6 objects and 3 affordances [1]

  5. RGB-D Sensorimotor Dataset

  6. RGB-D Sensorimotor Dataset http://sor3d.vcl.iti.gr/wp-content/uploads/2017/03/sor3d.mp4?_=1

  7. RGB-D Sensorimotor Dataset

  8. RGB-D Sensorimotor Dataset Original Input

  9. RGB-D Sensorimotor Dataset Input Processing

  10. RGB-D Sensorimotor Dataset Data Extraction

  11. RGB-D Sensorimotor Dataset ● 14 Object Types ● 13 Affordances ● 54 Interactions ● 105 subjects ● 4 to 8 seconds ● 20,830 instances

  12. Architectures ● Generalized Template-Matching (GTM) ● Model spatial correlations ● Appearance CNN for object detection

  13. Architectures ● Generalized Spatio-Temporal (GST) ● Encode time-evolving procedures ● CNN+LSTM for affordance modeling

  14. Long Short-Term Memory Networks (LSTMs) ● A recurrent architecture capable of learning long-term dependencies ● Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  15. LSTMs ● Core Idea: the cell state is updated and then passed on at each time step ● Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  16. LSTMs “Forget Gate” “Remember Gate” Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  17. LSTMs Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
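
The LSTM slides describe a cell state that is updated and passed on at each time step, controlled by a "forget gate" and a "remember gate" (usually called the input gate). A minimal sketch of one such step is given here in plain Python, assuming scalar inputs and states for readability. The function name `lstm_step`, the `params` layout, and the gate names are illustrative assumptions, not code from the paper or the presentation.

```python
import math


def sigmoid(x):
    """Logistic function, squashing x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (hypothetical helper).

    params maps each gate name to a tuple (w_x, w_h, b):
    input weight, recurrent weight, and bias.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new information to store
    g = math.tanh(pre_activation("candidate"))  # candidate values to store
    o = sigmoid(pre_activation("output"))       # how much cell state to expose

    c = f * c_prev + i * g   # updated cell state, passed to the next step
    h = o * math.tanh(c)     # hidden state (the step's output)
    return h, c


# Usage: run the cell over a short sequence, carrying (h, c) forward.
example_params = {name: (0.5, 0.5, 0.0)
                  for name in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x_t in [1.0, 0.0, -1.0]:
    h, c = lstm_step(x_t, h, c, example_params)
```

In a CNN+LSTM setup such as the one named on the GST slide, the per-step input would be a CNN feature vector rather than a scalar, and the weights would be matrices. The gating logic stays the same.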

  18. Fusion ● Given multiple sources of information ● At what point do we combine their features? Image Source: http://cs.stanford.edu/people/karpathy/deepvideo/

  19. Fusion ● GST Architecture ● Combines ○ Appearance ○ Affordance ● (a) Late Fusion ● (b) Slow Fusion

  20. Architecture ● Slow Fusion ● Multi-Level Fusion ● Late Fusion at FC ● Late Fusion at conv
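
The fusion slides ask at what point the appearance and affordance features should be combined. As a rough illustration of the late-fusion case only, the sketch below concatenates two already-extracted feature vectors and applies a single linear layer followed by a softmax. All names (`late_fusion`, `linear`, `softmax`) and the toy weights are assumptions made for this example. The actual networks in the paper are deep CNN and CNN+LSTM streams, and slow fusion would instead merge the streams gradually across several layers.

```python
import math


def linear(weights, bias, x):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(appearance_feat, affordance_feat, weights, bias):
    """Late fusion (hypothetical helper): each stream is processed
    independently, and the two feature vectors meet only here, just
    before the classifier."""
    fused = list(appearance_feat) + list(affordance_feat)  # concatenate
    return softmax(linear(weights, bias, fused))


# Usage: two 2-dimensional stream features, two object classes.
toy_weights = [[1.0, 0.0, 0.0, 1.0],
               [0.0, 1.0, 1.0, 0.0]]
toy_bias = [0.0, 0.0]
probs = late_fusion([1.0, 0.0], [0.0, 1.0], toy_weights, toy_bias)
```

Here `probs` holds one probability per class. Fusing at an earlier layer would change only where the concatenation happens, not the classifier itself.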

  21. Results ● Single Stream ● (Best) Template Matching ● (Best) Spatio-Temporal

  22. Open Problems ● Authors’ Thoughts ○ NN-Autoencoders for human-object interactions ○ “In-the-wild” object-affordance detection ● Others ○ Affordance identification for control tasks ○ Better temporal sampling schemes
