

  1. CS 378: Autonomous Intelligent Robotics Instructor: Jivko Sinapov http://www.cs.utexas.edu/~jsinapov/teaching/cs378/

  2. Multimodal Perception

  3. Announcements: Final Projects. Presentation date: Thursday, May 12, 9:00 a.m. to 12:00 noon

  4. Project Deliverables • Final Report (6+ pages, PDF) • Code and documentation (posted on GitHub) • Presentation including a video and/or demo

  5. Multi-modal Perception

  6. The “5” Senses

  7. The “5” Senses [http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]

  8. The “5” Senses [http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]

  9. [http://neurolearning.com/sensoryslides.pdf]

  10. How are sensory signals from different modalities integrated?

  11. [Battaglia et al., 2003]

  12. Locating the Stimulus Using a Single Modality: a standard trial followed by a comparison trial. Is the stimulus in Trial 2 located to the left or to the right of the stimulus in Trial 1?

  13. Locating the Stimulus Using a Single Modality: a standard trial followed by a comparison trial. Is the stimulus in Trial 2 located to the left or to the right of the stimulus in Trial 1?

  14. Multimodal Condition: a standard trial followed by a comparison trial

  15. [Ernst, 2006]

  16. Take-home Message During integration, sensory modalities are weighted based on their individual reliability
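A minimal numerical sketch of this reliability weighting, in the spirit of the maximum-likelihood cue-combination model covered in the readings on the next slide: two location estimates are fused by weighting each cue with its inverse variance, so the sharper cue dominates. The numbers are made up for illustration.

```python
# Hypothetical single-trial location estimates (in degrees) and their
# reliabilities, expressed as the variance of each cue's sensory noise.
visual_estimate, visual_var = 10.0, 1.0   # sharp visual cue
haptic_estimate, haptic_var = 14.0, 4.0   # noisier haptic cue

# Maximum-likelihood integration: each cue is weighted by its inverse variance,
# so the more reliable cue dominates the fused estimate.
w_visual = (1.0 / visual_var) / (1.0 / visual_var + 1.0 / haptic_var)
w_haptic = 1.0 - w_visual

fused_estimate = w_visual * visual_estimate + w_haptic * haptic_estimate
fused_var = 1.0 / (1.0 / visual_var + 1.0 / haptic_var)

print(f"weights: visual={w_visual:.2f}, haptic={w_haptic:.2f}")
print(f"fused estimate: {fused_estimate:.2f} deg (variance {fused_var:.2f})")
```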

  17. Further Reading: Ernst, Marc O., and Heinrich H. Bülthoff. "Merging the senses into a robust percept." Trends in Cognitive Sciences 8.4 (2004): 162-169. Battaglia, Peter W., Robert A. Jacobs, and Richard N. Aslin. "Bayesian integration of visual and auditory signals for spatial localization." JOSA A 20.7 (2003): 1391-1397.

  18. Sensory Integration During Speech Perception

  19. McGurk Effect

  20. McGurk Effect https://www.youtube.com/watch?v=G-lN8vWm3m0 https://vimeo.com/64888757

  21. Object Recognition Using Auditory and Proprioceptive Feedback Sinapov et al. “Interactive Object Recognition using Proprioceptive and Auditory Feedback” International Journal of Robotics Research, Vol. 30, No. 10, September 2011

  22. What is Proprioception? “It is the sense that indicates whether the body is moving with required effort, as well as where the various parts of the body are located in relation to each other.” - Wikipedia

  23. Why Proprioception?

  24. Why Proprioception? Empty vs. full

  25. Why Proprioception? Soft vs. hard

  26. Exploratory Behaviors: lift, shake, drop, crush, push

  27. Objects

  28. Sensorimotor Contexts: behaviors (lift, shake, drop, press, push) x sensory modalities (audio, proprioception)

  29. Feature Extraction: joint torques J1 through J7 recorded over time

  30. Feature Extraction: training a self-organizing map (SOM) using sampled joint torques, and training an SOM using sampled frequency distributions

  31. Feature Extraction: the discretization of a joint-torque record using a trained SOM is the sequence of activated SOM nodes over the duration of the interaction; the discretization of the DFT of a sound using a trained SOM is the sequence of activated SOM nodes over the duration of the sound
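A minimal sketch of this SOM-based discretization, using the third-party minisom package; the map size, training length, and toy torque data are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

# Toy stand-in for a recorded interaction: 200 time steps x 7 joint torques.
rng = np.random.default_rng(0)
torque_record = rng.normal(size=(200, 7))

# Train a small 6 x 6 SOM on individual 7-D torque samples.
som = MiniSom(6, 6, input_len=7, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(torque_record, num_iteration=1000)

# Discretize: the interaction becomes the sequence of activated
# (best-matching) SOM nodes over its duration.
def to_node_sequence(record, som, width=6):
    return [i * width + j for (i, j) in (som.winner(x) for x in record)]

print(to_node_sequence(torque_record, som)[:20])
```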

  32. The proprioception sequence is passed to a proprioceptive recognition model and the audio sequence to an auditory recognition model; the outputs of the two models are merged by a weighted combination

  33. Accuracy vs. Number of Objects

  34. Accuracy vs. Number of Behaviors

  35. Results with a Second Dataset • Tactile surface recognition: 5 scratching behaviors, 2 modalities (vibrotactile and proprioceptive), using an artificial fingertip. Sinapov et al. "Vibrotactile Recognition and Categorization of Surfaces by a Humanoid Robot," IEEE Transactions on Robotics, Vol. 27, No. 3, pp. 488-497, June 2011

  36. Surface Recognition Results: chance accuracy = 1/20 = 5%

  37. Scaling up: more sensory modalities, objects, and behaviors. Sensors: ZCam (RGB+D), microphones in the head, Logitech webcam, torque sensors in the joints, 3-axis accelerometer

  38. 100 objects

  39. Exploratory Behaviors: grasp, lift, hold, shake, drop, tap, poke, push, press

  40. Object Exploration Video

  41. Object Exploration Video #2

  42. Coupling Action and Perception: the action (poke) and the resulting perception (optical flow) unfold together over time

  43. Sensorimotor Contexts: behaviors (look, grasp, lift, hold, shake, drop, tap, poke, push, press) x sensory modalities (audio (DFT), proprioception (joint torques), proprioception (finger pos.), optical flow, color, SURF)

  44. Sensorimotor Contexts: behaviors (look, grasp, lift, hold, shake, drop, tap, poke, push, press) x sensory modalities (audio (DFT), proprioception (joint torques), proprioception (finger pos.), optical flow, color, SURF)

  45. Feature Extraction: Proprioception. Joint-torque values for all 7 joints are summarized as joint-torque features
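One plausible way to turn a joint-torque record into a fixed-length feature vector is to average each of the 7 joint signals within a fixed number of temporal bins (10 bins x 7 joints would match the 70-dimensional joint-torque entry on the later dimensionality slide). The binning scheme below is an assumption, not necessarily the paper's exact procedure, and the torque data is a toy stand-in.

```python
import numpy as np

def joint_torque_features(torques, n_bins=10):
    """Summarize a (T x 7) joint-torque record as n_bins x 7 bin means,
    flattened into a single feature vector (10 x 7 = 70 values here)."""
    t = np.asarray(torques)
    bins = np.array_split(t, n_bins, axis=0)      # split along the time axis
    return np.concatenate([b.mean(axis=0) for b in bins])

rng = np.random.default_rng(1)
record = rng.normal(size=(230, 7))                # toy torque record
print(joint_torque_features(record).shape)        # (70,)
```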

  46. Feature Extraction: Audio. The audio spectrogram is summarized as spectro-temporal features
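Similarly, a hedged sketch of spectro-temporal audio features: compute a spectrogram with SciPy and average it into a coarse frequency-by-time grid (a 10 x 10 grid would give a 100-dimensional vector, matching the later dimensionality slide). The bin counts and the log scaling are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def spectrotemporal_features(wave, sample_rate, freq_bins=10, time_bins=10):
    """Average a spectrogram into a coarse freq_bins x time_bins grid."""
    _, _, spec = spectrogram(wave, fs=sample_rate, nperseg=512)
    spec = np.log1p(spec)                                   # compress dynamic range
    f_chunks = np.array_split(spec, freq_bins, axis=0)
    grid = [np.array_split(chunk, time_bins, axis=1) for chunk in f_chunks]
    return np.array([[cell.mean() for cell in row] for row in grid]).ravel()

rng = np.random.default_rng(2)
wave = rng.normal(size=44100)                               # one second of toy audio
print(spectrotemporal_features(wave, 44100).shape)          # (100,)
```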

  47. Feature Extraction: Color. Object segmentation followed by a color histogram (4 x 4 x 4 = 64 bins)
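A minimal sketch of a 4 x 4 x 4 color histogram over a segmented object region, using OpenCV; the frame and the segmentation mask here are placeholders.

```python
import cv2
import numpy as np

# Toy stand-in for a camera frame and an object segmentation mask.
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
mask = np.zeros(frame.shape[:2], dtype=np.uint8)
mask[200:300, 250:400] = 255                  # pretend-segmented object region

# 4 x 4 x 4 = 64-bin histogram over the three color channels,
# computed only inside the object mask, then normalized and flattened.
hist = cv2.calcHist([frame], [0, 1, 2], mask, [4, 4, 4],
                    [0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist, None).flatten()
print(hist.shape)                             # (64,)
```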

  48. Feature Extraction: Optical Flow. Flow directions are counted into angular bins

  49. Feature Extraction: Optical Flow (continued)
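A sketch of the optical-flow features suggested by the two slides above: compute dense flow between consecutive frames (Farneback's method, one common choice) and count the flow directions into angular bins. The bin count and the motion threshold are assumptions, and the frames are toy data.

```python
import cv2
import numpy as np

def flow_direction_histogram(prev_gray, next_gray, n_bins=10, min_mag=1.0):
    """Count optical-flow directions into n_bins angular bins."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])   # ang in radians
    ang = ang[mag > min_mag]                 # keep pixels that actually moved
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi))
    return hist / max(hist.sum(), 1)         # normalized direction counts

# Toy frames; in practice these are consecutive grayscale camera images.
prev_frame = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
next_frame = np.roll(prev_frame, 3, axis=1)  # fake horizontal motion
print(flow_direction_histogram(prev_frame, next_frame))
```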

  50. Feature Extraction: SURF

  51. Feature Extraction: SURF Each interest point is described by a 128-dimensional vector

  52. Feature Extraction: SURF. Descriptors are counted into a histogram of visual "words"
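A sketch of the SURF bag-of-visual-words idea from the last three slides: extract 128-dimensional SURF descriptors, assign each to its nearest entry in a codebook, and count the resulting visual "words". SURF requires the opencv-contrib build (cv2.xfeatures2d) and may be unavailable in some installations; the random codebook below is a stand-in for one learned with k-means over training descriptors.

```python
import cv2
import numpy as np

# SURF lives in opencv-contrib; extended=True gives 128-D descriptors,
# as on the previous slide.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=True)

def bag_of_words_histogram(image_gray, codebook):
    """Assign each SURF descriptor to its nearest codebook center and
    count the assignments (a histogram of visual 'words')."""
    _, descriptors = surf.detectAndCompute(image_gray, None)
    if descriptors is None:
        return np.zeros(len(codebook))
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy 200-word codebook and toy image; in practice the codebook comes from
# k-means over descriptors sampled from many training images.
codebook = np.random.rand(200, 128).astype(np.float32)
image = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
print(bag_of_words_histogram(image, codebook).shape)   # (200,)
```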

  53. Dimensionality of Data (per sensorimotor context): audio (DFT): 100; proprioception (joint torques): 70; proprioception (finger pos.): 6; optical flow: 10; color: 64; SURF: 200

  54. Data From a Single Exploratory Trial: one feature vector for each applicable behavior-modality pair, with behaviors look, grasp, lift, hold, shake, drop, tap, poke, push, press and modalities audio (DFT), proprioception (joint torques), proprioception (finger pos.), optical flow, color, SURF

  55. Data From a Single Exploratory Trial: one feature vector for each applicable behavior-modality pair (same behaviors and modalities as above), collected 5 times per object

  56. Overview: interaction with object → sensorimotor feature extraction → category recognition model → category estimates

  57. Context-specific Category Recognition: the model M_poke-audio takes an observation from the poke-audio context and outputs a distribution over category labels

  58. Context-specific Category Recognition • The models were implemented using two machine learning algorithms: k-Nearest Neighbors (k = 3) and Support Vector Machine

  59. Support Vector Machine • Support Vector Machine: a discriminative learning algorithm 1. Finds the maximum-margin hyperplane that separates two classes 2. Uses a kernel function to map data points into a feature space in which such a hyperplane exists [http://www.imtech.res.in/raghava/rbpred/svm.jpg]
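A sketch of the two context-specific recognition models with scikit-learn, trained on placeholder data; predict_proba returns the distribution over category labels that each context model outputs. The feature dimension, trial counts, and label counts are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy stand-in for one sensorimotor context (e.g., poke-audio):
# 100 trials with 100-D features, 20 object categories (5 trials per category).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 100))
y = np.repeat(np.arange(20), 5)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
svm = SVC(kernel="rbf", probability=True).fit(X, y)  # probability=True enables predict_proba

# Each context-specific model maps an observation to a distribution
# over the category labels.
observation = rng.normal(size=(1, 100))
print(knn.predict_proba(observation).round(2))
print(svm.predict_proba(observation).round(2))
```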

  60. Combining Model Outputs: the outputs of the context-specific models (e.g., M_look-color, M_tap-audio, M_lift-SURF, M_press-prop.) are merged by a weighted combination
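A sketch of the weighted combination: each context model contributes its probability distribution over categories, scaled by a weight (for example, proportional to that context's estimated reliability, echoing the earlier take-home message). The specific weighting values below are assumptions for illustration.

```python
import numpy as np

def combine_context_outputs(distributions, weights):
    """Weighted combination of per-context probability distributions
    over category labels (rows = contexts, columns = categories)."""
    distributions = np.asarray(distributions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    combined = weights @ distributions          # weighted sum over contexts
    return combined / combined.sum()            # renormalize to a distribution

# Toy outputs of three context models (e.g., look-color, tap-audio,
# press-proprioception) over four categories, plus illustrative weights.
outputs = [[0.70, 0.10, 0.10, 0.10],
           [0.40, 0.40, 0.10, 0.10],
           [0.25, 0.25, 0.25, 0.25]]
weights = [0.6, 0.8, 0.2]
print(combine_context_outputs(outputs, weights).round(3))
```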

  61. Model Evaluation: 5-fold cross-validation with separate train and test sets
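A minimal sketch of 5-fold cross-validation with scikit-learn on placeholder data: the trials are split into five folds, the model is trained on four and tested on the held-out fold, and the five accuracies are averaged. The data shapes are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy data: 100 trials with 64-D features (e.g., color histograms),
# 20 category labels with 5 trials each.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 64))
y = np.repeat(np.arange(20), 5)

# 5-fold cross-validation: train on 4 folds, test on the held-out fold,
# and average the accuracy over the 5 rounds.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(scores, scores.mean())
```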

  62. Recognition Rates (%) with SVM, per behavior and sensorimotor context ("All" = all contexts for that behavior combined):

               Audio   Proprioception   Color   Optical Flow   SURF    All
      look       -           -           58.8        -          58.9   67.7
      grasp     45.7        38.7          -         12.2        57.1   65.2
      lift      48.1        63.7          -          5.0        65.9   79.0
      hold      30.2        43.9          -          5.0        58.1   67.0
      shake     49.3        57.7          -         32.8        75.6   76.8
      drop      47.9        34.9          -         17.2        57.9   71.0
      tap       63.3        50.7          -         26.0        77.3   82.4
      push      72.8        69.6          -         26.4        76.8   88.8
      poke      65.9        63.9          -         17.8        74.7   85.4
      press     62.7        69.7          -         32.4        69.7   77.4

  63. Distribution of rates over categories

  64. Can behaviors be selected actively to minimize exploration time?

  65. Active Behavior Selection • For each behavior b, estimate a confusion matrix C_b such that C_b[i][j] estimates the probability that an object of category i is recognized as category j after performing that behavior • Let p be the vector encoding the robot's current estimates over the category labels and let B be the remaining set of behaviors available to the robot
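One way such an active selection criterion could be implemented is sketched below (an illustrative choice, not necessarily the exact formula used in the paper): given the current belief over categories and an estimated confusion matrix for each remaining behavior, pick the behavior whose predicted post-observation belief has the lowest expected entropy.

```python
import numpy as np

def expected_entropy(belief, confusion):
    """Expected entropy of the updated belief after executing a behavior whose
    recognition model has the row-stochastic confusion matrix
    C[i, j] = Pr(predicted category j | true category i)."""
    belief = np.asarray(belief, float)
    confusion = np.asarray(confusion, float)
    total = 0.0
    for j in range(confusion.shape[1]):           # possible predicted labels
        joint = belief * confusion[:, j]          # Pr(true = i, predicted = j)
        p_j = joint.sum()
        if p_j > 0:
            posterior = joint / p_j               # Bayes update of the belief
            total += p_j * -(posterior * np.log(posterior + 1e-12)).sum()
    return total

def select_behavior(belief, confusions):
    """Pick the remaining behavior with the lowest expected entropy."""
    return min(confusions, key=lambda b: expected_entropy(belief, confusions[b]))

# Toy example with 3 categories (A, B, C) and 2 remaining behaviors.
belief = [0.5, 0.4, 0.1]
confusions = {
    "B1": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],  # discriminates well
    "B2": [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]],  # nearly uninformative
}
print(select_behavior(belief, confusions))   # expected: "B1"
```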

  66. Example with 3 Categories and 2 Behaviors: the current estimate is a distribution over categories A, B, and C; the remaining behaviors B1 and B2 each have an associated confusion matrix over A, B, and C

  67. Active Behavior Selection: Example. The current estimate over A, B, and C is combined with the confusion matrices associated with the remaining behaviors B1 and B2

  68. Active Behavior Selection

  69. Active vs. Random Behavior Selection

  70. Active vs. Random Behavior Selection

  71. Discussion What are some of the limitations of the experiment? What are some ways to address them? What other possible senses can you think of that would be useful to a robot?

  72. References: Sinapov, J., Bergquist, T., Schenck, C., Ohiri, U., Griffith, S., and Stoytchev, A. (2011). Interactive Object Recognition Using Proprioceptive and Auditory Feedback. International Journal of Robotics Research, Vol. 30, No. 10, pp. 1250-1262. Sinapov, J., Schenck, C., Staley, K., Sukhoy, V., and Stoytchev, A. (2014). Grounding Semantic Categories in Behavioral Interactions: Experiments with 100 Objects. Robotics and Autonomous Systems, Vol. 62, No. 5, pp. 632-645.

  73. THE END
