
  1. Tracking deformable objects with WiSARD networks: a preliminary work
  INNOROBO 2014 European Workshop on Deformable Object Manipulation, 20 March 2014, Lyon, France
  Massimo De Gregorio, Maurizio Giordano, Silvia Rossi, Mariacarla Staffa and Bruno Siciliano, University of Naples Federico II

  2. Object Tracking Problem
  • The object tracking problem consists in reconstructing the trajectory of objects along a sequence of images.
  • It is inherently difficult when applied to real-world conditions:
    – unstructured forms are present
    – real-time responses are required
    – computational capabilities are limited to on-board units
    – problems of brightness and non-stationary background affect the image elaboration system
  • It becomes even more challenging in the case of non-rigid objects.

  3. Motivations
  • Industrial manufacturing processes: rubber tubes, sheet metals, cords, paper sheets
  • Medical operations: soft tissues, muscles, skin
  • Domestic interaction: clothes, food, etc.
  Objects’ location and deformation have to be tracked.

  4. Proposed approach
  • Our aim is to address the problem of making a robot able to track any deformable object without an a priori physical model.
  • We propose a particular neural network as a detector for tracking deformable objects during manipulation: a WiSARD-based system, which is
    1. Model free
    2. Noise tolerant
    3. On-line learning

  5. Approaches to deformable objects
  • Model-based methods (CAD-like object models): edge detection, 3D models (point clouds), recognition by parts
  • Appearance-based methods: changes in lighting or color, changes in viewing direction, changes in size / shape
  • Feature-based methods: based on visual features such as color and/or texture, object surface patches, contours, regions of interest, corners, linear edges
  • WiSARD-based approach: non-rigid objects are tracked

  6. WiSARD: WIlkie, Stonham and Aleksander’s Recognition Device
  • The McCulloch and Pitts model: the neuron fires (y = 1) when the weighted sum of its inputs exceeds the threshold, i.e. x1·w1 + x2·w2 + … + xn·wn > σ.
  • The RAM-node: the binary inputs (e.g. x1, x2) are concatenated into an address (00, 01, 10, 11) that selects a one-bit memory cell.
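The contrast between the two node types can be made concrete with a small sketch. The following Python snippet (illustrative, not from the slides) implements a McCulloch-Pitts neuron and a 2-input RAM node; all names are my own.

```python
# A minimal sketch (not from the slides) contrasting a McCulloch-Pitts neuron
# with a RAM node: the RAM node stores one bit per binary input address
# instead of computing a weighted sum against a threshold.

def mcculloch_pitts(x, w, sigma):
    """Fire (return 1) if the weighted sum x1*w1 + ... + xn*wn exceeds sigma."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) > sigma else 0

class RAMNode:
    """A RAM node addressed by an n-tuple of binary inputs."""
    def __init__(self, n_bits):
        self.cells = [0] * (2 ** n_bits)   # one cell per possible address

    @staticmethod
    def address(bits):
        # e.g. (1, 0) -> binary address "10" -> cell index 2
        return int("".join(str(b) for b in bits), 2)

    def write(self, bits):
        self.cells[self.address(bits)] = 1      # training: store a 1

    def read(self, bits):
        return self.cells[self.address(bits)]   # recall: 1 if seen before

# Example: the node learns the pattern (1, 0) and recognises it afterwards.
node = RAMNode(n_bits=2)
node.write((1, 0))
assert node.read((1, 0)) == 1 and node.read((1, 1)) == 0
```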

  7. WiSARD Discriminator
  • A biunivocal (one-to-one) pseudo-random mapping connects uncorrelated parts of the image to specific addresses of RAM-based nodes.
  • The uncorrelated n-tuples are used as addresses of the RAMs.
  • A set of RAM-based nodes forms a Discriminator.
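As a rough illustration of the mapping step, here is a Python sketch that builds one fixed pseudo-random, one-to-one mapping from retina pixels to n-tuples and reads the resulting RAM addresses. The retina size, tuple size and function names are assumptions for the example, not values from the presentation.

```python
import random

# Illustrative sizes: a flattened binary retina of 64 pixels split into
# 4-bit tuples, giving 64 / 4 = 16 RAM nodes with 2**4 cells each.
RETINA_SIZE = 64
N_BITS = 4

def make_mapping(retina_size, n_bits, seed=42):
    """Shuffle the pixel indices once and cut them into disjoint n-tuples.
    The same mapping must be reused for training and classification."""
    order = list(range(retina_size))
    random.Random(seed).shuffle(order)
    return [order[i:i + n_bits] for i in range(0, retina_size, n_bits)]

def n_tuples(retina, mapping):
    """Read the (uncorrelated) pixels of each group; each tuple is a RAM address."""
    return [tuple(retina[p] for p in group) for group in mapping]

mapping = make_mapping(RETINA_SIZE, N_BITS)
retina = [0] * RETINA_SIZE                 # an all-dark example retina
print(len(n_tuples(retina, mapping)))      # 16 addresses, one per RAM node
```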

  8. WiSARD Discriminator
  • Training phase: each training image is mapped onto the retina, the n-tuples address the RAM cells, and a 1 is written into every addressed cell.
  • Classification phase: the n-tuples of the input image address the RAM cells, the addressed contents are summed (Σ), and the resulting count r is the similarity measure between the input and the trained class.
  [Slide figure: retina, mapping and six RAM nodes (addresses 00, 01, 10, 11) shown during the training and classification phases, with the summed response used as similarity measure.]
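Putting the mapping, training and classification together, a single discriminator could look like the following sketch (my own minimal reading of the slide; class and parameter names are illustrative). Training writes 1s into the addressed cells; classification counts how many nodes answer 1 and normalises by the number of nodes to get the similarity measure.

```python
import random

class Discriminator:
    """One-class WiSARD discriminator: a set of RAM nodes behind a fixed
    pseudo-random mapping of the retina pixels."""
    def __init__(self, retina_size=64, n_bits=4, seed=42):
        order = list(range(retina_size))
        random.Random(seed).shuffle(order)                  # pseudo-random mapping
        self.mapping = [order[i:i + n_bits] for i in range(0, retina_size, n_bits)]
        self.rams = [dict() for _ in self.mapping]          # sparse RAM cells

    def _addresses(self, retina):
        return [tuple(retina[p] for p in group) for group in self.mapping]

    def train(self, retina):
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram[addr] = 1                                   # write a 1 in the cell

    def classify(self, retina):
        fired = sum(ram.get(addr, 0)
                    for ram, addr in zip(self.rams, self._addresses(retina)))
        return fired / len(self.rams)                       # similarity in [0, 1]

# Example: train on one pattern, then measure similarity on the same pattern.
pattern = [random.randint(0, 1) for _ in range(64)]
d = Discriminator()
d.train(pattern)
print(d.classify(pattern))   # 1.0 for the training pattern itself
```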

  9. WiSARD Network
  • A WiSARD network is a multi-discriminator system: one discriminator per class (discriminator 0 … discriminator 9 in the digit example).
  • The input image is presented to all discriminators; the output is the belonging class together with its relative response R (%).
  • With r1 the strongest response and d the difference between the two strongest responses, the confidence of the decision is c = d / r1.
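The decision rule of the multi-discriminator system can be sketched as below, under my reading of the slide that r1 is the strongest response, d the gap to the second strongest, and c = d/r1 the confidence; this interpretation, and the function name, are assumptions.

```python
def wisard_decide(responses):
    """responses: dict mapping class name -> discriminator response in [0, 1]."""
    ranked = sorted(responses.items(), key=lambda kv: kv[1], reverse=True)
    (best_class, r1), (_, r2) = ranked[0], ranked[1]
    d = r1 - r2                       # gap between the two strongest responses
    c = d / r1 if r1 > 0 else 0.0     # confidence of the decision
    return best_class, r1, c

# Example: class "1" wins with response 0.90 and confidence ~0.17.
print(wisard_decide({"0": 0.55, "1": 0.90, "2": 0.75}))
```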

  10. WiSARD Modified
  1) Learning frame by frame: the discriminator is re-trained at every frame of the sequence.
  2) Increasing the RAM cell content: instead of a single bit, each addressed cell is incremented at every training step, so cells hold access frequencies; at read-out a cell contributes 0 if its content i = 0 and 1 otherwise.
  [Slide figure: the six RAM nodes during frame-by-frame training, with cell contents grown to counters such as 2 and 3.]

  11. WiSARD Modified
  1) Learning frame by frame.
  2) Increasing the RAM cell content: 0 if i = 0, 1 otherwise.
  3) Filtering the output: the binary node outputs are summed (e.g. 1 + 1 + 1 + 1 = 4) and the sum is used for classification.
  [Slide figure: the same six RAM nodes during classification, with the node outputs filtered and summed.]
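A sketch of the modified discriminator described in slides 10 and 11, assuming the same sparse dict-per-RAM layout as the earlier discriminator sketch: cells hold access counters grown frame by frame, and the filtered output sums one vote per node whose addressed cell content is non-zero.

```python
import random

class FrequencyDiscriminator:
    """Modified discriminator: RAM cells hold access counters instead of bits."""
    def __init__(self, retina_size=64, n_bits=4, seed=7):
        order = list(range(retina_size))
        random.Random(seed).shuffle(order)
        self.mapping = [order[i:i + n_bits] for i in range(0, retina_size, n_bits)]
        self.rams = [dict() for _ in self.mapping]

    def _addresses(self, retina):
        return [tuple(retina[p] for p in group) for group in self.mapping]

    def train_frame(self, retina):
        """Frame-by-frame learning: increment the addressed cells."""
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram[addr] = ram.get(addr, 0) + 1

    def response(self, retina):
        """Filtered output: a node contributes 0 if the cell content is 0, 1 otherwise."""
        return sum(1 for ram, addr in zip(self.rams, self._addresses(retina))
                   if ram.get(addr, 0) > 0)

# Example: three frames of the same pattern raise the counters to 3,
# and every one of the 16 nodes then fires on that pattern.
fd = FrequencyDiscriminator()
frame = [1] * 64
for _ in range(3):
    fd.train_frame(frame)
print(fd.response(frame))   # 16
```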

  12. DRASiW for Shape Detection
  • WiSARD (WIlkie, Stonham and Aleksander’s Recognition Device) maps an input image to its belonging class with a relative response R (%).
  • DRASiW works in the reverse direction (“Show me this class!”): it exploits the k-bit words stored in the RAM cells to produce a “mental image”, an example of the learned pattern categories.

  13. Mental Image
  • Learning frame by frame, the RAM cell contents become access frequencies.
  • Projecting these frequencies back onto the retina through the mapping yields “histopixels”, whose accumulation forms the “mental” image of the learned object.
  [Slide figure: RAM nodes with 4-bit addresses (0000 … 1111) and their counters, and the resulting “mental” image on the retina.]
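One plausible way to realise the mental image, assuming the counter-based RAMs of the previous sketch, is to back-project each stored sub-pattern onto the retina weighted by its frequency; the function below is my own illustration of that idea, not the authors' code.

```python
def mental_image(mapping, rams, retina_size):
    """mapping: list of pixel-index groups (one per RAM node);
    rams: list of {address_tuple: counter}; returns the accumulated histopixels."""
    image = [0] * retina_size
    for group, ram in zip(mapping, rams):          # one RAM node per n-tuple
        for addr, count in ram.items():            # addr is a tuple of bits
            for pixel, bit in zip(group, addr):
                if bit == 1:
                    image[pixel] += count          # back-project the frequency
    return image                                   # higher value = more often "on"

# Tiny example: a 4-pixel retina with two 2-bit RAM nodes.
mapping = [[0, 1], [2, 3]]
rams = [{(1, 0): 3}, {(0, 1): 2}]
print(mental_image(mapping, rams, 4))   # [3, 0, 0, 2]
```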

  14. WiSARD bleaching
  • The system always trains itself with the image on the retina of the discriminator that outputs the best response.
  • The sub-patterns of the new image on the retina are combined with those of the mental image (MI), i.e. their frequencies in the RAM contents are increased.
  • Bleaching is a “forgetting mechanism” that avoids the saturation of the RAM memory locations: the sub-patterns which are not addressed by the current image on the retina are decremented (−1).
  • DRASiW thus maintains an updated MI of the tracked object shape.
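The forgetting step itself is simple to sketch: after training with the current retina, every stored sub-pattern not addressed by it is decremented and dropped when it reaches zero. The snippet assumes the dict-of-counters RAM layout used in the earlier sketches.

```python
def bleach(rams, addresses):
    """rams: list of {address_tuple: counter}, one dict per RAM node;
    addresses: the tuples addressed by the current image, one per node."""
    for ram, current in zip(rams, addresses):
        for addr in list(ram):                 # copy keys so we can delete safely
            if addr != current:
                ram[addr] -= 1                 # forget a little
                if ram[addr] <= 0:
                    del ram[addr]              # fully forgotten: free the cell

# Example: the unaddressed sub-pattern (0, 1) drops to 0 and is removed.
rams = [{(1, 0): 3, (0, 1): 1}]
bleach(rams, [(1, 0)])
print(rams)   # [{(1, 0): 3}]
```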

  15. WiSARD for Object Tracking
  • 10 left, right, up and down discriminators plus a central discriminator.
  • Each discriminator is identified by its relative coordinates and is in charge of learning the object in the retina, but looking at a different part of the image.
  • The displacement of all the retinas forms the prediction window.
  • The position of the discriminator with the highest response identifies the movement direction of the tracked object.
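A possible shape for one tracking step, under my reading of the slide: each discriminator evaluates the retina shifted by its own offset, the best responder gives the movement direction, and that discriminator is then re-trained on-line. The offsets, the crop() helper and the discriminator interface (response / train_frame, as in the earlier sketches) are assumptions, not the authors' implementation.

```python
# Illustrative displacements of the retinas around the prediction window.
OFFSETS = {"center": (0, 0), "left": (-8, 0), "right": (8, 0),
           "up": (0, -8), "down": (0, 8)}

def track_step(frame, window, discriminators, crop):
    """frame: current image; window: (x, y) of the prediction window;
    discriminators: {direction: discriminator with response()/train_frame()};
    crop(frame, x, y): returns the binary retina at that position."""
    responses = {}
    for direction, (dx, dy) in OFFSETS.items():
        retina = crop(frame, window[0] + dx, window[1] + dy)
        responses[direction] = discriminators[direction].response(retina)
    best = max(responses, key=responses.get)        # movement direction
    dx, dy = OFFSETS[best]
    new_window = (window[0] + dx, window[1] + dy)
    # On-line learning: re-train with the retina of the best responder.
    discriminators[best].train_frame(crop(frame, *new_window))
    return new_window, best
```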
