Tracking deformable objects with WiSARD networks: a preliminary work…
INNOROBO 2014, European Workshop on Deformable Object Manipulation, 20 March 2014, Lyon, France
Massimo De Gregorio, Maurizio Giordano, Silvia Rossi, Mariacarla Staffa and Bruno Siciliano
University of Naples Federico II
Object Tracking Problem
• The object tracking problem consists of reconstructing the trajectory of objects along a sequence of images
• It is inherently difficult when applied to real-world conditions:
– unstructured forms are present
– real-time responses are required
– computational capabilities are limited to on-board units
– brightness changes and non-stationary backgrounds affect the image processing system
• It becomes even more challenging in the case of non-rigid objects
Motivations
• Industrial manufacturing processes: rubber tubes, sheet metals, cords, paper sheets
• Medical operations: soft tissues, muscles, skin
• Domestic interaction: clothes, food, etc.
• Objects' location and deformation have to be tracked
Proposed approach
• Our aim is to make a robot able to track any deformable object without an a priori physical model
• We propose a WiSARD-based system, a particular neural network used as a feature detector for tracking deformable objects during manipulation. The system is:
1. Model free
2. Noise tolerant
3. Capable of on-line learning
Approaches to deformable objects
• CAD-like object model-based methods:
– edge detection
– 3D models (point clouds)
– recognition by parts
• Appearance-based methods:
– changes in lighting or color
– changes in viewing direction
– changes in size / shape
• Feature-based methods, based on visual features such as:
– color and/or texture
– object surface patches
– contours, regions of interest
– corners
– linear edges
• WiSARD-based approach: non-rigid objects are tracked
WiSARD (Wilkie, Stonham and Aleksander's Recognition Device)
• The McCulloch and Pitts model: the inputs $x_1, \dots, x_n$, weighted by $w_1, \dots, w_n$, are summed (Σ) and compared with a threshold $\sigma$:
$$y = \begin{cases} 1 & \text{if } x_1 w_1 + x_2 w_2 + \dots + x_n w_n > \sigma \\ 0 & \text{otherwise} \end{cases}$$
• The RAM-node: the binary inputs $x_1, x_2$ form an address (00, 01, 10, 11) that selects one memory cell, and the bit stored in that cell is the node's output
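The contrast between the two units can be sketched in Python. This is a minimal illustration, not code from the authors; the names `mcculloch_pitts` and `RAMNode` are hypothetical. A RAM node stores one bit per address, so an n-input node realizes a Boolean function of its inputs by table lookup, with no weights to tune.

```python
from typing import List, Sequence


def mcculloch_pitts(x: Sequence[int], w: Sequence[float], sigma: float) -> int:
    """Threshold unit: outputs 1 when x1*w1 + ... + xn*wn exceeds sigma, else 0."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) > sigma else 0


class RAMNode:
    """Hypothetical n-input RAM node: the binary inputs address 2**n one-bit cells."""

    def __init__(self, n_inputs: int) -> None:
        self.n_inputs = n_inputs
        self.cells: List[int] = [0] * (2 ** n_inputs)

    def address(self, x: Sequence[int]) -> int:
        # Read (x1, ..., xn) as a binary number with x1 as the most significant bit.
        addr = 0
        for bit in x:
            addr = (addr << 1) | bit
        return addr

    def write(self, x: Sequence[int]) -> None:
        # Training: store a 1 in the cell selected by the input pattern.
        self.cells[self.address(x)] = 1

    def read(self, x: Sequence[int]) -> int:
        # Recall: the output is simply the bit stored at the addressed cell.
        return self.cells[self.address(x)]
```

For example, writing the addresses 00 and 11 into a 2-input node makes it compute XNOR, a function that no single threshold unit of the McCulloch and Pitts kind can represent.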
WiSARD Discriminator
• A one-to-one (biunivocal) pseudo-random mapping connects uncorrelated parts of the image to the address inputs of specific RAM-based nodes
• The resulting uncorrelated n-tuples are used as addresses of the RAMs
• A set of RAM-based nodes forms a discriminator
[Figure: a WiSARD discriminator built from RAM-based nodes]
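The mapping step can be illustrated with a short Python sketch under assumptions of our own: a binarized retina stored as a flat list, a fixed random seed so the mapping is reproducible, and a tuple size n that divides the retina size. Shuffling the pixel indices and cutting them into n-tuples gives a one-to-one assignment in which every pixel feeds exactly one RAM. `make_mapping` and `tuple_addresses` are hypothetical names.

```python
import random
from typing import List, Sequence


def make_mapping(retina_size: int, n: int, seed: int = 0) -> List[List[int]]:
    """Shuffle retina pixel indices with a fixed seed and cut them into n-tuples.

    Each pixel belongs to exactly one tuple (a one-to-one mapping), and
    neighbouring pixels usually land in different tuples.
    """
    if retina_size % n != 0:
        raise ValueError("retina_size must be a multiple of n")
    indices = list(range(retina_size))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + n] for i in range(0, retina_size, n)]


def tuple_addresses(retina: Sequence[int], mapping: List[List[int]]) -> List[int]:
    """Turn each n-tuple of binary pixels into the address of its RAM node."""
    addresses = []
    for tup in mapping:
        addr = 0
        for idx in tup:
            addr = (addr << 1) | retina[idx]  # first pixel = most significant bit
        addresses.append(addr)
    return addresses
```

Keeping the seed fixed matters: the same mapping must be used at training and at classification time, otherwise the addresses would not line up.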
WiSARD Discriminator: training and classification
[Figure: in the training phase, each image of the training set on the retina is mapped to RAM 1–6 and the addressed cells are set to 1; in the classification phase, the contents of the addressed cells are summed (Σ) into the similarity measure r]
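Training and classification in a single discriminator fit in a few lines. In this sketch (class and method names are hypothetical), each RAM is a Python set of the addresses written during training, which is equivalent to a table of one-bit cells initialized to 0; the similarity measure r is taken as the fraction of RAMs whose addressed cell holds a 1.

```python
import random
from typing import Iterator, List, Sequence, Set


class Discriminator:
    """Hypothetical WiSARD discriminator over a flat binary retina."""

    def __init__(self, retina_size: int, n: int, seed: int = 0) -> None:
        if retina_size % n != 0:
            raise ValueError("retina_size must be a multiple of n")
        indices = list(range(retina_size))
        random.Random(seed).shuffle(indices)  # pseudo-random one-to-one mapping
        self.mapping = [indices[i:i + n] for i in range(0, retina_size, n)]
        # A set of written addresses stands in for 2**n one-bit cells set to 0.
        self.rams: List[Set[int]] = [set() for _ in self.mapping]

    def _addresses(self, retina: Sequence[int]) -> Iterator[int]:
        for tup in self.mapping:
            addr = 0
            for idx in tup:
                addr = (addr << 1) | retina[idx]
            yield addr

    def train(self, retina: Sequence[int]) -> None:
        # Training phase: set the addressed cell of every RAM to 1.
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram.add(addr)

    def response(self, retina: Sequence[int]) -> int:
        # Classification phase: count the RAMs whose addressed cell holds a 1 (the sum).
        return sum(addr in ram for ram, addr in zip(self.rams, self._addresses(retina)))

    def similarity(self, retina: Sequence[int]) -> float:
        # Similarity measure r, normalized to [0, 1].
        return self.response(retina) / len(self.rams)
```

A trained pattern scores r = 1, while its exact complement scores 0, because every n-tuple then produces an address that was never written.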
WiSARD Network
• A WiSARD network is a multi-discriminator system: the input image is presented to discriminators 0–9, and the output is the belonging class together with its response R (%)
• Confidence: $c = d / r_1$, where $r_1$ is the highest response and $d$ is the difference between $r_1$ and the second-highest response
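A multi-discriminator network and the confidence c = d/r_1 can be sketched as follows. Assumptions not stated on the slide: all discriminators share one pseudo-random mapping, a new discriminator is created the first time a class label is seen, and responses are normalized to [0, 1]. `WiSARD`, `Discriminator` and their methods are hypothetical names.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set, Tuple


class Discriminator:
    """Set-based discriminator using a mapping supplied by the network."""

    def __init__(self, mapping: List[List[int]]) -> None:
        self.mapping = mapping
        self.rams: List[Set[int]] = [set() for _ in mapping]

    def _addresses(self, retina: Sequence[int]) -> List[int]:
        # Concatenate the n-tuple bits into a binary string and parse it as an address.
        return [int("".join(str(retina[i]) for i in tup), 2) for tup in self.mapping]

    def train(self, retina: Sequence[int]) -> None:
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram.add(addr)

    def response(self, retina: Sequence[int]) -> float:
        hits = sum(addr in ram for ram, addr in zip(self.rams, self._addresses(retina)))
        return hits / len(self.rams)


class WiSARD:
    """Hypothetical multi-discriminator network: one discriminator per class."""

    def __init__(self, retina_size: int, n: int, seed: int = 0) -> None:
        indices = list(range(retina_size))
        random.Random(seed).shuffle(indices)
        self.mapping = [indices[i:i + n] for i in range(0, retina_size, n)]
        self.discriminators: Dict[Hashable, Discriminator] = {}

    def train(self, retina: Sequence[int], label: Hashable) -> None:
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(self.mapping)
        self.discriminators[label].train(retina)

    def classify(self, retina: Sequence[int]) -> Tuple[Hashable, float, float]:
        """Return (belonging class, best response r1, confidence c = d / r1)."""
        ranked = sorted(
            ((label, disc.response(retina)) for label, disc in self.discriminators.items()),
            key=lambda item: item[1],
            reverse=True,
        )
        label, r1 = ranked[0]
        r2 = ranked[1][1] if len(ranked) > 1 else 0.0
        d = r1 - r2  # gap between the best and second-best responses
        confidence = d / r1 if r1 > 0 else 0.0
        return label, r1, confidence
```

Sorting with an explicit key on the response avoids comparing labels on ties, so labels of mixed types are allowed.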
WiSARD Modified
1) Learning frame by frame: the image on the retina is learned at every frame over time
2) Increasing the RAM cell content: each addressed cell is incremented instead of being set to 1; a cell with content $i$ outputs 0 if $i = 0$, and 1 otherwise
[Figure: RAM 1–6 accumulating counts (e.g. 3, 2, 1) as successive frames on the retina are learned over time]
WiSARD Modified
1) Learning frame by frame
2) Increasing the RAM cell content
3) Filtering output: during classification each RAM outputs 0 if the addressed cell content $i = 0$, and 1 otherwise; the filtered outputs are then summed (in the figure, 1+1+1+1 = 4)
[Figure: RAM 1–6 holding accumulated counts; at classification the filtered outputs are summed (Σ) into the response r]
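The three modifications can be sketched with counters in place of single bits. This is an illustrative reading of the slides, not the authors' code: `CountingDiscriminator` is a hypothetical name, each RAM is a `collections.Counter` (missing addresses read as 0), and the output filter is exactly the rule shown, 0 if the addressed content is 0 and 1 otherwise.

```python
import random
from collections import Counter
from typing import List, Sequence


class CountingDiscriminator:
    """Hypothetical discriminator whose RAM cells hold counters instead of bits."""

    def __init__(self, retina_size: int, n: int, seed: int = 0) -> None:
        if retina_size % n != 0:
            raise ValueError("retina_size must be a multiple of n")
        indices = list(range(retina_size))
        random.Random(seed).shuffle(indices)
        self.mapping = [indices[i:i + n] for i in range(0, retina_size, n)]
        # Counter reads 0 for addresses never written, so it acts as a sparse RAM.
        self.rams: List[Counter] = [Counter() for _ in self.mapping]

    def _addresses(self, retina: Sequence[int]) -> List[int]:
        return [int("".join(str(retina[i]) for i in tup), 2) for tup in self.mapping]

    def train(self, retina: Sequence[int]) -> None:
        # 1) called once per frame; 2) increment the content of each addressed cell.
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram[addr] += 1

    def response(self, retina: Sequence[int]) -> int:
        # 3) filter each RAM output (0 if content i == 0, else 1), then sum.
        return sum(0 if ram[addr] == 0 else 1
                   for ram, addr in zip(self.rams, self._addresses(retina)))
```

With this filter a counter of 3 and a counter of 1 contribute equally to the response; the counts themselves only come into play when they are read back, for example to build the mental image.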
DRASiW for Shape Detection
• WiSARD (Wilkie, Stonham and Aleksander's Recognition Device): an input image is presented to discriminators 0–9, and the output is the belonging class with its response R (%)
• DRASiW: the input is a class name ("Show me this class!") and the output is a "mental image"; DRASiW exploits the k-bit words in the RAM cells to produce examples of learned pattern categories
Mental Image
[Figure: learning frame by frame over time; each RAM's 4-bit addresses (0000–1111, bits b3 b2 b1 b0) store access counts, which are projected back onto the retina pixels as "histopixels" to form the "mental" image]
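One straightforward reading of the histopixel construction: every RAM address is an n-bit word, and the count stored at that address is added to each retina pixel whose bit in the word is 1. Summing over all RAMs yields one value per pixel, the mental image. The sketch below (class and method names hypothetical, counting RAMs assumed) makes that concrete; without any forgetting, each pixel's value equals the number of learned frames in which that pixel was 1.

```python
import random
from collections import Counter
from typing import List, Sequence


class DRASiW:
    """Hypothetical counting discriminator that can also emit its mental image."""

    def __init__(self, retina_size: int, n: int, seed: int = 0) -> None:
        if retina_size % n != 0:
            raise ValueError("retina_size must be a multiple of n")
        self.retina_size = retina_size
        self.n = n
        indices = list(range(retina_size))
        random.Random(seed).shuffle(indices)
        self.mapping = [indices[i:i + n] for i in range(0, retina_size, n)]
        self.rams: List[Counter] = [Counter() for _ in self.mapping]

    def _addresses(self, retina: Sequence[int]) -> List[int]:
        # The first pixel of each tuple becomes the most significant address bit.
        return [int("".join(str(retina[i]) for i in tup), 2) for tup in self.mapping]

    def train(self, retina: Sequence[int]) -> None:
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram[addr] += 1

    def mental_image(self) -> List[int]:
        """Project every stored n-bit word back onto the retina (histopixels)."""
        mi = [0] * self.retina_size
        for tup, ram in zip(self.mapping, self.rams):
            for addr, count in ram.items():
                for pos, pixel in enumerate(tup):
                    # Bit of `addr` that was produced by this pixel during training.
                    if (addr >> (self.n - 1 - pos)) & 1:
                        mi[pixel] += count
        return mi
```

Training once on a shape and twice on a smaller sub-shape, for instance, gives the shared pixels a value of 3 and the remaining shape pixels a value of 1, so the mental image behaves like a grey-level histogram of the learned frames.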
WiSARD bleaching
• The system always trains itself with the image on the retina of the discriminator that outputs the best response
• The sub-patterns of the new image on the retina are combined with those of the MI (mental image), i.e. their frequencies in the RAM contents are increased
• Bleaching: a "forgetting" mechanism that avoids saturation of the RAM memory locations; the sub-patterns not addressed by the current image on the retina are decremented (−1)
• In this way DRASiW maintains an up-to-date MI of the tracked object's shape
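The update with bleaching can be sketched as follows. It is an interpretation under stated assumptions, not the authors' implementation: `BleachingDRASiW` and `update` are hypothetical names, the image passed to `update` is taken to be the one on the retina of the best-responding discriminator, every non-addressed non-zero cell loses 1, and counts are clamped at 0 (an assumption; the slide only says −1).

```python
import random
from collections import Counter
from typing import List, Sequence


class BleachingDRASiW:
    """Hypothetical counting discriminator with a bleaching (forgetting) update."""

    def __init__(self, retina_size: int, n: int, seed: int = 0) -> None:
        if retina_size % n != 0:
            raise ValueError("retina_size must be a multiple of n")
        self.retina_size = retina_size
        self.n = n
        indices = list(range(retina_size))
        random.Random(seed).shuffle(indices)
        self.mapping = [indices[i:i + n] for i in range(0, retina_size, n)]
        self.rams: List[Counter] = [Counter() for _ in self.mapping]

    def _addresses(self, retina: Sequence[int]) -> List[int]:
        return [int("".join(str(retina[i]) for i in tup), 2) for tup in self.mapping]

    def update(self, retina: Sequence[int]) -> None:
        """Learn the current image and bleach every sub-pattern it does not address."""
        for ram, addr in zip(self.rams, self._addresses(retina)):
            for other in list(ram):  # copy the keys: entries may be deleted below
                if other != addr:
                    ram[other] -= 1
                    if ram[other] <= 0:
                        del ram[other]  # assumption: counts never go below 0
            ram[addr] += 1  # reinforce the sub-pattern seen now

    def mental_image(self) -> List[int]:
        mi = [0] * self.retina_size
        for tup, ram in zip(self.mapping, self.rams):
            for addr, count in ram.items():
                for pos, pixel in enumerate(tup):
                    if (addr >> (self.n - 1 - pos)) & 1:
                        mi[pixel] += count
        return mi
```

If an object learned for three frames is replaced by a complementary shape for three frames, the mental image moves entirely to the new shape ([0, 0, 0, 0, 3, 3, 3, 3] on an 8-pixel retina); without the decrement both shapes would remain at full strength ([3, 3, 3, 3, 3, 3, 3, 3]).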
WiSARD for Object Tracking
• 10 left, right, up and down discriminators plus a central discriminator
• Each discriminator is identified by its relative coordinates and is in charge of learning the object on its retina, but looks at a different part of the image
• Together, the displaced retinas form the prediction window
• The position of the discriminator with the highest response identifies the movement direction of the tracked object
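The tracking step can be sketched as below. This is a simplified reading with several assumptions: binary frames stored as nested lists, a fixed window size, candidate offsets generated by a hypothetical `make_offsets(max_shift)` (the central position plus 1 to max_shift pixels left, right, up and down, one possible reading of the slide), and a single counting model that scores every displaced retina. Sharing one model is justified only insofar as all discriminators are trained with the same best-response image; bleaching is omitted for brevity. All names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Frame = List[List[int]]


class Model:
    """Counting RAM discriminator shared by all displaced retinas."""

    def __init__(self, size: int, n: int, seed: int = 0) -> None:
        indices = list(range(size))
        random.Random(seed).shuffle(indices)
        self.mapping = [indices[i:i + n] for i in range(0, size, n)]
        self.rams: List[Counter] = [Counter() for _ in self.mapping]

    def _addresses(self, retina: Sequence[int]) -> List[int]:
        return [int("".join(str(retina[i]) for i in tup), 2) for tup in self.mapping]

    def train(self, retina: Sequence[int]) -> None:
        for ram, addr in zip(self.rams, self._addresses(retina)):
            ram[addr] += 1

    def response(self, retina: Sequence[int]) -> int:
        return sum(ram[addr] > 0 for ram, addr in zip(self.rams, self._addresses(retina)))


def crop(frame: Frame, x: int, y: int, w: int, h: int) -> List[int]:
    """Flatten the w x h retina whose top-left corner is at (x, y)."""
    return [frame[y + r][x + c] for r in range(h) for c in range(w)]


def make_offsets(max_shift: int) -> List[Tuple[int, int]]:
    """Central retina first, then 1..max_shift pixels left, right, up and down."""
    offsets = [(0, 0)]
    for k in range(1, max_shift + 1):
        offsets += [(-k, 0), (k, 0), (0, -k), (0, k)]
    return offsets


def track_step(frame: Frame, model: Model, pos: Tuple[int, int], w: int, h: int,
               offsets: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Move to the displaced retina with the highest response, then learn it."""
    best_pos, best_r = pos, -1
    for dx, dy in offsets:
        x, y = pos[0] + dx, pos[1] + dy
        if x < 0 or y < 0 or x + w > len(frame[0]) or y + h > len(frame):
            continue  # this retina would fall outside the image
        r = model.response(crop(frame, x, y, w, h))
        if r > best_r:  # strict: ties keep the earlier (more central) candidate
            best_pos, best_r = (x, y), r
    model.train(crop(frame, best_pos[0], best_pos[1], w, h))
    return best_pos
```

In use, `track_step` is called once per frame and its returned position is fed back as `pos` for the next frame. With `make_offsets(10)` this reading yields 41 candidate retinas.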