
Multi-object tracking (MOT): visual and audio-visual



  1. Multi-object tracking (MOT): visual and audio-visual. Daniel Gatica-Perez (joint work with Kevin Smith, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez), IDIAP Research Institute, Martigny, Switzerland

  2. Outline
     - MOT using Particle Filters
     - Our work
       - Visual MOT with Distributed Partitioned Sampling [Smith et al., BMVC'04]
       - Audio-Visual MOT [Gatica et al., in preparation]
     - Conclusion

  3. MOT as Bayesian inference
     - The problem: given image observations $y_{1:t}$ and a state-space multi-object representation $x_t = (x_t^1, \dots, x_t^M)$
     - Object state: real-valued geometric transformation parameters and discrete indices (head pose, speaking status)
     - Compute the posterior $p(x_{0:t} \mid y_{1:t})$ or the filtering distribution $p(x_t \mid y_{1:t})$

  4. Joint state space representation
     - M objects: a joint state
     - Object state vector $x^j$: translation $(u^j, v^j)$, scaling, and a discrete speaking status (spk/no-spk)
     - MO joint configuration: $X_t = (M, x_t^{1:M}) = (M, x_t^1, x_t^2, \dots, x_t^M)$
     - (Figure: example objects labeled with their state vectors $x^1$, $x^2$, $x^3$.)

  5. The basic MOT joint tracker
     - Assumptions: each object has its own dynamics; objects are marginally independent, but conditionally dependent given observations (explaining away)
     - (Figure: graphical model with per-object chains $x^1_{t-1} \to x^1_t \to x^1_{t+1}$ and $x^2_{t-1} \to x^2_t \to x^2_{t+1}$, and shared observations $y_{t-1}$, $y_t$, $y_{t+1}$.)
     - For two objects: $p(x_{0:t}, y_{1:t}) = p(x_0^1)\, p(x_0^2) \prod_{n=1}^{t} p(x_n^1 \mid x_{n-1}^1)\, p(x_n^2 \mid x_{n-1}^2)\, p(y_n \mid x_n^{1:2})$

  6. Particle Filters for MOT
     - Filtering distribution: $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$
     - Approximated with the particle set $\{(x_t^{(i)}, w_t^{(i)}),\ i = 1, \dots, N\}$ by $\hat{p}_N(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)}\, \delta(x_t - x_t^{(i)})$
     - Steps:
       1. Resample
       2. Prediction: $p(x_t \mid x_{t-1}) = \prod_{z=1}^{M} p(x_t^z \mid x_{t-1}^z)$
       3. Likelihood: $p(y_t \mid x_t) = \prod_{z=1}^{M} p(y_t^z \mid x_t^z)$
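
The resample, prediction and likelihood steps on this slide can be sketched as a minimal bootstrap particle filter. This is an illustration under stated assumptions, not the authors' tracker: the objects are one-dimensional, the dynamics are a Gaussian random walk, the per-object likelihood is Gaussian, and every name and parameter (`particle_filter_step`, `dyn_std`, `obs_std`) is hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a toy likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, weights, observation, rng,
                         dyn_std=0.5, obs_std=1.0):
    """One resample / predict / weight cycle for M independent 1-D objects.

    particles: list of N joint states, each a list of M floats.
    observation: list of M floats, one noisy measurement per object.
    """
    n = len(particles)
    # 1. Resample: draw N particles in proportion to their weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Prediction: each object follows its own (assumed) random-walk dynamics.
    predicted = [[x + rng.gauss(0.0, dyn_std) for x in p] for p in resampled]
    # 3. Likelihood: product over objects of per-object likelihoods.
    new_weights = []
    for p in predicted:
        w = 1.0
        for x, y in zip(p, observation):
            w *= gaussian_pdf(y, x, obs_std)
        new_weights.append(w)
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]


def posterior_mean(particles, weights):
    """Weighted mean of each object's state under the particle set."""
    m = len(particles[0])
    return [sum(w * p[j] for p, w in zip(particles, weights)) for j in range(m)]


rng = random.Random(0)
n_particles, true_state = 500, [2.0, -1.0]
particles = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(n_particles)]
weights = [1.0 / n_particles] * n_particles
for _ in range(10):
    obs = [x + rng.gauss(0.0, 1.0) for x in true_state]
    particles, weights = particle_filter_step(particles, weights, obs, rng)
estimate = posterior_mean(particles, weights)
```

Each particle holds the joint state of all objects, so the number of particles needed for good coverage grows quickly with the number of objects.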

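  7. Complexity for Joint State Space
     - More objects: cost increases exponentially. If one object needs $N_1$ particles, $M$ objects need on the order of $N = N_1^M$ (e.g. $N_1$, $N_1^2$, $N_1^3$ for one, two and three objects)
     - Solution: sample more efficiently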
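
A short piece of arithmetic makes the exponential growth concrete. The figure of 20 particles per object is an arbitrary assumption chosen for illustration, and `joint_particles` is a hypothetical helper.

```python
def joint_particles(n_per_object, num_objects):
    """Particles a joint tracker needs if n_per_object suffice for one object."""
    return n_per_object ** num_objects


# Assumed budget of 20 particles per object, for 1 to 4 objects.
counts = [joint_particles(20, m) for m in range(1, 5)]
print(counts)  # [20, 400, 8000, 160000]
```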

  8. Distributed Partitioned Sampling (DPS) for visual MOT

  9. Partitioned Sampling (PS) [MacCormick, Isard, Blake, ECCV 2000]
     - Reduces the size of the search space
     - Searches each object's state sequentially
     - Samples are moved to areas of high likelihood
     - Example: configuration space of 2 one-dimensional objects (Figure: axes $x^A$ and $x^B$.)

  10. Partitioned Sampling (PS)
      - Divide the space into M subspace partitions; search each sequentially
      - Pipeline: prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling → dynamics $p(x'_t \mid x_{t-1})$ → weighted resampling $\sim g$ → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$
      - The dynamics and weighted resampling block repeats for M objects
      - Importance function $g$: the weighted resampling distribution; importance sampling ("IS") using the observation likelihood
      - Adverse effects: impoverishment and bias in the particle representation
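
The weighted resampling step can be sketched as follows, using the standard definition: indices are drawn in proportion to the importance function $g$, and each selected particle's weight is divided by its selection probability so the weighted set still represents the same distribution. The Gaussian-shaped `g_a` on object A's coordinate and all other names are assumptions made for illustration.

```python
import math
import random


def weighted_resample(particles, weights, g, rng):
    """Weighted resampling with a strictly positive importance function g.

    Index i is drawn with probability rho_i proportional to g(particles[i]);
    each selected particle gets weight w_i / rho_i, then weights are normalized.
    """
    g_values = [g(p) for p in particles]
    total_g = sum(g_values)
    rho = [v / total_g for v in g_values]
    indices = rng.choices(range(len(particles)), weights=rho, k=len(particles))
    new_particles = [particles[i] for i in indices]
    new_weights = [weights[i] / rho[i] for i in indices]
    norm = sum(new_weights)
    return new_particles, [w / norm for w in new_weights]


def g_a(p):
    """Assumed importance function: peaks where object A is near 1.0."""
    return math.exp(-0.5 * (p[0] - 1.0) ** 2 / 0.25) + 1e-12


rng = random.Random(1)
# Two one-dimensional objects (A, B), initially spread uniformly.
particles = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(1000)]
weights = [1.0 / len(particles)] * len(particles)
moved, new_w = weighted_resample(particles, weights, g_a, rng)
mean_a_after = sum(p[0] for p in moved) / len(moved)
```

After the call, the particles cluster where `g_a` is large along object A's axis, while object B's coordinate is untouched and would be handled by the next partition.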

  11. PS: Ordering and Impoverishment
      - Weighted resampling effects
        - Impoverishment: loss of multi-modality
        - Bias: poor tracking quality
      - Ordering
        - In general, the ordering of objects is arbitrary
        - More objects, greater effect
      - (Figure: impoverishment and bias plotted against object number, 1 to 7.)

  12. Distributed Partitioned Sampling (DPS)
      - Pipeline: prior $p(X_{t-1} \mid Y_{1:t-1})$ → resampling → split into mixture components → per component: dynamics $p(x'_t \mid x_{t-1})$ and weighted resampling ($\sim g_1, \dots, \sim g_C$) → assemble → likelihood $p(Y_t \mid X_t)$ → posterior $p(X_t \mid Y_{1:t})$
      - The dynamics and weighted resampling block repeats for M objects
      - Each subset: PS in a different ordering, obtained by circular shift of the object order
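
The bookkeeping behind "each subset: PS in a different ordering" can be sketched as below. The sketch assumes one particle subset per circular shift of the object order and a round-robin split; both choices and all function names are hypothetical, and the PS pass itself is omitted.

```python
def circular_orderings(num_objects):
    """All circular shifts of the object order 1..M."""
    base = list(range(1, num_objects + 1))
    return [base[s:] + base[:s] for s in range(num_objects)]


def split_into_subsets(particles, num_subsets):
    """Deal particles round-robin into roughly equal subsets."""
    return [particles[c::num_subsets] for c in range(num_subsets)]


def assign_orderings(particles, num_objects):
    """Pair each particle subset with the ordering its PS pass would use."""
    orderings = circular_orderings(num_objects)
    subsets = split_into_subsets(particles, len(orderings))
    return list(zip(orderings, subsets))


# Twelve particle indices and three objects (assumed sizes).
plan = assign_orderings(list(range(12)), num_objects=3)
for ordering, subset in plan:
    print(ordering, subset)
```

Because every object appears once in each position across the shifts, the ordering effects of weighted resampling are spread evenly over the objects rather than concentrated on those processed first.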

  13. Results (200 particles; examples taken from 50 runs per scenario). Panels: Joint PF, PS, DPS.

  14. Audio-visual MOT

  15. Audio-visual observation model
      - Visual 1: contour-based (wire on clutter), edges on normal lines
      - Visual 2: skin-blob-based; precision/recall between configuration and skin blobs; GMM on features
      - Audio: switching distribution around 2-D audio estimates. For particle $i$, the likelihood $p(y_t^{audio} \mid x_t^{(i)})$ takes one of two constant values, $K_1$ or $K_2$, depending on how the squared distance $(u_t^{(i)} - u_t^{est})^2 + (v_t^{(i)} - v_t^{est})^2$ compares with $R^2$ and on whether the particle's speaking index is spk ($K_1$) or no_spk ($K_2$)

  16. Sampling using MCMC
      - Metropolis-Hastings (MH) sampler
      - Posterior as target distribution
      - Better candidates are almost always accepted
      - Particles where all objects have good guesses
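
A minimal Metropolis-Hastings sketch illustrates the acceptance rule. It assumes a symmetric Gaussian random-walk proposal and uses a 1-D Gaussian as a stand-in for the tracking posterior; neither choice comes from the slides, and the names `metropolis_hastings`, `step` and `log_target` are hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, rng, step=0.5):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x, log_p = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        candidate = x + rng.gauss(0.0, step)
        log_p_cand = log_target(candidate)
        # Symmetric proposal: accept with probability min(1, p(cand) / p(x)),
        # so a candidate with higher target density is always accepted here.
        if math.log(1.0 - rng.random()) < log_p_cand - log_p:
            x, log_p = candidate, log_p_cand
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


def log_target(x):
    """Stand-in posterior: unnormalized log-density of N(3, 1)."""
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(2)
samples, acceptance_rate = metropolis_hastings(log_target, 0.0, 20000, rng)
kept = samples[2000:]  # discard burn-in
sample_mean = sum(kept) / len(kept)
```

Only ratios of the target density are needed, so the posterior never has to be normalized.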

  17. Results (1)
      - Joint PF, contour-only likelihood, 2000 particles
      - Joint PF, contour-blob likelihood, 1000 particles

  18. Results (2)
      - Joint PF-MCMC, contour-blob likelihood, 500 particles
      - Joint PF-MCMC, contour-blob likelihood, 500 particles, visual clutter

  19. Conclusion
      - Visual tracking
        - (+) DPS improves MOT because ordering matters
        - (+) fairly distributes ordering effects
        - (+) retains the computational benefits of PS
        - (-) not so good for a low number of particles (e.g. <100)
      - Audio-visual tracking
        - (+) blob likelihood improves robustness
        - (+) joint a-v likelihood allows for fast spk/non-spk switching
        - (+) MCMC reduces complexity
        - (+) currently: (re)-initialization
        - (+) later: extension to more complex models
