
GPU-Accelerated Object Tracking Using Particle Filtering and Appearance-adaptive Models



  1. GPU-Accelerated Object Tracking Using Particle Filtering and Appearance-adaptive Models
     Bogusław Rymut, Bogdan Kwolek, Rzeszów University of Technology
     In this work we present an object tracking algorithm running on GPU. The tracking is achieved by a particle filter using appearance-adaptive models. The main focus of our work is parallel computation of the particle weights. The tracker yields promising GPU/CPU speed-up. We demonstrate that the GPU implementation of the algorithm that runs with 256 particles is about 30 times faster than the CPU implementation. Practical implementation issues in the CUDA framework are discussed. The algorithm has been tested on freely available test sequences.
     International Conference on Image Processing & Communications 2010

  2. Agenda
     - The problem
     - CUDA programming model
     - Particle Filtering
     - Problem decomposition
     - Experiments

  3. The problem
     - Appearance-based object tracking is time-consuming
     - The tracking algorithm must run in real time
     - GPU implementation of the particle filter (PF) algorithm
     - Real-time tracking using PF and a GPU
     - How to decompose the algorithm on the GPU

  4. Object appearance
     Figure: object appearance at times $t-1$ and $t$. The fitness function $f$ is a double sum over three appearance components ($k = 1, \dots, 3$) and over $i = 1, \dots, K$. The components are: 1 initial intensity, 2 previous intensity, 3 slow changes.

  5. CPU vs. GPU: SIMD architecture (figure source: www.nvidia.com)

  6. CUDA programming model
     - Highly multithreaded coprocessor
     - Small set of extensions to the C language
     - Low-level programming
     - Focus on parallel algorithms

  7. CUDA programming model
     - Highly scalable heterogeneous system
     - CPU and GPU are separate devices with separate DRAMs
     - The GPU executes thousands of extremely lightweight threads to achieve high performance
     (Diagram labels: CPU device, GPU device.)

  8. The problem of object tracking
     - The goal is to find the same object in a sequence of images
     - In the simplest approach this can be achieved by brute-force search

  9. Tracking - Probabilistic Approach
     One of the goals of visual tracking is to estimate the states of the objects of interest from image sequences. (Diagram labels: observation, hidden state.)
     The problem of tracking can be formulated as Bayesian filtering:
     $$p(x_t \mid z_{1:t}) \propto p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$
     where $x_t$ and $z_t$ denote the hidden state of the object of interest and the observation vector at discrete time $t$, respectively, whereas $z_{1:t}$ denotes all the observations up to the current time step.
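
The Bayesian filtering recursion can be illustrated on a small discrete state space, where the integral becomes a sum. This is a minimal sketch, not the authors' code: the three-cell state space, the transition matrix, and the likelihood values are made-up illustration values, and `bayes_filter_step` is a hypothetical helper name.

```python
def bayes_filter_step(prior, transition, likelihood):
    """One prediction + update step of a discrete Bayes filter.

    prior[j]         : p(x_{t-1} = j | z_{1:t-1})
    transition[j][i] : p(x_t = i | x_{t-1} = j)
    likelihood[i]    : p(z_t | x_t = i)
    """
    n = len(prior)
    # Prediction: sum over previous states (discrete analogue of the integral).
    predicted = [sum(transition[j][i] * prior[j] for j in range(n))
                 for i in range(n)]
    # Update: multiply by the observation likelihood, then normalize.
    unnormalized = [likelihood[i] * predicted[i] for i in range(n)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


# Hypothetical example values (not taken from the slides).
prior = [1 / 3, 1 / 3, 1 / 3]
transition = [[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]
likelihood = [0.1, 0.7, 0.2]
posterior = bayes_filter_step(prior, transition, likelihood)
```

A particle filter replaces the exact sum (or integral) with a weighted set of samples, which is what makes it practical for continuous, high-dimensional states.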

  10. Particle Filtering
      Diagram: $p(x_{t-1} \mid Z_{t-1})$ → motion → $p(x_t \mid Z_{t-1})$ → observation → $p(x_t \mid Z_t)$, where $Z_t$ denotes the observations up to time $t$.
      Starting with a weighted particle set $S_{t-1} = \{(x_{t-1}^{(i)}, w_{t-1}^{(i)})\}_{i=1}^{M}$ approximately distributed according to $p(x_{t-1} \mid Z_{t-1})$, the particle filter operates through predicting new samples from a proposal distribution. To give a new particle representation $S_t = \{(x_t^{(i)}, w_t^{(i)})\}_{i=1}^{M}$ of the posterior density $p(x_t \mid Z_t)$, the weights are set to:
      $$w_t^{(i)} \propto w_{t-1}^{(i)}\, \frac{p(z_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid x_{t-1}^{(i)}, z_t)}$$
      The observation likelihood is
      $$p(z_t \mid x_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{f^2(x_t)}{2\sigma^2}\right)$$
      where $f$ is the fitness function. Each sample represents a hypothetical state of the object.
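
The weight update can be sketched in a few lines. The sketch assumes that the proposal distribution equals the motion model, so that the ratio $p(x_t \mid x_{t-1}) / q(\cdot)$ cancels and only the Gaussian likelihood of the fitness value remains. The function names and the sample fitness values are hypothetical and only serve the illustration.

```python
import math


def observation_likelihood(fitness, sigma):
    """Gaussian likelihood p(z_t | x_t) for a fitness value f(x_t)."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-fitness ** 2 / (2.0 * sigma ** 2)) / norm


def update_weights(prev_weights, fitness_values, sigma):
    """Return normalized weights w_t^(i) ~ w_{t-1}^(i) * p(z_t | x_t^(i)).

    Assumes the proposal equals the motion model, so the
    transition/proposal ratio in the general formula is 1.
    """
    unnormalized = [w * observation_likelihood(f, sigma)
                    for w, f in zip(prev_weights, fitness_values)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# Hypothetical fitness values for four particles (smaller = better match).
prev = [0.25, 0.25, 0.25, 0.25]
fitness = [0.1, 0.5, 1.0, 2.0]
new_weights = update_weights(prev, fitness, sigma=0.5)
```

Particles whose appearance matches the model well (small fitness value) receive the largest weights; `sigma` controls how sharply poor matches are penalized.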

  11. Particle Filtering
      1. For $i = 1, 2, \dots, M$, sample (propose) particles using $x_t^{(i)} \sim p(x_t \mid x_{t-1}^{(i)})$.
      2. For $i = 1, 2, \dots, M$, calculate the weights $w_t^{(i)} = w_{t-1}^{(i)}\, p(z_t \mid x_t^{(i)})$.
      3. Normalize the weights using $w_t^{(i)} \leftarrow w_t^{(i)} / \sum_{k} w_t^{(k)}$.
      4. Calculate the state estimate $\hat{x}_t = \sum_{i=1}^{M} w_t^{(i)} x_t^{(i)}$.
      5. Resample to get a new set of particles: $\{(x_t^{(i)}, w_t^{(i)})\} \rightarrow \{(x_t^{(i)}, 1/M)\}$.
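
The five steps map directly onto code. The following is a minimal CPU sketch for a scalar state, not the authors' implementation: the random-walk motion model, the Gaussian observation model, the multinomial resampling, and all parameter values are assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         motion_std=1.0, obs_std=1.0):
    """One iteration of steps 1-5 for a scalar state (illustrative models)."""
    m = len(particles)
    # 1. Propose particles from the motion model (random walk: an assumption).
    proposed = [x + rng.gauss(0.0, motion_std) for x in particles]
    # 2. Weights: w_t = w_{t-1} * p(z_t | x_t). The Gaussian's constant
    #    factor is omitted because it cancels during normalization.
    unnormalized = [
        w * math.exp(-((observation - x) ** 2) / (2.0 * obs_std ** 2))
        for w, x in zip(weights, proposed)
    ]
    # 3. Normalize (fall back to uniform weights if everything underflowed).
    total = sum(unnormalized)
    if total > 0.0:
        normalized = [w / total for w in unnormalized]
    else:
        normalized = [1.0 / m] * m
    # 4. State estimate: weighted mean of the particles.
    estimate = sum(w * x for w, x in zip(normalized, proposed))
    # 5. Multinomial resampling; all weights are reset to 1/M.
    resampled = rng.choices(proposed, weights=normalized, k=m)
    return resampled, [1.0 / m] * m, estimate


rng = random.Random(0)  # fixed seed for repeatability
M = 256
particles = [rng.uniform(-10.0, 10.0) for _ in range(M)]
weights = [1.0 / M] * M
for _ in range(10):
    # Hypothetical noise-free observation of a static target at 3.0.
    particles, weights, estimate = particle_filter_step(
        particles, weights, 3.0, rng)
```

Step 2 is independent for every particle, which is why the weight computation is the natural candidate for parallel execution.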

  12. Particle Filtering (figure: prediction and observation steps over time)

  13. Approach to algorithm decomposition
      - Each part of the algorithm has been implemented as a kernel function.
      - Every particle has been implemented as a thread block.
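
The one-block-per-particle mapping can be emulated on the CPU to show the idea. This is only a sketch under stated assumptions: the slides do not say what the threads inside a block do, so the strided per-pixel work split, the tree reduction in "shared memory", and the sum of squared differences used as a stand-in fitness are all illustrative choices, and every function name is hypothetical.

```python
def block_fitness(template, candidate, threads_per_block=8):
    """Emulate one thread block computing a sum of squared differences."""
    # The tree reduction below requires a power-of-two thread count.
    assert threads_per_block & (threads_per_block - 1) == 0
    n = len(template)
    # Each "thread" t accumulates a strided partial sum over pixels
    # t, t + T, t + 2T, ... into its slot of the "shared memory" array.
    shared = [0.0] * threads_per_block
    for t in range(threads_per_block):
        for p in range(t, n, threads_per_block):
            d = template[p] - candidate[p]
            shared[t] += d * d
    # Tree reduction: halve the number of active threads each step.
    stride = threads_per_block // 2
    while stride > 0:
        for t in range(stride):
            shared[t] += shared[t + stride]
        stride //= 2
    return shared[0]


def launch_grid(template, candidates, threads_per_block=8):
    """Emulate a kernel launch with one block per particle."""
    return [block_fitness(template, c, threads_per_block)
            for c in candidates]


# Two hypothetical particles: a perfect match and one offset by 1 per pixel.
template = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0]]
fitness_per_particle = launch_grid(template, candidates)
```

On a GPU the outer loop over particles and the loop over threads would run concurrently; here they run sequentially, but the data flow is the same.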

  14. Approach to algorithm decomposition

  15. Data decomposition

  16. Optimization of data access
      - Access to GPU global memory is the bottleneck
      - Correct data alignment is essential to overall performance
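
One common way to keep image rows aligned is to pad each row to a multiple of a fixed byte boundary, which is comparable to what the CUDA runtime call `cudaMallocPitch` does for 2D allocations. The sketch below only shows the arithmetic. The 128-byte boundary, the 4-byte element size, and the helper names are assumptions, not values taken from the slides.

```python
def padded_pitch(row_bytes, alignment=128):
    """Smallest multiple of `alignment` that is >= row_bytes."""
    return (row_bytes + alignment - 1) // alignment * alignment


def element_offset(row, col, pitch, element_size=4):
    """Byte offset of element (row, col) in a pitched 2D array."""
    return row * pitch + col * element_size


# Hypothetical image row: 100 float32 pixels = 400 bytes.
pitch = padded_pitch(100 * 4)
offset = element_offset(2, 3, pitch)
```

With padding, every row starts on an aligned address, so threads reading consecutive pixels of a row touch as few memory segments as possible, at the cost of some unused bytes per row.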

  17. Experiments
      - PC with Intel Core 2 Quad, 2.66 GHz, 1 GB RAM
      - PC with NVIDIA GeForce 9800 GT, 14 multiprocessors, 1.5 GHz, 1024 MB RAM

  18. Face tracking (panels: real time, slow motion)

  19. Experimental results: computation time

      Particles   CPU [ms]   9800 GT [ms]   Speedup
      32          16.53      1.30           x12.8
      64          32.27      1.80           x18.3
      128         62.65      2.70           x24.4
      256         123.73     4.17           x29.5
      512         243.19     7.51           x32.4
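
The speedup column is the ratio of CPU time to GPU time. The short sketch below recomputes it from the listed timings and also derives an upper bound on the update rate, assuming that the listed GPU time were the only per-frame cost (the slides do not state this). Ratios recomputed from the rounded timings can differ slightly from the Speedup column.

```python
# (CPU ms, GPU ms) per particle count, copied from the results table.
timings_ms = {
    32: (16.53, 1.30),
    64: (32.27, 1.80),
    128: (62.65, 2.70),
    256: (123.73, 4.17),
    512: (243.19, 7.51),
}

results = {}
for count, (cpu_ms, gpu_ms) in timings_ms.items():
    speedup = cpu_ms / gpu_ms            # how many times faster the GPU is
    updates_per_second = 1000.0 / gpu_ms  # bound if nothing else costs time
    results[count] = (speedup, updates_per_second)
    print(f"{count:4d} particles: speedup {speedup:5.1f}x, "
          f"up to {updates_per_second:6.1f} updates/s")
```

The speedup grows with the particle count, which is consistent with the GPU being better utilized when more thread blocks are available.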

  20. Conclusions
      - A GPU implementation of the PF algorithm has been prepared
      - With 256 particles, our GPU-based implementation is about 30 times faster than the CPU implementation
