GPU-Accelerated Object Tracking Using Particle Filtering and Appearance-adaptive Models
Bogusław Rymut, Bogdan Kwolek
Rzeszów University of Technology
In this work we present an object tracking algorithm running on a GPU. Tracking is performed by a particle filter using appearance-adaptive models. The main focus of our work is the parallel computation of the particle weights. The tracker yields a promising GPU/CPU speed-up: we demonstrate that the GPU implementation running with 256 particles is about 30 times faster than the CPU implementation. Practical implementation issues in the CUDA framework are discussed. The algorithm has been tested on freely available test sequences.
International Conference on Image Processing & Communications 2010
Agenda
- The problem
- CUDA programming model
- Particle Filtering
- Problem decomposition
- Experiments
The problem
- Appearance-based object tracking is time-consuming
- The tracking algorithm must run in real time
- GPU implementation of the particle filter (PF) algorithm
- Real-time tracking using PF and GPU
- How to decompose the algorithm on the GPU
Object appearance
[Figure: object appearance at times t-1 and t]
Fitness function f: defined over intensities I_i (i = 1, ..., M) and three appearance components (k = 1, 2, 3):
1. initial intensity
2. previous intensity
3. slow changes
CPU vs. GPU
- SIMD architecture
[Figure source: www.nvidia.com]
CUDA programming model
- Highly multithreaded coprocessor
- Small set of extensions to the C language
- Low-level programming
- Focus on parallel algorithms
CUDA programming model
- Highly scalable heterogeneous system
- CPU and GPU are separate devices with separate DRAMs
- The GPU executes thousands of extremely lightweight threads to achieve high performance
[Figure: CPU device and GPU device]
The problem of object tracking
- The goal is to find the same object throughout a sequence of images
- In the simplest approach this can be achieved by brute-force search
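As a rough illustration of the brute-force approach, the sketch below slides a template over every position in an image and keeps the best match. The sum-of-squared-differences cost and the names `ssd` and `brute_force_search` are assumptions made for this example; the slides do not say which matching cost a brute-force tracker would use.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = image[top + r][left + c] - value
            total += diff * diff
    return total


def brute_force_search(image, template):
    """Return (top, left) of the window with the smallest SSD cost."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_cost = None, float("inf")
    # Exhaustively test every window position: cost grows with image area
    # times template area, which is why this is slow for real-time tracking.
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            cost = ssd(image, template, top, left)
            if cost < best_cost:
                best_pos, best_cost = (top, left), cost
    return best_pos


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
print(brute_force_search(image, template))  # prints (1, 2)
```

Every candidate position is evaluated, so the work per frame is fixed regardless of how far the object actually moved; a particle filter instead evaluates only a limited set of hypotheses.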
Tracking - Probabilistic Approach
One of the goals of visual tracking is to estimate the states of the objects of interest from image sequences.
[Figure: hidden state and observation]
The tracking problem can be formulated as Bayesian filtering:
$$p(x_t \mid z_{1:t}) \propto p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$
where $x_t$ and $z_t$ denote the hidden state of the object of interest and the observation vector at discrete time $t$, respectively, whereas $z_{1:t} = \{z_1, \ldots, z_t\}$ denotes all the observations up to the current time step.
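To make the recursion concrete, here is a minimal sketch of one Bayesian filtering step on a discrete state space, where the integral becomes a sum. The three-state example, the transition table and the likelihood values are invented for illustration only.

```python
def predict(belief, transition):
    """Prediction: sum over x_{t-1} of p(x_t | x_{t-1}) * p(x_{t-1} | z_{1:t-1})."""
    n = len(belief)
    return [sum(transition[prev][cur] * belief[prev] for prev in range(n))
            for cur in range(n)]


def update(predicted, likelihood):
    """Correction: multiply by p(z_t | x_t), then normalize to sum to one."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical example: the object starts in state 0 and tends to move right.
belief = [1.0, 0.0, 0.0]
transition = [[0.5, 0.5, 0.0],   # transition[prev][cur] = p(cur | prev)
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]]
likelihood = [0.1, 0.8, 0.1]     # p(z_t | x_t) for the current observation

posterior = update(predict(belief, transition), likelihood)
print(posterior)
```

For continuous, high-dimensional states the sum cannot be enumerated, which is the motivation for approximating the posterior with weighted samples.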
Particle Filtering
[Diagram: $p(x_{t-1} \mid Z_{t-1})$ -> (motion) -> $p(x_t \mid Z_{t-1})$ -> (observation) -> $p(x_t \mid Z_t)$, where $Z_t$ denotes all observations up to time $t$]
Starting with a weighted particle set $S_{t-1} = \{(x_{t-1}^{(i)}, w_{t-1}^{(i)})\}_{i=1}^{M}$ approximately distributed according to $p(x_{t-1} \mid Z_{t-1})$, the particle filter operates by predicting new samples from a proposal distribution.
To give a new particle representation $S_t = \{(x_t^{(i)}, w_t^{(i)})\}_{i=1}^{M}$ of the posterior density $p(x_t \mid Z_t)$, the weights are set to:
$$w_t^{(i)} \propto w_{t-1}^{(i)}\, \frac{p(z_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid x_{t-1}^{(i)}, z_t)}$$
The observation model $p(z_t \mid x_t)$ is an exponential function of the fitness value $f(x_t)$.
Each sample represents a hypothetical state of the object.
Particle Filtering
1. For $i = 1, 2, \ldots, M$, sample or propose particles $x_t^{(i)}$ using $p(x_t \mid x_{t-1}^{(i)})$
2. For $i = 1, 2, \ldots, M$, calculate the weights $w_t^{(i)} = w_{t-1}^{(i)}\, p(z_t \mid x_t^{(i)})$
3. Normalize the weights using $w_t^{(i)} \leftarrow w_t^{(i)} / \sum_{k=1}^{M} w_t^{(k)}$
4. Calculate the state estimate $\hat{x}_t = \sum_{i=1}^{M} w_t^{(i)} x_t^{(i)}$
5. Resample to get a new set of particles $\{(x_t^{(i)}, w_t^{(i)} = 1/M)\}_{i=1}^{M}$
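The five steps above can be sketched for a toy one-dimensional state as follows. This is not the tracker from the slides: the random-walk motion model, the Gaussian likelihood on the distance to a scalar observation (standing in for the appearance-based fitness), multinomial resampling, and all parameter values are assumptions chosen to keep the example short.

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         motion_std=1.0, obs_std=1.0):
    """One predict / weight / normalize / estimate / resample cycle."""
    m = len(particles)
    # 1. Propose new particles from the motion model (assumed random walk).
    particles = [x + rng.gauss(0.0, motion_std) for x in particles]
    # 2. Weight each particle by an assumed Gaussian observation likelihood.
    weights = [w * math.exp(-(observation - x) ** 2 / (2.0 * obs_std ** 2))
               for w, x in zip(weights, particles)]
    # 3. Normalize the weights (fall back to uniform if all underflow to 0).
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / m] * m
    else:
        weights = [w / total for w in weights]
    # 4. State estimate: weighted mean of the particles.
    estimate = sum(w * x for w, x in zip(weights, particles))
    # 5. Resample with replacement in proportion to the weights.
    particles = rng.choices(particles, weights=weights, k=m)
    weights = [1.0 / m] * m
    return particles, weights, estimate


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(256)]
weights = [1.0 / 256] * 256
for z in [3.0] * 10:  # hypothetical observations of a static object at 3.0
    particles, weights, estimate = particle_filter_step(particles, weights, z, rng)
print(round(estimate, 2))
```

Step 2 is independent for each particle, which is what makes the weight computation the natural candidate for parallel execution.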
Particle Filtering
[Figure: prediction and observation steps over time]
Approach to algorithm decomposition
- Each part of the algorithm is implemented as a kernel function
- Each particle is processed by its own thread block
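The block-per-particle mapping can be emulated on the CPU to show the idea. The sketch below is plain Python, not CUDA, and it assumes details the slides do not give: that the threads of a block split the pixels of the candidate region in a strided pattern, that partial sums are combined with a tree reduction in shared memory, and that the weight is an exponential of the mean squared intensity difference. `THREADS_PER_BLOCK`, `block_weight` and `weights_kernel` are hypothetical names.

```python
import math

THREADS_PER_BLOCK = 8  # hypothetical block size, must be a power of two


def block_weight(candidate, template, sigma=10.0):
    """Emulate one thread block computing the weight of one particle."""
    shared = [0.0] * THREADS_PER_BLOCK  # stands in for per-block shared memory
    # Each "thread" accumulates squared differences over a strided pixel subset.
    for tid in range(THREADS_PER_BLOCK):
        for j in range(tid, len(template), THREADS_PER_BLOCK):
            diff = candidate[j] - template[j]
            shared[tid] += diff * diff
    # Tree reduction: halve the number of active "threads" at each step.
    stride = THREADS_PER_BLOCK // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    # "Thread 0" turns the block-wide sum into a weight (assumed likelihood).
    mse = shared[0] / len(template)
    return math.exp(-mse / (2.0 * sigma ** 2))


def weights_kernel(candidates, template):
    """Emulate a kernel launch with one block per particle."""
    return [block_weight(c, template) for c in candidates]


template = [float(v) for v in range(20)]
candidates = [list(template), [v + 5.0 for v in template]]
print(weights_kernel(candidates, template))
```

On a GPU the outer list comprehension would correspond to the grid of blocks and the `tid` loops to threads running concurrently, with synchronization between reduction steps.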
Approach to algorithm decomposition
Data decomposition
Optimization of data access
- Access to GPU global memory is the bottleneck
- Correct data alignment is essential to overall performance
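One common way to keep data aligned is to pad each image row so that every row starts at a multiple of the alignment boundary. The sketch below only shows the arithmetic; the 64-byte boundary and the helper names are assumptions, and the slides do not state which layout the authors used. In CUDA, `cudaMallocPitch` performs this kind of row padding when allocating 2D arrays.

```python
def aligned_pitch(width_elems, elem_bytes=4, alignment=64):
    """Row size in bytes, rounded up to the next multiple of `alignment`."""
    row_bytes = width_elems * elem_bytes
    return (row_bytes + alignment - 1) // alignment * alignment


def element_offset(row, col, pitch, elem_bytes=4):
    """Byte offset of element (row, col) in a pitched 2D array."""
    return row * pitch + col * elem_bytes


pitch = aligned_pitch(30)  # 30 floats = 120 bytes, padded to 128
print(pitch, [element_offset(r, 0, pitch) for r in range(4)])
```

With the padded pitch every row begins on an aligned address, at the cost of a few unused bytes per row.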
Experiments
- CPU: PC with Intel Core 2 Quad 2.66 GHz, 1 GB RAM
- GPU: PC with nVidia GeForce 9800 GT, 14 multiprocessors at 1.5 GHz, 1024 MB RAM
Face tracking
[Videos: real time, slow motion]
Experimental results
Computation time [ms]

Particles   CPU [ms]   9800 GT [ms]   Speedup
32           16.53      1.30          x12.8
64           32.27      1.80          x18.3
128          62.65      2.70          x24.4
256         123.73      4.17          x29.5
512         243.19      7.51          x32.4
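As a quick check on the table, the snippet below recomputes the CPU/GPU ratio from the listed times and derives the GPU time per particle. The ratios of the rounded times come out close to, but not always identical with, the reported speedups.

```python
# (particles, CPU ms, GPU ms, reported speedup), copied from the table above
results = [
    (32, 16.53, 1.30, 12.8),
    (64, 32.27, 1.80, 18.3),
    (128, 62.65, 2.70, 24.4),
    (256, 123.73, 4.17, 29.5),
    (512, 243.19, 7.51, 32.4),
]


def recomputed_speedup(cpu_ms, gpu_ms):
    """Ratio of CPU time to GPU time for the same number of particles."""
    return cpu_ms / gpu_ms


for particles, cpu_ms, gpu_ms, reported in results:
    ratio = recomputed_speedup(cpu_ms, gpu_ms)
    gpu_us_per_particle = 1000.0 * gpu_ms / particles
    print(f"{particles:4d} particles: ratio {ratio:5.1f} (reported x{reported}), "
          f"GPU time per particle {gpu_us_per_particle:5.1f} us")
```

The GPU time per particle falls as the particle count grows, which is consistent with the speedup rising from x12.8 at 32 particles to x32.4 at 512.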
Conclusions
- A GPU implementation of the PF algorithm has been prepared
- With 256 particles, our GPU-based implementation is about 30 times faster than the CPU implementation