GPU-Supported Object Tracking Using Adaptive Appearance Models
Bogusław Rymut, Bogdan Kwolek
Rzeszów University of Technology
This paper describes how a graphics processing unit (GPU) can be used effectively to speed up a tracking algorithm based on adaptive appearance models. Object tracking is performed by a particle swarm optimization (PSO) algorithm. Experimental results show that the GPU implementation of the algorithm exhibits a more than 40-fold speed-up over the CPU implementation.
ICCVG 2010
Agenda: The problem. CUDA programming model. Particle Swarm Optimization. Problem decomposition. Experiments.
The problem: Appearance-based object tracking is time-consuming. The tracking algorithm must run in real time. GPU implementation of the PSO algorithm. Real-time tracking using PSO and GPU. How to decompose the algorithm on the GPU.
Object appearance (frames t-1 and t)
Fitness function: $f = \sum_{k=1}^{K} \sum_{i=1}^{3} m_{i,k}\, M(I_k, \ldots)$
Components: i = 1 initial intensity, i = 2 previous intensity, i = 3 slow changes.
CPU vs. GPU: SIMD architecture [figure; source: www.nvidia.com]
CUDA programming model: Highly multithreaded coprocessor. Small set of extensions to the C language. Low-level programming. Focus on parallel algorithms.
CUDA programming model: Highly scalable heterogeneous system. The CPU and GPU are separate devices with separate DRAMs. The GPU executes thousands of extremely lightweight threads to achieve high performance. [Figure: GPU device, CPU device]
Particle Swarm Optimization: Stochastic optimization algorithm. The optimization is achieved via a set of particles. Particles collaborate with each other in the optimization process.
Particle Swarm Optimization
$v_j^{(i)} \leftarrow v_j^{(i)} + c_1 r_{1,j} (pbest_j^{(i)} - x_j^{(i)}) + c_2 r_{2,j} (gbest_j - x_j^{(i)})$
$x_j^{(i)} \leftarrow x_j^{(i)} + v_j^{(i)}$
Particle Swarm Optimization
1. Assign each particle a random position in the problem hyperspace.
2. Evaluate the fitness function and find the local best value for each particle.
3. Find the particle that has the best fitness value.
4. Update the velocities and positions of all particles.
5. Repeat steps 2-4 until the maximum number of iterations is reached.
6. Update the appearance model.
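Steps 1-5 can be sketched on the CPU in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name pso_minimize, the toy sphere fitness, the parameter defaults (c1, c2, search bounds, seed), and the convention that a lower fitness is better are all assumptions. Step 6 (updating the appearance model) is specific to tracking and is left out.

```python
import random

def pso_minimize(fitness, dim, n_particles=32, n_iters=5,
                 c1=2.0, c2=2.0, lo=-1.0, hi=1.0, seed=0):
    """Toy PSO; assumes lower fitness is better (hypothetical convention)."""
    rng = random.Random(seed)
    # Step 1: random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [float("inf")] * n_particles
    gbest, gbest_val = x[0][:], float("inf")

    for _ in range(n_iters):  # Step 5: fixed number of iterations
        # Step 2: evaluate fitness and update each particle's local best.
        for i in range(n_particles):
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, x[i][:]
        # Step 3: find the particle with the best fitness value.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest_val, gbest = pbest_val[g], pbest[g][:]
        # Step 4: update velocities and positions of all particles.
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][j] += (c1 * r1 * (pbest[i][j] - x[i][j])
                            + c2 * r2 * (gbest[j] - x[i][j]))
                x[i][j] += v[i][j]
    return gbest, gbest_val

def sphere(p):
    """Placeholder fitness: squared distance from the origin."""
    return sum(t * t for t in p)

best_pos, best_val = pso_minimize(sphere, dim=2)
```

The fitness evaluation in step 2 is independent for every particle, which is what makes the algorithm a natural candidate for a GPU.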
Approach to algorithm decomposition: Each part of the algorithm has been implemented as a kernel function. Every particle has been implemented as a thread block.
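The mapping of one particle to one thread block can be mimicked sequentially in Python. The sketch below is an assumption-laden illustration: the names block_fitness and fitness_kernel are hypothetical, the sum of squared differences stands in for the real appearance-model fitness, and the idea that the threads of a block share the pixels of a patch and combine partial sums with a tree reduction is a common CUDA pattern, not something the slides confirm.

```python
def block_fitness(candidate, template, n_threads=8):
    """Simulate one thread block scoring one particle's candidate patch.

    n_threads must be a power of two for the tree reduction below.
    """
    shared = [0.0] * n_threads  # stands in for per-block shared memory
    # Each simulated thread accumulates a strided subset of the pixels.
    for tid in range(n_threads):
        for k in range(tid, len(template), n_threads):
            d = candidate[k] - template[k]
            shared[tid] += d * d
    # Tree reduction: halve the number of active threads at each level.
    stride = n_threads // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]

def fitness_kernel(candidates, template, n_threads=8):
    """Simulate a kernel launch with one block per particle."""
    return [block_fitness(c, template, n_threads) for c in candidates]

template = [0.0, 1.0, 2.0, 3.0, 4.0]
candidates = [
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [4.0, 3.0, 2.0, 1.0, 0.0],
]
scores = fitness_kernel(candidates, template)
```

On a GPU the blocks run concurrently and the threads within a block synchronize between reduction levels; the loops here only reproduce the arithmetic.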
Data decomposition [figure]
Optimization of data access: Access to GPU global memory is a bottleneck. Correct data alignment is essential to overall performance.
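Why alignment and layout matter can be illustrated by counting how many aligned memory segments a group of threads touches in one read. The model below is deliberately simplified and rests on assumptions: a group of 16 threads, 4-byte words, and 64-byte segments are illustrative values, and the function segments_touched is hypothetical. It does not reproduce the exact coalescing rules of any particular GPU.

```python
def segments_touched(n_threads, stride_words, base_bytes=0,
                     word_bytes=4, segment_bytes=64):
    """Count distinct aligned segments hit when thread t reads one word
    at byte address base_bytes + t * stride_words * word_bytes."""
    addresses = [base_bytes + t * stride_words * word_bytes
                 for t in range(n_threads)]
    return len({a // segment_bytes for a in addresses})

GROUP = 16  # assumed number of threads served together

# Consecutive threads read consecutive words from an aligned base.
aligned = segments_touched(GROUP, stride_words=1)
# Same pattern, but the base address is shifted by one word.
misaligned = segments_touched(GROUP, stride_words=1, base_bytes=4)
# Each thread reads one field of a 16-word record (array-of-structures).
strided = segments_touched(GROUP, stride_words=16)
```

In this model the aligned, contiguous read fits in a single segment, the misaligned one spills into a second segment, and the strided one touches a separate segment per thread.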
Experiments: CPU: PC with Intel Core 2 Quad 2.66 GHz, 1 GB RAM. GPU: PC with nVidia GeForce 9800 GT, 14 multiprocessors, 1.5 GHz, 1024 MB RAM.
Face tracking [videos: real time, slow motion]
Experimental results: computation time
Particles, iterations   CPU [ms]   9800 GT [ms]   Speedup
32, 5                   30.6       1.4            x22.4
64, 5                   60.0       1.9            x31.5
128, 5                  117.9      3.4            x38.8
256, 5                  234.2      5.6            x41.5
Conclusions: A GPU implementation of the PSO algorithm has been prepared. Our GPU-based implementation is about 40 times faster than the CPU implementation (x41.5 in the largest configuration tested).