Thread Hierarchy on CUDA GPU

GPU-based Massively Parallel Implementation of Metaheuristic Algorithms. Robert Nowotniak, Jacek Kucharski, Computer Engineering Department, Technical University of Lodz.


  1. GPU-based Massively Parallel Implementation of Metaheuristic Algorithms. Robert Nowotniak, Jacek Kucharski, Computer Engineering Department, Technical University of Lodz. SŁOK, June 15-17, 2011.

  2. Thread hierarchy on CUDA GPU. In CUDA, threads are grouped into blocks, and blocks constitute a grid. The unit of thread scheduling is the warp (32 threads). Figure: Grid of Thread Blocks.
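
The hierarchy on this slide can be illustrated with the index arithmetic that a one-dimensional CUDA kernel typically performs. What follows is a plain Python emulation, not CUDA code and not the authors' implementation. The helper names `global_thread_id`, `warp_id`, and `warps_per_block`, as well as the grid and block sizes, are assumptions chosen for illustration; only the warp size of 32 comes from the slide.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide


def global_thread_id(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Mirror the usual CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def warp_id(thread_idx: int) -> int:
    """Index of the warp, within its block, that a thread belongs to."""
    return thread_idx // WARP_SIZE


def warps_per_block(block_dim: int) -> int:
    """Number of warps needed for one block (a partial last warp still counts)."""
    return (block_dim + WARP_SIZE - 1) // WARP_SIZE


if __name__ == "__main__":
    grid_dim, block_dim = 4, 96  # hypothetical launch: 4 blocks of 96 threads
    for block_idx in range(grid_dim):
        first = global_thread_id(block_idx, block_dim, 0)
        last = global_thread_id(block_idx, block_dim, block_dim - 1)
        print(f"block {block_idx}: global ids {first}..{last}, "
              f"{warps_per_block(block_dim)} warps")
```

Choosing a block size that is a multiple of 32 avoids a partially filled last warp, which is why the hypothetical launch above uses 96 threads per block.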

  3. Proposed approach to parallelization.

  4. GPU-based implementation of metaheuristics. Two levels:
     1. Coarse-grained parallelization: in a grid, several hundred blocks can simultaneously evolve independent populations with the same or different parameters.
     2. Fine-grained parallelization: at the population level, each individual can be evaluated and transformed in a separate GPU thread, so the whole population can be represented as a block of threads.
     Hundreds of populations with the same or different parameters can thus be evolved in parallel.
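
The two levels can be sketched as a pair of nested loops: the outer loop stands in for the grid of blocks (one population per block), the inner loop for the threads of a block (one individual per thread). This is a sequential Python emulation under stated assumptions, not the authors' kernel. The OneMax fitness, the bit-flip mutation, the acceptance rule, and all function names and parameter values are hypothetical; the slides do not specify the operators used.

```python
import random


def one_max(bits):
    """Toy fitness, assumed for illustration: the number of 1 bits."""
    return sum(bits)


def evolve_population(block_idx, pop_size, n_bits, mutation_rate, generations):
    """Work of one emulated thread block: evolve one independent population."""
    rng = random.Random(block_idx)  # per-block seed keeps populations independent
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Fine-grained level: on a GPU each iteration of this loop would be
        # one thread handling one individual.
        for thread_idx in range(pop_size):
            parent = population[thread_idx]
            child = [bit ^ (rng.random() < mutation_rate) for bit in parent]
            if one_max(child) >= one_max(parent):  # keep the child if not worse
                population[thread_idx] = child
    return max(one_max(ind) for ind in population)


def run_grid(mutation_rates, pop_size=8, n_bits=16, generations=20):
    """Coarse-grained level: one emulated block per parameter setting."""
    return [evolve_population(block_idx, pop_size, n_bits, rate, generations)
            for block_idx, rate in enumerate(mutation_rates)]


if __name__ == "__main__":
    print(run_grid([0.01, 0.05, 0.1, 0.2]))
```

On a real GPU the threads of a block run concurrently, so each thread would need its own random number stream rather than the single per-block generator used in this emulation.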

  6. Performance comparison.

  8. Results:
     1. Pentium-III 500 MHz (Visual C++ 6.0): 0.723 experiments/second (according to [1])
     2. Intel Core i7 2.93 GHz (1 core, ANSI C): 7.33 experiments/second
     3. NVidia GTX 295 (CUDA C): 890 experiments/second (about 120x speedup over the single Core i7 core)
     4. 8 GPUs (GTX 295 + GTX 285 + Tesla S1070 + Tesla C2070): 3089 experiments/second (over 400x speedup over the single Core i7 core)
     [1] Han, K. H., Kim, J. H.: Genetic quantum algorithm and its application to combinatorial optimization problem. Proceedings of the 2000 Congress on Evolutionary Computation, 2000.
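
The quoted speedups can be checked directly from the throughput figures. The slide does not name the baseline explicitly; the sketch below assumes it is the single-core Core i7 run, since that is the choice for which the ratios come out at roughly 121 and 421, matching "about 120x" and "over 400x". The dictionary keys and the `speedup` helper are illustrative names only.

```python
# Throughput figures from the slide, in experiments per second.
throughput = {
    "Pentium-III 500 MHz": 0.723,
    "Core i7 2.93 GHz (1 core)": 7.33,
    "GTX 295": 890.0,
    "8 GPUs": 3089.0,
}


def speedup(platform: str, baseline: str) -> float:
    """Ratio of two throughputs from the table above."""
    return throughput[platform] / throughput[baseline]


if __name__ == "__main__":
    baseline = "Core i7 2.93 GHz (1 core)"  # assumed baseline, see the note above
    for name in ("GTX 295", "8 GPUs"):
        print(f"{name}: {speedup(name, baseline):.1f}x")
```

Measured against the Pentium-III figure instead, the single GTX 295 would already exceed a 1000x ratio, which does not match the slide's numbers.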

  9. Correctness verification.

  10. Correctness verification (continued).

  11. Thank you for your attention.
