CuPP – A framework for easy CUDA integration (Jens Breitbart)


  1. CuPP – A framework for easy CUDA integration. Jens Breitbart, University of Kassel, Research Group Programming Languages / Methodologies. Rome, Italy, May 25, 2009.

  2. The current state of CUDA development. GPU? GPUs are REALLY fast. 3D Filterbank Convolution, performance (gflops) and development time (hours): Matlab 0.3 gflops, 0.5 hours; C/SSE 9.0 gflops, 10.0 hours; PS3 110.0 gflops, 30.0 hours; GT200 330.0 gflops, 10.0 hours. Source: Nicolas Pinto, James DiCarlo, David Cox (MIT, Harvard).

  3. Introduction: CUDA Overview. CUDA is NVIDIA's general-purpose programming system. In CUDA the GPU (= device) executes a function (= kernel) in the SPMD model. The kernels run in their own memory domain, and data must be explicitly transferred to and from it.
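
The workflow on this slide can be sketched without a GPU. The following is a minimal host-only simulation in plain C++, not CUDA code: `scale_kernel` and `run_scaled` are hypothetical names, a second `std::vector` stands in for device memory, and a sequential loop stands in for the parallel threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel: every "thread" runs the same function
// on the element selected by its own index (the SPMD model).
inline void scale_kernel(std::size_t thread_id, std::vector<float>& device_data, float factor) {
    assert(thread_id < device_data.size());
    device_data[thread_id] *= factor;
}

// Host-only simulation of the workflow: copy in, launch, copy back.
inline std::vector<float> run_scaled(const std::vector<float>& host_data, float factor) {
    // Explicit transfer host -> "device" (a separate buffer models the separate memory domain).
    std::vector<float> device_data = host_data;

    // On a GPU these iterations would run in parallel, one per thread.
    for (std::size_t tid = 0; tid < device_data.size(); ++tid) {
        scale_kernel(tid, device_data, factor);
    }

    // Explicit transfer "device" -> host.
    return device_data;
}
```

The point of the sketch is the shape of the code: the kernel never touches `host_data` directly, so both copies have to be written out by hand.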

  4. Introduction: CUDA ... so what is the problem? The slide rates four cases: the dataflow is known or not known, for an application written in C or written in C++ (the rating symbols are not recoverable).

  5. CuPP Overview: CuPP – one solution to the problems.

  6. CuPP Overview: Device management. The developer uses a device handle to identify a GPU / device. A handle must be passed to all functions using the device. Designed to support multiple devices per thread... but this is not yet implemented. Some magic so far? No, just some basics.
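
A minimal sketch of the handle-passing idea, assuming nothing about the real CuPP classes: `device_handle` and `describe_allocation` are hypothetical names made up for illustration, and no GPU is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical device handle (not the CuPP class): identifies one GPU by index.
class device_handle {
public:
    explicit device_handle(int id) : id_(id) { assert(id >= 0); }
    int id() const { return id_; }

private:
    int id_;
};

// Every function that uses the device takes the handle as an explicit argument,
// so it is always visible which GPU an operation targets.
inline std::string describe_allocation(const device_handle& dev, std::size_t bytes) {
    return "device " + std::to_string(dev.id()) + ": " + std::to_string(bytes) + " bytes";
}
```

Passing the handle explicitly, rather than relying on hidden per-thread state, is what would leave room for several devices per thread later on.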

  7. CuPP Overview: Memory management. Two levels: (1) CUDA-like, but C++-ified; only use this for short experiments. (2) CuPP memory objects: a memory object represents data stored in device memory. Use them to implement your data structures, not for everyday use. We still haven't solved a problem... sorry, but we are close.
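
What a memory object could look like is sketched below. This is an assumption-laden illustration, not the CuPP interface: `device_memory`, `copy_to_device` and `copy_to_host` are hypothetical names, and a `std::vector` stands in for the device allocation so the code runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory object: owns a block of "device" memory for its lifetime.
// A std::vector stands in for the device allocation; a real version would call
// cudaMalloc, cudaMemcpy and cudaFree at the marked places.
template <typename T>
class device_memory {
public:
    explicit device_memory(std::size_t count) : storage_(count) {}  // would be cudaMalloc

    // Explicit host -> device transfer (would be cudaMemcpy).
    void copy_to_device(const std::vector<T>& host) {
        assert(host.size() == storage_.size());
        storage_ = host;
    }

    // Explicit device -> host transfer (would be cudaMemcpy).
    std::vector<T> copy_to_host() const { return storage_; }

    std::size_t size() const { return storage_.size(); }

private:
    std::vector<T> storage_;  // released when the object is destroyed (would be cudaFree)
};
```

Tying the allocation to an object's lifetime means a data structure built from such members cannot leak device memory on an early return or exception.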

  8. CuPP Overview: Support for classes. Recall: CUDA does not officially support C++. “Direct” use works, with some restrictions. Type transformations allow you to use two independent types: the device type, used on the device, and the host type, used on the host.
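
The host type / device type split can be illustrated as follows. All names here (`particles_host`, `particles_device`, `to_device_type`, `sum_positions`) are hypothetical, and the pointer refers to host memory so that the sketch runs on a CPU; in a real setting it would point into device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Device type: a plain C-style struct with no constructors or STL members.
struct particles_device {
    const float* positions;
    std::size_t count;
};

// Host type: an ordinary C++ class that is free to use the standard library.
class particles_host {
public:
    explicit particles_host(std::vector<float> positions) : positions_(std::move(positions)) {}

    // Hypothetical transformation hook: builds the device-side view of this object.
    particles_device to_device_type() const {
        return particles_device{positions_.data(), positions_.size()};
    }

private:
    std::vector<float> positions_;
};

// Kernel-like function: it only ever sees the device type.
inline float sum_positions(const particles_device& d) {
    assert(d.positions != nullptr || d.count == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < d.count; ++i) {
        total += d.positions[i];
    }
    return total;
}
```

Host code keeps working with `particles_host`, while the restricted, C-compatible `particles_device` is the only thing the kernel side needs to understand.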

  9. CuPP Overview: Kernel call. (Almost) identical to a C++ function call. Supports both call by value and call by reference. Behaviour is customizable through callback functions. This is where the magic starts... and multiple seconds of compile time are spent.
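
One way such a call could behave is sketched below, under stated assumptions: the `kernel` class, the `counter_host` type and its `transform` / `written_back` hooks are all hypothetical and are not the CuPP callbacks, and the "launch" is an ordinary function call on the CPU.

```cpp
#include <cassert>

// Hypothetical host type with the two hooks that this sketch's kernel wrapper calls.
struct counter_host {
    int value = 0;
    int transforms = 0;  // how often the host -> device hook ran on this object
    bool dirty = false;  // set when a kernel may have changed the device copy

    int transform() { ++transforms; return value; }  // host type -> device type (an int)
    void written_back(int device_value) { value = device_value; dirty = true; }
};

using kernel_fn = void (*)(int by_value, int& by_reference);

// Hypothetical kernel wrapper: invoking it looks like a plain function call.
class kernel {
public:
    explicit kernel(kernel_fn fn) : fn_(fn) { assert(fn != nullptr); }

    void operator()(counter_host in, counter_host& out) const {
        int in_dev = in.transform();    // by value: a copy is sent, changes are dropped
        int out_dev = out.transform();  // by reference: sent, then written back below
        fn_(in_dev, out_dev);           // stands in for the actual GPU launch
        out.written_back(out_dev);      // callback informs the host object of the change
    }

private:
    kernel_fn fn_;
};

// Example kernel body: adds the by-value argument to the by-reference one.
inline void add_kernel(int a, int& b) { b += a; }
```

With `kernel k(&add_kernel);`, the call `k(a, b)` leaves `a` untouched and updates `b`, which mirrors the value / reference distinction of an ordinary C++ call.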

  10. CuPP Overview: Data structures. ... well, currently there is just a std::vector wrapper with some lazy memory copying. But you can easily design (or adapt) your own data structures. Three steps to adapt your existing data structure: (1) Add memory objects to store your data on the device. (2) Create a C-conforming device type to be used on the GPU. (3) Implement the kernel callback functions to transform between host and device type.
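
The "lazy memory copying" mentioned for the vector wrapper might work roughly like this. The sketch assumes a simple stale flag; `lazy_vector`, `device_view` and `transfers` are hypothetical names rather than members of `cupp::vector`, and a second `std::vector` stands in for the device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazy-copy vector: the "device" copy is refreshed only when the
// host data has changed since the last transfer.
template <typename T>
class lazy_vector {
public:
    void push_back(const T& v) {
        host_.push_back(v);
        device_stale_ = true;  // host changed, device copy is now out of date
    }

    const T& at(std::size_t i) const {
        assert(i < host_.size());
        return host_[i];
    }

    // Called before a kernel launch; returns the device-side data.
    const std::vector<T>& device_view() {
        if (device_stale_) {
            device_ = host_;   // stands in for a host -> device memcpy
            ++transfers_;
            device_stale_ = false;
        }
        return device_;
    }

    std::size_t transfers() const { return transfers_; }

private:
    std::vector<T> host_;
    std::vector<T> device_;
    bool device_stale_ = true;
    std::size_t transfers_ = 0;
};
```

Two kernel launches in a row with no host-side change in between cost only one transfer, which is the saving that lazy copying is after.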

  11. CuPP Examples: The result of the puzzle

          cupp::device dev;
          cupp::kernel k(get_fct_ptr(), gridDim, blockDim);

          cupp::vector<int> one, two;
          cupp::vector<int> *input = &one;
          cupp::vector<int> *output = &two;

          // Ping-pong between the two vectors: each pass reads *input and writes *output.
          for (int i = 0; i < 10; ++i) {
              k(dev, *input, *output);
          #if debug
              for (int j = 0; j < output->size(); ++j)  // j avoids shadowing the outer i
                  std::cout << output->at(j) << ", ";
          #endif
              std::swap(input, output);
          }

          // After the final swap, input points at the most recent result.
          write_to_file(*input);

  14. Conclusion: What we have (not) done. CuPP can help you integrate CUDA into your application and manage your data on the device. CuPP can't manage multiple devices effectively or do double buffering (GPU/CPU concurrency). Thank you.

  15. Conclusion: Data structure to speed up k-NN. The overall performance is similar to a data structure that can be both created and transferred effectively.

  16. Conclusion: Einstein@Home. The existing data structure cannot be transferred to the device. Solution? Design a new data structure and rewrite all existing CPU functions, or use two types.
