Investigation of OpenCL support in GeantV's Vectorized Geometry
Gabor Biro, 22.09.2014
Outline
▶ What is OpenCL™?
▶ VecGeom in a few words
▶ What are the goals?
▶ Results, conclusions
Gabor Biro (ELTE, Hungary) OpenCL support in VecGeom 2014.09.22.
What is OpenCL? Khronos Group
▶ Khronos Group: founded in January 2000
▶ Non-profit, member-funded consortium focused on creating royalty-free open standards for parallel computing, graphics and dynamic media (∼100 companies)
▶ Currently 15 active standards, including OpenCL™
▶ OpenCL (Open Computing Language) is a framework that increases application performance by enabling efficient parallel programming of a variety of CPUs and GPUs
▶ Initially developed by Apple
▶ The OpenCL 1.0 specification was ratified in December 2008; it first shipped with Mac OS X Snow Leopard on August 28, 2009
▶ C++ Wrapper API: built on top of the OpenCL 1.2 C API
What is OpenCL? OpenCL and C++
▶ AMD extension for a C++ kernel language
> major supported C++ features: kernel overloading, templates, namespaces, references…
> major unsupported C++ features: virtual functions, abstract classes, dynamic memory allocation, the :: operator, STL and other standard C++ libraries…
▶ SYCL: "royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL, while adding the ease-of-use and flexibility of C++"
▶ The provisional SYCL 1.2 specification was released on March 19, 2014
What is OpenCL? Heterogeneous systems
▶ Hierarchy of models:
> platform model
> execution model
>> host programs
>> kernels
> memory model
Illustrations taken from: Introduction to OpenCL™ Programming, AMD
What is OpenCL? host/device code

host code

    cl::Context context(CL_DEVICE_TYPE_DEFAULT);
    std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
    cl::Program program(context, util::loadProgram(clFile), (...));
    cl::CommandQueue queue(context);
    auto KernelFunctor =
        cl::make_kernel<cl::Buffer&, (...)>(program, "kernelfunction");
    cl::Buffer inputbuf =
        cl::Buffer(context, CL_MEM_USE_HOST_PTR,
                   sizeof(Precision)*datapoints, input);
    KernelFunctor(cl::EnqueueArgs(queue, cl::NDRange(datapoints)),
                  inputbuf, (...));
    queue.finish();

kernel code (the kernel name must match the string passed to cl::make_kernel)

    __kernel void kernelfunction(__global float* in,
                                 __global float* out){
        int id = get_global_id(0);
        out[id] = in[id] * in[id];
    }

scalar code

    void function(int datapoints, float* in, float* out){
        for (int i = 0; i < datapoints; i++){
            out[i] = in[i] * in[i];
        }
    }
What is OpenCL? Benefits
▶ OpenCL can manage all available computational resources
▶ Software portability: all the hardware implementation specifics (such as drivers and runtime) are invisible to upper-level software programmers
▶ Highly customizable: the developer can choose the best hardware without having to reshuffle the upper software infrastructure

Device properties can be queried with

    devices[devid].getInfo<cl_device_info>();

where cl_device_info can be
▶ CL_DEVICE_TYPE
▶ CL_DEVICE_VENDOR_ID
▶ CL_DEVICE_MAX_COMPUTE_UNITS
▶ CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
▶ CL_DEVICE_MAX_WORK_ITEM_SIZES
▶ CL_DEVICE_MAX_WORK_GROUP_SIZE
▶ CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
▶ CL_DEVICE_MAX_CLOCK_FREQUENCY
▶ CL_DEVICE_GLOBAL_MEM_SIZE
▶ …
VecGeom Vectorized Geometry
▶ Parallelism in
> particle-level
> primitive-level
▶ Support multiple architectures without having multiple implementations
[Figure: one generic kernel instantiated per input size: 1 → scalar (autovectorized?) instantiation, 4 → vectorized instantiation, 1024 → GPU instantiation]
Illustration borrowed from: First experience with portable high-performance geometry code on GPU, J.d.F.L.
VecGeom Generic templated code
▶ Generic templated code
▶ Backend is specified at compile time
▶ The specific method for a specific volume is called in a generic way

    template <TranslationCode transCodeT, RotationCode rotCodeT>
    template <class Backend>
    VECGEOM_CUDA_HEADER_BOTH
    void ParallelepipedImplementation<transCodeT, rotCodeT>::DistanceToOut(
        UnplacedParallelepiped const &unplaced,
        Vector3D<typename Backend::precision_v> const &point,
        Vector3D<typename Backend::precision_v> const &direction,
        typename Backend::precision_v const &stepMax,
        typename Backend::precision_v &distance) {
      ...vectorized, generic computation...
    }

▶ Calling a method for a given volume:

    v->Specialized()->DistanceToOut(points, directions, fStepMax, distances);
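The backend idea on this slide can be illustrated with a small, self-contained C++ sketch. It is not VecGeom code: ScalarBackend, Vector4Backend, Vec4, Min, Lane and SmallestOf are hypothetical names chosen for the illustration, and the 4-wide type is a plain array standing in for a real SIMD type. It only shows how one templated function body is instantiated for different value types through Backend::precision_v.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar backend: one value per call.
struct ScalarBackend {
  using precision_v = double;
};

// Hypothetical 4-wide value type standing in for a SIMD register.
struct Vec4 {
  std::array<double, 4> lane;
};

// Hypothetical vector backend: four values per call.
struct Vector4Backend {
  using precision_v = Vec4;
};

// Bounds-checked lane access for the 4-wide type.
inline double Lane(const Vec4 &v, std::size_t i) {
  assert(i < 4);
  return v.lane[i];
}

// Minimum, overloaded for each backend value type.
inline double Min(double a, double b) { return std::fmin(a, b); }

inline Vec4 Min(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r.lane[i] = std::fmin(a.lane[i], b.lane[i]);
  }
  return r;
}

// Generic kernel: written once, instantiated per backend at compile time.
template <class Backend>
typename Backend::precision_v SmallestOf(
    typename Backend::precision_v const &x,
    typename Backend::precision_v const &y,
    typename Backend::precision_v const &z) {
  return Min(Min(x, y), z);
}
```

Calling SmallestOf<ScalarBackend>(3.0, 1.0, 2.0) handles one value, while SmallestOf<Vector4Backend>(x, y, z) handles four lanes with the same function body.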
VecGeom + OpenCL Goals and motivations
▶ Goals:
> investigate OpenCL support with the least possible modification of the existing code
> implement an OpenCL API (without creating too much extra code)
> investigate the performance with AMD GPUs
▶ Motivations:
> architecture independence
> for GPUs the CUDA backend already exists, but there is no support for AMD cards
▶ Target of investigation:
> simplest volume: Box
▶ Used:
> AMD Accelerated Parallel Processing (APP) SDK
> AMD OpenCL Static C++ Kernel Language Extension
VecGeom + OpenCL Experiences
▶ The main problems are the unsupported C++ features, such as the STL
> macros are not a good solution
▶ The structure of the kernel functions differs from that of the generic volume methods

    int3 id = (int3)(get_global_id(0), get_global_id(1), get_global_id(2));
    int gSize = get_global_size(0);
    get_local_id(0);
    get_num_groups(0);
    ...
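The work-item built-ins listed on this slide are related by simple index arithmetic. The following host-side C++ sketch models one dimension of an NDRange, assuming a zero global offset; WorkItem1D, GlobalId and NumGroups are hypothetical names for the illustration and are not part of the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>

// Host-side model of one work-item in one NDRange dimension
// (assumption: the global offset is zero).
struct WorkItem1D {
  std::size_t groupId;    // what get_group_id(0) would return
  std::size_t localId;    // what get_local_id(0) would return
  std::size_t localSize;  // what get_local_size(0) would return
};

// Models get_global_id(0): position of the work-item in the whole range.
inline std::size_t GlobalId(const WorkItem1D &w) {
  assert(w.localId < w.localSize);
  return w.groupId * w.localSize + w.localId;
}

// Models get_num_groups(0) for a global size that is a multiple
// of the work-group size.
inline std::size_t NumGroups(std::size_t globalSize, std::size_t localSize) {
  assert(localSize > 0 && globalSize % localSize == 0);
  return globalSize / localSize;
}
```

For example, the work-item with local index 3 in group 2 of a range with 64 items per group has global index 2 * 64 + 3 = 131.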
VecGeom + OpenCL Experiences
▶ The OpenCL kernel launching process needs a different approach

    CommandQueue::enqueueNDRangeKernel(const Kernel &kernel,
                                       const NDRange &offset,
                                       const NDRange &global,
                                       const NDRange &local,
                                       const VECTOR_CLASS<Event> *events,
                                       Event *event);

▶ Currently, OpenCL support for the volume methods is feasible only with new, external code
> minimal modification of existing code (code maintenance)
> new kernel code with a different working mechanism
> the OpenCL version should be at least as fast as the vectorized code
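One practical consequence of the explicit global and local ranges in this call: in OpenCL 1.x the global size must be a multiple of the local size in each dimension, so the number of points is commonly padded before the launch. A minimal sketch of that arithmetic in plain C++ follows; the helper name PaddedGlobalSize is hypothetical and does not come from VecGeom or the OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>

// Rounds the number of points up to the next multiple of the
// work-group size, so the result can be used as the global range.
inline std::size_t PaddedGlobalSize(std::size_t points, std::size_t localSize) {
  assert(localSize > 0);
  return ((points + localSize - 1) / localSize) * localSize;
}
```

With padding, a kernel that indexes by get_global_id(0) also needs a guard such as if (id < PointCount), so that the extra work-items do not touch memory beyond the buffers.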
VecGeom + OpenCL Benchmarks
Figure: Benchmark with an Intel Core i3 M350 CPU
VecGeom + OpenCL Benchmarks
Figure: Benchmark with an ATI Radeon HD 5970 GPU
VecGeom + OpenCL Benchmarks
Figure: Benchmark with an AMD Opteron 6376 CPU
Conclusions
▶ Memory management with OpenCL is a difficult task
▶ Optimization and more benchmarks are needed
▶ Using the current version of OpenCL takes some effort
▶ Further investigation of OpenCL/SYCL is needed
backup slides
Kernel of DistanceToOut

    __kernel void FastDistanceToOut(__global double* dimensions,
                                    __global double* posX, __global double* posY, __global double* posZ,
                                    __global double* dirX, __global double* dirY, __global double* dirZ,
                                    __global double* distances,
                                    unsigned PointCount){
        // Points are processed in pairs, packed into double2 values.
        unsigned rem = PointCount%2;
        for (unsigned i = 0; i < PointCount-rem; i=i+2){
            // Per axis: distance to the face the direction points at.
            // The 1e-30 offset avoids a division by zero.
            double2 distX = (double2)(
                (( copysign(dimensions[0], dirX[i]  ) - posX[i]   ) * ( 1. / ( dirX[i]+1e-30 ) )),
                (( copysign(dimensions[0], dirX[i+1]) - posX[i+1] ) * ( 1. / ( dirX[i+1]+1e-30 ) )));
            double2 distY = (double2)(
                (( copysign(dimensions[1], dirY[i]  ) - posY[i]   ) * ( 1. / ( dirY[i]+1e-30 ) )),
                (( copysign(dimensions[1], dirY[i+1]) - posY[i+1] ) * ( 1. / ( dirY[i+1]+1e-30 ) )));
            double2 distZ = (double2)(
                (( copysign(dimensions[2], dirZ[i]  ) - posZ[i]   ) * ( 1. / ( dirZ[i]+1e-30 ) )),
                (( copysign(dimensions[2], dirZ[i+1]) - posZ[i+1] ) * ( 1. / ( dirZ[i+1]+1e-30 ) )));
            // Smallest of the three axis distances, per point.
            double2 temp  = (double2)( fmin( distX.x, distY.x ), distZ.x );
            double2 temp2 = (double2)( fmin( distX.y, distY.y ), distZ.y );
            distances[i]   = fmin( temp.x, temp.y );
            distances[i+1] = fmin( temp2.x, temp2.y );
        }
        // Added tail: the paired loop skips the last point when PointCount is odd.
        if (rem){
            unsigned i = PointCount-1;
            double dX = ( copysign(dimensions[0], dirX[i]) - posX[i] ) * ( 1. / ( dirX[i]+1e-30 ) );
            double dY = ( copysign(dimensions[1], dirY[i]) - posY[i] ) * ( 1. / ( dirY[i]+1e-30 ) );
            double dZ = ( copysign(dimensions[2], dirZ[i]) - posZ[i] ) * ( 1. / ( dirZ[i]+1e-30 ) );
            distances[i] = fmin( fmin( dX, dY ), dZ );
        }
    }
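For checking the kernel's output on the host, the same arithmetic can be written as a scalar reference in plain C++. This is a sketch under one assumption: dimensions holds the half-lengths of a box centered at the origin, which is what the copysign expression implies. Vec3, AxisDistance and BoxDistanceToOut are hypothetical names for the illustration, not VecGeom functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 3-component value used for half-lengths, positions
// and directions in this sketch.
struct Vec3 {
  double x, y, z;
};

// Distance along one axis from pos to the box face that dir points at.
// The 1e-30 offset mirrors the kernel and avoids a division by zero.
inline double AxisDistance(double halfLength, double pos, double dir) {
  assert(halfLength >= 0.0);
  return (std::copysign(halfLength, dir) - pos) * (1.0 / (dir + 1e-30));
}

// Scalar reference for one point: smallest of the three axis distances.
inline double BoxDistanceToOut(const Vec3 &half, const Vec3 &p, const Vec3 &d) {
  const double dx = AxisDistance(half.x, p.x, d.x);
  const double dy = AxisDistance(half.y, p.y, d.y);
  const double dz = AxisDistance(half.z, p.z, d.z);
  return std::fmin(std::fmin(dx, dy), dz);
}
```

A point at the origin of a box with half-lengths (1, 2, 3) moving along +x leaves the box after a distance of 1; the axes with zero direction yield very large distances and drop out of the minimum.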
Illustration borrowed from: Khronos Overview, November 2012