LLVM AMDGPU for High Performance Computing: are we competitive yet?
Vedran Miletić, HITS gGmbH; Szilárd Páll, KTH; Frauke Gräter, HITS gGmbH
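Layers of GPU computing (diagram)
● GPU-accelerated apps and libraries
● Programming models: CUDA, OpenCL
● Compilers: NVIDIA proprietary compiler, AMD proprietary compiler, Clang
● Drivers: NVIDIA proprietary driver, AMD proprietary driver, and the open source stack of libclc, LLVM, Mesa, and amdgpu/nouveau
● Our work: the open source Clang, libclc, LLVM, and Mesa path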
State of the art: CUDA and OpenCL
● CUDA
– 338 applications listed on NVIDIA’s website
– Over 50% market share in Top 500 (Nov 2016)
● OpenCL
– ~70 applications listed on Wikipedia
  ● ~30 in the Scientific computing category
  ● A couple of benchmarks and toys
OpenCL applications
● Image taken from: Ribeiro, João V., et al. "QwikMD—Integrative Molecular Dynamics Toolkit for Novices and Experts." Scientific Reports 6 (2016).
● Focus on GROMACS, LAMMPS, OpenMM, ASL
Running an open source OpenCL stack on Radeon/FirePro/FireStream
● AMD’s proprietary OpenCL driver and compiler
– Supports GPUs released in 2012 or later
– Will be open sourced soon™
● Mesa/LLVM
– Supports AMD GPUs released in 2009 or later
– Open source from the beginning™
Our work
● No changes or only minor changes in apps/libs
● Improvements to LLVM, Clang, libclc, Mesa
– Missing math functions, OpenCL 1.2 API calls
– Bug fixes
GROMACS OpenCL kernel execution time (chart: time vs. system size from 1.5 to 3072, AMDGPU-PRO vs. AMDGPU)
LAMMPS example execution time (chart: time for example melt_imd-gpu, AMDGPU-PRO vs. AMDGPU)
OpenMM test execution time (chart: time for tests AndersenThermostat, BrownianIntegrator, LangevinIntegrator, VerletIntegrator, AMDGPU-PRO vs. AMDGPU)
ASL test execution time (chart: time for tests testKernel, testKernelMerger, testPrivateVar, AMDGPU-PRO vs. AMDGPU)
Other OpenCL software
● Blender
– Several users report performance issues and crashes
● BEAGLE, phylogenetics library
– Made some progress
● clBLAS and clFFT
– Implemented clEnqueueFillBuffer; more work required
– Required for Octopus (quantum chemistry), and probably others
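clEnqueueFillBuffer, introduced in OpenCL 1.2, fills a region of a buffer object with a repeated byte pattern. The C sketch below is a host-side reference for those semantics only. It could, for example, be used to check a driver's result. It does not claim to reflect how Mesa implements the call. The function name fill_buffer_ref and its -1 error return are hypothetical. The real API instead enqueues the fill on a command queue and reports CL_INVALID_VALUE for bad arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side reference for clEnqueueFillBuffer semantics:
   repeat `pattern` (pattern_size bytes) over [offset, offset + size).
   Returns 0 on success, -1 where the real API would report CL_INVALID_VALUE. */
int fill_buffer_ref(unsigned char *buffer, size_t buffer_size,
                    const void *pattern, size_t pattern_size,
                    size_t offset, size_t size)
{
    /* OpenCL 1.2 accepts pattern sizes of 1, 2, 4, 8, 16, 32, 64 or 128 bytes. */
    if (pattern == NULL || pattern_size == 0 || pattern_size > 128 ||
        (pattern_size & (pattern_size - 1)) != 0)
        return -1;
    /* offset and size must both be multiples of pattern_size. */
    if (offset % pattern_size != 0 || size % pattern_size != 0)
        return -1;
    /* The filled region must lie entirely inside the buffer. */
    if (offset > buffer_size || size > buffer_size - offset)
        return -1;

    /* Copy the pattern once per pattern-sized slot in the region. */
    for (size_t pos = offset; pos < offset + size; pos += pattern_size)
        memcpy(buffer + pos, pattern, pattern_size);
    return 0;
}
```

Checking alignment and bounds before writing anything mirrors the all-or-nothing behaviour of the API: an invalid call leaves the buffer untouched.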
Other OpenCL software (continued)
● BOINC, CP2K, Theano
– Users have told me “I would try it if it worked”
● clpeak, opencl-stream, SNU NPB
– Benchmarks
● Any app or lib you care about?
Acknowledgments
● Matt Arsenault, AMD
● Jan Vesely, Aaron Watry and Serge Martin, Mesa contributors
● Francisco Jerez, Intel
● Peter Eastman, OpenMM
● Tom Stellard, Red Hat
● Freenode channel #radeon