LLVM AMDGPU for High Performance Computing: are we competitive yet?


  1. LLVM AMDGPU for High Performance Computing: are we competitive yet? Vedran Miletić, HITS gGmbH Szilárd Páll, KTH Frauke Gräter, HITS gGmbH

  2. Layers of GPU computing ● Applications: GPU accelerated apps and libraries ● APIs: CUDA, OpenCL ● Compilers: NVIDIA proprietary compiler, AMD proprietary compiler, Clang ● Drivers: NVIDIA proprietary driver, AMD proprietary driver; libclc, LLVM, Mesa, and amdgpu/nouveau ● The diagram labels part of this stack as “Our work”

  3. State of the art: CUDA and OpenCL ● CUDA – 338 applications listed at NVIDIA’s website – Over 50% market share in Top 500 (Nov 2016) ● OpenCL – ~70 applications listed on Wikipedia (~30 in Scientific computing category; couple of benchmarks and toys)

  4. OpenCL applications ● Image taken from: Ribeiro, João V., et al. "QwikMD—Integrative Molecular Dynamics Toolkit for Novices and Experts." Scientific Reports 6 (2016). ● Focus on GROMACS, LAMMPS, OpenMM, ASL

  5. Running open source OpenCL stack on Radeon/FirePro/FireStream ● AMD’s proprietary OpenCL driver and compiler – GPUs released 2012 or later – Will be open sourced soon™ ● Mesa/LLVM – AMD GPUs released 2009 or later – Open source from the beginning™

  6. Our work ● No changes or minor changes in apps/libs ● Improvements to LLVM, Clang, libclc, Mesa – Missing math functions, OpenCL 1.2 API calls – Bug fixes

  7. GROMACS OpenCL kernel execution time ● [Chart: time versus system size, from 1.5 to 3072, for AMDGPU-PRO and AMDGPU]

  8. LAMMPS example execution time and OpenMM test execution time ● [Chart: LAMMPS time for example melt_imd-gpu, AMDGPU-PRO versus AMDGPU] ● [Chart: OpenMM time for tests AndersenThermostat, BrownianIntegrator, LangevinIntegrator, VerletIntegrator, AMDGPU-PRO versus AMDGPU]

  9. ASL test execution time ● [Chart: time for tests testKernel, testKernelMerger, testPrivateVar, AMDGPU-PRO versus AMDGPU]

  10. Other OpenCL software ● Blender – Different users report performance issues and crashes ● BEAGLE, phylogenetics library – Made some progress ● clBLAS and clFFT – Implemented clEnqueueFillBuffer, requires more work – Required for Octopus (quantum chem), probably others
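
For readers unfamiliar with clEnqueueFillBuffer: it is an OpenCL 1.2 call that fills a region of a buffer with a repeating byte pattern. The following is a rough host-side model of that behaviour in plain Python, not the Mesa implementation and not OpenCL code. The helper name `fill_buffer` is hypothetical, and the model covers only the pattern-size and alignment rules as I understand them from the OpenCL 1.2 specification (pattern size is a power of two up to 128 bytes; offset and size are multiples of the pattern size).

```python
# Host-side model of the fill semantics of OpenCL 1.2's clEnqueueFillBuffer.
# fill_buffer is a hypothetical helper for illustration; it is not part of
# OpenCL, Mesa, or any library mentioned in the slides.

VALID_PATTERN_SIZES = {1, 2, 4, 8, 16, 32, 64, 128}


def fill_buffer(buffer: bytearray, pattern: bytes, offset: int, size: int) -> None:
    """Fill buffer[offset:offset + size] with repeated copies of pattern."""
    pattern_size = len(pattern)
    if pattern_size not in VALID_PATTERN_SIZES:
        raise ValueError("pattern size must be 1, 2, 4, 8, 16, 32, 64 or 128 bytes")
    if offset < 0 or size < 0 or offset + size > len(buffer):
        raise ValueError("fill region lies outside the buffer")
    if offset % pattern_size != 0 or size % pattern_size != 0:
        raise ValueError("offset and size must be multiples of the pattern size")
    # Same-length slice assignment: the buffer keeps its size.
    buffer[offset:offset + size] = pattern * (size // pattern_size)


# Usage: write a 4-byte pattern twice into the middle of a 16-byte buffer.
buf = bytearray(16)
fill_buffer(buf, b"\x01\x02\x03\x04", offset=4, size=8)
print(buf.hex())
```

A real driver would do this fill on the device, typically with a small kernel or a DMA operation; the model only illustrates the observable result and the argument checks an application can expect.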

  11. Other OpenCL software ● BOINC, CP2K, Theano – Had users tell me “I would try it if it worked” ● clpeak, opencl-stream, SNU NPB – Benchmarks ● App or lib you care about?

  12. Acknowledgments ● Matt Arsenault, AMD ● Jan Vesely, Aaron Watry and Serge Martin, Mesa contributors ● Francisco Jerez, Intel ● Peter Eastman, OpenMM ● Tom Stellard, Red Hat ● Freenode channel #radeon
