GPGPU Computing with OpenCL

Matthias Vogelgesang (IPE), Daniel Hilk (IEKP). KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association.


  1. GPGPU Computing with OpenCL
     - Matthias Vogelgesang (IPE), Daniel Hilk (IEKP)
     - Institute for Data Processing and Electronics, Institut für Experimentelle Kernphysik
     - KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
     - Oct. 18th 2013, www.kit.edu

  2. Motivation
     - Despite Moore's law, CPUs hit a performance wall.
     - More data is generated, so more data has to be processed and analyzed.
     - GPU architectures can give a higher throughput and better performance.

  3. GPU advantages
     - Why are GPUs good at what they do?
       - GPUs are heavily optimized towards pixelation of 3D data
       - GPUs have flexible, programmable pipelines
       - Architecture consists of many but rather simple compute cores
       - Instruction set is tailored towards math and image operations
     - Some numbers of NVIDIA's GTX Titan flagship
       - 4500 (SP) / 1500 (DP) GFLOPS (equivalent of a supercomputer in 2000)
       - 6 GB at 288.4 GB/s
       - 250 W power consumption

  4. Limitations
     - There are no silver bullets
       - Optimal performance with regular, parallel tasks
       - High operations-per-memory-access ratios are needed¹
       - Bus can become a bottleneck²
       - Limited main memory, thus partitioning might be necessary
     - Think about your algorithm first
       - Cliché quote: "premature optimization is the root of all evil"
       - O(c^n) is slow, no matter where you run it
     - ¹ 4500 GFLOPS / 288.4 GB/s ≈ 16 FLOP/B
     - ² 4500 GFLOPS / 16 GB/s (PCIe 3.0 x16) ≈ 280 FLOP/B
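The two footnote ratios can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the function name `flop_per_byte` is made up for illustration, and the input numbers are the ones quoted on the slides.

```c
#include <assert.h>

/* Ratio of peak compute throughput to peak data rate, in FLOP per byte.
 * Both arguments carry the same 10^9 prefix (GFLOPS and GB/s), so the
 * prefix cancels. The name is illustrative, not part of any API. */
double flop_per_byte(double peak_gflops, double bandwidth_gb_per_s)
{
    assert(bandwidth_gb_per_s > 0.0);
    return peak_gflops / bandwidth_gb_per_s;
}
```

Calling `flop_per_byte(4500.0, 288.4)` gives the device-memory ratio and `flop_per_byte(4500.0, 16.0)` the PCIe ratio. Read this way, a kernel that performs fewer operations per transferred byte than these ratios is likely limited by data movement rather than by arithmetic.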

  5. History and Background
     - Development of GPGPU abstractions
       - Early research prototypes (e.g. Brook) used OpenGL shaders
       - NVIDIA presented CUDA in 2007
       - OpenCL was initiated by Apple and first released in 2008/09
       - High-level pragmas in OpenACC à la OpenMP since 2012
     - Why OpenCL?
       - Open, vendor-neutral standard
       - Cross-platform support (Linux, Windows, Mac)
       - Multiple hardware platforms (CPUs, GPUs, FPGAs)

  6. OpenCL concepts

  8. Programming model
     - Platform
       - The host manages resources and schedules execution
       - The devices execute code assigned to them by the host
     - Devices
       - A host controls ≥ 1 platforms (e.g. vendor SDKs)
       - A platform consists of ≥ 1 devices
       - A device has ≥ 1 compute units (CUs)
       - Each CU has ≥ 1 processing elements (PEs)
       - How CUs and PEs are mapped to hardware is not specified
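One way to picture the hierarchy is as nested records. The sketch below uses plain C structs whose names (`compute_unit`, `device`, `platform`, `platform_total_pes`) are hypothetical and are not OpenCL types. In a real program the corresponding information would be queried through the OpenCL host API, for example with `clGetPlatformIDs`, `clGetDeviceIDs` and `clGetDeviceInfo`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical records mirroring the hierarchy; not the OpenCL API. */
typedef struct {
    size_t num_processing_elements;   /* each CU has >= 1 PEs */
} compute_unit;

typedef struct {
    const compute_unit *units;
    size_t num_units;                 /* a device has >= 1 CUs */
} device;

typedef struct {
    const device *devices;
    size_t num_devices;               /* a platform consists of >= 1 devices */
} platform;

/* Sum the processing elements over all devices of one platform. */
size_t platform_total_pes(const platform *p)
{
    size_t total = 0;
    assert(p != NULL && p->num_devices >= 1);
    for (size_t d = 0; d < p->num_devices; d++) {
        const device *dev = &p->devices[d];
        assert(dev->num_units >= 1);
        for (size_t u = 0; u < dev->num_units; u++) {
            assert(dev->units[u].num_processing_elements >= 1);
            total += dev->units[u].num_processing_elements;
        }
    }
    return total;
}
```

The asserts encode the "≥ 1" constraints from the slide. How these counts relate to physical cores is left open, as the slide notes.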

  9. Execution model
     - Work is arranged as work items on a 1D, 2D or 3D grid
     - Grid is split into work groups
     - Work groups are scheduled on one or more CUs
     - Work items are executed on PEs
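For a 1D grid, the split into work groups amounts to integer division. The following sketch assumes that the global size is a multiple of the work-group size; the type and function names are illustrative only. Inside an OpenCL kernel the same values are available from `get_global_id`, `get_group_id` and `get_local_id`.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work item, expressed relative to its work group. */
typedef struct {
    size_t group_id;   /* index of the work group containing the item */
    size_t local_id;   /* index of the item inside that work group */
} item_location;

/* Split a 1D global index into work-group index and local index. */
item_location locate_item(size_t global_id, size_t local_size)
{
    item_location loc;
    assert(local_size > 0);
    loc.group_id = global_id / local_size;
    loc.local_id = global_id % local_size;
    return loc;
}

/* Number of work groups in a 1D grid.
 * Assumption: global_size is evenly divisible by local_size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

With a work-group size of 4, for instance, the work item with global index 10 is item 2 of work group 2. The 2D and 3D cases apply the same split per dimension.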

  13. Kernel
      - A kernel is a piece of code executed by each work item.
      - In most cases it corresponds to the innermost body of a for loop, e.g. from
            for (int i = 1; i < N-1; i++)
                x[i] = sin(y[i]) + 0.5 * (x[i-1] + x[i+1]);
        you would extract the kernel
            x[i] = sin(y[i]) + 0.5 * (x[i-1] + x[i+1]);
      - A kernel has implicit parameters to identify itself:
        - Number of work groups/items
        - Location relative to the work group
        - Location relative to the global grid

  14. Memory model
      - Memory, buffers and images
        - Host cannot access device memory directly and vice versa
        - Buffers to transfer data between host and device memory
        - Images are structured buffers
      - Device memory
        - Global: host-accessible, read/write-able by all work items
        - Constant: host-accessible, read-only by all work items
        - Local: local to a work group
        - Private: local to a work item
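The four device memory regions can be illustrated with a plain C emulation. Everything here is hypothetical: the empty annotation macros only label which region a pointer would live in (OpenCL C uses the qualifiers `__global`, `__constant`, `__local` and `__private` for this), and `scale_item` and `run_group` are invented names. The sketch shows the visibility rules only; it says nothing about performance.

```c
#include <assert.h>
#include <stddef.h>

/* Empty annotation macros, used purely as labels in this plain C sketch. */
#define GLOBAL_MEM     /* read/write-able by all work items */
#define CONSTANT_MEM   /* read-only for all work items */
#define LOCAL_MEM      /* shared by the work items of one work group */

enum { MAX_LOCAL_SIZE = 64 };   /* arbitrary limit chosen for this sketch */

/* Work of a single work item: scale one input element. */
void scale_item(size_t gid, size_t lid,
                GLOBAL_MEM const float *in,
                CONSTANT_MEM const float *coeff,
                LOCAL_MEM float *scratch,
                GLOBAL_MEM float *out)
{
    float value = in[gid];            /* private: seen by this item only */
    scratch[lid] = value * coeff[0];  /* local: visible within the group */
    out[gid] = scratch[lid];          /* global: visible to all items */
}

/* Emulate one work group; the scratch array exists once per group. */
void run_group(size_t group_id, size_t local_size,
               const float *in, const float *coeff, float *out)
{
    float scratch[MAX_LOCAL_SIZE];
    assert(local_size >= 1 && local_size <= MAX_LOCAL_SIZE);
    for (size_t lid = 0; lid < local_size; lid++)
        scale_item(group_id * local_size + lid, lid, in, coeff, scratch, out);
}
```

In a real OpenCL program the `in`, `coeff` and `out` arrays would be buffers created on the host and copied to the device, since the host cannot dereference device memory directly.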

  19. OpenCL API

  20. Implementations

      | Vendor | Rev. | GPU | CPU | FPGA |
      |--------|------|-----|-----|------|
      | NVIDIA | 1.1  | ✓   | ✗   | ✗    |
      | AMD    | 1.2  | ✓   | ✓   | ✗    |
      | Intel  | 1.2  | ✓   | ✓   | ✗    |
      | Apple  | 1.1¹ | ✓   | ✓   | ✗    |
      | Altera | 1.0  | ✗   | ✗   | ✓    |

      ¹ OpenCL 1.2 from OS X 10.9
