
SymEngine: Symbolic Execution of OpenCL Kernels, Alberto Magni - PowerPoint PPT Presentation



  1. SymEngine: Symbolic Execution of OpenCL Kernels. Alberto Magni

  2. Optimize code for GPUs: Optimize Memory Accesses

  3. GPU Memory Transactions: Coalesced Access. Diagram labels: GPU Core, 32 Threads; 1 Load Request = 4 Bytes per Thread; 128 Bytes; L1 Cache, 1 Cache Line; GPU Memory

  4. GPU Memory Transactions: Uncoalesced Access. Diagram labels: GPU Core, 32 Threads; 1 Load Request = 4 Bytes per Thread; 512 Bytes; L1 Cache, 4 Cache Lines; GPU Memory

  5. GPU Memory Transactions: Uncoalesced Access. Diagram labels: GPU Core, 32 Threads; 1 Load Request = 4 Bytes per Thread; 512 Bytes; L1 Cache, 4 Cache Lines; GPU Memory; Wasted Bandwidth

  6. SymEngine: Statically Detect Suboptimal Accesses to Memory

  7. SymEngine: Statically Detect Suboptimal Accesses to Memory. OpenCL Kernel, with two callouts on the slide: "Resolve Address" and "Compute Number of Transactions".

         int threadId = get_global_id(0);

         /* Per-thread loads indexed by the thread id */
         sX = x[threadId];
         sY = y[threadId];
         sZ = z[threadId];
         sQr = Qr[threadId];
         sQi = Qi[threadId];

         for (int kIndex = 0; (kIndex < KERNEL_ELEMS_PER_GRID);
              kIndex++, kGlobalIndex++) {
           float expArg = PIx2 * (ck[kIndex].Kx * sX +
                                  ck[kIndex].Ky * sY +
                                  ck[kIndex].Kz * sZ);
           sQr += ck[kIndex].PhiMag * cos(expArg);
           sQi += ck[kIndex].PhiMag * sin(expArg);
         }

         /* Per-thread stores indexed by the thread id */
         Qr[threadId] = sQr;
         Qi[threadId] = sQi;

  8. Symbolic Execution. Diagram labels: OpenCL Code, Warp-Id, Hardware, SymEngine, Memory, Number of Transactions, Threads, Input Values

  9. Symbolic Execution. Threads in a Warp: 0 1 2 3 4 … 29 30 31. For each thread: Memory Instruction, SCEV, Address

  10. Symbolic Execution. Threads in a Warp: 0 1 2 3 4 … 29 30 31. For each thread: Memory Instruction, SCEV, Address. Across the warp: Number of cache lines touched, Transaction Number

  11. Validation – Nvidia GTX480. Against Hardware Performance counters. Chart: Total HW Transactions for Black-Scholes. Labels: HW Counter, Program Versions

  12. Validation – Nvidia GTX480. Against Hardware Performance counters. Chart: Total HW Transactions for Black-Scholes. Labels: HW Counter, Prediction, Program Versions

  13. Validation – Nvidia GTX480

  14. Validation – Nvidia GTX480. 0.99 correlation with HW counters

  15. It's on GitHub! http://github.com/HariSeldon/SymEngine
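The transaction counting described in the slides (evaluate each thread's address, then count the cache lines a warp touches) can be illustrated with a short Python sketch. This is not SymEngine's implementation: the function names, the `affine_address` helper, and the `x[4 * threadId]` access pattern are hypothetical and chosen only to reproduce the slide figures of 32 threads, 4 bytes per thread, and 128-byte cache lines.

```python
WARP_SIZE = 32          # threads per warp (from the slides)
CACHE_LINE_BYTES = 128  # cache line size (from the slides)
ELEMENT_BYTES = 4       # one load request = 4 bytes per thread


def affine_address(base, stride_elems):
    """Hypothetical stand-in for a per-thread address expression:
    byte address = base + stride_elems * tid * ELEMENT_BYTES."""
    return lambda tid: base + stride_elems * tid * ELEMENT_BYTES


def cache_lines_touched(address_of, warp_id=0):
    """Count the distinct cache lines one warp touches for a single load."""
    first_tid = warp_id * WARP_SIZE
    lines = {
        address_of(tid) // CACHE_LINE_BYTES
        for tid in range(first_tid, first_tid + WARP_SIZE)
    }
    return len(lines)


def useful_fraction(address_of, warp_id=0):
    """Bytes requested by the warp divided by bytes fetched in whole lines."""
    requested = WARP_SIZE * ELEMENT_BYTES
    fetched = cache_lines_touched(address_of, warp_id) * CACHE_LINE_BYTES
    return requested / fetched


coalesced = affine_address(base=0, stride_elems=1)    # like x[threadId]
uncoalesced = affine_address(base=0, stride_elems=4)  # like x[4 * threadId] (assumed pattern)

print(cache_lines_touched(coalesced), useful_fraction(coalesced))
print(cache_lines_touched(uncoalesced), useful_fraction(uncoalesced))
```

With these assumptions the unit-stride load stays within one 128-byte line, while the stride-4 load spans 512 bytes across four lines and uses only a quarter of the fetched data, which corresponds to the wasted bandwidth shown on the uncoalesced slide.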
