SymEngine: Symbolic Execution of OpenCL Kernels
Alberto Magni
Optimize code for GPUs
Optimize Memory Accesses
GPU Memory Transactions: Coalesced Access
[Diagram: a GPU core runs 32 threads, each issuing 1 load request of 4 bytes. The 128 bytes requested are served through the L1 cache from GPU memory as 1 cache line.]
GPU Memory Transactions: Uncoalesced Access
[Diagram: a GPU core runs 32 threads, each issuing 1 load request of 4 bytes. The requests touch 4 cache lines, so 512 bytes travel through the L1 cache from GPU memory. The bytes that no thread requested are wasted bandwidth.]
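The two access diagrams can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the 128-byte line, 32-thread warp, and 4-byte load come from the slides, while the stride of 4 elements is an assumption chosen because it yields the 512 bytes and 4 cache lines shown for the uncoalesced case. The helper names are hypothetical and are not part of SymEngine.

```python
LINE_SIZE = 128  # bytes per cache line, as on the slides
WARP_SIZE = 32   # threads per warp
ELEM_SIZE = 4    # bytes loaded per thread


def warp_addresses(base, stride_elems):
    """Byte address loaded by each thread when thread t reads element t * stride_elems."""
    return [base + t * stride_elems * ELEM_SIZE for t in range(WARP_SIZE)]


def cache_lines_touched(addresses, line_size=LINE_SIZE):
    """Number of distinct cache lines covered by the given byte addresses."""
    return len({addr // line_size for addr in addresses})


def bandwidth_efficiency(addresses):
    """Fraction of the transferred bytes that the warp actually requested."""
    useful = len(set(addresses)) * ELEM_SIZE
    transferred = cache_lines_touched(addresses) * LINE_SIZE
    return useful / transferred


# Stride 1: consecutive threads read consecutive floats (coalesced).
coalesced = warp_addresses(base=0, stride_elems=1)
# Stride 4 (assumed): threads read floats 16 bytes apart (uncoalesced).
uncoalesced = warp_addresses(base=0, stride_elems=4)

print(cache_lines_touched(coalesced), bandwidth_efficiency(coalesced))
print(cache_lines_touched(uncoalesced), bandwidth_efficiency(uncoalesced))
```

Under these assumptions the strided warp requests the same 128 useful bytes as the coalesced one but pulls four full lines, which is the wasted bandwidth the slide highlights.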
SymEngine: Statically Detect Suboptimal Accesses to Memory
Steps shown on the kernel: Resolve Address, then Compute Number of Transactions.
OpenCL Kernel (excerpt):

    int threadID = get_global_id(0);

    // Per-thread loads: each thread reads its own element.
    sX = x[threadID];
    sY = y[threadID];
    sZ = z[threadID];
    sQr = Qr[threadID];
    sQi = Qi[threadID];

    for (int kIndex = 0; kIndex < KERNEL_ELEMS_PER_GRID; kIndex++, kGlobalIndex++) {
      // ck[kIndex] does not depend on threadID.
      float expArg = PIx2 * (ck[kIndex].Kx * sX +
                             ck[kIndex].Ky * sY +
                             ck[kIndex].Kz * sZ);
      sQr += ck[kIndex].PhiMag * cos(expArg);
      sQi += ck[kIndex].PhiMag * sin(expArg);
    }

    // Per-thread stores.
    Qr[threadID] = sQr;
    Qi[threadID] = sQi;
Symbolic Execution
[Diagram: SymEngine takes the OpenCL code, the warp ID, the hardware description (memory, threads), and the input values, and produces the number of transactions.]
Symbolic Execution
Threads in a warp: 0, 1, 2, 3, 4, …, 29, 30, 31
For each thread: Memory Instruction → SCEV → Address
Number of cache lines touched → Transaction number
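The per-thread pipeline (memory instruction, SCEV, address, cache lines touched) can be sketched as follows. This is a simplified stand-in, not SymEngine's code and not LLVM's ScalarEvolution API: it models an address as an affine function of the thread ID and the loop index, evaluates it for every thread of one warp, and counts distinct cache lines. The `AffineAddress` class, the base addresses, and the 16-byte size of a `ck` element (four floats: Kx, Ky, Kz, PhiMag) are all assumptions made for illustration.

```python
from dataclasses import dataclass

LINE_SIZE = 128  # bytes per cache line
WARP_SIZE = 32   # threads per warp


@dataclass(frozen=True)
class AffineAddress:
    """Byte address modelled as base + tid_stride * threadID + k_stride * kIndex."""
    base: int
    tid_stride: int
    k_stride: int = 0

    def evaluate(self, thread_id, k_index=0):
        return self.base + self.tid_stride * thread_id + self.k_stride * k_index


def transactions_for_warp(expr, warp_id, k_index=0):
    """Evaluate the address for each thread of the warp and count cache lines."""
    first = warp_id * WARP_SIZE
    lines = {
        expr.evaluate(tid, k_index) // LINE_SIZE
        for tid in range(first, first + WARP_SIZE)
    }
    return len(lines)


# x[threadID]: one float per thread, line-aligned base (hypothetical address).
x_load = AffineAddress(base=0x1000, tid_stride=4)
# Same access with a base that is not aligned to a cache line (hypothetical).
x_misaligned = AffineAddress(base=0x1040, tid_stride=4)
# ck[kIndex].Kx: independent of threadID, assumed 16 bytes per element.
ck_load = AffineAddress(base=0x8000, tid_stride=0, k_stride=16)

print(transactions_for_warp(x_load, warp_id=0))
print(transactions_for_warp(x_misaligned, warp_id=0))
print(transactions_for_warp(ck_load, warp_id=0, k_index=5))
```

In this model the aligned per-thread load fits in one line, the misaligned one straddles two, and the thread-independent `ck` load hits a single line because every thread resolves to the same address.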
Validation – Nvidia GTX480
Against hardware performance counters
[Figure: Total HW Transactions for Black-Scholes. HW counter vs. prediction across program versions.]
0.99 correlation with HW counters
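A correlation between predicted and measured transaction counts can be computed as below. The slide does not say which coefficient was used, so this sketch assumes Pearson's. The two lists are placeholder values made up for illustration and are not measurements from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder data: one entry per program version (hypothetical values).
hw_counter = [100, 200, 400, 800]
predicted = [110, 190, 420, 790]

print(pearson(hw_counter, predicted))
```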
It's on GitHub!
http://github.com/HariSeldon/SymEngine