OPENACC: ACCELERATE KIRCHHOFF 2D MIGRATION
Ken Hester, NVIDIA Solution Architect | Oil & Gas
April 4-7, 2016 | Silicon Valley


  1. April 4-7, 2016 | Silicon Valley
     OPENACC: ACCELERATE KIRCHHOFF 2D MIGRATION
     Ken Hester: NVIDIA Solution Architect | Oil & Gas

  2. EXPLORATION & PRODUCTION WORKFLOW
     Acquire Seismic Data → Process Seismic Data → Interpret Seismic Data → Characterize Reservoirs → Simulate Reservoirs → Drill Wells
     Images courtesy Schlumberger


  4. HOW DO YOU PORT TO GPUS?
     Assess → Parallelize → Optimize → Deploy

  5. 3 WAYS TO ACCELERATE APPLICATIONS
     Libraries: "drop-in" acceleration
     OpenACC Directives: easily accelerate applications
     Programming Languages: maximum flexibility
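
     As a minimal illustration of the directives approach (this snippet is not from the deck; the routine and data are invented), a single pragma is enough to offload a loop, and the same source still builds and runs serially when the directive is ignored:

        /* saxpy-style loop: the directive asks the compiler to generate a
           GPU kernel for the loop; 'restrict' tells it the arrays do not
           overlap, so the iterations are safe to run in parallel. */
        void saxpy(int n, float a, const float * restrict x, float * restrict y)
        {
            #pragma acc parallel loop
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }

     With the PGI compilers used later in this deck, this would be built with something like pgcc -acc -ta=tesla saxpy.c; without -acc the pragma is simply ignored.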

  6. KIRCHHOFF 2D CASE STUDY
     Center for Wave Phenomena: download Seismic Unix
       ftp://ftp.cwp.mines.edu/pub/cwpcodes/cwp_su_all_43R8.tgz
     Set environment variables: CWPROOT, PATH
     Unpack, edit Makefile.config, build
     Use PGI compilers (CC=pgcc, FC=pgfortran); OPTC=-g, FFLAGS=$(FOPTS)

  7. KIRCHHOFF 2D CASE STUDY
     Seismic Unix (SU) datasets
     Download the Marmousi data, velocity, and density files:
       http://www.trip.caam.rice.edu/downloads/ieee.tar.gz
     Convert SEGY format to SU format:

       #!/bin/bash
       segyread tape=data.segy conv=0 endian=0 > data.su
       segyread tape=velocity.segy conv=0 endian=0 > velocity.su
       suflip flip=0 < velocity.su > velocity1.su
       sustrip < velocity1.su > velocity.h@ ftn=0
       suwind < data.su > data1.su tmax=2.9

  8. KIRCHHOFF 2D CASE STUDY
     Smooth, build the ray-trace model, migrate:

       #!/bin/bash
       # example parameter values from the slide
       nz=751 nx=2301 dz=4 dx=4 nt=750 ntr=96 dt=4000
       ifile=data1.su ofile=datamig.su tfile=tfile vfile=velocity.h@
       # smoothing
       time smooth2 < $vfile n1=$nz n2=$nx r1=20 r2=20 > smoothvel
       # raytrace
       time rayt2d < smoothvel dt=0.004 nt=751 dz=$dz nz=$nz dx=$dx nx=$nx \
           fxo=0 dxo=25 nxo=369 fxs=0 dxs=100 nxs=93 > $tfile
       # migrate
       sukdmig2d infile=$ifile datain=$ifile outfile=$ofile dataout=$ofile \
           ttfile=$tfile fzt=0 dzt=4 nzt=751 fxt=0 dxt=25 nxt=369 fs=0 ns=93 \
           ds=100 nzo=751 dzo=4 dxm=25 mtr=1

  9. Assess: KIRCHHOFF 2D CASE STUDY
     Profile using the PGI tools:
       pgcollect sukdmig2d
       pgprof -exe sukdmig2d

     Function   Percent of runtime
     mig2d      77%
     sum2       9%
     resit      <1%

  10. Parallelize: KIRCHHOFF 2D CASE STUDY
      Use managed memory: the compiler handles data movement
      Parallel directives: parallelize the outer for loop; the compiler parallelizes the inner loop
      restrict on pointers! (www.wikipedia.org/wiki/Restrict)

        void sum2(int nx, int nz, float a1, float a2,
                  float ** restrict t1, float ** restrict t2,
                  float ** restrict t)
        {
            int ix, iz;
            #pragma acc parallel loop
            for (ix = 0; ix < nx; ++ix) {
                for (iz = 0; iz < nz; ++iz)
                    t[ix][iz] = a1*t1[ix][iz] + a2*t2[ix][iz];
            }
        }

  11. Parallelize: KIRCHHOFF 2D CASE STUDY
      Parallel directives: parallelize the for loops
      Vectorize: the compiler vectorizes the inner loops

        mig2d:
        #pragma acc parallel loop
        for (ix = nxtf; ix <= nxte; ++ix) {
            . . .
            #pragma acc loop
            for (iz = izt0; iz < nzt; ++iz) {
                . . .
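
      The deck elides the body of mig2d; purely as a sketch of the two-level pattern this slide describes (outer loop across gangs, inner loop across vector lanes), with invented array names and a trivial body:

        /* The outer loop is spread across gangs (thread blocks) and the
           inner loop across vector lanes (threads within a block),
           mirroring the gang/vector mapping the compiler reports on the
           next slide. */
        void scale_grid(int nx, int nz, float a,
                        float * restrict out, const float * restrict in)
        {
            #pragma acc parallel loop gang
            for (int ix = 0; ix < nx; ++ix) {
                #pragma acc loop vector
                for (int iz = 0; iz < nz; ++iz)
                    out[ix * nz + iz] = a * in[ix * nz + iz];
            }
        }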

  12. Parallelize: KIRCHHOFF 2D CASE STUDY
      Compile: pgcc -acc -ta=tesla:managed
      Resolve errors!
      Parallel directives: parallelize the outer loop, parallelize the inner loops,
      and resolve the loop-carried dependence by adding an acc loop directive

      resit (managed), compiler feedback:
        537, Accelerator kernel generated
             Generating Tesla code
        538, #pragma acc loop gang /* blockIdx.x */
        553, #pragma acc loop vector(128) /* threadIdx.x */
        540, Loop carried dependence of t->-> prevents parallelization
             Loop carried backward dependence of t->-> prevents vectorization

      resit:
        #pragma acc parallel loop
        for (ix = 0; ix < nx; ++ix)
        {
            #pragma acc loop
            for (is = 0; is < ns; ++is)
            {
                . . .
                #pragma acc loop
                for (iz = 0; iz < nz; ++iz)
                    t[ix][iz] -= sr0*tb[jr][iz] + sr*tb[jr+1][iz];
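
      The dependence message above typically means the compiler cannot prove that different iterations touch disjoint data, which is common when writes go through doubly indirected pointers such as t. A small illustrative sketch (the names are not the deck's): inside an acc parallel construct, an explicit acc loop directive on the inner loop asserts that its iterations are independent, which is the same fix applied to resit above:

        /* 'rows' is an array of row pointers; the compiler cannot prove the
           rows never alias one another, so without the inner directive it
           may report a loop-carried dependence and refuse to parallelize.
           Inside a parallel construct, the loop directive asserts that the
           iterations are independent. */
        void scale_rows(int nx, int nz, float a, float ** restrict rows)
        {
            #pragma acc parallel loop
            for (int ix = 0; ix < nx; ++ix) {
                #pragma acc loop
                for (int iz = 0; iz < nz; ++iz)
                    rows[ix][iz] *= a;
            }
        }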

  13. Parallelize: KIRCHHOFF 2D CASE STUDY
      SUKDMIG2D performance

      Configuration                                 Model Size   Cores   Elapsed Time (s)   Speedup
      CPU only (baseline): 2x E5-2698 v3 2.30GHz    2301 x 751   1       218                1.00
      NVIDIA OpenACC (managed): 1x K40              2301 x 751   2880    46                 4.70

      Now optimize using the verbose output from the compiler!

  14. Optimize: KIRCHHOFF 2D CASE STUDY
      Compile: pgcc -acc -ta=tesla:managed
      Profile: nvprof <managed binary>

      ==55246== Profiling result:
      Time(%)  Time       Calls    Avg        Min        Max        Name
      42.82%   4.03645s   23040    175.19us   121.12us   196.38us   mig2d_787_gpu
      28.79%   2.71389s   23040    117.79us   80.800us   135.68us   mig2d_726_gpu
      27.35%   2.57762s   69120    37.291us   33.248us   42.240us   sum2_571_gpu
      1.00%    93.936ms   23040    4.0770us   3.2000us   12.992us   [CUDA memcpy HtoD]
      0.04%    3.4627ms   1        3.4627ms   3.4627ms   3.4627ms   resit_537_gpu
      0.00%    126.14us   1        126.14us   126.14us   126.14us   timeb_592_gpu

      ==55246== API calls:
      Time(%)  Time       Calls    Avg        Min        Max        Name
      30.16%   11.5982s   230423   50.334us   118ns      3.9101ms   cuMemFree
      29.21%   11.2327s   230429   48.746us   10.132us   12.821ms   cuMemAllocManaged
      27.15%   10.4430s   253444   41.204us   1.0420us   3.4680ms   cuStreamSynchronize
      10.42%   4.00751s   115202   34.786us   5.4290us   99.805ms   cuLaunchKernel
      1.13%    433.50ms   1428513  303ns      141ns      429.42us   cuPointerGetAttributes
      0.81%    310.55ms   1        310.55ms   310.55ms   310.55ms   cuDevicePrimary...
      0.71%    273.10ms   23040    11.853us   7.3210us   409.13us   cuMemcpyHtoDAsync
      0.33%    125.36ms   1        125.36ms   125.36ms   125.36ms   cuDevicePrimary...
      0.06%    24.165ms   1        24.165ms   24.165ms   24.165ms   cuMemHostAlloc
      0.02%    9.5668ms   1        9.5668ms   9.5668ms   9.5668ms   cuMemFreeHost
      0.00%    534.34us   1        534.34us   534.34us   534.34us   cuMemAllocHost
      0.00%    461.71us   1        461.71us   461.71us   461.71us   cuModuleLoad
      0.00%    363.83us   2        181.91us   180.02us   183.81us   cuMemAlloc

  15. Optimize: KIRCHHOFF 2D CASE STUDY
      Managed compile: the verbose output guided enhancements and targeted changes
      Common optimizations:
        Data movement: copy, copyin, copyout; create, delete; update
        Loop collapse

      main:
        453, Generating update host(mig[:noff][:nxo][:nzo])
        455, Generating update host(mig1[:noff][:1][:1])
        459, Generating update host(mig1[:noff][:nxo][:nzo])
      resit:
        539, Generating copyin(ttab[:ns],tb[:][:nz])
      sum2:
        571, Generating copyin(t2[:nx][:nz],t1[:nx][:nz])
             Generating copyout(t[:nx][:nz])
      mig2d:
        721, Generating copy(ampt1[nxtf:nxte-nxtf+1][:])
             Generating copyin(cssum[nxtf:nxte-nxtf+1][:],tvsum[nxtf:nxte-nxtf+1][:])
             Generating copy(tmt[nxtf:nxte-nxtf+1][:],ampti[nxtf:nxte-nxtf+1][:])
             Generating copyin(pb[:][:])
             Generating copy(ampt[nxtf:nxte-nxtf+1][:])
             Generating copyin(cs0b[:][:],angb[:][:])
             Generating copy(zpt[nxtf:nxte-nxtf+1])
        782, Generating copy(mig1[nxf:nxe-nxf+1][:])
             Generating copyin(ampt1[:][:], tb[:][:], tsum[:][:], ampt[:][:], ...
             Generating copy(mig[nxf:nxe-nxf+1][:])
             Generating copyin(zpt[:])
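
      As a generic sketch of these clauses (not the application's code; the function and array names below are invented), a structured data region keeps arrays resident on the GPU across several kernels, create allocates device-only scratch without any transfer, collapse(2) merges a loop nest into one large iteration space, and update host copies an intermediate result back mid-region:

        /* 'in' is copied to the device once, 'out' is copied back when the
           region closes, and 'scratch' only ever lives on the device
           (create).  Everything created here is released when the region
           ends. */
        void pipeline(int nx, int nz, float * restrict in,
                      float * restrict out, float * restrict scratch)
        {
            #pragma acc data copyin(in[0:nx*nz]) copyout(out[0:nx*nz]) \
                             create(scratch[0:nx*nz])
            {
                /* collapse(2) merges the nest into one big iteration space */
                #pragma acc parallel loop collapse(2)
                for (int ix = 0; ix < nx; ++ix)
                    for (int iz = 0; iz < nz; ++iz)
                        scratch[ix * nz + iz] = 2.0f * in[ix * nz + iz];

                #pragma acc parallel loop collapse(2)
                for (int ix = 0; ix < nx; ++ix)
                    for (int iz = 0; iz < nz; ++iz)
                        out[ix * nz + iz] = scratch[ix * nz + iz] + in[ix * nz + iz];

                /* pull an intermediate snapshot back to the host mid-region */
                #pragma acc update host(out[0:nx*nz])
            }
        }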

  16. Optimize: KIRCHHOFF 2D CASE STUDY
      Data movement, sum2 (managed):
        571, Generating copyin(t2[:nx][:nz],t1[:nx][:nz])
             Generating copyout(t[:nx][:nz])
      The compiler's choice was good, but explicitly use present for data already on the GPU!
      Collapse: increase the thread count to nx*nz
      Present: the data is already on the GPU; prevent data movement

        void sum2(int nx, int nz, float a1, float a2,
                  float ** restrict t1, float ** restrict t2,
                  float ** restrict t)
        {
            int ix, iz;
            #pragma acc parallel loop collapse(2) present(t1, t2, t)
            for (ix = 0; ix < nx; ++ix) {
                for (iz = 0; iz < nz; ++iz)
                    t[ix][iz] = a1*t1[ix][iz] + a2*t2[ix][iz];
            }
        }
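
      The present clause tells the compiler the arrays are already on the device and that it must not generate copies; for that to hold, some caller has to have placed them there. A hypothetical caller-side sketch (this driver and its data region are assumptions, they are not shown in the deck):

        /* Hypothetical driver (not in the deck): the arrays enter the
           device once, so the present(t1, t2, t) clause inside sum2 finds
           them there and no per-call copies are generated. */
        void sum2(int nx, int nz, float a1, float a2,
                  float ** restrict t1, float ** restrict t2,
                  float ** restrict t);

        void run_sums(int nsteps, int nx, int nz, float a1, float a2,
                      float **t1, float **t2, float **t)
        {
            #pragma acc data copyin(t1[0:nx][0:nz], t2[0:nx][0:nz]) \
                             copyout(t[0:nx][0:nz])
            {
                for (int step = 0; step < nsteps; ++step)
                    sum2(nx, nz, a1, a2, t1, t2, t);
            }
        }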

  17. Optimize: KIRCHHOFF 2D CASE STUDY
      Data movement, resit (managed):
        539, Generating copyin(ttab[:ns],tb[:][:nz])
      Use present for data already on the GPU!
      Collapse: increase the thread count to nx*ns
      Present: the data is already on the GPU; prevent data movement

        resit:
        ...
        #pragma acc parallel loop collapse(2) present(tb, ttab)
        for (ix = 0; ix < nx; ++ix)
        {
            for (is = 0; is < ns; ++is)
            {
                ...
                #pragma acc loop
                for (iz = 0; iz < nz; ++iz)
                    t[ix][iz] -= sr0*tb[jr][iz] + sr*tb[jr+1][iz];
            }

  18. Optimize: KIRCHHOFF 2D CASE STUDY
      Data movement: the mig and mig1 arrays are large
        Move them to main: copyin at the start, mark them as present here, copyout for snapshots
      Minimize copyin and copyout
      Use create for scratch arrays: prevents copy in/out; the delete happens when leaving scope

        void mig2d(float * restrict trace, int nt, float ft, ...)
        {
            ...
            #pragma acc data \
                copyin(trace[0:nz], trf[0:nt+2*mtmax]) \
                present(mig, mig1, tb, tsum, tvsum, cssum, pb, ...) \
                create(tmt[0:nxt][0:nzt], ampt[0:nxt][0:nzt], ...)
            {
                ...
                #pragma acc parallel loop
                for (ix = nxtf; ix <= nxte; ++ix) {
                    ...
                    #pragma acc loop
                    for (iz = izt0; iz < nzt; ++iz) {
                        ...
