High Performance Asynchronous Execution of the Reverse Time Migration for the Oil & Gas Industry

  1. High Performance Asynchronous Execution of the Reverse Time Migration for the Oil & Gas Industry. NVIDIA GTC Conference, San Jose, CA, March 26-29, 2018. Issam Said (NVIDIA Oil and Gas, Paris, France) and Hatem Ltaief (Extreme Computing Research Center, KAUST, Saudi Arabia).

  2. Outline
     1. Background on Seismic Imaging
     2. Ubiquitous Matricization and Taskification for Seismic Imaging
     3. Matrices Over Runtime Systems
     4. Application to Frequency Domain
     5. Performance Results
     6. Summary and Future Work

  3. Seismic Imaging. Outline: 1 Background on Seismic Imaging; 2 Ubiquitous Matricization and Taskification for Seismic Imaging; 3 Matrices Over Runtime Systems; 4 Application to Frequency Domain; 5 Performance Results; 6 Summary and Future Work.

  4. Seismic Imaging: Energy supply and demand. 40% more energy is needed by 2035. No choice but oil, gas and coal. Sophisticated seismic methods.

  5. Seismic Imaging: Seismic methods for Oil & Gas exploration. Acquisition, Processing, Interpretation. Acquisition: shot = source activation + data collection (receivers). Seismic survey • Air-gun array • Hydrophones. Shot record.

  6. Seismic Imaging: Seismic methods for Oil & Gas exploration. Acquisition, Processing, Interpretation. Processing: noise attenuation, demultiple, interpolation, imaging. Result: subsurface image.

  7. Seismic Imaging: Seismic methods for Oil & Gas exploration. Acquisition, Processing, Interpretation. Interpretation: calculate seismic attributes • Dip • Azimuth • Coherence

  8. Seismic Imaging: Seismic methods for Oil & Gas exploration. Acquisition, Processing, Interpretation. Interpretation: calculate seismic attributes • Dip • Azimuth • Coherence (courtesy of Total)

  9. Seismic Imaging: Reverse Time Migration (RTM). The reference computer-based imaging algorithm in the industry. Repositions seismic events into their true location in the subsurface.

  10. Seismic Imaging: Reverse Time Migration (RTM). The reference computer-based imaging algorithm in the industry. Repositions seismic events into their true location in the subsurface. Sub-salt and steep dips imaging. Accurate (full two-way wave equation). Requires massive resources (compute and storage).

  11. Seismic Imaging: RTM workflow. Forward modeling (FWD).

  12. Seismic Imaging: RTM workflow. Forward modeling (FWD). Backward modeling (BWD).

  13. Seismic Imaging: RTM workflow. Forward modeling (FWD). Backward modeling (BWD). Imaging condition.

  16. Seismic Imaging: The underlying theory of the RTM algorithm.
      The RTM operator:
      $$\mathrm{Img}(x) = \int_0^H \int_0^T S_h(x, t) \ast R_h(x, T - t) \, dt \, dh$$
      The Cauchy problem:
      $$\begin{cases} \dfrac{1}{c^2} \dfrac{\partial^2 u(x, t)}{\partial t^2} - \Delta u(x, t) = s(t) & \text{in } \Omega \\ u(x, 0) = 0 \\ \dfrac{\partial u(x, 0)}{\partial t} = 0 \end{cases}$$
      Boundary condition: $u = 0$ on $\partial \Omega$.
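
The RTM operator on this slide can be read as a zero-lag cross-correlation: at each grid point, the source wavefield at time t is multiplied by the receiver wavefield at time T − t, and the products are summed over time and over shots. The sketch below is a minimal pure-Python illustration under that reading. It takes the slide's ∗ as a pointwise product, and the nested-list layout `[shot][time][x]` and the function name are assumptions made for the example, not the authors' implementation.

```python
def imaging_condition(src, rcv):
    """Discrete Img(x) = sum over shots h and steps t of S_h(x, t) * R_h(x, T - t).

    src[h][t][x] holds the source wavefield S_h and rcv[h][t][x] the receiver
    wavefield R_h, both indexed by their own time step (hypothetical layout).
    """
    n_shots = len(src)
    nt = len(src[0])
    nx = len(src[0][0])
    img = [0.0] * nx
    for h in range(n_shots):
        for t in range(nt):
            s_t = src[h][t]
            r_t = rcv[h][nt - 1 - t]  # receiver wavefield at reversed time T - t
            for x in range(nx):
                img[x] += s_t[x] * r_t[x]
    return img


# Tiny example: one shot, two time steps, three grid points.
src = [[[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]]
rcv = [[[3.0, 3.0, 3.0], [1.0, 2.0, 0.0]]]
image = imaging_condition(src, rcv)
```

The time reversal is why the forward wavefield has to be kept (or recomputed) until the backward pass reaches the matching time step, which is the source of the storage pressure mentioned on the following slides.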

  17. Seismic Imaging: Finite Difference Time Domain for RTM. Finite Difference Time Domain (8th order in space, 2nd order in time):
      $$U^{n+1}_{i,j,k} = 2 U^{n}_{i,j,k} - U^{n-1}_{i,j,k} + c^2_{i,j,k} \, \Delta t^2 \, \Delta U^{n}_{i,j,k} + c^2_{i,j,k} \, \Delta t^2 \, s^{n}_{i,j,k}$$
      Regular grids. Perfectly Matched Layers (PML) as an absorbing boundary condition. Heavy computation (hours to days of processing time). Terabytes of temporary data. Requires High Performance Computing.
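
The update on this slide is a leapfrog scheme: the next wavefield is built from the two previous ones plus the discrete Laplacian and the source, both scaled by c²Δt². The following is a minimal sketch, reduced to 1D for brevity (the slide's scheme is 3D) and without PML. The weights are the standard 8th-order central-difference coefficients for a second derivative; the function names are hypothetical.

```python
# Standard 8th-order central finite-difference weights for a second derivative.
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]


def laplacian_1d(u, dx):
    """8th-order 1D Laplacian; the 4 outermost points on each side are left at 0."""
    n = len(u)
    lap = [0.0] * n
    for i in range(4, n - 4):
        acc = COEFFS[0] * u[i]
        for r in range(1, 5):
            acc += COEFFS[r] * (u[i - r] + u[i + r])
        lap[i] = acc / (dx * dx)
    return lap


def fdtd_step(u_prev, u_curr, c, src, dt, dx):
    """U^{n+1} = 2 U^n - U^{n-1} + c^2 dt^2 (Laplacian(U^n) + s^n), point by point."""
    lap = laplacian_1d(u_curr, dx)
    return [
        2.0 * u_curr[i] - u_prev[i] + c[i] ** 2 * dt ** 2 * (lap[i] + src[i])
        for i in range(len(u_curr))
    ]


# Sanity check input: the Laplacian of x^2 is 2 everywhere.
n, dx = 12, 0.5
u_quadratic = [(i * dx) ** 2 for i in range(n)]
lap = laplacian_1d(u_quadratic, dx)
```

Each output point reads nine inputs in 1D and twenty-five in 3D.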

  18. Seismic Imaging: Frequency Domain. Translate to the frequency domain and solve the Helmholtz equation (acoustic wave equation):
      $$(-\Delta - k^2) \, u(x, w) = s(x, w), \qquad k = \frac{w}{v(x)}$$
      where w is the angular frequency, v(x) is the seismic velocity field, and u(x, w) is the time-harmonic wavefield solution to the forcing term s(x, w). With S. Zampini.
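
As a small numerical illustration of the Helmholtz equation on this slide, the sketch below checks that a 1D plane wave exp(i k x) in a constant-velocity medium satisfies the source-free equation up to discretization error. The 10 Hz frequency, 2000 m/s velocity, 1 m spacing and the function names are hypothetical values chosen for the example.

```python
import cmath
import math


def wavenumber(freq_hz, velocity):
    """k = w / v, with angular frequency w = 2 * pi * f."""
    return 2.0 * math.pi * freq_hz / velocity


def helmholtz_residual(u, k, dx):
    """Residual of (-d^2/dx^2 - k^2) u = 0 at interior points (2nd-order differences)."""
    return [
        -(u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx ** 2 - k ** 2 * u[i]
        for i in range(1, len(u) - 1)
    ]


k = wavenumber(freq_hz=10.0, velocity=2000.0)  # hypothetical frequency and velocity
dx = 1.0
u = [cmath.exp(1j * k * i * dx) for i in range(50)]
max_residual = max(abs(r) for r in helmholtz_residual(u, k, dx))
```

In a real solver the velocity varies in space, so k becomes a field and the discretized operator becomes a large sparse complex matrix, one per frequency.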

  19. Matricize and Taskify. Outline: 1 Background on Seismic Imaging; 2 Ubiquitous Matricization and Taskification for Seismic Imaging; 3 Matrices Over Runtime Systems; 4 Application to Frequency Domain; 5 Performance Results; 6 Summary and Future Work.

  20. Matricize and Taskify: Hardware Trends: Energy Matters! (John Shalf, LBNL)

      | Operation          | 2011    | 2018    |
      |--------------------|---------|---------|
      | DP FLOP            | 100 pJ  | 10 pJ   |
      | DP DRAM read       | 4800 pJ | 1920 pJ |
      | Local interconnect | 7500 pJ | 2500 pJ |
      | Cross system       | 9000 pJ | 3500 pJ |
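
The energy figures on this slide are easier to compare when expressed in units of one double-precision FLOP for the same year. The short sketch below does that arithmetic; the dictionary layout and function name are assumptions for illustration, and the numbers are copied from the slide.

```python
# Energy per operation in picojoules, copied from the slide.
ENERGY_PJ = {
    "DP FLOP": {2011: 100, 2018: 10},
    "DP DRAM read": {2011: 4800, 2018: 1920},
    "Local interconnect": {2011: 7500, 2018: 2500},
    "Cross system": {2011: 9000, 2018: 3500},
}


def cost_in_flops(operation, year):
    """Number of DP FLOPs that cost as much energy as one `operation`."""
    return ENERGY_PJ[operation][year] / ENERGY_PJ["DP FLOP"][year]


dram_ratio = {year: cost_in_flops("DP DRAM read", year) for year in (2011, 2018)}
```

With these figures a DRAM read costs 48 FLOPs in 2011 and 192 FLOPs in 2018: arithmetic became ten times cheaper while data movement improved far less.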

  21. Matricize and Taskify: Welcome DGX-2! Extremely dense, tightly connected, GPU-based system: strong scaling!

  22. Matricize and Taskify: Vendors' Message ;-) "You are either compute-bound or compute-irrelevant." P. Luszczek, ICL@UTK

  23. Matricize and Taskify: 3D Finite Difference Time Domain. Four main computational phases:
      • Stencil integration: compute-bound
      • Snapshotting: I/O-bound
      • Imaging condition: memory-bound
      • Compression: binary (e.g., gzip), truncation (e.g., brute force) or dense linear algebra (e.g., Tucker decomposition)
      The 3D stencil domain is a tensor!
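
To make the "binary (e.g., gzip)" compression option concrete, here is a minimal sketch that packs a wavefield snapshot as 32-bit floats and deflates it with Python's `zlib` module (DEFLATE is the algorithm gzip uses). The snapshot contents, the float32 choice and the function names are assumptions for illustration only.

```python
import array
import zlib


def compress_snapshot(wavefield):
    """Pack the samples as float32 and deflate the resulting bytes."""
    raw = array.array("f", wavefield).tobytes()
    return zlib.compress(raw, 6)


def decompress_snapshot(blob):
    """Inverse of compress_snapshot: inflate and unpack back to a list."""
    out = array.array("f")
    out.frombytes(zlib.decompress(blob))
    return list(out)


# Hypothetical snapshot: a narrow pulse in an otherwise quiet wavefield.
snapshot = [0.0] * 1000 + [0.5, 1.0, 0.5] + [0.0] * 1000
blob = compress_snapshot(snapshot)
restored = decompress_snapshot(blob)
```

Lossless compression like this pays off mostly while large parts of the domain are still zero. The truncation and Tucker options listed on the slide trade accuracy for a higher ratio.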

  24. Matricize and Taskify: Intertwined AI Kernels Throughout the Integration.

  25. Matrices Over Runtime Systems. Outline: 1 Background on Seismic Imaging; 2 Ubiquitous Matricization and Taskification for Seismic Imaging; 3 Matrices Over Runtime Systems; 4 Application to Frequency Domain; 5 Performance Results; 6 Summary and Future Work.

  26. Matrices Over Runtime Systems: LAPACK DPOTRF from last century. [Figure: Block Algorithms. (a) First step. (b) Second step. (c) Third step. Regions labeled PANEL, UPDATE and FINAL.]

  27. Matrices Over Runtime Systems: PLASMA/MAGMA/CHAMELEON DPOTRF from this century. [Figure: Tile Algorithms.]

  28. Matrices Over Runtime Systems: LAPACK: Blocked Algorithms. Principles:
      • Panel-Update sequence
      • Transformations are blocked/accumulated within the Panel (Level-2 BLAS)
      • Transformations applied at once on the trailing submatrix (Level-3 BLAS)
      • Parallelism hidden inside the BLAS
      • Fork-join model
      A broken model!
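
The panel-update sequence described on this slide can be sketched as a right-looking blocked Cholesky factorization. The code below is a plain-Python illustration of the structure only, not LAPACK's DPOTRF: the function names and the nested-list storage are assumptions, and real implementations delegate the inner loops to BLAS.

```python
import math


def factor_diagonal_block(a, k0, k1):
    """Unblocked Cholesky of the diagonal block a[k0:k1][k0:k1] (lower part)."""
    for j in range(k0, k1):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j)))
        for i in range(j + 1, k1):
            s = sum(a[i][p] * a[j][p] for p in range(k0, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(a, nb):
    """In-place lower Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # PANEL: factor the diagonal block, then solve for the rows below it.
        factor_diagonal_block(a, k0, k1)
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # UPDATE: apply the whole panel to the trailing submatrix at once.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))
    # Clear the strict upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a


matrix = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
factor = blocked_cholesky(matrix, nb=2)
```

Every UPDATE must finish before the next PANEL starts, which is the fork-join synchronization the slide calls a broken model.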

  29. Matrices Over Runtime Systems: Tile Data Layout Format. LAPACK: column-major format. PLASMA/CHAMELEON: tile format.
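
The difference between the two layouts on this slide is only a reordering of the same entries. The sketch below regroups a column-major matrix into contiguous square tiles; the dictionary of tiles and the assumption that the tile size divides the matrix size are simplifications for illustration, not the storage scheme of any particular library.

```python
def to_tile_layout(a_colmajor, n, nb):
    """Regroup an n x n column-major matrix into contiguous nb x nb tiles.

    Returns a dict mapping (tile_row, tile_col) to the tile's entries, each
    tile stored column-major. Assumes nb divides n (simplification).
    """
    nt = n // nb
    tiles = {}
    for tj in range(nt):
        for ti in range(nt):
            tile = []
            for j in range(tj * nb, (tj + 1) * nb):
                for i in range(ti * nb, (ti + 1) * nb):
                    tile.append(a_colmajor[i + j * n])
            tiles[(ti, tj)] = tile
    return tiles


# 4 x 4 matrix whose entry (i, j) is 10 * i + j, stored column-major.
n, nb = 4, 2
a = [10 * i + j for j in range(n) for i in range(n)]
tiles = to_tile_layout(a, n, nb)
```

In column-major storage the four entries of a 2 x 2 block sit in two separate columns. After translation each tile is one contiguous chunk that a single task can own.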

  30. Matrices Over Runtime Systems: PLASMA/CHAMELEON: Tile Algorithms.
      • Parallelism is brought to the fore
      • May require the redesign of linear algebra algorithms
      • Tile data layout translation
      • Remove unnecessary synchronization points between Panel-Update sequences
      • DAG execution where nodes represent tasks and edges define dependencies between them
      • Default dynamic runtime system environment: StarPU (but could use Quark, PaRSEC, OmpSs, OpenMP, etc.)
      • Break the bulk synchronous programming model
      PLASMA ⇒ http://icl.cs.utk.edu/plasma/
      CHAMELEON ⇒ https://gitlab.inria.fr/solverstack/chameleon.git
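
The DAG mentioned on this slide can be derived mechanically from which tiles each task reads and writes. The sketch below lists the tasks of a tile Cholesky factorization and infers the edges from those data accesses, in the spirit of what a task-based runtime does; it performs no numerical work, and the task naming and tuple layout are assumptions for illustration.

```python
def tile_cholesky_tasks(nt):
    """Tasks of a tile Cholesky on an nt x nt tile grid, as (name, reads, writes)."""
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        for m in range(k + 1, nt):
            tasks.append((f"TRSM({m},{k})", [(k, k)], [(m, k)]))
        for m in range(k + 1, nt):
            tasks.append((f"SYRK({m},{k})", [(m, k)], [(m, m)]))
            for n in range(k + 1, m):
                tasks.append((f"GEMM({m},{n},{k})", [(m, k), (n, k)], [(m, n)]))
    return tasks


def build_dag(tasks):
    """Infer edges from data accesses in submission order.

    Written tiles are treated as read-write, so each task depends on the last
    writer of every tile it touches. Write-after-read hazards are not tracked,
    which is sufficient for this task list.
    """
    last_writer = {}
    edges = set()
    for name, reads, writes in tasks:
        for tile in reads + writes:
            if tile in last_writer:
                edges.add((last_writer[tile], name))
        for tile in writes:
            last_writer[tile] = name
    return edges


tasks = tile_cholesky_tasks(3)
edges = build_dag(tasks)
```

In this 3 x 3 example the two TRSM tasks of the first step share no edge, so they can run concurrently, and POTRF(1) waits only for SYRK(1,0) rather than for the whole trailing update.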

  31. Matrices Over Runtime Systems: StarPU Runtime System 101.
      Provides:
      ⇒ Task scheduling
      ⇒ Memory management
      ⇒ Out-of-core
      Supports:
      ⇒ SMP/Multicore Processors (x86, PPC, …)
      ⇒ NVIDIA GPUs (e.g., multi-GPU)
      ⇒ Hybrid architectures
      ⇒ Shared and Distributed-memory

  32. Matrices Over Runtime Systems: StarPU Runtime System: User Productivity! Separation of concerns: task-based numerical algorithms and hardware complexity. Heterogeneous tasks' orchestration: compute, I/O, compression. Main user API:
          starpu_Insert_Task(&cl_dpotrf,
              VALUE, &uplo, sizeof(char),
              VALUE, &n, sizeof(int),
              INOUT, Ahandle(k, k),
              VALUE, &lda, sizeof(int),
              OUTPUT, &info, sizeof(int),
              CALLBACK, profiling?cl_dpotrf_callback:NULL, NULL,
              0);

  33. Matrices Over Runtime Systems: Back to RTM: The Tucker Decomposition. Generalization of SVD for tensors. (Courtesy of J. Choi, IBM TJ Watson)
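
For reference, the standard textbook form of the Tucker decomposition of a third-order tensor is given below. The symbols are not taken from the slide: a core tensor $\mathcal{G}$ of size $P \times Q \times R$ and factor matrices $A^{(1)}, A^{(2)}, A^{(3)}$ are the usual notation, introduced here as an assumption.

```latex
\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)},
\qquad
x_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R}
  g_{pqr} \, a^{(1)}_{ip} \, a^{(2)}_{jq} \, a^{(3)}_{kr}
```

Compression comes from choosing P, Q and R smaller than the tensor dimensions, so the core and the three factor matrices together hold far fewer values than the original 3D wavefield.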

  34. Matrices Over Runtime Systems: Back to RTM: Out-of-Core Algorithms to Maximize Memory and Computing Resources Occupancy.
