GPU Acceleration on the 3D Elastic RTM Method

Lin Gan, Tsinghua University, High Performance Geo-Computing Group
May 8th, 2017, GTC 2017

About Tsinghua HPGC
• High Performance Geo-Computing Group
  – Interdisciplinary research group
  – High performance, high resolution geo-science acceleration
  – The most advanced HPC platforms
    • Multi-core CPU, many-core GPU & MIC
    • Reconfigurable data flow engines
      – Maxeler DFEs, IBM OpenPower, Intel Xeon+FPGA
    • Supercomputer
      – Tianhe-1A: 7168 CPU-GPU nodes, 4.7 PFlops Rpeak
      – Tianhe-2: 16,000 CPU-3MIC nodes, 54.9 PFlops Rpeak
      – Tsinghua Explore100: 740 CPU nodes, 4 TFlops Rpeak
  – Cooperation and Sponsorship
[Figure labels: data computing, climate change, seismic modeling, high performance computing]
GPU Acceleration on Elastic RTM

About This Work
• HPGC-SEP Summer Exchange Project
  – Advisors: Dr. Haohuan Fu, Dr. Robert Clapp, and Prof. Biondo Biondi
  – Special thanks to Gustavo Alves and Ettore Biondi
• Achievements on GPU
  – 10x speedup accelerating a 2D elastic RTM code over 24 CPU cores
  – Implementation of a 3D elastic RTM kernel with adjustable interfaces
  – 27x speedup accelerating the 3D RTM kernel over 24 CPU cores

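3D Elastic RTM Stencils
• State variables (data) and the attributes (model)
  – Data
    • Particle velocities: $v_x, v_y, v_z$
    • Normal stresses: $\sigma_{xx}, \sigma_{yy}, \sigma_{zz}$
    • Shear stresses: $\sigma_{xy}, \sigma_{xz}, \sigma_{yz}$
  – Model
    • Density: $\rho = \dfrac{\text{mass}}{\Delta x \, \Delta y \, \Delta z}$, in $\mathrm{kg/m^3}$
    • Lambda: $\lambda = \rho \, \dfrac{\partial P}{\partial \rho}$, in GPa
    • Mu: $\mu = \dfrac{\text{Force}/\text{Area}}{\Delta x/\text{length}}$, in GPa
[Diagram labels: Data, Forward, Adjoint, Model]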

3D Elastic RTM Stencils
• Forward and Adjoint
[Figure: forward and adjoint operators between model and data along a time axis from t=0 to t=Nt in steps of $\Delta t$; label: Memory]

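3D Elastic RTM Stencils
• Wave Equations
\[
\begin{aligned}
\frac{\partial V_x(\mathbf{x},t)}{\partial t} &= \frac{1}{\rho(\mathbf{x})}\left[\frac{\partial \sigma_{xx}(\mathbf{x},t)}{\partial x} + \frac{\partial \sigma_{xy}(\mathbf{x},t)}{\partial y} + \frac{\partial \sigma_{xz}(\mathbf{x},t)}{\partial z} + S_x(\mathbf{x},t)\right] \\
\frac{\partial V_y(\mathbf{x},t)}{\partial t} &= \frac{1}{\rho(\mathbf{x})}\left[\frac{\partial \sigma_{xy}(\mathbf{x},t)}{\partial x} + \frac{\partial \sigma_{yy}(\mathbf{x},t)}{\partial y} + \frac{\partial \sigma_{yz}(\mathbf{x},t)}{\partial z} + S_y(\mathbf{x},t)\right] \\
\frac{\partial V_z(\mathbf{x},t)}{\partial t} &= \frac{1}{\rho(\mathbf{x})}\left[\frac{\partial \sigma_{xz}(\mathbf{x},t)}{\partial x} + \frac{\partial \sigma_{yz}(\mathbf{x},t)}{\partial y} + \frac{\partial \sigma_{zz}(\mathbf{x},t)}{\partial z} + S_z(\mathbf{x},t)\right] \\
\frac{\partial \sigma_{xx}(\mathbf{x},t)}{\partial t} &= \left[\lambda(\mathbf{x}) + 2\mu(\mathbf{x})\right]\frac{\partial V_x(\mathbf{x},t)}{\partial x} + \lambda(\mathbf{x})\left[\frac{\partial V_y(\mathbf{x},t)}{\partial y} + \frac{\partial V_z(\mathbf{x},t)}{\partial z}\right] + S_{xx}(\mathbf{x},t) \\
\frac{\partial \sigma_{yy}(\mathbf{x},t)}{\partial t} &= \left[\lambda(\mathbf{x}) + 2\mu(\mathbf{x})\right]\frac{\partial V_y(\mathbf{x},t)}{\partial y} + \lambda(\mathbf{x})\left[\frac{\partial V_x(\mathbf{x},t)}{\partial x} + \frac{\partial V_z(\mathbf{x},t)}{\partial z}\right] + S_{yy}(\mathbf{x},t) \\
\frac{\partial \sigma_{zz}(\mathbf{x},t)}{\partial t} &= \left[\lambda(\mathbf{x}) + 2\mu(\mathbf{x})\right]\frac{\partial V_z(\mathbf{x},t)}{\partial z} + \lambda(\mathbf{x})\left[\frac{\partial V_x(\mathbf{x},t)}{\partial x} + \frac{\partial V_y(\mathbf{x},t)}{\partial y}\right] + S_{zz}(\mathbf{x},t) \\
\frac{\partial \sigma_{xy}(\mathbf{x},t)}{\partial t} &= \mu(\mathbf{x})\left[\frac{\partial V_x(\mathbf{x},t)}{\partial y} + \frac{\partial V_y(\mathbf{x},t)}{\partial x}\right] + S_{xy}(\mathbf{x},t) \\
\frac{\partial \sigma_{xz}(\mathbf{x},t)}{\partial t} &= \mu(\mathbf{x})\left[\frac{\partial V_x(\mathbf{x},t)}{\partial z} + \frac{\partial V_z(\mathbf{x},t)}{\partial x}\right] + S_{xz}(\mathbf{x},t) \\
\frac{\partial \sigma_{yz}(\mathbf{x},t)}{\partial t} &= \mu(\mathbf{x})\left[\frac{\partial V_y(\mathbf{x},t)}{\partial z} + \frac{\partial V_z(\mathbf{x},t)}{\partial y}\right] + S_{yz}(\mathbf{x},t)
\end{aligned}
\]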

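3D Elastic RTM Stencils
• For time: 2nd-order F.D. approximation
\[
\begin{aligned}
V_x^{t+\frac{\Delta t}{2}}(\mathbf{x}) &= V_x^{t-\frac{\Delta t}{2}}(\mathbf{x}) + \frac{\Delta t}{\rho(\mathbf{x})}\left[\frac{\partial \sigma_{xx}^{t}(\mathbf{x})}{\partial x} + \frac{\partial \sigma_{xy}^{t}(\mathbf{x})}{\partial y} + \frac{\partial \sigma_{xz}^{t}(\mathbf{x})}{\partial z}\right] + S_x^{t}(\mathbf{x}) \\
V_y^{t+\frac{\Delta t}{2}}(\mathbf{x}) &= V_y^{t-\frac{\Delta t}{2}}(\mathbf{x}) + \frac{\Delta t}{\rho(\mathbf{x})}\left[\frac{\partial \sigma_{xy}^{t}(\mathbf{x})}{\partial x} + \frac{\partial \sigma_{yy}^{t}(\mathbf{x})}{\partial y} + \frac{\partial \sigma_{yz}^{t}(\mathbf{x})}{\partial z}\right] + S_y^{t}(\mathbf{x}) \\
V_z^{t+\frac{\Delta t}{2}}(\mathbf{x}) &= V_z^{t-\frac{\Delta t}{2}}(\mathbf{x}) + \frac{\Delta t}{\rho(\mathbf{x})}\left[\frac{\partial \sigma_{xz}^{t}(\mathbf{x})}{\partial x} + \frac{\partial \sigma_{yz}^{t}(\mathbf{x})}{\partial y} + \frac{\partial \sigma_{zz}^{t}(\mathbf{x})}{\partial z}\right] + S_z^{t}(\mathbf{x})
\end{aligned}
\]
• Based on staggered grid
• For space: 10th-order F.D. approximation
[Figure: stencil extending 4 or 5 points on one side and 5 or 4 points on the other; labels: Forward, Adjoint]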
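A 10th-order staggered first derivative can be sketched as follows. This is a minimal illustration, not the presenters' kernel: the function names `staggered_coeffs` and `staggered_derivative` are hypothetical, the coefficients are the standard Taylor-series weights for a staggered grid, and the example is 1D and runs on the CPU. The derivative is evaluated at the half point i + 1/2 from ten samples, which is one way to read the "4 or 5" and "5 or 4" point counts on the slide.

```python
import math

def staggered_coeffs(half_width):
    """Taylor-series weights for a staggered first derivative of order 2*half_width."""
    coeffs = []
    for k in range(1, half_width + 1):
        a = 2 * k - 1
        prod = 1.0
        for j in range(1, half_width + 1):
            if j != k:
                b = 2 * j - 1
                prod *= b * b / abs(a * a - b * b)
        coeffs.append((-1) ** (k + 1) * prod / a)
    return coeffs

def staggered_derivative(f, i, dx, coeffs):
    """Approximate df/dx at the half point i + 1/2.

    Reads f[i - 4] .. f[i + 5] when len(coeffs) == 5: four points plus i
    on the left, five points on the right.
    """
    total = 0.0
    for k, c in enumerate(coeffs, start=1):
        total += c * (f[i + k] - f[i - k + 1])
    return total / dx

# Usage: differentiate sin(x) and compare with cos(x) at the half point.
dx = 0.1
samples = [math.sin(n * dx) for n in range(21)]
c10 = staggered_coeffs(5)  # 10th order: five weights
approx = staggered_derivative(samples, 10, dx, c10)
exact = math.cos(10.5 * dx)
```

On a staggered grid the velocity and stress derivatives land on opposite half points, so the same weights apply with the index window shifted by one.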

3D Elastic RTM Stencils
• GPU Optimizations
  – K40 GPU, (200*200*200)*1000ts
    • Configuration of different block sizes and registers per block
    • Best: block size = 20*20; max registers = 56
    • Variable data into L1/SM, constant data into read-only cache
  – Dynamic Pointer Switch & Minimum Data Cubes
    • Only malloc data cubes covering three steps
[Figure: for each state variable ($\sigma_{yz}$, …, $v_x$), three data cubes in memory hold time steps t-1, t, and t+1, separated by $\Delta t$; after each step the pointers rotate: (pre, cur, next) → (cur, next, pre) → (next, pre, cur)]
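The pointer switch over three data cubes can be sketched as below. This is an assumed illustration in plain Python rather than the CUDA code from the talk: `step` is a hypothetical placeholder for the stencil kernel, and the lists stand in for device buffers. The point is that three buffers are allocated once and only the names pre, cur, and nxt rotate after each step.

```python
def step(pre, cur, nxt):
    """Placeholder update standing in for the stencil kernel; writes nxt in place."""
    for i in range(len(cur)):
        nxt[i] = 2.0 * cur[i] - pre[i]

def run(n_steps, size=4):
    # Allocate exactly three buffers up front (the "minimum data cubes").
    pre = [0.0] * size
    cur = [1.0] * size
    nxt = [0.0] * size
    buffers = {id(pre), id(cur), id(nxt)}
    for _ in range(n_steps):
        step(pre, cur, nxt)
        # Pointer switch: (pre, cur, next) -> (cur, next, pre); no data is copied.
        pre, cur, nxt = cur, nxt, pre
        # The same three buffers are reused at every step.
        assert {id(pre), id(cur), id(nxt)} == buffers
    return cur

result = run(5)
```

Rotating references instead of copying keeps the memory footprint fixed at three cubes per state variable regardless of the number of time steps.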

3D Elastic RTM Stencils
• GPU Optimizations
  – Multiple GPUs
[Figure: the x-y-z volume is split into subdomains; each subdomain has an internal region bordered by halo regions 4 or 5 and 5 or 4 points thick]

3D Elastic RTM Stencils
• GPU Optimizations
  – Multiple GPUs
    • GPU algorithm per stencil sweep, for each subdomain:
      ① Calculate RTM stencil
      ② Update Halo
      ③ Add Source
      ④ Switch Pointer
[Figure: GPU 0, GPU 1, and GPU 2 each hold an internal region bordered by halos; workflow: stencil computing, then updating halo]

3D Elastic RTM Stencils
• GPU Optimizations
  – Multiple GPUs
    • GPU algorithm per stencil sweep, for each subdomain:
      ① Calculate halo RTM stencil
      ② Calculate Internal RTM stencil
      ③ Update Halo
      ④ Add Source
      ⑤ Switch Pointers
[Figure: GPU 0, GPU 1, and GPU 2 each hold an internal region bordered by halos; overlapping workflow: updating the halo runs alongside the internal stencil computation]
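The halo-first ordering can be sketched on a 1D toy problem. Everything here is an assumption made for illustration: the 3-point `update` is a placeholder for the RTM stencil (so the halo is one cell wide rather than 4 or 5), the sublists stand in for per-GPU subdomains, and the code runs sequentially, so it shows the ordering of the steps but not real overlap of communication with computation.

```python
def update(u, i):
    """Placeholder 3-point stencil standing in for the RTM kernel."""
    return 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]

def sweep_single(u):
    """Reference sweep on the undivided domain; end cells are fixed boundaries."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = update(u, i)
    return out

def split(u, parts):
    """Split the interior of u into equal chunks, each padded with one ghost cell per side."""
    n = (len(u) - 2) // parts
    return [u[p * n : p * n + n + 2] for p in range(parts)]

def sweep_multi(subs):
    new = [s[:] for s in subs]
    for s, out in zip(subs, new):
        # 1. Cells next to the halo first, so they are ready to send.
        for i in (1, len(s) - 2):
            out[i] = update(s, i)
        # 2. Internal cells; on GPUs this could overlap with step 3.
        for i in range(2, len(s) - 2):
            out[i] = update(s, i)
    # 3. Halo update: copy each neighbour's edge cell into the ghost cell.
    for left, right in zip(new, new[1:]):
        left[-1] = right[1]
        right[0] = left[-2]
    return new

def gather(subs):
    """Reassemble the global array from the owned cells of each subdomain."""
    out = [subs[0][0]]
    for s in subs:
        out.extend(s[1:-1])
    out.append(subs[-1][-1])
    return out

# Usage: three subdomains must reproduce the single-domain result.
field = [0.0] + [float(i % 5) for i in range(12)] + [0.0]
single = field[:]
multi = split(field, 3)
for _ in range(5):
    single = sweep_single(single)
    multi = sweep_multi(multi)
merged = gather(multi)
```

Computing the halo-adjacent cells before the internal ones is what makes the overlap possible: the values a neighbour needs are available while the bulk of the subdomain is still being processed.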

3D Elastic RTM Stencils
• Validation and Performance
  – GPU Cluster in SEP
    • 4 K40 GPUs versus a 24-core CPU (OpenMP)
    • 200*200*200 grid, 1000 steps (record every 100 steps)
[Figure: panels Vx and Vy]
