Multi-parameter Waveform Inversion with GPUs for the Cloud
A Pipelined Implementation
Huy Le*, Stewart A. Levin, and Robert G. Clapp
Geophysics Department, Stanford University
March 28, 2018

Waveform inversion

$$\chi(m) = \frac{1}{2} \left\| f(m) - d \right\|_2^2$$

$\chi(m)$: objective function
$m$: subsurface parameters to recover
$f(m)$: modeled data by solving wave equations
$d$: observed seismic data
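
As a minimal sketch of the objective above, assuming the modeled and observed data are available as flat lists of samples (the name `misfit` and the sample values are illustrative, not from the slides):

```python
def misfit(modeled, observed):
    """Least-squares objective: 0.5 * ||f(m) - d||_2^2."""
    if len(modeled) != len(observed):
        raise ValueError("modeled and observed data must have the same length")
    return 0.5 * sum((f - d) ** 2 for f, d in zip(modeled, observed))


# Hypothetical three-sample traces, used only to exercise the function.
print(misfit([1.0, 2.0, 3.0], [1.0, 1.0, 5.0]))
```

In practice `modeled` would come from a wave-equation solve for the current model m, which this sketch does not attempt.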

Gradient-based optimization

$$g(m) = \int_0^T u(m)\, v(m)\, dt$$

$g(m)$: gradients
$u(m)$: source wavefields by solving forward wave equations
$v(m)$: receiver wavefields by solving adjoint wave equations in reverse time with data residuals as sources
$\int_0^T$: zero-lag temporal cross-correlation
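
The zero-lag cross-correlation can be sketched as a discrete sum over time steps. This assumes both wavefields are stored as `field[t][x]` on the same forward time index and approximates the integral with a rectangle rule. The function name is hypothetical, and any parameter-dependent scaling of the gradient is left out.

```python
def zero_lag_gradient(source, receiver, dt):
    """g[x] = sum_t u[t][x] * v[t][x] * dt (rectangle-rule time integral).

    source[t][x] and receiver[t][x] hold the two wavefields at the same
    time index t, so the correlation is taken at zero lag.
    """
    n_cells = len(source[0])
    grad = [0.0] * n_cells
    for u_t, v_t in zip(source, receiver):
        for x in range(n_cells):
            grad[x] += u_t[x] * v_t[x] * dt
    return grad


# Two time steps, two cells; the values are illustrative only.
u = [[1.0, 2.0], [3.0, 4.0]]
v = [[1.0, 0.0], [1.0, 1.0]]
print(zero_lag_gradient(u, v, dt=0.5))
```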

Multi-parameters for better physics
Solving wave equations with multiple parameters requires more memory. For a 1000 × 1000 × 500 volume:

Physics             Parameters  Wavefields  Memory (GB)
Acoustic isotropic  1           1           4
Acoustic VTI        3           2           12
Elastic isotropic   3           9           54
Elastic VTI         6           9           108

(VTI: vertical transverse isotropic.)

Conventional domain decomposition
Divide volumes among multiple GPUs, which are potentially on different nodes.
More parameters demand more GPUs or GPUs with larger memory.
Devices need two-way communication to exchange halos.
A fast inter-node connection is not guaranteed, particularly in the cloud.

Pipelined approach
Thor Johnsen and Alex Loddoch (GTC 2014).
Divide the computational domain along one axis into blocks.
A single GPU streams through the domain block by block and advances each block by as many time steps as possible.
Performing multiple updates per transfer lets computation overlap much of the host-device IO.

Stencil for 2nd-order time difference
Divide along the z-axis.
Each block contains half-stencil-length number of depth slices.
Need three consecutive blocks for second derivatives.
[Figure: block i (v, t=0, t=1, t=2) with neighboring blocks i-1 and i+1 at t=1, drawn along a time axis; spatial axes X, Y, Z.]

Pipeline iterations 0 to 5
[Figure sequence, one slide per iteration: CPU and GPU columns holding block0 through block6, each block with arrays v, t=0, t=1 and, once updated, t=2 and t=3; legend: transfer in, update, transfer out.]
One more block appears at each iteration.
block0 first shows t=2 at iteration 2 and t=3 at iteration 3; block1 shows t=2 at iteration 3 and t=3 at iteration 4.
At iteration 5 the CPU copy of block0 shows t=2 and t=3.
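
The wavefront pattern in the iteration figures can be sketched as a small schedule generator. This is an illustration, not the authors' code. It assumes that block j is transferred in at iteration j, that an update needs the three consecutive blocks j-1, j, j+1, and that the first update of a block happens two iterations after its transfer (`lag=1`), as read off the figures. All names are hypothetical.

```python
def pipeline_schedule(n_blocks, n_updates, lag=1):
    """Return {iteration: [(block, new_time_step), ...]} for one GPU pass.

    Block j is transferred in at iteration j holding time steps 0 and 1.
    Update s (s = 1..n_updates) produces time step s + 1 and is scheduled at
    iteration j + s + lag, by which point blocks j-1 and j+1 have reached
    time step s.
    """
    schedule = {}
    for j in range(n_blocks):
        for s in range(1, n_updates + 1):
            schedule.setdefault(j + s + lag, []).append((j, s + 1))
    for updates in schedule.values():
        # Lower time steps first: block j+1 reaches step s before block j
        # is advanced to step s + 1 within the same iteration.
        updates.sort(key=lambda block_step: block_step[1])
    return schedule


sched = pipeline_schedule(n_blocks=7, n_updates=2)
for iteration in sorted(sched):
    print(iteration, sched[iteration])
```

With two updates per pass, the schedule advances block0 to t=2 at iteration 2 and to t=3 at iteration 3, as in the figures. A larger `n_updates` deepens the wavefront, so more blocks must be resident on the GPU at once.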

Streams and threads to overlap transfer and compute
The pipeline takes some iterations to initialize and drain.
Stagger tasks so that they overlap.
cudaMemcpyAsync copies between host and devices.
Two CPU threads copy between swappable buffers and pinned buffers.
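
The staggering idea can be illustrated on the CPU alone. The sketch below is a structural analogy only: it does not call CUDA, plain Python lists stand in for pinned and device buffers, and the names are hypothetical. Two helper threads play the role of the copy threads, and bounded queues keep the stages at most two blocks apart.

```python
import queue
import threading


def run_pipeline(blocks, update):
    """Three staggered stages connected by bounded queues.

    'transfer in' and 'transfer out' run on their own threads so that they
    can overlap with 'update' on the calling thread. Plain Python objects
    stand in for device buffers.
    """
    to_device = queue.Queue(maxsize=2)   # at most two blocks staged ahead
    to_host = queue.Queue(maxsize=2)     # at most two blocks awaiting copy-out
    results = []

    def transfer_in():
        for block in blocks:
            to_device.put(list(block))   # stand-in for a host-to-device copy
        to_device.put(None)              # sentinel: no more blocks

    def transfer_out():
        while True:
            block = to_host.get()
            if block is None:
                break
            results.append(block)        # stand-in for a device-to-host copy

    threads = [threading.Thread(target=transfer_in),
               threading.Thread(target=transfer_out)]
    for t in threads:
        t.start()
    while True:
        block = to_device.get()
        if block is None:
            to_host.put(None)            # tell the copy-out thread to stop
            break
        to_host.put(update(block))       # stand-in for the stencil kernel
    for t in threads:
        t.join()
    return results


print(run_pipeline([[1, 2], [3, 4], [5]], lambda b: [2 * x for x in b]))
```

The first and last blocks pass through stages with nothing to overlap, which is the initialize-and-drain cost mentioned on the slide.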

Pipeline for 2 GPUs
[Figure: CPU, GPU0, and GPU1 columns holding block0 through block11; arrays labeled v and t=0 through t=5; legend: transfer in, update, transfer out.]
Later blocks (block5 onward) are shown advancing from t=0, t=1 to t=2, t=3; earlier blocks are shown advancing from t=2, t=3 to t=4, t=5.
The CPU copy of block0 shows t=4 and t=5.

IO bottleneck
Computing the gradients requires reverse-time propagation.
Absorbing boundary conditions and checkpoints require three propagations, but are IO- and memory-intensive.
Solution: random boundary conditions (Clapp, SEG 2009; Shen, SEG 2011).
Trade-off: gradients are computed on the fly and on the device, but four propagations are required.

Pipelines for source and receiver wavefields

Acoustic isotropic wave equation
One medium parameter and one wavefield at two consecutive time steps: 12 bytes per cell.
Example: 6 GB for a 1000 × 1000 × 500 volume and an 8th-order stencil.
CPU code: blocked, Intel Threading Building Blocks (TBB), Intel SPMD Program Compiler (ISPC), single Xeon machine with 12 cores and 24 threads.
"Optimal" speed is measured when the volume fits in one Tesla K80 GPU (12 GB global memory, 2500 threads), i.e. no domain decomposition or host-device transfer.
Pipelined code updates 94 times per host-device transfer for the same memory.
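
The 12 bytes per cell and the 6 GB example follow from simple arithmetic, sketched below. It assumes 4-byte single-precision values (implied by 12 bytes for three arrays) and 1 GB = 10^9 bytes; the function name is hypothetical.

```python
def footprint_gb(nx, ny, nz, arrays_per_cell, bytes_per_value=4):
    """Memory in GB (10**9 bytes) for arrays_per_cell volumes of nx*ny*nz values."""
    return nx * ny * nz * arrays_per_cell * bytes_per_value / 1e9


# One medium parameter plus one wavefield at two time steps: 3 arrays.
print(footprint_gb(1000, 1000, 500, arrays_per_cell=3))
```

The same function gives 28 GB for seven arrays on a 1000 × 1000 × 1000 volume, matching the acoustic VTI example later in the slides.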

Acoustic isotropic wave equation: forward modeling
[Bar chart: throughput in GCells/s (axis 0.0 to 3.0) for CPU, Pipeline 1 GPU, and Optimal; bar value labels: 2.380, 2.220, 1.000.]

Acoustic VTI wave equations
System of two second-order wave equations, three medium parameters and two wavefields, each at two consecutive time steps: 28 bytes per cell.
Example: 28 GB for a 1000 × 1000 × 1000 volume.

Number of updates  GPU memory (GB)
2                  0.736
4                  1.024
8                  1.6
16                 2.752
32                 5.056
64                 9.664
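
The tabulated GPU memory grows linearly with the number of updates. As an observation drawn from the six values, not a formula stated in the slides, the sketch below fits a line through the first and last rows and checks the others against it.

```python
# (number of updates, GPU memory in GB), copied from the table above.
TABLE = [(2, 0.736), (4, 1.024), (8, 1.6), (16, 2.752), (32, 5.056), (64, 9.664)]


def fit_line(points):
    """Slope and intercept of the line through the first and last points."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


slope, intercept = fit_line(TABLE)
# Largest deviation of any table row from the fitted line.
max_error = max(abs(intercept + slope * n - gb) for n, gb in TABLE)
print(f"{slope:.3f} GB per update, {intercept:.3f} GB base, max error {max_error:.1e} GB")
```

Under this reading, each extra update costs a fixed amount of device memory on top of a constant base, so the update count can be chosen to fit a given GPU.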

Acoustic VTI wave equations: forward modeling
[Chart: throughput in GCells/s (axis 0.00 to 2.00) versus number of updates (2, 4, 8, 16, 32, 64); value labels: 1.834, 1.828, 1.816, 1.790, 1.003, 0.579.]

Acoustic VTI wave equations: forward modeling
8 updates (1.6 GB on GPU) completely overlap host-device transfers.

$$\text{max. speed} = \frac{\text{bandwidth}}{\text{bytes per cell}} \times N_{\text{update}} = \frac{7\ \text{GB/s}}{28\ \text{bytes per cell}} \times 8 = 2\ \text{GCell/s}$$

Achieved 1.79 GCell/s.
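
The transfer-bound estimate above can be reproduced in a few lines. The function name is hypothetical; the inputs are the slide's own numbers.

```python
def max_gcells_per_s(bandwidth_gb_s, bytes_per_cell, n_updates):
    """Transfer-bound throughput: each transferred cell is updated n_updates times."""
    return bandwidth_gb_s / bytes_per_cell * n_updates


bound = max_gcells_per_s(7.0, 28, 8)   # 7 GB/s, 28 bytes per cell, 8 updates
efficiency = 1.79 / bound              # achieved speed relative to the bound
print(bound, round(efficiency, 3))
```

The achieved 1.79 GCell/s is roughly 90% of this bound.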