Optimising In-Field Processing using GPUs




  1. Optimising In-Field Processing using GPUs. Tarik Saidani, Senior Software Engineer, PGS; Peng Wang, DevTech, Nvidia

  2. From a Seismic Acquisition Survey

  3. To a High-Resolution Image of the Sea Subsurface

  4. Problem: Source and Receiver Ghost
     [Diagram: direct, source-ghost, and cable (receiver) ghost ray-paths reflecting at the sea surface, recorded in the far field]

  5. A Ghost-Free Marine Acquisition System

  6. Solution: Dual Sensor Streamer Acquisition
     • Method
       – Combine pressure and velocity sensors in a solid streamer
       – Use the complementary ghost patterns of the two sensors to remove the receiver ghost (a sketch of this summation follows below)
       – Tow the dual sensor streamer deep for low-frequency content
     • Result
       – The bandwidth of the data is increased at both low and high frequencies compared with conventional streamer data
       – There is better low-frequency penetration of the source signal
       – The acquisition method is less sensitive to weather conditions
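For readers unfamiliar with dual-sensor summation, here is a minimal CUDA sketch of the idea, assuming a frequency-wavenumber implementation. The obliquity scaling and the sign convention follow the textbook form of PZ summation; the kernel name, parameters, and layout are hypothetical, not PGS's actual code.

```cuda
#include <cuComplex.h>

// Minimal sketch of dual-sensor (P/Vz) summation in the f-k domain, NOT the
// production algorithm. Each thread handles one (omega, kx, ky) sample: the
// vertical-velocity spectrum Vz is scaled to pressure units by a precomputed
// obliquity factor (textbook form: rho*omega/kz), then summed/differenced
// with the hydrophone spectrum P to split up- and down-going wavefields.
__global__ void pz_separation(const cuFloatComplex* P,   // hydrophone spectrum
                              const cuFloatComplex* Vz,  // vertical geosensor spectrum
                              const float* scale,        // obliquity factor per sample
                              cuFloatComplex* up,        // receiver-ghost-free field
                              cuFloatComplex* down,
                              long n)
{
    long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (i >= n) return;
    cuFloatComplex sv = make_cuFloatComplex(scale[i] * cuCrealf(Vz[i]),
                                            scale[i] * cuCimagf(Vz[i]));
    cuFloatComplex u = cuCaddf(P[i], sv);   // P + s*Vz: ghosts cancel
    cuFloatComplex d = cuCsubf(P[i], sv);   // P - s*Vz: ghosts reinforce
    up[i]   = make_cuFloatComplex(0.5f * cuCrealf(u), 0.5f * cuCimagf(u));
    down[i] = make_cuFloatComplex(0.5f * cuCrealf(d), 0.5f * cuCimagf(d));
}
```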

  7. The Big Data Challenge in the Seismic Business
     • 1995: 6 streamers
     • 2005: 16 streamers
     • 2015: 24 streamers

  8. Seismic Acquisition Data Volumes (the arithmetic is sketched below)
     • A typical streamer is 8000 meters long and contains 1280 receivers
     • Data is recorded in time chunks or as a continuous series at a 2 ms sample interval, i.e. 500 samples per second per receiver
     • A single-sensor streamer generates 640,000 samples per second
     • A streamer spread (10 streamers) generates 6,400,000 samples per second
     • A big spread (20 streamers, dual sensor) will generate 25,600,000 samples per second
     • A typical acquisition can generate multiple TBs of data per day
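As a sanity check on these figures, a small host-side sketch; the 4-byte sample size is our own assumption (32-bit float), everything else comes from the slide.

```cuda
#include <cstdio>

// Back-of-envelope check of the slide's data-volume numbers.
int main() {
    const long receivers_per_streamer = 1280;
    const long samples_per_sec_per_receiver = 500;    // 2 ms sample interval

    long single = receivers_per_streamer * samples_per_sec_per_receiver;
    long spread10 = 10 * single;                      // 10-streamer spread
    long spread20_dual = 20 * 2 * single;             // 20 streamers, 2 sensors

    // Assumed 4 bytes per sample (32-bit float), 86400 seconds per day.
    double tb_per_day = spread20_dual * 4.0 * 86400 / 1e12;

    printf("single streamer: %ld samples/s\n", single);          // 640,000
    printf("10-streamer spread: %ld samples/s\n", spread10);     // 6,400,000
    printf("20-streamer dual: %ld samples/s\n", spread20_dual);  // 25,600,000
    printf("~%.1f TB/day raw\n", tb_per_day);                    // ~8.8 TB/day
    return 0;
}
```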

  9. 3D Wave-field Separation Workflow
     [Runtime profile: receiver deghosting 84%, upsampling 16%, of a 100% total]

  10. Getting the Best Possible Image from the Early Stages
     [Image comparison: streamer-wise wavefield separation vs. 3D wavefield separation]

  11. Upsampling
     • Iterative process in the frequency (wavenumber) domain
     • Not enough parallelism in the inner loop (a few thousand threads)
     • Window parallelism not exposed in the CPU code
     • Loop restructuring to expose window parallelism (sketched below)
     • After the code change, enough parallelism for the GPU (millions of threads)
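A hypothetical illustration of that restructuring: instead of parallelising only the inner per-window loop, the window and sample dimensions are flattened into one index space so that every (window, sample) pair becomes a thread. process_sample is a trivial placeholder for the real per-sample upsampling update.

```cuda
// Placeholder for the actual per-sample upsampling update (not shown in the talk).
__device__ float process_sample(const float* win, int s) {
    return win[s];  // identity stand-in
}

// One thread per (window, sample) pair: a few thousand windows times a few
// thousand samples yields the millions of threads the slide refers to.
__global__ void upsample_all_windows(const float* in, float* out,
                                     int n_windows, int samples_per_window)
{
    long idx = blockIdx.x * (long)blockDim.x + threadIdx.x;
    long total = (long)n_windows * samples_per_window;
    if (idx >= total) return;

    int w = idx / samples_per_window;   // which window
    int s = idx % samples_per_window;   // which sample within the window
    const float* win = in + (long)w * samples_per_window;
    out[idx] = process_sample(win, s);
}
```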

  12. Receiver Deghosting
     • Large volume of data (hydrophone and geosensor data)
     • Frequency-domain computations
     • Parallelism over traces and frequency samples (see the kernel sketch below)
     • Fairly straightforward parallel code
     • Parallelism available at many loop levels over a large number of iterations
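A minimal sketch of that trace-by-frequency parallelism: one thread per (trace, frequency) sample applies a per-frequency operator to the trace's spectrum. The operator itself is hypothetical; the talk does not show PGS's actual deghosting filter.

```cuda
#include <cuComplex.h>

// One thread per (trace, frequency) sample; the 2D grid maps directly onto
// the two loop levels mentioned above. op[f] stands in for a per-frequency
// deghosting operator applied as a complex multiply.
__global__ void deghost(const cuFloatComplex* spectra,  // [n_traces x n_freq]
                        const cuFloatComplex* op,       // [n_freq], hypothetical
                        cuFloatComplex* out,
                        int n_traces, int n_freq)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;  // frequency sample
    int t = blockIdx.y * blockDim.y + threadIdx.y;  // trace
    if (f >= n_freq || t >= n_traces) return;
    long i = (long)t * n_freq + f;
    out[i] = cuCmulf(spectra[i], op[f]);
}
```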

  13. Infield Constraints
     • Although the ship looks big in the picture, it has very limited space to host a compute cluster
     • Power and cooling are also limited on board a vessel
     • A CPU-based solution was considered but quickly discarded because of the constraints described above

  14. But Also Facing Up to New Realities …

  15. Phase 1: Getting the Most out of the CPU Cycles
     • CPU code profiling and analysis
     • Hotspot analysis showed that not much could be improved
     • The vectorizer was not doing a great job, so we had to write vector intrinsics (see the sketch below)
     • Reached an upper bound in terms of CPU performance: not enough!
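To illustrate the kind of hand-written vectorization this refers to (the actual PGS kernels are not shown in the talk), here is an AVX version of y[i] += a * x[i], processing 8 floats per instruction; for brevity it assumes n is a multiple of 8.

```cuda
#include <immintrin.h>

// Host-side AVX sketch: explicit 8-wide SIMD where the auto-vectorizer
// "was not doing a great job". Assumes n % 8 == 0.
void saxpy_avx(float a, const float* x, float* y, int n)
{
    __m256 va = _mm256_set1_ps(a);                  // broadcast scalar a
    for (int i = 0; i < n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);         // load 8 floats of x
        __m256 vy = _mm256_loadu_ps(y + i);         // load 8 floats of y
        vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
        _mm256_storeu_ps(y + i, vy);                // store 8 results
    }
}
```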

  16. Phase 2: What Can We Do Next?
     • Parallelism already present at different levels: thread, process, vectorization …
     • We cannot rely on increasing the CPU core count because of the above constraints
     • GPU accelerators were the most obvious way forward
     • GPU prototype code:
       – Ported the streamer-wise deghosting code to the GPU
       – 25x speedup compared to a single CPU core (Haswell)
       – 7x speedup on the entire flow: interesting … (Amdahl's law sketch below)
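The gap between the 25x kernel speedup and the 7x flow speedup is Amdahl's law: if deghosting is a fraction p of the runtime, the overall speedup is 1 / ((1 - p) + p / 25). A quick check shows p near 0.89 reproduces the 7x figure, consistent with the ~84% deghosting share in the earlier profile.

```cuda
#include <cstdio>

// Amdahl's law check: accelerating a fraction p of the runtime by s = 25x.
int main() {
    const double s = 25.0;
    for (double p = 0.80; p <= 0.951; p += 0.05) {
        double overall = 1.0 / ((1.0 - p) + p / s);
        printf("p = %.2f -> overall speedup = %.1fx\n", p, overall);
    }
    return 0;   // p around 0.89 lands near the reported 7x
}
```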

  17. Phase 3: The Bigger Picture
     • With the streamer-wise deghosting code ported to the GPU, upsampling was the new hotspot in the flow
     • Two parallel development branches:
       – Porting the upsampling code to the GPU
       – Porting the 3D deghosting code to the GPU
     • At the end of this phase the processing flow was 15x faster
     • In the meantime an additional processing step was added to the deghosting code: extrapolation
     • It increased the runtime and changed the application profile (~50% of the runtime)

  18. Phase 4: Putting It All Together
     • Ported the extrapolation to the GPU
     • Very similar compute kernel to the upsampling
     • The first benchmarks showed a throughput 40x faster than 1 CPU core
     • After running more production-like tests we achieved an impressive 100x!

  19. Hardware Footprint
     [Comparison: CPU-based system vs. GPU-based system (Nvidia Tesla K80), a 20:1 footprint ratio]

  20. Summary
     • Wavefield separation is a fundamental step in marine data acquisition and processing
     • It is a very demanding process in terms of compute power
     • Infield constraints rule out large-scale systems
     • To deliver an acceptable throughput within an acceptable footprint, the only viable solution is GPU-based
     • The final result showed an impressive throughput along with a very small footprint
     • It improves the geophysical quality of PGS field acquisition deliverables
     • Real-time 3D processing of data during acquisition
     • GPU deployment started on vessels in Q1 2016

  21. Titan-Class Tethys, Now with a GPU-Based “3D Wavefield Separation Appliance”

  22. Acknowledgments: Peng Wang, Ty McKercher, Ken Hester
