Image-Domain Gridding on Accelerators


  1. Image-Domain Gridding on Accelerators
Bram Veenboer, Netherlands Institute for Radio Astronomy (ASTRON)
Monday 26th March 2018, GPU Technology Conference 2018, San Jose, USA
ASTRON is part of the Netherlands Organisation for Scientific Research (NWO)

  2. Introduction to radio astronomy
Observe the sky at radio wavelengths (image credits: NRAO).
For a given resolution, the size of the telescope is proportional to the wavelength: the Hubble Space Telescope observes at around 1 µm with a 2.4 m mirror; the same resolution at a 1 mm wavelength requires a 2 km dish!

  3. Radio telescope: astronomical interferometer
An array of separate telescopes: an interferometer.
Creating an image from it: interferometry.
Combine the signals for pairs of telescopes.
Resolution similar to one very large dish.

  4. Interferometry theory
Sparse sampling of the 'uv-plane': every baseline samples the uv-plane, yielding a 'visibility' with an associated 'uvw-coordinate'.
The orientation of a baseline also determines its orientation in the uv-plane.
A sample V(u, v) is a sample of the 2D Fourier transform of the sky brightness B(l, m).
Apply a non-uniform Fourier transform to get a sky image from the uv data.
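In formula form, this is the standard 2D relation (ignoring the w-term, which later slides address):

    V(u, v) = \iint B(l, m)\, e^{-2\pi i (u l + v m)}\, \mathrm{d}l\, \mathrm{d}m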

  5. Sampling using 2 antennas
Every sample (in the uv-domain) shows up as a waveform in the image (image credits: NRAO).

  6. Sampling using 4 antennas
Every baseline (pair of two antennas) adds information to the image.

  7. Sampling using 16 antennas (compact)
Using 16 antennas, the (artificial) source in the center of the image becomes visible.

  8. Sampling using 16 antennas (extended)
Longer baselines (larger antenna spacings) increase the resolution of the image.

  9. Sampling using 32 antennas, for 8 hours
Sampling for an extended period of time increases the signal-to-noise ratio.

  10. Creating a sky-image
Imager: visibilities → sky-image.
[Figure: signal chain: incoming radio waves pass through the ionosphere to the receivers; the correlator combines the signals per baseline (pair of receivers) into visibilities; calibration and imaging turn the visibilities into a sky-image and sky-model.]
Correlator: combines the signals into 'visibilities' (with associated 'uvw-coordinates').
Imager: gridder → regular grid → 2D FFT → sky-image.

  11. Gridding using AW-projection (and W-stacking)
A-term: correct for direction-dependent effects.
W-term: correct for the curvature of the earth.
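For context (not shown on the slide), the standard measurement equation including the w-term reads

    V(u, v, w) = \iint \frac{B(l, m)}{\sqrt{1 - l^2 - m^2}}\,
                 e^{-2\pi i \left[ u l + v m + w \left( \sqrt{1 - l^2 - m^2} - 1 \right) \right]}\,
                 \mathrm{d}l\, \mathrm{d}m

and the A-term additionally multiplies B(l, m) with the direction-dependent response of the two receivers that form the baseline.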

  12. W-projection gridding and Image-Domain Gridding
W-projection gridding: each visibility is convolved with a gridder kernel onto the Fourier grid (the convolution updates the pixels around its uv-coordinate).
Image-Domain Gridding: visibilities are first gridded onto small subgrids in the image domain (per chunk of time and channels); each subgrid is then Fourier-transformed (FFT) and an adder places the Fourier subgrids onto the Fourier grid.
For more details: Image-Domain Gridding on Graphics Processors, Bram Veenboer, Matthias Petschow and John W. Romein, IPDPS 2017.
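Schematically, and in simplified form (omitting the A-term, taper and subgrid phase offset), the IDG gridder computes every subgrid pixel as a direct sum over the visibilities assigned to that subgrid, with ν_c the frequency of channel c and c_0 the speed of light:

    \mathrm{pixel}(l, m) = \sum_{t} \sum_{c} V(t, c)\;
        e^{2\pi i\, \frac{\nu_c}{c_0} \left( u_t l + v_t m + w_t\, n(l, m) \right)},
    \qquad n(l, m) = \sqrt{1 - l^2 - m^2} - 1

A 2D FFT then turns the subgrid into its Fourier-domain counterpart, which the adder places onto the grid.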

  13. Square Kilometre Array
SKA1 Low, Australia; SKA1 Mid, Africa.
Data rates up to ≈ 10,000,000,000 visibilities/second.

  14. Results: runtime/throughput
Runtime: time spent in one imaging cycle (gridding, FFT and degridding). Throughput: number of visibilities processed per second.
[Figure: runtime breakdown per kernel (gridder, subgrid-ifft, adder, grid-fft, splitter, subgrid-fft, degridder) in seconds, and gridding/degridding throughput in MVisibilities/s, for Haswell, KNL, Pascal and Vega.]
Most time is spent in the gridder/degridder.
GPUs perform more than an order of magnitude better than the CPU and Xeon Phi.
Very similar throughput for gridding and degridding.

  15. Roofline analysis: overview
[Figure: roofline plot of performance (TOp/s) versus operational intensity (Op/Byte) for the gridder and degridder kernels on Haswell, KNL, Pascal and Vega 10.]

  16. Throughput limitation: host-device transfers
PCIe: ≈ 12 GB/s vs. NVLink: ≈ 68 GB/s.
A fast interconnect is needed to keep the GPU busy computing.
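A minimal sketch of one common way to hide the interconnect: double buffering with CUDA streams, so that host-to-device copies overlap with kernel execution. The buffer sizes, the placeholder kernel and the two-way split below are illustrative assumptions, not taken from the IDG code.

    #include <cuda_runtime.h>

    // Placeholder for the actual gridding work on a chunk of data
    __global__ void process(const float2 *in, float2 *out, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    int main() {
        const size_t n = 1 << 24, half = n / 2;

        float2 *h_in, *d_in, *d_out;
        cudaHostAlloc((void **)&h_in, n * sizeof(float2), cudaHostAllocDefault); // pinned host memory
        cudaMalloc((void **)&d_in, n * sizeof(float2));
        cudaMalloc((void **)&d_out, n * sizeof(float2));

        cudaStream_t streams[2];
        for (int s = 0; s < 2; s++) cudaStreamCreate(&streams[s]);

        // While one half is being copied to the device, the other half is processed
        for (int s = 0; s < 2; s++) {
            size_t offset = s * half;
            cudaMemcpyAsync(d_in + offset, h_in + offset, half * sizeof(float2),
                            cudaMemcpyHostToDevice, streams[s]);
            process<<<(half + 255) / 256, 256, 0, streams[s]>>>(d_in + offset, d_out + offset, half);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < 2; s++) cudaStreamDestroy(streams[s]);
        cudaFree(d_out); cudaFree(d_in); cudaFreeHost(h_in);
        return 0;
    }

Whether such overlap is enough to keep the gridder busy ultimately depends on the interconnect bandwidth, which is the point of the PCIe vs. NVLink comparison above.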

  17. Roofline analysis: overview
[Figure: the roofline plot of slide 15 again: performance (TOp/s) versus operational intensity (Op/Byte) for the gridder and degridder on Haswell, KNL, Pascal and Vega 10.]

  18. Inner loop of the (de)gridder kernel: instruction mix
Many fused multiply-add (FMA) operations and one sine/cosine computation per channel:

    for c = 1, ..., C̃ do                     // loop over channels
        α = ...                               // phase for this channel
        Φ = cos(α) + i·sin(α)                 // phasor
        // complex multiply-accumulate pix11 += vis11[c] · Φ, expanded:
        Re(pix11) += Re(vis11[c]) · Re(Φ)
        Im(pix11) += Re(vis11[c]) · Im(Φ)
        Re(pix11) -= Im(vis11[c]) · Im(Φ)
        Im(pix11) += Im(vis11[c]) · Re(Φ)
        // [... same for pix12, pix21 and pix22]
    end

FMA: peak performance on all architectures.
sine/cosine: poor performance on Intel architectures.
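The same inner loop as a CUDA sketch, showing how the complex multiply-accumulate maps onto FMAs plus one sine/cosine per channel. This is a simplified illustration (one pixel and one polarization per thread, phases precomputed per channel), not the kernel from the IDG repository; in the real kernel the phase also depends on the pixel position and the uvw-coordinate.

    #include <cuda_runtime.h>

    #define NR_CHANNELS 16  // illustrative channel count

    __global__ void gridder_inner_loop(const float2 *visibilities, // one visibility per channel
                                       const float *phases,        // precomputed phase per channel
                                       float2 *pixels)
    {
        float2 pix = make_float2(0.0f, 0.0f);

        #pragma unroll
        for (int c = 0; c < NR_CHANNELS; c++) {
            // One sine/cosine per channel: the phasor Φ = cos(α) + i·sin(α)
            float sin_a, cos_a;
            __sincosf(phases[c], &sin_a, &cos_a);

            // Complex multiply-accumulate pix += vis · Φ, expanded into four FMAs
            float2 vis = visibilities[c];
            pix.x = fmaf(vis.x, cos_a, pix.x);
            pix.y = fmaf(vis.x, sin_a, pix.y);
            pix.x = fmaf(-vis.y, sin_a, pix.x);
            pix.y = fmaf(vis.y, cos_a, pix.y);
        }

        pixels[blockIdx.x * blockDim.x + threadIdx.x] = pix;
    }

GPUs evaluate the sine/cosine in hardware special function units, which helps explain why they stay close to their FMA peak while the Intel architectures do not.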

  19. Roofline analysis: instruction mix
[Figure: the roofline plot of slide 15, now taking the instruction mix (FMA plus sine/cosine) into account, for the gridder and degridder on Haswell, KNL, Pascal and Vega 10.]

  20. Roofline analysis: shared memory
[Figure: roofline plot of performance (TOp/s) versus operational intensity (Op/Byte) for the gridder and degridder on Pascal and Vega 10, with shared (local) memory traffic taken into account.]
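One way shared memory raises the operational intensity, sketched below under assumptions of my own (chunk size, one pixel per thread; not the actual IDG kernel): each thread block stages a chunk of visibilities in shared memory once, and every thread then reuses that chunk for its own subgrid pixel.

    #include <cuda_runtime.h>

    #define CHUNK_SIZE 128  // illustrative number of visibilities staged per iteration

    __global__ void gridder_shared(const float2 *visibilities, int nr_visibilities,
                                   float2 *subgrid, int nr_pixels)
    {
        __shared__ float2 vis_shared[CHUNK_SIZE];

        int pixel = blockIdx.x * blockDim.x + threadIdx.x;
        float2 pix = make_float2(0.0f, 0.0f);

        for (int offset = 0; offset < nr_visibilities; offset += CHUNK_SIZE) {
            // Cooperative load: the chunk is read from device memory only once per block
            for (int i = threadIdx.x; i < CHUNK_SIZE; i += blockDim.x)
                if (offset + i < nr_visibilities)
                    vis_shared[i] = visibilities[offset + i];
            __syncthreads();

            // Every thread reuses the staged chunk for its own pixel
            // (phasor computation omitted; see the inner-loop sketch of slide 18)
            for (int i = 0; i < CHUNK_SIZE && offset + i < nr_visibilities; i++) {
                pix.x += vis_shared[i].x;
                pix.y += vis_shared[i].y;
            }
            __syncthreads();
        }

        if (pixel < nr_pixels)
            subgrid[pixel] = pix;
    }

Because every staged visibility is reused by all threads in the block, the device-memory traffic per operation drops and the kernel moves to the right in the roofline plot.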

  21. Results: energy consumption/efficiency
[Figure: energy-consumption breakdown per kernel (gridder, subgrid-ifft, adder, grid-fft, splitter, subgrid-fft, degridder, host) in kJ, and energy efficiency in GFlop/W, for Haswell, KNL, Pascal and Vega.]
Most energy is spent in the gridder/degridder.
GPUs perform more than an order of magnitude better than the CPU and Xeon Phi.

  22. Results: comparison with AW-projection
[Figure: throughput (visibilities/s, 10^7 to 10^8) on Pascal for IDG, W-projection gridding (WPG) and AW-projection gridding (AWPG) as a function of the W-kernel size N_W (8 to 64).]
IDG outperforms W-projection, while it also corrects for the challenging A-terms.

  23. Results: creating very large images (GPU-only)
[Figure: gridding and degridding throughput (roughly 180 to 240 MVisibilities/s) versus image size (1024² to 65536² pixels), GPU only.]
The size of the image is restricted by the amount of GPU device memory.

  24. Results: creating very large images (GPU + CPU)
[Figure: gridding and degridding throughput versus image size for the GPU-only and the hybrid configuration.]
In the hybrid configuration the adder kernel is executed by the host CPU.

  25. Results: creating very large images (Unified Memory)
[Figure: gridding and degridding throughput versus image size for the GPU-only, hybrid and Unified Memory configurations; the Unified Memory runs use tiling in the adder/splitter.]
Unified Memory (and tiling) enables the GPU to create very large images.
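A minimal sketch of the Unified Memory idea for the full grid (the sizes and the cuComplex type are illustrative assumptions, and the tiling in the adder/splitter is not shown): the grid is allocated with cudaMallocManaged, so GPU kernels can address a grid larger than device memory while the driver pages data in and out on demand.

    #include <cuda_runtime.h>
    #include <cuComplex.h>

    int main() {
        const size_t grid_size = 65536;                    // pixels per dimension
        const size_t nr_pixels = grid_size * grid_size;    // ≈ 34 GB of complex floats

        cuFloatComplex *grid;
        cudaMallocManaged((void **)&grid, nr_pixels * sizeof(cuFloatComplex));

        // Optional hint: the grid is mostly touched from the GPU
        int device;
        cudaGetDevice(&device);
        cudaMemAdvise(grid, nr_pixels * sizeof(cuFloatComplex),
                      cudaMemAdviseSetPreferredLocation, device);

        // ... adder/splitter kernels can now read and write `grid` directly ...

        cudaFree(grid);
        return 0;
    }

Tiling in the adder/splitter presumably keeps the working set small, so that only a fraction of the grid needs to be resident on the GPU at any time.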

  26. Image-Domain Gridding for the Square Kilometre Array
Imaging data rate: 200 GVis/s. Compute: 50 PFlop/s (DP). Power budget: ≈ 1 MW. (De)gridding: ≈ 60%.

                      SKA-1 Low      SKA-1 Mid
    # receivers       512            133
    # baselines       130,816        8,778
    # channels        65,536         65,536
    # polarizations   4              4
    integration time  0.9 s          0.14 s
    data rate         8.3 GVis/s     9.53 GVis/s

IDG on Tesla V100/NVLink: ≈ 0.26 GVis/s per GPU → 770 V100s required.
Required compute: 770 × 7.8 TFlop/s ≈ 6 PFlop/s ≪ 15 PFlop/s available.
Power budget: 770 × 300 W ≈ 231 kW ≪ 600 kW available.
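As a sanity check on the table (assuming every pair of receivers forms a baseline):

    N_{\text{baselines}} = \frac{N_{\text{receivers}}\,(N_{\text{receivers}} - 1)}{2}:
    \qquad \frac{512 \cdot 511}{2} = 130{,}816 \;(\text{Low}),
    \qquad \frac{133 \cdot 132}{2} = 8{,}778 \;(\text{Mid})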

  27. Summary
High-performance gridding and degridding, including AW-term correction.
GPUs are much faster and more energy-efficient than CPUs and Xeon Phi.
On GPUs, IDG outperforms AW-projection.
IDG is able to make very large images (using Unified Memory).
The most challenging sub-part of imaging for the SKA is solved!
More details: Image-Domain Gridding on Graphics Processors, Bram Veenboer, Matthias Petschow and John W. Romein, IPDPS 2017.
Source available at: https://gitlab.com/astron-idg/idg
