
Efficient Imaging in Radio Astronomy using GPUs



  1. Netherlands Institute for Radio Astronomy. Efficient Imaging in Radio Astronomy using GPUs. Bram Veenboer, Matthias Petschow and John W. Romein. Tuesday 9th May 2017, GTC 2017, San Jose, USA. ASTRON is part of the Netherlands Organisation for Scientific Research (NWO).

  2. Radio Astronomy. Array of antennas and/or dishes. Radio frequencies (30-240 MHz). Map of radio sources. [Figures: LOFAR, The Netherlands; Boötes field, > 1000 Megapixel.] Image credits: Wendy Williams, Reinout van Weeren and Huub Röttgering.

  3. Square Kilometre Array. [Figures: SKA1 Mid, Africa; SKA1 Low, Australia.]

  4. Square Kilometre Array.

  5. Imaging in Radio Astronomy. Convert measurements (visibilities) into a sky-image. [Diagram labels: incoming radio waves, station, baseline (pair of stations), correlator, visibilities, calibration, imaging, imager, sky-model, sky-image.] Measurement equation:
     $$V_{pq} = \iint A_p(l, m) \times B(l, m) \times e^{-2\pi i (u_{pq} l + v_{pq} m + w_{pq} n)} \, dl \, dm$$
     Here $V_{pq}$ is the visibility, $(l, m)$ are the sky coordinates, $B(l, m)$ is the source brightness and $(u, v, w)$ is the visibility coordinate. $A_p(l, m)$ is the A-term, $w_{pq} n$ is the W-term and the exponential is the phasor $e^{-i\phi}$.
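
To make the measurement equation concrete, here is a minimal sketch that evaluates it for a handful of discrete point sources, so the integral becomes a sum. It is an illustration only, not the authors' code. The function name `predict_visibility` is hypothetical, the A-term is assumed to be 1, and $n = \sqrt{1 - l^2 - m^2} - 1$ is an assumed convention that the slide does not spell out.

```python
import cmath
import math


def predict_visibility(sources, u, v, w):
    """Evaluate the measurement equation for discrete point sources.

    sources: list of (l, m, brightness) tuples; l and m are direction cosines.
    u, v, w: visibility coordinate of one baseline, in wavelengths.
    Assumptions (not from the slides): A-term = 1 and
    n = sqrt(1 - l^2 - m^2) - 1.
    """
    total = 0j
    for l, m, brightness in sources:
        n = math.sqrt(1.0 - l * l - m * m) - 1.0
        phase = -2.0 * math.pi * (u * l + v * m + w * n)
        total += brightness * cmath.exp(1j * phase)  # brightness times phasor
    return total


# A source at the phase centre contributes its brightness to every baseline.
centre = predict_visibility([(0.0, 0.0, 2.0)], u=100.0, v=-50.0, w=3.0)

# A source offset in l produces a phase that depends on u.
offset = predict_visibility([(0.01, 0.0, 1.0)], u=25.0, v=0.0, w=0.0)
```

With `u * l = 0.25` the phase is a quarter turn, so `offset` is approximately `-1j`.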

  6. Fourier sampling. Every baseline contributes one point (visibility). [Figures: instantaneous u,v-coverage; u,v-coverage for one hour.] 'Earth rotation synthesis'.

  7. 'Gridding' visibilities. Place visibilities onto a regular Fourier grid:
     $$V_{pq} = \iint A_p(l, m) \times B(l, m) \times e^{-2\pi i (u_{pq} l + v_{pq} m + w_{pq} n)} \, dl \, dm$$
     Here $V_{pq}$ is the visibility, $(l, m)$ are the sky coordinates and $B(l, m)$ is the source brightness. The visibility coordinates $(u, v, w)$ are floating-point numbers. The A-term $A_p(l, m)$ corrects for 'direction-dependent effects', the W-term $w_{pq} n$ corrects for earth curvature and the phasor $e^{-i\phi}$ applies the phase correction. Traditional approach: apply 'convolution' to each visibility.
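
The traditional approach named above can be sketched as follows. Assumptions: `u` and `v` are already expressed in grid cells, with (0, 0) at the grid centre, and a single fixed real-valued kernel is used. A real W- or AW-projection gridder selects a complex kernel per visibility, which this sketch deliberately leaves out. The name `grid_visibility` is hypothetical.

```python
def grid_visibility(grid, vis, u, v, kernel):
    """Spread one visibility onto the grid with a small convolution kernel.

    grid:   square list of lists of complex values, indexed grid[y][x].
    u, v:   visibility coordinate in grid cells (assumed), origin at centre.
    kernel: (2s+1) x (2s+1) list of real weights (a fixed, assumed kernel).
    """
    size = len(grid)
    support = len(kernel) // 2
    x0 = int(round(u)) + size // 2  # nearest grid cell to the (u, v) point
    y0 = int(round(v)) + size // 2
    for dy in range(-support, support + 1):
        for dx in range(-support, support + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x < size and 0 <= y < size:
                grid[y][x] += vis * kernel[dy + support][dx + support]


# 3x3 kernel whose weights sum to one, so gridding conserves total flux.
kernel = [[0.0625, 0.125, 0.0625],
          [0.125, 0.25, 0.125],
          [0.0625, 0.125, 0.0625]]
grid = [[0j] * 8 for _ in range(8)]
grid_visibility(grid, 4 + 0j, u=1.2, v=-2.0, kernel=kernel)
```

Because `u` and `v` are floating-point numbers, each visibility touches a whole kernel footprint of grid cells, which is where the cost of the gridding step comes from.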

  8. Imaging example. Simulated three point sources, observed by 30 stations for 4 hours. [Figure: gridded visibilities → 2D FFT → sky image.]
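
The step from gridded visibilities to a sky image can be sketched on a tiny grid. To stay dependency-free, this uses a naive inverse DFT rather than an FFT. The grid is filled analytically for one point source and is fully sampled, unlike a real observation. Both function names are hypothetical.

```python
import cmath
import math


def point_source_grid(n, px, py):
    """Fully sampled Fourier grid of a unit point source at pixel (px, py)."""
    return [[cmath.exp(-2j * math.pi * (u * px + v * py) / n)
             for u in range(n)] for v in range(n)]


def inverse_dft2(grid):
    """Naive O(n^4) 2D inverse DFT; a stand-in for the 2D FFT on the slide."""
    n = len(grid)
    image = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            acc = 0j
            for v in range(n):
                for u in range(n):
                    acc += grid[v][u] * cmath.exp(
                        2j * math.pi * (u * x + v * y) / n)
            image[y][x] = (acc / (n * n)).real
    return image


image = inverse_dft2(point_source_grid(8, px=3, py=5))
```

The resulting image is close to 1 at pixel (3, 5) and close to 0 everywhere else. With incomplete u,v-coverage the source would instead be convolved with the telescope's point spread function.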

  9. Efficient Imaging in Radio Astronomy. [Diagram of the imaging cycle. "Image" step: measured visibilities minus model visibilities → gridding → Fourier grid → iFFT → residual image → CLEAN (bright sources) → model image. "Predict" step: model image → FFT → Fourier grid → degridding → model visibilities. Output: sky-image.] Problem: the 'gridding' and 'degridding' steps are computationally very expensive. Solution: use the novel Image-Domain Gridding (IDG) algorithm on accelerators. Algorithm credits: Bas van der Tol.
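
The CLEAN box in the cycle above extracts bright sources from the residual image. The slides do not say which CLEAN variant is used, so the following is only a one-dimensional, Högbom-style minor loop written as an assumption. The name `clean_1d`, the loop gain, the threshold and the circular PSF shift are all illustrative choices.

```python
def clean_1d(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """Hogbom-style CLEAN on a 1D image (illustrative assumption).

    dirty: dirty image as a list of floats.
    psf:   point spread function with peak value 1, same length as dirty.
    Returns (model, residual).
    """
    n = len(dirty)
    residual = list(dirty)
    model = [0.0] * n
    centre = psf.index(max(psf))
    for _ in range(max_iter):
        peak = max(range(n), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) < threshold:
            break
        flux = gain * residual[peak]  # take a fraction of the peak
        model[peak] += flux
        for i in range(n):
            # Subtract the PSF shifted (circularly) onto the peak position.
            residual[i] -= flux * psf[(i - peak + centre) % n]
    return model, residual


psf = [0.2, 1.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
# Dirty image of a single source with flux 2 at position 5.
dirty = [2.0 * psf[(i - 5 + 1) % 8] for i in range(8)]
model, residual = clean_1d(dirty, psf)
```

In the full cycle the model image would then be transformed back and degridded into model visibilities, which are subtracted from the measured visibilities before the next iteration.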

  10. Placing visibilities onto a regular Fourier grid. Fourier domain gridding using convolution kernels: visibilities → gridder kernel → Fourier grid. Image domain gridding using subgrids: visibilities → gridder kernel → image subgrids → FFT → Fourier subgrids → adder → Fourier grid. [Figure legend: visibility, convolution, pixel in subgrid.]

  11. Image domain gridding: subgrids. A subset ($\tilde{T} \times \tilde{C}$) of the visibilities $V_j$ from baseline $j$ is placed onto a subgrid. [Figure: grid and subgrid; the subset is indexed from $(1, 1)$ to $(\tilde{T}, \tilde{C})$.]
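
A minimal sketch of placing a subset of visibilities onto one image-domain subgrid follows. It is not the authors' kernel. It omits the A-term correction, any taper, the subgrid FFT and the adder step. The sign of the phase, the definition of `n`, and the names `grid_onto_subgrid`, `image_size` and `uvw_offset` are assumptions made for illustration.

```python
import cmath
import math


def grid_onto_subgrid(visibilities, uvw, subgrid_size, image_size,
                      uvw_offset=(0.0, 0.0, 0.0)):
    """Accumulate visibilities onto every pixel of an image-domain subgrid.

    visibilities: complex values (one polarisation, for brevity).
    uvw:          matching list of (u, v, w) tuples in wavelengths.
    image_size:   field width in direction-cosine units (assumed parameter).
    uvw_offset:   (u, v, w) of the subgrid centre (assumed parameter).
    """
    n_pix = subgrid_size
    u0, v0, w0 = uvw_offset
    subgrid = [[0j] * n_pix for _ in range(n_pix)]
    for y in range(n_pix):
        for x in range(n_pix):
            l = (x - n_pix / 2) * image_size / n_pix
            m = (y - n_pix / 2) * image_size / n_pix
            n = math.sqrt(1.0 - l * l - m * m) - 1.0
            pixel = 0j
            for vis, (u, v, w) in zip(visibilities, uvw):
                phase = 2.0 * math.pi * (
                    (u - u0) * l + (v - v0) * m + (w - w0) * n)
                pixel += vis * cmath.exp(1j * phase)  # one sincos, then FMAs
            subgrid[y][x] = pixel
    return subgrid


subgrid = grid_onto_subgrid(
    [1 + 2j, 3 - 1j], [(10.0, -4.0, 0.5), (12.0, 7.0, 0.25)],
    subgrid_size=4, image_size=0.1)
flat = grid_onto_subgrid(
    [1 + 0j], [(3.0, 5.0, 1.0)], subgrid_size=4, image_size=0.1,
    uvw_offset=(3.0, 5.0, 1.0))
```

Every pixel needs one sine/cosine evaluation per visibility followed by complex multiply-adds, so the inner loop is a mix of transcendental and fused multiply-add work.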

  12. Image domain gridding: work distribution. (1) Work (all subgrids for a few baselines); (2) subset of work (a number of subgrids); (3) work element (one subgrid); (4) pixels.

  13. Optimizations. General: coarse-grained parallelism, vectorization, libraries; double buffering, shared memory. Application specific: fine-grained parallelism; data transpose (visibilities); data alignment (uvw coordinates). Architecture specific: computation of the phasor term ($e^{-i\phi}$); Nvidia: one special function unit (SFU) for every four/six cores; GCN: one transcendental operation per SIMD per four clock cycles.

  14. GPU implementation. CPU threads offload subsets of work to the GPU; GPU threads perform the computation on the GPU cores (in registers) and SFUs. [Diagram labels: CPU memory, subset of work, HtoD queue, execute queue, DtoH queue, GPU global memory, shared memory; preload into cache, load from cache, precompute / store result.]
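
The three queues on this slide (HtoD, execute, DtoH) form a pipeline in which copies and computation overlap. The sketch below imitates that structure on the CPU with Python threads. No GPU is involved, and the list copy and the `compute` callback are stand-ins. In a real implementation these stages would typically be asynchronous copies and kernel launches on separate device streams. The names `run_pipeline` and `_DONE` are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the work stream


def run_pipeline(work_items, compute):
    """Process subsets of work through copy-in, execute and copy-out stages."""
    htod = queue.Queue(maxsize=2)  # bounded, like a pair of device buffers
    dtoh = queue.Queue(maxsize=2)
    results = []

    def copy_in():
        for item in work_items:
            htod.put(list(item))  # stand-in for a host-to-device copy
        htod.put(_DONE)

    def execute():
        while (item := htod.get()) is not _DONE:
            dtoh.put(compute(item))  # stand-in for a kernel launch
        dtoh.put(_DONE)

    def copy_out():
        while (result := dtoh.get()) is not _DONE:
            results.append(result)  # stand-in for a device-to-host copy

    threads = [threading.Thread(target=stage)
               for stage in (copy_in, execute, copy_out)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


totals = run_pipeline([[1, 2], [3], [4, 5, 6]], compute=sum)
```

The bounded queues play the role of double buffering: while one subset of work is being processed, the next one can already be copied in.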

  15. Results: throughput/runtime. Throughput: number of visibilities processed per second. [Figure: bar charts for Haswell, Pascal and Fiji. Runtime [seconds] broken down into gridder, subgrid-ifft, adder, grid-fft, splitter, subgrid-fft and degridder; throughput [MVisibilities/s] for gridding and degridding.] Most time is spent in the gridder/degridder. GPUs perform more than an order of magnitude better than the CPU.

  16. Roofline analysis: overview. [Figure: roofline plot of performance [TOp/s] versus operational intensity [Op/Byte], showing the gridder and degridder on Pascal, Fiji and Haswell.]
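
For readers unfamiliar with roofline plots, the bound drawn in such a figure is the minimum of the peak compute rate and the memory bandwidth multiplied by the operational intensity. The sketch below computes that bound. The numbers in the example are made up for illustration and are not measurements from the talk.

```python
def attainable_performance(peak_ops, bandwidth, intensity):
    """Roofline bound: min(peak compute, bandwidth * operational intensity).

    peak_ops:  peak compute rate, e.g. in TOp/s.
    bandwidth: memory bandwidth, e.g. in TByte/s.
    intensity: operational intensity in Op/Byte.
    """
    return min(peak_ops, bandwidth * intensity)


def ridge_point(peak_ops, bandwidth):
    """Operational intensity above which a kernel becomes compute bound."""
    return peak_ops / bandwidth


# Hypothetical device: 10 TOp/s peak, 0.5 TByte/s memory bandwidth.
memory_bound = attainable_performance(10.0, 0.5, intensity=4.0)
compute_bound = attainable_performance(10.0, 0.5, intensity=64.0)
ridge = ridge_point(10.0, 0.5)
```

For this hypothetical device, a kernel at 4 Op/Byte is limited to 2 TOp/s by memory, while a kernel at 64 Op/Byte can reach the 10 TOp/s compute peak.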

  17. Performance for FMA/sincos instruction mix. [Figure: performance [GOp/s] versus ρ [fma/sincos] for Pascal, Fiji and Haswell; annotations: 98 %, 59 %, 22 % and 4×.]

  18. Roofline analysis: instruction mix. [Figure: roofline plot of performance [TOp/s] versus operational intensity [Op/Byte], showing the gridder and degridder on Pascal, Fiji and Haswell, with the device memory (DRAM) bound marked.]

  19. Roofline analysis: shared memory. [Figure: roofline plot of performance [TOp/s] versus operational intensity [Op/Byte], showing the gridder and degridder on Pascal, Fiji and Haswell, with both the shared memory and device memory (DRAM) bounds marked.]

  20. Results: energy consumption/efficiency. [Figure: bar charts for Haswell, Pascal and Fiji. Energy consumption [kJ] broken down into gridder, subgrid-ifft, adder, grid-fft, splitter, subgrid-fft, degridder and host; energy efficiency [GFlop/W] for the gridder and degridder.] Most energy is spent in the gridder/degridder. GPUs perform more than an order of magnitude better than the CPU.

  21. Results: AW-projection at the cost of W-projection. [Figure: throughput [MVisibilities/s] of W-projection and Image-Domain Gridding versus W-kernel size $N_W$ (64 down to 8); annotated ratios range from 1.1× to 2.4×.]

  22. Conclusion. First implementations of the IDG algorithm on CPUs and GPUs. First efficient degridding implementation on GPUs ever. A thorough (roofline) analysis of the achieved performance. An assessment of energy efficiency. IDG on GPUs is a candidate to meet the demanding computational and energy efficiency constraints imposed by future telescopes such as the Square Kilometre Array (SKA). Reference: Image-Domain Gridding on Graphics Processors, Bram Veenboer, Matthias Petschow and John W. Romein, IPDPS 2017.
