Netherlands Institute for Radio Astronomy. Efficient Imaging in Radio Astronomy using GPUs. Bram Veenboer, Matthias Petschow and John W. Romein. Tuesday 9th May, 2017, GTC 2017, San Jose, USA. ASTRON is part of the Netherlands Organisation for Scientific Research (NWO).
Radio Astronomy: an array of antennas and/or dishes observes at radio frequencies (30-240 MHz) and produces a map of radio sources. Example: LOFAR, The Netherlands; Boötes field, > 1000 megapixels. Image credits: Wendy Williams, Reinout van Weeren and Huub Röttgering.
Square Kilometre Array: SKA1 Mid (Africa) and SKA1 Low (Australia).
Imaging in Radio Astronomy: convert measurements (visibilities) into a sky-image.
[Diagram labels: incoming radio waves, station, baseline (pair of stations), correlator, visibilities, calibration, imaging, imager, sky-model, sky-image.]
Measurement equation:
$$V_{pq} = \iint A_p(l, m) \times B(l, m) \times e^{-2\pi i (u_{pq} l + v_{pq} m + w_{pq} n)} \, dl \, dm$$
Here $V_{pq}$ is the visibility, $(l, m)$ are the sky coordinates, $B(l, m)$ is the source brightness, $(u, v, w)$ is the visibility coordinate, $A_p(l, m)$ is the A-term, the exponential is the phasor $e^{-i\phi}$, and $w_{pq} n$ is the W-term.
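For a sky that consists of a few point sources, the integral in the measurement equation reduces to a sum over sources. The sketch below illustrates that case only. The function name `predict_visibility` and the `(l, m, brightness)` tuple layout are hypothetical, the A-term is assumed to be 1, and `n = sqrt(1 - l^2 - m^2) - 1` is an assumed convention that the slides do not state.

```python
import cmath
import math


def predict_visibility(u, v, w, sources):
    """Evaluate the measurement equation for a list of point sources.

    sources: list of (l, m, brightness) tuples (hypothetical layout).
    Assumptions: A-term = 1 and n = sqrt(1 - l^2 - m^2) - 1.
    """
    total = 0j
    for l, m, brightness in sources:
        n = math.sqrt(1.0 - l * l - m * m) - 1.0
        # Phase of the phasor term for this source and baseline coordinate.
        phase = -2.0 * math.pi * (u * l + v * m + w * n)
        total += brightness * cmath.exp(1j * phase)
    return total


# Two made-up sources: one at the phase centre, one slightly offset.
sources = [(0.0, 0.0, 1.0), (0.01, -0.02, 0.5)]
vis = predict_visibility(100.0, 50.0, 5.0, sources)
print(vis)
```

A source at the phase centre (l = m = 0) contributes its brightness unchanged to every baseline, which is a convenient sanity check for the sign and scaling conventions.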
Fourier sampling: every baseline contributes one point (visibility) to the u,v-coverage. [Figure panels: instantaneous u,v-coverage; u,v-coverage for one hour.] The coverage fills in over time through ‘earth rotation synthesis’.
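The way baselines and earth rotation fill the u,v-plane can be sketched with a toy model. Everything in it is an assumption made for illustration: station positions are 2-D and given in wavelengths, each baseline is the difference of two station positions, and earth rotation is modelled as a plain rotation of 15 degrees per hour (an array at the pole observing the celestial pole). The function names are hypothetical.

```python
import math
from itertools import combinations


def baselines(stations):
    """One (u, v) point per pair of stations: the difference of their positions."""
    return [(xq - xp, yq - yp) for (xp, yp), (xq, yq) in combinations(stations, 2)]


def rotate(points, angle):
    """Rotate all (u, v) points by angle (radians) around the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * u - s * v, s * u + c * v) for u, v in points]


def uv_coverage(stations, hours, steps_per_hour=4):
    """Toy earth rotation synthesis: rotate the snapshot by 15 degrees per hour."""
    snapshot = baselines(stations)
    coverage = []
    n_steps = int(hours * steps_per_hour)
    for step in range(n_steps + 1):
        angle = math.radians(15.0 * step / steps_per_hour)
        coverage.extend(rotate(snapshot, angle))
    return coverage


# 30 made-up station positions give 30 * 29 / 2 = 435 baselines per snapshot.
stations = [(10.0 * i, 5.0 * ((i * i) % 7)) for i in range(30)]
print(len(baselines(stations)), len(uv_coverage(stations, hours=1)))
```

A real observation would project the 3-D baselines onto the plane perpendicular to the source direction, so the tracks are ellipses rather than circles; the toy model only shows why a longer observation samples more of the Fourier plane.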
‘Gridding’ visibilities: place visibilities onto a regular Fourier grid.
$$V_{pq} = \iint A_p(l, m) \times B(l, m) \times e^{-2\pi i (u_{pq} l + v_{pq} m + w_{pq} n)} \, dl \, dm$$
The A-term $A_p(l, m)$ corrects for ‘direction-dependent effects’, the phasor $e^{-i\phi}$ is a phase correction, and the W-term $w_{pq} n$ corrects for earth curvature. $V_{pq}$ is the visibility, $(l, m)$ are the sky coordinates, $B(l, m)$ is the source brightness, and the visibility coordinates $(u, v, w)$ are floating-point numbers.
Traditional approach: apply a ‘convolution’ to each visibility.
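The traditional approach can be sketched as follows: because the u,v coordinates are floating-point numbers, each visibility is spread over the grid cells around its position with a small convolution kernel. This is a minimal illustration, not the implementation discussed in the talk. The function name `grid_visibility`, the 3 × 3 kernel and the assumption that u and v are already expressed in grid-cell units are all hypothetical.

```python
def grid_visibility(grid, u, v, value, kernel):
    """Add one visibility to the grid, spread over a kernel around the nearest cell.

    grid:   2-D list of complex numbers, indexed as grid[v][u].
    u, v:   floating-point coordinates, assumed to be in grid-cell units.
    kernel: square 2-D list of weights with an odd side length.
    """
    size = len(kernel)
    half = size // 2
    cu, cv = int(round(u)), int(round(v))
    for dy in range(size):
        for dx in range(size):
            y = cv + dy - half
            x = cu + dx - half
            # Skip kernel taps that fall outside the grid.
            if 0 <= y < len(grid) and 0 <= x < len(grid[0]):
                grid[y][x] += value * kernel[dy][dx]


# Made-up 3x3 kernel whose weights sum to 1.
kernel = [[w / 16.0 for w in row] for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])]
grid = [[0j] * 8 for _ in range(8)]
grid_visibility(grid, 3.2, 4.7, 2 + 1j, kernel)
print(grid[5][3])
```

In W-projection and AW-projection the kernel depends on the w coordinate and on the A-terms, so a different kernel is needed per visibility or per group of visibilities; the fixed kernel here only shows the data access pattern.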
Imaging example: three simulated point sources, observed by 30 stations for 4 hours. [Figure: gridded visibilities → 2D FFT → sky image.]
Efficient Imaging in Radio Astronomy
[Diagram: imaging cycle. “image”: measured visibilities minus model visibilities → gridding → Fourier grid → iFFT → residual image → CLEAN (bright sources) → model image. “predict”: model image → FFT → Fourier grid → degridding → model visibilities. Output: sky-image.]
Problem: the ‘gridding’ and ‘degridding’ steps are computationally very expensive.
Solution: use the novel Image-Domain Gridding (IDG) algorithm on accelerators.
Algorithm credits: Bas van der Tol.
Placing visibilities onto a regular Fourier grid. [Diagram, two approaches. Fourier domain gridding using convolution kernels: visibilities → gridder (kernel) → Fourier grid. Image domain gridding using subgrids: visibilities → gridder (kernel) → image subgrids → FFT → Fourier subgrids → adder → Fourier grid. Legend: visibility, convolution, pixel in subgrid.]
Image domain gridding: subgrids. A subset ($\tilde{T} \times \tilde{C}$) of visibilities from baseline $j$ is placed onto a subgrid. [Figure: the grid with one subgrid; the visibilities $V_j$ are indexed from $(1, 1)$ to $(\tilde{T}, \tilde{C})$.]
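Placing a subset of visibilities onto a subgrid can be sketched as a direct transform into a small image: every pixel of the subgrid accumulates every visibility of the subset, multiplied by a phasor. The code is a simplified illustration under stated assumptions, not the kernel from the talk: it omits the A-term and any tapering, takes u, v, w relative to the subgrid centre (in wavelengths), and assumes `n = sqrt(1 - l^2 - m^2) - 1`. The names `grid_onto_subgrid` and `image_size` are hypothetical.

```python
import cmath
import math


def grid_onto_subgrid(visibilities, subgrid_size, image_size):
    """Accumulate a subset of visibilities onto an image-domain subgrid.

    visibilities: list of (u, v, w, value); u, v, w relative to the subgrid centre.
    image_size:   field of view covered by the subgrid, in direction cosines.
    """
    centre = subgrid_size // 2
    subgrid = [[0j] * subgrid_size for _ in range(subgrid_size)]
    for y in range(subgrid_size):
        for x in range(subgrid_size):
            # Sky coordinates of this pixel.
            l = (x - centre) * image_size / subgrid_size
            m = (y - centre) * image_size / subgrid_size
            n = math.sqrt(1.0 - l * l - m * m) - 1.0
            pixel = 0j
            for u, v, w, value in visibilities:
                # One sine/cosine evaluation per (pixel, visibility) pair.
                phase = 2.0 * math.pi * (u * l + v * m + w * n)
                pixel += value * cmath.exp(1j * phase)
            subgrid[y][x] = pixel
    return subgrid


subset = [(5.0, -3.0, 1.0, 1 + 2j), (2.0, 7.0, 0.5, -1 + 0j)]
subgrid = grid_onto_subgrid(subset, subgrid_size=4, image_size=0.1)
print(subgrid[2][2])
```

In the full algorithm the subgrid is then Fourier transformed and added to the Fourier grid, as the FFT and adder stages in the diagram indicate. The inner loop performs one phasor evaluation and one complex multiply-accumulate per pixel and visibility, which is why the gridder dominates the runtime.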
Image domain gridding: work distribution. (1) work (all subgrids for a few baselines); (2) subset of work (a number of subgrids); (3) work element (one subgrid); (4) pixels.
Optimizations
General: coarse-grained parallelism, vectorization, libraries; double buffering, shared memory.
Application specific: fine-grained parallelism; data transpose (visibilities); data alignment (uvw coordinates).
Architecture specific: computation of the phasor term ($e^{-i\phi}$). Nvidia: one special function unit (SFU) for every four/six cores. GCN: one transcendental operation per SIMD per four clock cycles.
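The cost structure behind the phasor term can be illustrated with plain real arithmetic: one sine/cosine evaluation yields the phasor, and the complex multiply-accumulate that follows decomposes into four real multiply-adds. The sketch below is ordinary Python rather than GPU code, and the function name is hypothetical; it only shows why the ratio of multiply-add to sine/cosine work matters on hardware where transcendental units are scarcer than regular cores.

```python
import cmath
import math


def accumulate_with_phasor(acc_re, acc_im, vis_re, vis_im, phi):
    """Return acc + vis * e^{-i phi}, written out in real arithmetic."""
    # One sine/cosine evaluation: e^{-i phi} = cos(phi) - i sin(phi).
    p_re = math.cos(phi)
    p_im = -math.sin(phi)
    # Four real multiply-adds implement the complex multiply-accumulate.
    acc_re += vis_re * p_re
    acc_re -= vis_im * p_im
    acc_im += vis_re * p_im
    acc_im += vis_im * p_re
    return acc_re, acc_im


re, im = accumulate_with_phasor(1.0, 2.0, 3.0, -4.0, 0.7)
reference = complex(1.0, 2.0) + complex(3.0, -4.0) * cmath.exp(-0.7j)
print(abs(complex(re, im) - reference) < 1e-12)
```

When the same phasor is reused for several values, the number of multiply-adds per sine/cosine evaluation grows, which shifts the balance between the regular cores and the special function hardware.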
GPU implementation: CPU threads offload subsets of work to the GPU; GPU threads perform the computation (in registers). [Diagram labels: CPU memory; HtoD queue, execute queue, DtoH queue; GPU global memory and shared memory; preload into cache, load from cache, precompute / store result; GPU cores and SFU.]
Results: throughput/runtime. Throughput: number of visibilities processed per second. [Figure: bar charts for Haswell, Pascal and Fiji. Runtime [seconds], broken down into gridder, subgrid-ifft, adder, grid-fft, splitter, subgrid-fft and degridder; Throughput [MVisibilities/s] for gridding and degridding.] Most time is spent in the gridder/degridder. GPUs perform more than an order of magnitude better than the CPU.
Roofline analysis: overview. [Figure: roofline plot of Performance [TOp/s] against Operational intensity [Op/Byte] for Pascal, Fiji and Haswell, with the measured gridder and degridder points.]
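The roofline model bounds attainable performance by the smaller of the peak compute rate and the product of memory bandwidth and operational intensity. The sketch below shows that bound; the peak and bandwidth figures are made-up placeholders, not the measured values for Pascal, Fiji or Haswell, and the function names are hypothetical.

```python
def roofline(peak_ops, bandwidth, intensity):
    """Attainable performance [Op/s] for a given operational intensity [Op/Byte]."""
    return min(peak_ops, bandwidth * intensity)


def ridge_point(peak_ops, bandwidth):
    """Operational intensity [Op/Byte] above which a kernel is compute bound."""
    return peak_ops / bandwidth


# Placeholder device: 10 TOp/s peak and 500 GB/s memory bandwidth (assumed values).
PEAK = 10e12
BANDWIDTH = 500e9

for intensity in (1, 4, 16, 64, 256):
    print(intensity, roofline(PEAK, BANDWIDTH, intensity) / 1e12, "TOp/s")
print("ridge point:", ridge_point(PEAK, BANDWIDTH), "Op/Byte")
```

A kernel whose operational intensity lies to the right of the ridge point is limited by compute throughput rather than by memory bandwidth, so further gains have to come from the instruction mix or from a lower-level memory such as shared memory.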
Performance for FMA/sincos instruction mix. [Figure: Performance [GOp/s] against ρ [fma/sincos] for Pascal, Fiji and Haswell; annotations: 98 %, 59 %, 22 % and 4×.]
Roofline analysis: instruction mix. [Figure: roofline plot of Performance [TOp/s] against Operational intensity [Op/Byte] for Pascal, Fiji and Haswell, with bandwidth ceilings labelled Device Memory and DRAM and the gridder and degridder points.]
Roofline analysis: shared memory. [Figure: roofline plot of Performance [TOp/s] against Operational intensity [Op/Byte] for Pascal, Fiji and Haswell, with bandwidth ceilings labelled Shared Memory, Device Memory and DRAM and the gridder and degridder points.]
Results: energy consumption/efficiency. [Figure: bar charts for Haswell, Pascal and Fiji. Energy consumption [kJ], broken down into gridder, subgrid-ifft, adder, grid-fft, splitter, subgrid-fft, degridder and host; Energy efficiency [GFlop/W] for gridder and degridder.] Most energy is spent in the gridder/degridder. GPUs perform more than an order of magnitude better than the CPU.
Results: AW-projection at the cost of W-projection. [Figure: Throughput [MVisibilities/s] of W-projection and Image-Domain Gridding against W-kernel size N_W (8 to 64); the annotated factors range from 1.1× to 2.4×.]
Conclusion
First implementations of the IDG algorithm on CPUs and GPUs.
First efficient degridding implementation on GPUs ever.
A thorough (roofline) analysis of the achieved performance.
An assessment of energy efficiency.
IDG on GPUs is a candidate to meet the demanding computational and energy efficiency constraints imposed by future telescopes such as the Square Kilometre Array (SKA).
Reference: Image-Domain Gridding on Graphics Processors, Bram Veenboer, Matthias Petschow and John W. Romein, IPDPS 2017.