Overview and Task FT Convolution Decomposition High-level Techniques and Implementation Evaluation What’s Next Hardware Acceleration of Pulsar Search on FPGAs using OpenCL Oliver Sinnen Haomiao Wang & Prabu Thiagaraj (Manchester Uni) Parallel and Reconfigurable Computing Department of Electrical and Computer Engineering University of Auckland Computing for SKA, 2017 Oliver Sinnen, Haomiao Wang & Prabu Thiagaraj (Manchester Uni) FDAS on FPGA using OpenCL
Strong-field Test of Gravity using Pulsars
Image credits: NASA; NASA/Tod Strohmayer (GSFC)/Dana Berry (Chandra X-Ray Observatory)
Outline
1 Overview and Task
2 FT Convolution Decomposition
3 High-level Techniques and Implementation
4 Evaluation
5 What's Next
Pulsar and Pulsar Search
Observed radiation is a pulse
Binary pulsar (Doppler effect)
Acceleration search: 1) time-domain, 2) frequency-domain
Pulsar and Pulsar Search: Frequency-domain
A matched-filtering technique in the Fourier domain recovers the signal into a single bin:
\[ A_{r_0} \simeq \sum_{k=[r_0]-m/2}^{[r_0]+m/2} A_k \, A^{*}_{r_0-k}, \]
where the frequency r_0 is unknown, so the summation is computed over a range of frequencies r.
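The correlation sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `correlate_bin`, the five-point `template`, and the synthetic `spectrum` are hypothetical, and the sketch restricts r to integer bins, whereas the slide allows a fractional r_0 with [r_0] as the nearest bin.

```python
def correlate_bin(spectrum, template, r):
    """Evaluate sum_{k=r-m/2}^{r+m/2} spectrum[k] * conj(template[r-k]).

    `template` holds m+1 response values for offsets -m/2 .. +m/2
    (offset o is stored at index o + m/2). Hypothetical helper.
    """
    half = (len(template) - 1) // 2
    total = 0j
    for k in range(r - half, r + half + 1):
        if 0 <= k < len(spectrum):  # bins outside the spectrum contribute nothing
            total += spectrum[k] * template[(r - k) + half].conjugate()
    return total


# Made-up template: the response of a smeared signal, m = 4.
template = [0.1 + 0.2j, 0.5 - 0.3j, 1 + 0j, 0.4 + 0.4j, 0.2 - 0.1j]
half = (len(template) - 1) // 2

# Synthetic spectrum: the same response centred on bin r0, zero elsewhere.
r0 = 10
spectrum = [0j] * 32
for k in range(r0 - half, r0 + half + 1):
    spectrum[k] = template[(r0 - k) + half]

# r0 is unknown in practice, so the sum is evaluated at every candidate bin.
powers = [abs(correlate_bin(spectrum, template, r)) ** 2
          for r in range(len(spectrum))]
best = max(range(len(powers)), key=powers.__getitem__)
print(best)  # 10
```

At the matching bin the sum collapses to the total template energy, which is why the smeared signal is recovered into a single bin.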
Block Overview of Pulsar Search Engine
[Figure: block diagram of the pulsar search engine. Modules shown include Data Receptor (RCPT), RFI Mitigation (RFIM), Dedispersion Transform (DDTR), Complex Fourier Transform (CXFT), Birdie Zapping (BRDZ), Dereddening (DRED), Harmonic Summing (HRMS), Candidate Sifting (SIFT), Folding and Optimisation (FLDO), and Fourier Domain Acceleration Search (FDAS). The legend groups modules as common, single pulse, time-domain acceleration, and frequency-domain acceleration.]
Fourier-domain Acceleration Search (FDAS)
The FDAS module searches for (binary) pulsars with constant frequency derivatives in the frequency domain.
[Figure: data flow of one PSS engine. Over 2,000 beams are formed at 4,096 channels/beam; the signals of beam i are de-dispersed for 6,000 DMs in PSS Engine_i. Each DM trial passes through pre-processing (RFIM, DDTR, PSBC, CXFT, BRDZ, DRED), then the single pulse search, time-domain acceleration, or FDAS module, then post-processing. The FDAS module consists of the FT convolution module (FIR_1 ... FIR_85) and the harmonic-sum module.]
85 FIR filters, maximum length 421 taps
Specification of Task
Parameter  Description                              Value
B          # of beams                               1000 ∼ 2000
DM         # of dispersion measure (DM) trials      6000
T_obs      Observation period                       540 s
t_limit    Time for executing one sample group      88 ms
N          # of complex samples per group           2^22
M          # of templates/filters                   85
K          Average template/filter length           > 200
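To get a feel for these numbers, the following sketch estimates the arithmetic rate that direct time-domain filtering of one sample group would need. It assumes K = 200 (the table only gives a lower bound) and counts 8 real operations per complex multiply-accumulate (4 multiplications and 4 additions/subtractions); both are assumptions of this sketch, not figures from the slides.

```python
N = 2 ** 22        # complex samples per group
M = 85             # templates/filters
K = 200            # assumed average filter length (table states > 200)
T_LIMIT = 0.088    # seconds allowed per sample group

# One complex multiply-accumulate per (sample, filter, tap).
complex_macs = N * M * K

# Assumption: 4 real multiplications + 4 real additions per complex MAC.
real_ops = 8 * complex_macs

required_rate = real_ops / T_LIMIT  # real operations per second

print(f"complex MACs per group: {complex_macs:.3e}")
print(f"required rate: {required_rate / 1e12:.2f} Tera-ops/s")
```

Under these assumptions a single sample group needs on the order of several tera-operations per second, before counting the 6000 DM trials and the number of beams.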
FT Convolution
Complex floating-point operations
Multiple long FIR filters
Large input size
Strict time limit
Number of acceleration devices (CapEx)
Energy consumption (OpEx)
Basic Element
Time-domain FIR Filter (TDFIR)
\[ y_m[i] = \sum_{k=0}^{K-1} x_m[i-k] \, h_m[k], \quad \text{for } i = 0, 1, \ldots, N-1 \]
Frequency-domain FIR Filter (FDFIR)
\[ \mathcal{F}\{f * h\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{h\} \]
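The two formulations can be checked against each other with a small pure-Python sketch. The function names `tdfir` and `fdfir`, the textbook radix-2 `fft`, and the test data are all hypothetical and chosen for illustration; samples before x[0] are assumed to be zero, and the FFT length is padded to a power of two so that circular convolution equals linear convolution.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def tdfir(x, h):
    """Direct form: y[i] = sum_k x[i-k] * h[k], with x[j] = 0 for j < 0."""
    return [sum(x[i - k] * h[k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]


def fdfir(x, h):
    """Same filter via the convolution theorem: F{x * h} = F{x} . F{h}."""
    size = 1
    while size < len(x) + len(h) - 1:  # avoid circular wrap-around
        size *= 2
    X = fft(list(x) + [0j] * (size - len(x)))
    H = fft(list(h) + [0j] * (size - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v / size for v in y[:len(x)]]


x = [complex(i % 5, -(i % 3)) for i in range(20)]
h = [1 + 1j, 0.5j, -2 + 0j, 0.25 - 0.75j]
max_err = max(abs(a - b) for a, b in zip(tdfir(x, h), fdfir(x, h)))
print(max_err < 1e-9)  # True
```

The direct form costs about N x K complex multiplications per filter, while the frequency-domain form replaces them with transforms plus one multiplication per bin.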
Hardware Limitation
Naïve time domain
DSP block: single-precision floating-point (SPF) multiplications
\[ (A + iB) \times (C + iD) = (A \times C - B \times D) + i\,(A \times D + B \times C) \]
Naïve frequency domain
Off-chip (global) memory: off-chip memory bandwidth
RAM block: on-chip (local) memory size
4 million elements = 32 MBytes
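Both limits reduce to simple arithmetic, sketched below. The helper `complex_mul` is hypothetical; it spells out the four real multiplications that each complex product consumes. The footprint calculation assumes one complex sample is two 4-byte single-precision values.

```python
def complex_mul(a, b, c, d):
    """(a + ib)(c + id) with 4 real multiplications and 2 additions/subtractions."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


print(complex_mul(1.0, 2.0, 3.0, 4.0))  # (-5.0, 10.0)

# Memory footprint of one sample group held as complex single precision.
N = 2 ** 22                    # about 4 million complex elements
BYTES_PER_COMPLEX_SPF = 2 * 4  # assumption: real + imaginary part, 4 bytes each
footprint_bytes = N * BYTES_PER_COMPLEX_SPF
print(footprint_bytes // 2 ** 20, "MiB")  # 32 MiB
```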
Decomposition Algorithms
Overlap-add algorithm (OLA): split the coefficient array -> OLA-TD
Overlap-save algorithm (OLS): split the input array -> OLS-FD
[Figure: (a) OLA. The coefficients are split into N groups C_1 ... C_N of length N_coef/N; the input data, zero-padded by N_coef/N - 1 elements, is convolved with coefficient group i to give output data_i, and output data_1 ... output data_N are added to form the output data. (b) OLS. The input data, preceded by N_coef - 1 zeros, is split into N small groups ID_1 ... ID_N; each ID_i is convolved with the FIR filter to give PD_i, the first N_coef - 1 elements are discarded, and PD_1 ... PD_N form the output data.]
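A minimal sketch of the two decompositions, checked against a direct FIR filter, might look as follows. It is not the authors' kernel code: the function names, the `groups` and `fft_size` parameters, and the test data are hypothetical, samples before x[0] are assumed to be zero, and `fft_size` must be a power of two larger than the filter length minus one.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + w, even[k] - w
    return out


def direct_fir(x, h):
    """Reference: y[i] = sum_k x[i-k] * h[k], with x[j] = 0 for j < 0."""
    return [sum(x[i - k] * h[k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]


def ola_td(x, h, groups):
    """OLA-TD: split the coefficients, filter in the time domain, add the parts."""
    y = [0j] * len(x)
    size = -(-len(h) // groups)  # ceiling division: coefficients per group
    for start in range(0, len(h), size):
        partial = direct_fir(x, h[start:start + size])  # output data_i
        for i in range(start, len(x)):
            y[i] += partial[i - start]  # delay by the group's offset, then add
    return y


def ols_fd(x, h, fft_size):
    """OLS-FD: split the input into overlapping blocks, filter each via FFT."""
    overlap = len(h) - 1
    step = fft_size - overlap                 # new samples consumed per block
    H = fft(list(h) + [0j] * (fft_size - len(h)))
    padded = [0j] * overlap + list(x)         # leading zeros for the first block
    y = []
    for start in range(0, len(x), step):
        block = padded[start:start + fft_size]
        block += [0j] * (fft_size - len(block))
        Y = fft([a * b for a, b in zip(fft(block), H)], inverse=True)
        y.extend(v / fft_size for v in Y[overlap:])  # discard wrapped outputs
    return y[:len(x)]


x = [complex(i % 7, (i * 3) % 5) for i in range(50)]
h = [1 + 1j, 0.5j, -2 + 0j, 0.25 - 0.75j, 3 + 0j, -1j, 0.5 + 0.5j]
ref = direct_fir(x, h)
err_ola = max(abs(a - b) for a, b in zip(ref, ola_td(x, h, groups=3)))
err_ols = max(abs(a - b) for a, b in zip(ref, ols_fd(x, h, fft_size=16)))
print(err_ola < 1e-9, err_ols < 1e-9)  # True True
```

Splitting the coefficients keeps each time-domain filter short, while splitting the input keeps each FFT block small enough to fit a limited buffer; the block size here is an arbitrary choice for the example.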
High-level Techniques
Maxeler MaxCompiler, using Java to develop for FPGAs (HPCC2016)
Open Computing Language (OpenCL) for FPGAs (Intel FPGA cards), GPUs, and CPUs (FPT2016, best paper candidate)
[Figure: platform diagrams. A host (cores 1 to 4; memory: DDR3 and SSD) connects over PCIe to FPGA_0 ... FPGA_i. Each FPGA has a DDR controller and PHY to global memory (2 GB DDR3 x 2), a global memory interconnect, kernel pipelines, and a local memory interconnect to block RAM.]
Kernel Structures: OLA