Real-Time Resampling Processor for SWARM. Mark Peryer, Harvard-Smithsonian Center for Astrophysics. August 16, 2017.
Presentation Overview: Background, Objectives, Design, Results, Future Work
Background: the Event Horizon Telescope (EHT) aims to image the event horizon of Sgr A* with a global network of telescopes combined by Very Long Baseline Interferometry (VLBI).
Submillimeter Array (SMA): Mauna Kea, Hawaii; 8-element interferometer; 32 GHz instantaneous bandwidth.
SWARM: the SMA correlator, built on the ROACH2 platform. Its ADCs record data at 4.576 GSps, so one quadrant produces ~38 gigabits every second!
Compatibility Issue: the SMA samples at 4.576 GHz, but the EHT uses 4.096 GHz. [Figure: SMA and EHT sample rates shown in the frequency domain and the time domain]
APHIDS: a non-real-time GPU resampling system. [Figure: APHIDS data flow. A single SWARM quadrant (ROACH2 boards) sends VDIF over UDP on 10 GbE through a switch to the SDBE GPU servers (GeForce GTX 980), where each stream goes through VDIF unpack → 16K-point FFT → N-point FFT → M-point FFT and is sent as VDIF over TCP to Mark 6 data recorders. Per-link rates of 4.69, 4.75 and 9.38 Gbps; aggregate data rates of 37.50 Gbps into the switch/SDBE and 18.99 Gbps into the Mark 6 recorders; disks are transported post-observation.]
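APHIDS performs the rate change with FFTs on GPUs. Its exact FFT arrangement (the 16K-, N- and M-point stages) is not detailed here, so the following is only a generic sketch of block-wise frequency-domain resampling by spectral truncation, assuming 143-sample input blocks are mapped onto 128-sample output blocks. The function names are hypothetical, and a plain O(N²) DFT stands in for a real FFT to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; a stand-in for an FFT library."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive O(N^2) inverse DFT, including the 1/N scaling."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def resample_block(x, m):
    """Map one block of len(x) real samples onto m < len(x) samples by
    keeping only the spectral bins that fit inside the new, narrower band."""
    n = len(x)
    X = dft(x)
    Y = [0j] * m
    Y[0] = X[0]                      # DC bin
    for k in range(1, (m + 1) // 2):
        Y[k] = X[k]                  # positive frequencies
        Y[m - k] = X[n - k]          # matching negative frequencies
    # rescale so sinusoid amplitudes are preserved across the length change
    return [(v * m / n).real for v in idft(Y)]


# 143 samples at 4.576 GSps span the same time as 128 samples at 4.096 GSps
x = [math.cos(2 * math.pi * 5 * t / 143) for t in range(143)]
y = resample_block(x, 128)
```

Because each block is treated as one period of a periodic signal, a practical system would also need overlapping or windowed blocks to avoid edge artifacts; the test tone here has a whole number of cycles per block, so it is reproduced to floating-point precision.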
Improvements (APHIDS vs. Real-Time Resampler): APHIDS is costly, time-inefficient, and introduces quantization error; the real-time resampler is inexpensive, gives instantaneous results, and limits quantization error.
Target Hardware: SKARAB, with a Virtex-7 FPGA and a 40 GbE interface.
High-Level Overview: 40GbE input (Rx) from the B-engine → Depacketize data (Rx) → Transpose → Reprocessing → 32768-point inverse FFT → Resampling → Requantize → Packetize data (VDIF) → 40GbE output.
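Resampling: 4576/4096 = 143/128, so the input is upsampled by L = 128, low-pass filtered, and downsampled by M = 143. [Figure: generic rational resampler, input → ↑L → LPF H(k) → ↓M, illustrated with L = 3 and M = 2]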
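As a reference model, the textbook upsample, filter, downsample chain can be written directly. This is a minimal floating-point sketch with hypothetical function names, not the FPGA design; `Fraction` reduces 4576/4096 to lowest terms.

```python
from fractions import Fraction


def rational_factors(fs_in, fs_out):
    """Return (L, M) with fs_out = fs_in * L / M in lowest terms."""
    ratio = Fraction(fs_in, fs_out)          # Fraction reduces by the gcd
    return ratio.denominator, ratio.numerator


def resample_naive(x, h, L, M):
    """Zero-stuff by L, apply FIR h at the high rate, keep every M-th sample."""
    up = []
    for s in x:
        up.append(s)
        up.extend([0] * (L - 1))             # L-1 zeros after each input sample
    filtered = [sum(h[k] * up[n - k] for k in range(len(h)) if n >= k)
                for n in range(len(up))]
    return filtered[::M]


L, M = rational_factors(4576, 4096)          # (128, 143)
# Small example: L = 3, M = 2 with a 3-tap "hold" filter
print(resample_naive([1, 2, 3, 4], [1, 1, 1], 3, 2))   # [1, 1, 2, 3, 3, 4]
```

Every zero-stuffed sample passes through the filter even though most outputs are then discarded, which is the cost the next slide quantifies.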
Practicality: upsampling by 128 and then downsampling by 143 means filtering 4.576 GSps × 128 ≈ 585 billion samples every second, only to throw away about 581 billion of them!
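The numbers on this slide follow from the sample rates alone. A quick check in integer MSps (the variable names are illustrative):

```python
FS_IN_MSPS = 4576     # SWARM ADC rate
FS_OUT_MSPS = 4096    # EHT rate
L, M = 128, 143       # upsample / downsample factors

upsampled = FS_IN_MSPS * L              # samples the filter would have to produce
discarded = upsampled - FS_OUT_MSPS     # samples the downsampler throws away
assert upsampled == M * FS_OUT_MSPS     # consistency: 4576 * 128 == 143 * 4096
print(upsampled, discarded)             # 585728 581632 (MSps)
```

That is about 585.7 billion samples per second, of which only 1 in 143 is kept.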
Solution: since 128 = 16 × 8 and 143 = 11 × 13, upsample by only 16 and downsample by 143/8 = 17.875, which still gives 4576 × 16 / 17.875 = 4096.
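The factored ratio can be checked exactly with rational arithmetic. This sketch also compares the intermediate rate with the direct ×128 approach; the names are illustrative.

```python
from fractions import Fraction

FS_IN = 4576                       # MSps
UP = 16                            # upsample factor (128 = 16 * 8)
DOWN = Fraction(143, 8)            # fractional downsample factor, 17.875

fs_out = FS_IN * UP / DOWN         # Fraction arithmetic, no rounding
intermediate = FS_IN * UP          # rate after upsampling by 16
print(float(DOWN), fs_out, intermediate)   # 17.875 4096 73216
print(FS_IN * 128 // intermediate)         # 8: the x128 path runs 8 times faster
```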
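Upsampling: the inefficient approach inserts zeros between samples and runs the FIR filter (taps b0 to b3) at an increased clock rate; the efficient approach splits the taps into sub-filters that run at the unchanged clock rate F. For upsampling by 2, both produce the same outputs, e.g. 3b0 + 2b2, 4b0 + 3b2, 5b0 + 4b2 from taps b0 and b2, and 2b1 + 1b3, 3b1 + 2b3, 4b1 + 3b3 from taps b1 and b3.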
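The equivalence on this slide is the standard polyphase decomposition. A minimal sketch (hypothetical helper names) comparing the zero-stuffing approach with the polyphase one for upsampling by 2 with four taps; the numeric taps 1, 10, 100, 1000 stand in for b0 to b3 so that each output's tap mix is readable in its digits.

```python
def fir(x, h):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(x))]


def upsample_zero_stuff(x, h, L):
    """Inefficient: insert L-1 zeros per sample, filter at L times the rate."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0] * (L - 1))
    return fir(stuffed, h)


def upsample_polyphase(x, h, L):
    """Efficient: L sub-filters h[p], h[p+L], ... all run at the input rate."""
    branches = [fir(x, h[p::L]) for p in range(L)]
    # output sample n*L + p comes from branch p at input time n
    return [branches[p][n] for n in range(len(x)) for p in range(L)]


x = [1, 2, 3, 4, 5]
b = [1, 10, 100, 1000]             # stand-ins for b0, b1, b2, b3
y = upsample_polyphase(x, b, 2)
assert y == upsample_zero_stuff(x, b, 2)
print(y[4], y[3])                  # 203 1020, i.e. 3*b0 + 2*b2 and 2*b1 + 1*b3
```

Each branch only ever sees real input samples, so no multiplier is wasted on inserted zeros.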
Time Scaling Up: 4 samples are used every clock cycle, 1 new sample arrives every 16 clocks, and the pattern repeats for the parallel inputs.
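One way to read these numbers, assuming the 64-tap filter and ×16 upsampling used elsewhere in the design: after zero-stuffing, any 64-sample filter window holds exactly 4 real input samples, and which 4 taps they meet depends only on the output phase. A small check with hypothetical names:

```python
TAPS, L = 64, 16


def live_inputs(m, taps=TAPS, up=L):
    """Count non-zero (real) samples in the FIR window ending at upsampled index m."""
    return sum(1 for j in range(m - taps + 1, m + 1) if j % up == 0)


def taps_used(m, taps=TAPS, up=L):
    """Tap indices that multiply real samples for output m (phase p = m % up)."""
    return list(range(m % up, taps, up))


assert {live_inputs(m) for m in range(TAPS, TAPS + 1000)} == {4}
print(taps_used(5))    # [5, 21, 37, 53]
```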
FIR Filter: low-pass, 63rd-order FIR (64 coefficients), least-squares linear-phase design, F_pass = 2.138 GHz, F_stop = 2.288 GHz. [Figure: magnitude response of the filter]
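"Linear phase" means the 64 coefficients are symmetric, h[k] = h[63 - k], which gives every frequency the same delay of 63/2 = 31.5 samples. The sketch below checks that property numerically for arbitrary symmetric taps; it does not reproduce the actual least-squares design.

```python
import cmath
import math

N = 64                                    # 63rd order -> 64 coefficients
half = [math.sin(0.3 * (k + 1)) / (k + 1) for k in range(N // 2)]
h = half + half[::-1]                     # symmetric taps (arbitrary values)


def response(h, w):
    """Frequency response H(e^{jw}) = sum_k h[k] e^{-jwk}."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))


delay = (N - 1) / 2                       # 31.5 samples
for w in (0.1, 0.7, 1.9, 3.0):
    # removing a pure delay of 31.5 samples leaves a real (zero-phase) response
    residual = response(h, w) * cmath.exp(1j * w * delay)
    assert abs(residual.imag) < 1e-9
print(delay)   # 31.5
```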
FIR Filter Design: 16 filters, 1024 multiplies, 768 adds.
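These counts are consistent with one reading of the architecture, which is an assumption here: 16 parallel input lanes, each upsampled by 16, with every upsampled output formed from 4 taps (64 taps / 16 phases). A quick tally:

```python
LANES = 16          # parallel input samples per clock (one filter per lane)
UP = 16             # upsample factor: outputs per lane per clock
TAPS = 64
TAPS_PER_OUTPUT = TAPS // UP                  # 4 non-zero products per output

outputs_per_clock = LANES * UP                # 256 upsampled samples per clock
multiplies = outputs_per_clock * TAPS_PER_OUTPUT        # 1024
adds = outputs_per_clock * (TAPS_PER_OUTPUT - 1)        # 768: 3 adds per 4-term sum
print(multiplies, adds)
```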
Downsampling: 16 outputs every clock; downsample by a mix of 17 and 18 (averaging 17.875); a ROM stores the mux selects, and the pattern repeats every 143 clocks.
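The ROM contents can be derived from the ratio. Keeping upsampled sample floor(n × 143/8) for output n gives steps of 17 and 18 that sum to 143 every 8 outputs. Assuming 256 upsampled samples per clock (16 lanes × 16), the per-clock selection pattern then repeats every 143 clocks because gcd(256, 143) = 1. A sketch with hypothetical names:

```python
from math import gcd

NUM, DEN = 143, 8        # downsample factor 17.875 = 143/8
PER_CLOCK = 16 * 16      # upsampled samples per clock (assumed: 16 lanes x 16)


def kept_index(n):
    """Upsampled-sample index kept for output n: floor(n * 143 / 8)."""
    return n * NUM // DEN


steps = [kept_index(n + 1) - kept_index(n) for n in range(8)]
print(steps, sum(steps))                 # [17, 18, 18, 18, 18, 18, 18, 18] 143


def selects(clock):
    """Mux selects for one clock: offsets of the kept samples within that clock."""
    lo, hi = clock * PER_CLOCK, (clock + 1) * PER_CLOCK
    n = -(-lo * DEN // NUM)              # ceil(lo * 8 / 143): first output in this clock
    out = []
    while kept_index(n) < hi:
        out.append(kept_index(n) - lo)
        n += 1
    return out


period = NUM // gcd(PER_CLOCK, NUM)      # 143 clocks
assert all(selects(c) == selects(c + period) for c in range(20))
assert max(len(selects(c)) for c in range(period)) <= 16
```

Under this assumption, each clock keeps 14 or 15 samples, which fits within 16 output lanes.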
Simulated Input: 1 GHz sine wave, 4.576 GHz sample rate, 16 parallel samples per clock.
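A test vector like the one described can be generated and packed into 16-sample frames, one frame per clock. This floating-point sketch uses hypothetical names and ignores the fixed-point input format; 16 samples per clock at 4.576 GSps implies a 286 MHz fabric clock.

```python
import math

FS = 4.576e9        # sample rate, samples/s
F_TONE = 1e9        # 1 GHz test tone
LANES = 16          # parallel samples per clock


def input_frames(n_clocks):
    """Return n_clocks frames; frame c holds samples c*16 .. c*16 + 15."""
    samples = [math.sin(2 * math.pi * F_TONE * t / FS)
               for t in range(n_clocks * LANES)]
    return [samples[c * LANES:(c + 1) * LANES] for c in range(n_clocks)]


frames = input_frames(143)
print(len(frames), len(frames[0]), FS / LANES / 1e6)   # 143 16 286.0
```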
Simulated Results: theoretical output from MATLAB compared with the output of the Simulink design.
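For a 1 GHz input, the theoretical output is the same tone sampled at 4.096 GHz. A hypothetical helper for comparing a design's output against that reference after removing a fixed pipeline latency; the latency value and any pass/fail tolerance are assumptions left to the user.

```python
import math

FS_OUT = 4.096e9
F_TONE = 1e9


def ideal_output(n_samples, delay=0):
    """Theoretical resampler output: the 1 GHz tone at 4.096 GSps, delayed."""
    return [math.sin(2 * math.pi * F_TONE * (n - delay) / FS_OUT)
            for n in range(n_samples)]


def max_error(simulated, delay):
    """Largest absolute difference from the ideal output at a given latency."""
    ideal = ideal_output(len(simulated), delay)
    return max(abs(s - i) for s, i in zip(simulated, ideal))


# Example: a perfect design delayed by 10 samples matches only at delay=10
sim = ideal_output(256, delay=10)
print(max_error(sim, 10), max_error(sim, 0) > 0.5)   # 0.0 True
```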
Bit Growth (N_F = N total bits, F fractional bits): at full bit depth, a 16_14-bit input and 16_14-bit coefficients give a 34_28-bit output; at reduced bit depth, an 8_7-bit input and 8_7-bit coefficients give an 18_14-bit output.
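These widths follow the usual full-precision fixed-point rules: a product of N1_F1 and N2_F2 values needs (N1 + N2)_(F1 + F2) bits, and summing K terms adds ceil(log2 K) integer bits. The slide's outputs match a 4-term sum, consistent with 4 taps per upsampled output; that link is an inference, and the helper below is hypothetical.

```python
import math


def mac_width(in_fmt, coef_fmt, terms):
    """Full-precision (total, fractional) bits of a sum of `terms` products."""
    (n1, f1), (n2, f2) = in_fmt, coef_fmt
    product_bits, frac_bits = n1 + n2, f1 + f2
    growth = math.ceil(math.log2(terms))     # extra bits so the sum cannot overflow
    return product_bits + growth, frac_bits


print(mac_width((16, 14), (16, 14), 4))   # (34, 28)
print(mac_width((8, 7), (8, 7), 4))       # (18, 14)
```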
Resource Usage: registers 0.5%, LUTs 7%, BRAMs 0.5%, DSPs 0%.
Conclusion: real-time implementation; parallel FIR filter design; fits on the target hardware.
Future Work: remove invalid outputs; incorporate into the real-time resampling system.
Acknowledgements: Jonathan Weintroub, Sheperd Doeleman, André Young, Rurik Primiani, Bob Wilson, Arash Roshanineshat, the SKA team, and the CASPER community.
Thank you! Questions?