
Atomic COTS G. Hampson, D. Humphrey, G. Jourjon, J. Bunton, K.



  1. 22nd September 2020. Accelerating astronomy using Atomic COTS. G. Hampson, D. Humphrey, G. Jourjon, J. Bunton, K. Bengston, A. Bolin, Y. Chen, E. Troup

  2. Astronomy Instrumentation in general
     ● Astronomy instruments can be crudely divided into three stages:
       ○ The bespoke antenna part: RF components specific to the instrument
       ○ The bespoke receiver part: RF amplification and filtering, ADCs and possibly the first-stage filterbank (coarse channels across the band)
       ○ This presentation focuses on the last stage and the use of COTS FPGA accelerators
     ● It will be shown in the context of SKA Low but could be applied to almost any astronomy instrument
     [Diagram: array of N antennas → ADC & filterbank (M channels) → digital signal processing → output products]

  3. Custom SKA Gemini Solution: Gemini is a custom system of subracks, with backplanes carrying power, liquid cooling and fibre, and a powerful FPGA card with HBM memory and optics. Five boards have been produced.

  4. Did we reinvent the wheel? Gemini development has taken many years. COTS boards in the PCIe standard now exist. These boards have a higher technology readiness level (TRL) because they are sold to many customers in high quantity.

  5. Does a COTS solution exist for Low.CBF? Can SKA use a COTS product? The original down-select said no; is that still true now? Can the signal flow be modified so that a standard COTS product could be used?

  6. Why Change? 1st reason - Cost: Xilinx components sell to a high-margin military/medical market, but Xilinx is moving into commercial markets like video and 5G, and that means competition and high volume.

  7. FPGAs are all “Atomic” independent! [Diagram: 512 station coarse filterbanks (384 coarse channels each) → coarse channel select → 384 FPGAs, each computing all output products for one coarse channel from all 512 stations] SKA Low has 512 stations. If we can get them all into one FPGA for one coarse channel, then each FPGA becomes Atomic: independent of any other FPGA's processing or data flow.
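
A minimal sketch of the Atomic partitioning on slide 7, in Python rather than firmware. The station and channel counts come from the slide; the identity mapping in `fpga_for`, the helper names, and the pair count that includes autocorrelations are assumptions made for illustration.

```python
from collections import defaultdict

N_STATIONS = 512   # SKA Low stations (from the slide)
N_COARSE = 384     # coarse channels, one FPGA each (from the slide)


def fpga_for(coarse_channel: int) -> int:
    """Hypothetical identity mapping: coarse channel k is handled by FPGA k."""
    if not 0 <= coarse_channel < N_COARSE:
        raise ValueError(f"coarse channel {coarse_channel} out of range")
    return coarse_channel


def partition(packets):
    """Group (station, coarse_channel) pairs by destination FPGA."""
    stations_seen = defaultdict(set)
    for station, coarse_channel in packets:
        stations_seen[fpga_for(coarse_channel)].add(station)
    return stations_seen


def station_pairs(n_stations: int) -> int:
    """Number of station pairs, autocorrelations included (an assumption)."""
    return n_stations * (n_stations + 1) // 2


# Every station emits one stream per coarse channel.
packets = [(s, c) for s in range(N_STATIONS) for c in range(N_COARSE)]
stations_seen = partition(packets)
```

Each FPGA ends up holding all 512 stations for its own coarse channel, so under these assumptions it can form every station pair locally without exchanging data with another FPGA.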

  8. 2nd Reason - Shorter Schedule, less to do. [Block diagrams: a correlator chain (100GbE interface, SPEAD decoder, coarse delay, fine filterbank, fine delay, Doppler, RFI flag, correlator, SDP packetiser) and a PSS/PST chain (100GbE interface, SPEAD decoder, coarse delay, PSS & PST filterbanks, fine delay, RFI flag, Jones and relative weights, PSS & PST beamformers, PSS/PST packetisers), each with HBM buffers and a PCIe AXI interface, fed by the in-network processor] Going Atomic removes a majority of the communications in the FPGA. What remains is all about astronomy, giving a significant reduction in coding and testing, and hence in schedule.

  9. Alveo uses HBM-enabled FPGAs: A key part of the DSP chain is HBM memory. It allows stages of DSP to be separated (and asynchronous) and makes it possible to implement large memory buffers and corner turns.
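
A corner turn as mentioned on slide 9 can be sketched as writing a buffer in one order and reading it back in the transposed order. This is a toy Python model, not the Low.CBF firmware; the sizes and the names `corner_turn`, `N_STATIONS` and `N_TIMES` are illustrative assumptions.

```python
def corner_turn(buffer, n_rows, n_cols):
    """Read a row-major flat buffer back in column-major order."""
    return [buffer[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]


N_STATIONS = 4  # toy sizes, far smaller than the real system
N_TIMES = 3

# Write side: data arrives station by station, N_TIMES samples each.
buffer = [None] * (N_STATIONS * N_TIMES)
for station in range(N_STATIONS):
    for t in range(N_TIMES):
        buffer[station * N_TIMES + t] = (station, t)

# Read side: the next stage wants all stations for one time sample together.
turned = corner_turn(buffer, N_STATIONS, N_TIMES)
```

Because the writer and reader only share the buffer, the two stages can run at their own pace, which is the decoupling the slide attributes to HBM.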

  10. Communications moves out. [Diagram: effort split between M&C, Comm and DSP] For Gemini, a third of our effort is about getting the right data to the right place at the right time. For Atomic COTS, communication moves out of the FPGA, but where did it go?

  11. COTS Communications now: A new technology called P4 is available. It directly controls the data plane of the switch and, using the Tofino ASIC, provides guaranteed performance at line rate.

  12. 3rd Reason - Adaptable and Flexible comms. [Diagram: the existing Gemini method (LFAA into a fixed cube of 288 evenly distributed FPGAs, out to SDP/PSS/PST) compared with programmable communications between LFAA, the correlators and beamformers, and SDP/PSS/PST] The optical data interconnect for Gemini is fixed. Using a P4 in-network processor enables an adaptable, flexible, programmable data flow that copes with failure and scales.
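
One way to picture "copes with failure" on slide 12 is a routing table that is rewritten when an FPGA drops out. The sketch below is plain Python, not P4; the `FlowTable` class, the port numbers and the spare-port policy are hypothetical.

```python
class FlowTable:
    """Maps a coarse channel to the switch port of the FPGA processing it."""

    def __init__(self, ports, spare_ports):
        self.route = dict(enumerate(ports))  # coarse channel -> switch port
        self.spares = list(spare_ports)      # ports of idle spare FPGAs

    def fail(self, port):
        """Move every channel on a failed port to one spare port."""
        moved = [ch for ch, p in self.route.items() if p == port]
        if moved:
            if not self.spares:
                raise RuntimeError("no spare FPGA ports left")
            spare = self.spares.pop(0)
            for ch in moved:
                self.route[ch] = spare
        return moved


table = FlowTable(ports=[10, 11, 12], spare_ports=[20])
moved = table.fail(11)  # the FPGA on port 11 has failed
```

With a fixed optical interconnect this remapping would need recabling; with a programmable data plane it is a table update.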

  13. P4 in-network processor: P4 = Programming Protocol-Independent Packet Processors. Some call it a switch; we call it an in-network processor. The P4 match-action tables are key: they can look into the packet itself to decode custom protocols such as SPEAD. Here the beam index directs a packet to a particular output port.
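
The match-action idea on slide 13 can be emulated in a few lines of Python. This is not P4 code and does not use the real SPEAD layout: the two-field header, `MatchActionTable` and the port numbers are hypothetical stand-ins that only show a parsed field being matched to an egress port.

```python
import struct

# Hypothetical simplified header: 16-bit beam index, 16-bit payload length.
# This is NOT the real SPEAD packet layout.
HEADER = struct.Struct("!HH")


def parse_beam_index(packet: bytes) -> int:
    """Parser stage: extract the beam index from the packet header."""
    beam_index, _payload_len = HEADER.unpack_from(packet)
    return beam_index


class MatchActionTable:
    """Exact-match table: beam index (match) -> egress port (action data)."""

    def __init__(self):
        self._entries = {}

    def add_entry(self, beam_index: int, egress_port: int) -> None:
        self._entries[beam_index] = egress_port

    def apply(self, packet: bytes):
        """Return the egress port, or None (drop) on a table miss."""
        return self._entries.get(parse_beam_index(packet))


table = MatchActionTable()
table.add_entry(beam_index=3, egress_port=12)
packet = HEADER.pack(3, 4) + b"data"
egress = table.apply(packet)
```

In a real switch the table entries are installed by the control plane and the lookup happens in hardware at line rate; the Python version only mirrors the logic.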

  14. 4th reason - COTS servers
     ● Integrated redundant cooling and power
     ● 20 PCIe slots per 4U
     ● Local CPU & BMC
     A COTS server houses the FPGAs and uses standard software. Tango executes adjacent to the FPGA, and very fast memory transfers are possible across the PCIe bus.

  15. Standard Alveo FPGA. [Diagram: Alveo FPGA with two 4GB HBM stacks; a shell providing PCIe and 100GbE interfaces; the Low.CBF kernel attached over AXI; an OpenCL API for accessing Alveo kernels; monitoring and control] Alveo applications are developed using accelerator concepts. Xilinx uses a standardised OpenCL software stack to talk to the kernel, which enables applications to be developed quickly.

  16. 5th reason - Compact Solution. [Diagram: four Low.CBF racks, each with a dual 3-phase PDU, together holding I/O #1 to #9, Intermediate #1 to #11, Server #1 to #20, M&C Switch #1 and #2, M&C Server #1 and #2, and a spare server] Atomic COTS uses a relatively small number of components. There are fewer cables, there is hot redundancy, it scales easily, and spares are readily available.
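
A rough capacity check ties together numbers that appear on separate slides: 384 FPGAs (slide 7), 20 PCIe slots per 4U server (slide 14) and 20 servers in the rack layout (slide 16). The sketch assumes those servers are the same 20-slot model and that each card takes one slot unless stated otherwise; neither assumption is confirmed by the slides.

```python
import math

FPGAS_NEEDED = 384      # one per coarse channel (slide 7)
SLOTS_PER_SERVER = 20   # PCIe slots per 4U server (slide 14)


def servers_needed(fpgas: int, slots_per_server: int, slots_per_card: int = 1) -> int:
    """Servers required if every card occupies `slots_per_card` PCIe slots."""
    cards_per_server = slots_per_server // slots_per_card
    return math.ceil(fpgas / cards_per_server)


single_slot = servers_needed(FPGAS_NEEDED, SLOTS_PER_SERVER)
dual_slot = servers_needed(FPGAS_NEEDED, SLOTS_PER_SERVER, slots_per_card=2)
```

Under the single-slot assumption the result matches the 20 servers drawn in the rack diagram; cards that take two slots would roughly double the server count.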

  17. Summary and Conclusions
     ● The Atomic COTS evolution has begun!
       ○ No hardware to develop: Xilinx has done it already (and it's very low cost)
       ○ Use standard PCIe servers, power supplies and air cooling
       ○ Standard OpenCL software stack enables developers to focus on the astronomy, not the communications
       ○ P4 in-network processor provides line-rate performance with a minimal amount of coding
       ○ High-speed 100GbE data direct into the FPGA, with configuration, monitoring and control over PCIe
       ○ HBM memory enables greater freedom in the firmware design
       ○ Code can be easily migrated between Alveo boards
     ● Many potential projects are currently being investigated … what was difficult before could now be low cost, scalable, compact and quick to develop
       ○ Prototyping is very promising … no show stoppers identified so far!
