
A new correlator for LOFAR. Chris Broekema, on behalf of the COBALT team


  1. Netherlands Institute for Radio Astronomy. COBALT: A new correlator for LOFAR. Chris Broekema (ASTRON / RUG / DELL NL), on behalf of the COBALT team

  2. ASTRON: The Netherlands Institute for Radio Astronomy

  3. ASTRON Mission Statement To make discoveries in radio astronomy happen, via the development of novel and innovative technologies, the operation of world-class radio astronomy facilities, and the pursuit of fundamental astronomical research.

  4. Introduction radio astronomy (slight Dutch bias)
     • First observations in 1932 by Karl Jansky
     • First purpose-built telescope in 1937 by Grote Reber
     • 21 cm emission line of neutral hydrogen: predicted in 1944 by van de Hulst; detected in 1951 by Ewen and Purcell (Harvard); published after confirmation by Muller and Oort
     • Opening of the Dwingeloo radio telescope in 1956
     • Doppler effect (redshift) of fast-moving objects shows the structure of the local galaxy (1950s)

  5. Introduction radio astronomy

  6. LOFAR: A distributed radio telescope

  7. The LOFAR “Superterp”

  8. Phased Arrays

  9. IBM Blue Gene/P To be retired early 2014

  10. Hardware design – Tasks
      1. Receive LOFAR antenna field data
         • 10 GbE; ~3 Gbps/station
      2. Transpose data (ref. MPI_Alltoallv())
      3. Compute (correlate, beamform, filter, flag, etc.)
         • Single-precision floating point
         • Complex multiply-add
      4. Forward results to storage
         • Storage cluster >100 m, SM fibre
         • 10GbE or QDR InfiniBand
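Tasks 2 and 3 on the slide above can be illustrated with a small pure-Python sketch. It is not the COBALT implementation: `transpose` only mimics, in a single process, the data redistribution that an all-to-all exchange such as `MPI_Alltoallv()` performs across nodes, and `correlate` is a naive complex multiply-add loop over station pairs. The function names, the data layout (`per_station[station][subband]` holding a list of time samples), and the choice of conjugating the second station are all assumptions made for illustration.

```python
def transpose(per_station):
    """Regroup data from per_station[s][b] to per_subband[b][s].

    Single-process stand-in for an all-to-all exchange: afterwards each
    subband holds the data of every station, which is what a correlator
    needs before it can combine stations.
    """
    n_subbands = len(per_station[0])
    return [[station[b] for station in per_station] for b in range(n_subbands)]


def correlate(samples_by_station):
    """Correlate all station pairs (i <= j) for one subband.

    samples_by_station[s] is a list of complex samples over time.
    Returns {(i, j): sum over t of x_i[t] * conj(x_j[t])}.
    The conjugation convention is an assumption of this sketch.
    """
    n_stations = len(samples_by_station)
    visibilities = {}
    for i in range(n_stations):
        for j in range(i, n_stations):
            acc = 0j
            for a, b in zip(samples_by_station[i], samples_by_station[j]):
                acc += a * b.conjugate()  # one complex multiply-add per sample
            visibilities[(i, j)] = acc
    return visibilities


# Toy input: 2 stations, 2 subbands, 2 time samples each (made-up values).
per_station = [
    [[1 + 1j, 2 + 0j], [0 + 1j, 1 + 0j]],  # station 0: subband 0, subband 1
    [[1 - 1j, 0 + 2j], [1 + 0j, 0 - 1j]],  # station 1: subband 0, subband 1
]

per_subband = transpose(per_station)
for subband, data in enumerate(per_subband):
    print(subband, correlate(data))
```

The inner loop is the "complex multiply-add" named on the slide; on a GPU the same arithmetic would run in single precision over many samples and station pairs in parallel.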

  11. NVIDIA Tesla K10

  12. COBALT preliminary design (Feb 2013): strawman node
      • Dual Xeon E5
      • 2x NVIDIA K10
      • 4x 10GbE
      • 2x FDR IB
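As a rough back-of-the-envelope check on the strawman node's input side, the sketch below combines the "~3 Gbps/station" figure from the hardware tasks slide with the four 10GbE ports listed here. It assumes raw line rate with no protocol overhead and that a station stream is never split across ports; both are simplifying assumptions, and the constant and function names are hypothetical.

```python
import math

STATION_RATE_GBPS = 3.0   # "~3 Gbps/station" from the slides (approximate)
PORT_RATE_GBPS = 10.0     # one 10GbE port, raw line rate (overhead ignored)
PORTS_PER_NODE = 4        # "4x 10GbE" strawman node


def streams_per_port(port_rate, station_rate):
    """Whole station streams that fit on one port without splitting a stream."""
    return math.floor(port_rate / station_rate)


per_port = streams_per_port(PORT_RATE_GBPS, STATION_RATE_GBPS)
per_node = per_port * PORTS_PER_NODE
used_gbps = per_node * STATION_RATE_GBPS

print(f"station streams per port: {per_port}")
print(f"station streams per node: {per_node}")
print(f"input rate per node: {used_gbps} of {PORT_RATE_GBPS * PORTS_PER_NODE} Gbps")
```

This is only an upper bound under the stated assumptions; the slides do not say how stations were actually assigned to nodes.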

  13. First prototype Dell PowerEdge R720

  14. First prototype Dell PowerEdge R720 PCIe

  15. Second prototype Dell PowerEdge T620

  16. Second prototype Dell PowerEdge T620

  17. GPU idle temperatures

      | NVIDIA-SMI 5.319.12   Driver Version: 319.12         |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla K10.G2.8GB     Off | 0000:04:00.0     Off |                  N/A |
      | N/A   75C    P0    43W / ERR! |   0%    9MB / 3583MB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      |   1  Tesla K10.G2.8GB     Off | 0000:05:00.0     Off |                  N/A |
      | N/A   76C    P0    42W / ERR! |   0%    9MB / 3583MB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      |   2  Tesla K10.G2.8GB     Off | 0000:45:00.0     Off |                  N/A |
      | N/A   62C    P0    42W / ERR! |   0%    9MB / 3583MB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      |   3  Tesla K10.G2.8GB     Off | 0000:46:00.0     Off |                  N/A |
      | N/A   46C    P0    36W / ERR! |   0%    9MB / 3583MB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

  18. Prototype airflow guides (note: temperatures are under full load)

  19. 3D-printed prototype designed and produced by ASTRON

  20. GPU temperatures with 3D-printed airflow guides

      +------------------------------------------------------+
      | NVIDIA-SMI 5.319.12   Driver Version: 319.12         |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla K10.G2.8GB     Off | 0000:04:00.0     Off |                  N/A |
      | N/A   48C    P0    92W / ERR! |   2%   54MB / 3583MB |     99%      Default |
      +-------------------------------+----------------------+----------------------+
      |   1  Tesla K10.G2.8GB     Off | 0000:05:00.0     Off |                  N/A |
      | N/A   52C    P0    91W / ERR! |   2%   54MB / 3583MB |    100%      Default |
      +-------------------------------+----------------------+----------------------+
      |   2  Tesla K10.G2.8GB     Off | 0000:45:00.0     Off |                  N/A |
      | N/A   51C    P0    92W / ERR! |   2%   54MB / 3583MB |     99%      Default |
      +-------------------------------+----------------------+----------------------+
      |   3  Tesla K10.G2.8GB     Off | 0000:46:00.0     Off |                  N/A |
      | N/A   49C    P0    95W / ERR! |   2%   54MB / 3583MB |     99%      Default |
      +-------------------------------+----------------------+----------------------+

  21. Final COBALT system

  22. Current status
      • 9 COBALT nodes operational (testing phase)
      • “Mass-produced” airflow ducts/guides in place
      • Software development effort on schedule
      • Commissioning proceeding
      COBALT project passed performance review on 30th Aug
      COBALT Operational Readiness Review early December

  23. First fringes with COBALT: November 1st, 2013

  24. Summary, or: problems faced
      • R720 PCIe imbalance
      • 40 GbE ≠ 4x 10GbE
      • R720 doesn't fit 2x dual 10GbE
      • Dual-port ConnectX3 IB PCIe bottleneck
      • Cooling issues T620 + NVIDIA K10
      • Software optimizations → MPI stack
      • Accurately measuring performance/load
      • BUT: we are well on track to build a completely new correlator within 12 months

  25. Questions?
