Netherlands Institute for Radio Astronomy
COBALT: A new correlator for LOFAR
Chris Broekema, on behalf of the COBALT team
ASTRON / RUG / DELL NL
ASTRON The Netherlands Institute for Radio Astronomy
ASTRON Mission Statement To make discoveries in radio astronomy happen, via the development of novel and innovative technologies, the operation of world-class radio astronomy facilities, and the pursuit of fundamental astronomical research.
Introduction radio astronomy (Slight Dutch bias)
• First observations 1932 by Karl Jansky
• First purpose-built telescope 1937 by Grote Reber
• 21 cm emission line of neutral hydrogen
   • Predicted 1944 by van de Hulst
   • Detected in 1951 by Ewen and Purcell (Harvard)
   • Published after confirmation by Muller and Oort
• Opening Dwingeloo radio telescope in 1956
• Doppler effect (redshift) of fast moving objects shows structure of the local galaxy (1950's)
Introduction radio astronomy
LOFAR A distributed radio telescope
The LOFAR “Superterp”
Phased Arrays
IBM Blue Gene/P To be retired early 2014
Hardware design – Tasks
1. Receive LOFAR antenna field data
   • 10 Gb Ethernet; ~3 Gbps/station
2. Transpose data (ref. MPI_Alltoallv())
3. Compute (correlate, beamform, filter, flag, etc.)
   • Single precision floating point
   • Complex multiply-add
4. Forward results to storage
   • Storage cluster >100 m, SM fibre
   • 10GbE or QDR InfiniBand
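Tasks 2 and 3 can be illustrated with a toy sketch. It is not COBALT's MPI or GPU code: the function names, data sizes and sample values are hypothetical, the all-to-all exchange is emulated in memory rather than with MPI_Alltoallv(), and plain Python complex numbers are double precision, whereas the slide specifies single precision. It only shows the shape of the data flow: samples arrive grouped per station, are regrouped per subband, and each station pair is then correlated with a complex multiply-add.

```python
def transpose(per_station):
    """Emulate the all-to-all exchange in memory.

    Input is indexed [station][subband]; output is indexed [subband][station].
    """
    return [list(column) for column in zip(*per_station)]


def correlate(samples_by_station):
    """Correlate every station pair for one subband.

    samples_by_station[s] is a list of complex samples from station s.
    Returns {(a, b): visibility} for a <= b, where each visibility is the
    sum over time of x_a * conj(x_b).
    """
    n = len(samples_by_station)
    vis = {}
    for a in range(n):
        for b in range(a, n):
            acc = 0j
            for xa, xb in zip(samples_by_station[a], samples_by_station[b]):
                acc += xa * xb.conjugate()  # complex multiply-add
            vis[(a, b)] = acc
    return vis


# Hypothetical input: 3 stations, 2 subbands, 2 time samples per subband.
stations = [
    [[1 + 1j, 2 + 0j], [0 + 1j, 1 + 0j]],
    [[1 - 1j, 0 + 2j], [1 + 0j, 0 + 1j]],
    [[2 + 0j, 1 + 1j], [1 + 1j, 1 - 1j]],
]

by_subband = transpose(stations)                       # task 2
visibilities = [correlate(sb) for sb in by_subband]    # task 3
```

For N stations this produces N(N+1)/2 station pairs per subband (autocorrelations included), which is why the compute step dominates as the number of stations grows.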
NVIDIA Tesla K10
COBALT Preliminary design (Feb 2013)
Strawman node
• Dual Xeon E5
• 2x Nvidia K10
• 4x 10GbE
• 2x FDR IB
First prototype Dell PowerEdge R720
First prototype Dell PowerEdge R720 PCIe
Second prototype Dell PowerEdge T620
GPU idle temperatures
| NVIDIA-SMI 5.319.12   Driver Version: 319.12         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K10.G2.8GB    Off  | 0000:04:00.0     Off |                  N/A |
| N/A   75C    P0    43W / ERR! |  0%    9MB /  3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K10.G2.8GB    Off  | 0000:05:00.0     Off |                  N/A |
| N/A   76C    P0    42W / ERR! |  0%    9MB /  3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K10.G2.8GB    Off  | 0000:45:00.0     Off |                  N/A |
| N/A   62C    P0    42W / ERR! |  0%    9MB /  3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K10.G2.8GB    Off  | 0000:46:00.0     Off |                  N/A |
| N/A   46C    P0    36W / ERR! |  0%    9MB /  3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Prototype airflow guides Note: temperatures are under full load
3D-printed prototype designed and produced by ASTRON
GPU temperatures with 3D-printed airflow guides
+------------------------------------------------------+
| NVIDIA-SMI 5.319.12   Driver Version: 319.12         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K10.G2.8GB    Off  | 0000:04:00.0     Off |                  N/A |
| N/A   48C    P0    92W / ERR! |  2%   54MB /  3583MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K10.G2.8GB    Off  | 0000:05:00.0     Off |                  N/A |
| N/A   52C    P0    91W / ERR! |  2%   54MB /  3583MB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K10.G2.8GB    Off  | 0000:45:00.0     Off |                  N/A |
| N/A   51C    P0    92W / ERR! |  2%   54MB /  3583MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K10.G2.8GB    Off  | 0000:46:00.0     Off |                  N/A |
| N/A   49C    P0    95W / ERR! |  2%   54MB /  3583MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
Final COBALT system
Current status
• 9 COBALT nodes operational (testing phase)
• “Mass-produced” airflow ducts/guides in place
• Software development effort on schedule
• Commissioning proceeding
• COBALT project passed performance review on 30th Aug
• COBALT Operational Readiness Review early December
First Fringes with COBALT November 1st 2013
Summary
Or: problems faced
• R720 PCIe imbalance
• 40 GbE ≠ 4x 10GbE
• R720 doesn't fit 2x dual 10GbE
• Dual port ConnectX3 IB PCIe bottleneck
• Cooling issues T620 + Nvidia K10
• Software optimizations → MPI stack
• Accurately measuring performance/load
BUT: we are well on track to build a completely new correlator within 12 months
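The dual-port ConnectX3 bottleneck listed above can be made concrete with a back-of-the-envelope calculation. The figures below are assumptions, not numbers from the slides: the card is taken to sit in a PCIe 3.0 x8 slot (8 GT/s per lane, 128b/130b encoding), and each FDR InfiniBand port is taken at its nominal 56 Gb/s. Protocol overheads are ignored, so real throughput would be lower on both sides.

```python
# Assumed figures (not stated on the slides).
PCIE3_GT_PER_LANE = 8.0        # PCIe 3.0 raw rate per lane, GT/s
PCIE3_ENCODING = 128 / 130     # 128b/130b line encoding efficiency
PCIE_LANES = 8                 # assumed x8 host interface
FDR_PORT_GBPS = 56.0           # nominal FDR 4x rate per port, Gb/s
IB_PORTS = 2                   # dual-port card

# Usable host-side bandwidth per direction, before protocol overhead.
pcie_gbps = PCIE3_GT_PER_LANE * PCIE3_ENCODING * PCIE_LANES

# Aggregate network-side bandwidth if both ports run at full rate.
ib_gbps = FDR_PORT_GBPS * IB_PORTS

print(f"PCIe 3.0 x8 : {pcie_gbps:.1f} Gb/s")
print(f"2x FDR IB   : {ib_gbps:.1f} Gb/s")
print(f"PCIe covers {pcie_gbps / ib_gbps:.0%} of the dual-port line rate")
```

Under these assumptions the host interface can feed only a little over half of what two FDR ports could carry, so a second port on the same card adds far less than one port's worth of bandwidth.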
Questions?