The CBM First-level Event Selector, Timeslice Building and Availability Studies
Helvi Hartmann, hhartmann@fias.uni-frankfurt.de
DPG Frühjahrstagung, Darmstadt, 17.03.2016, HK 54
Prof. Dr. Volker Lindenstruth
FIAS Frankfurt Institute for Advanced Studies, Goethe-Universität Frankfurt am Main, Germany
http://compeng.uni-frankfurt.de
Introduction
• CBM detectors are untriggered
Challenge
• free-streaming data with an expected data rate of ~1 TB/s
• online event reconstruction using timeslices
[Figure: FLES timeslice building: microslices (MS) are grouped into timeslice components with an overlap region; combined input + compute nodes exchange them via InfiniBand to assemble complete timeslices]
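The grouping of microslices into overlapping timeslices can be illustrated with a small index calculation. This is a minimal sketch, assuming each timeslice holds a fixed number of core microslices plus a fixed number of overlap microslices borrowed from the start of the next timeslice; the parameter names `core` and `overlap` and the example values are hypothetical, not the FLES configuration.

```python
def timeslice_microslices(ts_index: int, core: int, overlap: int) -> range:
    """Return the microslice indices contained in timeslice `ts_index`.

    Hypothetical model: timeslice i owns `core` consecutive microslices
    starting at i * core and additionally carries the first `overlap`
    microslices of timeslice i + 1, so that events crossing the boundary
    can still be reconstructed completely.
    """
    if core <= 0 or overlap < 0:
        raise ValueError("core must be positive and overlap non-negative")
    first = ts_index * core
    return range(first, first + core + overlap)


def timeslices_containing(ms_index: int, core: int, overlap: int) -> list[int]:
    """Return every timeslice whose core or overlap region includes `ms_index`."""
    owner = ms_index // core
    result = [owner]
    # Walk back over earlier timeslices whose overlap reaches this microslice.
    prev = owner - 1
    while prev >= 0 and ms_index < prev * core + core + overlap:
        result.insert(0, prev)
        prev -= 1
    return result


if __name__ == "__main__":
    # Hypothetical parameters: 100 core microslices + 2 overlap microslices.
    ts = timeslice_microslices(1, core=100, overlap=2)
    print(ts.start, ts.stop - 1)
    print(timeslices_containing(200, core=100, overlap=2))
```

Microslices in an overlap region are shipped to two compute nodes, which is the price paid for reconstructing events that straddle a timeslice boundary.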
[Figure: FLES data flow: input interface → timeslice building → reconstruction & analysis; microslices (MS) form timeslice components with overlap, exchanged between combined input + compute nodes via InfiniBand]
Availability
• MTBF - mean time between failures
• MTTR - mean time to repair
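Availability in the usual steady-state sense relates these two quantities as shown below (a standard definition, assumed here because the formula is not stated in the talk). A target availability A_0 then translates into a minimum MTBF:

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}, \qquad
A \ge A_0 \;\Longleftrightarrow\; \mathrm{MTBF}\,(1 - A_0) \ge A_0\,\mathrm{MTTR}
\;\Longleftrightarrow\; \mathrm{MTBF} \ge \frac{A_0}{1 - A_0}\,\mathrm{MTTR}
```

For A_0 = 99.9 % the factor A_0 / (1 - A_0) equals 999, so the mean time between failures has to exceed the mean time to repair by roughly three orders of magnitude.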
Availability
[Plot: availability (0.90 to 1) vs. MTBF (10 s to 4 months) for MTTR of 1 s, 10 s, 2 min, 16 min and 3 h; the 50% and 99.9% levels are marked]
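A short sketch, assuming the steady-state relation A = MTBF / (MTBF + MTTR), computes the smallest MTBF that still reaches 99.9% for the MTTR values of the plot; the helper names are illustrative.

```python
# Assumed steady-state availability model: A = MTBF / (MTBF + MTTR).
TARGET = 0.999  # desired availability of 99.9 %


def availability(mtbf_s: float, mttr_s: float) -> float:
    """Fraction of time the system is operational."""
    return mtbf_s / (mtbf_s + mttr_s)


def required_mtbf(mttr_s: float, target: float = TARGET) -> float:
    """Smallest MTBF (seconds) that still reaches `target` for a given MTTR."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return mttr_s * target / (1.0 - target)


def human(seconds: float) -> str:
    """Format a duration in the largest convenient unit."""
    for unit, size in (("d", 86400.0), ("h", 3600.0), ("min", 60.0)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} s"


if __name__ == "__main__":
    # MTTR values of the curves in the availability plot, in seconds.
    for label, mttr in (("1 s", 1), ("10 s", 10), ("2 min", 120),
                        ("16 min", 960), ("3 h", 10800)):
        print(f"MTTR {label:>6}: MTBF >= {human(required_mtbf(mttr))}")
```

Under this model a 2-minute repair time tolerates a failure about every 33 hours, while a 3-hour repair time pushes the requirement to roughly four months.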
Availability
• estimated MTBF for a crucial node failure: two weeks, extrapolated from real-world ALICE-HLT node-failure data
[Plot: availability (0.90 to 1) vs. MTBF (1 day to 4 months) for MTTR of 2 min, 5 min, 16 min, 30 min and 3 h; the 5 min curve is labeled "Restart Framework"]
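Plugging the two-week estimate into the same assumed steady-state model shows which repair times keep the system at 99.9%. The constant names are illustrative; the MTTR values are those of the plotted curves.

```python
# Assumed steady-state model: A = MTBF / (MTBF + MTTR), all times in minutes.
WEEK_MIN = 7 * 24 * 60      # minutes per week
MTBF_MIN = 2 * WEEK_MIN     # two-week MTBF estimate from ALICE-HLT data
TARGET = 0.999


def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the system is operational."""
    return mtbf / (mtbf + mttr)


def max_mttr(mtbf: float, target: float = TARGET) -> float:
    """Largest MTTR (same unit as `mtbf`) that still meets `target`."""
    return mtbf * (1.0 - target) / target


if __name__ == "__main__":
    for mttr in (2, 5, 16, 30, 180):  # plotted MTTR curves, in minutes
        a = availability(MTBF_MIN, mttr)
        verdict = "meets" if a >= TARGET else "misses"
        print(f"MTTR {mttr:>3} min: A = {a:.5f} ({verdict} 99.9 %)")
    print(f"largest MTTR for 99.9 %: {max_mttr(MTBF_MIN):.1f} min")
```

With a two-week MTBF, 99.9% holds as long as a crucial failure is repaired within about 20 minutes, so the 5-minute "Restart Framework" case suffices, whereas 30-minute or 3-hour repairs do not.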
Availability
[Plot: availability (0.90 to 1) vs. MTBF (16 min to 1 week, with 2 days marked) for MTTR of 1 s, 10 s, 2 min, 5 min and 16 min, with a "Restart Framework" label]
Use native InfiniBand Verbs implementation
Case: corrupted input data → report error
[Figure: corrupted input data passes from the input interface through timeslice building, producing a corrupted timeslice ("?") on a compute node; nodes connected via InfiniBand]
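A minimal sketch of the "report error" policy for corrupted input, assuming (hypothetically) that each microslice carries a CRC-32 of its payload. The `Microslice` type, its fields and the use of `zlib.crc32` are illustrative assumptions, not the FLES data format. A corrupted component is reported and the timeslice is flagged instead of the builder aborting.

```python
import logging
import zlib
from dataclasses import dataclass

log = logging.getLogger("timeslice_builder")


@dataclass
class Microslice:
    # Hypothetical layout: index, payload and a CRC-32 computed at the source.
    index: int
    payload: bytes
    crc32: int


def build_timeslice(ts_index: int, microslices: list[Microslice]) -> dict:
    """Assemble a timeslice, reporting (not crashing on) corrupted input."""
    corrupted = []
    for ms in microslices:
        if zlib.crc32(ms.payload) != ms.crc32:
            # Report the error and continue; downstream code can decide
            # whether to skip or partially process this timeslice.
            log.error("timeslice %d: microslice %d failed CRC check",
                      ts_index, ms.index)
            corrupted.append(ms.index)
    return {
        "index": ts_index,
        "data": b"".join(ms.payload for ms in microslices),
        "corrupted": corrupted,
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    good = Microslice(0, b"hits", zlib.crc32(b"hits"))
    bad = Microslice(1, b"hitz", zlib.crc32(b"hits"))  # damaged payload
    print(build_timeslice(7, [good, bad])["corrupted"])
```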
Use native InfiniBand Verbs implementation
Case: process failure → report error
[Figure: data flow input interface → timeslice building → reconstruction between combined input + compute nodes via InfiniBand]
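One way a failed process can be isolated rather than taking the whole system down is sketched below. It assumes, hypothetically, that timeslices are assigned to compute nodes round-robin; the talk does not state the assignment policy, and the class and node names are illustrative. A failed node is reported and dropped from the rotation while the remaining nodes keep receiving timeslices.

```python
import logging

log = logging.getLogger("timeslice_scheduler")


class TimesliceScheduler:
    """Hypothetical round-robin assignment of timeslices to compute nodes."""

    def __init__(self, nodes: list[str]) -> None:
        if not nodes:
            raise ValueError("need at least one compute node")
        self.nodes = list(nodes)
        self.failed: set[str] = set()

    def mark_failed(self, node: str) -> None:
        """Report a failed node once and exclude it from future assignments."""
        if node not in self.failed:
            log.error("compute node %s failed; excluding it", node)
            self.failed.add(node)

    def target(self, ts_index: int) -> str:
        """Pick the compute node for timeslice `ts_index` among live nodes."""
        alive = [n for n in self.nodes if n not in self.failed]
        if not alive:
            raise RuntimeError("no compute node available")
        return alive[ts_index % len(alive)]


if __name__ == "__main__":
    sched = TimesliceScheduler(["cn0", "cn1", "cn2"])
    print([sched.target(i) for i in range(3)])
    sched.mark_failed("cn1")
    print([sched.target(i) for i in range(4)])
```

This sketch only redirects future timeslices; timeslices already in flight to the failed node would still have to be resent or dropped.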
Can we use MPI as a high-level API instead of the low-level native InfiniBand Verbs implementation?
[Figure: data flow input interface → timeslice building → reconstruction; microslices exchanged between combined input + compute nodes via InfiniBand]
MPI Fault Tolerance
In MPI, when one process crashes, all other processes within the same communicator crash!
[Figure: parent processes in the MPI_COMM_WORLD intracommunicator spawn child processes connected via parent-to-child intercommunicators; label i_k denotes the process with rank i of generation k]
→ not possible to create independent communicators
Control System
• start/stop processes on each node
• detect errors
[Figure: data flow input interface → timeslice building → reconstruction between combined input + compute nodes via InfiniBand]
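The per-node duties listed above (start/stop processes, detect errors) can be sketched with Python's `subprocess` module. This is a hypothetical stand-in, not the control software of the talk: errors are detected only through a non-zero exit status, and the restart limit is an illustrative parameter.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 3, poll_s: float = 0.1) -> int:
    """Start `cmd`, report abnormal exits and restart it.

    Returns the number of restarts performed.
    """
    restarts = 0
    proc = subprocess.Popen(cmd)
    while True:
        rc = proc.poll()
        if rc is None:              # still running
            time.sleep(poll_s)
            continue
        if rc == 0:                 # clean exit: nothing to recover
            return restarts
        print(f"process exited with code {rc}", file=sys.stderr)  # report error
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        proc = subprocess.Popen(cmd)


def stop(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask a process to terminate, killing it if it does not exit in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # A worker that always fails: reported and restarted until the limit.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print("restarts:", supervise(failing, max_restarts=2))
```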
Conclusion and Outlook
Availability
• desired availability of 99.9%
• higher failure rates during commissioning
• failures no more often than every 2 days
• MPI is not fault tolerant
• use native InfiniBand Verbs implementation for timeslice building
• add control software to orchestrate processes and allow recovery from errors