Monitoring of the DAQ2 system

Remi Mommsen, FNAL



  1. Monitoring of the DAQ2 system. Remi Mommsen, FNAL (DAQ2 Shift Tutorial, cDAQ group)

  2. Monitoring tools
     1. RCMS/LVL0 interface: has been covered by Hannes.
     2. aDAQMon: overview screen to see at a glance the CMS running configuration and rates.
     3. DAQView: most comprehensive monitoring tool for issues with data flow. Here you can monitor the data from FEDs to BUs.
     4. Elastic Search / Filter Farm monitoring (File Merging & Transfers): shows the progress of file merging and transfers to T0. Important monitor of the file-based filter farm (FFF).
     5. CPM controller: Central Partition Manager for the TCDS system. Good place to see rates, state of detector inputs, etc.
     6. HotSpot: central display for sentinel messages for errors from all processes.

  3. aDAQmon: DAQ Summary (http://cmsonline.cern.ch/daqStatusSCX/DAQstatusGre.html). The status bar gives a quick overview of the DAQ.

  4. Labels from the annotated screenshot:
     - FED-RU data stream
     - DAQ Sub-Sys configuration
     - Main systems (LHC, DCS, ...) status
     - Box color: FED RU configuration
     - Sub-Sys ID
     - FED IN
     - RU bandwidth plot
     - FED OUT
     - BU bandwidth plot
     - RU/BU box color: CPU 0 to 100%
     - # Ev. in BU
     - BU RAM disk %
     - RU/BU box RED frame: flash data not updated
     - BU OUT disk %
     - Event storage summary

  5. DAQView

  6. DAQView (http://cmsdaqweb.cms/local/daqview/cdaq/DAQ.html). Page regions:
     - Status & navigation
     - FFF Appliances: BU & FU
     - FED Builder: FEROL/FMM
     - Event Builder: RU/EVM
     - Age of monitor data

  7. DAQView: Navigation
     - Duration and start time of run (or last restart of DAQView)
     - Current run
     - Stop refreshing page
     - Switch pages between FEDbuilder, FFF, and all
     - Last update of page must be current! If it is stale, you need to restart DAQView
     - You only need cDAQ
     - Start DAQView if it is not running

  8. DAQView: FED builder
     - Confused? Try the table help button!
     - FED builder name
     - %warning, %busy in TTS partition
     - TTC partition name & no.
     - Current TTS state of partition
     - FEROL PC (link to hyperdaq page)
     - min/max # fragments received by FEROL. Highlighted in yellow if different to trigger. Min is only displayed if not equal to max.
     - FED information (see next page)

  9. DAQView: FEROL and FMM
     Entries are of the form
     - FRL_geoslot: FEDSourceID, or
     - FRL_geoslot: FEDSourceID1, FEDSourceID2, or
     - FEDSourceID for a pseudo-FED (= TTS link only, but no data is read out by DAQ)
     Additional info may be displayed next to the FEDSourceID (from left to right):
     - Percentage of time during which the FED was in Warning or Busy during the last 3 seconds, if non-zero (example: B:0.2% W:9.9%)
     - Current state of TTS if other than Ready (example: W)
     - FEDSourceID (expected) (example: 601)
       - Grey if FRL input not enabled (FMM not enabled in case of pseudo-FED)
       - Highlighted in color of current TTS state if other than Ready
     - Percentage of time with DAQ backpressure during last update interval (5s), if non-zero (example: <6.9%). Use this to judge whether a FED is creating dead-time because of a FED problem or because of DAQ backpressure.
     - Warnings
       - Received source ID different to expected
       - FED or SLINK CRC errors (example: #FCRC=69)
     - Number of fragments received by FRL if no data is flowing and this FRL is lagging “behind” (example: 9605)
     uTCA FEDs (TCDS and HF lumi) do not have an FMM
     - Busy/warning are not visible in DAQView! Check the CPM controller.
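
As a rough illustration of the three entry forms listed on this slide, the Python sketch below splits an entry into its FRL geoslot and FED source IDs. This is not DAQView code: the names `FedEntry` and `parse_fed_entry` are hypothetical, geoslots and source IDs are assumed to be plain integers, and the additional info that may follow the ID (percentages, TTS state, warnings) is not handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FedEntry:
    """Hypothetical container for one FEROL/FMM entry."""
    frl_geoslot: Optional[int]       # None for a pseudo-FED (TTS link only)
    fed_source_ids: Tuple[int, ...]  # one or two FED source IDs

    @property
    def is_pseudo_fed(self) -> bool:
        return self.frl_geoslot is None


def parse_fed_entry(text: str) -> FedEntry:
    """Parse 'geoslot: id', 'geoslot: id1, id2', or a bare 'id' (pseudo-FED)."""
    head, sep, tail = text.partition(":")
    if not sep:
        # No colon: only a FEDSourceID is given, i.e. a pseudo-FED.
        return FedEntry(None, (int(head.strip()),))
    ids = tuple(int(part.strip()) for part in tail.split(","))
    if len(ids) > 2:
        raise ValueError(f"expected one or two FED source IDs, got {len(ids)}")
    return FedEntry(int(head.strip()), ids)
```

For example, `parse_fed_entry("3: 601, 602")` yields geoslot 3 with two source IDs, while `parse_fed_entry("59")` is treated as a pseudo-FED. The geoslot value 3 is made up for the example.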

  10. DAQView: RU/EVM Information
     - First row is TCDS / EVM
     - Each row is one FEDbuilder
     - Shaded values mean FEDbuilder is not in readout
     - EVM/RU host (link to hyperdaq page)
     - Throughput (MB/s)
     - Rate (kHz)
     - Super-fragment size (kB)
     - # requests by BU (normal: EVM >> 1 && RUs < 10)
     - # incomplete fragments (>> 1 indicates a problem on the RU)
     - # events currently in RU (>> 1 indicates problem in IB)
     - # fragments built by RU/EVM since start of run

  11. DAQView: FFF/BU
     - Confused? Try the table help button!
     - Each line is one Appliance
     - BU host (link to hyperdaq page)
     - Rate per BU (kHz)
     - Throughput (MB/s)
     - Event size (kB)
     - Events built since start of run
     - # events being built
     - Resource information (see next page)
     - Current LS number
     - # LS on FUs
     - # LS for which there is a file
     - # files written

  12. DAQView: BU Resources
     BU resources are used for requesting events
     - Each resource corresponds to multiple events
     - Fewer resources mean fewer event requests to the EVM
       - Load balancing between independent appliances
       - Backpressure mechanism if FFF/HLT cannot keep up
     Each BU has a number of resources (#resources)
     - Resources can be blocked (#blocked) when
       - the RAM disk becomes full
       - not enough FU CPU cores are available to process data
       - FU processing lags behind
     - Resources for which no event data has been received are counted under #requests
       - If #requests > 0, the BU is able to accept new events
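
The three counters on this slide can be read together as in the minimal Python sketch below. It is only a reading aid, not the BU implementation: the class `BuResources` and its method names are hypothetical, and the values in the usage note are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuResources:
    """Hypothetical snapshot of the resource counters shown for one BU."""
    resources: int  # "#resources": total resources of this BU
    blocked: int    # "#blocked": resources currently blocked
    requests: int   # "#requests": resources with no event data received yet

    def can_accept_events(self) -> bool:
        # Rule from the slide: #requests > 0 means the BU can accept new events.
        return self.requests > 0

    def fully_blocked(self) -> bool:
        # All resources blocked: this BU requests no events at all.
        return self.resources > 0 and self.blocked >= self.resources

    def blocked_fraction(self) -> float:
        # Share of resources that are blocked (0.0 if the BU has none).
        return self.blocked / self.resources if self.resources else 0.0
```

With `BuResources(resources=32, blocked=25, requests=2)` the BU is throttled but can still accept events; with `BuResources(resources=32, blocked=32, requests=0)` it cannot.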

  13. DAQView: Running, or not?
     - LVL0: DAQ is running
     - No, rate is 0 kHz
     - None of the HF FEDs has sent any events
     - No fragments in RU
     - Many events requested
     - No data flow as HF has not sent any data: talk to HF expert

  14. DAQView: Who Blocks the Run?
     - Rate is 0 kHz
     - ECAL is 100% in Warning
     - FED 602 is in warning and last event is 9605
     - There’s backpressure from DAQ
     - RU waits for data from FED 59
     - FED 59 has not sent any data
     - FED 59 is the culprit: talk to Tracker expert

  15. DAQView: DAQ backpressure
     - The rate is 10 kHz
     - ECAL is 50% in Warning
     - There’s backpressure from DAQ
     - Very few events requested by BUs
     - All BUs are “blocked” or “throttled”
       - RAM disk is full: all resources blocked
       - RAM disk is nearly full: 25/32 resources blocked
       - Only a few FU cores available: 26/32 resources are blocked
       - No FU cores available: all resources blocked
     - FFF is blocked: try to figure out what is wrong (and call DAQ oncall)
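
The reasoning in the DAQView troubleshooting slides can be condensed into a small decision function. The Python sketch below is a simplification under stated assumptions, not a shift procedure: `DaqSnapshot`, its fields, and `diagnose` are hypothetical names, and the shifter is assumed to have already read the relevant numbers off DAQView.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DaqSnapshot:
    """Hypothetical summary of what a shifter reads off DAQView."""
    rate_khz: float
    feds_without_data: List[int] = field(default_factory=list)
    daq_backpressure: bool = False
    blocked_resources: int = 0
    total_resources: int = 0


def diagnose(s: DaqSnapshot) -> str:
    # A FED that never sent data stalls event building. Check this first,
    # because other FEDs may then show backpressure as a mere symptom.
    if s.rate_khz == 0 and s.feds_without_data:
        feds = ", ".join(str(f) for f in s.feds_without_data)
        return f"no data from FED(s) {feds}: talk to the subsystem expert"
    # Backpressure together with blocked BU resources points at the FFF.
    if s.daq_backpressure and s.blocked_resources > 0:
        return (f"FFF blocked or throttled ({s.blocked_resources}/"
                f"{s.total_resources} resources blocked): "
                "investigate and call DAQ oncall")
    if s.rate_khz > 0:
        return "data is flowing"
    return "no data flow, cause unclear: keep investigating"
```

The ordering of the checks mirrors the "Who Blocks the Run?" case: FED 602 shows backpressure, but the culprit is FED 59, which has not sent any data.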

  16. F3 Monitor

  17. Storage & Transfer System
     Aggregate files (event data, DQM histograms & metadata) as they appear
     - Micro-merger on each FU aggregates the data from all processes on the FU
     - Mini-merger on the BU aggregates the data from all FUs
     - Mega-merger(s) aggregate the data from all BUs
     Data and meta-data are aggregated per luminosity section
     - Each luminosity section and stream is treated independently
     If the previous step has completed successfully, input data can be deleted
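
The three merging levels can be pictured as the same aggregation applied repeatedly, keyed by luminosity section and stream. The Python sketch below illustrates only that idea, using event counts instead of real files. The function `merge`, the stream names "A" and "DQM", and all numbers are made up for the example; the actual mergers operate on files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[int, str]    # (luminosity section, stream name)
Counts = Dict[Key, int]  # number of events per (LS, stream)


def merge(inputs: Iterable[Counts]) -> Counts:
    """Sum event counts per (LS, stream); each key is handled independently."""
    merged: Dict[Key, int] = defaultdict(int)
    for counts in inputs:
        for key, n_events in counts.items():
            merged[key] += n_events
    return dict(merged)


# Micro-merge: outputs of all processes on one FU.
fu1 = merge([{(1, "A"): 10}, {(1, "A"): 5, (2, "A"): 7}])
fu2 = merge([{(1, "A"): 3, (1, "DQM"): 1}])
# Mini-merge: outputs of all FUs attached to one BU.
bu1 = merge([fu1, fu2])
bu2 = merge([{(1, "A"): 2}])
# Mega-merge: outputs of all BUs.
total = merge([bu1, bu2])
```

Because every (LS, stream) pair is summed on its own, a slow stream or a late luminosity section does not hold up the others.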

  18. F3 Monitor (http://cmsdaq0/daqfff/ecd/)
     - Confused? Try the guide!
     - Both boxes must be green
     - Active run
     - Stream rates vs LS
     - Stream names (click to hide them)
     - Completeness of data: alert DAQ oncall when multiple boxes are not green (this situation is okay)
     - Time chart of HLT activity
     - List of recent runs
     - Access old runs
     Nice demo available at http://cmsdaq0/daqfff/ecd/doc/presentation/

  19. Storage Manager Page (http://cmsonline.cern.ch/portal/page/portal/CMS%20online%20system/Storage%20Manager)
     - Gives an overview of the data transfer to tier 0 for recent runs
       - Number of files, sizes and event rates per stream
       - Totals per run
     - Check that files are injected, transferred and checked (in future also repacked & deleted)
       - Suspicious values are color coded
     - Make an elog entry and send an email to cms-storagemanager-alerting@cern.ch in case of error
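
The check described for the Storage Manager page amounts to comparing file counts across the transfer stages. The Python sketch below shows one possible way to express that comparison. It does not read the real page: the function `incomplete_streams`, the dictionary layout, and the stream names are assumptions for illustration.

```python
from typing import Dict, List

# Stages named on the slide, in the order files pass through them.
STAGES = ("injected", "transferred", "checked")


def incomplete_streams(files_per_stream: Dict[str, Dict[str, int]]) -> List[str]:
    """Return streams where a later stage has handled fewer files than were injected."""
    flagged = []
    for stream, counts in sorted(files_per_stream.items()):
        injected = counts.get("injected", 0)
        if any(counts.get(stage, 0) < injected for stage in STAGES[1:]):
            flagged.append(stream)
    return flagged


run_summary = {
    "A": {"injected": 10, "transferred": 10, "checked": 10},
    "DQM": {"injected": 4, "transferred": 3, "checked": 3},
}
print(incomplete_streams(run_summary))
```

During an ongoing run the later stages naturally lag behind injection, so a flagged stream is only a hint to look at the page again, not proof of an error.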

  20. Central Partition Manager

  21. TCDS
     Combines the pre-LS1:
     - Trigger Control System (TCS): the conductor of all CMS triggering and data-taking
     - Trigger Timing and Control (TTC): the distributor of clock, L1As, and synchronisation signals
     - Trigger Throttling System (TTS): the feedback of readiness states from FEDs to TCS
     Many-legged creature:
     - The ‘head’ is the Central Partition Manager (controlled by central DAQ)
     - Many different legs (i.e., partitions) across the different subsystems (controlled by the subsystems)

  22. TCDSCentral (tcds-control-central.cms:2000/urn:xdaq-application:lid=100)

  23. TCDSCentral (tcds-control-central.cms:2000/urn:xdaq-application:lid=100)
     TTC machine interface applications: provide the connection between the LHC RF and timing signals and CMS.

  24. TCDSCentral (tcds-control-central.cms:2000/urn:xdaq-application:lid=100)
     Central Partition Manager (CPM): drives CMS. Controls triggers, calibration sequence, timing and synchronisation, … This application should tell you what and how many triggers are flowing, or why not.

  25. CPMController (tcds-control-central.cms:2050/urn:xdaq-application:lid=100)
     Hardware status tab: the running state shows if triggers are flowing or why not:
     - Stopped
     - Running
     - Blocked by TTS
     - Blocked by DAQ backpressure
     - etc.

  26. CPMController (tcds-control-central.cms:2050/urn:xdaq-application:lid=100)
     TTS and trigger blockers tab: shows what can/will block triggers. Running state:
     - Stopped
     - Running
     - Blocked by TTS
     - Blocked by DAQ backpressure
     - etc.
