Overload of Frontier launchpad by MC Overlay
Elizabeth Gallas (Oxford)
ADC Weekly Meeting, April 15, 2014
Overview
- Caveat … these are bits and pieces which I am aware of and which seem relevant to the discussion
  - Limited time to collect metrics
  - This is an open discussion! Corrections and additions welcome!
- Problem: MC Overlay jobs cause Frontier overload on the grid
- Aspects of the issue:
  - Conditions aspects of MC Overlay jobs
  - Conditions folders of interest
  - Conditions DB & COOL
  - Conditions deployment on the grid (Frontier & DB Releases)
  - MC Overlay task deployment on the grid
  - Reconstruction software
- How these aspects, in combination, result in overload
MC Overlay jobs
- Overlay real "zero bias" events on simulated events
- An exception to the norm w.r.t. conditions: access data from multiple conditions instances:
  - COMP200 (Run 1 real data conditions)
  - OFLP200 (MC conditions)
  - This is not thought to contribute to the problem
- What seems exceptional and notable is the conditions data volume needed by each job to reconstruct these events:
  - It is much greater (10-200x) than the conditions volume of typical reco (estimates vary …)
  - It is greater than the event data volume of the job
    - Event volume: a few hundred events? x 1.5 MB
    - The metadata is larger than the data itself … (rough arithmetic sketched below)
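To make "the metadata is larger than the data itself" concrete, a back-of-the-envelope calculation in Python using the figures quoted above; the 50 MB baseline for ordinary reco conditions is a hypothetical placeholder, not a measured number:

```python
# Rough volume comparison using the slide's figures.
# The typical-reco conditions baseline (50 MB) is an assumed placeholder.

events_per_job = 300                 # "a few hundred events"
event_size_mb = 1.5                  # per-event size from the slide
event_volume_mb = events_per_job * event_size_mb          # 450 MB

typical_conditions_mb = 50           # HYPOTHETICAL baseline for ordinary reco
factor_low, factor_high = 10, 200    # "10-200x" from the slide

cond_low_mb = typical_conditions_mb * factor_low          # 500 MB
cond_high_mb = typical_conditions_mb * factor_high        # 10,000 MB

print(f"event data:      {event_volume_mb:.0f} MB")
print(f"conditions data: {cond_low_mb:.0f}-{cond_high_mb:.0f} MB")
# Even at the low end, the conditions ("metadata") volume already
# rivals or exceeds the event data volume of the job.
```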
Conditions deployment on the grid
- Two modes of access (direct Oracle access demoted): DB Release files or Frontier
- MC Overlay can't use just the default DB Release
  - It doesn't contain real data conditions
  - So it is using Frontier
- Alastair: an unusual aspect of the overload is that it is actually bringing down Frontier servers: reboot required!
  - Try to understand the cause of the comatose state (more later on this)
- Could we use a DB Release (for some/all conditions)? From Misha:
  - Yes, sure, it's possible to make a DB Release for these data
  - DB access could also be mixed in any way (Frontier + DB Release; sketched below)
  - Finding the folder list is the main role of the DBRelease-on-Demand system; it's release- and jobOption-specific
  - The problem is how to distribute it: before, HOTDISK was used; now it should be CVMFS, which requires a new approach
  - DB Release size … can't be known without studying
- Alastair: CVMFS likes small files, not large ones …
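To illustrate the mixed-access option Misha describes, a minimal sketch in plain Python (not the actual Athena/IOVDbSvc configuration; the folder names, CVMFS path, and connection strings are hypothetical examples):

```python
# Illustrative routing only: serve selected folders from a local DB Release
# file (distributed via CVMFS) and everything else through Frontier.
# All names and paths below are hypothetical, not the real job configuration.

DB_RELEASE_FOLDERS = {
    "/TDAQ/OLC/BUNCHLUMIS",   # large, LOB-heavy folder: a natural candidate
    "/TRIGGER/LUMI/LBLB",
}

def connection_for(folder: str) -> str:
    if folder in DB_RELEASE_FOLDERS:
        # local read-only file: puts no load on the Frontier servers
        return "sqlite:///cvmfs/atlas.cern.ch/dbrelease/COMP200.db"
    # everything else keeps using the Frontier/squid chain as today
    return "frontier://ATLAS_COOLONL_TDAQ/COMP200"

for f in ("/TDAQ/OLC/BUNCHLUMIS", "/LAR/ElecCalibOfl/Pedestals"):
    print(f, "->", connection_for(f))
```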
Conditions DB and Athena IOVDbSvc
- IOVDbSvc gets conditions in a time window wider than the actual request
  - So each conditions retrieval probably contains a bit more data than the job strictly needs
  - This mechanism is generally very effective in reducing subsequent queries in related time windows
- Unsure if this mechanism is helping here
  - It depends on whether subsequent zero bias events in the same job fall within the Run/LB range of the retrieved conditions (toy model below)
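A toy model of the window-caching behavior described above (plain Python; the widening policy and interfaces are invented for illustration, not the actual IOVDbSvc internals):

```python
# Toy model: fetch a window wider than each request and serve later
# requests from the cache while they stay inside that window.
# 'fetch' and the fixed padding are illustrative assumptions.

class IovCache:
    def __init__(self, fetch, pad):
        self.fetch = fetch       # fetch(since, until) -> conditions payload
        self.pad = pad           # how much wider than the request to read
        self.window = None       # (since, until) currently cached
        self.data = None

    def get(self, t):
        if self.window and self.window[0] <= t < self.window[1]:
            return self.data                       # cache hit: no DB query
        since, until = t - self.pad, t + self.pad  # widened request
        self.data = self.fetch(since, until)       # one DB round trip
        self.window = (since, until)
        return self.data

queries = []
cache = IovCache(fetch=lambda s, u: queries.append((s, u)) or "conditions",
                 pad=100)

# Ordinary reco: events close together in time -> mostly cache hits.
for t in (1000, 1010, 1050):
    cache.get(t)
print(len(queries), "query for 3 nearby events")          # 1

# Zero bias overlay: each event from a different run/LB -> every call misses.
for t in (2000, 50000, 912345):
    cache.get(t)
print(len(queries), "queries after 3 scattered events")   # 4
```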
MC Overlay task deployment
- Assumptions about how these tasks are deployed:
  - Related jobs (same MC process id) are deployed to specific sites (or clouds), and each requires a unique set of zero bias events over all the jobs
- Each of the "related" jobs:
  - Is in a cloud using the same squids and/or Frontier
  - Accesses the conditions needed for the zero bias events being overlaid
  - The conditions being accessed are always distinct
    - This completely undermines any benefit of Frontier caching (queries are always unique)
- Multiply this by the hundreds/thousands of jobs in the task, each retrieving distinct conditions: obvious stress on the system (toy numbers below)
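A toy calculation of what "queries are always unique" does to a shared squid/Frontier cache; all numbers are invented for illustration:

```python
# Toy model: N jobs behind one squid. With shared conditions most queries
# are absorbed by the cache; with the overlay pattern every query is
# distinct and reaches the Frontier/Oracle backend. Numbers are made up.

jobs = 1000
queries_per_job = 50
total_queries = jobs * queries_per_job

# Ordinary reco: jobs in a task largely cover the same run range,
# so the distinct queries are roughly those of a single job.
backend_shared = queries_per_job                     # ~50 reach Oracle
hit_rate_shared = 1 - backend_shared / total_queries

# Overlay: every job needs conditions for different zero bias run/LBs.
backend_unique = total_queries                       # all 50,000 reach Oracle
hit_rate_unique = 1 - backend_unique / total_queries

print(f"shared conditions: {hit_rate_shared:.1%} cache hit rate")   # 99.9%
print(f"unique conditions: {hit_rate_unique:.1%} cache hit rate")   # 0.0%
```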
Query evaluation: folder of interest
- Identified via Frontier logs: ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
  - IOV: 1351385600000000000-1351390400000000000
- Evaluate this specific query:
  - Run/LB range: run 213486, LB 612-700 (part of run 213486)
  - Folder: COOLONL_TDAQ/COMP200 /TDAQ/OLC/BUNCHLUMIS
  - IOV basis: TIME (not Run/LB)
  - Channel count: 4069 channels retrieved (generally fewer … depends on IOV)
  - Payload: bunch-wise luminosity!
    - RunLB (UInt63)
    - AverageRawInstLum (Float)
    - BunchRawInstLum (Blob64k) -> LOB !! Large Object !!
    - Valid (UInt32)
- The query retrieves 2583 rows, each including LOBs
  - Number of rows >> number of LBs (~80)
  - This is the nature of the folder being used (schematic reconstruction below)
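For orientation, the flagged retrieval can be reproduced roughly as follows with the PyCool bindings; this is a sketch from memory of the COOL Python API (method names may differ slightly by COOL version), with error handling omitted:

```python
# Sketch: browse the time-based IOV window from the Frontier log against
# the /TDAQ/OLC/BUNCHLUMIS folder. Simplified; not production code.
from PyCool import cool

since = 1351385600000000000   # window start from the Frontier log (ns)
until = 1351390400000000000   # window end

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase("COOLONL_TDAQ/COMP200", True)   # read-only
folder = db.getFolder("/TDAQ/OLC/BUNCHLUMIS")

rows = 0
objs = folder.browseObjects(since, until, cool.ChannelSelection.all())
while objs.goToNext():
    payload = objs.currentRef().payload()
    blob = payload["BunchRawInstLum"]   # the Blob64k LOB column
    rows += 1
objs.close()
db.closeDatabase()

# The logged query returned 2583 such rows for only ~80 luminosity blocks:
# one row per (channel, IOV) intersecting the window, across 4069 channels.
print(rows, "rows retrieved")
```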
… more about LOBs …
- The folder accessed has a LOB payload (Large Object)
- Back to COOL (and via Frontier): LOB access from COOL is not the same as access to other payload column types
  - There is some more back-and-forth communication between client (Frontier) and Oracle
  - Rows are retrieved individually (sketch below)
- Always a question: can LOB access be improved?
- Also: is there something about Frontier and LOBs that might cause the Frontier failure?
  - It doesn't happen with single jobs
  - It only seems to occur when loaded above a certain level
  - No individual query in these jobs results in data throughput beyond the system capacity
  - It is somehow triggered by load
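To illustrate the per-row round trips, a sketch using the cx_Oracle client directly (this is not Frontier's actual code path, and the connection string is hypothetical; the table and columns are those identified on the previous slide):

```python
# By default the Oracle client returns a LOB *locator* per row, and every
# .read() on it is an extra client/server round trip. Generic Oracle-client
# behavior shown with cx_Oracle; Frontier does its own (similar) handling.
import cx_Oracle

conn = cx_Oracle.connect("reader/secret@atlas-cool")   # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    SELECT CHANNEL_ID, "BunchRawInstLum"
      FROM ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
     WHERE IOV_SINCE < :u AND IOV_UNTIL > :s""",
    s=1351385600000000000, u=1351390400000000000)

for channel_id, lob in cur:   # row batches stream back efficiently, but ...
    data = lob.read()         # ... each LOB read is its own round trip
# 2583 rows => thousands of extra exchanges for one logical query.

# A common client-side mitigation: fetch small LOBs inline as raw bytes
# (the handler must be set BEFORE executing the query to take effect).
def lobs_as_bytes(cursor, name, default_type, size, precision, scale):
    if default_type == cx_Oracle.BLOB:
        return cursor.var(cx_Oracle.LONG_BINARY, arraysize=cursor.arraysize)
conn.outputtypehandler = lobs_as_bytes
```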
No system has infinite capacity
- General ATLAS Database domain goal: develop and deploy systems which can deliver any data in databases needed by jobs, even large volumes when needed
- In reality: capacity, bandwidth, etc. are not infinite
  - So consider ways to moderate requests while still satisfying the use cases
- In this case, bunch-wise luminosity is being retrieved
  - More channels are being retrieved than are being used (sketch below)
    - Inefficiency in the COOL callback mechanism
    - Improvement to the folders already planned for Run 2
- Thanks to Eric, Mika (lumi experts), Andy (MC experts) for critical feedback
  - I asked in email off-thread … answers on the next slide:
  - Is bunch-wise luminosity really needed?
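If the job only uses a subset of the 4069 channels, a COOL channel selection could trim the retrieval at the source; a short sketch in the same PyCool style as the earlier one (API from memory, channel range hypothetical):

```python
# Sketch: restrict the browse to the channels actually used instead of
# retrieving all 4069. The 0-15 range is a hypothetical example.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase("COOLONL_TDAQ/COMP200", True)
folder = db.getFolder("/TDAQ/OLC/BUNCHLUMIS")

since, until = 1351385600000000000, 1351390400000000000
wanted = cool.ChannelSelection(0, 15)        # only channels 0-15
objs = folder.browseObjects(since, until, wanted)
```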
Is bunch-wise lumi needed?
- Andy: … not doing anything special for overlay for lumi info … running standard reco … must be the default for standard reco of data as well … What's unusual for overlay is that each event can be from a different LB, whereas for data the events are mostly from the same LB within a job.
- Eric: Yes of course, and this could trigger the IOVDbSvc to constantly reload this information for every event from COOL. … per-BCID luminosity … used by LAr as a part of standard reco since (early) 2012 … used to predict the LAr noise as a function of position in the bunch train from out-of-time pileup. I don't know exactly what happens in the overlay job, but presumably it also accesses this information to find the right mix of events.
Attempt at a summary
- Aspects of conditions implementation, usage and deployment all seem to conspire … no one smoking gun
- DB caching mechanisms: completely undermined by this pattern of access
- Software: using default reconstruction for luminosity
  - Bunch-wise corrections are needed for real data reco (LAr)
    - NO problems with this in deployment: it should not change!
  - But is the default overkill for zero bias overlay?
    - Would the BCID-average luminosity suffice (use a different folder)?
    - That would eliminate the need for LOB access in this use case
- Conditions DB/COOL and Frontier
  - COOL side: no obvious culprit … BLOB sizes vary
  - Frontier: evaluate the cause of failure with LOBs under high load
- Task deployment: DB Release option? Any other ideas?
- Please be patient
  - We must find the best overall long-term solution for this case without undermining software which is critical for other use cases
  - Use this case for studying the bottlenecks