Overload of Frontier launchpad by MC Overlay
Elizabeth Gallas (Oxford)
ADC Weekly Meeting, April 15, 2014
Overview
 Caveat … these are bits and pieces
  which I am aware of and
  which seem relevant to the discussion
 Limited time to collect metrics
 This is an open discussion! Corrections, additions: welcome!
 Problem: MC Overlay jobs cause Frontier overload on the grid
 Aspects of the issue:
  Conditions aspects of MC Overlay jobs
  Conditions folders of interest
  Conditions DB & COOL
  Conditions deployment on the grid (Frontier & DB Releases)
  MC Overlay task deployment on the grid
  Reconstruction software
  how these aspects, in combination, result in overload
April 2014 E.Gallas
MC Overlay jobs
 Overlay real “zero bias” events on simulated events
 An exception to the norm wrt conditions:
  Access data from multiple conditions instances:
   COMP200 (Run 1 real data conditions)
   OFLP200 (MC conditions)
  This is not thought to contribute to the problem
 What seems exceptional and notable is the conditions data volume needed by each job to reconstruct these events:
  It is much greater (10-200x, estimates vary …) than the conditions volume of typical reco
  It is greater than the event data volume of each job
   Event volume: a few hundred events? x 1.5 MB
   the metadata is larger than the data itself …
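The volume comparison above can be made concrete with the slide's own rough figures. This is a back-of-envelope sketch only: the event count, the per-event size, and the 10-200x multiplier come from the slide, while the baseline conditions volume for standard reco is a hypothetical placeholder.

```python
# Rough volume comparison for one MC Overlay job, using the slide's estimates.
# All figures are order-of-magnitude assumptions, not measurements.

events_per_job = 300             # "a few hundred events"
event_size_mb = 1.5              # per-event size in MB
event_volume_mb = events_per_job * event_size_mb   # event data per job

typical_reco_conditions_mb = 50  # hypothetical baseline for standard reco
multiplier_low, multiplier_high = 10, 200          # slide's 10-200x range

conditions_low = typical_reco_conditions_mb * multiplier_low
conditions_high = typical_reco_conditions_mb * multiplier_high

print(f"event data per job:      {event_volume_mb:.0f} MB")
print(f"conditions data per job: {conditions_low}-{conditions_high} MB")
# Even at the low end, the conditions ("metadata") volume rivals the
# event data itself; at the high end it dominates by more than 20x.
```

Under these assumptions the conditions payload per job is comparable to or far larger than the event payload, which is the slide's point.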
Conditions deployment on the grid
 2 modes of access (direct Oracle access demoted): DB Release files or Frontier
 MC Overlay can’t use just the default DB Release:
  It doesn’t contain real data conditions
  So it is using Frontier
 Alastair: an unusual aspect to the overload is that it is actually bringing down Frontier servers: reboot required!
  try to understand the cause of the comatose state (more later on this)
 Could we use a DB Release (for some/all conditions)? From Misha:
  Yes, sure, it’s possible to make a DBRelease for these data
  DB access could also be mixed in any way (Frontier + DB Release)
  Finding the folder list: main role of the DBRelease-on-Demand system; it’s release and jobOption specific
  The problem is how to distribute it:
   Before, HOTDISK was used
   Now it should be CVMFS: requires a new approach
  DB Release size … can’t be known without studying
 Alastair: CVMFS likes small files, not large ones …
Conditions DB and Athena IOVDbSvc
 IOVDbSvc gets conditions in a time window wider than the actual request:
  So each conditions retrieval probably contains a bit more data than the job might need
  This mechanism is generally very effective in reducing subsequent queries in related time windows
 Unsure if this mechanism is helping here:
  It depends on whether subsequent zero bias events in the same job fall in the Run/LB range of the retrieved conditions
MC Overlay task deployment
 Assumptions about how these tasks are deployed:
  Related jobs (same MC process id) are deployed to specific sites (or clouds), and each requires a unique set of zero bias events over all the jobs
 Each of the “related” jobs:
  Is in a cloud using the same Squids and/or Frontier
  Accesses the conditions needed for the zero bias events being overlaid
  The conditions being accessed are always distinct
 this completely undermines any benefit of Frontier caching (queries are always unique)
 Multiply this by the hundreds/thousands of jobs in the task, each retrieving distinct conditions: obvious stress on the system
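The caching pathology can be demonstrated with a toy query cache. This is a sketch under stated assumptions: real Frontier/Squid caching is keyed on the encoded HTTP query, and the query keys below are invented, but the hit-rate contrast is the point of the slide.

```python
# Toy demonstration: a shared query cache is useless when every job
# asks for a distinct IOV range. Illustration only, not Frontier code.

def hit_rate(queries):
    cache, hits = set(), 0
    for q in queries:
        if q in cache:
            hits += 1
        else:
            cache.add(q)
    return hits / len(queries)

n_jobs = 1000

# Typical reco: many jobs over the same run repeat the same conditions
# queries (here: only 10 distinct IOV keys across 1000 jobs).
shared = [("BUNCHLUMIS", job % 10) for job in range(n_jobs)]

# MC Overlay: each job overlays unique zero bias events, so each
# conditions query covers a distinct IOV -> every query is a miss.
unique = [("BUNCHLUMIS", job) for job in range(n_jobs)]

print(f"shared conditions hit rate: {hit_rate(shared):.1%}")   # 99.0%
print(f"overlay-style hit rate:     {hit_rate(unique):.1%}")   # 0.0%
```

With a 0% hit rate, every one of the task's jobs drives its full conditions load through to the Frontier servers and Oracle, which is the stress described above.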
Query evaluation
 Folder of interest, identified via Frontier logs:
  ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
  IOV: 1351385600000000000-1351390400000000000
 Evaluate this specific query:
  Run/LB range: run 213486, LB 612-700 (part of run 213486)
  Folder: COOLONL_TDAQ/COMP200 /TDAQ/OLC/BUNCHLUMIS
  IOV basis: TIME (not Run/LB)
  Channel count: 4069 (channels retrieved are generally fewer … depends on IOV)
  Payload: bunch-wise luminosity!
   RunLB (UInt63)
   AverageRawInstLum (Float)
   BunchRawInstLum (Blob64k) -> LOB!! Large Object!!
   Valid (UInt32)
 The query retrieves 2583 rows, each including LOBs:
  number of rows >> number of LBs (~80)
  This is the nature of the folder being used
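The rows-vs-LBs mismatch follows from the folder layout: in a multi-channel, TIME-based COOL folder, each (channel, IOV) pair is a separate row, so the row count scales with the number of populated channels, not with the number of lumi blocks. The arithmetic below uses the two totals quoted in the slide; the per-channel interpretation is my inference, not a measurement.

```python
# Why 2583 rows for only ~80 lumi blocks: each populated channel
# contributes its own row per IOV in a multi-channel TIME-based folder.
# Totals are from the slide; the channel reading is an inference.

rows_retrieved = 2583     # from the Frontier log (slide)
lumi_blocks = 80          # approximate LB span of the request (slide)

rows_per_lb = rows_retrieved / lumi_blocks
print(f"rows per lumi block: ~{rows_per_lb:.0f}")   # ~32

# Read the other way: if IOVs change roughly once per LB, about 32
# channels actually carry data in this range -- far fewer than the
# folder's 4069 defined channels, but every one of those 2583 rows
# still drags a BunchRawInstLum LOB along with it.
```

So even a "small" 80-LB request multiplies into thousands of LOB-bearing rows, purely because of how the folder is structured.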
… more about LOBs …
 The folder accessed has a LOB payload (Large Object)
 Back to COOL (and via Frontier): LOB access from COOL is not the same as access to other payload column types:
  There is some more back/forth communication between client (Frontier) and Oracle
  Rows are retrieved individually
 Always a question: can LOB access be improved?
 Also: is there something about Frontier and LOBs that might cause the Frontier failure?
  It doesn’t happen with single jobs
  Only seems to occur when loaded above a certain level
  no individual query in these jobs results in data throughput beyond the system capacity
  it is somehow triggered by load
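The per-row round-trip cost described above can be sketched with a toy latency model. This is not the real OCI/Frontier LOB protocol: the batch size, round-trip latency, and per-row LOB round-trip count are all invented to show the shape of the amplification, using the slide's 2583-row count.

```python
# Toy cost model for LOB retrieval: ordinary columns come back in bulk
# array fetches, while each LOB typically needs extra client<->server
# round trips. All latencies and batch sizes are invented.

def fetch_cost(rows, rtt_ms, batch=100, lob_roundtrips_per_row=0):
    bulk = -(-rows // batch)                  # ceil: bulk array fetches
    lob = rows * lob_roundtrips_per_row       # per-row LOB round trips
    return (bulk + lob) * rtt_ms

rows, rtt_ms = 2583, 1.0                      # row count from the slide

plain = fetch_cost(rows, rtt_ms)                               # bulk only
with_lobs = fetch_cost(rows, rtt_ms, lob_roundtrips_per_row=2)

print(f"plain columns: ~{plain:.0f} ms of round-trip latency")  # ~26 ms
print(f"with LOBs:     ~{with_lobs:.0f} ms")                    # ~5192 ms
# The per-row LOB traffic dominates by orders of magnitude; held open
# across hundreds of concurrent jobs, connections and server threads
# stay busy far longer than the raw data volume alone would suggest.
```

This is consistent with the observation that no single query exceeds system capacity: the suspect is the load-dependent multiplication of slow per-row exchanges, not throughput.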
No system has infinite capacity
 General ATLAS Database Domain goal:
  Develop and deploy systems which can deliver any data in databases needed by jobs, even large volumes when needed
 In reality: capacity, bandwidth, etc. are not infinite
  So consider ways to moderate requests but still satisfy use cases
 In this case, bunch-wise luminosity is being retrieved:
  More channels are being retrieved than are being used
  Inefficiency in the COOL callback mechanism
  Improvements to these folders are already planned for Run 2
 Thanks to Eric, Mika (lumi experts), Andy (MC experts) for critical feedback
  I asked in an email off-thread … answers on the next slide:
  Is bunch-wise luminosity really needed?
Is bunch-wise lumi needed?
 Andy:
  … not doing anything special for overlay for lumi info … running standard reco … must be the default for standard reco of data as well … What’s unusual for overlay is that each event can be from a different LB, whereas for data the events are mostly from the same LB within a job.
 Eric:
  Yes of course, and this could trigger the IOVDbSvc to constantly reload this information for every event from COOL.
  … per-BCID luminosity … used by LAr as a part of standard reco since (early) 2012 … used to predict the LAr noise as a function of position in the bunch train from out-of-time pileup. I don’t know exactly what happens in the overlay job, but presumably it also accesses this information to find the right mix of events.
Attempt at a summary
Aspects of conditions implementation, usage, and deployment all seem to conspire … no one smoking gun
 DB caching mechanisms: completely undermined by this pattern of access
 Software: using default reconstruction for luminosity
  Bunch-wise corrections needed for real data reco (LAr)
   NO problems with this in deployment: should not change!
  But is the default overkill for zero bias overlay?
   Would the BCID-average luminosity suffice (use a different folder)?
   That would eliminate the need for LOB access in this use case
 Conditions DB/COOL and Frontier:
  COOL side: no obvious culprit … BLOB sizes vary
  Frontier: evaluate cause of failure with LOBs under high load
 Task deployment: DB Release option? Any other ideas?
 Please be patient:
  Must find the best overall long-term solution for this case, without undermining software which is critical for other use cases
  Use this case for studying the bottlenecks