Overload of Frontier lpad by MC Overlay



  1. Overload of Frontier lpad by MC Overlay
     Elizabeth Gallas (Oxford)
     ADC Weekly Meeting, April 15, 2014

  2. Overview
     - Caveat: these are bits & pieces which I am aware of and which seem relevant to the discussion; limited time to collect metrics
     - This is an open discussion! Corrections and additions are welcome!
     - Problem: MC Overlay jobs cause Frontier overload on the grid
     - Aspects of the issue:
       - Conditions aspects of MC Overlay jobs
       - Conditions folders of interest
       - Conditions DB & COOL
       - Conditions deployment on the grid (Frontier & DB Releases)
       - MC Overlay task deployment on the grid
       - Reconstruction software
     - How these aspects, in combination, result in overload

  3. MC Overlay jobs
     - Overlay real "zero bias" events on simulated events
     - An exception to the norm w.r.t. conditions: they access data from multiple conditions instances:
       - COMP200 (Run 1 real-data conditions)
       - OFLP200 (MC conditions)
       - This is not thought to contribute to the problem
     - What seems exceptional and notable is the conditions data volume needed by each job to reconstruct these events:
       - It is much greater (10-200x) than the conditions volume of typical reco (estimates vary)
       - It is greater than the event data volume of each job (event volume: a few hundred events? x 1.5 MB); see the back-of-envelope check below
       - The metadata is larger than the data itself...
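
As a rough sanity check of the volumes quoted above, the sketch below compares the event data carried by one overlay job with the conditions volume implied by the 10-200x multiplier. The per-job event count and the "typical reco" conditions volume are assumed placeholder values, not measurements from the slide.

    # Back-of-envelope comparison of event data vs conditions data per
    # MC Overlay job, using the figures quoted on this slide. The event
    # count and the typical-reco conditions volume are assumptions.

    EVENTS_PER_JOB = 300          # "a few hundred events" (assumed value)
    EVENT_SIZE_MB = 1.5           # quoted average zero-bias event size
    TYPICAL_RECO_COND_MB = 50     # hypothetical conditions volume of a normal reco job

    event_data_mb = EVENTS_PER_JOB * EVENT_SIZE_MB
    for factor in (10, 200):
        overlay_cond_mb = factor * TYPICAL_RECO_COND_MB
        print(f"x{factor:<3}: conditions {overlay_cond_mb:6.0f} MB "
              f"vs event data {event_data_mb:.0f} MB "
              f"(ratio {overlay_cond_mb / event_data_mb:.1f})")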

  4. Conditions deployment on the grid
     - Two modes of access (direct Oracle access demoted): DB Release files or Frontier
     - MC Overlay can't use just the default DB Release:
       - It doesn't contain real-data conditions
       - So it is using Frontier
     - Alastair: an unusual aspect of the overload is that it actually brings down Frontier servers: a reboot is required!
       - Try to understand the cause of the comatose state (more later on this)
     - Could we use a DB Release (for some/all conditions)? From Misha:
       - Yes, sure, it's possible to make a DBRelease for these data
       - DB access could also be mixed in any way (Frontier + DB Release)
       - Finding the folder list is the main role of the DBRelease-on-Demand system; it is release- and jobOption-specific
       - The problem is how to distribute it: before, HOTDISK was used; now it should be CVMFS, which requires a new approach
       - The DB Release size can't be known without studying it
     - Alastair: CVMFS likes small files, not large ones...

  5. Conditions DB and Athena IOVDbSvc
     - IOVDbSvc gets conditions in a time window wider than the actual request
       - So each conditions retrieval probably contains a bit more data than the job strictly needs
       - This mechanism is generally very effective in reducing subsequent queries in related time windows (a toy model follows below)
     - Unsure if this mechanism is helping here
       - It depends on whether subsequent zero bias events in the same job fall within the Run/LB range of the retrieved conditions
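
To make the caching behaviour concrete, here is a minimal toy model of the idea described above: fetch a window wider than the request, then reuse it until a request falls outside it. It is only an illustration of the mechanism, not the real Athena IOVDbSvc interface; the class name, padding and access pattern are invented for the example.

    # Toy model of the IOVDbSvc behaviour: fetch conditions for a window
    # wider than the requested time, then reuse the cached payload for any
    # later request that falls inside that window.

    class ToyIOVCache:
        def __init__(self, fetch_fn, pad=5):
            self.fetch_fn = fetch_fn      # callable: (since, until) -> payload
            self.pad = pad                # how much wider than the request we fetch
            self.window = None            # (since, until) of the cached interval
            self.payload = None
            self.db_queries = 0

        def get(self, t):
            if self.window is None or not (self.window[0] <= t < self.window[1]):
                since, until = t - self.pad, t + self.pad
                self.payload = self.fetch_fn(since, until)
                self.window = (since, until)
                self.db_queries += 1
            return self.payload

    cache = ToyIOVCache(fetch_fn=lambda s, u: f"conditions[{s},{u})")

    # Events close together in time reuse the cached window (few DB queries);
    # zero-bias events scattered across many lumi blocks defeat this.
    for t in (100, 101, 103, 250, 900):
        cache.get(t)
    print("DB queries issued:", cache.db_queries)   # 3 for this access pattern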

  6. MC Overlay task deployment
     - Assumptions about how these tasks are deployed:
       - Related jobs (same MC process id) are deployed to specific sites (or clouds), and each requires a set of zero bias events that is unique across all the jobs
       - Each of the "related" jobs:
         - Is in a cloud using the same Squids and/or Frontier servers
         - Accesses the conditions needed for the zero bias events being overlaid
     - The conditions being accessed are always distinct:
       - This completely undermines any benefit of Frontier caching (queries are always unique)
       - Multiply this by the hundreds/thousands of jobs in the task, each retrieving distinct conditions: obvious stress on the system (see the sketch below)
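
The sketch below is a hypothetical illustration of why this access pattern defeats the squid/Frontier cache: a cache only pays off when different jobs repeat the same query. The job counts and the way IOVs are drawn are made up for the illustration.

    # A cache only helps when different jobs repeat the same query; unique
    # conditions per job give a 0% hit rate. Numbers here are invented.

    import random

    def cache_hit_rate(queries):
        seen, hits = set(), 0
        for q in queries:
            if q in seen:
                hits += 1
            seen.add(q)
        return hits / len(queries)

    n_jobs = 1000

    # Standard reco: many jobs process events from the same few runs, so
    # their conditions queries overlap and most are served from the cache.
    shared = [("COMP200", random.choice(range(10))) for _ in range(n_jobs)]

    # MC Overlay: each job overlays a unique set of zero-bias events, so
    # each conditions query (folder + IOV) is effectively unique.
    unique = [("COMP200", job_id) for job_id in range(n_jobs)]

    print(f"shared IOVs : {cache_hit_rate(shared):.0%} cache hits")
    print(f"unique IOVs : {cache_hit_rate(unique):.0%} cache hits")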

  7. Query evaluation
     - Folder of interest, identified via Frontier logs:
       - ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
       - IOV: 1351385600000000000-1351390400000000000
     - Evaluate this specific query:
       - Run/LB range: run 213486, LB 612-700 (part of run 213486)
       - Folder: COOLONL_TDAQ/COMP200 /TDAQ/OLC/BUNCHLUMIS
       - IOV basis: TIME (not Run/LB)
       - Channel count: 4069 (channels actually retrieved are generally fewer; depends on the IOV)
       - Payload: bunch-wise luminosity!
         - RunLB (UInt63)
         - AverageRawInstLum (Float)
         - BunchRawInstLum (Blob64k) -> LOB!! Large Object!!
         - Valid (UInt32)
     - The query retrieves 2583 rows, each including LOBs
       - Number of rows >> number of LBs (~80)
       - This is the nature of the folder being used (a sketch of how to inspect this folder offline follows below)
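
For reference, the row and channel counts for this folder and IOV could in principle be reproduced offline with PyCool (the Python binding of COOL), roughly as sketched below. The connection string, authentication, and the availability of PyCool at a site are assumptions; this is not the Frontier query itself, only a way to inspect the folder content.

    # Sketch: browse /TDAQ/OLC/BUNCHLUMIS over the IOV seen in the Frontier
    # log (times in ns) and count rows and channels. Connection details are
    # assumed and may differ from what a real site uses.

    from PyCool import cool

    since = 1351385600000000000   # IOV start from the Frontier log (ns)
    until = 1351390400000000000   # IOV end from the Frontier log (ns)

    dbSvc = cool.DatabaseSvcFactory.databaseService()
    db = dbSvc.openDatabase(
        "oracle://ATLAS_COOLPROD;schema=ATLAS_COOLONL_TDAQ;dbname=COMP200",
        True)                     # read-only

    folder = db.getFolder("/TDAQ/OLC/BUNCHLUMIS")
    objs = folder.browseObjects(since, until, cool.ChannelSelection.all())

    rows, channels = 0, set()
    while objs.goToNext():
        obj = objs.currentRef()
        rows += 1
        channels.add(obj.channelId())
        # obj.payload()['BunchRawInstLum'] is the Blob64k (LOB) column

    print(f"{rows} rows over {len(channels)} channels in this IOV")
    db.closeDatabase()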

  8. ...more about LOBs...
     - The folder accessed has a LOB payload (Large Object)
     - Back to COOL (and via Frontier): LOB access from COOL is not the same as access to other payload column types
       - There is more back-and-forth communication between the client (Frontier) and Oracle
       - Rows are retrieved individually
       - Always a question: can LOB access be improved? (a rough estimate of the cost follows below)
     - Also: is there something about Frontier and LOBs that might cause the Frontier failure?
       - It doesn't happen with single jobs; it only seems to occur when loaded above a certain level
       - No individual query in these jobs results in data throughput beyond the system capacity
       - It is somehow triggered by load
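
A very rough way to see why per-row LOB retrieval hurts under load: if each LOB row costs an extra client-server exchange, latency scales with the row count. The round-trip time below is an assumed placeholder; the row count is the one quoted on the previous slide.

    # Back-of-envelope for the extra round trips implied by per-row LOB
    # access. The latency is an assumption; the point is only that per-row
    # retrieval multiplies network latency by the number of rows.

    ROWS = 2583                 # rows returned by the example query
    ROUND_TRIP_MS = 5           # assumed client <-> Oracle round-trip latency

    bulk_fetch_ms = ROUND_TRIP_MS                 # one request, rows streamed back
    per_row_fetch_ms = ROWS * ROUND_TRIP_MS       # one extra exchange per LOB row

    print(f"bulk   : ~{bulk_fetch_ms} ms of latency")
    print(f"per-row: ~{per_row_fetch_ms / 1000:.1f} s of latency")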

  9. No system has infinite capacity
     - General ATLAS Database Domain goal: develop and deploy systems which can deliver any data in databases needed by jobs, even large volumes when needed
     - In reality, capacity, bandwidth, etc. are not infinite
       - So consider ways to moderate requests while still satisfying the use cases
     - In this case, bunch-wise luminosity is being retrieved
       - More channels are being retrieved than are being used
       - Inefficiency in the COOL callback mechanism
       - Improvements to these folders are already planned for Run 2
     - Thanks to Eric, Mika (lumi experts), Andy (MC experts) for critical feedback
       - I asked in email off-thread... answers on the next slide:
       - Is bunch-wise luminosity really needed?

  10. Is bunch-wise lumi needed?
      - Andy: "... not doing anything special for overlay for lumi info ... running standard reco ... must be the default for standard reco of data as well ... What's unusual for overlay is that each event can be from a different LB, whereas for data the events are mostly from the same LB within a job."
      - Eric: "Yes of course, and this could trigger the IOVDbSvc to constantly reload this information for every event from COOL. ... per-BCID luminosity ... used by LAr as a part of standard reco since (early) 2012 ... used to predict the LAr noise as a function of position in the bunch train from out-of-time pileup. I don't know exactly what happens in the overlay job, but presumably it also accesses this information to find the right mix of events."

  11. Attempt at a summary
      - Aspects of conditions implementation, usage and deployment all seem to conspire... no one smoking gun
      - DB caching mechanisms: completely undermined by this pattern of access
      - Software: using default reconstruction for luminosity
        - Bunch-wise corrections are needed for real-data reco (LAr)
        - NO problems with this in deployment: it should not change!
        - But is the default overkill for zero bias overlay? Would the BCID-averaged luminosity suffice (using a different folder)?
          - That would eliminate the need for LOB access in this use case
      - Conditions DB/COOL and Frontier
        - COOL side: no obvious culprit... BLOB sizes vary
        - Frontier: evaluate the cause of failure with LOBs under high load
      - Task deployment: DB Release option? Any other ideas?
      - Please be patient
        - We must find the best overall long-term solution for this case, without undermining software which is critical for other use cases
        - Use this use case for studying the bottlenecks
