Computing Operations
A stab at improving production throughput
Outlook
● Figure out cleaning procedures; people would like to use DDM for managing the dataops space: will require some work, headaches, and testing
● Alert system: to get an indication that something is being held for assignment
● Time monitoring: to see the dynamics of workflows passing through the system
● Figure out a better “# of copies” strategy: size? estimated CPU? priority? … transfers are parallel = no delays added?
● Tune parameters to prevent starvation
● We were almost there, but then we added several T2s to the digi-reco pool and things are going through very fast
● Let it run and have Matteo (co-L3) take care of it every other week
Overview
● McM is the service for organizing, configuring and book-keeping the production requests from all PAG/POG/DPG
  https://cms-pdmv.cern.ch/mcm/
  https://twiki.cern.ch/twiki/bin/view/CMS/PdmVMcM
● Request Manager is the production service for book-keeping the actual production requests
  https://cmsweb.cern.ch/reqmgr/
● Why two?
  ➢ PREP/ReqMgr development went in parallel; ReqMgr aimed at doing the PREP job
  ➢ McM rewrote PREP with more integration with ReqMgr
  ➢ ReqMgr is production oriented, while McM is book-keeping and information oriented
  ➢ Chaining of workflows is not a concept of ReqMgr
  ➢ In a nutshell: one does the preparation/book-keeping, the other does the production
  ➢ More integration is possible (McM under cmsweb, simplifying the interface, ...)
● WMAgent(s) are
  ➢ pulling workloads from Request Manager and pushing production jobs to sites
  ➢ injecting data into DBS & PhEDEx
● This is not enough to do the job
What was missing
● What sites to use for what purpose
● How much to queue to sites
● Where to locate input data when needed
● When is the data placed and ready to be used
● Is the production complete and sane
● Where to place the output for users
● All of this, or most of it, has been done by hand
● Lots of automation was put into “gen-sim” production (including FastSim)
● Not much was done for “digi-reco”
Goals and Strategy
● Reduce manual intervention to the minimum (it always fails in the commissioning phase)
  ➔ Adopt a set of generic rules that should lead to stable operation
● Reduce latency for delivery of samples
  ➔ Automate all steps for requests that do not have any issue
● Reduce re-shuffling of priorities
  ➔ Spread the load using multi-site whitelists systematically
● Increase throughput
  ➔ Use as many sites as possible
Implementation
● Python modules developed from previous scripts
  https://github.com/CMSCompOps/WmAgentScripts
● Unify handling of all request types into a single piece of software
  https://github.com/CMSCompOps/WmAgentScripts/tree/master/Unified
● Documentation from the beginning
  https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsWorkflowL3Responsibilities#Automatic_Assignment_and_Unified
● Monitoring from day one
  https://cmst2.web.cern.ch/cmst2/unified/
● Adopt a set of representative statuses (see next slide)
● Use a simple database back-end: an sqlite file with the Python sqlalchemy library
  http://www.sqlalchemy.org/
● Hourly polling cycles
● Configuration, global or by campaign
● All modules can be run by hand, with options to push the system when necessary
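To make the back-end and polling idea concrete, here is a minimal sketch of an sqlite/sqlalchemy workflow table with an hourly cycle over it. The table layout, status value, and `unified.db` file name are illustrative assumptions, not the actual Unified schema.

```python
# Illustrative sketch only: table layout, status values, and file name are
# assumptions, not the actual Unified schema.
import time
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Workflow(Base):
    __tablename__ = 'workflows'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
    status = Column(String, default='considered')  # one of the representative statuses

engine = create_engine('sqlite:///unified.db')  # simple sqlite file back-end
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def polling_cycle():
    """One pass over the workflows; each module reads and updates statuses."""
    session = Session()
    for wf in session.query(Workflow).filter_by(status='considered'):
        # e.g. decide here whether input data movement is needed
        # before the workflow can be assigned
        pass
    session.close()

if __name__ == '__main__':
    while True:          # hourly polling cycle
        polling_cycle()
        time.sleep(3600)
```

Each module (injector, transferor, ...) can also be invoked by hand against the same sqlite file, which is what allows pushing the system outside the hourly cycle.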
From assignment-approved
[State-machine figure: workflows enter from assignment-approved as "considered"; depending on whether input is needed they pass through staging and staged, are assigned, run ("away"), and end in close / done / clean, with branches for cloned, rejected (forget), aborted (trouble), assistance, completed and closed-out workflows]
● Modules: injector, transferor, stagor, assignor, checkor, closor, cleanor, rejector
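Since the diagram itself did not survive the export, here is a rough Python sketch of how statuses could be dispatched to the module that acts on them; the exact pairings and comments are assumptions reconstructed from the slide, not the real Unified state machine.

```python
# Hypothetical status -> module dispatch; the pairings and transitions are
# assumptions reconstructed from the slide, not the real state machine.
HANDLERS = {
    'considered': 'transferor',   # decide whether input data must be moved
    'staging':    'stagor',       # watch ongoing input transfers
    'staged':     'assignor',     # assign to sites once input is available
    'away':       'checkor',      # workflow running in the agents
    'assistance': 'checkor',      # needs operator attention / recovery
    'close':      'closor',       # announce output, pass to DDM and McM
    'done':       'cleanor',      # remove input data from disk
}

def next_module(status):
    """Return the module expected to act on a workflow in this status."""
    return HANDLERS.get(status)
```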
Rules
● Look at workflows in terms of input needed
  ➢ If none, go with all T2s and T1s
  ➢ If primary input but no secondary, same
  ➢ If secondary (PU), go with T1s and strong T2s
● Distribute the secondary systematically to all sites in the whitelist
● Distribute primary inputs to produce a certain number of copies (3) of the input dataset across all sites in the whitelist (see the sketch after this list)
  ➢ Datasets are chopped into 1 TB chunks, and these chunks are spread
● Once the initiated transfers complete, also use residual blocks of input at other sites to inflate the site whitelist
● Assign to sites, restricted to where the secondary is (“wmagent business interlude”)
● Once completed, check for processing completion, data-injection completion into DBS and PhEDEx, lumi size, lumi duplication, and custodial replication requests
  ➢ If it checks out, the output is passed on to DDM (when applicable) and back to McM
  ➢ If not, initiate custodial, wait for data injection, or let operators create recovery workflows
● Two days after completion, clean the input dataset from disk (except for one copy?)
● Clean the output
  ➢ Completely from disk if one copy is under analysis ops (DDM)
  ➢ Keep a full copy on disk if none is under analysis ops
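The sketch below illustrates the primary-input placement rule: chop the dataset into roughly 1 TB chunks and spread them over the whitelist so the sites collectively hold the requested number of copies. The function name, block representation, and round-robin choice of target sites are assumptions for illustration, not the actual Unified code.

```python
# Illustrative sketch of the primary-input placement rule; names and data
# shapes are assumptions, not the actual Unified implementation.
import itertools

def plan_transfers(blocks, site_whitelist, copies=3, chunk_size_tb=1.0):
    """blocks: list of (block_name, size_tb); returns {site: [block names]}."""
    # group blocks into ~1 TB chunks
    chunks, current, current_size = [], [], 0.0
    for name, size in blocks:
        current.append(name)
        current_size += size
        if current_size >= chunk_size_tb:
            chunks.append(current)
            current, current_size = [], 0.0
    if current:
        chunks.append(current)

    # spread each chunk to `copies` distinct sites, cycling over the whitelist
    plan = {site: [] for site in site_whitelist}
    site_cycle = itertools.cycle(site_whitelist)
    for chunk in chunks:
        targets = {next(site_cycle) for _ in range(copies)}
        for site in targets:
            plan[site].extend(chunk)
    return plan
```

For example, a 6 TB primary dataset with copies=3 over six whitelisted sites ends up as six ~1 TB chunks, each replicated at three sites, so no single site has to host the full input.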