SDC DB Support for Distributed Computing
PanDAMon Integration in CMS
Workshop on Analysis Tools Development, May 16th 2013
Nicolò Magini, CERN IT-SDC-OL
CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

Outline
• Status after the prototype
• Current status of the testbed deployment
• Plans for the integration testbed
• Next steps

Monitoring of PanDA jobs
• Reminder: “Monitoring of jobs in PanDA” is more than “PanDA Monitor”
• ATLAS ops and users take advantage of the Dashboard (populated from the PanDA DB) to complement PanDA Monitor, especially for
  – Task monitoring
  – Historical view
• Here I’m going to look only at the “PanDA Monitor” itself, in particular for job debugging

PanDAMon for the prototype
• Using the ATLAS PanDA Monitor as-is, with minimal updates by V. Fine (ATLAS PanDAMon developer) to make it functional for CMS jobs
• Already used successfully by CMS power users in the proof-of-concept phase

PanDAMon for the prototype
• viewlogfiles: performs the LFN2PFN conversion with the PhEDEx datasvc to find the log file location (instead of looking it up in the central ATLAS catalog)
  – Recently had an issue with logfile retrieval, now fixed by V. Fine
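
As an illustration of the LFN2PFN lookup mentioned above, here is a minimal sketch of how a client could query the PhEDEx datasvc. It is not the code used in PanDAMon. The base URL, the `lfn2pfn` call with `node`, `lfn` and `protocol` parameters, and the `phedex`/`mapping` layout of the JSON reply are assumptions about the data service interface; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the PhEDEx data service (JSON output, production instance).
DATASVC_BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"


def build_lfn2pfn_url(node, lfn, protocol="srmv2"):
    """Build the query URL for an lfn2pfn lookup (hypothetical helper)."""
    query = urlencode({"node": node, "lfn": lfn, "protocol": protocol})
    return f"{DATASVC_BASE}/lfn2pfn?{query}"


def extract_pfns(reply):
    """Map each LFN to its PFN, assuming a {'phedex': {'mapping': [...]}} reply.

    Entries without a PFN (no matching rule at the site) are skipped.
    """
    mappings = reply.get("phedex", {}).get("mapping", [])
    return {m["lfn"]: m["pfn"] for m in mappings if m.get("pfn")}
```

A caller would fetch the URL returned by `build_lfn2pfn_url("T2_CH_CERN", lfn)` with any HTTP client, decode the JSON body, and pass the resulting dictionary to `extract_pfns` to obtain the physical location of the log file.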

Testbed deployment
• vocms09: Panda Mon (Preslav); VM, 2 cores, 8 GB mem, 500 GB disk; prototype (varnish); SLC6; LB
• vocms35: Panda Mon (Preslav); VM, 2 cores, 8 GB mem, 500 GB disk; prototype (varnish); SLC6; LB
• vocms33: Panda Mon (Preslav); 23-JAN-14; 24 cores, 32 GB mem, 2x750 GB disk; prototype - power node; SLC6; LB
• vocms100: Panda Mon (Preslav); 27-JAN-14; 8 cores, 24 GB mem, 3x1TB disk; spare (temporary node, this is the ASO spare); SLC6; LB
• An additional 2-core, 8 GB VM could be useful as a PanDA Mon “development instance” to test deployment and new modules

Testbed status
• Basic Quattor configuration performed by the VOC on all machines, following ATLAS templates
• Now in contact with ATLAS Distributed Computing operators for software deployment and configuration procedures

PanDAMon testbed goals
• During the testbed phase
  – Reproduce the working PanDA Monitor setup from the prototype phase in the CMS instance
  – Identify “ATLAS” assumptions in monitoring, assess usability for CMS
    • Some examples found by developers in the job debugging views are reported in the following slides
    • More will surely be found by CMS ops and users; we will gather feedback
  – Produce new custom PanDAMon modules for CMS integration, for items not covered by the current PanDAMon or Dashboard

Navigation
• A lot of information on the website is aggregated by cloud
• For CMS, more useful to look at sites rather than clouds?
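
To make the cloud versus site question concrete, the sketch below counts the same job records once per cloud and once per site. The record fields (`cloud`, `site`, `status`) and the sample values are hypothetical and do not reflect the actual PanDA DB schema.

```python
from collections import Counter


def count_jobs_by(jobs, key):
    """Count job records grouped by the given field, e.g. 'cloud' or 'site'."""
    return Counter(job[key] for job in jobs)


# Hypothetical job records, for illustration only.
jobs = [
    {"cloud": "CLOUD_A", "site": "SITE_1", "status": "finished"},
    {"cloud": "CLOUD_A", "site": "SITE_2", "status": "failed"},
    {"cloud": "CLOUD_A", "site": "SITE_2", "status": "failed"},
    {"cloud": "CLOUD_B", "site": "SITE_3", "status": "finished"},
]

by_cloud = count_jobs_by(jobs, "cloud")
by_site = count_jobs_by(jobs, "site")
```

In this toy sample the per-cloud view reports three jobs for `CLOUD_A`, while the per-site view shows that two of them ran at `SITE_2`, which is the level of detail a site-oriented navigation would expose.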

Dataset info
• Dataset info linking to DQ2

Dataset info
• Need to update to link to DAS/DBS

Task monitoring
• Linked to ATLAS Task Monitoring
• Integrate with CMS Task Monitoring

Output file links
• Links to log and output file locations are working in the “viewlogfile” page, need to fix in “findfile”
• (do we want to update the output location in the PanDA DB from /store/temp/user to /store/user after ASO?)
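
If the output location were to be updated after ASO, the path change itself is a simple prefix rewrite. The sketch below only illustrates that mapping under the assumption that the two prefixes are exactly /store/temp/user/ and /store/user/; the function name is hypothetical, and how the PanDA DB would be updated is left open.

```python
TEMP_PREFIX = "/store/temp/user/"
FINAL_PREFIX = "/store/user/"


def final_output_lfn(lfn):
    """Return the post-ASO LFN for a temporary output LFN.

    LFNs that are not under the temporary prefix are returned unchanged.
    """
    if lfn.startswith(TEMP_PREFIX):
        return FINAL_PREFIX + lfn[len(TEMP_PREFIX):]
    return lfn
```

For example, `final_output_lfn("/store/temp/user/jdoe/out.root")` gives `/store/user/jdoe/out.root`, where `jdoe` is a made-up user name.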

Error reporting
• ASO failures are reported to the DB and visible in monitoring, but not in “Error details”
• The CMS transformation (job wrapper) exit code is visible in PanDAMon, but not the detailed error message, which includes cmsRun messages
• Update links to support mail…

Next steps
• Next week: deploy PanDAMon as-is on the dev server in the testbed setup
• When the testbed setup is ready, start looking into the reported issues
• Interact with PanDAMon developers to learn how to integrate new modules if needed by CMS
  – First session already done
• Reproduce the deployment on the prod server