
WMSMonitor: a tool to monitor gLite WMS/LB cluster status and job workflow



  1. WMSMonitor: a tool to monitor gLite WMS/LB cluster status and job workflow
     Daniele Cesini, Danilo Dongiovanni, Enrico Fattibene (INFN-CNAF)
     EGEE08, 22-26 Sept. 2008, Istanbul
     Enabling Grids for E-sciencE, www.eu-egee.org
     EGEE and gLite are registered trademarks. EGEE-III INFSO-RI-222667

  2. Motivation of the work
     • The Workload Management System (WMS) and the Logging and Bookkeeping (LB) service have a complex internal structure; knowing their status, and who is using them and how, is challenging
     • A site can run many WMS/LB instances
     • At the same time, WMS/LB services are an interesting source of information about the job lifecycle and resource usage by the VOs
     • The middleware does not currently provide any monitoring facilities
     Importance of having an efficient monitoring system that aggregates information from internal components and from various instances
     WMSMonitor - EGEE08, Istanbul - Turkey

  3. Target Users
     • WMS/LB administrators, to check the cluster status, who is using it and how
     • WMS developers and advanced users, to benchmark the service performance and test its scalability
     • Resource Center managers who need per-VO aggregated statistics on usage and service availability
     • VO managers, to obtain aggregated job statistics, e.g. to cross-check their monitoring systems

  4. Web presentation: cluster overview

  5. WMS/LB instance details view
     • Text boxes report the latest series of acquired data
     • Top charts represent the status history of Condor jobs (left) and of the WMS internal component queues (right)
     • Bottom charts represent the history of job flow rates between components
     • A CMS use case using collections and BulkMM

  6. WMS instance details / Daily Report
     • Daily summary of job flow through the WMS components, including:
       - Resubmission of failed jobs
       - Number of jobs in successful final state
       - Number of jobs in aborted final status
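The daily summary described on this slide can be sketched as a simple aggregation over one day of job events. This is a minimal illustration, not WMSMonitor's actual code: the event names (`submitted`, `resubmitted`, `done`, `aborted`), the `(job_id, event)` record layout, and the sample data are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical one-day event log, in time order: (job_id, event).
SAMPLE_EVENTS = [
    ("j1", "submitted"), ("j1", "done"),
    ("j2", "submitted"), ("j2", "resubmitted"),
    ("j2", "resubmitted"), ("j2", "aborted"),
    ("j3", "submitted"), ("j3", "resubmitted"), ("j3", "done"),
]


def daily_summary(events):
    """Count resubmissions and final states for one day of job events."""
    resubmissions = Counter()  # job_id -> number of resubmissions
    final_state = {}           # job_id -> last terminal event seen
    for job_id, event in events:
        if event == "resubmitted":
            resubmissions[job_id] += 1
        elif event in ("done", "aborted"):
            final_state[job_id] = event
    finals = Counter(final_state.values())
    return {
        "resubmissions": sum(resubmissions.values()),
        "jobs_resubmitted": len(resubmissions),
        "done": finals["done"],
        "aborted": finals["aborted"],
    }


if __name__ == "__main__":
    print(daily_summary(SAMPLE_EVENTS))
```

Keeping only the last terminal event per job means a job that aborts after several resubmissions is counted once, as aborted.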

  7. WMS instance details / Custom Plot

  8. WMS cluster VO stats
     • Statistics on per-WMS usage by a single VO (chart or tabular format). The time interval is configurable
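A per-VO, per-WMS aggregation over a configurable time interval could look roughly like the query below. It is a sketch under stated assumptions: the slides name a MySQL backend but give no schema, so the `job_counts` table, its columns, the host names, and the sample rows are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example runs on its own.

```python
import sqlite3


def demo_db():
    """Build an in-memory database with a hypothetical job_counts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_counts (ts TEXT, wms_host TEXT, vo TEXT, jobs INTEGER)"
    )
    conn.executemany(
        "INSERT INTO job_counts VALUES (?, ?, ?, ?)",
        [
            ("2008-09-22 10:00", "wms01", "cms", 120),
            ("2008-09-22 10:00", "wms02", "cms", 80),
            ("2008-09-23 09:00", "wms01", "cms", 30),
            ("2008-09-23 09:00", "wms01", "atlas", 55),
            ("2008-09-24 08:00", "wms02", "cms", 10),
        ],
    )
    return conn


def vo_usage(conn, vo, start, end):
    """Return [(wms_host, total_jobs)] for one VO with start <= ts < end."""
    cursor = conn.execute(
        "SELECT wms_host, SUM(jobs) FROM job_counts "
        "WHERE vo = ? AND ts >= ? AND ts < ? "
        "GROUP BY wms_host ORDER BY wms_host",
        (vo, start, end),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Timestamps are ISO-formatted strings, so string comparison orders them.
    print(vo_usage(demo_db(), "cms", "2008-09-22", "2008-09-24"))
```

The returned rows suit the tabular view directly and can feed a chart unchanged.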

  9. Working on…
     • User-level statistics
     • Dynamic VO discovery
     • Resource usage statistics:
       - Destination CE
       - Number of matched CEs per job
     • DB redesign
     • Distributed instances monitoring

  10. Architecture/Implementation
      • SNMP-based data transport
      • MySQL backend
      • Sensors and data collector written mostly in Python
      • Web interface developed in PHP
      • Plots based on the Open Flash Chart libraries
      • Periodically sends information to a Nagios server, which acts as a notification system
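The slides do not say how information reaches the Nagios server. One common mechanism is a passive service check submitted as a `PROCESS_SERVICE_CHECK_RESULT` external command, and the sketch below only formats such a line; that WMSMonitor works this way is an assumption. The host name, service name, and thresholds are hypothetical.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def queue_state(entries, warn=500, crit=2000):
    """Map a queue length to a Nagios state (thresholds are illustrative)."""
    if entries >= crit:
        return CRITICAL
    if entries >= warn:
        return WARNING
    return OK


def passive_check_line(host, service, state, message, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}".format(
        timestamp, host, service, state, message
    )


if __name__ == "__main__":
    entries = 750  # hypothetical number of entries in a WMS component queue
    print(passive_check_line(
        "wms01.example.org", "wms_queue", queue_state(entries),
        "{} entries in queue".format(entries),
    ))
```

In a real deployment the line would be written to the Nagios external command file or relayed by a helper such as NSCA; which one applies depends on the site setup.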

  11. Contacts and Acknowledgments
      • CNAF Production Instance: https://cert-wms-01.cnaf.infn.it:8443/wmsmon/main/main.php
      • PADOVA/EU-INDIA Production Instance: https://eu-india-01.pd.infn.it:50080/wmsmon/main/main.php
      • Wiki, Documentation, Download, Support: https://twiki.cnaf.infn.it/cgi-bin/twiki/view/WMSMonitor/WebHome, wms-support<at>cnaf.infn.it
      Special thanks to all gLite WMS / LB developers

  12. Backup slides

  13. gLiteWMS / gLiteLB architecture
      [Architecture diagram; recoverable labels: Interface, Core, Cache of Grid Information System, Logging & Bookkeeping, Back End]

  14. Metrics considered
      • Adopted metrics are of three types:
        - Grid service metrics: daemon status, number of open file descriptors, entries in component queues, number of available CE queues, open connections on ports, Condor job stats
        - System metrics: CPU load average, % occupancy of disk partitions
        - Job flow metrics: jobs submitted by users, job input/output for each component in the WMS, jobs successfully completed / aborted
      [Diagram: a WMS service component with "Jobs in" and "Jobs out", annotated with daemons status, file descriptors, queues, open connections on ports, available Grid information, treated job status]
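The two system metrics listed above (CPU load average and percentage occupancy of disk partitions) can be collected with the Python standard library alone. This is a minimal sketch of what such a sensor might look like, not the WMSMonitor sensor itself; the function name, the returned dictionary layout, and the default partition list are assumptions.

```python
import os
import shutil


def system_metrics(partitions=("/",)):
    """Collect load averages and disk occupancy for the given mount points."""
    load1, load5, load15 = os.getloadavg()  # available on Unix-like systems
    disk_pct = {}
    for mount in partitions:
        usage = shutil.disk_usage(mount)
        # Percentage of the partition currently in use.
        pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        disk_pct[mount] = round(pct, 1)
    return {
        "load_avg_1m": load1,
        "load_avg_5m": load5,
        "load_avg_15m": load15,
        "disk_pct": disk_pct,
    }


if __name__ == "__main__":
    print(system_metrics())
```

A collector could call this function periodically and store the result alongside the grid service and job flow metrics.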
