The network monitoring in grid context




  1. Enabling Grids for E-sciencE
     The network monitoring in grid context: Operations Perspective
     Emir Imamagic /SRCE
     EGEE’09, Barcelona, Spain
     www.eu-egee.org

  2. Overview
     • Monitoring In Operations
     • Service Availability Monitoring
       – Architecture
       – Network Monitoring
     • Performance Monitoring
     • Possible Future Work
     • Conclusion
     EGEE-III INFSO-RI-222667

  3. Monitoring In Operations
     • Provide means to site and grid operators to monitor their resources
     • Focus on improving availability and reliability by spotting problems and issuing alarms
     • Define procedures for escalation and resolution of more complex problems

  4. Service Availability Monitoring
     [Figure: schema provided by Karolis Eigelis]

  5. The New Architecture
     [Figure: schema provided by Karolis Eigelis]

  6. The New Architecture

  7. Which Other Systems Are Used?
     • Database components
       – Aggregated Topology Provider (ATP)
       – Metric Description Database (MDDB)
     • Operations services
       – GOCDB, ENOC, OIM
     • Grid information services
       – BDII

  8. What Do We Check?
     • SAM probes
       – various grid services (CE, WN and SRM)
     • WLCG probes (SRCE, CERN)
       – various grid services (e.g. GridFTP, LFC)
     • BDII & Gstat probes
       – validation of content in information system BDII
     • Nagios native probes
       – standard services (e.g. web, ftp, ssh servers)

  9. Network Monitoring
     • Collaboration with ENOC
       – integration of ENOC Downcollector features into SAM
     • Added lightweight service checks
       – based on nmap
       – executed with high frequency
       – used for masking other alarms
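The lightweight checks on this slide can be illustrated with a minimal Python sketch. The slides say the real checks are based on nmap; this sketch swaps in a plain TCP connect from the standard library instead. The function names and the masking rule (drop the detailed service alarms while the cheap check is failing) are assumptions made for illustration, not the SAM implementation. The exit codes follow the usual Nagios plugin convention.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def lightweight_check(host, port, timeout=2.0):
    """Cheap reachability probe returning a Nagios-style (code, message) pair."""
    if tcp_port_open(host, port, timeout):
        return OK, f"OK - {host}:{port} accepts connections"
    return CRITICAL, f"CRITICAL - {host}:{port} is not reachable"


def mask_alarms(lightweight_code, service_alarms):
    """Hypothetical masking rule: if the cheap check failed, suppress the
    detailed service alarms so only the root cause is reported."""
    if lightweight_code != OK:
        return []
    return list(service_alarms)
```

Because the probe is only a connect attempt, it is cheap enough to run far more often than full grid probes, which is what makes it usable as a gate for the heavier alarms.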

  10. Network Monitoring
     • Integrated network topology data
       – ENOC provided static list of border routers for all sites
       – Nagios supports network hierarchy
       – in case of router failure site resources flagged as unreachable
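The effect of the network hierarchy can be sketched as follows. Nagios distinguishes a host that is DOWN from one that is UNREACHABLE because all of its parents have failed. This sketch mimics that distinction over a static parent map; the function name, the data layout and the hostnames are hypothetical, and the map is assumed to be acyclic.

```python
def classify_hosts(parents, failed_checks):
    """Classify each host as 'UP', 'DOWN' or 'UNREACHABLE'.

    parents       -- dict mapping host -> list of parent hosts (e.g. border routers)
    failed_checks -- set of hosts whose own check currently fails
    """
    states = {}

    def state_of(host):
        if host in states:
            return states[host]
        if host not in failed_checks:
            states[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            # A failing host is only UNREACHABLE if it has parents and none is UP.
            if host_parents and all(state_of(p) != "UP" for p in host_parents):
                states[host] = "UNREACHABLE"
            else:
                states[host] = "DOWN"
        return states[host]

    for host in parents:
        state_of(host)
    return states


# Hypothetical site: one border router in front of two grid services.
topology = {
    "router.site-a.example.org": [],
    "ce.site-a.example.org": ["router.site-a.example.org"],
    "se.site-a.example.org": ["router.site-a.example.org"],
}
print(classify_hosts(topology, set(topology)))
```

With the router and both services failing, only the router is reported as DOWN; the services behind it are flagged UNREACHABLE, so the operator sees one root cause instead of three separate alarms.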

  11. Performance Monitoring - Grid
     • Several grid systems gather performance
       – BDII, GridFTP transfers
       – Dashboards and VO-specific systems
     • Some raise alarms based on performance data

  12. Performance Monitoring - Network
     • Majority of sites are without dedicated links
       – without SLAs what should we alarm on?
     • Severe degradation of network performance
       – e.g. failure of primary link
       – interpreted as service unavailability

  13. Possible Future Work – Availability Monitoring
     • Lightweight checks improvement?
     • Dynamic network topology info?
     • Better integration with networking monitoring systems?
     • End-to-end monitoring between sites?

  14. Possible Future Work – Performance Monitoring
     • Dynamic performance testing
       – to distinguish between failure and severe degradation
       – interesting for grid services (job & file transfer management)
     • With dedicated links
       – monitoring network parameters
       – raising alarms in case of degradation
     • Monitoring dynamic link reservation
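One way to picture the distinction between failure and severe degradation is a small classifier over a throughput measurement. This is only a sketch of the idea, not something described in the slides: the function name, the baseline value and the 20% threshold are all hypothetical, and a real deployment would need agreed SLAs or measured baselines to choose them.

```python
from typing import Optional, Tuple

# Nagios-style status codes reused for the result.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_link(measured_mbps: Optional[float],
                  baseline_mbps: float,
                  degraded_ratio: float = 0.2) -> Tuple[int, str]:
    """Separate a failed link from a severely degraded one.

    measured_mbps  -- throughput from an active test, or None if the test
                      could not complete at all
    baseline_mbps  -- expected throughput for this link (assumed known)
    degraded_ratio -- hypothetical fraction of the baseline below which the
                      link counts as severely degraded
    """
    if baseline_mbps <= 0:
        raise ValueError("baseline_mbps must be positive")
    if measured_mbps is None or measured_mbps <= 0:
        return CRITICAL, "FAILURE"
    if measured_mbps < degraded_ratio * baseline_mbps:
        return WARNING, "SEVERE_DEGRADATION"
    return OK, "OK"


print(classify_link(None, 1000.0))   # test did not complete
print(classify_link(50.0, 1000.0))   # well below 20% of the baseline
print(classify_link(800.0, 1000.0))  # within the normal range
```

Keeping the two outcomes separate lets job and file transfer management react differently: a failed link means rerouting or pausing work, while a degraded one may only call for throttling.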

  15. Conclusion
     • Multilevel monitoring provides the means for administrators to better monitor their services
     • Integration with existing components to automate operations of monitoring instances
     • Network monitoring mainly focused on end-to-end links

  16. Links
     • OAT web page
       https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III
     • OAT Multi-level monitoring architecture
       https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview

  17. Thank You! Questions?
