Experience in Grid Site Testing for HEP with HammerCloud


  1. Experience in Grid Site Testing for HEP with HammerCloud. Ramón Medrano Llamas, Daniel van der Ster, Johannes Elmsheuser, Federica Legger, Mario Ubeda and Andrea Sciaba. Experiment Support (ES), CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

  2. Agenda ■ Grid functional testing. ■ Introduction to HammerCloud. ■ HammerCloud as a common solution. ■ HammerCloud and the HEP VOs. ■ Automatic Site Exclusion. ■ Results and correlations. ■ Scaling problems. ■ Conclusions.

  3. Grid functional testing ■ Testing is crucial in Grid operations: ○ Software development cycle (unit testing). ○ Release testing (nightlies). ○ Operations testing (DevOps). ○ Functional testing. ○ Site monitoring. ■ Functional testing: ○ Testing of "live systems". ○ Improves systems while in production. ○ Automates shifters' operations. ■ Testing of cloud environments: ○ R&D project at CERN IT-ES (see Fernando Barreiro's and Bob Jones' talks).

  4. Introduction to HammerCloud ■ What HammerCloud (HC) is: ○ Functional testing engine for Grid sites. ○ Behaves like an end-user. ○ On-demand tests. ○ Continuous testing. ■ What HC is not: ○ A software integration platform. ○ A software deployment platform. ○ A software unit testing platform.

  5. HC as a common solution ■ HC originated in ATLAS. ■ Other experiments showed interest: ○ CMS, to replace JobRobot. ○ LHCb. ■ Having three codebases is not an option. ■ In 2010, HC was rewritten: ○ As a VO-independent tool, ○ for Grid testing, ○ but not only for HEP. ○ To have a VO tested, only a simple HC plugin is needed. ■ Since 2011, HC v4 has been running for all VOs.
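
The plugin idea on slide 5 can be pictured with a minimal Python sketch. It is not HammerCloud's actual plugin API: the names `VOPlugin`, `JobResult`, `DummyVOPlugin` and `run_tests` are hypothetical, and job submission is faked so the example runs offline. It only illustrates how a VO-independent core might delegate everything experiment-specific to a small plugin.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JobResult:
    """Outcome of one test job (hypothetical structure)."""
    site: str
    succeeded: bool


class VOPlugin(ABC):
    """Hypothetical interface: everything VO-specific lives behind it."""

    @abstractmethod
    def run_test_job(self, site: str) -> JobResult:
        """Run one test job on `site` the way an end user of this VO would."""


class DummyVOPlugin(VOPlugin):
    """Stand-in plugin that fakes submission instead of talking to a Grid."""

    def __init__(self, broken_sites):
        self.broken_sites = set(broken_sites)

    def run_test_job(self, site: str) -> JobResult:
        return JobResult(site=site, succeeded=site not in self.broken_sites)


def run_tests(plugin: VOPlugin, sites):
    """VO-independent core: loop over sites and delegate to the plugin."""
    return [plugin.run_test_job(site) for site in sites]


# Site names are made up for the example.
results = run_tests(DummyVOPlugin(broken_sites=["SITE_B"]), ["SITE_A", "SITE_B"])
```

Under this assumed design, supporting a new VO means writing one more `VOPlugin` subclass while `run_tests` stays untouched, which is the maintenance argument the slide makes against three separate codebases.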

  6. HC as a common solution

  7. HC and ATLAS ■ ATLAS was the originator of HC. ■ It is still the VO with the heaviest use, ■ and the one with the most feature requests. ■ HC is critical for ATLAS Computing (ADC): ○ For operations, ○ for software validation on Grid sites. ■ HC performs the site autoexclusion: ○ Helps shifters by reducing manual operations. ○ Provides early detection of errors. ■ HC-ATLAS ran 5,000,000+ jobs in 2012.

  8. HC and ATLAS ■ HC is also used for software validation: ○ The Athena nightlies are tested with HC. ○ Each day, a functional test validates the daily software developments in real Grid environments. ■ This validation: ○ reduces errors while distributing software to sites, ○ reduces the 'hotfixes' for untested bugs, ○ speeds up software adoption.

  9. HC and CMS ■ CMS swapped JobRobot for HC. ■ HC is now critical in CMS operations, ■ in site readiness: ○ HC provides the overview of site status. ■ and in software validation: ○ CRAB3 is going to be tested using HC. ■ HC-CMS ran 1,400,000+ jobs in 2012.

  10. HC and LHCb ■ LHCb is moving fast into HC. ■ Development is ramping up, ■ integrating into computing operations.

  11. Automatic Site Exclusion ■ Automating operations through HC. ■ Sites that have problems are excluded: ○ No human operation (no delay). ○ Based on consistent policies. ○ User jobs go to other sites that are working, so users' jobs do not fail. ○ Sites are whitelisted again when problems are solved. ■ Continuous testing is crucial: ○ Reduces the time to detect a problem, ○ and the time to reuse the site afterwards.
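
The exclude-then-whitelist cycle on slide 11 can be sketched as a small state machine. The slides do not state the actual policy, so the rule below is an assumption chosen for illustration: a site is excluded once a sliding window of recent test jobs contains at least `max_failures` failures, and it is whitelisted again only after a full window of successes. The class name `SiteState` and both parameters are hypothetical.

```python
from collections import deque


class SiteState:
    """Tracks recent test results for one site (illustrative policy only)."""

    def __init__(self, window=5, max_failures=3):
        self.window = window
        self.max_failures = max_failures
        self.recent = deque(maxlen=window)  # True means the test job succeeded
        self.excluded = False

    def record(self, succeeded):
        """Store one test result and return whether the site is now excluded."""
        self.recent.append(succeeded)
        failures = sum(1 for ok in self.recent if not ok)
        if not self.excluded and failures >= self.max_failures:
            # Too many recent failures: stop sending user jobs here.
            self.excluded = True
        elif self.excluded and len(self.recent) == self.window and failures == 0:
            # A full window of clean results: whitelist the site again.
            self.excluded = False
        return self.excluded


site = SiteState(window=3, max_failures=2)
history = [site.record(ok) for ok in [True, False, False, True, True, True]]
# history == [False, False, True, True, True, False]
```

In this sketch the site is excluded on the second failure and only returns after three consecutive successes. With continuous testing the window fills quickly, which is why both the detection time and the time to reuse the site shrink.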

  12. Automatic Site Exclusion ■ Results are promising: ○ Simple policies increased the Grid efficiency. ○ Optimized policies reduced application failures by 50%.

  13. Results and correlations ■ HC job errors for different VOs in 2012 [figure].

  14. Results and correlations ■ The scale of the VOs is not comparable: ○ ATLAS is also testing production queues. ○ CMS tests are by region with one kind of job, ○ while ATLAS tests 9 kinds of jobs. ■ The final question: ○ Can what we learn from one VO be applied to another?

  15. Scaling problems ■ The number of HC users is growing fast: ○ ~30,000,000 jobs predicted for 2012. ○ This generates ~3,000,000,000 job metrics. ■ For this data: ○ Analysis and correlation are harder. ○ Relational databases get slower. ○ Data curation is out of scale. ■ First solutions: ○ Highly scalable MySQL service from CERN IT. ○ Partitioning and sharding of data. ○ Improvement of the schema.
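
The figures on slide 15 imply roughly 100 metrics per job, and the "partitioning and sharding" item can be illustrated with two routing helpers. This is a sketch under stated assumptions, not the schema HammerCloud uses: the shard count, the modulo rule, the monthly partition scheme and the `metrics_YYYY_MM` naming are all hypothetical.

```python
from datetime import date

# Figures quoted on the slide.
PREDICTED_JOBS = 30_000_000
PREDICTED_METRICS = 3_000_000_000

# Implied average number of metric rows stored per job.
metrics_per_job = PREDICTED_METRICS // PREDICTED_JOBS

NUM_SHARDS = 4  # assumption: number of database shards


def shard_for(job_id: int) -> int:
    """Sharding: spread jobs across databases by job id (modulo rule)."""
    return job_id % NUM_SHARDS


def partition_for(day: date) -> str:
    """Partitioning: one partition per month, so old months can be archived."""
    return f"metrics_{day.year}_{day.month:02d}"


# With an even spread, each shard would hold this many metric rows per year.
rows_per_shard = PREDICTED_METRICS // NUM_SHARDS
```

For example, `shard_for(10)` gives shard 2 and `partition_for(date(2012, 5, 21))` gives `metrics_2012_05`. Keying partitions by time keeps recent data in small, fast partitions, while sharding by job id bounds the size of any single database.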

  16. Scaling problems ■ VOs are requesting more tests: ○ Very good news! ■ The current estimates will grow. ■ Further solutions: ○ NoSQL data stores? ○ Hadoop and HDFS for analysis and curation? ○ Summaries and archival?

  17. Conclusions ■ HC has become the default Grid testing tool: ○ For availability monitoring, ○ for Grid site optimization and validation, ○ for pre-release testing, ○ for critical operations monitoring. ■ HC is a common solution across VOs: ○ Reduced maintenance, ○ common features, problems, requests.

  18. Experience in Grid Site Testing for HEP with HammerCloud: Q&A
