Enabling Grids for E-sciencE
CREAM CE Certification and Testing
Di Qing, SA3
Academia Sinica & CERN
Geneva, 2008
www.eu-egee.org
EGEE and gLite are registered trademarks
EGEE-III
Introduction
• Goals
  – Verify installation and configuration
  – Pass the normal certification procedures
  – Run a 5-day unattended, continuous stress test
    ▪ 50 users
    ▪ Less than 0.5% failures
• Patches
  – 1755, CREAM server
  – 1790, CREAM client
• Testing started at the end of May
• The test scripts were provided by INFN
• A wiki page was set up for the test results
  – https://twiki.cern.ch/twiki/bin/view/EGEE/CREAMTest
• CE checklist
  – https://twiki.cern.ch/twiki/bin/view/EGEE/CECheckList
EGEE-III CREAM CE certification and testing - Qing
Testbed setup
• One separate Torque server
  – BLParser server installed there by hand
• 11 WNs
  – 110 virtual CPUs
• One UI
• One physical CREAM CE for stress testing
  – Four 2.2 GHz cores
  – 4 GB memory
• One CREAM CE for installation and configuration tests
Tests performed
• Installation and configuration
  – Followed the formal installation procedure, checking package dependencies in particular
  – Tested different installation scenarios
  – Configured with YAIM
• Basic functionality
  – Submission through the CLI, job status checks, proxy delegation, etc.
• Stress testing
  – Submission through the CLI
  – 9800 jobs per day with 49 users
  – Jobs accumulated in the queues as fast as possible
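For a sense of scale, the stress-test figures above (9800 jobs per day, 49 users) can be turned into per-user and per-second rates. This is a minimal Python sketch. It assumes the load is spread evenly across users and across the full 24 hours, which the slides do not state. The function name `submission_rates` is hypothetical.

```python
JOBS_PER_DAY = 9800              # figure from the slide
USERS = 49                       # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60


def submission_rates(jobs_per_day, users):
    """Return (jobs per user per day, average seconds between submissions).

    Assumes an even spread over users and over 24 hours; this is an
    assumption for illustration, not something the test report states.
    """
    per_user = jobs_per_day / users
    interval = SECONDS_PER_DAY / jobs_per_day
    return per_user, interval


per_user, interval = submission_rates(JOBS_PER_DAY, USERS)
print(f"{per_user:.0f} jobs per user per day")
print(f"one submission every {interval:.1f} s on average")
```

Under these assumptions each user submits 200 jobs per day, and the CE receives a new job roughly every 8.8 seconds.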
Test results
• Basic functionality tests passed
• Missing dependencies on some packages
  – tomcat, mysql-server and mysql-connector-java
• Configuration issues
  – Configuration of the BLParser server and BLAH
  – LCAS and LCMAPS generate too many logs
  – Issue with the upgrade of gLExec → fixed
• Other issues
  – Tomcat must be restarted when CAs are updated → problems in trustmanager
  – Too many files left under the home directories of pool accounts
  – Authentication fails with the new type of VO attributes in the VOMS proxy → fixed
Stress test results
• System load
  – CPU load is quite low, below 1 most of the time; it reaches 9 only when massive numbers of jobs are submitted to the CREAM CE
  – Memory usage is also low, less than 2 GB
  – Disk usage can increase
    ▪ Can be solved by purging jobs and limiting the log level of services
• Job submission
  – Job submission to the CREAM CE can fail
    ▪ Happened in the last two days of testing, when more than 3% of jobs could not be submitted
  – Job success rate
    ▪ More than 99.5% of jobs succeeded on 6 of the 10 test days
    ▪ The worst failure rate was about 9.5%
• Most failures only give the error message "blah error"
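The goal behind these numbers (less than 0.5% failures) can be checked day by day from raw job counts. The Python sketch below is illustrative only. The names `failure_rate` and `days_meeting_goal` are hypothetical, and the sample counts are made up for the example; they are not the measured results of this test.

```python
FAILURE_GOAL = 0.005  # goal from the slides: less than 0.5% failures


def failure_rate(submitted, failed):
    """Fraction of submitted jobs that failed; 0.0 if nothing was submitted."""
    if submitted == 0:
        return 0.0
    return failed / submitted


def days_meeting_goal(daily_counts, goal=FAILURE_GOAL):
    """Return the 1-based indices of days whose failure rate is below goal."""
    return [
        day
        for day, (submitted, failed) in enumerate(daily_counts, start=1)
        if failure_rate(submitted, failed) < goal
    ]


# Hypothetical (submitted, failed) counts per day, NOT data from the test.
sample = [(9800, 20), (9800, 60), (9800, 931)]
good_days = days_meeting_goal(sample)
print(f"days meeting the goal: {good_days}")
```

With these made-up counts only the first day stays under the 0.5% goal. A full run would meet the criterion only if every day of the 5-day window appeared in the returned list.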
Conclusion
• More work is still needed to reach production quality
  – We never achieved a 5-day unattended, continuous stress test with a failure rate below 0.5% on the certification testbed
• It can now be released to PPS for users to test and gain some experience, once the installation and configuration issues are sorted out
  – In principle, this can be done today
• The tests have been done only with the CREAM CLI
  – Tests through the WMS will be done when ICE is ready
    ▪ A recipe on how to set up a WMS plus ICE is available
• A new CREAM CE patch is in preparation for certification