Enabling Grids for E-sciencE
EGEE Asia Pacific Regional Operation Center
Min-Hong Tsai, ASGC
ISGC 2007, March 29, Taipei
http://www.eu-egee.org/
http://www.twgrid.org/aproc/
EGEE-II INFSO-RI-031688

Agenda
• APROC Introduction
• Status
• Joining EGEE

APROC Introduction I
• APROC Mission
  – Provide deployment support facilitating Grid expansion
  – Maximize the availability of Grid services
• Supports EGEE sites in Asia Pacific since April 2005
  – 20 production sites, 8 countries
  – 9 sites joined EGEE since last ISGC: recently HKU, KISTI
  – 3 sites in certification process
    Philippines: Advanced Science and Technology Institute
    Korea: KONKUK
    Mongolia: Mongolian Academy of Sciences (MAS IPT)

APROC Services
• Site Deployment Support
  – Registration
  – Installation
  – Certification
• Operations Support
  – Monitoring, troubleshooting
  – Problem tracking
  – Software updates and security coordination
  – Regional VO services: VOMS and LFC
• ASGCCA CA Service
  – Provides certificates for AP EGEE/LCG sites without a domestic CA
• EGEE Operations
  – CIC-on-duty: EGEE global operations
  – Monitoring tool development: GStat and GGUS Search
  – TPM: Front line user support (Q4 2006)
  – OSCT: Incident Response duty (Dec 2006)

APROC Usage
• New Active VOs: Belle and TWGrid
• This year: 200 KSI2K Years
• Last year: 41 KSI2K Years

APROC Availability
• Daily snapshots of SAM results for the region
• Availability increased to 70-80% range
  – from 60-70% a half year ago
• CT mostly replica management failure
  – Sensitive to Information System access/performance
  – Request that data management clients can fail over to a secondary BDII
• Network Issues
  – Often the root cause of CT, JL and JS
  – Network-congested sites set up a local top-level BDII
  – Increase default update timeout and breathe time
[Chart: daily status breakdown, legend OK, ER, JS, JL, CT, SD; y-axis 0% to 100%; x-axis 2005-04 to 2007-01; markers 2.4, 2.6, 2.7, 3.0; annotations as extracted: "JS from LHC OPN", "Remove SSH", "Hardware", "Slow BDII", "upgrade", "Failure"]
[Chart: series "avail" and "reliab"; y-axis 0 to 100; x-axis 2005-04 to 2007-02]
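
The failover request on this slide (data management clients falling back to a secondary BDII) can be sketched as follows. This is a minimal illustration, not the behavior of any existing grid client: the function `query_with_failover`, the two endpoint URLs, and the stand-in `fake_query` are all hypothetical names introduced here, and a real client would issue an LDAP query instead.

```python
def query_with_failover(endpoints, query, timeout_s=30.0):
    """Try each endpoint in order; return (endpoint, result) from the first that answers."""
    failures = []
    for endpoint in endpoints:
        try:
            return endpoint, query(endpoint, timeout_s)
        except OSError as exc:  # covers refused connections and timeouts
            failures.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(failures))


# Hypothetical endpoints, for illustration only.
PRIMARY = "ldap://bdii-primary.example.org:2170"
SECONDARY = "ldap://bdii-secondary.example.org:2170"


def fake_query(endpoint, timeout_s):
    """Stand-in for a real information system query; the primary always times out."""
    if endpoint == PRIMARY:
        raise TimeoutError(f"no answer within {timeout_s}s")
    return ["storage-element-record"]


used, records = query_with_failover([PRIMARY, SECONDARY], fake_query)
print(used)  # the secondary endpoint answered
```

Ordering the list with the local or nearest BDII first keeps the common case fast, while a slow or unreachable primary costs one timeout rather than a failed transfer.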

Monitoring and Notification
• Planned integration of Asset DB
• Nagios plugins developed
  – CE, LFC, VOMS, Storage, IT services, OS
• Notification via Email
  – SMS transmission device currently being tested

Nagios Regional Monitoring
• Tests run at faster frequency
  – 5-10 minutes
  – Faster response to faults
• Add customized plugins
  – Run low level tests for faster isolation of problems
  – Tests may not be available in global monitoring tools yet
  – Ability to run tests on the target host via NRPE
• Management Interface
  – Acknowledgement
  – On demand execution of tests
  – Historical availability
  – Test dependencies
http://lists.grid.sinica.edu.tw/apwiki/Nagios_monitoring_-_APROC_sites
http://lists.grid.sinica.edu.tw/apwiki/Nagios_Plugins_Description

Plans
• Increase monitoring coverage
  – Information System
  – Network performance monitoring
    available/achievable bandwidth
    Full mesh monitoring
• Improve troubleshooting tools
  – http://lists.grid.sinica.edu.tw/apwiki/APROC/Troubleshooting_Guides
  – FAQ system
  – Service diagnostic scripts
• Integration of ticketing system with GGUS
• Training
  – EGEE Induction at GridAsia 2007, June 5, 2007, Singapore
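
To give a sense of the scale of full mesh monitoring, the sketch below enumerates every ordered pair of sites, since bandwidth between two sites can differ by direction. The site names are placeholders and `full_mesh_pairs` is a hypothetical helper; only the count of 20 sites comes from the talk.

```python
from itertools import permutations


def full_mesh_pairs(sites):
    """Return every ordered (source, destination) pair of distinct sites."""
    return list(permutations(sites, 2))


# Placeholder names; 20 matches the number of production sites in the talk.
sites = [f"site-{i:02d}" for i in range(1, 21)]
pairs = full_mesh_pairs(sites)
print(len(pairs))  # 20 * 19 = 380 directed measurements
```

The quadratic growth is the main design constraint: each new site adds two measurements per existing site, so test frequency and duration need to be budgeted accordingly.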

Joining EGEE Infrastructure
• Contact APROC
• If domestic CA is not available
  – Register as an ASGCCA RA during ISGC
• Dedicate an administrator with Unix experience
• Allocate servers
  – 5: UI, CE, WN, DPM, MON
  – 3: CE/WN, MON, DPM
    UI can be installed in user account
    Consider Virtual Machine for MON
• Study user guide and installation manual
• Send configuration file to APROC for review before deployment
• Complete registration and certification process

Long Term Operations
• Establish domestic CA if none exists
• Increase availability and resource levels
• Establish domestic operations structure
  – Operations procedures
  – Tools: monitoring and notification, ticketing system
  – User and administrator support
• Training for administrators and users
• Collaborate with APROC in Regional operations
• Q: Need for regional experimental Grid?

Issues in Asia Pacific
• No regional projects to promote collaboration in EGEE
• Network bandwidth
  – Low capacity: regional and last mile
  – Usage based billing
• Need for training
  – Training for trainers
  – Application Training
  – E-Learning material
• However EGEE already provides
  – M/W development and integration
  – Operations structure, coordination and support
  – Close to 200 user communities

Summary
• APROC provides EGEE operations support services to Asia Pacific
• EGEE sites in the region have grown to 20, with utilization of 200 KSI2K years
• We have also improved availability, but there is still significant room for improvement
• We look forward to more sites joining EGEE in the region and the possibility for further collaboration
  – Applications
  – Operations
• Feedback on what we can improve

Thank You for Your Attention!
• Questions?
  – roc@lists.grid.sinica.edu.tw
  – http://www.twgrid.org/aproc/
• Thanks to efforts from:
  – T1/APROC Team
    Jason Shih, Dave Wei, Felix Lee, Joanna Huang, Aries Hong, Hung-Che Jen, Jinny Chien, Shu-Ting Liao, Yi-Ping Wu, Min Tsai