  1. Development of e-Science Application Framework. Eric Yen, Simon C. Lin & Hurng-Chun Lee, ASGC, Academia Sinica, Taiwan. 24 Jan. 2006.

  2. LCG and EGEE Grid Sites in the Asia-Pacific Region
     [Map of the region, last updated 01/11/06]
     • 4 LCG sites in Taiwan; 12 LCG sites in Asia/Pacific
     • Sites: IHEP Beijing; PAEC and NCP, Islamabad; KEK Tsukuba; ICEPP Tokyo; KNU Daegu; VECC Kolkata; Tata Inst., Mumbai; GOG Singapore; Univ. Melbourne; and the Academia Sinica Grid Computing Centre, Taipei (ASGC, IPAS, NTU, NCU)
     • ASGC is a Tier-1 Centre for the LHC Computing Grid (LCG), the Asian Operations Centre for LCG and EGEE, and the Coordinator of the Asia/Pacific Federation in EGEE
     • The AP Federation now shares the e-Infrastructure with WLCG

  3. Perspectives of ASGC as Tier-1
     • Help troubleshoot Tier-2s on the various services and functionalities (IS, SRM, SE, etc.) before they join the Service Challenge
     • Reach a persistent data transfer rate
     • Increase reliability and availability of T2 computing facilities through stress testing
     • Maintain close communication with T2s, T0 and the other T1s
     • Gain experience before the LHC experiments begin
     SC Workshop, Taipei, 30-31 Oct. 2005, Jason Shih, ASGC

  4. Plan of the AP Federation
     • VO services: deployed in Taiwan (APROC) since April 2005
       – LCG: ATLAS, CMS
       – Bioinformatics, BioMed
       – Geant4
       – APeSci: general e-Science collaboration services in the Asia-Pacific area
       – APDG: for testing and testbeds only
       – TWGRID: established for local services in Taiwan
     • Potential applications
       – LCG, Belle, nano, biomed, digital archives, earthquake, GeoGrid, astronomy, atmospheric science
     2005/12/16, Simon C. Lin / ASGC

  5. Plans for T1/T2
     • T1-T2 test plan
       – which services and functionality need to be tested
       – recommendations and a checklist for T2 sites: what has to be done before joining the SC
       – communication methods, and how to improve them if needed
       – scheduling of the plan and candidate sites; timeline for the testing
       – SRM + FTS functionality testing
       – network performance tuning (jumbo frames?)
     • T1 expansion plan
       – computing power and storage
       – storage management, e.g. CASTOR2 + SRM
       – network improvement
     SC Workshop, Taipei, 30-31 Oct. 2005, Jason Shih, ASGC

  6. Enabling Grids for E-sciencE: A Worldwide Science Grid
     • >200 sites
     • >15 000 CPUs (with peaks >20 000 CPUs)
     • ~14 000 jobs successfully completed per day
     • 20 Virtual Organisations
     • >800 registered users, representing thousands of scientists
     INFSO-RI-508833

  7. EGEE Asia Pacific Services by Taiwan
     • Production CA services
     • AP CIC/ROC
     • VO support
     • Pre-production site
     • User support
     • Middleware and technology development
     • Application development
     • Education and training
     • Promotion and outreach
     • Scientific Linux mirroring and services

  8. APROC
     • Taiwan acts as the Asia Pacific CIC and ROC in EGEE
     • APROC was established in early 2005
     • Supports EGEE sites in the Asia Pacific region: Australia, Japan, India, Korea, Singapore, Taiwan (8 sites in 6 countries)
     • Provides global and regional services

  9. APROC EGEE-wide Services
     • GStat: monitoring application to check the health of the Grid Information System (a sketch of such a check follows below)
       http://goc.grid.sinica.edu.tw/gstat/
     • GGUS Search: performs a Google search targeted at key Grid knowledge bases
     • GOCWiki: hosted wiki for user- and operations-related FAQs and guides
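As a rough illustration of the kind of health check GStat performs against the Grid Information System, the Python sketch below queries a BDII over LDAP and verifies that it publishes GLUE computing-element records. The host name is a placeholder, and the port, base DN, and GLUE attribute names follow LCG-era conventions but should be treated as assumptions, not as GStat's actual implementation.

```python
# Illustrative BDII health check in the spirit of GStat.
# Assumptions: the ldap3 package is installed, the BDII listens on the
# conventional port 2170, and it publishes GLUE 1.x CE records under
# "mds-vo-name=local,o=grid". The host name in the example is hypothetical.

from ldap3 import Server, Connection, ALL

def check_bdii(host: str, port: int = 2170) -> bool:
    """Return True if the BDII answers and publishes at least one CE record."""
    server = Server(host, port=port, get_info=ALL)
    conn = Connection(server, auto_bind=True)   # anonymous bind, as for a BDII
    conn.search(
        search_base="mds-vo-name=local,o=grid",
        search_filter="(objectClass=GlueCE)",
        attributes=["GlueCEUniqueID"],
    )
    ok = len(conn.entries) > 0
    conn.unbind()
    return ok

# Example with a placeholder host:
# print(check_bdii("bdii.example.org"))
```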

  10. APROC EGEE Regional Services
     • Site registration and certification
     • Monitoring and daily operations: problem diagnosis, tracking and troubleshooting
     • Middleware certification test-bed: new release testing, supplemental documentation
     • Release support and coordination: updates, upgrades and installation
     • Security coordination, with the Operational Security Coordination Team (OSCT)
     • VO services
       – CA for collaborators in Asia-Pacific
       – VOMS, LFC, RB, BDII support for new VOs in the region
     • Support services
       – web portal and documentation
       – user and operations ticketing system

  11. Education and Training
     Event                    Date             Attendees  Venue
     China Grid LCG Training  16-18 May 2004   40         Beijing, China
     ISGC 2004 Tutorial       26 July 2004     50         AS, Taiwan
     Grid Workshop            16-18 Aug. 2004  50         Shang-Dong, China
     NTHU                     22-23 Dec. 2004  110        Shin-Chu, Taiwan
     NCKU                     9-10 Mar. 2005   80         Tainan, Taiwan
     ISGC 2005 Tutorial       25 Apr. 2005     80         AS, Taiwan
     Tung-Hai Univ.           June 2005        100        Tai-chung, Taiwan
     EGEE Workshop            Aug. 2005        80         20th APAN, Taiwan
     Note: gLite and the development of EGEE were introduced in all of these events, all of which were run by ASGC.

  12. Service Challenge 24 Jan. 2006 21st APAN, Japan

  13. ASGC Usage
     • GOC APEL accounting, excluding the non-LHC VO (Biomed)
     2005/11/20, Min-Hong Tsai / ASGC

  14. SRM Services
     • Increased to four pool nodes for more parallel GridFTP transfers (see the sketch below)
     • srmcp's stream and TCP buffer options did not function; worked around by configuring the SRM server
     • The transfer rate can reach 80 MB/s; the average is 50 MB/s
     2005/11/20, Min-Hong Tsai / ASGC
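The point about four pool nodes enabling more parallel GridFTP transfers can be pictured with the small Python sketch below, which runs several srmcp copies concurrently. This is only an illustration: the srmcp client is assumed to be on PATH, the SURLs and local paths are placeholders, and per-transfer stream and TCP buffer tuning is left to the SRM server configuration, as the slide notes.

```python
# Sketch of driving several transfers in parallel, in the spirit of the
# "more parallel GridFTP transfers" point above. The srmcp command and the
# URLs below are placeholders, not the site's actual configuration.

import subprocess
from concurrent.futures import ThreadPoolExecutor

def copy_one(surl: str, dest: str) -> int:
    """Run one srmcp transfer and return its exit code."""
    return subprocess.call(["srmcp", surl, dest])

def copy_many(pairs: list[tuple[str, str]], parallel: int = 4) -> list[int]:
    """Run up to `parallel` transfers at a time (say, one per pool node)."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda p: copy_one(*p), pairs))

# Example with placeholder URLs:
# rcs = copy_many([
#     ("srm://se.example.org:8443/data/file1", "file:////tmp/file1"),
#     ("srm://se.example.org:8443/data/file2", "file:////tmp/file2"),
# ])
```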

  15. ATLAS SC3 DDM - ASGC VOBOX
     [Plots: average throughput per day and total cumulative data transferred, as of 18 Jan. 2006]
     Latest update can be found at: http://atlas-ddm-monitoring.web.cern.ch/atlas-ddm-monitoring/all.php

  16. Tasks/Deliverables
     • Batch services
       – deliver production-quality batch services
       – frontline consultancy and support for the batch scheduler
       – customized tool suites for secure and consistent management
     • Manage hierarchical storage
       – production-quality data management services, including planning, procurement, and operation of both software and hardware
       – meet the data transfer rate requirement declared in the MoU
       – share operational experiences and procedures with the other Tier-1s and Tier-2s
       – high availability and load balancing (HA + L/B)
     • Middleware support
       – frontline consultancy and support for other tiers in tweaking configurations, troubleshooting, and maintenance procedures
       – certification testing for pre-release tags of LCG
       – installation guides/notes where they are lacking from the official release
       – training courses

  17. ARDA
     • Goal: coordinate the prototyping of distributed analysis systems for the LHC experiments using a grid
     • ARDA-ASGC collaboration, since mid-2003
       – built a push/pull model prototype (2003)
       – integrated the ATLAS/LHCb analysis tools into gLite (2004)
       – provided the first integration testing and usage documentation for the ATLAS tool DIAL (2004)
       – CMS monitoring system development (2005): a monitoring system integrating R-GMA and MonALISA; the ARDA/CMS analysis prototype, Dashboard
     • ARDA Taiwan team: http://lcg.web.cern.ch/LCG/activities/arda/team.html
       – 4 FTEs participate: 2 at CERN, the other 2 in Taiwan

  18. mpiBLAST-g2 24 Jan. 2006 21st APAN, Japan

  19. mpiBLAST-g2
     ASGC, Taiwan and PRAGMA, http://bits.sinica.edu.tw/mpiBlast/index_en.php
     A GT2-enabled parallel BLAST that runs on the Grid
     • GT2 GASSCOPY API
     • MPICH-G2
     Enhancements over mpiBLAST made by ASGC
     • cross-cluster job execution
     • remote database sharing
     • helper tools for
       – database replication
       – automatic resource specification and job submission (with a static resource table)
       – multi-query job splitting and result merging (a sketch of this step is shown after this slide)
     • close link with the mpiBLAST development team: new mpiBLAST patches can be quickly applied to mpiBLAST-g2
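The "multi-query job splitting and result merging" helper mentioned above can be sketched as follows. This is a minimal illustration, not the actual mpiBLAST-g2 tool: the file names, the round-robin chunking, and the plain concatenation merge are assumptions.

```python
# Illustrative sketch of splitting a multi-FASTA query file into per-job chunks
# and merging the per-chunk BLAST reports. The real mpiBLAST-g2 helpers may
# split and merge differently; paths and chunk counts here are placeholders.

from pathlib import Path

def split_fasta(query_file: str, n_chunks: int, out_dir: str) -> list[Path]:
    """Split a multi-FASTA query file into up to n_chunks files, one per grid job."""
    records, current = [], []
    for line in Path(query_file).read_text().splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = []
    for i in range(n_chunks):
        part = records[i::n_chunks]          # round-robin assignment of queries
        if not part:
            continue
        chunk_path = out / f"query.part{i}.fasta"
        chunk_path.write_text("\n".join(part) + "\n")
        chunks.append(chunk_path)
    return chunks

def merge_results(result_files: list[Path], merged_file: str) -> None:
    """Concatenate the per-chunk BLAST reports back into a single result file."""
    with open(merged_file, "w") as merged:
        for rf in sorted(result_files):
            merged.write(Path(rf).read_text())

# Example: split 441 query sequences into 8 chunks, submit each chunk as a
# separate mpiBLAST-g2 job (submission not shown), then merge the outputs.
# chunks = split_fasta("est_queries.fasta", 8, "work/chunks")
# merge_results(list(Path("work/results").glob("*.out")), "blast_merged.out")
```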

  20. SC2004 mpiBLAST-g2 demonstration (KISTI). 28 April 2005, ISGC 2005, Taiwan

  21. mpiBLAST-g2 current deployment -- From PRAGMA GOC http://pragma-goc.rocksclusters.org

  22. mpiBLAST-g2 Performance Evaluation (perfect case)
     [Plots: elapsed time and speedup for searching + merging, BioSeq fetching, and overall]
     Database: est_human, ~3.5 GBytes; queries: 441 test sequences, ~300 KBytes
     • The overall speedup is approximately linear

  23. mpiBLAST-g2 Performance Evaluation (worse case)
     [Plots: elapsed time and speedup for searching + merging, BioSeq fetching, and overall]
     Database: drosophila NT, ~122 MBytes; queries: 441 test sequences, ~300 KBytes
     • The overall speedup is limited by the unscalable BioSeq fetching (see the speedup bound sketched after this slide)
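To make the "limited by unscalable BioSeq fetching" observation concrete, a standard Amdahl-style bound (not taken from the slides) can be written down, with an assumed serial fraction s standing for the non-scaling BioSeq fetching time:

```latex
% Speedup of a p-worker run relative to a single worker: S(p) = T(1)/T(p).
% If a fraction s of the single-worker time (here: BioSeq fetching) does not
% scale, while the searching + merging fraction (1 - s) scales with p, then
\[
  S(p) = \frac{T(1)}{T(p)} = \frac{1}{\,s + \dfrac{1 - s}{p}\,}
  \quad\longrightarrow\quad \frac{1}{s} \ \text{as } p \to \infty .
\]
% For the large est_human database (slide 22) s is small, so S(p) stays close
% to linear; for the small drosophila NT database (slide 23) s is larger, so
% the overall speedup flattens early.
```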

  24. Summary
     • Two grid-enabled BLAST implementations (mpiBLAST-g2 and DIANE-BLAST) were introduced for efficient handling of BLAST jobs on the Grid
     • Both implementations are based on the master-worker model for distributing BLAST jobs on the Grid (a minimal sketch of the pattern follows this slide)
     • mpiBLAST-g2 has good scalability and speedup in some cases
       – requires a fault-tolerant MPI implementation for error recovery
       – in the unscalable cases, BioSeq fetching is the bottleneck
     • DIANE-BLAST provides a flexible mechanism for error recovery
       – any master-worker workflow can easily be plugged into this framework
       – job thread control should be improved to achieve good performance and scalability
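Below is a minimal, self-contained sketch of the master-worker pattern the summary refers to, using local Python processes in place of Grid worker nodes. The run_blast placeholder, the queue-based dispatch, and the process pool are assumptions for illustration; the real systems distribute work through MPICH-G2 (mpiBLAST-g2) or DIANE (DIANE-BLAST).

```python
# Minimal sketch of the master-worker pattern mentioned in the summary.
# Local processes stand in for Grid worker nodes, and run_blast() is a
# placeholder for the real BLAST invocation.

from multiprocessing import Process, Queue

def run_blast(chunk_id: int) -> str:
    """Placeholder for running BLAST on one query chunk."""
    return f"result-for-chunk-{chunk_id}"

def worker(tasks: Queue, results: Queue) -> None:
    """Worker loop: pull a task, process it, push the result; stop on None."""
    while True:
        chunk_id = tasks.get()
        if chunk_id is None:          # sentinel: no more work
            break
        results.put((chunk_id, run_blast(chunk_id)))

def master(n_chunks: int, n_workers: int) -> dict:
    """Master: hand out chunks to workers and collect the merged results."""
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for chunk_id in range(n_chunks):  # distribute the query chunks
        tasks.put(chunk_id)
    for _ in workers:                 # one stop sentinel per worker
        tasks.put(None)
    merged = dict(results.get() for _ in range(n_chunks))
    for w in workers:
        w.join()
    return merged

if __name__ == "__main__":
    print(master(n_chunks=8, n_workers=4))
```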

  26. DataGrid for Digital Archives 24 Jan. 2006 21st APAN, Japan

  27. Data Grid for Digital Archives
