ISGC2004, July 2004, Taipei
Grid Datafarm Architecture and Standardization of Grid File System
Osamu Tatebe
Grid Technology Research Center, AIST
National Institute of Advanced Industrial Science and Technology
[Background] Petascale Data-Intensive Computing
High Energy Physics
- CERN LHC, KEK Belle
- ~MB/collision, 100 collisions/sec
- ~PB/year
- 2,000 physicists, 35 countries
[Figures: detector for the LHCb experiment; detector for the ALICE experiment]
Astronomical Data Analysis
- data analysis of the whole archive
- TB~PB/year/telescope
- SUBARU telescope: 10 GB/night, 3 TB/year
Petascale Data-Intensive Computing Requirements
- Peta/Exabyte-scale files, millions of millions of files
- Scalable computational power: > 1 TFLOPS, hopefully > 10 TFLOPS
- Scalable parallel I/O throughput: > 100 GB/s, hopefully > 1 TB/s, within a system and between systems
- Efficient global sharing with group-oriented authentication and access control
- Fault tolerance / dynamic re-configuration
- Resource management and scheduling
- System monitoring and administration
- Global computing environment
Goal and Features of Grid Datafarm
Goal
- Dependable data sharing among multiple organizations
- High-speed data access, high-performance data computing
Grid Datafarm
- Gfarm File System – global dependable virtual file system; federates scratch disks in PCs
- Parallel & distributed data computing; associates Computational Grid with Data Grid
Features
- Secured, based on Grid Security Infrastructure
- Scalable with data size and usage scenarios
- Location-transparent data access
- Automatic and transparent replica selection for fault tolerance
- High-performance data access and computing by accessing multiple dispersed storages in parallel (file-affinity scheduling)
Grid Datafarm (1): Gfarm File System – World-Wide Virtual File System [CCGrid 2002]
Transparent access to dispersed file data in a Grid
- POSIX I/O APIs: applications can access the Gfarm file system without any modification
- Automatic and transparent replica selection for fault tolerance and avoidance of access concentration
[Figure: virtual directory tree under /grid (ggf, jp / aist, gtrc; file1–file4); file system metadata maps the tree to physical file replicas, with replica creation across the Gfarm file system]
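The transparent replica selection sketched above can be illustrated with a small toy model. This is a hypothetical illustration, not the Gfarm implementation; the names `replica_catalog`, `pick_replica`, and the host names are all invented for the example.

```python
import random

# Hypothetical metadata: one logical path maps to several replica hosts.
replica_catalog = {
    "/grid/jp/aist/file1": ["node03.aist.example", "node17.titech.example"],
    "/grid/jp/aist/file2": ["node05.aist.example"],
}

def pick_replica(logical_path, is_up):
    """Return a live replica host for a logical file, chosen at random
    among the reachable hosts to avoid access concentration; a failed
    host is transparently skipped (fault tolerance)."""
    hosts = replica_catalog.get(logical_path, [])
    live = [h for h in hosts if is_up(h)]
    if not live:
        raise IOError(f"no live replica for {logical_path}")
    return random.choice(live)

# Simulate one host being down: access falls back to the other replica.
down = {"node03.aist.example"}
host = pick_replica("/grid/jp/aist/file1", lambda h: h not in down)
```

The application only ever names the logical path; which physical copy it reads is decided at access time, which is what makes the selection transparent.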
Grid Datafarm (2): High-Performance Data Access and Computing Support [CCGrid 2002]
- Parallel and distributed file I/O
- Do not separate storage and CPU
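The "do not separate storage and CPU" principle is realized by file-affinity scheduling, mentioned on the features slide: each task runs on the node that already stores its file fragment, so computation moves to the data. A minimal sketch, with all names (`fragment_location`, `schedule`, the node names) invented for illustration:

```python
# Hypothetical fragment map: which node stores each fragment of a file.
fragment_location = {
    "events.dat:0": "node01",
    "events.dat:1": "node02",
    "events.dat:2": "node03",
}

def schedule(fragments):
    """File-affinity scheduling: place the task for each fragment on the
    node holding that fragment, so every task does local-disk I/O and
    all fragments are processed in parallel."""
    return {frag: fragment_location[frag] for frag in fragments}

plan = schedule(["events.dat:0", "events.dat:1", "events.dat:2"])
```

Because each of the three tasks reads from its own node's local disk, aggregate I/O bandwidth scales with the number of nodes instead of being limited by a central file server.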
Trans-Pacific Grid Datafarm Testbed: Network and Cluster Configuration [SAINT2004]
- Trans-Pacific theoretical peak: 3.9 Gbps
- Gfarm disk capacity: 70 TBytes; disk read/write: 13 GB/sec
[Figure: network map over SuperSINET, Abilene, APAN/TransPAC, Maffin, and Tokyo XP, via Chicago, New York, and Los Angeles, connecting clusters at Titech (147 nodes, 16 TBytes, 4 GB/sec), Univ Tsukuba (10 nodes, 1 TBytes, 300 MB/sec), KEK (7 nodes, 3.7 TBytes, 200 MB/sec), AIST (32 nodes, 23.3 TBytes, 2 GB/sec), SDSC (16 nodes, 11.7 TBytes, 1 GB/sec), and Kasetsart Univ, Thailand (16 nodes, 11.7 TBytes, 1 GB/sec), plus the SC2003 booth in Phoenix and Indiana Univ; measured rates include 950 Mbps, 500 Mbps, and 2.34 Gbps on trans-Pacific links]
Scientific Application (1)
ATLAS Data Production
- Distribution kit
- Atlfast – fast simulation: input data stored in the Gfarm file system, not NFS
- G4sim – full simulation
- (Collaboration with ICEPP, KEK)
Belle Monte-Carlo Production
- 30 TB of data needs to be generated
- 10 M events generated in a few days using a 50-node PC cluster
- Simulation data will be generated in a distributed manner at tens of universities and KEK
- (Collaboration with KEK, U-Tokyo)
Scientific Application (2)
Astronomical Object Survey
- Data analysis on the whole archive: 652 GBytes of data observed by the SUBARU telescope
Large configuration data from Lattice QCD
- Three sets of hundreds of gluon field configurations on a 24^3 x 48 4-D space-time lattice (3 sets x 364.5 MB x 800 = 854.3 GB)
- Generated by the CP-PACS parallel computer at the Center for Computational Physics, Univ. of Tsukuba (300 Gflops x years of CPU time)
File Transfer via a Standard Protocol
- Multiple GridFTP servers for a single Gfarm file system
- In collaboration with AIST and Kasetsart University, Thailand
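One way multiple GridFTP servers can serve a single file system faster is that each server exports different fragments of the dispersed data, and a client fetches fragments concurrently and reassembles them. The toy model below illustrates only that idea; all names are hypothetical, and a real transfer would use a GridFTP client over the network, not in-memory dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: a file striped into fragments across two servers.
server_a = {"bigfile:0": b"AAAA", "bigfile:2": b"CCCC"}
server_b = {"bigfile:1": b"BBBB", "bigfile:3": b"DDDD"}
servers = [server_a, server_b]

def fetch(i):
    # Round-robin striping: fragment i lives on server i % len(servers).
    return servers[i % len(servers)][f"bigfile:{i}"]

def parallel_download(n_fragments):
    """Fetch all fragments concurrently and reassemble them in order;
    with one worker per server, both servers stream at the same time."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        parts = list(pool.map(fetch, range(n_fragments)))
    return b"".join(parts)

data = parallel_download(4)
```

Because the two servers deliver disjoint fragments simultaneously, client-side throughput can approach the sum of the servers' link rates, which is the effect measured on the next slide.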
GridFTP Data Transfer Performance
[Figure: GridFTP download performance – transfer rate (MB/sec, 0–120) vs. number of clients (1–7), comparing the local file system, one server for the Gfarm file system, and two servers for the Gfarm file system]
Two GridFTP servers can provide almost peak performance (1 Gbps)
File Sharing by Windows PCs
- Multiple Samba servers for the Gfarm file system
Development Status and Future Plan
Gfarm v1
- Reference implementation of the Grid Datafarm architecture
- Version 1.0.3.1 released on July 5, 2004 (http://datafarm.apgrid.org/)
Gfarm v2 – towards a *true* global virtual file system
- POSIX compliant: supports read-write mode, advisory file locking, . . .
- Robustness improved, security enhanced
- Can be substituted for NFS, AFS, . . .
Application areas
- Scientific applications (high energy physics, astronomical data analysis, bioinformatics, computational chemistry, computational physics, . . .)
- Business applications (dependable data computing in eGovernment and eCommerce, . . .)
- Other applications that need dependable file sharing among several organizations
Standardization effort with the GGF Grid File System WG (GFS-WG)
- Fosters (world-wide) storage sharing and integration: dependable data sharing and high-performance data access among several organizations
Global Grid Forum Grid File System WG
https://forge.gridforum.org/projects/gfs-wg/
Charter of GFS-WG
Two goals (two documents)
- File System Directory Services: manage namespace for files, access control, and metadata management
- Architecture for Grid File System Services: provides the functionality of a virtual file system in a Grid environment, facilitates federation and sharing of virtualized data, and uses File System Directory Services and standard access protocols
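The directory service's role above (a global namespace with per-entry access control and metadata, resolved to physical locations) can be sketched as a simple mapping. Everything below is an illustrative toy with invented names (`register`, `resolve`, a one-character mode flag), not the GFS-WG interface.

```python
# Toy directory service: virtual path -> access control + physical replicas.
namespace = {}

def register(vpath, owner, mode, replicas):
    """Create a namespace entry carrying access control metadata and the
    list of physical locations that back the virtual path."""
    namespace[vpath] = {"owner": owner, "mode": mode, "replicas": list(replicas)}

def resolve(vpath, user):
    """Resolve a virtual path to its physical locations, enforcing a
    deliberately simplified check: non-owners need world-read ('r')."""
    entry = namespace[vpath]
    if user != entry["owner"] and not entry["mode"].startswith("r"):
        raise PermissionError(vpath)
    return entry["replicas"]

register("/grid/ggf/spec.pdf", "tatebe", "r--",
         ["gsiftp://host1/data/spec.pdf"])
locs = resolve("/grid/ggf/spec.pdf", "guest")
```

The point of the split in the charter is that this resolution step is a separate service: file system implementations consult it for naming and permissions, then use standard access protocols (GridFTP, NFS, . . .) to reach the returned locations.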
File System Directory Services
Recommendation document: "File System Directory Services Specification"
- Provides a global namespace
- Manages virtual file system directories in association with access permissions and other application-specific metadata
http://www.ggf.org/Meetings/GGF11/Documents/File_System_Directory_Service_Specification.pdf
Will be submitted at GGF13 (March 2005)
Architecture of Grid File System Services
Recommendation document: "Architecture for Grid File System Services"
- Provides a composition of services to realize a Grid File System, using File System Directory Services and other common services
https://forge.gridforum.org/projects/gfs-wg/document/Architecture_Specification_for_Grid_File_System_Services/en/2
Will be submitted at GGF14 (June 2005)