A virtualized approach to Mass Storage System
Dorin Lobontu, Jos van Wezel and Martin Beitzinger
STEINBUCH CENTRE FOR COMPUTING
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu
Presentation Overview
• GridKa Storage Overview
• TSM as Management System for MSS
• Tape Library Virtualization with ERMM
• Tape Reports
01.06.2011 Dorin Lobontu Steinbuch Centre for Computing
GridKa Storage Overview
dCache is a storage management system:
• manages a large amount of data
• stores data on distributed media (disk, tape)
• hierarchical storage management
• has automatic load balancing
dCache functions: NameSpace Operations, Access Controls, Storage Management, Pool Management
GridFTP:
• an extension of the standard FTP for Grid applications
• authentication over GSI (Grid Security Infrastructure)
• encryption by SSL
• partial file transfer
• automatic TCP optimization
• parallel and striped transfers
Pool types shown in the diagram: Stage-pools, Read-pools, Write-pools (annotations: optimized tape access, temporary storage, keep file copies for performance improvement)
Data sources and consumers shown in the diagram: LHC-Centers, analysis, monte-carlo-simulation
Rates shown in the diagram: 1 GB/s (twice); write on tape by TSM at 350 MegaByte per second
GridKa Storage System:
• 75 fileservers in 3 dCache installations
• 3 tape libraries - 10 PB tape capacity
• 8 PB disk capacity
MSS Requirements
Components (diagram labels): Mass Storage System; Library Manager; dCache Arch. Manager, Xrootd Arch. Manager, LSDF Arch. Manager; dCache, xrootd and LSDF clients, each with TSS+STA+DM
• MSS has to have a scalable architecture
• MSS has to uncouple tape resources and applications
• MSS has to share the same resources for different applications
• MSS has to provide security mechanisms to prevent/grant applications access to its resources
TSM as Library Manager
Diagram labels: TSM Server & Library Manager; four dCachePool / tss / StorageAgent nodes; libraries IBM TS3500, Grau ITL-XL, STK SL-8500
• on the TSM server, one path must be defined for every agent and every tape drive (65 agents x 26 drives = 1690 paths)
• these paths must be maintained manually
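The path count on this slide follows from the one-path-per-agent-per-drive rule. A minimal Python sketch of that arithmetic is given here; the function name `static_path_count` is purely illustrative and is not part of TSM.

```python
def static_path_count(agents: int, drives: int) -> int:
    """Paths to define when every storage agent needs a path to every drive."""
    return agents * drives


# Figures from the slide: 65 storage agents and 26 tape drives.
print(static_path_count(65, 26))  # 1690

# Under the same rule, one additional drive needs one new path per agent.
print(static_path_count(65, 27) - static_path_count(65, 26))  # 65
```

The second line of output shows why manual maintenance scales poorly: each added drive or agent multiplies the number of definitions to keep in sync.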
Distributing Data over all Libraries
Diagram labels: dCache (StorageClass1, StorageClass2, StorageClassX, StorageClassN); TSS (StorageClass1 <-> TSM MGMTC1, StorageClass2 <-> TSM MGMTC2, StorageClassN <-> TSM MGMTCN); TSM (MGMTC1, MGMTC2, MGMTCN; STGPOOL1, STGPOOL2, STGPOOLN; DevC-Grau, DevC-IBM, DevC-STK); libraries IBM TS3500, Grau ITL-XL, STK SL-8500
• data is statically distributed by TSS (Tape Staging Server) over the libraries
• load balancing across drives is not possible
• a library crash interrupts the processes assigned to this library
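The consequence of a static distribution can be illustrated with a small Python sketch. The table below is hypothetical: the slide does not state which storage class ends up in which library, so the pairing of management classes, storage pools and libraries is an assumption made only for illustration, and `route` is not a real TSS or TSM function.

```python
# Hypothetical static routing table (assumed pairing, not taken from the slide):
# dCache storage class -> TSM management class -> storage pool -> library.
STATIC_ROUTES = {
    "StorageClass1": {"mgmt_class": "MGMTC1", "stgpool": "STGPOOL1", "library": "Grau ITL-XL"},
    "StorageClass2": {"mgmt_class": "MGMTC2", "stgpool": "STGPOOL2", "library": "IBM TS3500"},
    "StorageClassN": {"mgmt_class": "MGMTCN", "stgpool": "STGPOOLN", "library": "STK SL-8500"},
}


def route(storage_class, libraries_up):
    """Return the fixed library for a storage class, or None if that library is down."""
    library = STATIC_ROUTES[storage_class]["library"]
    return library if library in libraries_up else None


# With one library down, its storage class cannot be served,
# even though the other two libraries are still available.
up = {"IBM TS3500", "STK SL-8500"}
print(route("StorageClass1", up))  # None
print(route("StorageClass2", up))  # IBM TS3500
```

Because the lookup never consults the other libraries, idle drives elsewhere cannot take over the work.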
ERMM as Library Manager
Diagram labels: TSM; ERMM; three dCache-Pool / tss / StorageAgent nodes, each with an ERMM-Client; libraries IBM TS3500, Grau ITL-XL, STK SL-8500
ERMM:
• takes over the entire management of the libraries
• coordinates the access to drives and tapes
• logs all activities in its own DB2 database
• provides a single point of control of tape resources
Distributing Data over all Libraries
Diagram labels: dCache (StorageClass1, StorageClass2, StorageClassX, StorageClassN); TSS (StorageClass1 <-> TSM MGMTC1, StorageClass2 <-> TSM MGMTC2, StorageClassN <-> TSM MGMTCN); TSM (MGMTC1, MGMTC2, MGMTCN; STGPOOL1, STGPOOL2, STGPOOLN; DevC-LTO); ERMM (drives group, tapes group; GRAU, IBM, STK)
• TSM has only one external library
• TSM defines only one path for every storage agent to the external library
• ERMM dynamically maintains all paths from the storage agents to all drives
• ERMM spreads the data over all physical libraries
• ERMM performs dynamic load balancing across drives
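Dynamic drive selection across libraries might look roughly like the Python sketch below. The slides do not describe ERMM's actual selection policy, so the "idle drive with the fewest mounts" rule, the `Drive` class and the `pick_drive` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    library: str
    busy: bool = False
    mounts: int = 0


def pick_drive(drives, libraries_up):
    """Pick the least-used idle drive from any available library, or None.

    Assumed policy: fewest mounts first. This is not ERMM's documented behaviour.
    """
    candidates = [d for d in drives if not d.busy and d.library in libraries_up]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda d: d.mounts)
    chosen.busy = True
    chosen.mounts += 1
    return chosen


drives = [Drive("d1", "GRAU"), Drive("d2", "IBM"), Drive("d3", "STK")]

# Even with the GRAU library unavailable, requests are still served
# from drives in the remaining libraries.
first = pick_drive(drives, {"IBM", "STK"})
second = pick_drive(drives, {"IBM", "STK"})
print(first.library, second.library)  # IBM STK
```

In contrast to a static mapping, the caller never names a library, so the pool of usable drives shrinks or grows without any change on the requesting side.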
Collecting Statistics Data
Diagram labels:
• sense data collector: sense data request, sense data
• ERMM (library manager): event pipe; all drives, all cartridges; drive information, library information, cartridge information
• temporary DCA: a drive cartridge access (DCA) record for every operation
• archive DB; MySql DB
• Mass Storage System: dCache; TSM with one external library, no drive, no scratch
Generate Tape Reports
Complete history of Drive Cartridge Access:
• amount of data written/read per mount
• mount and unmount time
• number of soft/hard errors per mount
Processing chain (diagram labels): MySql DB; statistics generator (perl program); graphics plot generator; Web
Statistics:
• throughput reports per drive, cartridge, library and time unit
• number of mounts per drive, cartridge, library and time unit
• number of concurrent drives in use per library and time unit
• error reports per drive, cartridge, library and time unit
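The per-drive statistics listed on this slide could be derived from the access history roughly as follows. The slide names a perl program reading from a MySql DB; the Python sketch below is only a stand-in working on in-memory records, and the `DCARecord` fields and the `per_drive_stats` function are assumed names, not the actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DCARecord:
    """One drive cartridge access (hypothetical field layout)."""
    drive: str
    cartridge: str
    library: str
    mount_ts: float    # mount time, seconds
    unmount_ts: float  # unmount time, seconds
    bytes_moved: int   # data written plus data read during the mount
    soft_errors: int
    hard_errors: int


def per_drive_stats(records):
    """Aggregate mounts, bytes, mounted time, errors and throughput per drive."""
    stats = defaultdict(lambda: {"mounts": 0, "bytes": 0, "seconds": 0.0, "errors": 0})
    for r in records:
        s = stats[r.drive]
        s["mounts"] += 1
        s["bytes"] += r.bytes_moved
        s["seconds"] += r.unmount_ts - r.mount_ts
        s["errors"] += r.soft_errors + r.hard_errors
    for s in stats.values():
        # Throughput over mounted time, which includes positioning and idle time.
        s["throughput_bps"] = s["bytes"] / s["seconds"] if s["seconds"] > 0 else 0.0
    return dict(stats)


records = [
    DCARecord("drv1", "C1", "lib1", 0, 100, 1000, 1, 0),
    DCARecord("drv1", "C2", "lib1", 200, 300, 3000, 0, 0),
]
print(per_drive_stats(records)["drv1"]["throughput_bps"])  # 20.0
```

Grouping by cartridge, library or time unit would follow the same pattern with a different key.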
Activity Reports
[Web report pages: DriveActivity, LibraryActivity, VolumeInfo, Home]
Error Reports - per Library per month
[Plots: iwr_grau1_lto3 (16 drives), iwr_grau1_lto4 (8 drives)]
Tape Errors
• Since November 2009, about 100 cartridges removed due to increasing correctable errors (~25 LTO3 from a total of ~5000, ~75 LTO4 from a total of ~5000)
• 4 drives (from ~64) replaced due to bad performance and increasing error rate
• 4 cartridges lost with destroyed internal label (TSM: ANR8355E Error reading label for volume …)
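The counts on this slide can be turned into rough removal rates. Since the slide gives only approximate figures, the results of this short Python sketch are approximate as well; `removal_rate` is an illustrative helper.

```python
def removal_rate(removed: int, total: int) -> float:
    """Fraction of units taken out of service."""
    return removed / total


# Approximate figures from the slide.
print(f"LTO3 cartridges: {removal_rate(25, 5000):.1%}")  # 0.5%
print(f"LTO4 cartridges: {removal_rate(75, 5000):.1%}")  # 1.5%
print(f"drives:          {removal_rate(4, 64):.2%}")     # 6.25%
```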