Introduction to PC-Cluster Hardware (II)
Russian-German School on High Performance Computer Systems, June 27th to July 6th, 2005, Novosibirsk
Day 1, June 27th, 2005
HLRS (High Performance Computing Center Stuttgart), University of Stuttgart
Outline
• I/O
  – Bus
  – PCI
  – PCI-X
  – PCI-Express
• Network Interconnects
  – Ethernet
  – Myrinet
  – Quadrics Elan4
  – Infiniband
• Mass Storage
  – Hard disks and RAIDs
• Cluster File Systems
I/O Bus Layout Example
[Diagram: example layout of an I/O bus]
PCI
• Stands for Peripheral Component Interconnect
• Standard for I/O interfaces in PCs since 1992
• 32 Bit wide
• 33.33 MHz clock
• Max. 133 MB/s throughput
• Extended to 64 Bit, 266 MB/s throughput
• Several adapters can share the bus (and the bandwidth)
PCI and PCI-X overview
              Width    Clock       Throughput   Voltage
PCI           32 Bit   33.33 MHz   133 MB/s     3.3 and 5 V
PCI           64 Bit   33.33 MHz   266 MB/s     3.3 and 5 V
PCI(-X) 66    64 Bit   66.66 MHz   533 MB/s     3.3 V
PCI-X 100     64 Bit   100 MHz     800 MB/s     3.3 V
PCI-X 133     64 Bit   133 MHz     1066 MB/s    3.3 V
(PCI-X 266)   64 Bit   266 MHz     2133 MB/s    3.3 and 1.5 V
(PCI-X 533)   64 Bit   533 MHz     4266 MB/s    3.3 and 1.5 V
Also additional features within PCI-X and PCI-X 2.0
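For reference, the throughput figures in the table are simply bus width times clock rate. A minimal sketch of the calculation (function name and the printed bus list are illustrative, the clocks are the nominal 33.33/66.66/... MHz rates):

```python
# Minimal sketch: peak throughput of a parallel bus is
# (bus width in bytes) x (clock rate). Values follow the table above.
def peak_throughput_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s for a parallel bus."""
    return width_bits / 8 * clock_mhz

for name, width, clock in [
    ("PCI 32/33", 32, 33.33),
    ("PCI 64/33", 64, 33.33),
    ("PCI-X 133", 64, 133.33),
    ("PCI-X 533", 64, 533.33),
]:
    print(f"{name}: {peak_throughput_mb_s(width, clock):.0f} MB/s")
```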
PCI-Express (PCIe)
• Formerly known as 3GIO
• Based on PCI programming concepts
• Uses a serial communication system
  – Much faster
  – 2.5 GBit/s per lane (5 and 10 GBit/s in the future)
  – 8B/10B encoding → max. 250 MB/s per lane
• Standard allows for 1, 2, 4, 8, 12, 16 and 32 lanes
• Point-to-point connection between adapter and chipset
• Allows for 95 % of peak rate for large transfers
PCI-Express (II)
• Performance
            Clock     Throughput unidir.   Throughput bidir.
  1 lane    2.5 GHz   250 MB/s              500 MB/s
  4 lanes   2.5 GHz   1 GB/s                2 GB/s
  8 lanes   2.5 GHz   2 GB/s                4 GB/s
  16 lanes  2.5 GHz   4 GB/s                8 GB/s
  32 lanes  2.5 GHz   8 GB/s                16 GB/s
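The per-lane figure follows from the 2.5 GBit/s line rate and the 8B/10B encoding overhead. A minimal sketch of that calculation (constants and names chosen here for illustration):

```python
# Minimal sketch: PCIe 1.x lane bandwidth. A 2.5 Gbit/s line rate with
# 8b/10b encoding carries 8 payload bits per 10 line bits,
# i.e. 2.5e9 * 8/10 / 8 = 250 MB/s per lane and direction.
LINE_RATE_GBIT = 2.5          # PCIe 1.x signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b

def lane_bandwidth_mb_s() -> float:
    return LINE_RATE_GBIT * 1000 / 8 * ENCODING_EFFICIENCY  # MB/s, one direction

for lanes in (1, 4, 8, 16, 32):
    unidir = lanes * lane_bandwidth_mb_s()
    print(f"x{lanes}: {unidir:.0f} MB/s unidirectional, "
          f"{2 * unidir:.0f} MB/s bidirectional")
```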
Outline
• I/O
  – Bus
  – PCI
  – PCI-X
  – PCI-Express
• Network Interconnects
  – Ethernet
  – Myrinet
  – Quadrics
  – Infiniband
• Mass Storage
  – Hard disks and RAIDs
• Cluster File Systems
Ethernet
• Gigabit Ethernet
  – Standard
  – Available in nearly every PC
  – Mostly copper
  – Cheap
  – But costs CPU performance
• 10 GBit Ethernet
  – First adapters available
  – Copper/fibre
  – Currently expensive
  – Eats up to 100 % CPU
  – TCP offloading to decrease CPU load
Myrinet
• Preferred cluster interconnect for quite a long time
• Bandwidth higher than with Gigabit Ethernet
• Lower latency than Ethernet
• Has a processor on each adapter → overlap of computation and communication possible (see the sketch below)
• RDMA capability
• Link aggregation possible
• Myrinet 10G planned
• Only one supplier, Myricom
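The processor on the adapter lets transfers progress while the host CPU computes; applications exploit this with non-blocking communication. A minimal, hedged sketch with mpi4py (not specific to Myrinet; buffer sizes and the neighbour pattern are illustrative):

```python
# Minimal sketch: overlapping communication with computation using
# non-blocking MPI calls (mpi4py). Works over any interconnect whose
# adapter can progress transfers while the CPU keeps computing.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

send_buf = np.full(1_000_000, rank, dtype=np.float64)
recv_buf = np.empty(1_000_000, dtype=np.float64)
peer = (rank + 1) % comm.Get_size()

# Start the transfers ...
requests = [comm.Isend(send_buf, dest=peer, tag=0),
            comm.Irecv(recv_buf, source=peer, tag=0)]

# ... and compute on independent data while they are in flight.
local_work = np.sin(np.arange(500_000)).sum()

MPI.Request.Waitall(requests)  # transfers must complete before using recv_buf
print(f"rank {rank}: local work {local_work:.3f}, received {recv_buf[0]}")
```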
Quadrics Elan 4
• Interconnect used for high performance clusters
• Higher bandwidth than Myrinet
• Lower latency
• Has a processor on each adapter → overlap of computation and communication possible
• RDMA capability
• Link aggregation possible
• Quite expensive
• Only one supplier, Quadrics
Infiniband
• Specified standard protocol
• Interconnect often used today for high performance clusters
• Bandwidth like Quadrics
• Latency like Myrinet
• RDMA capability
• Link aggregation possible
• Costs similar to Myrinet, planned to become as cheap as GigE
• Many vendors
Bandwidth
[Chart: throughput (MB/s) versus message size for Myrinet, Myrinet dual, Quadrics Elan 4, Elan 4 dual rail, IB PCI-X and IB PCI-Express]
Bandwidth Infiniband
System Interface   unidirectional   bidirectional
PCI-X              830 MB/s         900 MB/s
PCI Express        930 MB/s         1800 MB/s
Network latency
Gigabit Ethernet    min. 11 µs, up to 40 µs
10 G Ethernet       ?
Myrinet             3.5 to 6 µs
Quadrics Elan 4     2.5 µs
Infiniband PCI-X    4.5 µs
Infiniband PCIe     3.5 µs
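A common first-order model for the transfer time of a message is latency plus size divided by bandwidth. The sketch below applies it to the Infiniband PCI-Express figures quoted above; the linear model is an approximation for illustration, not a measured curve:

```python
# Minimal sketch: first-order transfer-time model t = latency + size/bandwidth,
# using the Infiniband PCI-Express figures above (3.5 us latency,
# 930 MB/s unidirectional bandwidth). Real behaviour deviates,
# especially for small messages.
LATENCY_S = 3.5e-6
BANDWIDTH_B_PER_S = 930e6

def transfer_time(size_bytes: int) -> float:
    return LATENCY_S + size_bytes / BANDWIDTH_B_PER_S

for size in (1_024, 64 * 1024, 1_048_576):
    t = transfer_time(size)
    print(f"{size:>8} B: {t * 1e6:8.1f} us, effective {size / t / 1e6:7.1f} MB/s")
```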
Outline
• I/O
  – Bus
  – PCI
  – PCI-X
  – PCI-Express
• Network Interconnects
  – Ethernet
  – Myrinet
  – Quadrics Elan4
  – Infiniband
• Mass Storage
  – Hard disks and RAIDs
• Cluster File Systems
Technologies to connect HDD (I)
• IDE/PATA
  – Bus
  – Max. 2 devices
  – Max. 133 MB/s (ATA/133)
  – Typically system internal
• SATA
  – Point-to-point
  – 150 MB/s (300 MB/s SATA 2.0)
  – Typically system internal
Technologies to connect HDD (II)
• SCSI
  – Bus
  – Max. 7/15 devices
  – Up to 320 MB/s throughput per bus
  – System internal and external
• FC (Fibre Channel)
  – Network (fabric) and loop
  – Max. 127 devices per loop
  – Used for storage area networks
  – 2 GBit, near future 4 GBit
  – 8 and 10 GBit planned
Storage Area Network (SAN)
• Fabric
  – HBAs
  – Switches
  – Today typically Fibre Channel, but also IP (iSCSI)
[Diagram: SAN]
Storage Media
• Single hard disks
• RAID systems (disk arrays)
  – Fault tolerance
    • RAID 1
    • RAID 3
    • RAID 5
  – Higher throughput
    • RAID 0
    • RAID 3
    • RAID 5
  – FC, SCSI and SATA (performance and reliability <-> costs)
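A hedged sketch of the capacity/redundancy trade-off behind the RAID levels listed above (simplified model; disk count and disk size are illustrative):

```python
# Minimal sketch: usable capacity and disk-failure tolerance for the
# RAID levels mentioned above, for n identical disks of a given size.
# Simplified: ignores controller overhead, hot spares and rebuild time.
def raid_summary(level: int, n_disks: int, disk_gb: float):
    if level == 0:              # striping: full capacity, no redundancy
        return n_disks * disk_gb, 0
    if level == 1:              # mirroring: half the capacity
        return n_disks * disk_gb / 2, 1
    if level in (3, 5):         # one disk's worth of capacity lost to parity
        return (n_disks - 1) * disk_gb, 1
    raise ValueError("level not covered in this sketch")

for level in (0, 1, 3, 5):
    n = 2 if level == 1 else 8
    usable, failures = raid_summary(level, n, 146.0)  # 146 GB disks, illustrative
    print(f"RAID {level} with {n} disks: {usable:.0f} GB usable, "
          f"tolerates {failures} disk failure(s)")
```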
File Systems for Clusters
Topologies
• Roughly two classes:
  – Shared storage class
  – Network-centric class (shared-nothing class)
Shared Storage Class
• Sharing physical devices (disks)
  – Mainly by using a Fibre Channel network (SAN)
  – IP SAN with iSCSI is also possible
  – SRP within Infiniband
  – (Using a metadata server to organize disk access)
[Diagram: SAN]
Shared Storage Class - Implementation
• Topology (CXFS, OpenGFS, SNFS and NEC GFS)
[Diagram: CXFS clients and CXFS server connected via a private IP network and a SAN]
Network Centric Class
• The storage is in the network
  – On storage nodes
  – May be on all nodes
[Diagram: nodes connected by a network]
PVFS - Topology
[Diagram: PVFS topology with a manager node (mgr) and the remaining nodes connected by a network]
File Systems for Clusters
• Distributed file system (e.g. NFS/CIFS): the server is the bottleneck
• Symmetric clustered file system (e.g. GPFS): lock management is the bottleneck
• Parallel file system (like PFS): component servers, scale is limited
• Asymmetric SAN-based file system (like SANergy): the metadata server is the bottleneck
Lustre Solution
• Asymmetric cluster file system
• Scalable: the MDS handles object allocation, the OSTs handle block allocation
[Diagram: clients (C), OSTs and an MDS cluster]
Necessary features of a Cluster File System
• Accessibility / global namespace
• Access method
• Authorization (and accounting), security
• Maturity
• Safety, reliability, availability
• Parallelism
• Scalability
• Performance
• Interfaces for backup, archiving and HSM systems
• (Costs)
HLRS File System Benchmark
• The disk-I/O benchmark
  – Allows throughput measurements for reads and writes
    • Arbitrary file size
    • Arbitrary I/O chunk size
    • Arbitrary number of performing processes
  – Allows metadata performance measurements
    • File creation, file status (list), file deletion rate
    • With an arbitrary number of processes (p-threads or MPI)
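The HLRS benchmark itself is not reproduced here; the following is a minimal sketch of how a write-throughput measurement of this kind can be structured (path, file size and chunk size are illustrative):

```python
# Minimal sketch of a write-throughput measurement in the spirit of the
# benchmark described above (not the HLRS code itself).
import os
import time

def measure_write_throughput(path: str, file_size: int, chunk_size: int) -> float:
    chunk = b"\0" * chunk_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < file_size:
            f.write(chunk)
            written += chunk_size
        f.flush()
        os.fsync(f.fileno())        # make sure the data really reached the storage
    elapsed = time.perf_counter() - start
    os.remove(path)
    return file_size / elapsed / 1e6  # MB/s

if __name__ == "__main__":
    mb_s = measure_write_throughput("/tmp/io_bench.dat", 256 * 1024**2, 4 * 1024**2)
    print(f"write throughput: {mb_s:.1f} MB/s")
```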
Measurement Method
• Measurement with disk I/O
  – HLRS file system benchmark
• Measuring of throughput
  – Depending on the I/O chunk size (1, 2, 4, 8 and 16 MB chunks)
  – For clients and server
• Measuring of metadata performance
  – Essential for cluster file systems
  – Measuring of clients and servers
  – File creation, file status and file deletion rate
  – With 1, 5, 10 and 20 processes on a client
  – 50 files per process
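Likewise, a minimal sketch of a metadata-rate measurement (create, stat, delete) in the spirit of the method above; the directory, the 50-files figure and the single-process timing are illustrative simplifications:

```python
# Minimal sketch of a metadata-rate measurement (create/stat/delete),
# timed for a single process only.
import os
import time

def metadata_rates(directory: str, n_files: int = 50) -> dict:
    paths = [os.path.join(directory, f"meta_{i}.tmp") for i in range(n_files)]
    rates = {}
    for name, op in [("create", lambda p: open(p, "w").close()),
                     ("stat",   os.stat),
                     ("delete", os.remove)]:
        start = time.perf_counter()
        for p in paths:
            op(p)
        rates[name] = n_files / (time.perf_counter() - start)  # operations per second
    return rates

if __name__ == "__main__":
    for op, rate in metadata_rates("/tmp").items():
        print(f"{op}: {rate:.0f} ops/s")
```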
Measurement Environment
• CXFS
  – Server: Origin 3000, 8 procs, 6 GByte buffer cache, 2x2 GBit FC
  – Client: Origin 3000, 20 procs, 20 GByte buffer cache, 2x2 GBit FC
  – RAID: Data Direct Networks
• PVFS
  – 4 systems IA-64, 2 procs, 8 GB memory, 36 GB local disk each, symmetric setup
• Lustre
  – 7 systems IA-32, Pentium III 1 GHz, 18 GB local disk, 1 MDS, 2 OST, 4 clients, Lustre 0.7 (old version!)
• Measurements were performed in 2003