CHiC 2007
Design and Evaluation of a 2048 Core Cluster System
Frank Mietke, Torsten Höfler, Torsten Mehlan and Wolfgang Rehm
Computer Architecture Group, Department of Computer Science
Chemnitz University of Technology
December 12, 2007

Outline
1 Introduction
2 The CHiC Project
3 Benchmarks
4 Summary

Supercomputing in General
- Clusters are dominant (81.2%)
- Power consumption problematic (Green500)

Supercomputing at Chemnitz
- Since 1994
- Growing user community
- Parsytec – 20 GFlop/s
- CLiC – 221.6 GFlop/s

Cluster Design
[Figure: cluster topology diagram]
- Node groups: 512 compute nodes, 12 graphics nodes
- Interconnect: InfiniBand Fabric
- Legend: compute node (w/o hdd), login node (with hdd), management node (with hdd), IO node (w/o hdd), graphics node (with hdd), access gateway, storage complex, InfiniBand cable, GigaBit-Ethernet cable
- Link labels: 2 cables each; max. 8 cables (twice); 6 cables each; Campus network (Redundancy)

Network Design
[Figure: network diagram]
- Campus network access: two Cisco 6500, each with a firewall module and a GbE module with 6 ports
- InfiniBand Fabric: two 288-port InfiniBand switches
- Legend: 24-port InfiniBand switch, InfiniBand / GbE gateway, InfiniBand cable, GigaBit-Ethernet cable
- Link labels: 6 cables each (twice); 12 cables each

Storage Design
[Figure: storage complex diagram]
- Labels: InfiniBand; MDS, OSS, OSS; IBM x3455 (twice); RAID-Controller (twice); SAS; 5x

The CHiC – Top500
- Rank 80 (Nov. 2006, unofficial)
- Rank 117 (Jun. 2007)
- Rank 237 (Nov. 2007)
- CHiC – 8.21 TFlop/s

But we provide more ...
- 12+ TFlop/s (single precision)
- www.gpgpu.org

Experiences
Hardware
- Very good hardware reliability (so far)
- IB-Eth gateway or fabric inconsistencies (load situations)
- Complex IB fabric (3-, 5-, 7-stage Clos)
- RAID controller in storage hardware (configuration issues)
Software
- Lustre-1.6b7 and Lustre-1.6.3 (bugs)
- OFED-1.1 and IPoIB failover
- MPI start-up (failed processes and scalability)
- TORQUE and ulimit values

STREAM – Triad
Kernel: a[i] = b[i] + q · c[i]
balance = (peak floating ops / s) / (sustained memory ops / s)
Compiler: pathscale-3.0

                    Opteron                 Woodcrest
              BW (MB/s)   Balance     BW (MB/s)   Balance
1 T    2 Ds      5655.7       7.3        3672.8      17.4
       4 Ds      5572.9       7.4        3896.4      16.4
       8 Ds      5769.8       7.2        3959.6      16.2
2 Ts   2 Ds      6056.0      13.7        3967.9      32.2
       4 Ds      6114.7      13.6        5061.7      25.3
       8 Ds      6520.9      12.7        5876.6      21.8
4 Ts   2 Ds      5025.1      33.1        3949.3      64.8
       4 Ds     11527.4      14.4        5111.2      50.1
       8 Ds     12796.4      13.0        5653.6      45.3

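The balance figures follow from the measured bandwidth once a peak rate is fixed. The sketch below is a minimal illustration, not the authors' tooling. It assumes 8-byte (double-precision) operands and a per-core peak of 8 GFlop/s for the Woodcrest system. Neither value appears on the slide; the peak was chosen because it reproduces the Woodcrest balance numbers. All function and constant names are illustrative.

```python
# Machine balance = peak floating-point ops/s divided by sustained memory ops/s.
# ASSUMPTIONS (not stated on the slides): 8-byte operands and a hypothetical
# per-core peak of 8 GFlop/s for the Woodcrest system.
BYTES_PER_WORD = 8
ASSUMED_PEAK_PER_CORE = 8.0e9  # flop/s, hypothetical


def triad(b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def balance(bandwidth_mb_s, threads, peak_per_core=ASSUMED_PEAK_PER_CORE):
    """Return peak flop/s per sustained memory word/s."""
    words_per_s = bandwidth_mb_s * 1e6 / BYTES_PER_WORD
    return threads * peak_per_core / words_per_s


# Woodcrest rows "1 T, 2 Ds" and "4 Ts, 2 Ds" from the table above.
one_thread = balance(3672.8, 1)
four_threads = balance(3949.3, 4)
print(round(one_thread, 1), round(four_threads, 1))
```

A higher balance means more peak arithmetic per memory word actually delivered, so the processor is more often starved by memory.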
HPL
- 8.21 TFlop/s (76%) measured (2080 cores)
[Figure: HPL Results for 4 Nodes (16 Cores). x-axis: P_Q grid (1_16, 2_8, 4_4); y-axis: floating point performance (Gflop/s), range 58 to 72; series: OpenIB_4DIMMS, OpenIB_8DIMMS, TCP_4DIMMS]

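Two small calculations sit behind this slide: the P x Q process grids tried for 16 cores, and the peak implied by 8.21 TFlop/s at 76% efficiency. The following sketch uses only the numbers on the slide; the function names are illustrative, and because 76% is a rounded value the derived peak is approximate.

```python
import math


def process_grids(n):
    """All P x Q factorizations of n processes with P <= Q."""
    return [(p, n // p) for p in range(1, math.isqrt(n) + 1) if n % p == 0]


def implied_peak(measured, efficiency):
    """Theoretical peak implied by a measured rate and its efficiency."""
    return measured / efficiency


grids = process_grids(16)                      # grids on the figure's x-axis
peak_tflops = implied_peak(8.21, 0.76)         # approximate, 76% is rounded
per_core_gflops = peak_tflops * 1000 / 2080    # spread over 2080 cores
print(grids, round(peak_tflops, 2), round(per_core_gflops, 2))
```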
IOR
- 20 Object Storage Targets (RAID-5 with 8 HDDs each)
- 3.2 GiB/s write performance
- 2.6 GiB/s read performance
[Figure: IOR Results for 1MB Transfer size, panels READ and WRITE. x-axis: No. of Nodes (0 to 160); y-axis: Aggregate I/O Throughput (MiB/s); series: fpp_1OST, fpp_20OST, seg_20OST, str_20OST]

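To relate the aggregate figures to the MiB/s axis of the plots and to a single storage target, a rough per-OST average can be computed. This is a sketch that assumes the load is spread evenly across all 20 OSTs, which the slide does not state; the helper name is illustrative.

```python
MIB_PER_GIB = 1024


def per_ost_mib_s(aggregate_gib_s, n_osts):
    """Average throughput per OST in MiB/s, assuming an even distribution."""
    return aggregate_gib_s * MIB_PER_GIB / n_osts


write_per_ost = per_ost_mib_s(3.2, 20)
read_per_ost = per_ost_mib_s(2.6, 20)
print(round(write_per_ost, 2), round(read_per_ost, 2))
```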
Latest IOZone Results
- 18 Object Storage Targets (RAID-6)
  - 9 RAID-6 with 10 HDDs
  - 9 RAID-6 with 6 HDDs
- 120 clients (Lustre-1.6.3)
- 5 GB data file each
- 3.7 GiB/s read performance
- 3.2 GiB/s write performance

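The two storage layouts differ in how many disks hold data rather than parity. As a worked example, the sketch below counts data-bearing disks for the RAID-5 layout used for IOR (20 arrays of 8 HDDs) and the RAID-6 layout used here. It relies on the standard overhead of one parity disk's worth of capacity per RAID-5 array and two per RAID-6 array, and it assumes no hot spares, which the slides do not mention. Names are illustrative.

```python
# Parity overhead per array, in disk equivalents (standard RAID definitions).
PARITY_DISKS = {"raid5": 1, "raid6": 2}


def data_disks(level, arrays, disks_per_array):
    """Disks' worth of usable capacity; assumes no hot spares."""
    return arrays * (disks_per_array - PARITY_DISKS[level])


ior_layout = data_disks("raid5", 20, 8)
iozone_layout = data_disks("raid6", 9, 10) + data_disks("raid6", 9, 6)
print(ior_layout, iozone_layout)
```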
Application Benchmarks
ABINIT:
              AMD Cluster   Intel Cluster
Time in s         1,384.6         1,454.2
NAMD:
[Figure: NAMD Results for 16 Nodes. x-axis: No. of Cores (16, 32, 64); y-axis: Running Time in s, range 20 to 110; series: Opteron System, Woodcrest System]

Summary
- Extremely good price-performance ratio achieved
- Ambitious project deadlines (compromises)
- Self-design vs. self-made
- Performance numbers of Intel/AMD processors (memory bandwidth more important for us)
- Lustre failover configuration expensive (backup strategy)

Thank You!
Any Questions?

Backup Slides

Software Environment
- Scientific Linux 4.4 / 5.0
- OpenFabrics Enterprise Distribution 1.2
- Lustre 1.6.3 -> Lustre 1.6.4
- Open MPI 1.2.4, MVAPICH-1.0beta and MVAPICH2-1.0.1
- GNU Compiler 3.4.6 and 4.2.2, and EKOPath Compiler 3.1
- TORQUE 2.1.8 and Maui 3.2.6p13
- Nagios 2.9
- xCAT 1.2.0 and Warewulf 2.6

Cluster Installation
- 1 month deployment
- 21.6 tons of material (racks + components)
- 4200 nuts and 4600 screws necessary
- 4900 cables with 9800 connectors (8 km length)
- 300 man-days effort