HPCx: A New Resource for UK Computational Science


  1. HPCx: A New Resource for UK Computational Science. Mike Ashworth, Ian J. Bush, Martyn F. Guest, Martin Plummer and Andrew G. Sunderland, CLRC Daresbury Laboratory, UK (m.f.guest@dl.ac.uk), and Stephen Booth, David S. Henty, Lorna Smith and Kevin Stratford, EPCC, University of Edinburgh, UK. http://www.hpcx.ac.uk/

  2. Outline • HPCx Overview – HPCx Consortium – HPCx Technology: Phases 1, 2 and 3 (2002-2007) • Performance Overview of Strategic Applications (application-driven, not hardware-driven): – Computational Materials – Molecular Simulation – Molecular Electronic Structure – Atomic and Molecular Physics – Computational Engineering – Environmental Science • Evaluation across a range of Current High-End Systems: – IBM SP/p690, SGI Origin 3800/R14k-500, HP/Compaq AlphaServer SC ES45/1000 and Cray T3E/1200E • Summary

  3. HPCx Project Overview • A joint venture between the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh and the Daresbury Laboratory of the Central Laboratory of the Research Councils (CLRC) • Project funded at £53M (~$120M) by the UK Government • Established to operate and support the principal academic and research computing service for the UK • Principal objective: to provide a Capability Computing service to run scientific applications that could not be run on any other available computing platform • Six-year project with defined performance requirements at year 0, year 2 and year 4, so as to match Moore’s Law • IBM chosen as the technology partner, with the Power4-based p690 platform and the “best available interconnect”
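  The Moore’s-Law pacing above amounts to doubling the contracted Linpack performance every two years, matching the 3, 6 and 12 TFlop/s figures quoted for Phases 1-3 on the following slides. As an illustrative way of writing that schedule (assuming a clean factor of two every two years):

  P(t) \approx P_0 \cdot 2^{t/2}\ \text{TFlop/s}, \qquad P_0 = 3 \;\Rightarrow\; P(0) = 3,\quad P(2) = 6,\quad P(4) = 12.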

  4. Consortium Partners • EPCC (University of Edinburgh) – established in 1991 as the University’s interdisciplinary focus for high-performance computing and its commercial exploitation arm – has hosted specialised HPC services for the UK’s QCD community since 1989; a 5 TFlop/s QCDOC system is due in 2003 in a project with Columbia, IBM and Brookhaven National Laboratory – operated and supported UK national services on Cray T3D and T3E systems from 1994 until 2002 • CLRC (Daresbury Laboratory) – HPC service provider to the UK academic community for > 25 years – research, development & support centre for leading-edge academic engineering and physical science simulation codes – distributed computing support centre for COTS processor & network technologies, evaluating scalability and performance – UK Grid support centre

  5. HPCx Technology: Phase 1 (Dec. 2002): 3 TFlop/s Rmax Linpack – 40 Regatta-H SMP compute systems (1.28 TB memory in total) • 32 x 1.3 GHz processors, 32 GB memory; 4 x 8-way LPARs – 2 Regatta-H I/O systems • 16 x 1.3 GHz processors (Regatta-HPC), 4 GPFS LPARs • 2 HSM/backup LPARs, 18 TB EXP500 fibre-channel global filesystem – Switch interconnect • Existing SP Switch2 with "Colony" PCI adapters in all LPARs (20 μs latency, 350 MB/s bandwidth) • Each compute node has two connections into the switch fabric (dual plane) • 160 x 8-way compute nodes in total – Ranked #9 in the TOP500 list (November 2002)

  6. HPCx Technology: Phases 2 & 3. Phase 2 (2004): 6 TFlop/s Rmax Linpack – >40 Regatta-H+ compute systems • 32 x 1.8 GHz processors, 32 GB memory, full SMP mode (no LPARs) – 3 Regatta-H I/O systems (double the capability of Phase 1) – "Federation" switch fabric • bandwidth quadrupled, ~5-10 microsecond latency, connected directly to the GX bus. Phase 3 (2006): 12 TFlop/s Rmax Linpack – >40 Regatta-H+ compute systems • unchanged from Phase 2 – >40 additional Regatta-H+ compute systems • double the existing configuration – 4 Regatta I/O systems (double the capability of Phase 2) • Open to alternative technology solutions (IPF, BlueGene/L ..)
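  To put the Phase 2 interconnect upgrade in perspective, a simple point-to-point model (illustrative only) takes the transfer time of an m-byte message as latency plus m over bandwidth. Assuming the Phase 1 Colony figures of 20 μs and 350 MB/s, and roughly 7.5 μs with a quadrupled ~1400 MB/s for Federation:

  T(m) \approx t_{\text{lat}} + \frac{m}{B}, \qquad
  T_{\text{Colony}}(1\,\text{MB}) \approx 20\,\mu\text{s} + \frac{1\,\text{MB}}{350\,\text{MB/s}} \approx 2.9\,\text{ms}, \qquad
  T_{\text{Federation}}(1\,\text{MB}) \approx 7.5\,\mu\text{s} + \frac{1\,\text{MB}}{1400\,\text{MB/s}} \approx 0.72\,\text{ms},

  i.e. roughly a 4x reduction for large messages, with the latency improvement mattering most for short ones.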

  7. HPCx - Phase 1 Technology at Daresbury. [Photographs of the installation, July 2002 and November 2002.]

  8. IBM p-series 690 Turbo: Multi-Chip Module (MCM). Four POWER4 chips (8 processors) on an MCM, with two associated memory slots. [Diagram: each chip carries two processors with a shared L2 cache and a distributed switch, plus L3 and memory controllers; 4 GX bus links provide external connections; the L3 cache is shared across all processors.]

  9. Serial Benchmark Summary. Performance relative to the SGI Origin 3800/R12k-400. [Chart: GAMESS-UK, DLPOLY, chemistry kernels and MATRIX-97 benchmarks on the Cray T3E/1200E, SGI Origin 3800/R12k-400, HP/Compaq ES45 1 GHz, Intel Tiger Madison 1.2 GHz and IBM SP/p690 1.3 GHz.] Overall, the IBM p690 delivers roughly 7x the Cray T3E and 3x the SGI Origin 3800.

  10. SPEC CPU2000: SPECfp vs SPECfp_rate (32 CPUs). Values relative to the IBM 690 Turbo 1.3 GHz. [Chart: SPECfp and SPECfp_rate for the Compaq Alpha GS320/731 and GS320/1000, SGI Origin 3800/R12k-400, R14k-500 and R14k-600, HP Superdome/PA8600-552 and PA8700-750, and the IBM 690 Turbo 1.3 GHz.]

  11. Interconnect Benchmark - EFF_BW. [Chart: effective bandwidth in MBytes/sec on 16 CPUs for the IBM SP/Regatta-H (16x1 and 8+8 placements), IBM SP/WH2-375 and NH2-375, SGI Origin 3800/R14k-500, AlphaServer SC ES45/1000 with QsNet (1 and 4 CPUs per node), Cray T3E/1200E, QsNet Alpha EV67 clusters, Myrinet 2000 clusters (Pentium 4/2000 Xeon and Itanium/800), an SCALI-connected AMD K7/1000 MP cluster, and Fast Ethernet clusters (AMD K7/1200 with LAM, PIII/800 with MPICH and LAM). Fast Ethernet delivers well under 100 MBytes/sec, while the best interconnects exceed 1000 MBytes/sec.]
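  The EFF_BW figures above come from an effective-bandwidth style MPI measurement. As a minimal illustrative sketch only (the real benchmark aggregates over many message sizes and process pairs; the 1 MB message size and repetition count below are assumptions), a two-rank ping-pong bandwidth test looks like this:

/* Minimal MPI ping-pong bandwidth sketch (illustrative only); run with at
   least two MPI processes, ideally placed on different nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 1 << 20;     /* assumed message size: 1 MB      */
    const int reps   = 100;         /* assumed number of repetitions   */
    int rank;
    char *buf = malloc(nbytes);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = MPI_Wtime() - t0;

    if (rank == 0)   /* two messages of nbytes move per iteration */
        printf("bandwidth: %.1f MBytes/sec\n",
               2.0 * reps * nbytes / t / 1.0e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

  A ping-pong like this reports point-to-point bandwidth for one pair of processors; benchmarks of the EFF_BW kind also load all processes at once, which is consistent with the lower figures in the chart when several CPUs per node share an adapter.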

  12. Capability Benchmarking and Application Tuning (the HPCx Terascale Applications Team) • Materials Science – CASTEP, AIMPRO & CRYSTAL • Molecular Simulation – DL_POLY & NAMD • Atomic & Molecular Physics – PFARM and H2MOL • Molecular Electronic Structure – GAMESS-UK & NWChem • Computational Engineering – PDNS3D • Environmental Science – POLCOMS

  13. Systems Used in the Performance Analysis • IBM systems – IBM SP/Regatta-H (1024 processors, 8-way LPARs), the HPCx system at Daresbury – Regatta-H (32-way) and Regatta-HPC (16-way) at IBM Montpellier – SP/Regatta-H (8-way LPARs, 1.3 GHz) at ORNL • HP/Compaq AlphaServer SC – 4-way ES40/667 (APAC) and 833 MHz SMP nodes – TCS1 system at PSC: 750 4-way ES45 nodes, i.e. 3,000 EV68 1 GHz CPUs with 4 GB memory per node – Quadrics “fat tree” interconnect (5 μs latency, 250+ MB/s bandwidth) • SGI Origin 3800 – SARA (1000 CPUs), NUMAlink, with R14k/500 and R12k/400 CPUs – CSAR (512 CPUs), NUMAlink, R12k/400 • Cray T3E/1200E – CSAR (788 CPUs)

  14. Materials Science • AIMPRO (Ab Initio Modelling PROgram) – Patrick Briddon et al., Newcastle University – http://aimpro.ncl.ac.uk/ • CRYSTAL – properties of crystalline systems; periodic HF or DFT Kohn-Sham Hamiltonian, with various hybrid approximations – http://www.cse.clrc.ac.uk/cmg/CRYSTAL/ • CASTEP – CAmbridge Serial Total Energy Package – http://www.cse.clrc.ac.uk/cmg/NETWORKS/UKCP/

  15. The AIMPRO benchmark: 216 atoms (a C impurity in a Si lattice), 5,180 basis functions; performance is limited by the ScaLAPACK routine PDSYEVX. [Chart: performance (10000/time) vs. number of processors up to 256 for the SGI Origin 3800/R12k-400 and the IBM SP/p690, with relative advantages of x1.6, x2.3 and x4.3 annotated.]

  16. Scalability of Numerical Algorithms I. Real symmetric eigenvalue problems on the SGI Origin 3800/R12k-400. [Two charts: time (sec) vs. number of processors (from 2 up to 512) for Fock matrices with N = 1152 and N = 3888, comparing PeIGS 2.1, PeIGS 3.0, PDSYEV (ScaLAPACK 1.5), PDSYEVD (ScaLAPACK 1.7) and a BFG-Jacobi solver (DL), plus PeIGS 2.1 on the Cray T3E/1200.]

  17. Scalability of Numerical Algorithms II. Real symmetric eigenvalue problems on the IBM SP/p690 and SGI Origin 3800/R12k. [Two charts: time (secs) vs. number of processors (from 16 up to 512) for N = 3,888 and N = 9,000, comparing PeIGS 3.0, PDSYEV and PDSYEVD on each machine.]
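  For readers unfamiliar with the routines being compared, the following is a minimal sketch of a distributed dense symmetric eigensolve using ScaLAPACK's divide-and-conquer routine PDSYEVD, called from C through its Fortran interface. The process-grid setup, block size and diagonal test matrix here are illustrative assumptions, not the benchmark harness behind the slides above.

/* Sketch: distributed symmetric eigensolve with ScaLAPACK PDSYEVD.
   Assumes ScaLAPACK/BLACS built on MPI, with the usual Fortran
   trailing-underscore calling convention. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

extern void Cblacs_pinfo(int *rank, int *nprocs);
extern void Cblacs_get(int ctxt, int what, int *val);
extern void Cblacs_gridinit(int *ctxt, const char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int ctxt);
extern int  numroc_(const int *n, const int *nb, const int *iproc,
                    const int *isrcproc, const int *nprocs);
extern void descinit_(int *desc, const int *m, const int *n, const int *mb,
                      const int *nb, const int *irsrc, const int *icsrc,
                      const int *ctxt, const int *lld, int *info);
extern void pdsyevd_(const char *jobz, const char *uplo, const int *n,
                     double *a, const int *ia, const int *ja, const int *desca,
                     double *w, double *z, const int *iz, const int *jz,
                     const int *descz, double *work, const int *lwork,
                     int *iwork, const int *liwork, int *info);

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int n = 1152, nb = 64, izero = 0, ione = 1;  /* N as on slide 16 */
    int rank, nprocs, ctxt, nprow, npcol, myrow, mycol, info;

    Cblacs_pinfo(&rank, &nprocs);
    nprow = 1;                                 /* simple square-ish grid */
    while ((nprow + 1) * (nprow + 1) <= nprocs) nprow++;
    npcol = nprocs / nprow;
    Cblacs_get(-1, 0, &ctxt);
    Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    if (myrow >= 0 && mycol >= 0) {            /* processes inside the grid */
        /* Local extent of the block-cyclically distributed matrix */
        int mloc = numroc_(&n, &nb, &myrow, &izero, &nprow);
        int nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
        int lld = mloc > 1 ? mloc : 1;
        int desca[9], descz[9];
        descinit_(desca, &n, &n, &nb, &nb, &izero, &izero, &ctxt, &lld, &info);
        descinit_(descz, &n, &n, &nb, &nb, &izero, &izero, &ctxt, &lld, &info);

        double *a = calloc((size_t)mloc * nloc, sizeof(double));
        double *z = calloc((size_t)mloc * nloc, sizeof(double));
        double *w = calloc(n, sizeof(double));
        /* A trivial diagonal test matrix (a real Fock matrix would be
           assembled here instead), filled via the block-cyclic map. */
        for (int j = 0; j < nloc; j++)
            for (int i = 0; i < mloc; i++) {
                int gi = (i / nb) * nprow * nb + myrow * nb + i % nb;
                int gj = (j / nb) * npcol * nb + mycol * nb + j % nb;
                a[i + (size_t)j * mloc] = (gi == gj) ? gi + 1.0 : 0.0;
            }

        /* Workspace query (lwork = -1), then the actual solve */
        double wkopt; int iwkopt, lwork = -1, liwork = -1;
        pdsyevd_("V", "U", &n, a, &ione, &ione, desca, w,
                 z, &ione, &ione, descz, &wkopt, &lwork, &iwkopt, &liwork, &info);
        lwork = (int)wkopt; liwork = iwkopt;
        double *work = malloc((size_t)lwork * sizeof(double));
        int *iwork = malloc((size_t)liwork * sizeof(int));
        pdsyevd_("V", "U", &n, a, &ione, &ione, desca, w,
                 z, &ione, &ione, descz, work, &lwork, iwork, &liwork, &info);

        if (rank == 0)
            printf("pdsyevd info=%d, smallest eigenvalue %.3f\n", info, w[0]);
        free(a); free(z); free(w); free(work); free(iwork);
        Cblacs_gridexit(ctxt);
    }
    MPI_Finalize();
    return 0;
}

  Linked against ScaLAPACK, BLACS, LAPACK/BLAS and MPI and launched with mpirun, this reproduces the essential call pattern (block-cyclic distribution, workspace query, then the solve) whose parallel scaling the two slides compare across solver libraries and machines.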
