Tier 2 Computer Centres
CSD3 Cambridge Service for Data Driven Discovery
www.hpc-uk.ac.uk
Tier 2 Computer Centres
A community resource … founded on cooperation and collaboration
Each centre will give a short introduction covering (some of):
• USP
• Contact Details
• Hardware
• Access Mechanisms
• RSE Support
Open Access Call – 12th Oct (Technical Assessment – 21st Sep)
https://www.epsrc.ac.uk/funding/calls/tier2openaccess/
Andy Turner, EPCC a.turner@epcc.ed.ac.uk
Simple access routes
• Free Instant Access for testing
• (Driving Test access coming soon)
• EPSRC RAP: Open Access Call
280-node HPE (SGI) ICE XA:
• 10,080 cores (2 x 18-core Xeon per node)
• 128 GiB memory per node
• DDN Lustre file system
• Single-rail FDR Infiniband hypercube
1.9 PiB Tier-2 Data Facility:
• DDN Web Object Scaler appliances
• Link to other Tier-1/2 facilities
http://www.cirrus.ac.uk
Cirrus RSE Support
User Support
• Freely available to all users from any institution
• Provided by EPCC experts in a wide range of areas
• Easily accessed through the helpdesk: just ask for the help you need
• Help provided directly to the researcher or to an RSE working with researchers
Technical Projects
• Explore new technologies, software, tools
• Add new capabilities to Cirrus
• Benchmark and profile commonly used applications
• Work with the user community and other RSEs
Keen to work with RSEs at other institutions to help them support local users on Cirrus
http://gw4.ac.uk/isambard James Price, University of Bristol j.price@bristol.ac.uk
The System
• Exploring Arm processor technology
• Provided by Cray
• 10,000+ ARMv8 cores
• Cray software tools
  • Compiler, math libraries, tools...
• Technology comparison:
  • x86, Xeon Phi (KNL), NVIDIA P100 GPUs
• Sonexion 3000 SSU (~450 TB)
• Phase 1 installed March 2017
• The Arm part arrives early 2018
  • Early access nodes from September 2017
Target codes
• Will focus on the main codes from ARCHER
• Already running on Arm:
  • VASP
  • CP2K
  • GROMACS
  • Unified Model (UM)
  • OpenFOAM
  • CloverLeaf
  • TeaLeaf
  • SNAP
• Many more codes ported by the wider Arm HPC user community

Access
• 25% of the machine time will be available to users from the EPSRC community
• EPSRC RAP: Open Access Call

User Support
• 4 x 0.5 FTEs from GW4 consortium
• Cray/Arm centre of excellence
• Training (porting/optimising for Arm)
• Hackathons
HPC Midlands Plus www.hpc-midlands-plus.ac.uk Prof. Steven Kenny Loughborough University s.d.kenny@lboro.ac.uk
Centre Facilities
• System supplied by ClusterVision-Huawei
• x86 system
  • 14,336 x86 cores
  • consisting of 512 nodes, each with
    • 2 x Intel Xeon E5-2680v4 CPUs with 14 cores per CPU
    • 128 GB RAM per node
  • 3:1 blocking EDR Infiniband network
    • giving 756-core non-blocking islands
  • 1 PB GPFS filestore
• 15% of the system made available through the EPSRC RAP and seedcorn time
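The headline core count and the 756-core island size quoted above follow directly from the node configuration. A minimal arithmetic sketch is below; the 36-port EDR leaf-switch interpretation (27 nodes down, 9 uplinks) is an assumption used only for illustration and is not stated on the slide.

```python
# Sanity check of the HPC Midlands Plus figures quoted above.
# The 36-port leaf-switch interpretation is an assumption, not from the slide.
nodes = 512
cores_per_node = 2 * 14                 # 2 x 14-core Xeon E5-2680v4
total_cores = nodes * cores_per_node    # 14,336 cores

island_cores = 756
island_nodes = island_cores // cores_per_node   # 27 nodes per non-blocking island
uplinks = 36 - island_nodes                     # 9 uplinks on an assumed 36-port switch
blocking = island_nodes / uplinks               # 3.0 -> the quoted 3:1 blocking ratio

print(total_cores, island_nodes, blocking)
```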
Centre Facilities
• OpenPOWER system
  • 5 x (2 x 10)-core 2.86 GHz POWER8 systems, each with 1 TB RAM, connected to the Infiniband network
  • one with 2 x P100 GPGPUs
  • Dedicated 10 TB SSD GPFS filestore for pre-staging files
  • Aim of the system is threefold:
    • Data analysis of large datasets
    • Test bed for codes that are memory-bandwidth limited
    • On-the-fly data processing
• Comprehensive software stack installed: www.hpc-midlands-plus.ac.uk/software-list
• 4 FTE RSE support for academics at consortium universities
http://www.jade.ac.uk Dr Paul Richmond EPSRC Research Software Engineering Fellow
The JADE System
• 22 NVIDIA DGX-1
  • 3.740 PetaFLOPs (FP16)
  • 2.816 Terabytes HBM GPU memory
  • 1 PB filestore
• P100 GPUs - Optimised for Deep Learning
  • NVLink between devices
  • PCIe to host (dense nodes)
• Use cases
  • 50% ML (Deep Learning)
  • 30% MD
  • 20% Other
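The aggregate figures above can be reproduced from per-GPU numbers. This is a rough check only, assuming 8 x P100 SXM2 per DGX-1 with 16 GB HBM2 and the commonly quoted ~21.2 TFLOP/s FP16 peak per GPU; those per-GPU values are not stated on the slide.

```python
# Hedged sanity check of JADE's aggregate figures from per-GPU specs.
# Assumed (not from the slide): 8 x P100 SXM2 per DGX-1, 16 GB HBM2 and
# ~21.2 TFLOP/s FP16 peak per GPU.
nodes = 22                      # DGX-1 systems
gpus_per_node = 8               # P100 SXM2 per DGX-1
hbm_per_gpu_gb = 16             # GB HBM2 per P100
fp16_per_gpu_tflops = 21.2      # approximate FP16 peak per P100 SXM2

total_gpus = nodes * gpus_per_node                           # 176 GPUs
total_hbm_tb = total_gpus * hbm_per_gpu_gb / 1000            # ~2.816 TB, matches the slide
total_fp16_pflops = total_gpus * fp16_per_gpu_tflops / 1000  # ~3.73 PF, close to the quoted 3.740

print(total_gpus, total_hbm_tb, round(total_fp16_pflops, 3))
```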
Hosting and Access
• Atos has been selected as the provider, following the procurement committee's review of tenders
• Running costs to be recouped through selling time to industrial users
• Hosted at STFC Daresbury
• Will run the SLURM scheduler, with scheduling at the node level (see the sketch below)
• Resource allocation
  • Open to all without charge
  • Some priority to supporting institutions
  • Light-touch review process (similar to DiRAC)
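Because JADE allocates whole nodes, a typical job requests a full DGX-1 and all eight of its GPUs. Below is a minimal sketch of preparing and submitting such a job from Python; the partition name "dgx", the walltime and the train.py workload are placeholders rather than JADE's actual configuration (see http://docs.jade.ac.uk for the real settings).

```python
# Minimal sketch: write and submit a whole-node SLURM job for a DGX-1 system.
# Partition name, walltime and the workload command are placeholders only.
import subprocess
import tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=train-example
#SBATCH --nodes=1              # node-level scheduling: request a full DGX-1
#SBATCH --gres=gpu:8           # all 8 P100s in the node
#SBATCH --time=01:00:00
#SBATCH --partition=dgx        # placeholder partition name

srun python train.py           # placeholder workload
"""

with tempfile.NamedTemporaryFile("w", suffix=".sbatch", delete=False) as f:
    f.write(job_script)
    script_path = f.name

# Submit with sbatch; requires the SLURM client tools on the login node.
subprocess.run(["sbatch", script_path], check=True)
```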
Governance and RSE Support
• All CIs have committed RSE support time for their local institutions
  • To support local users of the JADE system
• Training: some commitment to training offered by some CIs (EPCC, Paul Richmond EPSRC RSE Fellow)
• Organisation Committee: RSE representative from each institution
• Software support and requests via GitHub issue tracker
• Governance via steering committee
  • Responsible for open calls
http://docs.jade.ac.uk
Tier 2 Hub in Materials and Molecular Modelling (MMM Hub) Thomas www.thomasyoungcentre.org
Rationale for a Tier 2 Hub in MMM
• Growth in UK MMM research has created an unprecedented need for HPC, particularly for medium-sized, high-throughput simulations
• These were predominantly run on ARCHER (30% VASP); Tier 3 resources were too constrained
• The aim of installing “Thomas” was to rebalance the ecosystem for the MMM community
• It has created a UK-wide Hub for MMM that serves the entire UK MMM community
• The Hub will build a community to foster collaborative research and the cross-fertilisation of ideas
• Support and software engineering training are offered
Thomas Service Architecture
“Thomas” cluster: 17,280 cores, 720 nodes; 24 cores/node, 128 GB RAM/node
• Intel OPA interconnect: 1:1 within 36-node blocks, 3:1 between blocks
• Technical performance: 523.404 Tflop/s
• IO bandwidth: 5.5 GiB/s
• Storage: OSS-served scratch (428 TB), plus home and software filesystems
www.thomasyoungcentre.org
Access and Sharing
• Access models/mechanisms:
  • 75% of machine cycles are available to the university partners providing funding for Thomas’ hosting and operations costs
    • Funding partners: Imperial, King’s, QMUL and UCL, Belfast, Kent, Oxford, Southampton
  • 25% of cycles are available to the wider UK MMM community
    • Allocations to non-partner researchers and groups across the UK will be handled via existing consortia (MCC & UKCP), not the T2 RAC
• Tier 2 – Tier 1 integration via SAFE will be developed over the coming year
www.thomasyoungcentre.org
Thomas Support Team
• Coordinator (Karen Stoneham) based at the TYC
• UCL RITS Research Computing Team support (x9)
• Online training & contact details
• User group oversees the service at regular meetings
• ‘Points of Contact’ at each partner institution manage allocations and account approval
www.thomasyoungcentre.org
CSD3 Cambridge Service for Data Driven Discovery www.csd3.cam.ac.uk Mike Payne, University of Cambridge resources@csd3.cam.ac.uk
USPs
• co-locate ‘big compute’ and ‘big data’
• facilitate complex computational tasks/workflows
Hardware
• 12,288 cores (2 x 16-core Intel Skylake, 384 GB per node)
• 12,288 cores (2 x 16-core Intel Skylake, 192 GB per node)
• 342 x Intel Knights Landing, 96 GB per node
• Intel Omni-Path
• 90 x Intel Xeon nodes, each with 4 x NVIDIA P100 (16 GByte) and 96 GB
• EDR Infiniband
• 50-node Hadoop cluster
• Hierarchical storage (burst buffers/SSDs/etc.)
• 5 PB disk + 10 PB tape
www.csd3.cam.ac.uk
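For orientation, the Skylake figures above imply the node counts and per-core memory shown in the rough calculation below, using only the numbers quoted on the slide (32 cores per node).

```python
# Rough derivation from the quoted Skylake partition figures (2 x 16 cores/node).
cores_per_node = 2 * 16
for total_cores, mem_gb in [(12288, 384), (12288, 192)]:
    nodes = total_cores // cores_per_node    # 384 nodes in each partition
    mem_per_core = mem_gb / cores_per_node   # 12 GB/core and 6 GB/core respectively
    print(nodes, mem_per_core)
```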
CSD3 Access Mechanisms
• Pump priming/Proof of Concept
• EPSRC Open Access
• EPSRC Grants (other RCs?)
• Cash (for academic/industrial/commercial users)
resources@csd3.cam.ac.uk

RSE Support
• Led by Filippo Spiga
• 3 FTEs (plus additional support in some of our partner institutions)
• Collaborative/cooperative support model

Aspirations
• It is our intention that over the lifetime of the CSD3 service an increasing proportion of the computational workload will be more complex computational tasks that exploit multiple capabilities on the system.
• You, as RSEs, are uniquely placed to develop new computational methodologies, along with the innovative researchers you know. The CSD3 system is available to you for developing and testing your methodology and for demonstrating its capability.