HPC Operations at the Cyprus Institute
George Tsouloupas, PhD, Head of HPC Facility
Nicosia, 2015
Overview
● Organization
● Hardware resources (clusters, storage, networking)
● Software (OS deployment, services and cloud infrastructure)
● Libraries and scientific software deployment using EasyBuild
● Tools
Short History
● The Cyprus Institute, est. 2007
● CaSToRC (Director: Dina Alexandrou)
○ Central goal: to develop world-class research and education in computational science serving the Eastern Mediterranean, in collaboration with other regional institutions
○ Development of a national High Performance Computing centre
● Cy-Tera commissioned in Dec 2011
Cy-Tera
● Cy-Tera is the first large cluster of the Cypriot National HPC Facility
● Cy-Tera Strategic Infrastructure Project
○ A new research unit to host an HPC infrastructure
○ RPF-funded project (i.e. nationally funded)
● LinkSCEEM leverages Cy-Tera
● Contributes resources to PRACE
LinkSCEEM
Projects and Resource Allocation
● Cyprus Meteorology Service
Projects and Resource Allocation
● Semi-annual allocation process
○ Internal technical reviews
○ External scientific reviews
● 43 production projects to date
● 75 preparatory projects to date
HPC Ops
Organization: Responsibilities
● Stelios Erotokritou
○ Project liaison, networking, system administration
● Thekla Loizou
○ User support, scientific software, system administration
● Andreas Panteli
○ PRACE services, system administration, scientific software, networking
● George Tsouloupas
○ System administration, user support, scientific software, PRACE services, networking, NCSA liaison, head of HPC Ops
Maintenance and Downtime
● Scheduled downtime: monthly maintenance
○ 0.7% downtime
● Unscheduled downtime
○ <0.1% due to operator blunders
○ UPS issues: an additional 15-20 hours of downtime
● Downtime for rebuilding Cy-Tera
○ Estimated <5%
● Still well within the promised 80% uptime
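A quick back-of-the-envelope check of these figures, treating each percentage as a fraction of a year and the UPS hours as an annual total (both assumptions on my part, not stated on the slide):

\[
0.7\% + 0.1\% + \underbrace{\tfrac{20\,\mathrm{h}}{8760\,\mathrm{h}}}_{\approx\,0.2\%} + 5\% \approx 6\%
\quad\Longrightarrow\quad \text{availability} \approx 94\%
\]

i.e. roughly 94% availability, comfortably above the promised 80% uptime.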
Hardware Resources
Resources -- Cy-Tera
● Hybrid CPU/GPU Linux cluster
● Computational power
○ 98 x 2 x 6-core compute nodes
○ Each compute node = 128 GFlops
○ 18 x 2 x 6-core + 2 x NVIDIA M2070 GPU nodes
○ Each GPU node = 1 TFlop
○ Theoretical Peak Performance (TPP) = 30.5 TFlops
○ 48 GB memory per node
● MPI messaging & storage access
○ 40 Gbps QDR InfiniBand
● Storage: 360 TB raw disk
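The quoted TPP is consistent with the per-node figures on this slide:

\[
98 \times 128\,\mathrm{GFlops} + 18 \times 1\,\mathrm{TFlop} \approx 12.5\,\mathrm{TFlops} + 18\,\mathrm{TFlops} = 30.5\,\mathrm{TFlops}
\]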
Resources -- Prometheus
● Ex-PRACE prototype
● Hybrid CPU/GPU Linux cluster
● Computational power
○ 8 x 2 x 6-core + 2 x NVIDIA M2070 GPU nodes
○ 24 GB memory per node
● MPI messaging & storage access
○ 40 Gbps QDR InfiniBand
● Storage: 40 TB raw disk
Euclid -- Training Cluster
● Hybrid CPU/GPU Linux cluster
● Training cluster of the LinkSCEEM project
● Computational power
○ 6 eight-core compute nodes + 2 NVIDIA Tesla T10 processors
○ 16 GB memory per node
● MPI messaging & storage access
○ InfiniBand network
● Storage: 40 TB raw disk
● Used in-house, by universities in Cyprus and Jordan, and for workshops...
Prototype Clusters
● Dell C8000 chassis
○ 2 nodes x 2 Xeon Phi + 2 nodes x 2 NVIDIA K20m
● MIC MEGWARE
○ 12 Xeon Phi accelerators in 4 nodes
Post-Processing
● post01, post02
○ 128 GB RAM
○ Access to all filesystems
● Same software and modules as the clusters
○ Compiled specifically for each node
Storage
Storage
● DDN9900 (LTS)
○ GPFS
● Cy-Tera storage (IBM)
○ 200 TB
○ 360 TB (raw)
○ Being phased out
■ 100 TB scratch
● "ONYX"
○ 4.7 GB/s
○ Commodity hardware
○ GPFS
○ FhGFS/BeeGFS
○ Project storage
○ 360 TB
● DDN7700 (LTS)
○ Backup
○ GPFS
○ 80 TB
○ 1 GB/s
● DDN9550 (Auxiliary)
○ 180 TB
○ Room for another 400 TB
○ NFS, Lustre
○ 40 TB
"ONYX" storage, integrated from scratch: BeeGFS over ZFS over JBODs = very good value for money (<100 euro/TB including the servers!)
[Architecture diagram: clients over IB RDMA / IB TCP / Ethernet TCP; metadata servers; storage servers with SAS multipath to JBODs of 90 x 4 TB disks + 4 SSD disks]
- Up to 3 GB/s writes (iozone)
- Around 14,000 directory creates/second
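Taking the disk count at face value, the raw capacity and a rough upper bound on the hardware cost follow directly (assuming the quoted euro/TB figure refers to raw capacity, which the slide does not state):

\[
90 \times 4\,\mathrm{TB} = 360\,\mathrm{TB\ raw}, \qquad 360\,\mathrm{TB} \times 100\,\text{EUR/TB} \approx 36\,000\,\text{EUR (upper bound, servers included)}
\]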
Software (System) -- Filesystems
● Four GPFS filesystems
○ On three storage systems
○ GPFS multiclustering
○ Project storage + LTS
● FhGFS/BeeGFS
○ Home directories on Euclid
○ Home directories on Prometheus
○ New 360 TB system
Software (System) -- Deployment
● xCAT
○ Two deployment servers (separate VLANs)
■ Cy-Tera
■ Everything else
○ "Thin" deployment
● Ansible
○ Infrastructure as code, maintained in git
○ Manual configuration prohibited
Software (System) -- Services
● Cy-Tera
○ RHEL 6 x86_64
○ Torque/Moab → SLURM
● Prometheus
○ CentOS 6.5
○ SLURM
● Euclid
○ CentOS 6.5
○ Torque/Maui → SLURM
● Planck (testing cluster)
○ SLURM
Software -- Workload Management
● First SLURM test on a prototype cluster in 2012
○ Basic configuration (single queue, etc.)
● Decision to move to SLURM
○ Saves ~50K in MOAB licensing over three years
■ That's half an engineer in terms of cost
○ Uniform scheduler across systems
○ It's much easier to set up a test environment if you don't have to worry about licensing...
● Transition from MOAB to SLURM
○ Gave users a 4-month head start with access to SLURM
○ 80% of users only made the transition after they could no longer run on MOAB...
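Reading the licensing bullet as arithmetic: 50K over three years is roughly 17K per year; equating that to half an engineer implies a fully loaded engineer cost of about 33K per year (an inference from the slide, not a stated figure):

\[
\frac{50\,\mathrm{K}}{3\ \text{years}} \approx 16.7\,\mathrm{K/year}, \qquad 2 \times 16.7\,\mathrm{K/year} \approx 33\,\mathrm{K/year\ per\ engineer}
\]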
SLURM Migration
● Goal: implement the exact functionality that we had in MOAB
○ Routing queues for GPU vs. CPU and for job size
○ Low-priority queues
○ Standing reservations + triggers
SLURM Migration
● Requested memory on GPU nodes (GRES): when a user asked for a mem-per-cpu of more than 4 GB, the nodes were allocated but remained idle. To solve this we always set the requested memory per CPU to "0" for GPU jobs, in the job_submit plugin.
● No triggers to start a job in a reservation: we used cron.
● No routing queues as in Torque: we implemented the functionality in the job_submit plugin.
● Bug in Intel MPI with SLURM concerning hostlist parsing: fixed as of Intel MPI version 4.1 Update 3.
● Standing reservations are locked to specific nodes.
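The routing and GPU-memory workarounds above live in SLURM's job_submit plugin on the production systems. As an illustration of the decision logic only, here is a standalone Python sketch; the partition names and the job-size threshold are hypothetical, not taken from the slides.

```python
# Illustrative sketch only: on the real systems this logic is a SLURM
# job_submit plugin. Partition names and the job-size boundary below are
# hypothetical examples.

from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    gres: str = ""                      # e.g. "gpu:2" if the job requests GPUs
    num_nodes: int = 1
    mem_per_cpu: Optional[int] = None   # MB, as submitted by the user
    partition: str = ""                 # filled in by the routing logic


def route_job(job: JobRequest) -> JobRequest:
    """Mimic the routing-queue and GPU-memory rules described on the slide."""
    if "gpu" in job.gres:
        job.partition = "gpu"
        # Force mem-per-cpu to 0 for GPU jobs so nodes are not left
        # allocated but idle (the workaround described above).
        job.mem_per_cpu = 0
    elif job.num_nodes >= 16:           # hypothetical job-size boundary
        job.partition = "large"
    else:
        job.partition = "cpu"
    return job


if __name__ == "__main__":
    print(route_job(JobRequest(gres="gpu:2", mem_per_cpu=8000)).partition)  # gpu
    print(route_job(JobRequest(num_nodes=32)).partition)                    # large
```

The cron-based workaround for starting jobs in standing reservations is a separate mechanism and is not shown here.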
Software available on all systems...
● Intel Compiler Suite (optimised for Intel architectures)
● PGI Compiler Suite, including OpenACC for GPUs (WIP for all systems)
● CUDA
● Optimised math libraries
Scientific Software and Libraries
How I Learned to Stop Worrying and Love EasyBuild
Facts:
● Modules provided to users: 641
a2ps Bonnie++ CUDA GDB guile LAPACK MCL numpy Qt TiCCutils ABINIT Boost cURL Geant4 gzip libctl MEME NWChem QuantumESPRESSO TiMBL ABySS Bowtie DL_POLY_Classic GEOS Harminv libffi MetaVelvet Oases R TinySVM AMOS Bowtie2 Doxygen gettext HDF libgtextutils METIS OpenBLAS RAxML Tk ant BWA EasyBuild GHC HDF5 libharu Mothur OpenFOAM RNAz Trinity aria2 byacc Eigen git HH-suite Libint MPFR OpenMPI SAMtools UDUNITS
arpack-ng bzip2 ELinks GLib HMMER libmatheval mpiBLAST OpenPGM ScaLAPACK util-linux ATLAS cairo EMBOSS GLIMMER HPL libpng MrBayes OpenSSL ScientificPython Valgrind Autoconf ccache ESMF glproto hwloc libpthread-stubs MUMmer PAML SCons Velvet bam2fastq CD-HIT ETSF_IO GMP Hypre libreadline MUSCLE PAPI SCOTCH ViennaRNA BamTools CDO expat gmvapich2 icc libsmm MVAPICH2 parallel SHRiMP VTK
Bash cflow FASTA gmvolf iccifort libtool NAMD ParFlow Silo WPS bbFTP cgdb FASTX-Toolkit gnuplot ictce libunistring nano ParMETIS SOAPdenovo WRF bbftpPRO Chapel FFTW goalf ifort libxc NASM PCRE Stacks xorg-macros beagle-lib Clang FIAT gompi imkl libxml2 NCL Perl Stow xproto BFAST ClangGCC flex google-sparsehash impi libxslt nco PETSc SuiteSparse YamCha binutils CLHEP fontconfig goolf Infernal libyaml ncurses pixman Szip Yasm
biodeps ClustalW2 freeglut goolfc iomkl likwid netCDF pkg-config Tar ZeroMQ Biopython CMake freetype gperf Iperf LZO netCDF-Fortran PLINK tbb zlib Bison Corkscrew g2clib grib_api JasPer M4 nettle Primer3 Tcl zsync BLACS CP2K g2lib GROMACS Java makedepend NEURON problog tcsh BLAT CRF++ GCC GSL JUnit mc numactl Python Theano
● Modules that can be provided within hours: 2238
Software
● Automated, reproducible build processes
● Maintain multiple compilers/versions
● 1000s of software packages
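EasyBuild drives these builds from easyconfig files, which are plain Python-syntax descriptions of a package, version and toolchain. A minimal sketch is shown below; the package, version and toolchain are illustrative choices, not necessarily ones deployed at CaSToRC.

```python
# Minimal EasyBuild easyconfig sketch (easyconfigs use Python syntax).
# Package, version and toolchain below are illustrative examples only.
easyblock = 'ConfigureMake'

name = 'zlib'
version = '1.2.8'

homepage = 'http://www.zlib.net/'
description = "zlib: a general-purpose lossless data-compression library"

# A toolchain bundles compiler, MPI and math libraries; EasyBuild rebuilds the
# package reproducibly for every toolchain in the software stack.
toolchain = {'name': 'goolf', 'version': '1.4.10'}

source_urls = ['http://zlib.net/']
sources = [SOURCELOWER_TAR_GZ]

moduleclass = 'lib'
```

Running `eb zlib-1.2.8-goolf-1.4.10.eb --robot` then builds the package, resolves its dependencies and generates the matching module file, which is how large numbers of modules can be rebuilt reproducibly across compilers and versions.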
Targeting communities, e.g. bioinformatics
● The local team has contributed tens of bioinformatics-related packages to EasyBuild (posters at BBC13 and CSC2013)
● Galaxy server
○ Tested last summer
○ To be deployed