Moreno Baricevic, CNR-IOM DEMOCRITOS, Trieste, ITALY
Installation Procedures for Clusters
PART 1 – Cluster Services and Installation Procedures
Agenda
- Cluster Services
- Overview on Installation Procedures
- Configuration and Setup of a NETBOOT Environment
- Troubleshooting
- Cluster Management Tools
- Notes on Security
- Hands-on Laboratory Session
What's a cluster?
[Diagram: a commodity HPC cluster on its own internal LAN (a master-node plus computing nodes), connected through the LAN/Internet to servers, workstations, laptops, ...]
What's a cluster?
A cluster needs:
- Several computers (nodes), often in special cases for easy mounting in a rack
- One or more networks (interconnects) to hook the nodes together
- Software that allows the nodes to communicate with each other (e.g. MPI)
- Software that reserves resources to individual users
A cluster is: all of those components working together to form one big computer.
Cluster example (internal network)
- masternode
- 32 blades (2x6 cores; 24, 48 or 96 GB RAM each)
- 2 GPU nodes
- 1 FAT node (2 TB RAM)
- 4 I/O servers attached to storage arrays (12x600 GB and 36x2 TB)
Networks:
- 1 Gb Ethernet (SP/iLO/management)
- 1 Gb Ethernet (NFS)
- 40 Gb Infiniband (LUSTRE/MPI)
- 10 Gb Ethernet (iSCSI)
- 1 Gb Ethernet (LAN)
What's a cluster from the HW side?
RACKs + rack-mountable SERVERS (rather than PCs, workstations or laptops)
- 1U server (rack mountable)
- BLADE servers:
  - HP c7000: 8-16 bays in 10U
  - IBM BladeCenter: 14 bays in 7U
  - SUN Fire B1600: 2x 16 bays in 3U
CLUSTER SERVICES
Services provided by the MASTERNODE to the cluster internal network:
- NTP: cluster-wide time sync
- DNS: dynamic hostname resolution
- DHCP: installation/configuration (+ network devices configuration and backup)
- TFTP: boot files and PXE configuration
- NFS: shared filesystem
- SSH: remote access, file transfer, parallel computation (MPI)
- LDAP/NIS/...: authentication
- ...
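As a concrete example, the NTP service above can be sketched with a classic ntpd setup. The addresses (10.10.1.254 for the masternode, 10.10.0.0/16 for the internal network) are illustrative assumptions, not taken from the slides:

```
# /etc/ntp.conf on the masternode (hypothetical): sync with public servers,
# serve time to the internal cluster network
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
restrict 10.10.0.0 mask 255.255.0.0 nomodify notrap

# /etc/ntp.conf on each computing node (hypothetical): sync with the masternode
server 10.10.1.254 iburst
```

Keeping all nodes on the masternode's clock avoids make/NFS timestamp skew and keeps logs comparable across the cluster.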
HPC SOFTWARE INFRASTRUCTURE
Overview (layered view, top to bottom):
- Users' Parallel Applications / Users' Serial Applications
- Parallel Environment: MPI/PVM
- Software Tools for Applications (compilers, scientific libraries)
- Resources Management Software
- System Management Software (installation, administration, monitoring)
- O.S. + Network (fast interconnection among nodes) + Storage (shared and parallel file systems)
(the management layers together form the CLOUD-enabling software)
HPC SOFTWARE INFRASTRUCTURE
Overview (our experience):
- Fortran, C/C++ codes
- MVAPICH / MPICH / OpenMPI / LAM; INTEL, PGI, GNU compilers
- BLAS, LAPACK, ScaLAPACK, ATLAS, ACML, FFTW libraries
- PBS/Torque batch system + MAUI scheduler
- SSH, C3Tools, ad-hoc utilities and scripts, IPMI, SNMP; Ganglia, Nagios
- LINUX; NFS, LUSTRE, GPFS, GFS; Gigabit Ethernet, Infiniband, Myrinet; SAN
(an entirely open software stack)
CLUSTER MANAGEMENT
Installation
Installation can be performed:
- interactively
- non-interactively

Interactive installations:
- finer control

Non-interactive installations:
- minimize human intervention and save a lot of time
- are less error-prone
- are performed using programs (such as RedHat Kickstart) which:
  - "simulate" the interactive answering
  - can perform some post-installation procedures for customization
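A non-interactive install driven by Kickstart might look like the following minimal, hypothetical answer file; every value here is a placeholder, and a real file would carry site-specific partitioning, passwords and package sets:

```
# ks.cfg - minimal RedHat-style kickstart (illustrative values only)
install
text
lang en_US.UTF-8
keyboard us
timezone UTC
rootpw --plaintext changeme        # placeholder; use --iscrypted in production
network --bootproto=dhcp
zerombr
clearpart --all --initlabel
autopart
bootloader --location=mbr
reboot

%packages
@base
%end

%post
# post-installation customization hook: this is where per-node tweaks happen
echo "post-install ran" > /root/ks-post.log
%end
```

The `%post` section is what makes kickstart useful for clusters: the same answer file installs every node, and the customization step differentiates them.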
CLUSTER MANAGEMENT
Installation
MASTERNODE
Ad-hoc installation, once and (hopefully) forever, usually interactive:
- local devices (CD-ROM, DVD-ROM, floppy, ...)
- network based (PXE + DHCP + TFTP + NFS/HTTP/FTP)

CLUSTER NODES
One installation reiterated for each node, usually non-interactive. Nodes can be:
1) disk-based
2) disk-less (no real installation needed)
CLUSTER MANAGEMENT
Cluster Nodes Installation
1) Disk-based nodes
- CD-ROM, DVD-ROM, floppy, ...: a time-expensive and tedious operation
- HD cloning: mirrored RAID, dd and the like (tar, rsync, ...): a "template" hard disk needs to be swapped in, or a disk image needs to be available for cloning; either way, the configuration needs to be changed afterwards
- Distributed installation (PXE + DHCP + TFTP + NFS/HTTP/FTP): more effort to make the first installation work properly (especially for heterogeneous clusters), (mostly) straightforward for the next ones

2) Disk-less nodes
- Live CD/DVD/floppy
- ROOTFS over NFS
- ROOTFS over NFS + UnionFS
- initrd (RAM disk)
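The cloning approach (tar, rsync, ...) plus the mandatory per-node reconfiguration can be sketched as follows. The paths are illustrative stand-ins for a golden image and a target disk or NFS export:

```shell
# Sketch: clone a "golden" root tree with tar, then customize per node.
# /tmp paths stand in for a real template image and a mounted target disk.
GOLDEN=/tmp/golden-rootfs
CLONE=/tmp/node01-rootfs

# Build a tiny stand-in template (a real one would be a full OS tree)
mkdir -p "$GOLDEN/etc"
echo "node-template" > "$GOLDEN/etc/hostname"

# Clone: tar preserves permissions, ownership and symlinks across the copy
mkdir -p "$CLONE"
(cd "$GOLDEN" && tar cf - .) | (cd "$CLONE" && tar xf -)

# Per-node customization after cloning (hostname, IP, ssh keys, ...)
echo "node01" > "$CLONE/etc/hostname"
cat "$CLONE/etc/hostname"
```

rsync with `-aH` would do the same job incrementally, which matters when re-syncing nodes against an updated template.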
CLUSTER MANAGEMENT
Existing toolkits
Toolkits are generally an ensemble of already available software packages, each intended for a specific task, configured to operate together, plus some add-ons. They are sometimes limited by rigid, non-customizable configurations, are often bound to a specific LINUX distribution and version, and may depend on vendors' hardware.

Free and Open:
- OSCAR (Open Source Cluster Application Resources)
- NPACI Rocks
- xCAT (eXtreme Cluster Administration Toolkit)
- Warewulf/PERCEUS
- SystemImager
- Kickstart (RH/Fedora), FAI (Debian), AutoYaST (SUSE)

Commercial:
- Scyld Beowulf
- IBM CSM (Cluster Systems Management)
- HP, SUN and other vendors' management software
Network-based Distributed Installation
Overview
All four approaches boot the same way (PXE + DHCP + TFTP + INITRD); they differ in what runs next and where customization happens:
- RAM (ramfs or initrd): customized at creation time and through ad-hoc conf
- ROOTFS over NFS (NFS): customization through a dedicated mount point for each node of the cluster
- INSTALLATION (Kickstart/Anaconda): customization through post-installation procedures
- CLONING (SystemImager): customization happens before deployment, when the golden image is created
Network-based Distributed Installation
Basic services
Deployment:
- PXE: network booting
- DHCP: IP binding + NBP (pxelinux.0)
- TFTP: PXE configuration file (pxelinux.cfg/<HEXIP>), alternative boot-up images (memtest, UBCD, ...)
- NFS: kickstart + RPM repository (with little modification, HTTP(S) or FTP can be used too)

Maintenance:
- passive updates: post-boot updates using port-knocking, ssh, distributed shells, wget, ...
- active configuration/package updates: ssh, distributed shells
- advanced IT automation tools: Ansible, CFEngine, ...
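The DHCP and TFTP pieces above fit together roughly like this ISC dhcpd fragment plus a PXELINUX config. All addresses, MACs and paths are made-up examples for one possible layout:

```
# /etc/dhcp/dhcpd.conf (hypothetical): bind a node's MAC to a fixed IP and
# point its PXE firmware at the NBP served by the masternode's TFTP daemon
subnet 10.10.0.0 netmask 255.255.0.0 {
  option routers 10.10.1.254;
  next-server 10.10.1.254;          # TFTP server
  filename "pxelinux.0";            # Network Bootstrap Program

  host node01 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 10.10.1.1;
  }
}

# /tftpboot/pxelinux.cfg/default (hypothetical): netboot kernel + initrd,
# hand anaconda a kickstart file over NFS
DEFAULT install
LABEL install
  KERNEL vmlinuz
  APPEND initrd=initrd.img ks=nfs:10.10.1.254:/install/ks.cfg
```

Fixed MAC-to-IP bindings are what make the per-node pxelinux.cfg/<HEXIP> files predictable.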
Customization layers
[Diagram: customization layers of the installation process]
Customization layers
[Diagram: ramdisk/ramfs for disk-less nodes, rescue and HW test]
Network booting (NETBOOT)
PXE + DHCP + TFTP + KERNEL + INITRD
Boot sequence (COMPUTING NODE, client side <-> MASTERNODE, server side):
1. PXE -> DHCP: DHCPDISCOVER
2. DHCP -> PXE: DHCPOFFER (IP address / subnet mask / gateway / ... + Network Bootstrap Program name, pxelinux.0)
3. PXE -> DHCP: DHCPREQUEST; DHCP -> PXE: DHCPACK
4. PXE -> TFTP: tftp get pxelinux.0
5. PXE+NBP -> TFTP: tftp get pxelinux.cfg/HEXIP
6. PXE+NBP -> TFTP: tftp get kernel foobar; tftp get initrd foobar.img
7. kernel foobar boots with initrd
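The HEXIP in step 5 is the node's IPv4 address written as 8 uppercase hex digits (the same string `gethostip -x` from the syslinux package prints). A small sketch of the conversion:

```shell
# Compute the pxelinux.cfg/<HEXIP> file name for a given dotted-quad IP.
# Octets are plain decimal numbers (0-255), printed as two hex digits each.
ip_to_hexip() {
  local IFS=.
  set -- $1
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hexip 10.10.1.1    # -> 0A0A0101
```

If pxelinux.cfg/<HEXIP> is missing, the NBP also tries progressively shorter hex prefixes (whole subnets) before falling back to pxelinux.cfg/default, which is what makes per-node and per-group configs coexist.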
Network-based Distributed Installation
NETBOOT + KICKSTART INSTALLATION
After netbooting kernel + initrd, the installation proceeds (COMPUTING NODE <-> MASTERNODE):
1. anaconda+kickstart: get NFS:kickstart.cfg, then get RPMs over NFS
2. kickstart %post: tftp get tasklist
3. kickstart %post: tftp get task#1 ... tftp get task#N (post-installation tasks)
4. kickstart %post: tftp get pxelinux.cfg/default
5. kickstart %post: tftp put pxelinux.cfg/HEXIP (rewriting the node's own PXE entry for subsequent boots)
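Step 5 above can be sketched as a `%post` scriptlet. This assumes the syslinux `gethostip` utility is available in the installed system and that the TFTP area accepts uploads from clients; both are assumptions about this particular setup, and the server address is illustrative:

```
%post
# Hypothetical %post step: once installed, replace this node's PXE entry so
# the next netboot chains to the local disk instead of reinstalling.
HEXIP=$(gethostip -x $(hostname))
cat > /tmp/pxe-entry <<EOF
DEFAULT local
LABEL local
  LOCALBOOT 0
EOF
tftp 10.10.1.254 -c put /tmp/pxe-entry pxelinux.cfg/$HEXIP
%end
```

The upload-based switch keeps the whole install/boot decision on the masternode: deleting a node's HEXIP file is enough to schedule a reinstallation at its next reboot.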
Diskless Nodes NFS Based
NETBOOT + NFS
After netbooting kernel + initrd, the node assembles its root file system over NFS (COMPUTING NODE <-> MASTERNODE):
1. mount /nodes/rootfs/ (NFS, read-only, shared by all nodes)
2. mount /nodes/IPADDR/ (NFS, read-write, per-node)
3. bind /nodes/IPADDR/FS over the corresponding read-only directories
4. mount /tmp as tmpfs (RAM)

Resultant file system (example for node 10.10.1.1):
- RO: /nodes/rootfs/ (shared)
- RW (persistent): /nodes/10.10.1.1/etc/, /nodes/10.10.1.1/var/
- RW (volatile): /tmp/ as tmpfs (RAM)
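The resulting layout can be sketched in fstab notation for readability (in practice the NFS root mount happens inside the initrd, before fstab is read). Server name, export paths and sizes are illustrative assumptions:

```
# Hypothetical node-side mounts for the NFS-root scheme (node 10.10.1.1):
# shared read-only root, per-node writable overlays via bind mounts,
# volatile /tmp in RAM.
master:/nodes/rootfs      /           nfs    ro,nolock       0 0
master:/nodes/10.10.1.1   /nodes/me   nfs    rw,nolock       0 0
/nodes/me/etc             /etc        none   bind            0 0
/nodes/me/var             /var        none   bind            0 0
tmpfs                     /tmp        tmpfs  rw,size=256m    0 0
```

Only the few directories a node actually writes (/etc, /var) need per-node exports; everything else stays in the single shared image, which is what makes upgrades a one-place operation.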
Drawbacks
Removable media (CD/DVD/floppy): not flexible enough
- needs both a disk and a drive for each node (drive not always available)

ROOTFS over NFS:
- the NFS server becomes a single point of failure
- doesn't scale well, slows down in case of frequent concurrent accesses
- requires enough disk space on the NFS server

RAM disk:
- needs enough memory
- less memory available for processes

Local installation:
- upgrade/administration not centralized
- needs a hard disk (not available on disk-less nodes)
That's All Folks!
( questions ; comments ) | mail -s uheilaaa baro@democritos.it
( complaints ; insults ) &>/dev/null