Installation Procedures for Clusters - PowerPoint PPT Presentation



  1. Moreno Baricevic, CNR-INFM DEMOCRITOS, Trieste, ITALY. Installation Procedures for Clusters, PART 1

  2. Agenda: Cluster Services; Overview on Installation Procedures; Configuration and Setup of a NETBOOT Environment; Troubleshooting; Cluster Management Tools; Notes on Security; Hands-on Laboratory Session

  3. What's a cluster? [Diagram: on one side, a commodity LAN network of servers, workstations, laptops, ...; on the other, an HPC cluster LAN made of a master node and computing nodes; both connected to the INTERNET.]

  4. What's a cluster from the HW side? Racks + rack-mountable servers, PCs / workstations, laptops. [Diagram: a 1U rack-mountable server; blade servers such as the IBM BladeCenter (14 bays in 7U) and the SUN Fire B1600 (16 bays in 3U).]

  5. CLUSTER SERVICES. Services provided by the server/masternode on the cluster internal network: NTP (cluster-wide time sync); DNS (dynamic hostname resolution); DHCP and TFTP (installation and configuration of the LAN, plus network device configuration and backup); NFS (shared filesystem); SSH (remote access, file transfer, parallel computation with MPI); LDAP/NIS/... (authentication); ...

  6. HPC SOFTWARE INFRASTRUCTURE, Overview. The software stack, from top to bottom: users' parallel and serial applications; parallel environment (MPI/PVM); software tools for applications (compilers, scientific libraries) and GRID-enabling software; resources management software; system management software (installation, administration, monitoring); O.S.; network (fast interconnection among nodes) and storage (shared and parallel file systems).

  7. HPC SOFTWARE INFRASTRUCTURE, Overview (our experience). The same stack as deployed on our clusters: Fortran and C/C++ codes; MVAPICH / MPICH / OpenMPI / LAM; INTEL, PGI, GNU compilers; BLAS, LAPACK, ScaLAPACK, ATLAS, ACML, FFTW libraries; PBS/Torque batch system + MAUI scheduler; gLite 3.x; SSH, C3Tools, ad-hoc utilities and scripts, IPMI, SNMP; Ganglia, Nagios; LINUX; network: Gigabit Ethernet, Infiniband, Myrinet; storage: NFS, LUSTRE, GPFS, GFS, SAN.

  8. CLUSTER MANAGEMENT, Installation. Installation can be performed interactively or non-interactively. Interactive installations allow finer control. Non-interactive installations minimize human intervention and save a lot of time, are less error-prone, and are performed using programs (such as RedHat Kickstart) which "simulate" the interactive answering and can perform some post-installation procedures for customization (see the sketch below).
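
As an illustration of the non-interactive approach, a minimal Kickstart file could look like the sketch below; the NFS server address, directory layout and package list are hypothetical, not taken from the slides.

    # ks.cfg -- minimal non-interactive installation (hypothetical values)
    install
    nfs --server=10.10.0.1 --dir=/install/rhel   # where the RPMs live
    lang en_US.UTF-8
    keyboard us
    network --bootproto=dhcp
    rootpw --iscrypted $1$...                    # pre-hashed root password
    clearpart --all --initlabel
    autopart
    reboot

    %packages
    @base
    openssh-server

    %post
    # post-installation hook: the place for site-specific customization
    echo "installed on $(date)" > /root/install.log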

  9. CLUSTER MANAGEMENT, Installation. MASTERNODE: an ad-hoc installation, done once and (hopefully) forever, usually interactive, either from local devices (CD-ROM, DVD-ROM, Floppy, ...) or network-based (PXE+DHCP+TFTP+NFS/HTTP/FTP). CLUSTER NODES: one installation reiterated for each node, usually non-interactive. Nodes can be: 1) disk-based; 2) disk-less (not to be really installed).

  10. CLUSTER MANAGEMENT, Cluster Nodes Installation.
  1) Disk-based nodes:
  - CD-ROM, DVD-ROM, Floppy, ...: a time-expensive and tedious operation.
  - HD cloning: mirrored RAID, dd and the like (tar, rsync, ...). A "template" hard disk needs to be swapped in, or a disk image needs to be available for cloning; either way, the configuration needs to be changed afterwards (see the sketch after this slide).
  - Distributed installation (PXE+DHCP+TFTP+NFS/HTTP/FTP): more effort to make the first installation work properly (especially for heterogeneous clusters), (mostly) straightforward for the next ones.
  2) Disk-less nodes:
  - Live CD/DVD/Floppy
  - ROOTFS over NFS
  - ROOTFS over NFS + UnionFS
  - initrd (RAM disk)
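
As a purely illustrative sketch of HD cloning (device names, mount points and the hostname are hypothetical), a template disk can be copied block-wise with dd or file-wise with rsync; either way, the per-node configuration still has to be fixed up afterwards:

    # block-level clone: template disk /dev/sda onto the new node's /dev/sdb
    dd if=/dev/sda of=/dev/sdb bs=4M conv=noerror,sync

    # file-level alternative: copy a mounted template root tree
    rsync -aH --numeric-ids /mnt/template/ /mnt/newdisk/

    # per-node settings (hostname, IP, SSH host keys, ...) must then change
    echo "node042" > /mnt/newdisk/etc/hostname   # hypothetical hostname file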

  11. CLUSTER MANAGEMENT, Existing toolkits. These are generally made of an ensemble of already-available software packages, each designed for a specific task but configured to operate together, plus some add-ons. They are sometimes limited by rigid, non-customizable configurations, are often bound to a specific LINUX distribution and version, and may depend on vendors' hardware.
  Free and Open: OSCAR (Open Source Cluster Application Resources); NPACI Rocks; xCAT (eXtreme Cluster Administration Toolkit); Warewulf/PERCEUS; SystemImager; Kickstart (RH/Fedora), FAI (Debian), AutoYaST (SUSE).
  Commercial: Scyld Beowulf; IBM CSM (Cluster Systems Management); HP, SUN and other vendors' management software.

  12. Network-based Distributed Installation, Overview. Everything starts from PXE + DHCP + TFTP + INITRD, which leads to one of three setups: INSTALLATION (Kickstart/Anaconda, customization through post-installation); ROOTFS over NFS (a dedicated mount point for each node of the cluster); or ROOTFS over NFS + UnionFS (customization through UnionFS layers).

  13. Network booting (NETBOOT): PXE + DHCP + TFTP + KERNEL + INITRD. [Diagram: the exchange between the client/computing node and the server/masternode:]
  - PXE -> DHCP: DHCPDISCOVER; DHCP -> PXE: DHCPOFFER (IP address / subnet mask / gateway / ... and the name of the Network Bootstrap Program, pxelinux.0)
  - PXE -> DHCP: DHCPREQUEST; DHCP -> PXE: DHCPACK
  - PXE -> TFTP: tftp get pxelinux.0
  - PXE+NBP -> TFTP: tftp get pxelinux.cfg/HEXIP
  - PXE+NBP -> TFTP: tftp get kernel foobar; tftp get initrd foobar.img
  - the node boots kernel foobar with INITRD foobar.img
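
The DHCP side of this exchange could be configured along the lines of the sketch below (ISC dhcpd syntax; the subnet, addresses and MAC are hypothetical):

    # /etc/dhcpd.conf -- PXE-enabled DHCP service on the masternode
    subnet 10.10.0.0 netmask 255.255.0.0 {
        range 10.10.1.1 10.10.1.254;
        option routers 10.10.0.1;
        next-server 10.10.0.1;          # TFTP server (the masternode itself)
        filename "pxelinux.0";          # Network Bootstrap Program
    }
    host node01 {
        hardware ethernet 00:11:22:33:44:55;   # hypothetical MAC address
        fixed-address 10.10.1.1;
    }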

  14. Network-based Distributed Installation: NETBOOT + KICKSTART INSTALLATION. [Diagram: after netboot, the exchange between the client/computing node and the server/masternode:]
  - kernel + initrd -> NFS: get NFS:kickstart.cfg
  - anaconda + kickstart -> NFS: get RPMs; installation runs
  - kickstart %post -> TFTP: tftp get tasklist; tftp get task#1 ... tftp get task#N
  - kickstart %post -> TFTP: tftp get pxelinux.cfg/default; tftp put pxelinux.cfg/HEXIP
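
The PXELINUX configuration driving this flow might look like the sketch below (label names, kernel file names and the kickstart path are hypothetical). The point of the final tftp put is that, after a successful install, the node overwrites its own pxelinux.cfg/HEXIP so that the next boot goes to the local disk instead of reinstalling:

    # tftproot/pxelinux.cfg/default -- first boot: run the installer
    default install
    label install
        kernel vmlinuz-install
        append initrd=initrd-install.img ks=nfs:10.10.0.1:/install/ks.cfg

    # tftproot/pxelinux.cfg/0A0A0101 -- written back by %post for node 10.10.1.1
    # (0A0A0101 is the node's IP address in hex); from now on, boot locally
    default local
    label local
        localboot 0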

  15. Diskless Nodes, NFS Based: NETBOOT + NFS, ROOTFS over NFS. [Diagram: the kernel + initrd mount /nodes/rootfs/ over NFS, mount the per-node area /nodes/IPADDR/, bind the per-node filesystems /nodes/IPADDR/FS over the shared tree, and mount /tmp as TMPFS. Resultant file system: /nodes/rootfs/ shared RO; /nodes/10.10.1.1/etc/ and /nodes/10.10.1.1/var/ RW (persistent); /tmp/ as tmpfs in RAM, RW (volatile).]
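
A rough sketch of how this could be wired up (export options, paths and addresses are hypothetical; the mounts would actually be performed by the node's initrd before switching to the real init):

    # /etc/exports on the masternode
    /nodes/rootfs     10.10.0.0/16(ro,no_root_squash,sync)
    /nodes/10.10.1.1  10.10.1.1(rw,no_root_squash,sync)

    # mounts performed on the node (from the initrd)
    mount -o ro 10.10.0.1:/nodes/rootfs    /sysroot           # shared RO root
    mount -o rw 10.10.0.1:/nodes/10.10.1.1 /sysroot/pernode   # per-node RW area
    mount --bind /sysroot/pernode/etc /sysroot/etc            # persistent RW /etc
    mount --bind /sysroot/pernode/var /sysroot/var            # persistent RW /var
    mount -t tmpfs tmpfs /sysroot/tmp                         # volatile /tmp in RAM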

  16. Diskless Nodes, NFS+UnionFS Based: NETBOOT + NFS + UnionFS, ROOTFS over NFS+UnionFS. [Diagram: the kernel + initrd mount /hopeless/roots/root, /hopeless/roots/overlay and /hopeless/roots/gfs read-only over NFS, plus the per-node /hopeless/clients/IP read-write; UnionFS stacks them into the resultant file system, which is writable: new and deleted files are recorded in the per-node RW branch (/hopeless/clients/192.168.10.1) on top of the RO branches /hopeless/roots/gfs, /hopeless/roots/overlay and /hopeless/roots/root.]
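
The union mount itself could be sketched as below, using the classic unionfs 1.x option syntax (the exact syntax varies between UnionFS versions; the branch paths follow the slide's layout, the command itself is an assumption):

    # stack the per-node RW branch over the shared RO branches (sketch)
    mount -t unionfs -o dirs=\
    /hopeless/clients/192.168.10.1=rw:\
    /hopeless/roots/gfs=ro:\
    /hopeless/roots/overlay=ro:\
    /hopeless/roots/root=ro \
    unionfs /sysroot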

  17. Drawbacks.
  - Removable media (CD/DVD/floppy): not flexible enough; needs both a disk and a drive for each node (a drive is not always available).
  - ROOTFS over NFS: the NFS server becomes a single point of failure; doesn't scale well and slows down under frequent concurrent accesses; requires enough disk space on the NFS server.
  - ROOTFS over NFS+UnionFS: same as ROOTFS over NFS; some problems with frequent random accesses.
  - RAM disk: needs enough memory; less memory remains available for processes.
  - Local installation: upgrade/administration not centralized; needs a hard disk (not available on disk-less nodes).

  18. That's All Folks!
  ( questions ; comments ) | mail -s uheilaaa baro@democritos.it
  ( complaints ; insults ) &>/dev/null
