Bright Cluster Manager
Advanced HPC cluster management made easy

Martijn de Vries, CTO, Bright Computing
About Bright Computing

Bright Computing:
1. Develops and supports Bright Cluster Manager for HPC systems and server farms
2. Incorporated in the USA (HQ in San Jose, California)
3. Development office in Amsterdam, NL
4. Backed by ING Bank as shareholder and investor
5. Sells through a rapidly growing network of resellers and OEMs worldwide
6. Customers and resellers in the US, Canada, Brazil, Europe, the Middle East, India, Singapore and Japan
7. Installations in academia, government and industry, ranging from 4-node systems to TOP500 systems
Customers

Academia, Government, Industry
The Commonly Used “Toolkit” Approach

Most HPC cluster management solutions use the “toolkit” approach (Linux distro + tools)
• Examples: Rocks, PCM, OSCAR, UniCluster, CMU, bullx, etc.
• Tools typically used: Ganglia, Cacti, Nagios, Cfengine, SystemImager, Puppet, Cobbler, Hobbit, Big Brother, Zabbix, GroundWork, etc.

Issues with the “toolkit” approach:
• Tools rarely designed to work together
• Tools rarely designed for HPC
• Tools rarely designed to scale
• Each tool has its own command line interface and GUI
• Each tool has its own daemon and database
• Roadmap dependent on the developers of the tools

Making a collection of unrelated tools work together:
• Requires a lot of expertise and scripting
• Rarely leads to a truly easy-to-use and scalable solution
About Bright Cluster Manager

Bright Cluster Manager takes a much more fundamental & integrated approach:
• Designed and written from the ground up
• Single cluster management daemon provides all functionality
• Single, central database for configuration and monitoring data
• Single CLI and GUI for ALL cluster management functionality

Which makes Bright Cluster Manager …
• Extremely easy to use
• Extremely scalable
• Secure & reliable
• Complete
• Flexible
Architecture

(Architecture diagram: CMDaemon at the centre of the cluster)
Bright Cluster Manager — Elements

(Diagram) The Cluster Management GUI and Cluster Management Shell communicate with the Cluster Management Daemon over SSL / SOAP / X509 / IPtables. The daemon provides monitoring, automation, provisioning, health checks and management, and integrates with workload managers (PBS Pro, Torque, Maui/MOAB, Grid Engine, SLURM, LSF*), compilers, libraries, debuggers and profilers. It runs on SLES / RHEL / CentOS / SL / Oracle EL on head and compute nodes, supports ScaleMP vSMP, and manages hardware such as the interconnect, IPMI / iLO, Ethernet, memory, GPUs, disks, PDUs and CPUs.
HPC User Environment

Let users focus on performing computations.

Rich collection of HPC software:
• Compilers (GNU, Intel*, Portland*, Open64, etc.)
• Parallel middleware (MPI libraries, threading libraries, OpenMP, Global Arrays, etc.)
• Mathematical libraries (ACML, MKL*, LAPACK, BLAS, etc.)
• Development tools (debuggers, profilers, etc.)
• Environment modules (see the sketch below)

Intel Cluster Ready compliant: compliant applications run out of the box.
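Environment modules let users pick a toolchain per session. Below is a minimal sketch of a typical user session; the module names (gcc, openmpi/gcc) are assumptions and depend on which packages are installed on a given cluster.

module avail                    # list the compilers, MPI libraries, etc. that are installed
module load gcc openmpi/gcc     # assumed module names; load a compiler and matching MPI stack
mpicc -O2 -o hello hello.c      # compile an MPI program with the selected toolchain
mpirun -np 4 ./hello            # quick test run; production jobs go through the workload manager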
Management Interface

Graphical User Interface (GUI):
• Offers the administrator full cluster control
• Standalone desktop application
• Manages multiple clusters simultaneously
• Runs on Linux, Windows, MacOS X*
• Built on top of the Mozilla XUL engine

Cluster Management Shell (CMSH):
• All GUI functionality also available through the Cluster Management Shell
• Interactive and scriptable in batch mode
Cluster Management Shell (CMSH)

Features:
• Modular interface
• Command completion using the tab key
• Command line history
• Output redirection to file or shell command
• Scriptable in batch mode
• Support for looping over objects

Example:
[demo]% device
[demo->device]% status
demo ................ [ UP ]
node001 ............. [ UP ]
node002 ............. [ UP ]
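As an illustration of batch-mode scripting and looping over objects, something like the following could be run from a shell script; the -c flag and the foreach syntax are sketched from memory and should be checked against the cmsh documentation.

# Run cmsh non-interactively (sketch)
cmsh -c "device; status"
# Loop over a range of nodes in device mode (foreach syntax is an assumption)
cmsh -c "device; foreach -n node001..node016 (status)"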
Node Provisioning

Image based:
• A slave node image is a directory on the head node
• An unlimited number of images can be created
• Software changes for the slave nodes are made inside the image(s) on the head node
• The provisioning system ensures that changes are propagated to the slave nodes

Nodes always boot over the network. Slave nodes PXE boot into the Node Installer, which:
• Identifies the node (switch port or MAC based)
• Configures the BMC
• Partitions disks (if any) and creates file systems
• Installs or updates the software image
• Pivots the root from NFS to the local file system
Architecture — Monitoring

(Diagram: CMDaemon collecting monitoring data from the node BMCs)
Bright Cluster Manager for GPGPU
GPU Development Environment

• CUDA & OpenCL redistribution rights
• Current and previous versions of CUDA & OpenCL
• Easy switching between CUDA & OpenCL versions (see the sketch below)
• CUDA driver automatically compiled at boot time
• Support for the new Fermi architecture:
  – Native 64-bit GPU support
  – Multiple copy engine support
  – ECC reporting
  – Concurrent kernel execution
  – Fermi HW debugging support in cuda-gdb
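Switching CUDA versions is typically a matter of swapping environment modules; the module names below are assumptions for illustration and will differ per installation.

module unload cuda32/toolkit    # assumed name of the currently loaded CUDA toolkit module
module load cuda40/toolkit      # assumed name of the CUDA version to switch to
nvcc --version                  # confirm which CUDA compiler is now active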
GPU Monitoring
Cluster Health Checking

Goal: provide a problem-free environment for running jobs (hardware & software health).

Three types of health check, all extensible (a sketch of a custom check follows this list):
• Health checks before jobs are run
  – Halt the workload manager a few (milli)seconds before the job is executed
  – Check the health of each reserved node
  – If unhealthy, take the node offline and inform the system administrator
  – Hand the job back to the workload manager
• Frequently scheduled health checks
  – Run health checks when a node is not in use
  – Run health checks through the queuing system
• Hardware burn-in environment
  – Most thorough health check
  – Requires reboot
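Since all health check types are extensible, administrators can add site-specific checks. Below is a minimal sketch of a custom health check script; the PASS/FAIL stdout convention and the threshold are assumptions used purely for illustration.

#!/bin/bash
# Sketch of a custom health check: flag the node as unhealthy when /tmp has
# less than 1 GB of free space. The PASS/FAIL output convention is an assumption.
FREE_KB=$(df -Pk /tmp | awk 'NR==2 {print $4}')
if [ "$FREE_KB" -lt 1048576 ]; then
    echo FAIL
else
    echo PASS
fi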
Scalability

Cluster management software should not be the limiting factor for cluster size.

Philosophy used for Bright Cluster Manager: all tasks performed by the master node should be off-loadable to dedicated nodes. If the master node cannot handle a task as a result of cluster size, the task can be placed on one or more dedicated nodes. For example, multiple dedicated, load-balanced provisioning nodes may be assigned in a cluster (see the sketch below).
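A rough sketch of assigning the provisioning role to a regular node in CMSH is shown below; the mode and command names are assumptions based on the role-assignment mechanism described above and should be checked against the administrator manual.

[demo]% device use node001
[demo->device[node001]]% roles
[demo->device[node001]->roles]% assign provisioning
[demo->device[node001]->roles[provisioning]]% commit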
Image Based Provisioning

• A software image (or “image”) is a directory on the head node
• An image contains a full Linux file tree (/bin, /usr, …)
• Software is not installed on nodes directly, but rather into the image
• After an image has been changed, the changes can be propagated to the compute nodes
• Propagating image changes to nodes can be done in two ways:
  1. Rebooting the nodes
  2. Using device imageupdate in CMSH, or “Update Node” in the GUI (see the sketch below)
• The latter allows nodes to be updated without a reboot; some changes do require a reboot (e.g. a kernel update)
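Pushing image changes to a running node without a reboot is done with imageupdate; a sketch follows. The behaviour assumed here (dry run by default, -w to actually write the changes) should be verified against the administrator manual.

[demo]% device use node001
[demo->device[node001]]% imageupdate        # assumed to do a dry run: report what would change
[demo->device[node001]]% imageupdate -w     # assumed -w flag: apply the changes to the live node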
Provisioning Process

• The Node Installer submits a provisioning request to the head node
• The head node queues the request until a provisioning slot becomes available on one of the provisioning nodes (possibly just the head node itself)
• The provisioning node connects to the compute node to provision the software image to the local file system
• Two install modes:
  – FULL: re-partition hard drives and transfer the image from scratch
  – SYNC: only transfer differences between the image and the local disk
• The default install mode is SYNC; a disk setup mismatch triggers a FULL install
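The install mode can also be set per node, e.g. to force a full reinstall on the next boot; the property name installmode and the need to commit are assumptions in this sketch.

[demo]% device use node001
[demo->device[node001]]% set installmode FULL   # force repartitioning and a full image transfer on next boot
[demo->device[node001]]% commit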
Changing Software Images

Installing/updating RPMs:
rpm --root=/cm/images/default-image -i myapp.rpm
yum --installroot=/cm/images/default-image install myapp
yum --installroot=/cm/images/default-image update

Installing software from source:
make DESTDIR=/cm/images/default-image install
Note that not all Makefiles support $DESTDIR. Usage example from a Makefile:
install -m644 file-example $(DESTDIR)/etc/file

Making changes manually:
• chroot /cm/images/default-image
  cd /usr/src/myapp; make install
• emacs /cm/images/default-image/etc/file
Cloud Bursting (in development)

• Allow clusters to be extended with cloud resources
• Cluster can grow or shrink based on workload and policies
• Integrated interface to public cloud providers
• Unsolved problem: how to deal with local storage?
Looking for challenging and exciting jobs in HPC?

www.clustervision.com
www.brightcomputing.com