XCPU: A Process Management System


  1. XCPU: A Process Management System
     Latchesar Ionkov, Los Alamos National Laboratory

  2. HPC Cluster
     [Diagram: user desktops connect to a head node; compute nodes (CN1 ... CNn) are organized in groups, each group served by an I/O node (IO1 ... IOp) backed by a parallel filesystem (FS).]

  3. Hardware
     Nodes:
     • 4-8 CPU cores
     • 2GB RAM per core
     • diskless
     Network:
     • high bandwidth (2-20 GB/s)
     • low latency
     • mostly InfiniBand

  4. Software
     Linux
     Cluster management: Perceus, Warewulf, xCAT
     Job scheduling: Torque, Moab, SLURM
     Compute nodes:
     • boot over the network
     • minimal root image in RAM
     • more software on a parallel filesystem
     • same software on all nodes

  5. Running a Job
     Make sure all libraries are included in the cluster software stack
     Collect all binaries, configuration and data files
     Write a job script that:
     • transfers all files to the assigned nodes (if they are not on a mounted filesystem already)
     • runs the binary
     • waits for the result
     • collects the results and sends them back
     Schedule the job and wait until it is finished (a sketch of such a script follows)
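     A minimal sketch of such a job script, assuming a Torque-style scheduler; the
     application name, file names and staging approach are illustrative, not from the talk:

     #!/bin/sh
     #PBS -l nodes=4
     # Stage the binary and configuration to every assigned node (hypothetical paths)
     for n in $(sort -u $PBS_NODEFILE); do
         scp ./myapp ./myapp.conf $n:/tmp/
     done
     mpirun -np 4 /tmp/myapp                      # run the binary and wait
     for n in $(sort -u $PBS_NODEFILE); do
         scp $n:/tmp/myapp.out ./results/$n.out   # collect the results
     done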

  6. Unix: Resources As Files
     Most devices are accessible as files, but ioctls break the model:
     • normal files don't do ioctls
     • ioctls are opaque
     • impossible to support over the network
     Not everything is a file

  7. Unix: Sharing Resources
     Files -- NFS, CIFS, AFS, FTP
     Printers -- CUPS, LPD
     Sound -- PulseAudio, aRts, NAS
     Display -- X11, VNC, NX
     Ad-hoc protocols for each device

  8. Sharing I Would Like
     Access local files on a remote server
     Remote program to use the local sound card
     Program running remotely to print on the local printer
     Program running remotely to use the locally established VPN

  9. Resources As (Regular) Files
     If device files don't use ioctl operations, sharing them over the network is easy
     Devices (see the sketch below):
     • /dev/sound
     • /dev/printer
     • /dev/display
     • /dev/net
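     What such ioctl-free device files would make possible, as a hypothetical session
     (these device files do not exist in stock Linux; they are the slide's proposal):

     $ cat document.ps > /dev/printer    # print by writing to a file
     $ cat /dev/sound > capture.raw      # record by reading from a file
     $ cat capture.raw > /dev/sound      # play by writing to a file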

 10. Unix Is Too Uni-
     Single file namespace:
     • all users see the same files
     • the root decides what filesystems can be mounted
     The root decides what printers the users can print to
     The root decides what networks are available

 11. Linux Private Namespaces
     Linux allows processes to have private namespaces
     Security issues -- legacy applications and libraries expect a single namespace
     Solution -- only root can create private namespaces
     Result -- nobody uses private namespaces
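     Private mount namespaces can be tried with util-linux's unshare(1); a minimal
     demonstration of the root-only restriction the slide describes:

     $ sudo unshare --mount /bin/sh     # root creates a private mount namespace
     # mount -t tmpfs none /mnt         # mount visible only inside this namespace
     # ls /mnt                          # files appear here...
     (from any other shell)
     $ ls /mnt                          # ...but nobody else sees them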

 12. Sharing Solutions
     Virtualization
     User-level workarounds:
     Files -- GNOME GIO/GVFS, KDE KIO
     Printers -- none
     Network -- none

 13. How To Fix It
     Fix legacy code; loosen private namespace and mount restrictions
     Represent more resources as files -- FUSE and 9P make it easy
     Get rid of ioctls for the kernel devices
     Effect:
     • resources can be shared easily over the network
     • users can set up their favorite resources on a remote server without involving the sysadmin

 14. How Will It Work
     When a user logs in on a remote server, a new private namespace is created
     Print on the local printer -- mount it at /dev/printer
     Sound on the local speakers -- mount it at /dev/sound
     Use the local VPN -- mount it at /dev/net
     The resources are invisible to other users and don't affect their work
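     A hedged sketch of such a login, assuming the desktop exports each device over 9P
     and the kernel's v9fs client is used; the hostname and port numbers are made up:

     $ unshare --mount /bin/sh                # private namespace for this login only
     # mount -t 9p -o trans=tcp,port=5640 desktop.example.com /dev/printer
     # mount -t 9p -o trans=tcp,port=5641 desktop.example.com /dev/sound
     # cat report.ps > /dev/printer           # prints on the user's local printer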

 15. XCPU: Remote Execution
     Distribute job-related files (binary, data, configuration) to all nodes
     Set up the job environment and arguments
     Start, monitor and control job execution
     Clean up when the job is done
     Can survive a head node crash

 16. XCPU Interface
     Interface implemented as a file tree
     Global files:
     • arch
     • clone
     • ctl
     • env
     Job session files:
     • argv
     • ctl
     • env
     • std{in,out,err}
     • wait

 17. XCPU: Example
     XCPU file interface mounted on /mnt/xcpu
     $ cd /mnt/xcpu
     $ ls
     arch  clone  ctl  env
     $ tail -f clone &
     2
     $ cd 2
     $ ls
     argv  ctl  fs/  stdin  stdout  stderr  wait
     $ echo foo > argv
     $ cp /bin/cat fs/cat
     $ echo hello world > fs/foo
     $ echo exec cat > ctl
     $ cat stdout
     hello world

 18. XCPU: How To Scale
     Problem: copy files to many nodes
     Linear (or even parallel) distribution from a single node doesn't scale
     Solution: set up a few sessions from the head node and instruct those compute nodes to clone them further
     Runs recursively, as many levels as necessary (with fanout f, N nodes are reached in about log_f N rounds)
     [Diagram: spawn tree -- the head node feeds n1 and n2, which feed n3-n6, which feed n7-n10]

 19. XCPU: Tree Spawn
     1. Head node creates sessions on all nodes (open clone)
     2. Head node sets up a few sessions (argv, env, executable and input files)
     3. Head node instructs those sessions to clone themselves to the other sessions:
        echo clone n3,n4,n7,n8 > ctl
     4. Head node starts execution
     [Diagram: spawn tree rooted at the head node, fanning out through n1-n10]
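     The same sequence as a shell session on the head node, a sketch that assumes each
     node's xcpufs is mounted under /mnt/xcpu/<node> (session id and file names are
     illustrative, following the pattern of slide 17):

     $ cd /mnt/xcpu/n1
     $ tail -f clone &                 # create a session on n1
     0
     $ cd 0
     $ cp /bin/myapp fs/               # stage the executable and inputs once
     $ echo myarg > argv
     $ echo clone n3,n4,n7,n8 > ctl    # n1 copies this session to four more nodes
     $ echo exec myapp > ctl           # start execution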

 20. XCPU Implementation
     Based on the 9P2000 resource sharing protocol
     The server (xcpufs) runs on every compute node
     The synthetic file interface exported by xcpufs is mountable on Linux, or accessible on any OS using a user-space 9P client library
     Total size, server, tools and libraries, is 20K lines of code
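     Since xcpufs speaks 9P2000, the Linux kernel's v9fs client can mount it directly;
     a sketch, with the server port assumed for illustration:

     $ mount -t 9p -o trans=tcp,port=20001 n1 /mnt/xcpu/n1
     $ ls /mnt/xcpu/n1
     arch  clone  ctl  env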

 21. XCPU Client Tools
     Job execution:
     xrx n[1-128],n250 /bin/date
     xrx -s n[1-128] /bin/date
     xrx -n 2 n[1-128] /bin/date
     xrx -a /bin/date
     xrx -J foo
     List processes -- xps
     Kill a process -- xk
     Library -- libxcpu

 22. XCPU Security
     Ownership and permissions on files define who can do what
     The program runs as the user that mounted the file interface
     Authentication:
     • similar to ssh authentication
     • no authentication server; public keys are distributed in advance
     • the superuser (xcpu-admin) can run jobs as any user
     XCPU users are distinct from Unix users

 23. XCPU2: Next Level Of Sharing
     The desktop exports its filesystem
     All nodes for a job mount it and see the same files as the user's desktop
     If an application works on the user's desktop, it will (most likely) work on the cluster
     No library mismatches, no missing files, no wrong pathnames
     Similar to Plan9's cpu command

 24. XCPU2 Cluster
     New node type -- job control node
     Responsible for controlling the nodes assigned to a job
     Job nodes "see" the filesystem on the job control node
     Jobs on the same node can use different distributions
     [Diagram: several job control nodes, each serving a group of compute nodes]

 25. XCPU2 Example
     XCPU file interface mounted on /mnt/xcpu
     $ pwd
     /home/lucho
     $ ls
     foo bar
     $ xrx remote pwd
     /home/lucho
     $ xrx remote ls
     foo bar
     $

 26. XCPU2 Namespace
     Common case -- import the root filesystem from the job control node
     The ns file allows custom namespaces
     Operations: unshare, mount, bind, import, cd, chroot, cache
     Example:
     unshare
     import $XCPUTSADDR /mnt/term
     bind /dev /mnt/term/dev
     bind /proc /mnt/term/proc
     bind /sys /mnt/term/sys
     chroot /mnt/term

 27. XCPU2 Scalability
     All nodes for a job are likely to use the same system files (e.g. /bin/cat, /etc/hosts)
     Cooperative caching between the nodes in a job would achieve a high hit rate
     Currently: read-only, non-cooperative caching
     [Diagram: the same files fetched from the head node and cached down the node tree]

 28. Conclusions
     XCPU2 transparently imports the user's desktop environment to all cluster nodes
     Makes it very easy to use different distributions and configurations
     If more devices and services operated as normal files, the integration would be even better (Plan9's cpu command)
     Experiment with user- and kernel-level services that look like regular files
     Don't be afraid of private namespaces -- use them and ask your distribution for support!

 29. Links
     Plan9 -- http://plan9.bell-labs.com
     Glendix -- http://www.glendix.org
     9P libraries -- http://9p.cat-v.org/implementations
     XCPU -- http://xcpu.org
