Cluster Computing


  1. Simplest Scalable Architecture: NOW – Network Of Workstations

  2. Many types of Clusters (from HP's Dr. Bruce J. Walker)
     • High Performance Clusters – Beowulf; 1000 nodes; parallel programs; MPI
     • Load-leveling Clusters – move processes around to borrow cycles (e.g., MOSIX)
     • Web-Service Clusters – LVS; load-level TCP connections; Web pages and applications
     • Storage Clusters – parallel filesystems; same view of data from each node
     • Database Clusters – Oracle Parallel Server
     • High Availability Clusters – ServiceGuard, Lifekeeper, Failsafe, heartbeat, failover clusters

  3. Many types of Clusters (from HP's Dr. Bruce J. Walker) – the same list as above, highlighting the NOW-type architectures

  4. NOW Approaches
     • Single System View
     • Shared Resources
     • Virtual Machine
     • Single Address Space

  5. Shared System View
     • Load-balancing clusters
     • High availability clusters
     • High performance
       – High throughput
       – High capability

  6. Berkeley NOW

  7. NOW Philosophies
     • Commodity is cheaper
     • In 1994, RAM cost
       – $40/MB for a PC
       – $600/MB for a Cray M90

  8. NOW Philosophies
     • Commodity is faster – the same CPU shows up in workstations before it shows up in MPPs:
       CPU              MPP year   WS year
       150 MHz Alpha    93-94      92-93
       50 MHz i860      92-93      ~91
       32 MHz SS-1      91-92      89-90

  9. Network RAM
     • Swapping to disk is extremely expensive
       – 16-24 ms for a page swap on disk
     • Network performance is much higher
       – 700 µs for a page swap over the network

  10. Network RAM

  11. NOW or Supercomputer?
      Machine                Time     Cost
      C-90 (16)              27       $30M
      RS6000 (256)           27374    $4M
        + ATM                2211     $5M
        + Parallel FS        205      $5M
        + NOW protocol       21       $5M

  12. The Condor System
      • Unix and NT
      • Operational since 1986
      • More than 1300 CPUs at UW-Madison
      • Available on the web
      • More than 150 clusters worldwide in academia and industry

  13. What is Condor?
      • Condor converts collections of distributively owned workstations and dedicated clusters into a high-throughput computing facility.
      • Condor uses matchmaking to make sure that everyone is happy.

  14. What is High-Throughput Computing?
      • High-performance: CPU cycles/second under ideal circumstances.
        – "How fast can I run simulation X on this machine?"
      • High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances.
        – "How many times can I run simulation X in the next month using all available machines?"

  15. What is High-Throughput Computing?
      • Condor does whatever it takes to run your jobs, even if some machines…
        – Crash! (or are disconnected)
        – Run out of disk space
        – Don't have your software installed
        – Are frequently needed by others
        – Are far away & admin'ed by someone else

  16. A Submit Description File
      # Example condor_submit input file
      # (Lines beginning with # are comments)
      # NOTE: the words on the left side are not
      # case sensitive, but filenames are!
      Universe   = vanilla
      Executable = /home/wright/condor/my_job.condor
      Input      = my_job.stdin
      Output     = my_job.stdout
      Error      = my_job.stderr
      Arguments  = -arg1 -arg2
      InitialDir = /home/wright/condor/run_1
      Queue

  17. What is Matchmaking?
      • Condor uses matchmaking to make sure that work gets done within the constraints of both users and owners.
      • Users (jobs) have constraints:
        – "I need an Alpha with 256 MB RAM"
      • Owners (machines) have constraints:
        – "Only run jobs when I am away from my desk and never run jobs owned by Bob."
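In practice these constraints are written as ClassAd expressions. As a hedged illustration (Arch, Memory, KeyboardIdle and Owner are standard Condor ClassAd attributes, but the particular values below are invented), the job side can add a Requirements line to a submit description file like the one on slide 16, and the machine owner can express policy with a START expression in the machine's Condor configuration:

      # Job side, in the submit description file (hypothetical values)
      Requirements = (Arch == "ALPHA") && (Memory >= 256)

      # Machine side, in the owner's Condor configuration (hypothetical values)
      START = (KeyboardIdle > 15 * 60) && (Owner != "bob")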

  18. Process Checkpointing
      • Condor's process checkpointing mechanism saves all the state of a process into a checkpoint file
        – memory, CPU, I/O, etc.
      • The process can then be restarted from right where it left off
      • Typically no changes to your job's source code are needed – however, your job must be relinked with Condor's Standard Universe support library

  19. Remote System Calls
      • I/O system calls are trapped and sent back to the submit machine
      • Allows transparent migration across administrative domains
        – Checkpoint on machine A, restart on B
      • No source code changes required
      • Language independent
      • Opportunities for application steering
        – Example: Condor tells the customer process "how" to open files

  20. MOSIX and its characteristics
      • Software that can transform a Linux cluster of x86-based workstations and servers to run almost like an SMP
      • Has the ability to distribute and redistribute processes among the nodes

  21. MOSIX
      • Dynamic migration, originally added to the BSD kernel – now Linux
      • Uses TCP/IP for communication between workstations
      • Requires homogeneous networks

  22. MOSIX
      • All processes start their life at the user's workstation
      • Migration is transparent and preemptive
      • Migrated processes use local resources as much as possible and the resources of the home workstation otherwise

  23. Process Migration in MOSIX
      [Diagram: a local process and a migrated process – the migrated process's user-level part runs on a remote node, a kernel-level "deputy" stays on the home node, and the two communicate through the link layer.]

  24. MOSIX

  25. MOSIX Make

  26. PVM
      • Task based
      • Tasks can be created at runtime
      • Tasks can be notified of the death of a parent or child
      • Tasks can be grouped

  27. PVM Architecture
      • Daemon-based communication
      • User-defined host list
      • Hosts can be added and removed during execution
      • The virtual machine may be used interactively or in the background

  28. Heterogeneous Computing
      • Runs processes on different architectures
      • Handles conversion between little-endian and big-endian architectures

  29. PVM Communication Model
      • Explicit message passing
      • Mechanisms for packing data into buffers and unpacking data from buffers
      • Supports asynchronous communication
      • Supports one-to-many communication
        – Broadcast
        – Multicast
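To make the one-to-many case concrete, here is a minimal hedged sketch of a multicast. pvm_mcast() is a standard PVM call even though it is not named on the slide; the task-id array, the worker count and the message tag 3 are placeholders:

      #include <pvm3.h>

      /* Send the same packed message to a whole set of tasks in one call. */
      void broadcast_go(int *worker_tids, int nworkers)
      {
          int go = 1;

          pvm_initsend(PvmDataDefault);    /* fresh default send buffer */
          pvm_pkint(&go, 1, 1);            /* pack a single "go" flag */

          /* tag 3 is arbitrary; every tid in the array receives a copy */
          pvm_mcast(worker_tids, nworkers, 3);
      }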

  30. The virtual machine codes
      • All PVM calls return an integer; a value less than zero indicates an error
      • pvm_perror( char *msg ); prints a description of the last PVM error
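A minimal sketch of this convention in use (pvm_mytid(), pvm_perror() and pvm_exit() are standard PVM calls; the surrounding program is only an example):

      #include <stdio.h>
      #include <stdlib.h>
      #include <pvm3.h>

      int main(void)
      {
          /* Enroll this process in PVM; a negative return value means the
             local pvmd daemon could not be contacted. */
          int mytid = pvm_mytid();
          if (mytid < 0) {
              pvm_perror("pvm_mytid");   /* print PVM's description of the error */
              return EXIT_FAILURE;
          }
          printf("enrolled with task id t%x\n", mytid);

          pvm_exit();                    /* leave the virtual machine */
          return EXIT_SUCCESS;
      }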

  31. PVM Cluster Computing

  32. Managing the virtual machine
      • Add a host to the virtual machine
      • int info = pvm_addhosts( char **hosts, int nhost, int *infos );
      • Delete a host from the virtual machine
      • int info = pvm_delhosts( char **hosts, int nhost, int *infos );
      • Shut down the virtual machine
      • int info = pvm_halt( void );
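A hedged sketch of growing and shrinking the virtual machine at run time with the calls above (the hostname "node02" is a made-up example):

      #include <stdio.h>
      #include <pvm3.h>

      int main(void)
      {
          char *hosts[] = { "node02" };   /* hypothetical hostname */
          int   infos[1];

          pvm_mytid();                    /* enroll in PVM first */

          /* Ask PVM to start a pvmd on node02 and add it to the virtual machine. */
          if (pvm_addhosts(hosts, 1, infos) < 1)
              printf("could not add node02 (status %d)\n", infos[0]);

          /* ... run work on the enlarged virtual machine ... */

          /* Remove the host again when it is no longer needed. */
          pvm_delhosts(hosts, 1, infos);

          pvm_exit();
          return 0;
      }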

  33. Managing the virtual machine
      • Read the virtual machine configuration
      • int info = pvm_config( int *nhost, int *narch, struct pvmhostinfo **hostp );
      • struct pvmhostinfo {
            int  hi_tid;
            char *hi_name;
            char *hi_arch;
            int  hi_speed;
        } *hostp;
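For illustration, a minimal sketch that prints the configuration using only pvm_config() and the pvmhostinfo fields listed above:

      #include <stdio.h>
      #include <pvm3.h>

      int main(void)
      {
          struct pvmhostinfo *hostp;
          int nhost, narch, i;

          pvm_mytid();                    /* enroll in PVM */

          /* PVM fills in a table describing every host in the virtual machine. */
          if (pvm_config(&nhost, &narch, &hostp) < 0) {
              pvm_perror("pvm_config");
              return 1;
          }

          printf("%d hosts, %d architectures\n", nhost, narch);
          for (i = 0; i < nhost; i++)
              printf("  %-20s %-8s speed %d\n",
                     hostp[i].hi_name, hostp[i].hi_arch, hostp[i].hi_speed);

          pvm_exit();
          return 0;
      }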

  34. Managing the virtual machine
      • Check the status of a host
      • int mstat = pvm_mstat( char *host );
      • PvmOk – host is OK
      • PvmNoHost – host is not in the virtual machine
      • PvmHostFail – host is unreachable (and thus possibly failed)
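A short hedged example of polling a host's status with pvm_mstat() and the codes above (the hostname "node02" is again a made-up example):

      #include <stdio.h>
      #include <pvm3.h>

      int main(void)
      {
          int mstat;

          pvm_mytid();                     /* enroll in PVM */

          mstat = pvm_mstat("node02");     /* hypothetical hostname */
          if (mstat == PvmOk)
              printf("node02 is up\n");
          else if (mstat == PvmNoHost)
              printf("node02 is not part of the virtual machine\n");
          else if (mstat == PvmHostFail)
              printf("node02 is unreachable\n");

          pvm_exit();
          return 0;
      }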

  35. Tasks
      • PVM tasks can be created and killed during execution
      • id  = pvm_mytid();
      • cnt = pvm_spawn( image, argv, flag, node, num, tids );
      • pid = pvm_parent();
      • pvm_kill( tids[0] );
      • pvm_exit();
      • int status = pvm_pstat( tid );
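A hedged master-side sketch of these calls (the executable name "worker" and the worker count of 4 are made up; everything else is the API shown above):

      #include <stdio.h>
      #include <pvm3.h>

      #define NWORKERS 4                   /* arbitrary example value */

      int main(void)
      {
          int numt, tids[NWORKERS];

          pvm_mytid();                     /* enroll in PVM */

          if (pvm_parent() == PvmNoParent) {
              /* We are the master: start NWORKERS copies of the hypothetical
                 "worker" executable anywhere in the virtual machine. */
              numt = pvm_spawn("worker", (char **)0, PvmTaskDefault,
                               "", NWORKERS, tids);
              printf("spawned %d of %d workers\n", numt, NWORKERS);

              /* Example cleanup: kill the first worker if not all started. */
              if (numt > 0 && numt < NWORKERS)
                  pvm_kill(tids[0]);
          }

          pvm_exit();
          return 0;
      }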

  36. Tasks
      • List the tasks running in the virtual machine
      • int info = pvm_tasks( int where, int *ntask, struct pvmtaskinfo **taskp );
      • struct pvmtaskinfo {
            int  ti_tid;
            int  ti_ptid;
            int  ti_host;
            int  ti_flag;
            char *ti_a_out;
            int  ti_pid;
        } *taskp;
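For illustration, a minimal hedged sketch that prints the task table; where = 0 asks pvm_tasks() for every task in the whole virtual machine:

      #include <stdio.h>
      #include <pvm3.h>

      int main(void)
      {
          struct pvmtaskinfo *taskp;
          int ntask, i;

          pvm_mytid();                     /* enroll in PVM */

          /* where = 0: report every task in the whole virtual machine. */
          if (pvm_tasks(0, &ntask, &taskp) < 0) {
              pvm_perror("pvm_tasks");
              return 1;
          }

          for (i = 0; i < ntask; i++)
              printf("tid t%x  parent t%x  host %d  %s\n",
                     taskp[i].ti_tid, taskp[i].ti_ptid,
                     taskp[i].ti_host, taskp[i].ti_a_out);

          pvm_exit();
          return 0;
      }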

  37. Managing I/O
      • In the newest versions of PVM, output may be redirected to the parent
      • int bufid = pvm_catchout( FILE *ff );
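A hedged sketch of the usual pattern: the parent calls pvm_catchout() before spawning, so that anything the children print is forwarded to the parent's stdout (the executable name "worker" is made up):

      #include <stdio.h>
      #include <pvm3.h>

      int main(void)
      {
          int tid;

          pvm_mytid();                  /* enroll in PVM */
          pvm_catchout(stdout);         /* forward children's output to our stdout */

          /* Output printed by the spawned child now appears on this task's
             stdout; "worker" is a hypothetical executable name. */
          pvm_spawn("worker", (char **)0, PvmTaskDefault, "", 1, &tid);

          pvm_exit();                   /* leave the virtual machine */
          return 0;
      }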

  38. Asynchronous Events
      • Notifications on special events
      • info = pvm_notify( event, tag, cnt, tids );
      • info = pvm_sendsig( tid, signal );
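A hedged sketch of asking PVM to report when a child exits: PvmTaskExit is the standard event constant for pvm_notify(), the tag 99 is arbitrary, and "worker" is a made-up executable name. The notify message carries the tid of the task that died:

      #include <stdio.h>
      #include <pvm3.h>

      int main(void)
      {
          int tid, dead_tid;

          pvm_mytid();                      /* enroll in PVM */

          if (pvm_spawn("worker", (char **)0, PvmTaskDefault, "", 1, &tid) == 1) {
              /* Ask PVM to send us a tag-99 message when that task exits. */
              pvm_notify(PvmTaskExit, 99, 1, &tid);

              pvm_recv(-1, 99);             /* block until the notify message arrives */
              pvm_upkint(&dead_tid, 1, 1);  /* the message contains the dead task's tid */
              printf("task t%x has exited\n", dead_tid);
          }

          pvm_exit();
          return 0;
      }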

  39. Groups
      • Groups allow the execution of an application to be partitioned easily
      • num  = pvm_joingroup( "worker" );
      • size = pvm_gsize( "worker" );
      • info = pvm_lvgroup( "worker" );
      • int inum = pvm_getinst( char *group, int tid );
      • int tid  = pvm_gettid( char *group, int inum );
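A hedged sketch using the group calls above; pvm_barrier() is another call from PVM's group library (not shown on the slide) used here to synchronize the members, and the member count of 4 is an arbitrary example:

      #include <stdio.h>
      #include <pvm3.h>

      #define NMEMBERS 4     /* arbitrary example: wait until 4 tasks have joined */

      int main(void)
      {
          int inum, size;

          pvm_mytid();                      /* enroll in PVM */

          /* Join the named group; the returned instance number is unique
             within the group and starts at 0. */
          inum = pvm_joingroup("worker");

          /* Block until NMEMBERS members of "worker" have reached the barrier. */
          pvm_barrier("worker", NMEMBERS);

          size = pvm_gsize("worker");
          printf("instance %d of %d in group \"worker\"\n", inum, size);

          pvm_lvgroup("worker");            /* leave the group */
          pvm_exit();
          return 0;
      }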

  40. Buffers
      • PVM applications have a default send buffer and a default receive buffer
      • buf  = pvm_initsend( encoding );     /* PvmDataDefault, PvmDataRaw or PvmDataInPlace */
      • info = pvm_pk<type>( data, 10, 1 );  /* e.g. pvm_pkint, pvm_pkdouble */
      • info = pvm_upk<type>( data, 10, 1 );
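A hedged sketch of the usual pack/send and receive/unpack sequence built from the calls above plus pvm_send() and pvm_recv(); the message tag 7 and the destination tid are placeholders:

      #include <pvm3.h>

      /* Sender side: pack 10 ints into the default send buffer and ship them. */
      void send_block(int dest_tid, int data[10])
      {
          pvm_initsend(PvmDataDefault);     /* reset the default send buffer */
          pvm_pkint(data, 10, 1);           /* pack 10 ints, stride 1 */
          pvm_send(dest_tid, 7);            /* message tag 7 (arbitrary) */
      }

      /* Receiver side: block for a tag-7 message from anyone and unpack it. */
      void recv_block(int data[10])
      {
          pvm_recv(-1, 7);                  /* -1 = accept from any task */
          pvm_upkint(data, 10, 1);          /* unpack from the active receive buffer */
      }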

  41. Managing Buffers
      • bufid  = pvm_mkbuf( encoding );      /* PvmDataDefault, PvmDataRaw or PvmDataInPlace */
      • oldbuf = pvm_setrbuf( bufid );
      • oldbuf = pvm_setsbuf( bufid );
      • int info  = pvm_freebuf( int bufid );
      • int bufid = pvm_getrbuf( void );
      • int bufid = pvm_getsbuf( void );
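A hedged sketch of juggling a second buffer with the calls above (the message tag 5 and destination tid are placeholders):

      #include <pvm3.h>

      /* Build a side message in its own buffer without disturbing the
         default send buffer that may already be in use. */
      void send_side_message(int dest_tid, int value)
      {
          int newbuf, oldbuf;

          newbuf = pvm_mkbuf(PvmDataDefault);   /* create an extra buffer */
          oldbuf = pvm_setsbuf(newbuf);         /* make it the active send buffer */

          pvm_pkint(&value, 1, 1);
          pvm_send(dest_tid, 5);                /* tag 5 is arbitrary */

          pvm_setsbuf(oldbuf);                  /* restore the previous send buffer */
          pvm_freebuf(newbuf);                  /* release the extra buffer */
      }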
