  1. PAD Cluster: An Open, Modular and Low Cost High Performance Computing System Volnys Borges Bernal Sergio Takeo Kofuji Guilherme Matos Sipahi Marcio Lobo Netto Laboratório de Sistemas Integráveis, EPUSP Alan G. Anderson Elebra Defesa e Controles Ltda

  2. Agenda • Main Objectives • PAD Cluster Environment • PAD Cluster Architecture • Communication Libraries • System Administrator Tools • Operator Tools • User Tools • Development Environment

  3. PAD Cluster • Main goals – Parallel Cluster-Based Computing Environment • Based on Commodity Components • High Performance Communication Medium • Development Environment for Fortran77, Fortran90 & HPF • MPI Interface • IEEE POSIX UNIX Interface • X-Windows Interface – Initial Application: • RAMS (Regional Atmospheric Modeling System) • Development: LSI-EPUSP + Elebra, FINEP support

  4. PAD Cluster • Characteristics – Use of High-Performance Commodity Components – Linux Operating System • Important: – Integration • Hardware components • Software subsystems

  5. PAD Cluster Environment [layer diagram] • Configuration & Operation: ClusterMagic (system configuration & replication), cluster partitioning utilities, Multiconsole, monitoring, LSF (job scheduling) • User Interface and Utilities: CDE, PAD-ptools, X-Windows interface, parallel UNIX utilities, POSIX UNIX interface • Development Tools – Compilers: GNU C/C++; Portland F77, F90, HPF – Tools: Portland profiler, debugger – Libraries: MPI (MPICH-FULL, MPICH, FULL Myrinet API/BPI), BLAS, LaPack, BLACS, ScaLaPack

  6. PAD Cluster Architecture • System Architecture [diagram: processing nodes linked by a Myrinet switch and a Fast-Ethernet switch; multi-serial lines and synchronization hardware run from the administration workstation; the access workstation connects to the external network] – Processing nodes – Access workstation – Administration workstation – Fast-Ethernet switch – Myrinet switch – Synchronization hardware

  7. PAD Cluster Architecture • Node Architecture [diagram: each node has two Intel Pentium II 333 MHz processors, RAM, an LM78 hardware monitor and a PCI bridge, with Myrinet, SCSI and Fast Ethernet controllers on the PCI bus]

  8. Communication Infrastructure • Primary Network – Fast-Ethernet – General-purpose network • For traditional network services (NFS, DNS, SNMP, XNTP, …) – Operating System TCP/IP Stack

  9. Communication Infrastructure • High Performance Network – Myrinet – For application data – Communication Libraries: • MPICH over the Operating System TCP/IP Stack • FULL user-level interface library • MPICH-FULL user-level interface library

  10. Communication Libraries • MPICH Library – MPI over the TCP/IP stack • FULL Library – User-level communication library – Developed at LSI-EPUSP in 1998 – Implementation based on Cornell's U-Net • MPICH-FULL Library – User-level communication library – Internode communication: MPICH + FULL – Intranode communication: MPICH + Shared Memory (a minimal usage sketch follows below)
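All three libraries present the standard MPI programming interface, so application code is the same whichever transport carries it. The following is a minimal sketch of an MPI point-to-point exchange in C; the program and its launch line are illustrative examples, not code from the PAD distribution.

    /* Minimal MPI ping: rank 0 sends a message to rank 1.
     * Uses only the portable MPI interface that MPICH and
     * MPICH-FULL both expose. */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank;
        char buf[64];
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            strcpy(buf, "hello from rank 0");
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            printf("rank 1 received: %s\n", buf);
        }

        MPI_Finalize();
        return 0;
    }

On the cluster such a program would be started through mpirun (slide 22), e.g. mpirun -np 2 ./ping; the process count and binary name here are only an example.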

  11. Communication Libraries • MPICH-FULL performance [three throughput plots, MBytes/s vs. message size in bytes, peaking near 50–60 MBytes/s: shared memory, 2 processes on one dual 333 MHz node; Myrinet, 2 processes, 1 per node, on two dual 333 MHz nodes; Myrinet, 4 processes, 2 per node, on two dual 333 MHz nodes]

  12. Communication Infrastructure • Synchronization Hardware – Support for collective MPI operations (see the sketch after this slide) – Implemented in FPGA – Interfaces for 8 nodes – Based on PAPERS – Operations • barrier • broadcast • allgather • allreduce – Global Wall Clock
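From the application's point of view the hardware-assisted collectives are reached through the ordinary MPI calls; whether a given operation is offloaded to the FPGA is decided below the MPI layer. The C sketch below simply exercises the four operations named above and assumes nothing about the PAD-specific implementation.

    /* Exercises the four collective operations listed on this
     * slide: barrier, broadcast, allgather and allreduce. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs, root_val, sum;
        int ranks[64];                 /* sketch assumes <= 64 processes */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        MPI_Barrier(MPI_COMM_WORLD);                          /* barrier   */

        root_val = (rank == 0) ? 42 : 0;
        MPI_Bcast(&root_val, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* broadcast */

        MPI_Allgather(&rank, 1, MPI_INT,
                      ranks, 1, MPI_INT, MPI_COMM_WORLD);     /* allgather */

        MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM,
                      MPI_COMM_WORLD);                        /* allreduce */

        printf("rank %d of %d: broadcast=%d, rank sum=%d\n",
               rank, nprocs, root_val, sum);

        MPI_Finalize();
        return 0;
    }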

  13. Communication Infrastructure • Serial Lines – Connect each node to the administration workstation – Allow remote console access from the administration workstation

  14. System Administrator Tools • ClusterMagic – Two main functions: • Cluster Configuration • Node Replication – Advantages • Easy configuration / reconfiguration • Assured uniformity • Fast node replication

  15. System Administrator Tools • ClusterMagic: Cluster Configuration [diagram: the operator edits cluster.conf; ClusterMagic generates the common, node-specific and administration files: HOSTNAME, hosts, hosts.equiv, rhosts, bootptab, network, ifcfg-eth0, ifcfg-lo, nsswitch.conf, resolv.conf, DNS server files, fstab, exports, profile, lilo.conf, inittab, issue, issue.net, motd]

  16. System Administrator Tools • ClusterMagic: Node Replication – Node installation based on the replication of a “Womb Node” – ClusterMagic replication diskette: • boots a small Linux system • disk partitioning • womb image copying • configuration file installation • boot sector initialization – Automatic process – Takes about 12 minutes

  17. Operator Tools • Xadmin – Cluster Partitioning – Remote Commands • Multiconsole – Node console access • Job Scheduling – Job submission – LSF integrated with Cluster Partitioning • Cluster Monitoring

  18. Operator Tools • Xadmin – Node partitioning [diagram: the cluster partitioning tool divides nodes N0–N11 into partitions P1, P2 and P3]

  19. Operator Tools • Xadmin – Remote Commands

  20. Operator Tools • Multiconsole

  21. Operator Tools • Cluster Monitoring – Java + SNMP agents

  22. User Tools • PAD-ptools – Parallel versions of UNIX utilities – pcp, pls, pcat, … – Integrated with cluster partitioning • LSF – Job submission and control • mpirun – MPICH, MPICH-FULL

  23. Development Environment • Portland – Fortran77 – Fortran90 – HPF – Profiler – Debugger • Libraries – BLAS, BLACS, LaPack, ScaLaPack • TotalView debugger • VAMPIR profiler

  24. Conclusions • Complete product system: – Elebra Vortix Cluster (PAD Cluster) • www.elebra.com.br/aero • Several developments: – Hardware • Collective operations, synchronization and global clock – Software • Communication libraries • Cluster tools • Communication drivers

  25. Future Work • University of São Paulo + Purdue University + University of Pittsburgh – Hardware for collective operations and synchronization with a 64-bit PCI interface • University of São Paulo + ICS-FORTH (Greece) – ATM-like switch at 2.4 Gbps • University of São Paulo – New cluster administration, management and security tools – High-availability database applications

  26. Acknowledgments • FINEP • LSI-EPUSP Development Team • Elebra Development Team
