  1. HPC and I/O Subsystems
     Ratan K. Guha
     School of Electrical Engineering and Computer Science
     University of Central Florida, Orlando, FL 32816

     Overview
     - My Experience and Current Projects
     - Top 10 Supercomputers
     - Cluster Computers
     - Node-to-Node and I/O Communication
     - Current and Future Trends

  2. My Experience
     - 1990-1997: BBN Butterfly, DEC MPP (NSF Grant)
     - 2004: Sun Cluster (ARO Grant W911NF04110100)

     Cluster Computing Facilities: Ariel
     - Nodes: 48 (Sun Fire V20z)
     - CPU: dual AMD Opteron 242 1.6 GHz processors
     - Memory: 2 GB
     - Network: Gigabit Ethernet
     - Disk: 2 x 36 GB internal
     - OS: SunOS 5.9

  3. Current Projects
     - Composite cathodes for Intermediate-Temperature SOFCs: a comprehensive approach to designing
       materials for superior functionality. PIs: N. Orlovskaya, A. Sleiti, J. Kapat (MMAE), A. Masunov (NSTC),
       R. Guha (CS) [CPMD, Fluent]. NASA Grant.
     - VCluster: a thread-based Java middleware for SMP and heterogeneous clusters (Ph.D. dissertation work)
     - Parallel simulation, ARO Grants DAAD19-01-1-0502 and W911NF04110100

     Top 5 Supercomputers
     1. DOE/NNSA/LLNL, USA: BlueGene/L - eServer Blue Gene Solution, IBM
     2. NNSA/Sandia National Laboratories, USA: Red Storm - Sandia/Cray Red Storm, Opteron 2.4 GHz dual core, Cray Inc.
     3. IBM Thomas J. Watson Research Center, USA: BGW - eServer Blue Gene Solution, IBM
     4. DOE/NNSA/LLNL, USA: ASC Purple - eServer pSeries p5 575, 1.9 GHz, IBM
     5. Barcelona Supercomputing Center, Spain: MareNostrum - BladeCenter JS21 Cluster, PPC 970, 2.3 GHz, Myrinet, IBM

  4. Top 6-10 Supercomputers
     6. NNSA/Sandia National Laboratories, USA: Thunderbird - PowerEdge 1850, 3.6 GHz, Infiniband, Dell
     7. Commissariat a l'Energie Atomique (CEA), France: Tera-10 - NovaScale 5160, Itanium2 1.6 GHz, Quadrics, Bull SA
     8. NASA/Ames Research Center/NAS, USA: Columbia - SGI Altix, 1.5 GHz, Voltaire Infiniband, SGI
     9. GSIC Center, Tokyo Institute of Technology, Japan: TSUBAME Grid Cluster - Sun Fire x4600 Cluster, Opteron 2.4/2.6 GHz and ClearSpeed Accelerator, Infiniband, NEC/Sun
     10. Oak Ridge National Laboratory, USA: Jaguar - Cray XT3, 2.6 GHz dual core, Cray Inc.

     Some Statistics (Top 500 list, by processor family)
     - 261 Intel processors
     - 113 AMD Opteron family
     - 93 IBM Power processors

  5. Cluster Computing
     - Became popular with the availability of
       - high-performance microprocessors
       - high-speed networks
       - distributed computing tools
     - Provides performance comparable to supercomputers at a much lower price

     Cluster Computing
     - To run a parallel program on a cluster:
       - processes must be created on every machine in the cluster
       - processes must be able to communicate with each other (a minimal sketch follows below)
     [Figure: Process 1 through Process 4 on separate machines, connected by Ethernet]
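
     To make the process-creation and message-passing idea concrete, here is a minimal sketch using MPI in C
     (MPI is an assumption; the slides do not name a particular message-passing library). The launcher starts
     one process per machine listed in a host file, and the processes then exchange messages over the cluster
     interconnect, e.g. Ethernet.

        /* Minimal cluster message-passing sketch, assuming an MPI installation.
         * Build: mpicc -o hello_cluster hello_cluster.c
         * Run:   mpirun -np 4 -hostfile nodes ./hello_cluster
         */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);                 /* one process is started on each machine */
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id (0 .. size-1)         */
            MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes               */

            if (rank == 0) {
                int greeting = 42;
                for (int i = 1; i < size; i++)      /* rank 0 sends a value to every other process */
                    MPI_Send(&greeting, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            } else {
                int msg;
                MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("process %d of %d received %d\n", rank, size, msg);
            }

            MPI_Finalize();
            return 0;
        }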

  6. Communications
     - Fiber Channel
     - Gigabit Ethernet
     - Myrinet
     - InfiniBand

     Myrinet
     - Designed by Myricom
     - High-speed LAN technology used to interconnect machines
     - Lightweight protocol (2 Gb/s)
     - Low latency for short messages
     - Sustained data rate for large messages

  7. Gigabit Ethernet
     - Standardized by IEEE 802.3
     - Data rates in gigabits per second
     - Deployed in high-capacity backbone network links
     - High end-to-end throughput and less expensive than Myrinet (nodes typically use standard TCP/IP over it;
       see the sketch after this slide)
     - Four physical-layer standards, running over optical fiber, twisted-pair cable, or balanced copper cable

     Fiber Channel
     - Gigabit-speed network technology used for storage networking
     - Runs over both twisted-pair copper and optical fiber
     - Reliable and scalable
     - 4 Gb/s bandwidth
     - Supports many topologies and protocols
     - Efficient
     - Cons:
       - although initially used for supercomputing, it is now more popular in storage markets
       - a growing number of standard definitions is increasing the complexity of the protocol
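
     As a concrete companion to the Gigabit Ethernet bullet above, the sketch below sends one message to a
     peer node over an ordinary TCP socket, which is how cluster traffic is commonly carried on Ethernet when
     no specialized messaging layer is used. The peer address 192.168.1.2 and port 5000 are hypothetical
     placeholders, not values from the slides.

        /* Minimal TCP send over an Ethernet cluster interconnect (sketch only). */
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main(void)
        {
            int fd = socket(AF_INET, SOCK_STREAM, 0);            /* TCP socket               */
            struct sockaddr_in peer;
            memset(&peer, 0, sizeof peer);
            peer.sin_family = AF_INET;
            peer.sin_port = htons(5000);                         /* placeholder port         */
            inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);   /* placeholder node address */

            if (fd < 0 || connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
                perror("connect");
                return 1;
            }
            const char msg[] = "hello from node 1";
            send(fd, msg, sizeof msg, 0);    /* the kernel TCP/IP stack handles framing and delivery */
            close(fd);
            return 0;
        }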

  8. InfiniBand (IB)
     - High-performance, low-latency, channel-based, switched-fabric I/O interconnect architecture for servers
     - Intended as a replacement for the shared PCI bus
     - First version released in October 2000 by the InfiniBand Trade Association (IBTA), formed by
       Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun
       - the association is responsible for compliance and interoperability testing of commercial products
     - June 2001: version 1.0a released

     Why is it different?
     - Unlike the present I/O subsystem, IB is a network
     - Uses IPv6-style 128-bit addresses (a short sketch of reading this address follows below)
     - IB's revolutionary approach: instead of sending data in parallel across the backplane bus (data path),
       IB uses a serial (bit-at-a-time) bus
       - fewer pins, which saves cost and adds reliability
       - a serial bus can multiplex a signal
     - Supports multiple memory areas, which can be accessed by processors and storage devices
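
     A small illustration of the 128-bit addressing mentioned above, assuming the Linux libibverbs API (the
     slides do not name an API): each InfiniBand port exposes a 16-byte GID, formatted like an IPv6 address.

        /* Print the 128-bit GID of the first InfiniBand port.
         * Assumes libibverbs; build with: gcc -o show_gid show_gid.c -libverbs
         */
        #include <infiniband/verbs.h>
        #include <stdio.h>

        int main(void)
        {
            int num;
            struct ibv_device **devs = ibv_get_device_list(&num);
            if (!devs || num == 0) {
                fprintf(stderr, "no InfiniBand devices found\n");
                return 1;
            }

            struct ibv_context *ctx = ibv_open_device(devs[0]);
            union ibv_gid gid;
            if (ctx && ibv_query_gid(ctx, 1, 0, &gid) == 0) {    /* port 1, GID index 0 */
                /* the 16 raw bytes are the port's IPv6-style address */
                for (int i = 0; i < 16; i += 2)
                    printf("%02x%02x%s", gid.raw[i], gid.raw[i + 1], i < 14 ? ":" : "\n");
            }

            if (ctx)
                ibv_close_device(ctx);
            ibv_free_device_list(devs);
            return 0;
        }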

  9. Advantages of InfiniBand
     - High performance
       - 20 Gb/s node-to-node
       - 60 Gb/s switch-to-switch
       - IB has a defined roadmap to 120 Gb/s (the fastest specification for any interconnect)
     - Reduced complexity
       - multiple I/Os on one cable
       - consolidates clustering, communications, storage, and management traffic over a single connection

     Advantages (contd.)
     - Efficient interconnect
       - communication processing is handled in hardware rather than by the CPU, leaving the full resources
         of each node available to applications
       - employs Remote DMA (RDMA), an efficient data-transfer protocol (a setup sketch follows below)
     - Reliability, stability, scalability
       - reliable end-to-end data connections
       - virtualization allows multiple applications to run on the same interconnect
       - IB fabrics have multiple paths, so a fault is limited to a single link
       - can support tens of thousands of nodes in a single subnet
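
     To show what "communication processing in hardware" and Remote DMA look like at the software level, here
     is a hedged setup sketch assuming the libibverbs API (an assumption; the slides do not name one): a buffer
     is registered with the InfiniBand adapter so that a remote node could read or write it directly, without
     involving the local CPU in the transfer. Queue-pair creation and the actual RDMA operation are omitted.

        /* RDMA setup sketch, assuming libibverbs: register a buffer that a remote
         * peer could access directly with RDMA read/write.
         * Build: gcc -o rdma_setup rdma_setup.c -libverbs
         */
        #include <infiniband/verbs.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            int num;
            struct ibv_device **devs = ibv_get_device_list(&num);
            if (!devs || num == 0) { fprintf(stderr, "no InfiniBand device\n"); return 1; }

            struct ibv_context *ctx = ibv_open_device(devs[0]);
            if (!ctx) { fprintf(stderr, "cannot open device\n"); return 1; }
            struct ibv_pd *pd = ibv_alloc_pd(ctx);               /* protection domain */

            size_t len = 1 << 20;                                /* 1 MB buffer */
            void *buf = malloc(len);

            /* Pin and register the buffer with the adapter; the access flags let the
             * remote side read/write it without interrupting the local CPU.          */
            struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                           IBV_ACCESS_LOCAL_WRITE |
                                           IBV_ACCESS_REMOTE_READ |
                                           IBV_ACCESS_REMOTE_WRITE);
            if (!mr) { perror("ibv_reg_mr"); return 1; }

            /* A peer needs this {address, rkey} pair to issue RDMA operations here.  */
            printf("buffer addr=%p rkey=0x%x\n", buf, mr->rkey);

            /* Queue-pair setup and the RDMA write itself are omitted in this sketch. */
            ibv_dereg_mr(mr);
            ibv_dealloc_pd(pd);
            ibv_close_device(ctx);
            ibv_free_device_list(devs);
            free(buf);
            return 0;
        }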

  10. Integrating into a Data Center
      Connecting Fiber Channel storage fabrics to an IB infrastructure:
      - Bridges
        - somewhat costly
        - create a bottleneck that limits Fiber Channel access to speeds lower than the array can typically deliver
      - Native interconnects
        - a more cost-effective, easier-to-manage solution
        - integrate the arrays directly into the IB fabric

      InfiniBand and Gigabit Ethernet?
      - IB is complementary to Gigabit Ethernet and Fiber Channel; the cost of Fiber Channel is quite high
      - Gigabit Ethernet and Fiber Channel are expected to connect into the IB fabric to access IB-enabled
        compute resources
      - Helps IT managers better balance I/O and processing resources within an IB fabric
      - Allows applications to use IB's RDMA to fetch data, compute, and put intermediate results back, which
        is good for HPC

  12. Current and Future Trends
      - HPC and I/O subsystem communication will become faster and easier to manage
      - HPC scientific applications will continue
      - New multidisciplinary work will increase
      - Financial businesses will use HPC systems

  13. References
      - http://www.infinibandta.org
      - http://www.cray.com
      - http://www.myri.com
      - http://www.fibrechannel.org/
      - Jens Mache, "An Assessment of Gigabit Ethernet as Cluster Interconnect," IWCC, p. 36, 1999.
      - http://www.supercomp.org/sc2002/paperpdfs/pap.pap207.pdf
      - http://compnetworking.about.com/cs/clustering/g/bldef_infiniban.htm
      - Dave Ellis, "InfiniBand Today," http://www.wwpi.com/index.php?option=com_content&task=view&id=1163&Itemid=44
      - http://www.mellanox.com/pdf/presentations/Top500_Nov_06.pdf
      - http://www.mellanox.com/applications/top_500.php
      - "Fiber Channel vs. InfiniBand vs. Ethernet," http://www.processor.com/editorial/article.asp?article=articles%2Fp2911%2F31p11%2F31p11%2Easp&guid=934C81176D3D40969DF5ABA3E28DC8CF&searchtype=&WordList=&bJumpTo=True
