  1. Cluster Computing: Interconnect Technologies for Clusters

  2. Interconnect approaches
  • WAN – 'infinite distance'
  • LAN – a few kilometers
  • SAN – a few meters
  • Backplane – not scalable

  3. Physical cluster interconnects
  • Fast Ethernet
  • Gigabit Ethernet
  • 10 Gigabit Ethernet
  • ATM
  • cLAN
  • Myrinet
  • Memory Channel
  • SCI
  • Atoll
  • ServerNet

  4. Switch technologies
  • Switch design
    – Fully interconnected
    – Omega network
  • Packet handling
    – Store-and-forward
    – Cut-through routing (wormhole routing)

  5. Implications of switch technologies
  • Switch design
    – Affects the constant factor associated with routing
  • Packet handling
    – Affects the overall routing latency in a major way

  6. Store-and-forward vs. wormhole routing: one step
  • T(n) = Overhead + Channel Time + Routing Delay, for a message of length L crossing n hops over channels of bandwidth B, with per-hop routing delay R and overhead O
  • Cut-through: T(1) = O + L/B + R
  • Store-and-forward: T(1) = O + L/B + R (identical for a single hop)

  7. Store-and-forward vs. wormhole routing: ten steps
  • T(n) = Overhead + Channel Time + Routing Delay
  • Cut-through: T(10) = O + L/B + 10·R
  • Store-and-forward: T(10) = O + 10·(L/B + R)
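To make the comparison concrete, here is a minimal sketch of the two latency models above, using purely illustrative parameter values (10 us overhead, 1 Gbit/s links, 1 KB messages, 1 us routing delay per hop); none of these numbers come from the slides.

```python
# Latency model sketch: store-and-forward vs. cut-through (wormhole) routing.
# All parameter values are illustrative assumptions, not figures from the slides.

OVERHEAD = 10e-6        # per-message software overhead O, in seconds
BANDWIDTH = 1e9 / 8     # channel bandwidth B: 1 Gbit/s expressed in bytes/s
ROUTE_DELAY = 1e-6      # routing delay R per switch hop, in seconds
MSG_BYTES = 1024        # message length L, in bytes

def store_and_forward(hops: int) -> float:
    """Each hop receives the whole message before forwarding: T = O + n*(L/B + R)."""
    return OVERHEAD + hops * (MSG_BYTES / BANDWIDTH + ROUTE_DELAY)

def cut_through(hops: int) -> float:
    """The message is pipelined through the switches: T = O + L/B + n*R."""
    return OVERHEAD + MSG_BYTES / BANDWIDTH + hops * ROUTE_DELAY

for hops in (1, 10):
    print(f"{hops:2d} hops: store-and-forward {store_and_forward(hops) * 1e6:6.1f} us, "
          f"cut-through {cut_through(hops) * 1e6:6.1f} us")
```

With these assumptions the two models agree at one hop, while at ten hops store-and-forward pays the full channel time ten times over and cut-through pays it only once.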

  8. Fast Ethernet
  • 100 Mbit/s
  + Generally supported
  + Extremely cheap
  - Limited bandwidth
  - Not really that standard
  - Not all implementations support zero-copy protocols

  9. Gigabit Ethernet
  • Gigabit Ethernet is hype-only at this stage
  • Bandwidth really is 1 Gbit/s
  • Latency is only slightly improved
    – Down to 20 us, from 22 us at 100 Mbit/s
  • The current standard
    – But NICs are as different as with Fast Ethernet

  10. 10 Gigabit Ethernet
  • Target applications not really defined
    – Clusters are not the most likely customers
    – Perhaps as a backbone for large clusters
  • Optical interconnects only
    – Copper is currently being proposed

  11. ATM
  • Used to be the holy grail of cluster computing
  • Turned out to be poorly suited for clusters
    – High price
    – Tiny packets
    – Designed for throughput, not reliability

  12. cLAN
  • Virtual Interface Architecture (VIA)
  • An API standard, not a hardware standard
  • 1.2 Gbit/s

  13. Myrinet
  • Long-time 'de facto standard'
  • LAN and SAN architectures
  • Switch-based
  • Extremely programmable

  14. Myrinet
  • Very high bandwidth
    – 0.64 + 0.64 Gbit/s in generation 1 (1994)
    – 1.28 + 1.28 Gbit/s in generation 2 (1997)
    – 2.0 + 2.0 Gbit/s in generation 3 (2000)
    – (10 + 10 Gbit/s in generation 4 (2005), Ethernet-based)
  • 18-bit parallel wires
  • Error rate of about 1 bit per 24 hours
  • Very limited physical distance
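As a rough sanity check of the quoted error rate, the sketch below converts 'about 1 bit error per 24 hours' into a bit error rate, assuming a generation-2 link kept saturated at 1.28 Gbit/s in one direction; both the saturation assumption and the resulting figure are only illustrative.

```python
# Rough bit-error-rate estimate from "about 1 bit error per 24 hours",
# assuming a generation-2 Myrinet link saturated at 1.28 Gbit/s in one
# direction. Purely illustrative; real rates depend on cable length and load.

LINK_RATE = 1.28e9      # bits per second, one direction (assumed saturated)
ERRORS_PER_DAY = 1      # quoted figure: ~1 bit error per 24 hours

bits_per_day = LINK_RATE * 24 * 3600
ber = ERRORS_PER_DAY / bits_per_day
print(f"bits transferred per day: {bits_per_day:.2e}")
print(f"implied bit error rate:   {ber:.1e}")   # on the order of 1e-14
```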

  15. Myrinet interface
  • Hosts a fast RISC processor
    – 132 MHz in the newest version
  • Large on-board memory
    – 2, 4, or 8 MB in the newest version
  • The memory is used for both send and receive buffers and runs at CPU speed
    – 7.5 ns in the newest version

  16. Myrinet switch
  • Wormhole routed
    – 5 ns route time
  • Process to process
    – 9 us (133 MHz LANai)
    – 7 us (200 MHz LANai)
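To put the 5 ns route time in perspective, here is a small sketch that adds per-switch routing delay on top of the quoted process-to-process latency; the hop counts are arbitrary examples, and the 7 us figure is treated as covering everything except the extra hops.

```python
# How much the wormhole switch contributes to end-to-end Myrinet latency:
# the per-hop route time is tiny compared with the process-to-process figure.
# Hop counts are arbitrary examples, not from the slides.

ROUTE_TIME = 5e-9       # per-switch routing delay (s), from the slide
P2P_LATENCY = 7e-6      # process-to-process latency, 200 MHz LANai (s)

for hops in (1, 3, 8):
    total = P2P_LATENCY + hops * ROUTE_TIME
    print(f"{hops} switch hops: {total * 1e6:.3f} us end-to-end")
```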

  17. Myrinet

  18. Myrinet prices
  • PCI/SAN interface
    – $495, $595, $795
  • SAN switch
    – 8-port: $4,050
    – 16-port: $5,625
    – 128-port: $51,200
  • 10 ft. cable: $75

  19. Memory Channel
  • Digital Equipment Corporation product
  • Raw performance
    – Latency: 2.9 us
    – Bandwidth: 64 MB/s
  • MPI performance
    – Latency: 7 us
    – Bandwidth: 61 MB/s
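To see how the latency and bandwidth figures above combine, here is a sketch using the usual first-order model time(n) = latency + n/bandwidth; the model and the n_1/2 (half-bandwidth message size) calculation are textbook simplifications, not something taken from the slides.

```python
# First-order transfer-time model: time(n) = latency + n / bandwidth.
# Used here to compare Memory Channel's raw vs. MPI figures; the model
# itself is a simplification and not part of the original slides.

def transfer_time(n_bytes: float, latency_s: float, bandwidth_Bps: float) -> float:
    return latency_s + n_bytes / bandwidth_Bps

RAW = (2.9e-6, 64e6)   # latency (s), bandwidth (bytes/s) -- raw hardware
MPI = (7.0e-6, 61e6)   # latency (s), bandwidth (bytes/s) -- through MPI

for size in (8, 1024, 64 * 1024):
    raw_t = transfer_time(size, *RAW)
    mpi_t = transfer_time(size, *MPI)
    print(f"{size:6d} B: raw {raw_t * 1e6:7.1f} us, MPI {mpi_t * 1e6:7.1f} us")

# n_1/2: the message size at which half the peak bandwidth is achieved,
# i.e. the size whose transmission time equals the latency.
for name, (lat, bw) in (("raw", RAW), ("MPI", MPI)):
    print(f"n_1/2 ({name}): {lat * bw:.0f} bytes")
```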

  20. Memory Channel

  21. Memory Channel

  22. SCI
  • Scalable Coherent Interface
  • IEEE standard
  • Not widely implemented
  • The coherency protocol is very complex
    – 29 stable states
    – An enormous number of transient states

  23. SCI

  24. SCI coherency
  • Memory states
    – Home: no remote cache in the system contains a copy of the block
    – Fresh: one or more remote caches may have a read-only copy, and the copy in memory is valid
    – Gone: another remote cache contains a writeable copy; there is no valid copy on the local node

  25. SCI coherency
  • A cache state is named by two components
    – Position in the sharing list: ONLY, HEAD, MID, or TAIL
    – Property of the cached copy:
      – Dirty: modified and writable
      – Clean: unmodified (same as memory) but writable
      – Fresh: data may be read, but not written until memory is informed
      – Copy: unmodified and readable

  26. SCI coherency
  • List construction: adding a new node (sharer) to the head of a sharing list
  • Rollout: removing a node from a sharing list; the node communicates with its upstream and downstream neighbours, informing them of their new neighbours so they can update their pointers
  • Purging (invalidation): the node at the head may purge or invalidate all other nodes, resulting in a single-element list; only the head node can issue a purge
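A toy sketch of the three sharing-list operations described above, modelling the distributed list as an ordinary doubly linked list in one address space; real SCI keeps the forward/backward pointers in per-node caches and updates them via message exchanges, so this only illustrates the pointer updates involved.

```python
# Toy model of an SCI-style sharing list: nodes are linked head-to-tail with
# forward/backward pointers. Real SCI keeps these pointers in distributed
# caches and updates them with messages; this sketch only shows the pointer
# manipulation behind list construction, rollout, and purging.

class Sharer:
    def __init__(self, name: str):
        self.name = name
        self.forward = None    # towards the tail (downstream neighbour)
        self.backward = None   # towards the head (upstream neighbour)

class SharingList:
    def __init__(self):
        self.head = None

    def construct(self, node: Sharer):
        """List construction: a new sharer always attaches at the head."""
        node.forward = self.head
        if self.head is not None:
            self.head.backward = node
        self.head = node

    def rollout(self, node: Sharer):
        """Rollout: the node tells its neighbours to point at each other."""
        if node.backward is not None:
            node.backward.forward = node.forward
        else:                          # the node was the head
            self.head = node.forward
        if node.forward is not None:
            node.forward.backward = node.backward
        node.forward = node.backward = None

    def purge(self):
        """Purge/invalidate: only the head may do this; a one-element list remains."""
        if self.head is not None:
            self.head.forward = None

    def members(self):
        node, out = self.head, []
        while node is not None:
            out.append(node.name)
            node = node.forward
        return out

# Example: three nodes join, one rolls out, then the head purges the rest.
lst = SharingList()
a, b, c = Sharer("A"), Sharer("B"), Sharer("C")
for n in (a, b, c):
    lst.construct(n)
print(lst.members())   # ['C', 'B', 'A']
lst.rollout(b)
print(lst.members())   # ['C', 'A']
lst.purge()
print(lst.members())   # ['C']
```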

  27. Atoll
  • University research project
  • Should be very fast and very cheap
  • Keeps coming 'very soon now'
  • I have stopped waiting

  28. Atoll
  • Grid architecture
  • 250 MB/s bidirectional links
    – 9 bits wide
    – 250 MHz clock

  29. Atoll

  30. Atoll

  31. Atoll

  32. ServerNet II
  • Supports 64-bit, 66 MHz PCI
  • Bidirectional links
    – 1.25 + 1.25 Gbit/s
  • VIA compatible

  33. ServerNet II

  34. ServerNet II

  35. InfiniBand
  • New standard
  • Positioned as the successor to PCI-X
    – 1x = 2.5 Gbit/s
    – 4x = 10 Gbit/s (the current standard)
    – 12x = 30 Gbit/s
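The link widths above scale linearly with the lane count; the sketch below reproduces the quoted rates and also shows the effective data rate after 8b/10b encoding, which early InfiniBand uses but which is not stated on the slide.

```python
# InfiniBand link widths scale linearly with the lane count (1x, 4x, 12x).
# The original signalling rate is 2.5 Gbit/s per lane; with 8b/10b encoding
# (used by early InfiniBand, though not mentioned on the slide) the usable
# data rate is 80% of the signalling rate.

LANE_SIGNALLING_GBPS = 2.5    # per lane, as quoted on the slide
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line code

for lanes in (1, 4, 12):
    raw = lanes * LANE_SIGNALLING_GBPS
    data = raw * ENCODING_EFFICIENCY
    print(f"{lanes:2d}x: {raw:4.1f} Gbit/s signalling, {data:4.1f} Gbit/s data")
```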

  36. InfiniBand price / performance
  • Data bandwidth (large messages): InfiniBand PCI-Express 950 MB/s; 10GigE 900 MB/s; GigE 100 MB/s; Myrinet D 245 MB/s; Myrinet E 495 MB/s
  • MPI latency (small messages): InfiniBand 5 us; 10GigE 50 us; GigE 50 us; Myrinet D 6.5 us; Myrinet E 5.7 us
  • HCA cost (street price): InfiniBand $550; 10GigE $2K-$5K; GigE free; Myrinet D $535; Myrinet E $880
  • Switch port: InfiniBand $250; 10GigE $2K-$6K; GigE $100-$300; Myrinet D $400; Myrinet E $400
  • Cable cost (3 m, street price): InfiniBand $100; 10GigE $100; GigE $25; Myrinet D $175; Myrinet E $175
  • Myrinet pricing data from the Myricom web site (Dec 2004); InfiniBand pricing based on Topspin average sales price (Dec 2004); Myrinet, GigE, and IB performance data from a June 2004 OSU study
  • Note: MPI latency is measured processor to processor; switch latency alone is lower

  37. InfiniBand cabling
  • CX4 copper (15 m)
  • Flexible 30-gauge copper (3 m)
  • Fiber optics up to 150 m

  38. The InfiniBand driver architecture
  [Diagram: the InfiniBand software stack – applications use BSD sockets, file-system APIs, or uDAPL in user space; kernel paths include TCP/IP over IPoIB, SDP, NFS-RDMA, SRP to the SCSI layer, and FCP, all sitting on the verbs interface and the InfiniBand HCA; InfiniBand switches plus Ethernet and Fibre Channel gateways connect the server fabric to the LAN/WAN and SAN]
