Overview of Interconnects: Myrinet and Quadrics (1/5/2012)

  1. Overview of Interconnects: Myrinet and Quadrics, Leading Modern Interconnects

     Presentation Outline
       - General Concepts of Interconnects
       - Myrinet
           - Components
           - Communication features
           - Latest Products
       - Quadrics
           - Components
           - Communication features
           - Performance
           - Latest Release
       - Our Research

     Interconnects
       - Shared-medium Interconnects
           - LAN (Ethernet)
       - Router-based Interconnects
           - Intel Paragon, Cray T3D, Cray T3E
       - Switch-based Interconnects
           - Myrinet, Quadrics, InfiniBand

     Switch-Based Interconnects
       - Link: Fiber or Cables
       - Network Interfaces
       - Switches: Crossbar Switches
       - Interconnection of Switches

     Basic Switching Unit (Crossbar Switch) (figure)

     Interconnects Issues
       - Communication Features
           - Basic issues:
               - bit encoding
               - framing
               - switching/routing
               - flow control (deadlock)
               - error-control (reliability)
           - Advanced issues:
               - Memory management
               - Message passing semantics
               - Offloading of the protocol processing
               - Multiple rails, message striping
       - Performance and Scalability

  2. Basic Switching (figure)

     Switching Technology
       - Circuit Switching
       - Packet Switching
       - Virtual Cut-through
       - Wormhole Switching

     Figures: Packet Switching; Circuit Switching; Wormhole Switching; Virtual Cut-Through Switching
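The difference between packet (store-and-forward) switching and the cut-through family can be seen with a simple contention-free latency model. The sketch below is an illustration only: the formulas are the usual textbook simplification, and every parameter name and number (hop count, header size, bandwidth, per-hop routing delay) is an assumption, not a figure from the slides.

```python
def store_and_forward_latency(hops, header_bytes, payload_bytes,
                              bandwidth_bytes_per_s, routing_delay_s):
    """Packet switching: each switch buffers the whole packet before forwarding."""
    packet_time = (header_bytes + payload_bytes) / bandwidth_bytes_per_s
    return hops * (routing_delay_s + packet_time)


def cut_through_latency(hops, header_bytes, payload_bytes,
                        bandwidth_bytes_per_s, routing_delay_s):
    """Virtual cut-through or wormhole without contention: only the header
    is delayed at each hop, and the payload streams behind it."""
    header_time = header_bytes / bandwidth_bytes_per_s
    payload_time = payload_bytes / bandwidth_bytes_per_s
    return hops * (routing_delay_s + header_time) + payload_time


if __name__ == "__main__":
    # Hypothetical example values, chosen only to show the trend.
    example = dict(hops=4, header_bytes=8, payload_bytes=1016,
                   bandwidth_bytes_per_s=1e9, routing_delay_s=100e-9)
    print("store-and-forward:", store_and_forward_latency(**example))
    print("cut-through:      ", cut_through_latency(**example))
```

In this model the store-and-forward latency grows with hops times packet length, while the cut-through latency grows with hops times header length only. Virtual cut-through and wormhole differ mainly in what happens when a packet blocks (buffered whole at a switch versus left spread across the links), which this contention-free model does not capture.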

  3. Blocking in Wormhole Network (figure)

     Virtual Channels (figure)

     Presentation Outline
       - General Concepts of Interconnects
       - Myrinet
           - Components
           - Communication features
           - Performance
           - Latest Release
       - Quadrics
           - Components
           - Communication features
           - Performance
           - Latest Release
       - Our Research

     Myrinet Origin (www.myri.com)
       - Mosaic
           - High data rates
           - Regular topology and scalability
           - Very low error rates
           - Cut-through routing
           - Flow-control at every link
       - Atomic LAN
           - Achieve high data rates 10 15 bits per second
           - Limitations:
               - asynchronous signaling
               - complex mapping
               - lack of DMA engine
               - multiple copies through TCP/IP stack

     Myrinet Links
       - Cable links
           - 18 twisted pairs, nine in each direction
           - Synchronous transmission, avoids asynchronous signaling
           - Maximal 25m cables
       - Flow Control
           - Receiver Driver
           - Slack Buffer
           - Stop and Go signals

     Myrinet Packets
       - Packet Format
           - Header (up to 24 bytes)
           - Arbitrary length payload
           - CRC, error-control
           - Gap
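The Stop and Go flow control with a slack buffer listed under Myrinet Links can be sketched as a small discrete-time simulation. This is a toy model under stated assumptions, not Myricom's implementation: the buffer capacity, the two watermarks, the signal delay and the drain rate are all hypothetical values, and time is counted in abstract ticks.

```python
def simulate_stop_go(ticks, capacity=16, stop_mark=10, go_mark=4,
                     signal_delay=3, drain_every=2):
    """Return (bytes_sent, peak_fill) for a sender throttled by STOP/GO."""
    fill = 0                 # bytes currently held in the receiver's slack buffer
    peak = 0
    sent = 0
    sender_may_send = True
    last_signal = "GO"       # last control signal the receiver issued
    in_flight = []           # (arrival_tick, signal) still travelling to the sender

    for tick in range(ticks):
        # Control signals reach the sender only after signal_delay ticks.
        arrived = [sig for when, sig in in_flight if when <= tick]
        in_flight = [(when, sig) for when, sig in in_flight if when > tick]
        for sig in arrived:
            sender_may_send = (sig == "GO")

        # The sender pushes one byte per tick while it is allowed to.
        if sender_may_send:
            fill += 1
            sent += 1
            if fill > capacity:
                raise OverflowError("slack buffer overflowed")
        peak = max(peak, fill)

        # The receiver drains more slowly than the sender transmits.
        if tick % drain_every == 0 and fill > 0:
            fill -= 1

        # Watermarks decide when STOP and GO are issued.
        if fill >= stop_mark and last_signal == "GO":
            last_signal = "STOP"
            in_flight.append((tick + signal_delay, "STOP"))
        elif fill <= go_mark and last_signal == "STOP":
            last_signal = "GO"
            in_flight.append((tick + signal_delay, "GO"))

    return sent, peak


if __name__ == "__main__":
    sent, peak = simulate_stop_go(200)
    print("bytes sent:", sent, "peak fill:", peak)
```

The space above the stop watermark is what absorbs the bytes that are still arriving while STOP travels back to the sender, which is why the buffer has to be sized with the link delay in mind.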

  4. Myrinet Network Interface
       - Host Interface
           - Programmable Processor
           - DMA engine, CRC-capable
           - Packet interface
       - Optical-Fiber Interface
           - OSI level-2 and level-3
           - ~1500 bytes slack buffer

     Myrinet Switch
       - Basic Unit
           - Crossbar switch
           - Wormhole routing
           - Easy network-mapping
           - 550ns switch latency
       - Topology
           - Clos Network
           - Full bisection bandwidth
           - Easy connections into larger network

     Software Stack
       - MCP
           - Running on the host interface
           - Performs continuous mapping, monitoring and route updating
           - IP Multicast capable
       - GM
           - GM kernel module
           - User-level API
           - Provides interface between user processes and NIC
       - Programming Libraries
           - MPI, sockets, etc.
       - Diagram layers: Host (Application; MPI, Sockets, etc.; User-level API; Kernel Agent), NIC (Myrinet Control Program)

     Myrinet Products
       - Cutting-edge interconnect technology for many years (many TOP500 systems during 1995-2002, gradually declining)
       - High performance, low latency and highly reliable
       - Self configurable and fault-tolerant
       - Capable of being I/O Fabric
       - Ideal for cluster-computing
       - Recently moved to a dual strategy
           - Proprietary adapter
           - 10GbE adapter

     Presentation Outline
       - General Concepts of Interconnects
       - Myrinet
           - Components
           - Communication features
           - Latest Products
       - Quadrics
           - Components
           - Communication features
           - Performance
           - Latest Release
       - Our Research

     Quadrics Components
       - Hardware Components
           - Network Interfaces, Elan 3
           - Switches, Elite
       - Software
           - elan3lib
           - elanlib
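For the Clos topology and full bisection bandwidth listed under Myrinet Switch above, a sizing sketch may help. It assumes a two-level folded Clos built from identical crossbars, where each leaf switch splits its ports evenly between hosts and uplinks. The function name and the 16-port example are assumptions for illustration; the slides do not state a port count.

```python
def two_level_clos(ports_per_switch):
    """Size a two-level folded Clos network built from identical crossbars.

    Each leaf uses half of its ports for hosts and half for uplinks, and
    every uplink of a leaf goes to a different spine switch, so a leaf has
    as much uplink capacity as host capacity (no oversubscription).
    """
    if ports_per_switch < 2 or ports_per_switch % 2:
        raise ValueError("ports_per_switch must be an even number >= 2")
    half = ports_per_switch // 2
    leaf_switches = ports_per_switch   # one leaf per spine port
    spine_switches = half              # one spine per leaf uplink
    return {
        "leaf_switches": leaf_switches,
        "spine_switches": spine_switches,
        "hosts": leaf_switches * half,
        "uplinks": leaf_switches * half,
    }


if __name__ == "__main__":
    # Hypothetical 16-port crossbar.
    print(two_level_clos(16))
```

Because the number of uplinks equals the number of hosts, any half of the hosts can talk to the other half at full link rate, which is the property the slide calls full bisection bandwidth.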

  5. Network Interface
       - Link physical layer
           - Full duplex 10 bit, 400Mbaud Link
       - Elan 3 (QM400) Network Adapter
           - 64 bit/66MHz PCI Bus
           - Programmable I/O processor
               - Support Multiple threads
           - 100MHz Integrated DMA engine
               - Automatic packetisation and scheduling
           - Dedicated input packet processing engine
           - 8KB on chip cache
           - 64MB SDRAM with MMU + TLB
           - Supported OS: Tru64 UNIX™ and Linux™
           - Communication Libraries
               - MPI, Shmem, kernel messaging & IP

     NIC: Elan 3 (figure)

     Microcode Processor
       - Control Processor for Elan 3
       - Execute four threads
           - Command processing
           - Thread scheduling
           - Inputter thread
           - DMA thread

     Thread Processor
       - Basic Features
           - 100 MHz
           - 32 bit RISC
           - Extended instruction set
           - 4-stage pipeline
           - 32 registers
       - Execute user threads
           - Provide NIC programmability

     Other Processors
       - Input Processor
           - Processing network packets
           - Assemble data into transactions
           - Initiate the transactions for Microcode Processor
       - DMA Processor
           - Service user RDMA read and write requests
           - Handle arbitrary source/destination buffer alignment
           - Support broadcast/flood and Queue DMAs

     Memory Management
       - 64 MB SDRAM
       - 8K 4-way Set Associative Cache
       - MMU
           - Address Elan or Main Memory
           - Synchronized with Main Memory
           - 16-entry TLB
           - Table Walk Engine
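The link and bus figures quoted for the Elan 3 adapter can be turned into peak rates with a little arithmetic. The sketch below uses only the numbers on the slide (64 bit/66MHz PCI, 10-bit 400Mbaud link) plus one labeled assumption: that 8 of the 10 link bits carry data and the remaining 2 carry control information. These are theoretical peaks, not measured throughput.

```python
PCI_WIDTH_BITS = 64
PCI_CLOCK_HZ = 66e6            # as stated on the slide
LINK_WIDTH_BITS = 10
LINK_RATE_BAUD = 400e6
DATA_BITS_PER_SYMBOL = 8       # assumption: 8 data bits + 2 control bits


def pci_peak_bytes_per_s():
    """Peak PCI transfer rate: bus width in bytes times clock rate."""
    return PCI_WIDTH_BITS // 8 * PCI_CLOCK_HZ


def link_raw_bits_per_s():
    """Raw signalling rate of one link direction."""
    return LINK_WIDTH_BITS * LINK_RATE_BAUD


def link_data_bytes_per_s():
    """Data rate of one link direction under the 8-of-10 assumption."""
    return DATA_BITS_PER_SYMBOL * LINK_RATE_BAUD / 8


if __name__ == "__main__":
    print("PCI peak   (MB/s):", pci_peak_bytes_per_s() / 1e6)
    print("Link raw   (Gbit/s):", link_raw_bits_per_s() / 1e9)
    print("Link data  (MB/s):", link_data_bytes_per_s() / 1e6)
```

Under these assumptions the PCI bus peak (528 MB/s) is only modestly above the data rate of one link direction, so the host bus rather than the link is likely to limit bidirectional traffic.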

  6. Message Flow Path (figure)

     Messaging Protocol
       - Packet Format
           - route, transactions, EOP
       - Bit-level protocol
           - 4B/5B, synchronous
       - Flit-level protocol
           - Flow Control, Error Control
       - Packet-level Protocol
           - Virtual cut-through
           - Error Control

     Message Flow
       1. User fills DMA descriptor using library calls
       2. Then informs Elan of descriptor via command port
       3. Command processor checks descriptor parameters
       4. Then adds it to DMA Queue
       5. Data Transfer from Local to Remote Node
       6. Remote Inputter instructs DMA to send ACK
       7. ACK received at Local Inputter
       8. DMA ACK sets corresponding Event in Elan
       9. Event in Elan triggers Event in Main Memory to let local process know DMA was successfully received
       10. Remote Inputter copies data to Receive Buffer
       11. DMA Event set in Remote Elan
       12. Event in Elan triggers Event in Main Memory to let remote process know of DMA completion
       13. Remote Process polls Event, discovers completion

     Switches: Elite
       - Quaternary fat-tree topology
           - Eight bidirectional links
           - 16 x 8 crossbar switch
           - 35ns switch latency
           - Adaptive routing
           - Hardware broadcast

     Switch Functionality
       - Adaptive Routing
       - Hardware Broadcast

     Hardware Broadcast (figure)
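The thirteen-step message flow above can be walked through with a small sequential model. This is a sketch only: the class and method names (`Node`, `DmaDescriptor`, `post_dma`, `run_dma`, `poll`) are hypothetical and are not the elan3lib API, and the concurrent hardware units are collapsed into ordinary method calls.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DmaDescriptor:
    tag: str
    payload: bytes
    dest: "Node"


class Node:
    """One host plus its NIC; stands in for a process and its Elan."""

    def __init__(self, name):
        self.name = name
        self.dma_queue = deque()     # DMA queue on the NIC
        self.elan_events = set()     # events set on the NIC
        self.host_events = set()     # events mirrored into main memory
        self.receive_buffer = b""

    def post_dma(self, desc):
        # Steps 1-2: the user fills a descriptor and hands it to the NIC.
        # Step 3: the command processor checks the descriptor parameters.
        if not isinstance(desc.payload, bytes) or desc.dest is self:
            raise ValueError("bad DMA descriptor")
        # Step 4: the descriptor is added to the DMA queue.
        self.dma_queue.append(desc)

    def run_dma(self):
        desc = self.dma_queue.popleft()
        # Step 5: data travels to the remote node.
        # Step 6: the remote inputter has an ACK sent back.
        acked = desc.dest.inputter_accept(desc)
        # Steps 7-9: the ACK sets an Elan event, which triggers a host event.
        if acked:
            self._fire(("sent", desc.tag))
        # Steps 10-12: the remote side stores the data and raises its events.
        desc.dest.inputter_deliver(desc)

    def inputter_accept(self, desc):
        return True                  # always acknowledges in this toy model

    def inputter_deliver(self, desc):
        self.receive_buffer = desc.payload
        self._fire(("received", desc.tag))

    def _fire(self, event):
        self.elan_events.add(event)  # event in the Elan ...
        self.host_events.add(event)  # ... triggers the event in main memory

    def poll(self, event):
        # Step 13: a process polls the main-memory event.
        return event in self.host_events


if __name__ == "__main__":
    local, remote = Node("local"), Node("remote")
    local.post_dma(DmaDescriptor("msg0", b"hello", remote))
    local.run_dma()
    print(local.poll(("sent", "msg0")), remote.poll(("received", "msg0")))
```

The point of the two-level event scheme is that both processes learn of completion by polling ordinary memory, without a system call or interrupt on the critical path.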

  7. Communication Libraries
       - elan3lib:
           - Basic Communications
           - Hardware-related
       - elanlib:
           - Hardware Independent
           - Tagged Message Passing
           - Collective Communications
               - Broadcast, Barrier, Reduce

     Performance (Elan-level)
       [Figure: two plots against Message Size (Bytes): Latency, Time (us), and Bandwidth (MBps)]

     Barrier with hw/bcast
       [Figure: Elan3 barrier latency (us) against number of nodes, 2 to 16]

     Later Products
       - QsNet-II (Elan 4 and Elite 4)
           - PCI-X
           - Link rate (1.333Gbaud)
           - 200MHz IO processor
           - MMU (128-entry TLB, 64-bit addressing)
           - MPI latency < 3µs
           - Bandwidth 900Mbytes/s
           - Max system size > 4K nodes
       - Moved to 10GbE world

     Presentation Outline
       - General Concepts of Interconnects
       - Myrinet
           - Components
           - Communication features
           - Latest Products
       - Quadrics
           - Components
           - Communication features
           - Performance
           - Latest Products
       - Our Research

     Myrinet
       - Active Network Interface Support
           - A. Gulati, D. K. Panda, P. Sadayappan, and P. Wyckoff, NIC-based Rate Control for Proportional Bandwidth Allocation in Myrinet Clusters, ICPP '01
           - S. Senapathi, B. Chandrasekharan, D. Stredney, H.-W. Shen, and D. K. Panda, QoS-aware Middleware for Cluster-based Servers to Support Interactive and Resource-Adaptive Applications, HPDC '03
           - D. Buntinas, D. K. Panda, J. Duato, and P. Sadayappan, Broadcast/Multicast over Myrinet using NIC-Assisted Multidestination Messages, CANPC '03
           - D. Buntinas, D. K. Panda, and P. Sadayappan, Fast NIC-Based Barrier over Myrinet/GM, IPDPS '01
           - D. Buntinas, D. K. Panda, and W. Gropp, NIC-Based Atomic Operations on Myrinet/GM, SAN-1
           - D. Buntinas and D. K. Panda, NIC-Based Reduction in Myrinet Clusters: Is It Beneficial? SAN-2
           - W. Yu, D. Buntinas, and D. K. Panda, High Performance and Reliable NIC-Based Multicast over Myrinet/GM-2, ICPP '03
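One way to see why a barrier benefits from hardware broadcast, as in the "Barrier with hw/bcast" slide, is to count communication steps. The model below is an assumption made for illustration, not how elanlib implements its barrier and not a reproduction of the plotted measurements: a software barrier is taken to gather up a binary tree and release down the same tree, while the hardware-assisted variant replaces the release phase with a single broadcast.

```python
def tree_depth(nodes):
    """Number of levels needed to combine `nodes` arrivals pairwise: ceil(log2(nodes))."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return (nodes - 1).bit_length()


def software_barrier_steps(nodes):
    """Gather up a binary tree, then release down the same tree."""
    return 2 * tree_depth(nodes)


def hw_broadcast_barrier_steps(nodes):
    """Gather up a binary tree, then release with one hardware broadcast."""
    return tree_depth(nodes) + (1 if nodes > 1 else 0)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, software_barrier_steps(n), hw_broadcast_barrier_steps(n))
```

Under this model the gap between the two variants widens as the node count grows, because only the software release phase keeps scaling with the tree depth.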
