A Comparison of Two Gigabit SAN/LAN Technologies: SCI versus Myrinet

EMMSEC 98: European Multimedia, Microprocessor Systems and Electronic Commerce Conference and Exposition
A Comparison of Two Gigabit SAN/LAN Technologies: SCI versus Myrinet
Ch. Kurmann, T. Stricker
Laboratory for Computer Systems, ETHZ - Swiss Federal Institute of Technology Zurich, CH-8092 Zurich

Motivation
■ Evaluation and comparison of Gigabit/s interconnects need a common architectural denominator
■ We propose three different levels:
◆ highly optimized remote load/store operation
◆ optimized standard message passing library
◆ connection oriented LAN emulation
[Figure: Transfer rate by protocol and API (large blocks). Y-axis: transfer rate (MByte/s), 0 to 120. Systems: CoPs-SCI, CoPs-Myrinet, Cray T3D. Series: Raw Deposit, MPI (restricted semantics), MPI (full semantics), TCP/IP.]

Overview
■ Levels of comparison
■ Previous work
■ Technologies overview:
◆ PC Platform
◆ Myricom Myrinet
◆ Dolphin CluStar SCI
◆ SGI / Cray T3D
■ Typical transfer modes
■ Measurement results
■ Conclusion

Levels of Comparison
■ Three levels with different amounts of support by the operating system:
◆ DIRECT DEPOSIT:
✦ simple remote load/store operation
✦ performance is expected to be closest to the actual hardware peak performance
◆ MPI/PVM:
✦ optimized standard message passing library
✦ carefully coded parallel applications are expected to see this performance
◆ TCP/IP:
✦ connection oriented TCP/IP LAN emulation
◆ ....
■ Common architectural denominator

Previous Work
■ Previous studies:
◆ maximum bandwidth numbers
◆ minimal latency numbers
◆ performance results for an entire application
■ Application performance depends on:
◆ redistribution of data stored in distributed arrays
◆ migration of data in a fine grain object store
■ We need a benchmark that covers data types beyond contiguous blocks of data (e.g. strided remote stores).
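The kind of benchmark called for here can be sketched for the local-memory case. The following C fragment is a minimal sketch, not the authors' benchmark: it times stores of 64-bit words at a given stride into ordinary local memory and reports payload throughput in MByte/s. The word size, the function names and the use of `clock()` are assumptions made for illustration. Measuring a remote store would additionally need an adapter-specific mapping of remote memory, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Copy n 64-bit words from a contiguous source into dst, writing every
 * stride-th word (stride 1 = contiguous). dst must hold n * stride words.
 * dst is volatile so the compiler keeps every store. */
static void strided_store(volatile uint64_t *dst, const uint64_t *src,
                          size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = src[i];
}

/* Payload throughput in MByte/s (10^6 bytes) for `reps` repetitions of a
 * strided store of n words. Returns a negative value if allocation fails. */
double strided_store_mbytes_per_s(size_t n, size_t stride, int reps)
{
    assert(n > 0 && stride > 0 && reps > 0);

    uint64_t *src = malloc(n * sizeof *src);
    uint64_t *dst = calloc(n * stride, sizeof *dst);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++)
        src[i] = i;

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        strided_store(dst, src, n, stride);
    clock_t t1 = clock();

    free(src);
    free(dst);

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)                      /* run was below timer resolution */
        secs = 1.0 / CLOCKS_PER_SEC;
    return (double)n * sizeof(uint64_t) * reps / secs / 1e6;
}
```

Calling `strided_store_mbytes_per_s(1 << 20, s, 20)` for strides s = 1, 2, ..., 64 would give one curve of throughput against store stride. Absolute numbers depend on the machine, the timer resolution and the compiler settings.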

Direct Deposit
■ The deposit model requires a clean separation and different mechanisms for:
◆ control messages,
◆ data messages.
■ Data is “dropped” directly into the receiver's address space by the hardware without active participation of the receiver process.
■ Allows copying of fine grained data with complex access patterns such as strides.
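The separation of data and control messages can be illustrated with a small C sketch. It is an illustration under stated assumptions, not the interface of any of the adapters discussed: `deposit_window`, `deposit_strided` and `deposit_poll` are hypothetical names, and the "remote" window is ordinary local memory standing in for a segment that the network adapter would map into the sender's address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WINDOW_WORDS = 1024 };  /* hypothetical window size */

/* Hypothetical receive window. On real hardware it would live in the
 * receiver's memory and be mapped into the sender's address space. */
typedef struct {
    volatile uint64_t data[WINDOW_WORDS]; /* data messages land here          */
    volatile uint64_t ready;              /* control message: words deposited */
} deposit_window;

/* Sender: store n words at the given stride directly into the window,
 * then write the control word. Returns 0 on success, -1 if it does not fit. */
int deposit_strided(deposit_window *w, const uint64_t *src,
                    size_t n, size_t stride)
{
    assert(w != NULL && src != NULL && stride > 0);
    if (n == 0 || (n - 1) * stride >= WINDOW_WORDS)
        return -1;
    for (size_t i = 0; i < n; i++)
        w->data[i * stride] = src[i];   /* data path: plain stores */
    /* Assumption: a real adapter needs a store barrier or flush here so
     * that the control word cannot overtake the data. */
    w->ready = n;                       /* control path: separate message */
    return 0;
}

/* Receiver: takes no part in the data transfer; it only reads and clears
 * the control word to learn how many words have arrived. */
size_t deposit_poll(deposit_window *w)
{
    size_t n = (size_t)w->ready;
    w->ready = 0;
    return n;
}
```

The receiver never copies or unpacks anything. The data is already at its final strided location when the control word becomes visible.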

Message Passing Libraries
■ Sender can send messages at any time, without waiting for the receiver to be ready
■ Buffering is often done at a higher level and involves the memory system of the end-points
■ Fine grain data accesses are implemented through buffer-packing / -unpacking
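Buffer-packing and -unpacking can be sketched in C as follows. This is a minimal illustration, not code from any particular message passing library: the function names are hypothetical, 64-bit words are assumed, and `transfer_block` is a `memcpy` stand-in for the contiguous block transfer that the network would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sender side: gather every stride-th word of src into a contiguous buffer. */
void pack_strided(uint64_t *buf, const uint64_t *src, size_t n, size_t stride)
{
    assert(buf != NULL && src != NULL && stride > 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = src[i * stride];
}

/* Stand-in for the network: one contiguous block transfer of n words. */
void transfer_block(uint64_t *dst_buf, const uint64_t *src_buf, size_t n)
{
    memcpy(dst_buf, src_buf, n * sizeof *src_buf);
}

/* Receiver side: scatter the contiguous buffer to every stride-th word of dst. */
void unpack_strided(uint64_t *dst, const uint64_t *buf, size_t n, size_t stride)
{
    assert(dst != NULL && buf != NULL && stride > 0);
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = buf[i];
}
```

Compared with a direct strided store, each word passes through the memory system two extra times: once when it is packed at the sender and once when it is unpacked at the receiver.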

Message Passing Model
■ Different flavors for restricted and full postal semantics
◆ non-buffering semantics: can be mapped directly to a fast direct deposit including synchronization
◆ buffering semantics: non-blocking operation allows sending at any time and leads to an additional copy operation
[Figure: message timelines between two processes (columns: prog, lib, net, lib, prog). Ping-pong: one side calls send(B1,P1) while the other calls recv(B2), then the roles reverse with send(B3,P0) and recv(B4). Ping-ping: both sides call send first (send(B1,P1) and send(B3,P0)), pass a barrier, then both call recv (recv(B4) and recv(B2)).]

Protocol Emulation
■ Popular API ⇒ much software
◆ UDP/IP - unreliable, connectionless network service
◆ TCP/IP - allows reliable connection-oriented communication
◆ NFS/IP - network file system
■ Protocol stacks are provided by the OS
■ Socket API, streams API are ubiquitous
■ It is unrealistic to recode all commercial web servers, databases or middleware systems for message passing APIs like MPI.
■ With IP support, gigabit networks can speed up much more than just scientific applications!

PC Platform for this Talk
■ Single/Twin Pentium Pro 200 MHz
■ Intel 440 FX Chipset
■ 64-bit 66 MHz main memory interface, 0.5 GByte/s
■ 32-bit 33 MHz PCI bus, 132 MB/s
~ 3000 per node
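The two peak rates quoted for this platform follow from bus width times clock rate. The short C helper below, with a hypothetical function name, reproduces that arithmetic. It assumes one transfer per clock cycle and ignores arbitration and protocol overhead, so it gives an upper bound and not a sustained rate.

```c
#include <assert.h>

/* Peak transfer rate of a parallel bus in MByte/s (10^6 bytes), assuming
 * one transfer of width_bits per clock cycle and no protocol overhead. */
double bus_peak_mbytes_per_s(unsigned width_bits, double clock_mhz)
{
    assert(width_bits % 8 == 0);
    return (width_bits / 8) * clock_mhz;
}
```

With these assumptions, `bus_peak_mbytes_per_s(32, 33.0)` gives 132 MByte/s for the PCI bus, and `bus_peak_mbytes_per_s(64, 66.0)` gives 528 MByte/s for the main memory interface, which is the roughly 0.5 GByte/s stated on the slide.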

Myricom Myrinet
■ Two 1.28 Gbit/s channels (duplex) connecting hosts and (4, 8, 16-port) switches point-to-point
■ Supports any topology with switches, hot configurable
■ Wormhole routing with link level flow control guarantees the delivery of messages
■ Checksumming for error detection
■ Packets of arbitrary length (unlimited MTU) ⇒ can encapsulate any type of packets

Myricom Myrinet Adapter
■ RISC processor (LANai)
■ 1 MB SRAM to store MCP and to act as staging memory for buffering packets
■ Bus Master DMA adapter-to-host (for the PCI)
■ Two DMAs between memory and network FIFOs
■ Concurrent operation of DMAs
[Figure: block diagram of a Pentium Pro host (host bus, PCI-memory bridge, memory bus, memory, PCI bus) with the Myrinet adapter (LANai RISC, NI, Mem, DMA).]

Myrinet Control Program
■ The LANai is a 32-bit dual-context RISC processor with 24 general purpose registers that runs the Myrinet Control Program (MCP)
■ A typical MCP provides:
◆ routing table management,
◆ gathering operation,
◆ checksumming,
◆ send / receive operation,
◆ control message generation,
◆ scattering operation,
◆ interrupt generation upon arrival

Dolphin CluStar SCI
■ Two unidirectional 1.6 Gbit/s links (CluStar: 3.2 Gbit/s)
■ Multidimensional rings and switched ringlets
■ Protocol uses data sizes of 16, 64, 256 Bytes
■ Transparent PCI-to-PCI bridge operation through memory mapped load/store interface
■ Possibility for fully coherent shared memory on high end implementations beyond PCI products
■ Per-word remote memory accesses and block transfers for message passing operation

Dolphin CluStar SCI Adapter
■ Protocol engine
◆ 8 stream buffers of 64 Bytes each
◆ PCI-SCI memory address mapping by ATT
◆ Busmaster DMA
■ Link controller
◆ Contains 3 FIFOs (TX, RX, Transit)
■ The PCI adapter supports a subset of IEEE-SCI without hardware cache coherency
[Figure: block diagram of a Pentium Pro host (host bus, PCI-memory bridge, memory bus, memory, PCI bus) with the SCI adapter (PCI-SCI bridge, NI, DMA).]

SGI / Cray T3D as Reference Point
■ 150 MHz 64-bit DEC Alpha 21064
■ No virtual memory
■ ca. 1.28 Gbit/s per link
■ 3D torus topology
■ Memory mapped network interface to send remote stores
■ Fetch/deposit engine with separate memory bus (no involvement of processor)
[Figure: block diagram of a T3D node: DEC Alpha 21064, send annex, bus, NI, deposit/fetch engine, memory.]

Typical Transfer Modes
■ Peak bandwidth for large block transfers (zero-copy)
■ Reduced bandwidth for remote memory operation including fine grain accesses to the memory system
■ There are two modes for fine grain transfers, processor driven versus DMA driven:
◆ Remote loads/stores by either the processor or the DMA (Direct Deposit Model)
◆ Buffer-packing/-unpacking at the sender/receiver by either the processor or the DMA (Messaging Model)

Myricom Myrinet
[Figure: data paths between two Pentium Pro hosts (host bus, PCI-memory bridge, memory bus, memory, PCI bus) through their Myrinet adapters (LANai RISC, NI, Mem, DMA) and the network. Panels: direct mapped transfer; buffer-packing transfer.]

Deposit on Myrinet
[Figure: Intel Pentium Pro (200 MHz) with Myrinet. Throughput (MByte/s, axis 0 to 90, with one value labelled 126) versus store stride (1: contiguous, 2-64: strided). Series: local memory; remote memory, direct; remote memory, DMA plus unpack.]

Deposit Dolphin CluStar SCI
[Figure: data paths between two Pentium Pro hosts (host bus, PCI-memory bridge, memory bus, memory, PCI bus) through their SCI adapters (PCI-SCI bridge, NI, DMA) and the network. Panels: direct mapped transfer; buffer-packing transfer.]

Deposit on SCI
[Figure: Intel Pentium Pro (200 MHz) with CluStar SCI Interconnect. Throughput (MByte/s, axis 0 to 90) versus store stride (1: contiguous, 2-64: strided). Series: local memory; remote memory, direct; remote memory, DMA plus unpack.]

SGI / Cray T3D
[Figure: data paths between two T3D nodes (DEC Alpha 21064, send annex, bus, NI, deposit/fetch engine, memory) and the network. Panels: direct mapped transfer; buffer-packing transfer.]

Deposit on SGI / Cray T3D
[Figure: Cray T3D: Copies to local and remote memory. Throughput (MByte/s, axis 0 to 120) versus store stride (1: contiguous, 2-64: strided). Series: local memory; remote memory, direct; remote memory, unpack at receiver.]
