Lecture 6: MIMD Machines with Distributed Memory and Network (message passing)

[Figure: processor (P) + private memory (M) nodes connected by an interconnection network]

• No shared memory or address space between processors
• Node = processor + local private memory = autonomous computer
  – Separate instruction streams for different processors
• The processor is often a high-end RISC processor
• The nodes are connected through a network
• Memory hierarchy: register, cache, private memory, remote memory

MIMD Multicomputer

• Network topologies: mesh, ring, linear array, 2D-torus, 3D-mesh, 3D-torus, tree, fat tree, hypercube, star, Vulcan switch, cube-connected cycles, omega, crossbar, etc.

MIMD Multicomputer

• Data is shared through message passing
  – SEND/RECEIVE pairs
  – Also called Message Passing Systems
• Communication is more expensive than computation
  – Best suited for large-grain parallelism
• Advantages
  – Increasing the number of processors increases memory size and memory bandwidth
• Disadvantages
  – Can be difficult to map existing data structures
  – The programmer must do the message passing explicitly

Message format

• A message is split into packets (64-512 bits); a packet is split into flits (typically 8 bits)

    | D | D | D | D | D | S | R |

• D, data
• S, sequence number
• R, routing information

Store & Forward Routing (obsolete!)

• The packet is the smallest unit (old model, not used any more)
• A packet traveling more than one link is received and stored at every node in the path before it is sent on
• L = packet length (bits), B = bandwidth (bits/sec), l = number of hops
• T = (L / B) · l

[Figure: timing diagram over nodes N1-N4; packet = header & data]

Cut-Through Routing (wormhole)

• Split the packet into small chunks (flits) that are passed on directly when they reach a node in the path (without being stored)
• L_f = flit size, L = packet length (bits), B = bandwidth (bits/sec), l = number of hops, L >> L_f
• T = L / B + (L_f / B) · l

[Figure: timing diagram over nodes N1-N4; packet = header & data]
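To make the two latency formulas concrete, here is a minimal C sketch that evaluates T for both routing models. The packet size, flit size, bandwidth and hop count are made-up example values, not figures from the lecture; only the two formulas come from the slides above.

    #include <stdio.h>

    /* Illustrative comparison of the two latency models from the slides:
     *   store-and-forward: T = (L / B) * l
     *   cut-through:       T = L / B + (Lf / B) * l
     * All numeric values below are assumptions chosen for the example. */
    int main(void) {
        double L  = 4096.0;   /* packet length in bits (assumed)    */
        double Lf = 8.0;      /* flit size in bits (8-bit flit)     */
        double B  = 1.0e9;    /* link bandwidth in bits/s (assumed) */
        int    l  = 4;        /* number of hops (assumed)           */

        double t_sf = (L / B) * l;           /* every hop stores the whole packet */
        double t_ct = L / B + (Lf / B) * l;  /* only a flit pays the per-hop cost */

        printf("store-and-forward: %.3e s\n", t_sf);
        printf("cut-through:       %.3e s\n", t_ct);
        return 0;
    }

Because L >> L_f, the per-hop term is almost negligible for cut-through routing, while store-and-forward pays the full L/B on every hop; this is why wormhole routing replaced store-and-forward.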
Message Passing Programming

• Formulate programs in terms of sending and receiving messages
• Programs are portable between different machines with small changes (e.g., IBM SP, cluster, network of workstations)
• To be able to exchange messages the nodes have to know
  – each other's identity
  – the size of the message
  – the content type of the message

Example: Message Passing on the IBM SP

• The message is split into a number of packets
  – each packet is split into a number of flits; one flit = 1 byte
  – packets vary in length, up to 255 flits
  – flit 1 = packet length, flit 2 - flit r = routing info, the rest is data
  – each switch step checks the first route-flit, determines the "out-port" and removes the flit (when the packet has reached its destination it no longer contains any route-flits)
• Buffered wormhole routing
  – a flit is moved to the "out-port" as soon as it has arrived on the "in-port"
  – "the right" "out-port" is determined by the second flit in the packet
  – if links are busy, flits are stored in a buffer (central queue) shared by all "in-ports"

Execution of Programs (SPMD - Single Program Multiple Data)

    program myfirst
      me = myid()
      npr = numprocs()
      x = rand()
      write(x, me)
      to = (me + 1) mod npr
      from = (me + npr - 1) mod npr
      send(x, to)
      receive(y, from)
      write(y, me)
    end

  (An MPI rendering of this program is sketched at the end of this section.)

1. The same program is loaded on all nodes
2. Every node starts to execute its own copy of the program
3. Data is exchanged by send and receive pairs
4. Synchronization: implicit by communication, explicit by barriers

Programming DMM Systems

[Figure: three nodes 0, 1, 2 connected in a ring]

    Process 0            Process 1            Process 2
    program myfirst      program myfirst      program myfirst
    me = 0               me = 1               me = 2
    npr = 3              npr = 3              npr = 3
    x = 0.25             x = 0.65             x = 0.1
    write(0.25, 0)       write(0.65, 1)       write(0.1, 2)
    to = 1               to = 2               to = 0
    from = 2             from = 0             from = 1
    send(0.25, 1)        send(0.65, 2)        send(0.1, 0)
    receive(y, 2)        receive(y, 0)        receive(y, 1)
    write(0.1, 0)        write(0.25, 1)       write(0.65, 2)
    end                  end                  end

Communication

• Minimize communication overhead
• Communication cost for point-to-point messages:
  – store-and-forward: t_c = (α + β·L) · #hops
  – cut-through: t_c = α + β·L
  – α is the startup cost ("node latency") and β is the cost per unit (usually byte) per link
  – α >> β ⇒ α dominates for small messages, β dominates for large messages
• Collective operations: broadcast, gather, scatter, etc.
• Group messages whenever possible
• Use more than one copy of the data?
• Repeat identical computations / share the result?
• Overlap computations and communication?

Network

The topology of a network can be either static or dynamic.

• Static networks
  – Point-to-point, direct connections
  – Do not change during program execution
• Dynamic networks
  – Switches
  – Dynamically configured to fit the needs of the program
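The myfirst pseudocode above maps naturally onto MPI. The sketch below is one possible rendering, not the course's official solution: myid()/numprocs() become MPI_Comm_rank/MPI_Comm_size, and the send/receive pair is expressed as a single MPI_Sendrecv to avoid the deadlock a naive blocking send-then-receive ordering could cause. The message tag (0) and the random double are illustrative choices.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* SPMD ring exchange: every process sends its value to the right
     * neighbour and receives one from the left neighbour. */
    int main(int argc, char **argv) {
        int me, npr;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);    /* me  = myid()     */
        MPI_Comm_size(MPI_COMM_WORLD, &npr);   /* npr = numprocs() */

        srand(me + 1);
        double x = (double)rand() / RAND_MAX;  /* x = rand()       */
        printf("process %d has x = %f\n", me, x);

        int to   = (me + 1) % npr;             /* right neighbour  */
        int from = (me + npr - 1) % npr;       /* left neighbour   */

        double y;
        MPI_Sendrecv(&x, 1, MPI_DOUBLE, to,   0,
                     &y, 1, MPI_DOUBLE, from, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("process %d received y = %f from %d\n", me, y, from);
        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and run with, for example, mpirun -np 3, this reproduces the three-process trace shown above, with random values in place of 0.25/0.65/0.1.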
Network Parameters

• Network size (N): number of nodes
• Node degree (d)
  – number of links/channels on each node
  – one-way channels: in/out degree (node degree = in-degree + out-degree)
  – a small node degree is good: cheap and modular
• Network diameter (D)
  – max(shortest path between two arbitrary nodes)
  – measured in number of links traversed (hops)
  – a small diameter is good
• Bisection width (b)
  – smallest number of edges between two halves of the network

Static Connected Networks

• Linear array
  – N nodes, N-1 links
  – inner nodes: node degree d = 2, outer nodes: node degree d = 1
  – diameter D = N-1
  – bisection width b = 1
• Ring, node degree d = 2
  – N nodes, N links
  – diameter D = N (N/2 for a bidirectional ring)
  – bisection width b = 2
• Ring, node degree d = 3: chordal ring of degree 3
  – more links give a higher node degree and a lower diameter
• Ring, node degree d = N-1: fully coupled network
  – diameter D = 1
• Star (tree with two levels)
  – degree d = N-1, diameter D = 2
• Fat tree
  – more (fatter) channels closer to the root

Static Connected Networks - Mesh & Torus

• Mesh, k-dimensional, with N = n^k nodes
  – node degree d = 2k for the inner nodes
  – diameter D = k(n-1)
  – Note: not symmetric; edge nodes have degree 2 or 3 (for k = 2)
• Torus
  – n x n mesh with wrap-arounds in the row and column dimensions
  – symmetric
  – d = 4
  – D = 2(n/2) = n (see the code sketch at the end of this section)

HPC2N Super Cluster - seth.hpc2n.umu.se

• 240 processors in 120 dual nodes
• Nodes connected in a 4 x 5 x 6 mesh network with wrap-arounds
• Built by HPC2N (and CS) during April-May 2002; 12 racks
• Available for users since 2002-06
• 83% usage (24 h/day, 7 days/week) since 2002-08
• Total peak performance 800 Gflops/s
• Top500 2002-06:
  – 94th fastest in the world
  – fastest in Sweden
• Financing: 5 MSEK from the Kempe Foundations

SCI Network

• Wulfkit3 SCI network by Dolphin Interconnect Solutions, Norway
• The nodes are connected in a 4 x 5 x 6 mesh network with wrap-arounds
• 667 Mbytes/s peak bandwidth ⇒ peak 1.43E-9 sec/byte
• 1.46 µs node latency
• ScaMPI message passing library (MPI) from Scali AS
• Software for system surveillance and system management

Seth - Pallas MPI Benchmark

[Figure: Pallas MPI benchmark results on Seth]
• 230 Mbytes/s max bandwidth
• 3.7 µs minimum latency
• (Multi(16) is pairwise ping-pong between 8 pairs of processors)
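As a quick check of the mesh and torus figures quoted above, the small C helper below evaluates the slide formulas for a k-dimensional n x n x ... x n network. The function names and the 8 x 8 example are ours; the torus line generalizes the 2-D slide formula D = 2(n/2) = n to k dimensions, which is an assumption rather than something stated in the lecture.

    #include <stdio.h>

    /* Slide formulas:
     *   mesh : inner-node degree d = 2k, diameter D = k*(n-1)
     *   torus: node degree       d = 2k, diameter D = k*(n/2)   (even n;
     *          the k-dimensional form is our generalization of D = 2(n/2) = n) */
    static long mesh_diameter(int k, int n)  { return (long)k * (n - 1); }
    static long torus_diameter(int k, int n) { return (long)k * (n / 2); }
    static int  inner_degree(int k)          { return 2 * k; }

    int main(void) {
        int k = 2, n = 8;   /* example: an 8 x 8 network with 64 nodes (assumed) */
        printf("mesh : d = %d, D = %ld\n", inner_degree(k), mesh_diameter(k, n));
        printf("torus: d = %d, D = %ld\n", inner_degree(k), torus_diameter(k, n));
        return 0;
    }

For k = 2 and n = 8 this prints d = 4 with D = 14 for the mesh and D = 8 for the torus, illustrating the slide's point that the wrap-around links roughly halve the diameter.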