Short Summary
• Taxonomy of parallel computers
  – SISD: Single Instruction Single Data (the classical von Neumann model)
  – SIMD: Single Instruction Multiple Data
  – MIMD: Multiple Instruction Multiple Data
A Different Taxonomy
• SISD, SIMD, MIMD refer to the processor organization
• With respect to the memory organization, the two fundamental models are:
  – Distributed memory architecture
    • each processor has its own private memory
  – Shared address space architecture
    • all processors have access to the same address space
Memory Organizations
[Figure: (a) pure shared-memory organization; (b), (c) shared-address-space organizations with a local memory at each processor]
Memory Organizations II
• The pure shared-memory model (fig. (a)) needs substantial interconnection bandwidth
• Shared-address-space computers can have a local memory to speed up access to non-shared data
  – Figures (b) and (c) in the previous slide
  – These so-called Non-Uniform Memory Access (NUMA) designs, as opposed to Uniform Memory Access (UMA) designs, have access times that depend on the location of the data (see the sketch below)
• To reduce this speed differential, the local memory can also be used to cache frequently used shared data (example: Stanford DASH)
  – The use of caches introduces the issue of cache coherence
  – In some architectures the local memory is used entirely as cache, a design known as Cache-Only Memory Architecture (COMA). Example: KSR-1
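To make the UMA/NUMA distinction concrete, here is a minimal Python sketch. It is an illustrative model only: the latency figures and the block-wise assignment of addresses to nodes are assumptions for the example, not measurements of any real machine.

```python
LOCAL_LATENCY_NS = 100   # assumed cost of reaching the node's own memory
REMOTE_LATENCY_NS = 400  # assumed cost of reaching another node's memory

def numa_access_time(node: int, address: int, words_per_node: int) -> int:
    """NUMA: access time depends on which node owns the address."""
    home_node = address // words_per_node  # assumed block-wise address mapping
    return LOCAL_LATENCY_NS if home_node == node else REMOTE_LATENCY_NS

def uma_access_time(node: int, address: int, words_per_node: int) -> int:
    """UMA: every access costs the same, regardless of data location."""
    return REMOTE_LATENCY_NS

# Node 0 touching its own data vs. data homed at node 3:
print(numa_access_time(0, 10, 1024))    # 100 (local)
print(numa_access_time(0, 3500, 1024))  # 400 (remote)
```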
Shared vs. Distributed Memory
• By general consensus, the shared address space model is easier to program but much harder to build
  – Caching in local memory is critical for performance, but it makes the design much harder and introduces inefficiencies
• There is a growing trend toward hybrid designs
  – e.g. clusters of SMPs, or NUMA machines with physically distributed memory
Interconnection Networks
• The interconnect is the crucial component of any parallel computer
• Static vs. dynamic networks
  – Static
    • built out of point-to-point communication links between processors (also known as direct networks)
    • usually associated with message passing architectures
    • examples: completely-connected, star-connected, linear array, ring, mesh, hypercube (see the hypercube sketch below)
  – Dynamic
    • built out of links and switches (also known as indirect networks)
    • usually associated with shared address space architectures
    • examples: crossbar, bus-based, multistage
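As a small illustration of one of the static topologies listed above, the following sketch computes the neighbors of a node in a d-dimensional hypercube, where node labels are d-bit integers and two nodes are directly linked iff their labels differ in exactly one bit. Function names here are illustrative, not from any particular library.

```python
def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Return the d direct neighbors of `node` in a d-dimensional hypercube.

    Flipping each of the d label bits in turn yields exactly the nodes
    whose labels differ from `node` in one bit position.
    """
    return [node ^ (1 << bit) for bit in range(d)]

# Node 5 (binary 101) in a 3-dimensional hypercube of 8 processors:
print(hypercube_neighbors(5, 3))  # [4, 7, 1]
```

Each node thus has exactly d = log2(p) links, which is why the hypercube scales far better than a completely-connected network's p - 1 links per node.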
Crossbar Switching Networks
• Crossbar switch
  – The digital analogue of a telephone switchboard
  – Allows connection of any of p processors to any of b memory banks
  – Examples: Sun Ultra HPC 10000, Fujitsu VPP 500, Myrinet switch
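The key property of the crossbar is that it is non-blocking with respect to processors: any set of simultaneous requests can be served, as long as no two processors target the same memory bank. A minimal sketch of that condition (names are illustrative):

```python
def conflict_free(requests: dict[int, int]) -> bool:
    """requests maps processor id -> memory bank id.

    A p-by-b crossbar can route all requests simultaneously iff
    no memory bank is targeted by more than one processor.
    """
    banks = list(requests.values())
    return len(banks) == len(set(banks))  # no bank requested twice

print(conflict_free({0: 2, 1: 0, 2: 3}))  # True: all banks distinct
print(conflict_free({0: 2, 1: 2}))        # False: bank 2 is contended
```

The price is hardware cost: a crossbar needs p × b switching elements, which grows quadratically when b is proportional to p; this is what limits crossbars to modest processor counts.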
Bus-based Networks
• A very simple concept; its major drawback is that bandwidth does not scale up with the number of processors (see the illustration below)
  – Caches can alleviate the problem because they reduce traffic to memory
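A back-of-the-envelope illustration of the scaling problem: with a fixed total bus bandwidth B shared among p processors, each processor sees at best B / p. The 1 GB/s figure below is an assumed example, not a real machine's specification.

```python
BUS_BANDWIDTH_GBS = 1.0  # assumed total bandwidth of the shared bus

# With every added processor, the per-processor share shrinks:
for p in (1, 2, 4, 8, 16):
    print(f"{p:2d} processors -> {BUS_BANDWIDTH_GBS / p:.3f} GB/s each")
```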
Multistage Interconnection Network
• Multistage networks are a good compromise between cost and performance
  – More scalable in terms of cost than a crossbar, more scalable in terms of performance than a bus
  – Popular schemes include the omega and butterfly networks
Omega Network
[Figure: an omega network built from log2(p) stages of p/2 two-by-two switches, connected by perfect-shuffle wiring]
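The following sketch simulates the standard destination-tag routing scheme on an omega network with p = 2**k terminals: each of the k stages applies a perfect shuffle (a one-bit left rotation of the k-bit position) and then a 2x2 switch whose output, upper or lower, is chosen by the next bit of the destination address, most significant bit first. Function names are illustrative.

```python
def rotate_left(x: int, k: int) -> int:
    """Perfect shuffle: rotate the k-bit value x left by one bit."""
    return ((x << 1) | (x >> (k - 1))) & ((1 << k) - 1)

def omega_route(src: int, dst: int, k: int) -> list[int]:
    """Return the sequence of positions visited from src to dst
    through the k stages of a 2**k-terminal omega network."""
    pos, path = src, [src]
    for stage in range(k):
        pos = rotate_left(pos, k)           # shuffle wiring between stages
        bit = (dst >> (k - 1 - stage)) & 1  # next destination bit (MSB first)
        pos = (pos & ~1) | bit              # switch: 0 = upper output, 1 = lower
        path.append(pos)
    return path

# Routing from terminal 2 (010) to terminal 6 (110) in an 8-terminal network:
print(omega_route(2, 6, 3))  # [2, 5, 3, 6]
```

After k = log2(p) stages every bit of the source label has been shuffled out and replaced by a destination bit, so the message always arrives; the network is blocking, however, since two routes may require conflicting settings of the same switch.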