
Shared Symmetric Memory Systems
Computer Architecture
J. Daniel García Sánchez (coordinator), David Expósito Singh, Francisco Javier García Blas
ARCOS Group, Computer Science and Engineering Department, University Carlos III of Madrid
CC BY-NC-ND – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es

1. Introduction to multiprocessor architectures
2. Centralized shared memory architectures
3. Cache coherence alternatives
4. Snooping protocols
5. Performance in SMPs
6. Conclusion

Increasing importance of multiprocessors
Silicon and energy efficiency decrease as more ILP is exploited.
  The cost in silicon and energy grows faster than performance.
Increasing interest in high-performance servers.
  Cloud computing, software as a service, ...
Growth of data-intensive applications.
  Huge amounts of data on the Internet.
  Big data analytics.

TLP: Thread-level parallelism
TLP implies the existence of multiple program counters.
  Assumes MIMD.
Widespread use of TLP outside scientific computing is relatively recent.
New applications:
  Embedded applications.
  Desktop.
  High-end servers.

Multiprocessors
A multiprocessor is a computer consisting of tightly coupled processors with:
  Coordination and use typically controlled by a single operating system.
  Memory sharing through a single shared memory space.
Software models:
  Parallel processing: a coupled set of cooperating threads.
  Request processing: independent processes originated by user requests.
  Multiprogramming: independent execution of multiple applications.

Most common approach:
  From 2 to tens of processors.
  Shared memory.
    Implies a shared memory space.
    Does not necessarily imply a single physical memory.
Alternatives:
  CMP (Chip Multi-Processors) or multi-core.
  Multiple chips. Each one may (or may not) be multi-core.
  Multicomputer: weakly coupled processors not sharing memory. Used in large-scale scientific computing.

Maximizing exploitation of multiprocessors:
  With n processors, at least n processes or threads are needed.
Thread identification:
  Explicitly identified by the programmer.
  Created by the operating system from requests.
  Loop iterations generated by a parallelizing compiler (e.g. OpenMP).
High-level identification is performed by the programmer or by system software, with each thread having enough instructions to execute.
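As a rough illustration of threads obtained from loop iterations, the following Python sketch splits the iterations of a loop into contiguous chunks, one per worker thread, and combines the partial results. The function names (chunk_bounds, parallel_sum_of_squares) and the static chunking policy are assumptions made for this example, not something the slides prescribe. In CPython the global interpreter lock limits the speedup of CPU-bound threads, so the sketch only shows the partitioning idea.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous [start, end) chunks."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional iteration.
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_sum_of_squares(data, n_workers=4):
    """Each worker processes one chunk of iterations; partial sums are combined."""
    def work(bounds):
        start, end = bounds
        return sum(x * x for x in data[start:end])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunk_bounds(len(data), n_workers)))
    return sum(partials)
```

With n workers, the loop needs at least n iterations for every worker to receive work, which mirrors the requirement of at least n threads for n processors.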

Multiprocessors and shared memory
SMP: Symmetric Multi-Processor
  Centralized shared memory.
  Processors share a single centralized memory where all have equal access time.
  All multi-cores are SMP.
  UMA: Uniform Memory Access.
    Memory latency is uniform.
DSM: Distributed Shared Memory
  Memory is distributed across processors.
  Needed when the number of processors is high.
  NUMA: Non Uniform Memory Access.
    Memory latency depends on data location.
Communication through access to global variables.

SMP: Symmetric Multi Processor
[Figure: processors P1 to P4, each with a private cache, connected to a shared cache, which connects to main memory.]

DSM: Distributed Shared Memory
[Figure: processors P1 to P4, each with its own memory and I/O, connected through an interconnection network.]

2. Centralized shared memory architectures

SMP and memory hierarchy
Why use centralized memory?
  Large multi-level caches decrease the bandwidth demand on main memory.
Evolution:
  1. Single-core with memory on a shared bus.
  2. Memory connected through a separate bus used only for memory.
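A back-of-the-envelope model can show why caches reduce the demand on main memory: only misses reach memory, so the traffic scales with the miss rate. The sketch below is a simplified model under stated assumptions (every miss transfers exactly one block, no write-back or coherence traffic). The function name, its parameters, and the numbers in the usage note are hypothetical and do not come from the slides.

```python
def memory_bandwidth_demand(n_procs, refs_per_sec, miss_rate, block_bytes):
    """Bytes per second requested from main memory.

    Simplified model: each cache miss fetches one block of `block_bytes`,
    and all processors issue `refs_per_sec` memory references each.
    """
    return n_procs * refs_per_sec * miss_rate * block_bytes


def demand_ratio(miss_rate_a, miss_rate_b):
    """How many times more memory traffic configuration A causes than B."""
    return miss_rate_a / miss_rate_b
```

For instance, with these made-up values, memory_bandwidth_demand(4, 1e9, 0.02, 64) gives roughly 5.12e9 bytes per second, and halving the miss rate halves that demand.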

Cache memory
Kinds of data in cache memory:
  Private data: data used by a single processor.
  Shared data: data used by multiple processors.
Problem with shared data:
  A datum may be replicated in multiple caches.
    Contention is decreased.
    Each processor accesses its local copy.
  If two processors modify their copies, are the caches still coherent?

Cache coherence
Location dirx initially holds 1. Assuming write through.

Thread 1:
  lw   $t0, dirx
  addi $t0, $t0, 1
  sw   $t0, dirx

Thread 2:
  lw   $t0, dirx

Process | Instruction      | P1 Cache    | P2 Cache    | Main memory
--------|------------------|-------------|-------------|------------
-       | Initially        | Not present | Not present | 1
T1      | lw $t0, dirx     | 1           | Not present | 1
T1      | addi $t0, $t0, 1 | 1           | Not present | 1
T2      | lw $t0, dirx     | 1           | 1           | 1
T1      | sw $t0, dirx     | 2           | 1           | 2

After the store, the copy in the P2 cache still holds the old value 1.
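The trace above can be replayed with a small simulation. This is a minimal sketch, assuming private write-through caches with no coherence mechanism at all; the class WriteThroughCache and the function run_example are illustrative names, not part of the slides. Under write through, the store updates P1's copy and main memory, while P2's cached copy stays stale.

```python
class WriteThroughCache:
    """Private cache with write-through and NO coherence mechanism."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory (a dict: address -> value)
        self.lines = {}       # locally cached copies

    def load(self, addr):
        if addr not in self.lines:        # miss: fetch the value from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: return the local copy

    def store(self, addr, value):
        self.lines[addr] = value          # update the local copy
        self.memory[addr] = value         # write through to main memory


def run_example():
    memory = {"dirx": 1}
    p1 = WriteThroughCache(memory)
    p2 = WriteThroughCache(memory)

    t0_p1 = p1.load("dirx")    # T1: lw $t0, dirx
    t0_p1 += 1                 # T1: addi $t0, $t0, 1
    p2.load("dirx")            # T2: lw $t0, dirx
    p1.store("dirx", t0_p1)    # T1: sw $t0, dirx

    # P1 copy, P2 copy, main memory, and what P2 reads next
    return p1.lines["dirx"], p2.lines["dirx"], memory["dirx"], p2.load("dirx")
```

A later read by P2 hits in its own cache and returns the old value, which is exactly the incoherence the example is meant to expose.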

Cache incoherence
Why does incoherence happen?
  State duality:
    Global state → main memory.
    Local state → private cache.
A memory system is coherent if any read from a location returns the most recent value that has been written to that location.
Two aspects:
  Coherence: which value does a read return?
  Consistency: when does a read get the written value?

Conditions for coherence
Program order preservation:
  A read by processor P from location X after a write by P to X, with no intermediate writes to X by any other processor Q, always returns the value written by P.
Coherent view of memory:
  A read by processor P from location X after a write by another processor Q to X returns the written value if both operations are sufficiently separated in time and there are no intermediate writes to X.
Write serialization:
  Two writes to the same memory location by two different processors are seen in the same order by all processors.
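The write serialization condition can be expressed as a check over observed orders. The sketch below assumes each write to a location has a unique identifier and that we are given, per processor, the order in which it observed those writes; the function name and the input format are hypothetical. The check is pairwise: it is exact when every processor observes every write, and only a necessary condition when some processors see a subset.

```python
from itertools import combinations


def writes_serialized(observations):
    """Return True if no two processors disagree on the order of any two writes.

    observations: dict mapping a processor name to the list of write ids
    (unique per write) in the order that processor observed them.
    """
    seen_before = set()  # pairs (a, b): some processor saw write a before write b
    for order in observations.values():
        # combinations() keeps positional order, so a precedes b in `order`.
        for a, b in combinations(order, 2):
            if (b, a) in seen_before:
                return False  # another processor saw these two writes reversed
            seen_before.add((a, b))
    return True
```

For example, if P1 observes writes in the order ["w_a", "w_b"] and P2 observes ["w_b", "w_a"], the function reports a violation.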

Memory consistency
Defines at which point in time a reading process will see a written value.
Coherence and consistency are complementary:
  Coherence: behavior of reads and writes on a single memory location.
  Consistency: behavior of reads and writes with respect to accesses to other memory locations.
There are different memory consistency models.
  We will have a specific lecture on this problem.

3. Cache coherence alternatives

Coherent multiprocessors
A coherent multiprocessor offers:
  Shared data migration.
    A datum may be moved to a local cache and used transparently.
    Decreases remote data access latency and bandwidth demand on shared memory.
  Shared data replication, for data that are read simultaneously.
    Makes a copy of the data in the local cache.
    Decreases access latency and read contention.
Both properties are critical for performance.
  Solution: a hardware protocol that keeps caches coherent.
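The slide does not say which hardware protocol is used. As one possible illustration, the sketch below models a write-invalidate scheme over a shared bus: before a cache writes a location, every other cached copy of that location is invalidated, so the next read by another processor misses and fetches the new value. The classes Bus and CoherentCache, the write-through policy, and the dictionary-based memory are all assumptions made for this example.

```python
class Bus:
    """Shared medium observed by every cache (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory  # main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, source):
        # Every cache except the writer drops its copy of addr, if any.
        for cache in self.caches:
            if cache is not source:
                cache.lines.pop(addr, None)


class CoherentCache:
    """Private write-through cache that invalidates other copies on a write."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:               # miss (or invalidated copy)
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.bus.broadcast_invalidate(addr, self)  # remove stale copies first
        self.lines[addr] = value
        self.bus.memory[addr] = value              # write through


def demo():
    bus = Bus({"dirx": 1})
    p1, p2 = CoherentCache(bus), CoherentCache(bus)
    value = p1.load("dirx")
    p2.load("dirx")               # P2 now holds a replicated copy
    p1.store("dirx", value + 1)   # invalidates the copy held by P2
    return p2.load("dirx")        # P2 misses and re-reads from memory
```

Replication still works for reads, since both caches can hold the datum at once; only a write forces the other copies out.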
