Terminology
Programmierung Paralleler und Verteilter Systeme (PPV)
Summer 2015
Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze
Terminology 2
Terminology 3
"When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone." [Kansas legislature, early 20th century]
Terminology 4
■ Concurrency
□ Capability of a system to have two or more activities in progress at the same time
□ May be independent, loosely coupled, or closely coupled
□ Classic operating system responsibility, for better utilization of CPU, memory, network, and other resources
□ Demands scheduling and synchronization
■ Parallelism
□ Capability of a system to execute activities simultaneously
□ Demands parallel hardware, concurrency support (and communication)
■ Every parallel program is a concurrent program
■ Some concurrent programs cannot be run as parallel programs
Terminology 5
■ Concurrency vs. parallelism vs. distribution
□ Two threads started by the application (see the sketch below)
◊ Define concurrent activities in the program code
◊ Might (!) be executed in parallel
◊ Can be distributed to different machines
■ Management of concurrent activities in an operating system
□ Multiple applications being executed at the same time
□ A single application leveraging threads for speedup / scaleup
□ Non-sequential operating system activities
"The vast majority of programmers today don't grok concurrency, just as the vast majority of programmers 15 years ago didn't yet grok objects" [Herb Sutter, 2005]
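A minimal sketch of the two started threads in C with POSIX threads (the worker function and its messages are illustrative, not from the slide):

    #include <pthread.h>
    #include <stdio.h>

    /* Two concurrent activities, defined in the program code. */
    void *worker(void *arg) {
        printf("activity %s running\n", (const char *)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        /* Concurrent by construction; whether they actually run in
           parallel is decided by the hardware and the scheduler. */
        pthread_create(&t1, NULL, worker, "A");
        pthread_create(&t2, NULL, worker, "B");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Build with cc -pthread. Distribution across machines would require a different mechanism (e.g. message passing) instead of threads.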
Concurrency [Breshears] 6
■ Processes / threads represent the execution of atomic statements
□ "Atomic" can be defined at different granularity levels, e.g. source code line
□ Concurrency should be treated as an abstract concept
■ Concurrent execution
□ Interleaving of multiple atomic instruction streams
□ Leads to unpredictable results
◊ Non-deterministic scheduling, interrupts
□ A concurrent algorithm should maintain its properties for all possible interleavings of sequential activities (illustrated below)
□ Example: all instructions are eventually included (fairness)
■ Some literature distinguishes between interleaving (uniprocessor) and overlapping (multiprocessor) of statements
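As an illustration of why all interleavings matter (our example, assuming the atomic unit is a machine instruction rather than a source line): counter++ decomposes into load, add, store, and some interleavings of two threads lose an update.

    #include <pthread.h>
    #include <stdio.h>

    static int counter = 0;          /* shared global */

    /* One possible interleaving of threads T1 and T2:
         T1: load counter (0)
         T2: load counter (0), add 1, store (1)
         T1: add 1, store (1)        <- T2's increment is lost  */
    void *increment(void *arg) {
        for (int i = 0; i < 1000000; i++)
            counter++;               /* load, add, store: not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %d (expected 2000000)\n", counter);
        return 0;
    }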
Concurrency 7
■ In hardware
□ Context switch support
■ In operating systems
□ Native process / thread support
□ Synchronization support
■ In virtual runtime environments
□ Java / .NET thread support
■ In middleware
□ J2EE / CORBA thread pooling
■ In programming languages
□ Asynchronous and event-based programming
[Figure: applications layered on middleware, virtual runtime, and operating system on a server]
Example: Operating System 8
[Figure: single-threaded process (code, data, files, one set of registers and one stack) vs. multi-threaded process (shared code, data, and files; separate registers and stack per thread)]
Concurrency Is Hard 9
■ Sharing of global resources
□ Concurrent reads and writes on the same global resource (variable) make ordering a critical issue
■ Optimal management of resource allocation
□ A process gets control over an I/O channel and is then suspended before using it
■ Programming errors become non-deterministic
□ The order of interleaving may or may not activate the bug
■ All of this happens even on uniprocessors
■ Race condition
□ The result of an operation depends on the order of execution
□ Well-known issue since the 1960s, identified by E. Dijkstra
Race Condition 10

    static char char_in, char_out;   /* shared global variables */

    void echo() {
        char_in = getchar();
        char_out = char_in;
        putchar(char_out);
    }

■ One piece of code in one process, executed at the same time …
□ … by two threads on a single core.
□ … by two threads on two cores.
■ What happens? (see the sketch below)
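A hedged harness around the slide's echo() (the threading scaffold is ours, not from the slide); run e.g. printf "xy" | ./echo. Depending on the interleaving, one thread's input character can be overwritten in the shared char_in before it is copied to char_out, so one character is echoed twice and the other is lost.

    #include <pthread.h>
    #include <stdio.h>

    static char char_in, char_out;   /* the shared globals of the slide */

    void *echo_thread(void *arg) {
        char_in = getchar();
        /* If preempted here, the other thread may overwrite char_in
           before this thread copies it below. */
        char_out = char_in;
        putchar(char_out);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, echo_thread, NULL);
        pthread_create(&t2, NULL, echo_thread, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        putchar('\n');
        return 0;
    }

Build with cc -pthread. The wrong output is not guaranteed on every run, which is exactly what makes such bugs non-deterministic.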
Potential Deadlock 11 [Stallings]
Actual Deadlock 12 [Stallings]
Terminology 13
Deadlock
■ Two or more processes / threads are unable to proceed
■ Each is waiting for one of the others to do something (a minimal sketch follows below)
Livelock
■ Two or more processes / threads continuously change their states in response to changes in the other processes / threads
■ No global progress for the application
Race condition
■ Two or more processes / threads are executed concurrently
■ The final result of the application depends on the relative timing of their execution
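A minimal deadlock sketch with POSIX threads (lock and function names are ours): each thread holds one mutex and blocks forever on the other, satisfying the definition above.

    #include <pthread.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    void *thread_1(void *arg) {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);   /* waits for thread_2 ... */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    void *thread_2(void *arg) {
        pthread_mutex_lock(&lock_b);
        pthread_mutex_lock(&lock_a);   /* ... which waits for thread_1 */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread_1, NULL);
        pthread_create(&t2, NULL, thread_2, NULL);
        pthread_join(t1, NULL);        /* may never return */
        pthread_join(t2, NULL);
        return 0;
    }

Acquiring both locks in the same global order in every thread breaks the circular wait.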
Terminology 14
Starvation
■ A runnable process / thread is overlooked indefinitely
■ Although it is able to proceed, it is never chosen to run (dispatching / scheduling)
Atomic Operation
■ Function or action implemented as a sequence of one or more instructions
■ Appears to be indivisible – no other process / thread can see an intermediate state or interrupt the operation
■ Executed as a group, or not executed at all
Mutual Exclusion
■ The requirement that when one process / thread is using a resource, no other shall be allowed to do so (see the sketch below)
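A sketch of mutual exclusion and an atomic operation, assuming POSIX threads and C11 atomics (all names are ours): the mutex ensures only one thread at a time is inside the critical section, and atomic_fetch_add makes the increment indivisible without a lock.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int shared = 0;                 /* protected by lock */
    static atomic_int atomic_shared = 0;   /* updated atomically */

    void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* mutual exclusion */
            shared++;
            pthread_mutex_unlock(&lock);
            atomic_fetch_add(&atomic_shared, 1);  /* atomic operation */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%d %d\n", shared, atomic_load(&atomic_shared));
        return 0;
    }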
From Concurrency to Parallelism 15
[Figure: programs are decomposed into processes and tasks, which are mapped onto nodes of processors and memory connected by a network]
Parallelism for … 16
■ Speedup – compute faster
■ Throughput – compute more in the same time
■ Scalability – compute faster / more with additional resources
■ Price / performance – be as fast as possible for given money
■ Scavenging – compute faster / more with idle resources
[Figure: scaling up (more processing elements and memory in one machine) vs. scaling out (more machines)]
The Parallel Programming Problem 17
[Figure: matching a parallel application to a flexible execution environment – configuration and type must match]
Parallelism [Mattson et al.] 18
■ Task – a parallel program breaks a problem into tasks
■ Execution unit
□ Representation of a concurrently running task (e.g. thread)
□ Tasks are mapped to execution units at development time
■ Processing element
□ Hardware element running one execution unit
□ Depends on the scenario – logical processor vs. core vs. machine
□ Execution units run simultaneously on processing elements, controlled by the scheduling entity
■ Synchronization
□ Mechanism to order activities of parallel tasks
■ Race condition
□ Program result depends on the scheduling order
Parallel Processing 19
■ Inside the processor
□ Instruction-level parallelism (ILP)
□ Multicore
□ Shared memory
■ With multiple processing elements in one machine
□ Multiprocessing
□ Shared memory
■ With multiple processing elements in many machines
□ Multicomputer
□ Shared nothing (in terms of a globally accessible memory)
Multiprocessor: Flynn's Taxonomy (1966) 20
■ Classifies multiprocessor architectures along the instruction and data processing dimensions
□ Single Instruction, Single Data (SISD)
□ Single Instruction, Multiple Data (SIMD)
□ Multiple Instruction, Single Data (MISD)
□ Multiple Instruction, Multiple Data (MIMD)
[Figure: Flynn's taxonomy quadrant diagram, (C) Blaise Barney]
Another Taxonomy (Tanenbaum) 21
■ MIMD parallel and distributed computers
□ Multiprocessors (shared memory): bus-based or switched
□ Multicomputers (private memory): bus-based or switched
Another Taxonomy (Foster) 22
■ Multicomputer
□ Set of connected von Neumann computers (DM-MIMD)
□ Each computer runs a local program in local memory and sends / receives messages
□ Local memory access is less expensive than remote memory access
[Figure: von Neumann machines (control unit, arithmetic logic unit, memory, input, output on a bus) connected by an interconnect]
Shared Memory vs. Shared Nothing 23
■ Organization of parallel processing hardware (contrasted in the sketch below) as …
□ Shared memory system
◊ Concurrent processes can directly access a common address space
◊ Typically implemented as a memory hierarchy with different cache levels
◊ Examples: SMP systems, distributed shared memory systems, virtual runtime environments
□ Shared nothing system
◊ Concurrent processes can only access local memory and exchange messages with other processes
◊ Message exchange is typically orders of magnitude slower than memory access
◊ Examples: cluster systems, distributed systems (Hadoop, grids, …)
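A small contrast sketch (ours, POSIX only): threads see each other's writes through the common address space, whereas a forked process owns a private copy of memory and must send a message, here through a pipe.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fd[2];
        pipe(fd);                        /* message channel */
        if (fork() == 0) {               /* child: private address space */
            const char *msg = "hello";
            write(fd[1], msg, strlen(msg) + 1);   /* send */
            _exit(0);
        }
        char buf[16];
        read(fd[0], buf, sizeof buf);    /* receive */
        printf("parent received: %s\n", buf);
        return 0;
    }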
Shared Memory vs. Shared Nothing 24
■ Pfister: "shared memory" vs. "distributed memory"
■ Foster: "multiprocessor" vs. "multicomputer"
■ Tanenbaum: "shared memory" vs. "private memory"
[Figure: processors on a shared memory vs. processes with private data exchanging messages]
Shared Memory 25
■ All processors act independently and use the same global address space; changes in one memory location are visible to all others
■ Uniform memory access (UMA) system
□ Equal load and store access times for all processors to all memory
□ Default approach for SMP systems of the past
■ Non-uniform memory access (NUMA) system
□ Memory access delay depends on the accessed region
□ Typically realized by processor networks and local memories
◊ Cache-coherent NUMA (CC-NUMA), completely implemented in hardware
◊ Became the standard approach with recent x86 chips (see the sketch below)
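A sketch of NUMA-aware allocation on Linux with libnuma (illustrative; assumes libnuma is installed, link with -lnuma): placing memory on a chosen node keeps accesses from that node's processors local and therefore cheaper than remote accesses.

    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }
        size_t size = 1 << 20;
        /* Allocate 1 MiB on NUMA node 0: CPUs on node 0 pay local
           latency, CPUs on other nodes pay the remote penalty. */
        char *buf = numa_alloc_onnode(size, 0);
        buf[0] = 42;                     /* touch: places the first page */
        printf("nodes available: 0..%d\n", numa_max_node());
        numa_free(buf, size);
        return 0;
    }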