Threaded Programming Lecture 1: Concepts
Overview
• Shared memory systems
• Basic Concepts in Threaded Programming
Shared memory systems
• Threaded programming is most often used on shared memory parallel computers.
• A shared memory computer consists of a number of processing units (CPUs) together with some memory.
• The key feature of shared memory systems is a single address space across the whole memory system.
  – every CPU can read and write all memory locations in the system
  – one logical memory space
  – all CPUs refer to a memory location using the same address
Conceptual model
[Diagram: several processors (P) connected through an interconnect to a single shared memory]
Real hardware
• Real shared memory hardware is more complicated than this…
  – Memory may be split into multiple smaller units
  – There may be multiple levels of cache memory
  – Some of these levels may be shared between subsets of processors
  – The interconnect may have a more complex topology
• …but a single address space is still supported
  – Hardware complexity can affect the performance of programs, but not their correctness
Real hardware example
[Diagram: four processors, each with a private L1 cache; pairs of processors share an L2 cache, and each L2 connects to one of two memory units]
Threaded Programming Model
• The programming model for shared memory is based on the notion of threads
  – threads are like processes, except that threads can share memory with each other (as well as having private memory)
• Shared data can be accessed by all threads
• Private data can only be accessed by the owning thread
• Different threads can follow different flows of control through the same program
  – each thread has its own program counter
• Usually run one thread per CPU/core
  – but could be more
  – can have hardware support for multiple threads per core
Threads (cont.)
[Diagram: three threads, each with its own program counter (PC) and private data, all accessing a common region of shared data]
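To make the shared/private distinction concrete, here is a minimal sketch in C using OpenMP (one common realisation of this model; OpenMP itself is introduced later in the course, and the variable names are illustrative). Each thread gets its own copy of my_id, while shared_count is visible to all threads.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int shared_count = 42;                /* shared: one copy, visible to all threads */

        #pragma omp parallel
        {
            int my_id = omp_get_thread_num(); /* private: one copy per thread */
            /* All threads read the same shared variable, but each has its own my_id */
            printf("Thread %d sees shared_count = %d\n", my_id, shared_count);
        }
        return 0;
    }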
Thread communication
• In order to have useful parallel programs, threads must be able to exchange data with each other
• Threads communicate with each other via reading and writing shared data
  – thread 1 writes a value to a shared variable A
  – thread 2 can then read the value from A
• Note: there is no notion of messages in this model
Thread Communication
[Diagram: thread 1 sets its private mya to 23 and copies it to the shared variable a; thread 2 then computes mya = a + 1, reading 23 from a and storing 24 in its own private mya]
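A sketch of this exchange, again assuming OpenMP: thread 0 copies its private value into the shared variable a, and thread 1 reads it back. The barrier is needed to order the write before the read (synchronisation is the subject of the next slide).

    #include <stdio.h>
    #include <omp.h>

    int a;                                    /* shared variable */

    int main(void) {
        #pragma omp parallel num_threads(2)
        {
            int mya;                          /* private: each thread has its own mya */

            if (omp_get_thread_num() == 0) {
                mya = 23;
                a = mya;                      /* thread 0 writes the shared variable */
            }

            #pragma omp barrier               /* ensures the write completes before the read */

            if (omp_get_thread_num() == 1) {
                mya = a + 1;                  /* thread 1 reads 23 from a, stores 24 in mya */
                printf("thread 1: a = %d, mya = %d\n", a, mya);
            }
        }
        return 0;
    }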
Synchronisation
• By default, threads execute asynchronously
• Each thread proceeds through program instructions independently of other threads
• This means we need to ensure that actions on shared variables occur in the correct order: e.g. thread 1 must write variable A before thread 2 reads it, or thread 1 must read variable A before thread 2 writes it.
• Note that updates to shared variables (e.g. a = a + 1) are not atomic!
• If two threads try to do this at the same time, one of the updates may get overwritten.
Synchronisation example
[Diagram: threads 1 and 2 both execute load a, add a 1, store a. Both load the value 10 from memory into registers, both compute 11, and both store 11, so memory ends at 11 rather than 12: one update is lost]
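The lost update above can be reproduced and then repaired in a few lines; this sketch assumes OpenMP, where the atomic construct makes the whole load/add/store sequence indivisible.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int a = 0;

        /* Racy version: load/add/store can interleave between threads,
           so some increments are typically lost. */
        #pragma omp parallel num_threads(4)
        for (int i = 0; i < 100000; i++)
            a = a + 1;                        /* data race! */
        printf("racy result:   %d (expected %d)\n", a, 4 * 100000);

        a = 0;
        /* Protected version: each update is performed atomically. */
        #pragma omp parallel num_threads(4)
        for (int i = 0; i < 100000; i++) {
            #pragma omp atomic
            a = a + 1;
        }
        printf("atomic result: %d\n", a);
        return 0;
    }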
Tasks
• A task is a piece of computation which can be executed independently of other tasks
• In principle we could create a new thread to execute every task
  – in practice this can be too expensive, especially if we have large numbers of small tasks
• Instead tasks can be executed by a pre-existing pool of threads (see the sketch below)
  – tasks are submitted to the pool
  – some thread in the pool executes the task
  – at some point in the future the task is guaranteed to have completed
• Tasks may or may not be ordered with respect to other tasks
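A sketch of the pool model using OpenMP tasks (one possible realisation; the process function is a hypothetical stand-in for real work). One thread submits the tasks, and any thread in the pool may pick each one up.

    #include <stdio.h>
    #include <omp.h>

    /* Hypothetical unit of work, named for illustration only. */
    void process(int item) {
        printf("thread %d ran task %d\n", omp_get_thread_num(), item);
    }

    int main(void) {
        #pragma omp parallel                  /* the pre-existing pool of threads */
        #pragma omp single                    /* one thread submits the tasks */
        {
            for (int i = 0; i < 8; i++) {
                #pragma omp task firstprivate(i)
                process(i);                   /* some thread in the pool executes this */
            }
            #pragma omp taskwait              /* all tasks guaranteed complete here */
        }
        return 0;
    }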
Parallel loops
• Loops are the main source of parallelism in many applications.
• If the iterations of a loop are independent (can be done in any order), then we can share out the iterations between different threads.
• e.g. if we have two threads and the loop

    for (i=0; i<100; i++) {
        a[i] += b[i];
    }

  we could do iterations 0-49 on one thread and iterations 50-99 on the other.
• Can think of an iteration, or a set of iterations, as a task.
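In OpenMP (assumed here as one way to realise this), sharing out such independent iterations takes a single directive; the runtime decides which thread gets which iterations.

    #include <omp.h>

    #define N 100

    int main(void) {
        double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 1.0; }

        /* Iterations are independent, so they can be divided between
           threads, e.g. 0-49 on one thread and 50-99 on another. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] += b[i];

        return 0;
    }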
Reductions
• A reduction produces a single value from associative operations such as addition, multiplication, max, min, and, or.
• For example:

    b = 0;
    for (i=0; i<n; i++)
        b += a[i];

• Allowing only one thread at a time to update b would remove all parallelism.
• Instead, each thread can accumulate its own private copy, then these copies are reduced to give the final result.
• If the number of operations is much larger than the number of threads, most of the operations can proceed in parallel.
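A sketch of the reduction in OpenMP terms (assumed here): the reduction clause gives each thread a private copy of b initialised to zero, and combines the copies with + when the loop finishes.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        for (int i = 0; i < N; i++) a[i] = 1.0;

        double b = 0.0;
        /* Each thread accumulates into its own private copy of b;
           the private copies are summed into b at the end of the loop. */
        #pragma omp parallel for reduction(+:b)
        for (int i = 0; i < N; i++)
            b += a[i];

        printf("sum = %f\n", b);    /* expect 1000000.0 */
        return 0;
    }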