The Chapel Tasking Layer Over Qthreads
Kyle B. Wheeler, Richard C. Murphy, Dylan Stark, and Bradford L. Chamberlain
Wednesday, May 18, 2011
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

The Structure of Chapel’s Runtime
Chapel Runtime Support Libraries (written in C): Tasks, Communication, Launchers, Standard, Memory, Timers, Threads

Chapel’s Tasking Layer
• Role: Responsible for parallelism/synchronization
• Main Focus:
  – support begin/cobegin/coforall statements
  – support synchronization variables
• Main Features:
  – Startup/Teardown
  – Singleton Tasks
  – Task Lists
  – Synchronization
  – Control
  – Queries
  – ...serialization?

The FIFO Tasking Implementation
• Work-queue model
  – Function calls for work execution
  – Centralized queue
• Pros:
  – Simple, easy to debug
  – Very portable
  – Uses native state management
    • stacks
    • thread/task-specific data
• Cons:
  – Task synchronization (sync) using thread synchronization (pthread_mutex_t)
    • Compute/synch overlap requires oversubscribing (#threads > #cpus)
    • Difficult to provide non-native (non-mutex) synchronization behavior
  – #Task-to-#thread mismatch creates unexpected deadlock potential
  – Does not support work stealing
  – Does not support CPU pinning
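The work-queue model on this slide can be sketched roughly as follows. This is an illustrative Python stand-in, not the Chapel runtime's C code: the function name `run_fifo_pool`, the `None` sentinel, and the thread count are all assumptions made for the example.

```python
import queue
import threading

def run_fifo_pool(tasks, num_threads=4):
    """Run callables from one shared FIFO queue on a fixed set of threads."""
    work = queue.Queue()              # the single, centralized queue
    results = []
    results_lock = threading.Lock()   # thread-level (mutex) synchronization

    def worker():
        while True:
            task = work.get()
            if task is None:          # sentinel: no more work for this thread
                break
            value = task()            # a task is executed as a plain function call
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for task in tasks:
        work.put(task)
    for _ in threads:                 # one sentinel per worker thread
        work.put(None)
    for t in threads:
        t.join()
    return results

squares = run_fifo_pool([lambda i=i: i * i for i in range(8)])
```

Because each task runs to completion on the thread that dequeued it, a task that blocks on synchronization occupies its whole thread. With more blocked tasks than threads, the task that would release them may never be dequeued, which is the deadlock potential listed under Cons.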

Challenges in Highly-Threaded Runtimes
• Per-thread state
  – State vs threads
• Locality
  – An afterthought in standard threading models
  – Communication and synchronization are expensive, easy to use accidentally
• Synchronization
  – Hard to make portable, maintain guarantees
• Every Machine is Different
  – Granularity of sharing (cacheline size)
  – Optimal number of threads (PU count)
  – Communication topology
  – Cache structure
  – Memory model
  – Synchronization Primitives (CMPXCHG vs TNS vs CASXA vs LDARX/STWCX)

Qthreads Highlights
• Lightweight User-level Threading (Tasking)
• Platform portability
  – IA32/64, AMD64, PPC32/64, SparcV9, SST, Tilera
  – Linux, BSD, Solaris, MacOSX
• Locality awareness
  – “Shepherd” as thread mobility domain & locality
• Fine-grained synchronization semantics
  – Full/Empty Bits (64-bit & 60-bit)
  – Mutexes
  – Atomic operations (Incr & CAS)
• Locality-aware Workstealing Model
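Full/empty-bit (FEB) synchronization attaches a full or empty state to a word: a write can wait for the word to be empty and then fill it, and a read can wait for it to be full and then optionally empty it. The sketch below models only those semantics in Python. The class and method names (`FullEmptyWord`, `write_ef`, `read_fe`, `read_ff`) are chosen for this example, and the condition-variable implementation is a stand-in that says nothing about how Qthreads implements FEBs.

```python
import threading

class FullEmptyWord:
    """A value with a full/empty state (semantic model only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_ef(self, value):
        """Wait until empty, store the value, leave the word full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_fe(self):
        """Wait until full, return the value, leave the word empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value

    def read_ff(self):
        """Wait until full, return the value, leave the word full."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            return self._value

word = FullEmptyWord()
received = []

def producer():
    for i in range(5):
        word.write_ef(i)       # blocks until the consumer has emptied the word

def consumer():
    for _ in range(5):
        received.append(word.read_fe())

workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

With one producer and one consumer, the full/empty state forces strict alternation, so every value is handed over exactly once and in order without any separate lock in user code.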

Chapel Single Locale Challenges
• Startup & Teardown
  – Functions with unspecified scope
  – Synchronization primitives of unspecified scope
• Unsupported Behavior
  – Limit on OS Threads
    • Default defined by hardware
  – Forced serialization of tasks
  – Task-local data

Chapel Multi-Locale Challenges
• Communication (via GASNet)
  – Blocking system calls
    • Dedicated OS thread
    • Possibility for proxying internally
    • Temporary solution: Forked initialization thread
    • Future solution: explicit progress thread creation
  – External Task Operations
    • Task creation from outside the task library
      – Memory management issue
      – Also: synchronization issue …
    • Task synchronization outside the task library
      – Proxy-task using thread-level synchronization (pthread_mutex_t)

Future Work
• Synchronization
  – Tasking interface assumes only mutex semantics
  – MTA/Qthreads interface provide fast FEB semantics
  – Implementing FEB semantics with a mutex implemented with FEB operations is silly and slow
• Stack Space
  – Problem common to all tasking interfaces
  – Currently requires guess-and-check
  – Potential directions:
    • Technically possible to calculate stack requirements (e.g. gcc 4.6)
    • Technically possible to move stack variables to heap
      – Moves the memory management problem
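The layering criticized in the Synchronization bullets can be illustrated with a small sketch: a mutex whose lock and unlock are nothing more than "empty the word" and "fill the word". Everything here is hypothetical naming for illustration (`FebWord`, `FebMutex`, `fill`, `drain`), and the FEB word itself is simulated with a Python condition variable rather than a real full/empty primitive.

```python
import threading

class FebWord:
    """Simulated full/empty word; stands in for a native FEB primitive."""

    def __init__(self, full=False):
        self._cond = threading.Condition()
        self._full = full

    def fill(self):
        """Wait until the word is empty, then mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._full = True
            self._cond.notify_all()

    def drain(self):
        """Wait until the word is full, then mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()

class FebMutex:
    """Mutex built from one FEB word: full means unlocked, empty means held."""

    def __init__(self):
        self._word = FebWord(full=True)

    def lock(self):
        self._word.drain()     # only one caller can empty a full word

    def unlock(self):
        self._word.fill()      # refilling lets the next waiter proceed

counter = {"value": 0}
mutex = FebMutex()

def add_many(n):
    for _ in range(n):
        mutex.lock()
        counter["value"] += 1
        mutex.unlock()

workers = [threading.Thread(target=add_many, args=(500,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

If a tasking interface exposes only this mutex, sync-variable behavior has to be rebuilt on top of it, so a single full/empty operation turns into several round trips through the same primitive it could have used directly.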

Performance: Raw Tasking
• QuickSort
  – Naïve implementation (serial partitioning)
  – Uses recursive cobegin
  – Serialization threshold
    • For best comparison, set high to avoid serialization
[Figure: Execution Time (secs, log scale) versus Array Elements (power of 2, 14 to 28) for Qthreads and FIFO, with a second panel showing the Ratio FIFO/Qt.]
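The shape of this benchmark can be sketched as below: partitioning is serial, and the two recursive calls run as parallel tasks until a serialization threshold is reached. The sketch assumes the threshold is a recursion depth (which is consistent with "set high to avoid serialization"), uses Python threads in place of `cobegin`, and uses made-up names throughout. It shows the task structure only and is not expected to speed anything up in Python.

```python
import random
import threading

def partition(data, lo, hi):
    """Serial Lomuto partition of data[lo..hi]; returns the pivot's final index."""
    pivot = data[hi]
    i = lo
    for j in range(lo, hi):
        if data[j] < pivot:
            data[i], data[j] = data[j], data[i]
            i += 1
    data[i], data[hi] = data[hi], data[i]
    return i

def quicksort(data, lo, hi, depth, max_parallel_depth):
    """Sort data[lo..hi] in place, spawning tasks above the depth threshold."""
    if lo >= hi:
        return
    p = partition(data, lo, hi)
    if depth >= max_parallel_depth:
        # Past the threshold: ordinary serial recursion.
        quicksort(data, lo, p - 1, depth + 1, max_parallel_depth)
        quicksort(data, p + 1, hi, depth + 1, max_parallel_depth)
    else:
        # Below the threshold: run both halves as tasks and wait for both,
        # which is the role cobegin plays in the Chapel version.
        halves = [
            threading.Thread(target=quicksort,
                             args=(data, lo, p - 1, depth + 1, max_parallel_depth)),
            threading.Thread(target=quicksort,
                             args=(data, p + 1, hi, depth + 1, max_parallel_depth)),
        ]
        for t in halves:
            t.start()
        for t in halves:
            t.join()

random.seed(0)
data = [random.randrange(10**6) for _ in range(2000)]
expected = sorted(data)
quicksort(data, 0, len(data) - 1, 0, 3)
```

The two halves touch disjoint index ranges, so the tasks need no synchronization beyond the final join; the benchmark therefore mostly measures the cost of creating and joining tasks.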

Performance: Raw Tasking
• Tree Exploration
  – Constructs binary tree
  – Assigns Unique ID
  – Computes sum of IDs
  – Uses recursive cobegin
[Figure: Execution Time (secs, log scale) versus Tree Elements (power of 2, 12 to 28) for Qthreads and FIFO, with a second panel showing the Ratio FIFO/Qt.]

Performance: Data Parallel
• HPCC RandomAccess
  – GUPS (random integer updates)
  – Stresses Memory System
  – Uses forall
[Figure: Execution Time (secs, log scale) versus Number of Tasks (1 to 128) for Qthreads and FIFO, with a second panel showing the Ratio FIFO/Qt.]
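For readers unfamiliar with RandomAccess, the update kernel looks roughly like the serial sketch below: a pseudo-random 64-bit stream selects table entries, and each selected entry is XORed with the stream value. The polynomial constant, table initialization, and the update count of four times the table size follow the HPCC reference description as best recalled, so treat them as assumptions. The seed, function names, and table size are arbitrary choices for the example, and the benchmark's parallel `forall` is replaced by a plain loop.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7   # feedback polynomial for the shift-register stream (assumed)

def next_random(ran):
    """Advance the 64-bit stream: shift left, XOR in POLY if the top bit was set."""
    top_bit_set = ran >> 63
    return ((ran << 1) & MASK64) ^ (POLY if top_bit_set else 0)

def run_updates(table, num_updates, seed=1):
    """Apply num_updates random XOR updates; len(table) must be a power of two."""
    index_mask = len(table) - 1
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        table[ran & index_mask] ^= ran   # one random read-modify-write
    return table

size = 1 << 10
table = list(range(size))                # entry i starts out holding i
run_updates(table, 4 * size)
run_updates(table, 4 * size)             # same stream again undoes every XOR
restored = table == list(range(size))
```

Because XOR is its own inverse, replaying the same update stream restores the original table, which gives a simple way to check the kernel. Each update lands on an effectively random entry, so there is little cache reuse, which is why the benchmark stresses the memory system.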

Performance: Data Parallel
• HPCC STREAM (-EP)
  – Memory Bandwidth & Vector Kernels
  – EP version avoids communication
  – Uses forall
  – Synchronization surprisingly important
[Figure: Execution Time (secs) versus Number of Tasks (1 to 128) for Qthreads, FIFO, Qthreads EP, and FIFO EP, with panels for STREAM and STREAM-EP showing the Ratio FIFO/Qt.]

Thank You! Questions?
