QMPI: A Library for Multithreaded MPI Applications
Alex Brooks, Hoang-Vu Dang, Marc Snir
Outline
• Motivation
• Communication Model
• Qthreads
• QMPI
• Summary
MOTIVATION
Issue
• Large numbers of threads performing communication cause problems
  – Locking
  – Polling
  – Scheduling
• As a result, there are very few hybrid MPI+pthread applications
Current MPI Design
• MPI code is executed by the calling thread (see the sketch below)
  – Requires coarse-grain locking, which limits concurrency
  – Some implementations don't support multithreaded use at all
• Communication completion is observed through polling
  – Separate calls to the progress engine
• The scheduler is unaware of which threads have become runnable
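To make the problem concrete, here is a minimal sketch of the MPI+pthread pattern described above. The thread count, tags, and two-rank setup are illustrative choices, not from the slides; the point is that every thread calls MPI directly, so the library must serialize them internally:

```c
#include <mpi.h>
#include <pthread.h>

#define NTHREADS 4

typedef struct { int peer; int tag; } targ_t;

/* Each thread calls MPI directly.  With MPI_THREAD_MULTIPLE the
 * library must protect its internal state (typically with a
 * coarse-grain lock), so concurrent calls serialize, and each
 * blocking call polls the progress engine until it completes. */
static void *worker(void *p)
{
    targ_t *a = (targ_t *)p;
    int rank, buf;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        buf = a->tag;
        MPI_Send(&buf, 1, MPI_INT, a->peer, a->tag, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&buf, 1, MPI_INT, a->peer, a->tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1); /* implementation lacks support */

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    pthread_t t[NTHREADS];
    targ_t args[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        args[i].peer = 1 - rank;  /* run with exactly 2 ranks */
        args[i].tag  = i;
        pthread_create(&t[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    MPI_Finalize();
    return 0;
}
```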
Performance
[Figure: multithreaded bandwidth (MB/s) vs. message size (1 B–4 MB), one panel for MPICH and one for MVAPICH]
Goals
• Enable efficient use of multithreaded two-sided communication
  – Light-weight threads
  – Low-overhead scheduling upon communication completion
• Improve programmability of multithreaded MPI
COMMUNICATION MODEL
Main idea
[Diagram: multiple worker threads submitting requests to a shared communication engine]
• Light-weight tasks submit requests to the communication engine
• The communication engine marks a task as runnable when its communication completes
QTHREADS
Introduction
• Tasking model that supports millions of light-weight threads
• Three main entities
  – Task: a function to be executed
  – Worker: a thread that executes tasks
  – Shepherd: a queue of tasks
Synchronization
• Full/Empty bit (FEB) semantics
  – The FEB tracks the status of a data word
    • 0 (empty): data has not been written
    • 1 (full): data has been written
• Read: stall the task until the FEB is full, then read the data and set the bit to empty
• Write: stall the task until the FEB is empty, then write the data and set the bit to full (see the sketch below)
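A minimal sketch of FEB-based producer/consumer synchronization using the Qthreads API (qthread_readFE, qthread_writeEF, and qthread_fork as documented in the Qthreads library; treat the exact signatures as assumptions):

```c
#include <qthread/qthread.h>
#include <stdio.h>

static aligned_t slot;  /* word guarded by a full/empty bit */

/* Writer: stalls until `slot` is empty, writes, sets the bit full. */
static aligned_t producer(void *arg)
{
    aligned_t value = 42;
    qthread_writeEF(&slot, &value);
    return 0;
}

/* Reader: stalls until `slot` is full, reads, sets the bit empty. */
static aligned_t consumer(void *arg)
{
    aligned_t value;
    qthread_readFE(&value, &slot);
    printf("consumed %lu\n", (unsigned long)value);
    return 0;
}

int main(void)
{
    aligned_t done;
    qthread_initialize();
    qthread_empty(&slot);          /* start the word as empty */
    qthread_fork(consumer, NULL, &done);
    qthread_fork(producer, NULL, NULL);
    qthread_readFF(NULL, &done);   /* wait for the consumer task */
    return 0;
}
```

Note that a task blocked in qthread_readFE does not spin: it is preempted and its worker picks up another task, which is the behavior QMPI exploits.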
Task Scheduler
• Each worker is associated with a single shepherd
  – Tasks are pulled from the shepherd for execution
• Tasks can be stolen from other shepherds under certain conditions
• Tasks are preempted when waiting on synchronization
Overview
• Scalable over-subscription
  – Millions of tasks can be spawned with minimal performance overhead
• Worker idle time is reduced through task preemption at synchronization points
• "Automatic" load-balancing of tasks
• Shared-memory environment
QMPI
Overview
• Qthreads + MPI
  – The Qthreads light-weight task model, with communication through MPI
• Two threads are dedicated to the communication engine
  – One for communication, one for FEB management
Communication Model
[Diagram, built up across several slides: within a node, worker/shepherd pairs run tasks; tasks place requests on a Comm Queue serviced by a dedicated Comm Thread, which drives the network; completions are placed on an FEB Queue serviced by a dedicated FEB Thread, which updates the synchronization container to wake waiting tasks]
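The flow in the diagram can be sketched as follows. The slides do not show QMPI's API, so the qmpi_* names, the queue primitives, and the request layout below are hypothetical, meant only to illustrate the request/completion path:

```c
#include <mpi.h>
#include <qthread/qthread.h>

/* Hypothetical request descriptor; not the real QMPI structures. */
typedef struct qmpi_request {
    const void *buf; int count; MPI_Datatype type;
    int peer, tag;
    MPI_Request mpi_req;
    aligned_t   done;      /* FEB the issuing task blocks on */
    int         started;
} qmpi_request_t;

/* Hypothetical queue primitives (implementations omitted). */
extern void comm_queue_push(qmpi_request_t *r);
extern qmpi_request_t *comm_queue_pop(void);
extern void feb_queue_push(qmpi_request_t *r);
extern qmpi_request_t *feb_queue_pop(void);

/* Worker task: hand the request to the comm thread, then block on
 * the FEB; the scheduler runs other tasks instead of polling. */
static void qmpi_send(const void *buf, int count, MPI_Datatype type,
                      int dst, int tag)
{
    qmpi_request_t r = { buf, count, type, dst, tag };
    qthread_empty(&r.done);
    comm_queue_push(&r);
    qthread_readFF(NULL, &r.done);   /* task preempts until filled */
}

/* Comm thread: the only thread that calls MPI, so the MPI library
 * needs no fine-grained locking on its behalf. */
static void comm_thread_loop(void)
{
    for (;;) {
        qmpi_request_t *r = comm_queue_pop();
        if (!r->started) {           /* issue the MPI call once */
            MPI_Isend(r->buf, r->count, r->type, r->peer, r->tag,
                      MPI_COMM_WORLD, &r->mpi_req);
            r->started = 1;
        }
        int flag = 0;
        MPI_Test(&r->mpi_req, &flag, MPI_STATUS_IGNORE);
        if (flag) feb_queue_push(r);  /* done: notify FEB thread */
        else      comm_queue_push(r); /* not yet: re-queue, move on */
    }
}

/* FEB thread: fill the FEB, marking the waiting task runnable. */
static void feb_thread_loop(void)
{
    for (;;)
        qthread_fill(&feb_queue_pop()->done);
}
```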
Performance
[Figure: multithreaded bandwidth (MB/s) vs. message size (1 B–4 MB) for MPICH and MVAPICH, as in the motivation section]
Target Applications
• Not beneficial for all problems
  – Little overlap in multithreaded communication increases runtime
• Bulk-synchronous communication
• Over-subscription
  – Benefits directly from Qthreads
Simple Experiment
• 5-point stencil computation (see the sketch below)
  – Send edge values to neighbors
  – Receive edge values from neighbors
  – Compute new values
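A compact sketch of one iteration of this benchmark in plain nonblocking MPI. The 2-D decomposition, neighbor array, and packed edge buffers are assumptions; the slides only name the three phases:

```c
#include <mpi.h>

/* One iteration: post receives for the four halos, send the four
 * edges, wait for all eight transfers, then compute.  The neighbor
 * order and packed buffers are illustrative. */
void stencil_step(int n, const double *edge[4], double *halo[4],
                  const int nbr[4], MPI_Comm comm)
{
    MPI_Request req[8];

    for (int d = 0; d < 4; d++)   /* Receive phase */
        MPI_Irecv(halo[d], n, MPI_DOUBLE, nbr[d], 0, comm, &req[d]);
    for (int d = 0; d < 4; d++)   /* Send phase */
        MPI_Isend(edge[d], n, MPI_DOUBLE, nbr[d], 0, comm, &req[4 + d]);
    MPI_Waitall(8, req, MPI_STATUSES_IGNORE);

    /* Calculation phase: each new cell is computed from itself and
     * its four neighbors, using halo values on the boundary (omitted). */
}
```

Under QMPI, each of the eight transfers would instead be issued by a light-weight task that blocks on an FEB, so the send and receive phases overlap with whatever other tasks are runnable.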
Results
[Figure, three panels: execution time (µsec, log scale) vs. grid size (one side: 120, 1200, 12000) for MPI+Pthread vs. QMPI in the Receive Phase, Send Phase, and Calculation Phase]
SUMMARY
Conclusion
• Large numbers of threads performing communication cause problems
• QMPI uses a new communication model to decrease communication overhead
• QMPI performs much better than traditional MPI+pthreads in many situations
On-going/Future Work
• Test QMPI with real applications
  – MiniGhost, LULESH, UTS, etc.
• Message aggregation
• Push the QMPI model into MPI implementations as an internal feature
QUESTIONS