Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand

Sreeram Potluri, Sayantan Sur, Devendar Bureddy and Dhabaleswar K. Panda
Network-Based Computing Laboratory
Department of Computer Science and Engineering
The Ohio State University, USA

EuroMPI 2011

Introduction
• Reduced synchronization overheads and simultaneous use of powerful system resources are key on modern clusters
• MPI-2 offers better support for both through one-sided communication
• An optimized implementation is available in MVAPICH2
• Limitations in the MPI-2 semantics hindered wider acceptance
• The RMA working group proposed several extensions as part of the MPI-3 effort
• Efficient implementation is crucial to highlight their performance benefits and encourage widespread use
• Can the new semantics be implemented with high performance in MVAPICH2?

Overview: MPI-3 One-Sided Communication
• Synchronization: Lock_all, Unlock_all; Win_flush, Win_flush_local, Win_flush_all, Win_flush_local_all; Win_sync
• Communication: Get_accumulate; Rput, Rget, Raccumulate, Rget_accumulate; Fetch_and_op, Compare_and_swap
• Window Creation: Win_allocate; Win_create_dynamic, Win_attach, Win_detach
• Separate and Unified Windows
• Undefined Conflicting Accesses
• Accumulate Ordering

Flush Operations
• Local and remote completions are bundled in the MPI-2 one-sided communication model
• Completion is handled through synchronization operations, which requires closing an epoch
• This adds overhead in scenarios that require only local completion
• The overhead is considerable on networks like IB, where the semantics and cost of local and remote completions differ
  • RDMA Reads and Atomic Ops: a CQ event means both local and remote completion
  • RDMA Writes: a CQ event means only local completion; remote completion requires a follow-up Send/Recv exchange or an atomic operation
• Flush operations allow a more efficient check for completion

Flush Operations
[Figure: two latency plots, Time (usec) vs. Message Size (Bytes), for message sizes from 1 byte to 4K. Put panel: Put+Unlock, Put+Flush_local, Put+Flush. Get panel: Get+Unlock, Get+Flush_local, Get+Flush.]
• Local completion of Put is efficient using flush
• Completion does not require closure of the epoch
Platform: 8-core Intel Westmere nodes connected with InfiniBand QDR

Request Based Operations
• Current semantics provide only bulk synchronization
• There is no way to request completion of individual operations without closing an epoch
• This does not serve fine-grained overlap of computation and communication well
• Request based operations (MPI_Rput, MPI_Rget, and others) return an MPI request that can be polled for completion
• GCP (Get-Compute-Put) benchmarks were added to the OSU suite to highlight their benefits

Request Based Operations: GCP Benchmark

No Overlap
    MPI_Win_lock
    for i in 1, N
        MPI_Get (ith Block)
    end for
    MPI_Win_unlock
    Compute (N Blocks)
    MPI_Win_lock
    for i in 1, N
        MPI_Put (ith Block)
    end for
    MPI_Win_unlock

Overlap using Lock-Unlock
    MPI_Win_lock
    for i in 1, N
        MPI_Get (ith Block)
    end for
    MPI_Win_unlock
    MPI_Win_lock
    for i in 1, N
        Compute (ith Block)
        MPI_Put (ith Block)
    end for
    MPI_Win_unlock

Overlap using Request Ops
    MPI_Win_lock
    for i in 1, N
        MPI_Rget (ith Block)
    end for
    MPI_Waitany (get requests)
    while a get request j completes
        Compute (jth Block)
        MPI_Rput (jth Block)
        MPI_Waitany (get requests)
    end while
    MPI_Waitall (put requests)
    MPI_Win_unlock

Request Based Operations
[Figure: Percentage Overlap (0 to 100) vs. Message Size (Bytes), 2K to 2M, comparing Lock-Unlock and Request Ops.]
• Request based operations provide superior overlap
Platform: 8-core Intel Westmere nodes connected with InfiniBand QDR

Dynamic Windows
• Creation of a window is collective over the communicator
• A process can attach memory to, or detach memory from, the window dynamically
• The user has to manage the exchange and correct use of address information
• MPI implementations on IB have to manage the dynamic exchange of key information in order to use RDMA
• MVAPICH2 uses a pull model: a request-for-info is sent when the first operation is issued on a region, and the information is cached
• The request is piggy-backed onto the first data packet for small and medium message sizes

Dynamic Windows
[Figure: Static Window vs. Dynamic Window in two panels. OSU Put Latency: Time (usec) vs. Message Size (Bytes). OSU Put Bandwidth: Bandwidth (MBps) vs. Message Size (Bytes).]
• Dynamic windows can provide performance similar to static windows
• Key exchange overhead is amortized
Platform: 8-core Intel Westmere nodes connected with InfiniBand QDR

Conclusion and Future Work
• First implementation of features from the proposed one-sided communication semantics for MPI-3
• Highlighted their benefits
• Working towards a complete implementation of the proposed MPI-3 one-sided communication standard
• Modifying application benchmarks to show how real-world applications can benefit from the proposed extensions

Thank You!
{potluri, surs, bureddy, panda}@cse.ohio-state.edu
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
MVAPICH Web Page: http://mvapich.cse.ohio-state.edu/
