A Middleware for Concurrent Programming in MPI Applications
Tobias Berka, Helge Hagenauer and Marian Vajteršic
September 13, 2011
Outline
1 Introduction
  - Emergent Parallel Applications
  - The Need for Concurrency
2 Programming Model
  - Concurrency using Threads
  - Thread Collectives
  - In Actual Use
3 The MPI Threads API
  - The MPIT Interface Definition
  - Constructs and Features
  - Performance Overhead
4 Summary & Conclusions
Introduction
Emergent Parallel Applications
Parallelism is abundant in today's data centers:
- Multi-core CPUs,
- High-bandwidth, low-latency interconnection networks,
- Accelerator hardware.
Exciting new applications in today's information economy:
- Information retrieval (i.e. search),
- Online analytical processing,
- Recommender systems,
- Data mining.
Use Case: Parallel Search Engine
Requirements beyond the classic batch-job operation:
- Add Document,
- Update Document Data,
- Remove Document,
- Query Documents.
Use Case: Parallel Search Engine
We group similar operations into two layers, short and long:
- Maintenance Layer: Add Document, Update Document Data, Remove Document,
- Query Layer: Query Documents.
The Need for Concurrency
Multi-user operation:
- Multiple users,
- Single back-end,
- Single data base...
⇒ We need concurrency!
Both layers can be used concurrently, at the same time:
- Answer queries,
- Modify the data base.
⇒ We need concurrency!
Programming Model
How do we implement concurrent activities?
Operations and queues:
- Data structure to describe operations,
- Queue holds operations,
- "Main loop" pops operations and processes them.
Threads:
- One activity = one thread.
The pros and cons...
Operations and queues:
+ Efficient,
- No true concurrency,
- Cannot process an operation and receive independent messages.
Threads:
- Context switching overhead,
- Shared data requires locking,
+ Very tidy abstraction,
+ Compositional (can always add more threads).
Use Case: Parallel Search Engine
Let's use threads to implement these concurrent activities:
- Maintenance thread with a maintenance queue: Add Document, Update Document Data, Remove Document (maintenance layer),
- Query thread with a query queue: Query Documents (query layer).
Programming Abstraction
Key abstraction: the thread collective.
Goals:
- Encapsulate concurrent activities,
- Isolate concurrent communication,
- Unify and simplify the design.
Conflicting objectives:
- Safety and ease of programmability,
- Performance.
Thread Collectives
- Creates a new thread T2 within every MPI process (alongside the existing thread T1),
- Assigns a copy C2 of the MPI communicator C1.
(Diagram: processes P1–P4, each running threads T1 and T2; the T1s communicate over C1, the T2s over C2.)
Thread Collectives
- Encapsulates computation: thread function(s) for the T2 threads,
- Isolates communication: communicator C2.
(Diagram: processes P1–P4, each running threads T1 and T2, with communicators C1 and C2.)
Parallel Search Engine
- P1–P4 each hold a part of all documents,
- T1s: answer queries (query layer),
- T2s: add, remove or update documents (maintenance layer).
(Diagram: processes P1–P4 with threads T1, T2 and communicators C1, C2.)
What do we get?
- Simple, ready-made abstraction,
- Encapsulation & isolation,
- Compositional,
- Caveat: synchronization and locking.
The MPI Threads API
The MPIT Interface Definition
- An additional layer of middleware to provide what we need,
- Designed as a library for compatibility (not a new programming language),
- The "MPI Threads" (MPIT) interface definition,
- Written as a single C header file (157 physical SLOC, according to David A. Wheeler's "SLOCCount").
Constructs and Features
Thread collectives:
- One thread within every MPI process,
- Separate MPI communicator.
Conventional threads:
- Portable thread interface,
- We get it "for free" – we have all of the machinery.
Process-local synchronization constructs:
- Mutex locks, condition variables, semaphores and barriers,
- Specifies reliable semantics.
Performance Overhead
- Additional layer of indirection (one additional function call),
- Additional error and consistency checks:
  - Condition variable checks for spurious wake-ups,
  - Barrier verifies thread identity.
⇒ Execution time overhead.
MPIT prototype implemented on top of POSIX threads (2,650 physical SLOC, according to David A. Wheeler's "SLOCCount").
Thread Creation
Difference: additional indirection.
(Figure: time to create and join threads, in milliseconds, for 2–8 threads; MPIT prototype vs. POSIX threads.)
Lock / Unlock Mutex
Difference: additional checks.
(Figure: time to lock and unlock a mutex, in milliseconds, for 2–8 threads; MPIT prototype vs. POSIX threads.)
Wait / Wake on Condition Variable
Difference: indirection & checks (note the different time scale).
(Figure: time to wait / wake on a condition variable, in milliseconds, for 2–8 threads; MPIT prototype vs. POSIX threads.)
Summary & Conclusions
Summary & Conclusions
- Parallel programs may require additional concurrency,
- New abstraction: thread collectives,
- MPIT interface specification,
- Performance penalty is acceptable... unless too many locks are used.
Thank you!