DIY Block-Parallel Data Analysis


  1. DIY Block-Parallel Data Analysis

     Scientific Achievement
     DIY is a programming model and runtime for block-parallel analytics on DOE leadership machines; all parallel operations and communications are expressed in terms of blocks, not processors, which enables the same program to run in- and out-of-core with single or multiple threads.

     Significance and Impact
     DIY enabled Delaunay and Voronoi tessellation of cosmology dark matter particles to 128K processes and improved performance by 50X [2], and it enabled ptychographic phase retrieval of synchrotron X-ray images on 128 GPUs in real time [3]; the DIY paper received an honorable mention at LDAV 2016 [1].

     Research Details
     § Enabling VTK-m by DIY-ing various VTK distributed-memory filters: parallel resampling, multipart dataset redistribution, and stream tracing.
     § Ongoing preparation for exascale: relaxing synchronization, using deeper memory hierarchy, compatibility with many-core thread models.

     Figure: Components of DIY and its place in the software stack are designed to address the data movement challenge in extreme-scale data analysis.
     Figure labels: Application; Analysis Algorithm; Data Movement; OS / Runtime; Master; Assigner; Decomposer; Block loading; Decomposition; Mapping blocks to processes; Block execution; Comm. links; Communication (Local neighbor, Global reduction); I/O (Collective, Independent); Algorithms (Parallel sort, K-d tree).

     [1] Morozov and Peterka, Block-Parallel Data Analysis with DIY2, LDAV 2016.
     [2] Morozov and Peterka, Efficient Delaunay Tessellation through K-D Tree Decomposition, SC16.
     [3] Nashed et al., Parallel Ptychographic Reconstruction, Optics Express 2014.

     Work was performed at Argonne and Lawrence Berkeley National Labs.
     Dmitriy Morozov (LBNL) & Tom Peterka (ANL)
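
To illustrate what "expressed in terms of blocks, not processors" can mean in practice, here is a minimal C++ sketch of two ways to map block ids to processes. It is not DIY's implementation: the names `round_robin_rank`, `contiguous_rank`, and `local_blocks` are hypothetical and exist only for this example. The point is that the analysis code addresses blocks by id, while a separate, swappable mapping decides which process holds each block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a DIY API): deal block ids out cyclically.
inline int round_robin_rank(int gid, int nranks) {
    assert(gid >= 0 && nranks > 0);
    return gid % nranks;
}

// Hypothetical helper (not a DIY API): give each rank one contiguous range
// of block ids; the first (nblocks % nranks) ranks hold one extra block.
inline int contiguous_rank(int gid, int nblocks, int nranks) {
    assert(nranks > 0 && gid >= 0 && gid < nblocks);
    const int div = nblocks / nranks;        // base number of blocks per rank
    const int mod = nblocks % nranks;        // ranks that get one extra block
    const int threshold = mod * (div + 1);   // first gid owned by a "small" rank
    if (gid < threshold) return gid / (div + 1);
    return mod + (gid - threshold) / div;    // div > 0 whenever this is reached
}

// List the block ids that a given rank owns under an arbitrary mapping.
template <class RankOf>
std::vector<int> local_blocks(int rank, int nblocks, RankOf rank_of) {
    std::vector<int> gids;
    for (int gid = 0; gid < nblocks; ++gid)
        if (rank_of(gid) == rank) gids.push_back(gid);
    return gids;
}
```

Under these assumptions, changing the number of processes or the mapping function changes only which ids `local_blocks` returns; code written against block ids stays the same.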

  2. Parallel Event Generation and Analysis with DIY

     Scientific Achievement
     Fermilab researchers developed two HPC parallel codes using DIY.
     - Pythia8 Monte Carlo event generator [1]
     - Feldman-Cousins correction [2]

     Significance and Impact
     DIY efficiently utilizes HPC workflows, resources, and HEP community tools.

     Research Details
     • Allows for extremely short turn-around of large parameter space explorations (e.g. generator tuning)
     • Paves the way for new and advanced optimization algorithms, e.g. LHC search analyses.

     Figure: Event generator model for proton-proton collision.
     Robust predictions of collider events are needed to search for new physics effects. Much of the dynamics is described by tunable parameters. The calculation of event generator predictions is expensive, and must be done for each choice of parameters. A full detector simulation of these calculations is even more expensive, requiring parallel HPC codes.

     Scalability: Top: strong scaling of Pythia8 DIY code. Bottom: weak scaling of Feldman-Cousins DIY code.

     [1] Buchanan et al., JINST 2020 (in preparation).
     [2] Hoche et al., arXiv 2019.
     [3] Sousa et al., CHEP 2018.

     Work was performed at Argonne and Fermilab under SciDAC HEP on HPC Partnership

  3. IExchange: Programming

     Old Synchronous Exchange
     Flow chart: Compute; Exchange Messages; Synchronous Global Computation of Total Work; End.

         for (max_rounds) {
             master.foreach(foo);             // compute on every local block
             master.exchange();               // exchange messages
             all_done = reduce(local_work);   // synch. collective
             if (all_done)
                 break;
         }

         void foo() {
             dequeue_incoming();
             compute();
             enqueue_outgoing();
         }

     New Asynchronous IExchange
     Flow chart: Compute and Exchange; Asynchronous Termination Detection; Synchronize and End.

         master.iexchange(bar);

         bool bar() {
             do {
                 dequeue_incoming();
                 compute();
                 enqueue_outgoing();
             } while (fill_incoming());       // keep going while new messages arrive
             return true;                     // this block is locally done
         }

     Morozov et al., IExchange: Generic Asynchronous Pattern for Interleaved Computation and Communication, in preparation, 2019.
     Work was performed at LBNL and ANL
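
The difference between the two patterns on this slide can be imitated in a single-process toy model. The C++ sketch below is only an illustration under stated assumptions: it does not use DIY or MPI, and `Block`, `drain`, `deliver`, `run_synchronous`, and `run_asynchronous` are hypothetical names. Blocks sit on a ring, and each work item carries a hop count that is decremented and forwarded until it reaches zero. The asynchronous driver is a stand-in for a runtime that delivers messages as soon as they are enqueued; it is not the termination detection protocol from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a block: delivered messages, queued messages,
// and a count of items it has computed on.
struct Block {
    std::deque<int> inbox;
    std::vector<int> outbox;
    long processed = 0;
};

// Dequeue incoming, compute, enqueue outgoing for everything in the inbox.
inline void drain(Block& b) {
    while (!b.inbox.empty()) {
        int hops = b.inbox.front();
        b.inbox.pop_front();
        ++b.processed;                               // "compute"
        if (hops > 0) b.outbox.push_back(hops - 1);  // forward remaining work
    }
}

// Move every queued message to the next block on the ring; return the count.
inline std::size_t deliver(std::vector<Block>& blocks) {
    std::size_t moved = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        Block& dst = blocks[(i + 1) % blocks.size()];
        for (int m : blocks[i].outbox) { dst.inbox.push_back(m); ++moved; }
        blocks[i].outbox.clear();
    }
    return moved;
}

// Synchronous pattern: compute everywhere, exchange, then a global check of
// remaining work after every round.
inline int run_synchronous(std::vector<Block>& blocks, int max_rounds) {
    int rounds = 0;
    while (rounds < max_rounds) {
        for (Block& b : blocks) drain(b);         // like master.foreach(foo)
        std::size_t remaining = deliver(blocks);  // like master.exchange()
        ++rounds;
        if (remaining == 0) break;                // like the global reduction
    }
    return rounds;
}

// Asynchronous pattern (toy): messages are delivered right after a block
// enqueues them; the loop ends after a sweep in which no block had work.
inline int run_asynchronous(std::vector<Block>& blocks) {
    int sweeps = 0;
    bool active = true;
    while (active) {
        active = false;
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            Block& b = blocks[i];
            if (b.inbox.empty()) continue;
            active = true;
            drain(b);
            Block& dst = blocks[(i + 1) % blocks.size()];
            for (int m : b.outbox) dst.inbox.push_back(m);
            b.outbox.clear();
        }
        ++sweeps;
    }
    return sweeps;
}
```

With four blocks and a single item of five hops seeded in block 0, both drivers process the same six items, but the synchronous driver needs one global round per hop while the toy asynchronous driver lets the item travel several hops per sweep. The sweep count is an artifact of this simulation and says nothing about real performance.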

  4. IExchange: Termination Detection

     Flow charts
     Old Synchronous Exchange: Compute; Exchange Messages; Synchronous Global Computation of Total Work; End.
     New Asynchronous IExchange: Compute and Exchange; Asynchronous Termination Detection; Synchronize and End.

     Termination detection state diagram (labels as recoverable from the slide)
     - No state: communicate & compute
     - State 0: local work = 0; communicate & compute
     - Enter ibarrier
     - State 1: locally entered ibarrier; communicate & compute
     - Not all others entered ibarrier
     - All others entered ibarrier
     - State 2: everyone entered ibarrier
     - Other labels: Global work > 0; Global work = 0; Stop

     Morozov et al., IExchange: Generic Asynchronous Pattern for Interleaved Computation and Communication, in preparation, 2019.
     Work was performed at LBNL and ANL

  5. IExchange: Asynchronous Communication and Computation in DIY

     Scientific Achievement
     Interleaved asynchronous communication pattern for iterative computations in DIY
     - Eliminates global synchronization on every iteration
     - Easier to use: asynchronous communication and termination detection handled by DIY

     Significance and Impact
     Irregular imbalanced workloads can be accelerated using IExchange.

     Research Details
     • Asynchronous communication and termination detection, interleaved with computation
     • Handles non-monotonic progress and/or unknown amount of global work.

     Scalability: the strong scaling plot shows iexchange up to 3.5X faster and with 5.4X better efficiency than exchange for particle tracing in the Nek5000 thermal hydraulics application.

     Figure: flow charts.
     Old Synchronous Exchange: Compute; Exchange Messages; Synchronous Global Computation of Total Work; End.
     New Asynchronous IExchange: Compute and Exchange; Asynchronous Termination Detection; Synchronize and End.

     Morozov et al., IExchange: Generic Asynchronous Pattern for Interleaved Computation and Communication, in preparation, 2019.
     Work was performed at LBNL and ANL
