Catching Idlers with Ease: A Lightweight Wait-State Profiler for MPI Programs
Guoyong Mao, David Böhme, Markus Geimer, Marc-André Hermanns, Daniel Lorenz and Felix Wolf
Petascale Tools Workshop, Madison, WI, USA, August 4, 2014

Late sender
[Figure: timeline of processes A (Send) and B (Recv) with B's waiting time marked; axes: processes vs. time]
Daniel Lorenz et al., Petascale Workshop, August 4, 2014

Wait at NxN
[Figure: timeline of processes A, B and C, each calling Allgather, with waiting times marked; axes: processes vs. time]

What we want to know
[Figure: execution time of each call split into processing time and wait time]

What we measure
[Figure: the total execution time of each call, with no split into processing and wait time]

The minimum idea
[Figure: execution times of the calls, with the minimal execution time marked]

The minimum idea
[Figure: processing time and wait time of each call, overlaid with the estimated processing time and the estimated wait time]
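The estimate on this slide can be illustrated with a minimal Python sketch. The four execution times are made-up numbers, not measurements from the talk. The smallest observed time is taken as the estimated processing time, and whatever exceeds it is attributed to waiting.

```python
# Hypothetical execution times (microseconds) of four calls to the same
# MPI function with the same parameters; values are illustrative only.
exec_times = [800, 300, 500, 300]

# The smallest observed time serves as the estimated processing time.
est_processing = min(exec_times)

# The part of each call that exceeds the minimum is the estimated wait time.
est_wait = [t - est_processing for t in exec_times]
total_wait = sum(est_wait)
```

With these numbers, the estimated processing time is 300 µs per call and the estimated wait time adds up to 700 µs.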

Considered parameters
• We consider
  • MPI function
  • Message size
  • Receiver rank
• Other possible parameters
  • Sender rank
  • Data type
• Tradeoff between
  • Number of samples for a meaningful minimum and amount of data
  • Number of parameters considered
• Need to find the relevant parameters

Algorithm
For every combination of
• MPI function
• Message size class
• Process
record the
• Minimum execution time
For every combination of MPI call path and message size class, record the
• Number of visits
• Total execution time
At the end of the profiling run, subtract the minimum from the execution time for every visit to calculate the wait time.
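The bookkeeping described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the profiler's actual implementation: the class `WaitStateProfile`, the helper `size_class` (power-of-two buckets) and the representation of a call path as a tuple ending in the MPI function name are all hypothetical. One instance per process is assumed, which covers the "process" dimension of the minimum.

```python
from collections import defaultdict


def size_class(nbytes):
    """Hypothetical size class: power-of-two bucket index (an assumption)."""
    return nbytes.bit_length()


class WaitStateProfile:
    """Per-process bookkeeping; one instance per MPI process is assumed."""

    def __init__(self):
        # (MPI function, size class) -> minimum execution time seen so far
        self.minimum = {}
        # (call path, size class) -> [number of visits, total execution time]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, call_path, nbytes, exec_time):
        # call_path is a tuple of region names ending in the MPI function.
        cls = size_class(nbytes)
        min_key = (call_path[-1], cls)
        self.minimum[min_key] = min(exec_time, self.minimum.get(min_key, exec_time))
        entry = self.stats[(call_path, cls)]
        entry[0] += 1
        entry[1] += exec_time

    def wait_times(self):
        # End of run: wait time = total time - visits * minimum.
        waits = {}
        for (call_path, cls), (visits, total) in self.stats.items():
            minimum = self.minimum[(call_path[-1], cls)]
            waits[(call_path, cls)] = total - visits * minimum
        return waits


profile = WaitStateProfile()
path_a = ("main", "x_solve", "MPI_Recv")
path_b = ("main", "y_solve", "MPI_Recv")
for t in (300, 800, 500):          # made-up times in microseconds
    profile.record(path_a, 1500, t)
profile.record(path_b, 1200, 450)
waits = profile.wait_times()
```

Only a minimum per parameter combination and two counters per call path are stored, so subtracting `visits * minimum` from the total gives the same result as subtracting the minimum from every visit individually. In this example both call paths share one minimum (300 µs), because they use the same MPI function and fall into the same size class.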

[Figure: per-call overhead increase compared to profiling overhead w/o wait state analysis (%); axis range 0 to 250]

Accuracy MPI_Recv
[Figure: wait ratio for MPI_Recv on JUROPA and JUQUEEN, Scalasca vs. minimum method; y-axis: wait ratio, 0 to 0.18]

Accuracy MPI_Wait
[Figure: wait ratio for MPI_Wait on JUROPA and JUQUEEN, Scalasca vs. minimum method; y-axis: wait ratio, 0 to 0.16; x-axis labels in order of appearance: make_id_list, ks_congrad, path_product, u_shift_fermion, comm_embed, rev_comm_rho, rev_commnct, tf_ad_splitting parallel, rsl_lite_exch_y, rsl_lite_exch_x, x_solve, y_solve, z_solve, resid, rprj3, psinv, interp, rhs, x_solve, y_solve, z_solve, resid, rprj3, psinv, iterp]

Accuracy MPI_Waitall
[Figure: wait ratio for MPI_Waitall on JUROPA and JUQUEEN, Scalasca vs. minimum method; y-axis: wait ratio, 0 to 0.14; x-axis labels: x_solve, y_solve, z_solve, x_solve, y_solve, z_solve, bndry_3d, bndry_2d, solvers,pcg, commnct, copy_faces (four times)]

Non-blocking communication
[Figure: timeline of processes A (Isend, Isend) and B (Wait); axes: processes vs. time]

Scalasca detects no wait state
[Figure: timeline of processes A (Isend, Isend) and B (Wait); axes: processes vs. time]

Minimum approach does calculate wait states
[Figure: timeline of processes A (Isend, Isend) and B (Wait); axes: processes vs. time]

But is this wrong for performance analysis?
[Figure: timeline of processes A (Isend, Isend) and B (Wait), annotated "Latency = Possible overlap time"; axes: processes vs. time]

Detailed example from SP

Wait time according to Scalasca

Wait time according to minimum method

Jitter may cause a slightly higher wait time
[Figure: processing time and wait time of each call, overlaid with the estimated processing time and the estimated wait time]
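The effect of jitter can be made concrete with a small Python sketch. All numbers are invented for illustration. It assumes that at least one call runs with neither jitter nor waiting, so the minimum equals the undisturbed processing time. Every bit of jitter on the other calls is then counted as wait time.

```python
# Made-up values in microseconds; not measurements from the talk.
processing = 300                    # undisturbed processing time per call
jitter = [0, 12, 5, 20]             # extra processing time caused by jitter
true_wait = [0, 400, 0, 100]        # actual waiting time per call

# What a profiler would observe: processing + jitter + waiting.
exec_times = [processing + j + w for j, w in zip(jitter, true_wait)]

# Minimum method: the smallest time is the estimated processing time.
minimum = min(exec_times)
est_wait = [t - minimum for t in exec_times]

# The estimate exceeds the true wait time by the accumulated jitter.
overestimate = sum(est_wait) - sum(true_wait)
```

Here the estimated wait time is 537 µs against a true wait time of 500 µs; the 37 µs difference is exactly the sum of the jitter values.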

Accuracy MPI_Waitall
[Figure: wait ratio for MPI_Waitall on JUROPA and JUQUEEN, Scalasca vs. minimum method; y-axis: wait ratio, 0 to 0.14; x-axis labels: x_solve, y_solve, z_solve, x_solve, y_solve, z_solve, bndry_3d, bndry_2d, solvers,pcg, commnct, copy_faces (four times)]

Static imbalance
[Figure: four calls, each consisting of processing time and wait time; the estimated processing time is marked and the estimated wait time is too small]

Static imbalance
• Calculating global minima could resolve process-local static imbalances
  • Reduction operation after measurement
  • No dilation at measurement time
  • Lose sender/receiver parameterization of minima
• For collective operations, global minima were better
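The difference between process-local and global minima can be sketched in Python as follows. This is a simulation without MPI, and every name and number in it is hypothetical. In an MPI tool the element-wise minimum could be computed after the measurement with a MIN reduction, for example MPI_Allreduce with MPI_MIN, which is an assumption about how one might do it rather than a description of the profiler.

```python
def reduce_minima(local_minima):
    """Element-wise minimum over the per-process minimum tables."""
    global_min = {}
    for table in local_minima:
        for key, value in table.items():
            global_min[key] = min(value, global_min.get(key, value))
    return global_min


def estimated_wait(visits, total, minimum):
    """Wait time estimate: total time minus visits times the minimum."""
    return total - visits * minimum


# Made-up minima in microseconds for one (MPI function, size class) key.
key = ("MPI_Recv", 11)
local_minima = [
    {key: 300},   # rank 0: its fastest call contained no waiting
    {key: 500},   # rank 1: every call waited at least 200 us
]
global_minima = reduce_minima(local_minima)

# Rank 1 made 3 visits with a total execution time of 1700 us.
wait_local = estimated_wait(3, 1700, local_minima[1][key])
wait_global = estimated_wait(3, 1700, global_minima[key])
```

With the process-local minimum, rank 1 reports only 200 µs of wait time, because the 200 µs it waits on every call are hidden inside its own minimum. With the global minimum the estimate rises to 800 µs.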

Accuracy for Wait at NxN
[Figure: wait ratio for Wait at NxN on JUROPA and JUQUEEN, Scalasca vs. minimum method; y-axis: wait ratio, 0 to 0.35; x-axis labels in order of appearance: get_max_recvs, solvers,pcg, tf_controle, glbl_int_sum, EP, trnspse_x_yz, glbl_int_sum]

Conclusion (1)
• Minimum method works for estimating wait states in blocking and non-blocking communication
  • For blocking communication, results are similar to Scalasca
  • For non-blocking communication, wait times in Waitall do not match the Scalasca analysis
• Low runtime overhead
  • No trace recording or piggybacking
• May not produce 100% accurate numbers, but
  • Sufficient accuracy to locate performance problems
  • Points to places where we might want to investigate further with trace analysis

Conclusion (2)
• Detection of a good minimum is crucial
  • Static imbalance
  • Tradeoff between number of parameters and number of samples
• Jitter may lead to a minor increase of the measured wait time
• For non-blocking communication
  • Count possible overlap time
  • Might be larger than pure Late Sender time
  • Isn't this an even more accurate estimate of the optimization potential?

Reference
Guoyong Mao, David Böhme, Marc-André Hermanns, Markus Geimer, Daniel Lorenz, Felix Wolf: Catching Idlers with Ease: A Lightweight Wait-State Profiler for MPI Programs. In: EuroMPI '14: Proc. of the 21st European MPI Users' Group Meeting, Kyoto, Japan, Sep. 9-12, 2014
