ParaStack: Efficient Hang Detection for MPI Programs at Large Scale

Hongbo Li, Zizhong Chen & Rajiv Gupta

Outline: Question, Solution, Evaluation

Question: Program Hang, Resource Wastage, Current Solution

Execution in Batch Mode
[Figure: process ID (0, 1, 2, …, i) vs. time; the marked bars show occupied supercomputer time.]
Processes communicate via message passing (MPI).

Program Hang Occurs
A program hang is a type of bug whose occurrence stalls the program's execution. The root cause can lie in a single process (e.g., incorrect thread-level synchronization or an infinite loop in process 0) or in all processes (e.g., a communication deadlock across all processes).
[Figure: process ID (0, 1, 2, …, i) vs. time.]

Hang Causes Resource Wastage
[Figure: process ID (0, 1, 2, …, i) vs. time at large scale, with the wasted resource time marked.]
Negative effect: significant resource wastage at large scale.

Solution: Hang Detection
[Figure: process ID (0, 1, 2, …, i) vs. time, with the detection delay and the saved time marked.]
Release resources when a hang is detected. Shorter detection delay → bigger saving.

Traditional Detection Method
Timeout is a commonly used method based on various metrics; e.g., IO-watchdog monitors how often a program writes.
Setting a good timeout is hard due to the following two dilemmas:
Small timeout → large savings, but too small a timeout → false alarms.
Large timeout → avoids false positives, but too large a timeout → large wastage.

Solution: Statistical Model, Two Problems

ParaStack
Unlike timeout methods, ParaStack does not guess a threshold without any information. It detects hangs based on runtime history.

Basic Concept
while (…) { user code; MPI_Function(); }
Definition: $S_{out} = N_{out} / N_{total}$, where $N_{out}$ denotes the number of processes executing inside user code (i.e., outside MPI functions) and $N_{total}$ denotes the total number of processes employed in the run.
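The ratio defined on this slide can be illustrated with a minimal sketch. It assumes a hypothetical snapshot in which each monitored process reports whether it is currently running user code; the function name `s_out` and the snapshot values are illustrative and do not come from the slides.

```python
from typing import Sequence


def s_out(in_user_code: Sequence[bool]) -> float:
    """Fraction of monitored processes currently executing user code."""
    if not in_user_code:
        raise ValueError("need at least one monitored process")
    # True counts as 1, so the sum is the number of processes in user code.
    return sum(in_user_code) / len(in_user_code)


# Hypothetical snapshot of 10 monitored processes (True = inside user code).
snapshot = [True, False, False, True, False, False, False, True, False, False]
print(s_out(snapshot))  # 0.3
```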

Dynamic Variation of $S_{out}$
[Figure: three panels (LU, FT, SP), each plotting $S_{out}$ against the running timeline.]
A snippet of $S_{out}$ variation obtained by sampling at 1-millisecond intervals.

When a Hang Occurs
[Figure: $S_{out}$ vs. running timeline.] $S_{out}$ variation of a faulty LU run, where a fault is simulated by a very long sleep injected at the left border of the red region.
A program hang is characterized by two features: (1) very small $S_{out}$ and (2) consecutive observations of (1).

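Suspicion
$\hat{F}(S_{out})$ is the empirical cumulative distribution function obtained from randomly sampling $S_{out}$. Given a probability $\hat{p}$, we obtain the threshold $t = \hat{F}^{-1}(\hat{p})$ and classify each observed value of $S_{out}$ into one of a pair of opposite random events: suspicion ($S_{out} \le t$) and non-suspicion ($S_{out} > t$).
Feature 1: small $S_{out}$.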

Significance Test of Hang
Geometric distribution. The probability of observing $Y = k$ suspicions before the first occurrence of a non-suspicion is $P(Y = k) = q^k (1 - q)$, where $q$ estimates the true suspicion probability $p$.
Given the confidence level $1 - \alpha$, we claim a hang is detected if $P(Y \ge k) = q^k \le \alpha$.
Put simply: something is very likely wrong when a very rare event occurs.
Feature 1+2: consecutively small $S_{out}$.
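The significance test can be sketched as follows: given a suspicion probability estimate and a significance level, find how many consecutive suspicions make the tail probability small enough. The function names and the sample values of `q` and `alpha` are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def min_consecutive_suspicions(q: float, alpha: float) -> int:
    """Smallest k such that q**k <= alpha."""
    if not (0.0 < q < 1.0) or not (0.0 < alpha < 1.0):
        raise ValueError("q and alpha must lie strictly between 0 and 1")
    k = 1
    while q ** k > alpha:
        k += 1
    return k


def hang_detected(suspicious: Sequence[bool], q: float, alpha: float) -> bool:
    """True if the trailing run of suspicions is rare enough to claim a hang."""
    run = 0
    for is_suspicion in reversed(suspicious):
        if not is_suspicion:
            break
        run += 1
    return run > 0 and q ** run <= alpha


# With q = 0.5 and alpha = 0.001, ten consecutive suspicions are required.
print(min_consecutive_suspicions(0.5, 0.001))  # 10
```

A smaller estimate `q` shortens the required run, which is why an underestimated suspicion probability makes the test trigger too early.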

The Whole Picture
[Figure: overview diagram of the detection model; its labels are not recoverable.]

Two Problems with the Model
(1) How to achieve random sampling?
(2) The observed suspicion probability ($\hat{p}$) doesn't reflect the truth ($p$), i.e., $\hat{p} \ne p$.

Random Sampling
Insert a random time step between two consecutive samplings; the step consists of a random component bounded by the interval $I$ plus a fixed fraction of $I$.
Too small $I$ → lack of randomness; bigger $I$ → better randomness.
[Figure: $S_{out}$ vs. running timeline, with sample points marked ✗ (lack of randomness) or ✓ (better randomness).]
Solution: use the runs test to check the randomness of the sample sequence, and double $I$ whenever the sequence is found to lack randomness, until randomness is assured.

Random Sampling (Cont.)
Runs test: a standard test that checks the randomness of a two-valued data sequence.
Procedure of the runs test:
1) calculate the average of the sample sequence;
2) denote values bigger than the average as (+) and those smaller than the average as (-);
3) count the number of runs ($R$), where a run is defined as a series of consecutive (+) or consecutive (-);
4) too small or too large $R$ → the sequence lacks randomness (significance test).

Random Sampling (Cont.)
Example. We have the sample sequence
0.2 0.1 0.1 0.2 0.1 0.1 0.0 0.0 0.8 0.9 1.0 0.8 0.9 0.1 0.9 0.9,
which can be transformed into
- - - - - - - - + + + + + - + +
Its average is 0.44375, the non-rejection region at 95% confidence is (4, 14), and $R = 4$. As $R$ is outside the non-rejection region, we claim the sampling is not random and thus double $I$.
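The runs test on the example sequence can be sketched as below. Note one deliberate substitution: the slide uses a tabulated non-rejection region, whereas this sketch uses the large-sample normal approximation of the runs test with a two-sided critical value of 1.96, so borderline cases may be decided differently. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def count_runs(samples: Sequence[float]) -> Tuple[int, int, int]:
    """Return (number of runs, count of '+', count of '-') around the mean."""
    mean = sum(samples) / len(samples)
    # Values equal to the mean are dropped, as is usual for the runs test.
    signs: List[bool] = [x > mean for x in samples if x != mean]
    if not signs:
        return 0, 0, 0
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n_plus = sum(signs)
    return runs, n_plus, len(signs) - n_plus


def looks_random(samples: Sequence[float], z_crit: float = 1.96) -> bool:
    """Normal-approximation runs test (an assumption, not the slide's table)."""
    runs, n1, n2 = count_runs(samples)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return False  # all values on one side of the mean
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0.0:
        return False  # too few samples to judge
    return abs((runs - mu) / math.sqrt(var)) < z_crit


seq = [0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.0, 0.0,
       0.8, 0.9, 1.0, 0.8, 0.9, 0.1, 0.9, 0.9]
print(count_runs(seq))    # (4, 7, 9)
print(looks_random(seq))  # False
```

In a sampler built along these lines, a `False` result would be the trigger for doubling the sampling interval and collecting a fresh sequence.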

$\hat{p} \ne p$
The difference ($e$) between the observed probability ($\hat{p}$) and the true probability ($p$) is closely related to the sample size $n$.
Solution: we estimate $|p - \hat{p}| \le e$ at different sample size levels with high confidence (95%):
$\hat{p} = 0.47$, $e = 0.3$ when $11 \le n < 19$
$\hat{p} = 0.27$, $e = 0.2$ when $19 \le n < 42$
$\hat{p} = 0.12$, $e = 0.1$ when $42 \le n < 86$
$\hat{p} = 0.06$, $e = 0.05$ when $86 \le n$
At each level, we use a different credible $\hat{p}$ to define what a suspicion is ($S_{out} \le \hat{F}^{-1}(\hat{p})$).
Put simply: the difference gets smaller as the sample size increases.
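The slide does not state how the error bounds were derived. As a sanity check, the sketch below computes the 95% normal-approximation (Wald) bound for a proportion at the smallest sample size of each level. Treating the slide's values as coming from this formula is an assumption, although the numbers agree closely.

```python
import math


def error_bound(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation half-width of a 95% interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# (observed probability, smallest sample size of the level, bound on the slide)
levels = [(0.47, 11, 0.3), (0.27, 19, 0.2), (0.12, 42, 0.1), (0.06, 86, 0.05)]

for p_hat, n, e_slide in levels:
    print(f"p_hat={p_hat:.2f} n={n:3d} "
          f"computed={error_bound(p_hat, n):.3f} slide={e_slide}")
```

Because the bound shrinks with the square root of the sample size, a tighter threshold only becomes credible once enough samples have been collected.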

$\hat{p} \ne p$ (Cont.)
$|p - \hat{p}| \le e$ is not enough, as underestimating $p$, i.e., $\hat{p} < p$, leads to false positives.
Given $\hat{p} < p$, $\hat{p}^k$ converges to the significance level $\alpha$ faster than $p^k$ (the probability that a program is still healthy) as $k$ increases → more false positives.
We use $q = \hat{p} + e$ as an estimate of $p$ in the calculation of the hang probability ($q^k$), which guarantees that $q \ge p$ with 97.5% confidence.

Evaluation

Goal
Trivial overhead; high accuracy and low false positive rate; short detection delay (ParaStack > Timeout); enable resource saving when a hang occurs.

Evaluation Setting
Fault injection: a hang is simulated by injecting a long enough sleep() in either source code or binary.
Target programs: HPL, HPCG, and the NPB benchmark set.
ParaStack's default setting: 10 randomly selected processes are monitored. The significance level is $\alpha = 0.1\%$. The initial maximal sampling interval is set to $I = 400$ ms.

Evaluation Setting (Cont.)
Number of hang-injected runs using default ParaStack, on the platforms Tardis, Tianhe-2, and Stampede:
Scale 256: 800+, 20+
Scale 1024: 300+, 100+
Scale 4096: 50
Scale 8192: 5
Scale 16384: 3
Notations used:
AC: accuracy
FP: false positive rate
D: average delay
S: standard deviation of delays

Overhead, Accuracy & False Alarms
Overhead: measured at scale 1024 with 5 runs of each program, with the automatic adaptation of $I$ disabled.
Accuracy: over 99% on average, for 100 runs of each program.
No false alarm reported in:
- 39.7 hours of hang-free runs at scale 1024
- 66 hours of hang-free runs at scale 256
- all hang-injected runs

ParaStack vs. Timeout
10 runs per setting, 256 processes.
Timeout baseline: a hang is claimed to be found upon K consecutive observations of $S_{out} \le 0$ sampled at a fixed interval I. Like ParaStack, it samples only 10 processes to keep the overhead trivial.
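The timeout baseline described on this slide can be sketched as a simple counter over samples taken at the fixed interval. The function name and the sample values are illustrative; the values of K and I used in the evaluation are not given in this transcript.

```python
from typing import Iterable


def timeout_hang_detected(s_out_samples: Iterable[float], k: int) -> bool:
    """Claim a hang after k consecutive samples with S_out <= 0."""
    if k < 1:
        raise ValueError("k must be at least 1")
    run = 0
    for s in s_out_samples:
        # Any sample with a process in user code resets the counter.
        run = run + 1 if s <= 0 else 0
        if run >= k:
            return True
    return False


# Hypothetical sample stream: activity, then three idle observations.
print(timeout_hang_detected([0.3, 0.0, 0.2, 0.0, 0.0, 0.0], k=3))  # True
```

Unlike the statistical test, the threshold here is fixed in advance, so the product of K and I plays the role of the hand-tuned timeout.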

ParaStack vs. Timeout (Cont.)
10 runs per setting, 256 processes.
Settings of ParaStack:
P: ParaStack initializing $I$ as 400 ms.
P*: ParaStack initializing $I$ as 10 ms, which doesn't deliver random sampling.
P* compares well with P, as ParaStack is able to automatically adjust $I$ to ensure a good model.

Detection Delay
Median detection delay (in seconds), based on 100 runs per setting at scale 256:
BT 4, CG 6, LU 3, SP 3, FT 13, MG 3, HPL 4, HPCG 5

Detection Delay (Cont.)
Delay on Tianhe-2 with 50 runs per setting. Delay on Stampede with 20 runs per setting at scale 1024 and 10 runs per setting at scale 4096.
ParaStack detects hangs in a few seconds, which is far less than the commonly used 1-minute timeout.

Timesaving
[Figure: bar chart of saved time (%) for hangs 1 to 10; bar values as extracted: 88.7%, 59.2%, 55.5%, 44.8%, 33.5%, 27.5%, 24.0%, 10.0%, 11.3%, 0.0%.]
10 faulty HPL runs, with the hang's occurrence uniformly distributed over the program execution.
On average: 35.5% time saving.

Thank you! Any questions?
