ADEPT Scalability Predictor in Support of Adaptive Resource Allocation
Arash Deshmeh, Jacob Machina, and Angela C. Sodan
University of Windsor, Canada
IPDPS 2010
Outline
• Background: Adaptive Resource Allocation
• Related Work
• Downey Runtime/Speedup Model
• The ADEPT Predictor
• Experimental Results
• Anomaly Detection
• Automated Reliability Judgment
• Summary and Conclusion
Background: Adaptive Resource Allocation
• Adaptive resource allocation improves average response times by up to 70% by:
  – reducing fragmentation
  – adapting to the current load (low/high)
• 98% of applications are said to be moldable
• Requires knowing the jobs' scalability/efficiency, which is not yet practically available
  – In fact, scalability is a response-time function of the CPU/core resources (Burton Smith)
Illustration of Adaptive Resource Allocation
[Figure: two panels, "Fragmentation reduction" (Job 2 with original size 10) and "Adaptation to current load" (ideal vs. real speedup over size N, with N_min, N_opt, and N_max marked)]
• Run at higher efficiency with smaller sizes under high load
• Run at lower efficiency with larger sizes under low load
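A minimal sketch of the size choice illustrated above, assuming a speedup predictor is already available: under high load, pick the largest size that still runs at high efficiency; under low load, accept lower efficiency in exchange for a larger size. The function `choose_job_size`, the efficiency thresholds 0.75 and 0.45, and the Amdahl-style stand-in `amdahl` (serial fraction 0.01) are hypothetical illustrations, not part of ADEPT.

```python
def choose_job_size(speedup, n_min, n_max, high_load,
                    eff_high_load=0.75, eff_low_load=0.45):
    """Largest size in [n_min, n_max] whose efficiency S(n)/n meets the
    load-dependent threshold (hypothetical values); n_min if none does."""
    threshold = eff_high_load if high_load else eff_low_load
    best = n_min
    for n in range(n_min, n_max + 1):
        if speedup(n) / n >= threshold:
            best = n
    return best


def amdahl(n):
    """Hypothetical stand-in for a predicted curve: Amdahl speedup, serial fraction 0.01."""
    return n / (1 + 0.01 * (n - 1))


print(choose_job_size(amdahl, 1, 128, high_load=True))   # 34
print(choose_job_size(amdahl, 1, 128, high_load=False))  # 123
```

The same predicted curve yields a small, efficient allocation when the machine is busy and a larger, less efficient one when it is idle.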
More Background
• Benefits for users:
  – help in choosing job sizes tactically
  – determine maximum meaningful job sizes (cf. our data on real applications)
• Relevance for resource allocation in:
  – clusters (MPI jobs)
  – SMPs (OpenMP or MPI jobs)
  – virtual-machine resource provisioning
Related Work
• Most approaches are white-box (detailed model)
  – Require tools: code instrumentation, compiler/OS support, analysis of memory-access behavior, etc.
  – Complex and computationally expensive
  – Unsuitable for large-scale use in HPC centers
  – Valuable for cross-site or new-platform performance projection
• Black-box approaches (few observation points, simple model)
  – Easy to use and cheap
  – Suffer from anomalies or non-uniform scalability patterns
Goals of ADEPT Scalability Predictor
• Achieve high prediction accuracy
• Provide a computationally efficient approach
• Detect and automatically correct individual anomalies
• Detect and model non-uniform (multi-phase) patterns
• Perform reliability judgment, with potential advice for improving the outcome
• Apply black-box prediction based on the Downey runtime/speedup model
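Downey Model
Speedup S(n) and normalized runtime T(n) = A/S(n) (so T(1) = A), with A the average parallelism and σ the variance in parallelism.
Low variance (σ ≤ 1):
$$
S(n) = \begin{cases}
\dfrac{An}{A + \frac{\sigma}{2}(n-1)} & 1 \le n \le A \\[6pt]
\dfrac{An}{\sigma\left(A - \frac{1}{2}\right) + n\left(1 - \frac{\sigma}{2}\right)} & A \le n \le 2A-1 \\[6pt]
A & n \ge 2A-1
\end{cases}
\qquad
T(n) = \begin{cases}
\dfrac{A - \sigma/2}{n} + \dfrac{\sigma}{2} & 1 \le n \le A \\[6pt]
\dfrac{\sigma\left(A - \frac{1}{2}\right)}{n} + 1 - \dfrac{\sigma}{2} & A \le n \le 2A-1 \\[6pt]
1 & n \ge 2A-1
\end{cases}
$$
High variance (σ ≥ 1):
$$
S(n) = \begin{cases}
\dfrac{nA(\sigma+1)}{\sigma(n + A - 1) + A} & 1 \le n \le A + A\sigma - \sigma \\[6pt]
A & n \ge A + A\sigma - \sigma
\end{cases}
\qquad
T(n) = \begin{cases}
\dfrac{\sigma + (A + A\sigma - \sigma)/n}{\sigma + 1} & 1 \le n \le A + A\sigma - \sigma \\[6pt]
1 & n \ge A + A\sigma - \sigma
\end{cases}
$$
[Figure: "Speedup Curves, σ varies", "Speedup Curves, A varies", and "Speedup curves for Downey model and a typical application" (speedup vs. n up to 400; curve shapes labeled Linear, Transitional, Flat, Declining)]
• Simple (only A and σ need to be learned)
• Needs few observation points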
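The Downey table translates directly into code. Below is a minimal Python sketch; the function names are hypothetical. `downey_runtime` uses the normalization T(n) = A/S(n), which reproduces every T(n) entry of the table.

```python
def downey_speedup(n, A, sigma):
    """Downey speedup S(n) for average parallelism A and variance parameter sigma."""
    if sigma <= 1:  # low-variance model
        if n <= A:
            return A * n / (A + (sigma / 2) * (n - 1))
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2))
        return float(A)  # saturated at the average parallelism
    # high-variance model (sigma > 1)
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return float(A)


def downey_runtime(n, A, sigma):
    """Normalized runtime T(n) = A / S(n): T(1) = A, and T(n) = 1 once S(n) = A."""
    return A / downey_speedup(n, A, sigma)


# Example: speedup and normalized runtime for A = 64, sigma = 0.5.
for n in (1, 16, 64, 128, 256):
    print(n, round(downey_speedup(n, 64, 0.5), 2), round(downey_runtime(n, 64, 0.5), 2))
```

Because only A and σ are free, two or three observations already constrain the whole curve, which is what makes the model attractive for black-box prediction.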
ADEPT Predictor
1. Anomaly detection and scalability-pattern identification
2. Envelope derivation (core of ADEPT)
3. Curve fitting (core of ADEPT)
4. Reliability judgment
Core: Envelope Derivation
• Derives constraints from the observations
• Calculates closed-form solutions (within a certain percentage of deviation) from pairs of observations
• Uses the lowest and highest bounds as the overall envelope
[Figure: "Forming the Envelope", speedup S vs. N (0 to 400), showing the ranges obtained from Range Pair 1, 2, and 3]
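The following sketch shows pairwise envelope derivation under a strong simplifying assumption: all observations lie in the first low-variance regime (1 ≤ n ≤ A). In that regime, the Downey runtime scaled by an unknown constant c (T(1) may be unavailable) reduces to T(n) = a/n + b, with a = c(A − σ/2) and b = cσ/2. Each pair of observations then has a closed-form solution. Perturbing each observation by ±`dev` and keeping the lowest and highest predictions gives an envelope. The names `fit_pair`, `envelope`, and `dev` are hypothetical, and this is not ADEPT's actual constraint derivation.

```python
from itertools import combinations, product


def fit_pair(p1, p2):
    """Closed-form T(n) = a/n + b through two (n, t) observations with distinct n."""
    (n1, t1), (n2, t2) = p1, p2
    a = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    b = t1 - a / n1
    return a, b


def envelope(observations, targets, dev=0.05):
    """Return {n: (low, high)} predicted-runtime bounds for each target size.

    Every pair of observations, with each runtime scaled by (1 - dev) or
    (1 + dev), yields one closed-form curve; the envelope keeps the lowest
    and highest prediction over all pairs and all deviation combinations.
    """
    bounds = {n: (float("inf"), float("-inf")) for n in targets}
    for (n1, t1), (n2, t2) in combinations(observations, 2):
        for f1, f2 in product((1 - dev, 1 + dev), repeat=2):
            a, b = fit_pair((n1, t1 * f1), (n2, t2 * f2))
            for n in targets:
                t = a / n + b
                lo, hi = bounds[n]
                bounds[n] = (min(lo, t), max(hi, t))
    return bounds


obs = [(4, 27.0), (8, 14.5), (16, 8.25)]  # exactly T(n) = 100/n + 2
print(envelope(obs, [32], dev=0.0))   # {32: (5.125, 5.125)}
print(envelope(obs, [32, 64], dev=0.05))
```

With noise-free data, all pairs agree and the envelope collapses to a single value. Allowing deviation widens it, and the width grows with the distance from the observations.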
Core: Curve Fitting
• Prediction per target point, biased toward the closest observations
• Weighted least-squares fit of relative errors
• Two steps:
  1. Closest point fixed
  2. Variation extended by a certain percentage within the envelope
• The constraints from the envelope and the two-step curve fitting make ADEPT both accurate and fast
[Figure: "Speedup Prediction Using 4 Methods", speedup S vs. N (0 to 500); legend: Levmar; ADEPT / Exhaustive / Genetic]
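Below is a sketch of per-target, biased, two-step fitting. It assumes the envelope has already been reduced to bounds on A (`a_bounds`) and σ (`sigma_bounds`). Several choices here are hypothetical and are not the authors' implementation:
• the weighting 1/(1 + |log2(n/target)|)
• the plain grid search
• reading "closest point fixed" as a tight tolerance `tol` on the closest observation, relaxed by `widen` in step 2

```python
import math


def downey_speedup(n, A, sigma):
    """Downey speedup S(n) (low-variance branch for sigma <= 1, high-variance otherwise)."""
    if sigma <= 1:
        if n <= A:
            return A * n / (A + (sigma / 2) * (n - 1))
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2))
        return float(A)
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return float(A)


def grid(lo, hi, steps):
    """steps + 1 evenly spaced values from lo to hi, inclusive."""
    return [lo + (hi - lo) * k / steps for k in range(steps + 1)]


def weighted_rel_error(obs, weights, A, sigma):
    """Weighted sum of squared relative errors against (n, speedup) observations."""
    return sum(w * ((downey_speedup(n, A, sigma) - s) / s) ** 2
               for (n, s), w in zip(obs, weights))


def search(obs, weights, closest, a_range, s_range, limit, steps):
    """Grid search over (A, sigma), keeping only fits within `limit` of the closest point."""
    n_c, s_c = closest
    best = None
    for A in grid(*a_range, steps):
        for sigma in grid(*s_range, steps):
            if abs(downey_speedup(n_c, A, sigma) - s_c) > limit * s_c:
                continue
            err = weighted_rel_error(obs, weights, A, sigma)
            if best is None or err < best[0]:
                best = (err, A, sigma)
    return best


def fit_for_target(obs, target, a_bounds, sigma_bounds=(0.0, 2.0),
                   tol=0.01, widen=0.10, steps=40):
    """Return (error, A, sigma) fitted for one target size, or None if nothing fits."""
    def dist(n):
        return abs(math.log2(n / target))  # distance in doublings

    weights = [1.0 / (1.0 + dist(n)) for n, _ in obs]  # bias toward the target
    closest = min(obs, key=lambda p: dist(p[0]))
    # Step 1: closest observation (nearly) fixed, whole envelope searched.
    first = search(obs, weights, closest, a_bounds, sigma_bounds, tol, steps)
    if first is None:
        return search(obs, weights, closest, a_bounds, sigma_bounds, tol + widen, steps)
    # Step 2: extend the variation by `widen` around the step-1 fit, clamped to the envelope.
    _, a0, s0 = first
    a_range = (max(a_bounds[0], a0 * (1 - widen)), min(a_bounds[1], a0 * (1 + widen)))
    s_range = (max(sigma_bounds[0], s0 * (1 - widen)), min(sigma_bounds[1], s0 * (1 + widen)))
    second = search(obs, weights, closest, a_range, s_range, tol + widen, steps)
    return min(r for r in (first, second) if r is not None)


# Usage with synthetic observations generated from the model (A = 24, sigma = 0.5).
obs = [(n, downey_speedup(n, 24, 0.5)) for n in (4, 8, 16, 32)]
err, A, sigma = fit_for_target(obs, target=24, a_bounds=(16, 56))
print(err, A, sigma, downey_speedup(24, A, sigma))
```

The observations here come from the model itself, so the fit reproduces them exactly. With real data, near-ties between very different values of A are what a runner-up check in the reliability judgment is meant to catch.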
Experimental Set-Up
• Experiments with MPI and OpenMP
• NAS benchmarks BT, CG, FT, LU, SP
• 7 real anonymous applications (from administrators' scalability tests)
• Both interpolation and extrapolation
• 3 to 4 input observation points
• Prediction of T(n) and S(n)
• T(1) not always available
Experimental Results: Speedup
[Figure: speedup vs. number of nodes for NAS_FT, NAS_OMP_BT, NAS_OMP_CG, App_A, App_E, and App_F, comparing standard biased-weighting predictions with uniform-weighting predictions]
• The applied (biased-weighting) fitting approach performs better than non-weighted (uniform) fitting
• Both interpolation and extrapolation work well
• Most extrapolations are still good at twice the number of nodes
• Accuracy is higher for closer extrapolation
Experimental Results: Runtime
[Figure: runtime (log scale) vs. number of nodes for NAS_BT, NAS_CG, NAS_FT, App_B, App_D, and App_E]
• Both interpolation and extrapolation work well
• Whether T(1) was available made no difference
• Some predictions match perfectly (App_A, App_C, App_G)
• Accuracy is higher for closer extrapolation
Anomaly Detection
• Serious deviations from the model can be detected (an application never fully conforms to the model)
• Approach: fluctuation metric R (the idea is relative speedup, normalized to distance):
$$R_i = \frac{t_i \, n_i / n_{i+1}}{t_{i+1}} \left(1 + \frac{n_{i+1} - n_i}{n_{i+1}}\right)$$
• Check whether $R_{i+1} > (1+\epsilon) R_i$, with $\epsilon$ being a sensitivity factor; if so, both $R_{i+1}$ and $R_i$ are anomaly candidates
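The fluctuation metric and the candidate check can be computed directly from consecutive (n_i, t_i) observations sorted by n. In this sketch, `eps` stands for the sensitivity factor, and its default of 0.2 is a hypothetical value. Because R_i combines observations i and i+1, a flagged pair R_i, R_{i+1} points at their shared observation i+1.

```python
def fluctuation_metric(ns, ts):
    """R_i for consecutive observations (n_i, t_i) sorted by n: the relative
    speedup t_i / t_{i+1} divided by n_{i+1} / n_i, times the distance term."""
    return [((ts[i] * ns[i] / ns[i + 1]) / ts[i + 1])
            * (1 + (ns[i + 1] - ns[i]) / ns[i + 1])
            for i in range(len(ns) - 1)]


def anomaly_candidates(ns, ts, eps=0.2):
    """Return (R values, flagged R indices): R_i and R_{i+1} are flagged
    whenever R_{i+1} > (1 + eps) * R_i."""
    r = fluctuation_metric(ns, ts)
    flagged = set()
    for i in range(len(r) - 1):
        if r[i + 1] > (1 + eps) * r[i]:
            flagged.update((i, i + 1))
    return r, sorted(flagged)


ns = [4, 8, 16, 32, 64]
print(anomaly_candidates(ns, [16, 8, 4, 2, 1]))  # ([1.5, 1.5, 1.5, 1.5], [])
print(anomaly_candidates(ns, [16, 8, 8, 2, 1]))  # ([1.5, 0.75, 3.0, 1.5], [1, 2])
```

With linear speedup on doubling node counts, every R_i equals 1.5. An abnormally slow 16-node run produces a dip followed by a jump, and the jump is flagged at R indices 1 and 2.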
Individual Anomalous Points
[Figure: "Speedup curve, with anomalous point" and its R metric curves; examples "Anomaly, NAS_SP", "Anomaly, NAS_OMP_SP", and "Anomaly, Synthetic"]
• A minimum of 4 input points is required
• Check the R curve after removing the anomaly candidate
• If it improves, classify the candidate as an anomalous point and reduce its weight for curve fitting
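Here is one hedged way to implement the removal check:
1. Remove each observation entering a flagged pair R_i, R_{i+1} in turn.
2. Recompute the R curve without it.
3. Accept the removal that best reduces the curve's largest upward jump, if it improves on the original.

The "roughness" measure (largest ratio R_{i+1}/R_i), `eps`, and the reduced weight 0.1 are hypothetical choices. The slide only states that the anomalous point's weight is reduced.

```python
def fluctuation_metric(ns, ts):
    """R_i for consecutive observations (n_i, t_i) sorted by n."""
    return [((ts[i] * ns[i] / ns[i + 1]) / ts[i + 1])
            * (1 + (ns[i + 1] - ns[i]) / ns[i + 1])
            for i in range(len(ns) - 1)]


def roughness(ns, ts):
    """Largest upward jump R_{i+1} / R_i of the R curve (hypothetical measure)."""
    r = fluctuation_metric(ns, ts)
    return max(r[i + 1] / r[i] for i in range(len(r) - 1))


def downweight_anomaly(ns, ts, eps=0.2, low_weight=0.1):
    """Return per-point fitting weights; a detected anomaly gets low_weight."""
    weights = [1.0] * len(ns)
    if len(ns) < 4:  # the slide requires at least 4 input points
        return weights
    r = fluctuation_metric(ns, ts)
    candidates = set()
    for i in range(len(r) - 1):
        if r[i + 1] > (1 + eps) * r[i]:
            candidates.update((i, i + 1, i + 2))  # points entering R_i and R_{i+1}
    if not candidates:
        return weights

    def without(k):
        """Roughness of the R curve with observation k removed."""
        return roughness(ns[:k] + ns[k + 1:], ts[:k] + ts[k + 1:])

    best = min(candidates, key=without)
    if without(best) < roughness(ns, ts):  # R curve improves: classify as anomaly
        weights[best] = low_weight
    return weights


print(downweight_anomaly([4, 8, 16, 32, 64], [16, 8, 8, 2, 1]))  # [1.0, 1.0, 0.1, 1.0, 1.0]
```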
Anomaly Patterns
[Figure: "Stepwise NAS_OMP_FT", "Stepwise NAS_OMP_FT, Fitted", "Stepwise Synthetic, Fitted", and "Specially Optimized for 2^n Nodes, Fitted"]
Currently considered:
• Stepwise scalability (minimum of 5 points required)
  – One model instance per phase
• Specially optimized for certain numbers of nodes, e.g., powers of two (minimum of 9 points required), i.e., regular anomalous points
  – Omit the other points from curve fitting
  – Report suitable allocations
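The following sketch handles the powers-of-two pattern, assuming speedup observations sorted by n. Under a hypothetical criterion, the pattern is accepted when no non-power-of-two size beats the preceding power of two. Only the power-of-two points are then kept for fitting and reported as suitable allocations. The 9-point minimum follows the slide; `power_of_two_pattern` and its criterion are illustrative.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def power_of_two_pattern(obs, min_points=9):
    """obs: (n, speedup) pairs sorted by n.

    If no non-power-of-two size is faster than the preceding power of two,
    return (points_to_fit, suitable_allocations); otherwise return None.
    """
    if len(obs) < min_points:
        return None
    pow2 = [(n, s) for n, s in obs if is_power_of_two(n)]
    if len(pow2) == len(obs) or len(pow2) < 2:
        return None  # nothing to omit, or too few points left to fit
    last_pow2_speedup = None
    for n, s in obs:
        if is_power_of_two(n):
            last_pow2_speedup = s
        elif last_pow2_speedup is not None and s >= last_pow2_speedup:
            return None  # a non-power-of-two size performs well: no pattern
    return pow2, [n for n, _ in pow2]


ns = [4, 6, 8, 12, 16, 24, 32, 48, 64]
sp = [3.6, 3.0, 7.2, 6.0, 14.4, 12.0, 28.8, 24.0, 57.6]
fit_points, allocations = power_of_two_pattern(list(zip(ns, sp)))
print(allocations)  # [4, 8, 16, 32, 64]
```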
Automated Reliability Judgment
• All input points in the linear section
  – More input points needed (n ≥ A)
• High fitting error, not explainable as an anomaly
  – Report the problem
• Runner-up problem (two or more model instances with greatly different A match)
  – More input points needed (beyond the current range)
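The three checks can be expressed as simple rules over the fitted model instances. In this sketch, `fits` is a list of (error, A, σ) candidates sorted by error. The following are hypothetical thresholds, not values from ADEPT:
• an error limit of 0.1
• a 20% error slack that defines a "matching" runner-up
• a factor of 2 for a "greatly different" A
• max(n) < A as the all-linear test

```python
def reliability_warnings(ns, fits, anomaly_explained=False,
                         error_limit=0.1, error_slack=0.2, a_ratio=2.0):
    """Return warning strings for one prediction.

    ns   -- node counts of the input observations
    fits -- candidate model instances as (error, A, sigma), best first
    """
    warnings = []
    best_err, best_a, _ = fits[0]
    # All inputs below A: only the (near-)linear section has been observed.
    if max(ns) < best_a:
        warnings.append("ALL_LINEAR: more input points needed (n >= A)")
    # Large residual that the anomaly detection could not explain.
    if best_err > error_limit and not anomaly_explained:
        warnings.append("HIGH_FIT_ERROR: report problem")
    # Another instance fits almost as well but with a very different A.
    for err, a, _ in fits[1:]:
        similar_error = err <= best_err * (1 + error_slack)
        if similar_error and max(a, best_a) / min(a, best_a) >= a_ratio:
            warnings.append("RUNNER_UP: more input points needed beyond current range")
            break
    return warnings


print(reliability_warnings([16, 64, 128], [(0.02, 40, 1.2), (0.021, 120, 0.8)]))
# ['RUNNER_UP: more input points needed beyond current range']
```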
Automated Reliability Judgment (2)
[Figure: "Runner-Up Model Instance, NAS_SP", "All Linear Speedup, App_C", and "High Fitting Error, NAS_LU"]
• All 3 cases (linear, high fitting error, runner-up) were successfully detected
Summary and Conclusion
• ADEPT is accurate and efficient
  – for both interpolation and extrapolation (if not too far away)
  – works well without the serial time T(1)
  – performance similar to that reported in the literature for white-box approaches
• Employs an envelope derivation technique to constrain the search during model fitting
• Biased model fitting with an efficient two-level approach
• Anomaly detection based on a fluctuation metric, with automatic correction
• Warnings from reliability judgment if the prediction is uncertain
• Suitable for production environments
  – extrapolative scalability prediction as feedback to users
  – adaptive resource allocation