Benchmark Design for Robust Profile-Directed Optimization
SPEC Workshop 2007
Paul Berube and José Nelson Amaral, University of Alberta
NSERC · Alberta Ingenuity · iCore
January 21, 2007
In this talk
• SPEC: SPEC CPU
• PDF: offline, profile-guided optimization
• Test: evaluate
• Data/Inputs: program input data
PDF in Research
• SPEC benchmarks and inputs are used, but the rules are seldom followed exactly
  – PDF will continue regardless of its admissibility in reported results
• Some degree of profiling is taken as a given in much recent compiler and architecture work
An Opportunity to Improve
• No PDF for base in CPU2006
  – An opportunity to step back and reconsider
• The current evaluation methodology for PDF is not rigorous
  – Dictated by the inputs and rules provided with SPEC CPU
  – Usually followed when reporting PDF research
Current Methodology — Static optimization
[Diagram: flag tuning drives the optimizing compiler; the resulting binary is tested on input.ref to produce peak_static]
Current Methodology — PDF optimization
[Diagram: an instrumenting compiler builds a training binary; a training run on input.train produces a profile; the optimizing compiler (with flag tuning) consumes the profile; the PDF binary is tested on input.ref to produce peak_pdf]
The reported peak is then the better of the two builds:
  if (peak_pdf > peak_static)
    peak := peak_pdf;
  else
    peak := peak_static;
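The selection rule on the slide can be sketched in a few lines (the function name and the example scores are mine, purely for illustration):

```python
def select_peak(peak_static, peak_pdf):
    """Current SPEC practice per the slide: report whichever
    peak score is higher, the PDF build or the static build."""
    if peak_pdf > peak_static:
        return peak_pdf
    return peak_static

# Hypothetical scores: the PDF build wins the peak slot here.
print(select_peak(41.2, 42.7))  # prints 42.7
```

The talk's point is that this single max() over one train/test pairing hides how sensitive peak_pdf is to the choice of inputs.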
Current Methodology — open questions
• Is this comparison sound?
  – (peak_pdf > peak_static)
  – (peak_pdf > other_pdf)
• Does 1 training input and 1 test input predict PDF performance?
• Variance between inputs can be larger than reported improvements!
bzip2 — Train on xml, speedup vs. static (combined)
[Chart: per-input speedups for bzip2 trained on xml, ranging from about -6% to +12% across the inputs compressed, docs, gap, graphic, jpeg, log, mp3, mpeg, pdf, program, random, reuters, source, and xml; the spread between inputs exceeds 14%]
PDF is like Machine Learning
• Complex parameter space
• Limited observed data (training)
• Adjust parameters to match the observed data
  – i.e., maximize expected performance
Evaluation of Learning Systems
• Must take sensitivity to training and evaluation inputs into account
  – PDF specializes code according to the training data
  – Changing inputs can greatly alter performance
• Performance results must carry statistical-significance measures
  – Differentiate between real gains/losses and noise
Overfitting
• Specializing for the training data too closely
• Exploiting particular properties of the training data that do not generalize
• Causes:
  – insufficient quantity of training data
  – insufficient variation among the training data
  – a deficient learning system
Overfitting — current practice
✗ The compiler is engineered not to overfit the single training input (underfitting)
✗ No clear rules for input selection
✗ Some benchmark authors replicate data between train and ref
• Overfitting can be rewarded!
Criteria for Evaluation
• Predict expected future performance
• Measure performance variance
• Do not reward overfitting
• These are the same evaluation criteria as in ML
  – Cross-validation addresses them
Cross-Validation
• Split a collection of inputs into two or more non-overlapping sets
• Train on one set, test on the other set(s)
• Repeat, using a different set for training
[Diagram: the input pool partitioned into a train set and a test set]
Leave-One-Out Cross-Validation
• If there is little data, reduce the test set to 1 input
  – Leave N out: only N inputs in the test set
[Diagram: one input held out for testing, the rest used for training]
Cross-Validation — properties
• The same data is NEVER in both the training and the testing set
  – Overfitting will not enhance measured performance
• Multiple evaluations allow statistical measures to be calculated on the results
  – Standard deviation, confidence intervals, ...
• A set of training inputs allows the system to exploit commonalities between inputs
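The split-and-rotate procedure above can be sketched as follows (the input names are borrowed from the example workload later in the talk; this particular partition is illustrative, not the one used in the example):

```python
# Sketch of k-fold cross-validation over a workload of program inputs.
inputs = ["jpeg", "mpeg", "xml", "html", "text", "doc", "pdf",
          "source", "program"]

def folds(items, k):
    """Partition items into k non-overlapping folds and yield
    (train, test) pairs: each fold takes one turn as the training
    set while the remaining folds form the test set."""
    chunks = [items[i::k] for i in range(k)]
    for i, train in enumerate(chunks):
        test = [x for j, c in enumerate(chunks) if j != i for x in c]
        yield train, test

for train, test in folds(inputs, 3):
    # The defining property: train and test never share an input.
    assert not set(train) & set(test)
```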
Proposed Methodology
• A PDFPeak score, distinct from peak
  – Reported with its standard deviation
• Provide a PDF workload
  – Inputs are used for both training and evaluation, so they should be “medium” sized (~2 min running time)
  – 9 inputs are needed for meaningful statistical measures
Proposed Methodology (cont.)
• Split the inputs into 3 sets (at benchmark design time)
• For each input in each evaluation, calculate the speedup relative to (non-PDF) peak
• Calculate, over all evaluations:
  – the mean speedup
  – the standard deviation of the speedups
Example — PDF workload (9 inputs): jpeg, mpeg, xml, html, text, doc, pdf, source, program
Example — Split the workload (9 inputs) into three sets:
  A: jpeg, xml, pdf
  B: mpeg, html, source
  C: text, doc, program
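Given this split, the full set of measurements follows mechanically: each set trains once, and every input outside the training set is run as a test. A small sketch of that enumeration (the dict literal mirrors the slide's A/B/C assignment):

```python
# The example split from the slide: three sets of three inputs each.
split = {
    "A": ["jpeg", "xml", "pdf"],
    "B": ["mpeg", "html", "source"],
    "C": ["text", "doc", "program"],
}

# One (training set, test input) pair per measurement.
evaluations = [
    (train, inp)
    for train in split
    for name, fold in split.items() if name != train
    for inp in fold
]
print(len(evaluations))  # prints 18: 3 training sets x 6 test inputs
```

Those 18 pairs are exactly the 18 speedup numbers aggregated on the later evaluation slide.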
Example — Train on A, run on B+C
[Diagram: the instrumenting compiler builds a training binary; training runs on set A produce Profile(A); the optimizing compiler consumes Profile(A); the resulting binary is tested on B+C]
Test results: mpeg 1%, html 5%, text 4%, doc -3%, source 4%, program 2%
Example — Train on B, run on A+C
Training on set B produces Profile(B); testing on A+C gives:
  jpeg 4%, xml -1%, pdf 4%, text 5%, doc 1%, program 1%
Example — Train on C, run on A+B
Training on set C produces Profile(C); testing on A+B gives:
  jpeg 5%, xml 2%, pdf 3%, mpeg -1%, html 3%, source 3%
Example — Evaluate
All 18 measured speedups (two per input, one from each evaluation in which it served as a test input):
  doc 1%, doc -3%; html 3%, html 5%; jpeg 5%, jpeg 4%; mpeg -1%, mpeg 1%; pdf 3%, pdf 4%; program 1%, program 2%; source 3%, source 4%; text 5%, text 4%; xml -1%, xml 2%
Average: 2.33    Std. Dev.: 2.30
PDF improves performance:
• by 2.33±2.30%, 17 times out of 25
• by 2.33±4.60%, 19 times out of 20
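The summary statistics on this slide can be reproduced directly from the 18 listed speedups, and the "17 out of 25" / "19 out of 20" odds match the one- and two-standard-deviation coverage of a normal distribution (the normality assumption is mine; the slide does not state it):

```python
import math
import statistics

# The 18 per-input speedups (percent) from the evaluation slide.
speedups = [1, -3, 3, 5, 5, 4, -1, 1, 3, 4, 1, 2, 3, 4, 5, 4, -1, 2]

mean = statistics.mean(speedups)    # 2.33...
stdev = statistics.stdev(speedups)  # sample std dev, 2.30...
print(round(mean, 2), round(stdev, 2))  # prints 2.33 2.3

# Assuming the speedups are roughly normal, mean ± 1 and ± 2 std devs
# cover about 68% (~17/25) and 95% (~19/20) of future cases.
within_1sd = math.erf(1 / math.sqrt(2))  # ~0.683
within_2sd = math.erf(2 / math.sqrt(2))  # ~0.954
```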
Example — Evaluate (cont.)
PDF improves performance:
• by 2.33±2.30%, 17 times out of 25
• by 2.33±4.60%, 19 times out of 20
Is (peak_pdf > peak_static)? Is (new_pdf > other_pdf)? Each comparison depends on the mean and the variance of both!
Pieces of Effective Evaluation
• A workload of inputs
• Education about input selection
  – Rules and guidelines for benchmark authors
• Adoption of a new methodology for PDF evaluation
Practical Concerns
• For the benchmark user
  – Many additional runs, but on smaller inputs
  – Two additional program compilations
• For the benchmark author
  – Most INT benchmarks already use multiple inputs, and/or additional data is easily available
  – The PDF input set could also be used for ref
Conclusion
• PDF is here: important for compilers and architecture, in research and in practice
• The current methodology for PDF evaluation is not reliable
• We proposed a methodology for meaningful evaluation
Thanks — Questions?