Experimental Algorithmics
Dagstuhl Seminar: Empirical Evaluation for Graph Drawing, January 2015
Catherine C. McGeoch, Amherst College and D-Wave Systems Inc.
Algorithms and Statistics Mismatch
Algorithm-type question: Is the algorithm cost C(n) a member of the set O(n log n)? That is, an upper bound on the leading term of the cost function on problem size n.
Statistics-type answer: With certainty at least 95%, C(100) < C(200), assuming cost is normally distributed and variance is constant in n.
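One common empirical tactic for the algorithm-type question (not from the slide) is a doubling experiment: measure the cost at n, 2n, 4n, ... and examine the growth ratios. A minimal sketch, with a placeholder cost() standing in for a real measurement:

    /* Doubling-experiment sketch (illustrative). cost(n) is a placeholder
     * for whatever performance indicator you actually measure. If
     * C(n) ~ a * n^b, then C(2n)/C(n) -> 2^b, so log2 of the ratio gives
     * an empirical estimate of the leading exponent b. */
    #include <math.h>
    #include <stdio.h>

    static double cost(long n) {
        return 3.0 * n * log((double)n);   /* stand-in for a measured cost */
    }

    int main(void) {
        for (long n = 1000; n <= 64000; n *= 2) {
            double ratio = cost(2 * n) / cost(n);
            printf("n = %6ld  C(2n)/C(n) = %.3f  log2(ratio) = %.3f\n",
                   n, ratio, log2(ratio));
        }
        return 0;
    }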
Categories of Statistics
- Confirmatory: Assume a model, form a hypothesis about parameters of the model, measure confidence in the conclusions.
- Descriptive: Find a concise summary of data: location, spread, distribution, trend lines.
- Exploratory (EDA): Find patterns and relationships; generate hypotheses and conjectures about the model.
- Graphical EDA: Emphasizes visualization tools.
Experimental Algorithmics: Mostly Exploratory
[Figure: three example scatter plots.]
- Modeling: Find the ``true'' underlying model. Extrapolate. Exploratory + graphical methods.
- Comparison: Bigger/smaller? Location/spread. Confirmatory + descriptive methods.
- Trend Lines: Apply a concise model. Interpolate. Confirmatory + descriptive methods.
Find the Model
First Fit algorithm for Bin Packing (ccm 1989). e(u, N) = empty space in bins, as a function of item size u and item count N. Contrary to previous conjectures:
- e(u, N) is not linear in u for large N.
- e(u, N) is not linear in N for all u < 1.
Find the Model
Min cuts in randomly weighted grid graphs. The mean occurs with probability zero. The result: discovery of bimodality and an explanation of the observed mean runtime.
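The bimodality only shows up when you look at the whole distribution rather than a single mean. A minimal histogram sketch of that kind of check, with made-up runtimes standing in for the grid-graph measurements:

    /* Text histogram of a sample (illustrative; the runtimes below are
     * made up, standing in for measured min-cut runtimes). Plotting the
     * distribution, not just its mean, is what exposes bimodality. */
    #include <stdio.h>

    int main(void) {
        double runtimes[] = {0.11, 0.13, 0.12, 0.96, 0.94, 0.14, 0.98,
                             0.12, 0.95, 0.13, 0.97, 0.11, 0.93, 0.15};
        int n = sizeof runtimes / sizeof runtimes[0];
        enum { BINS = 10 };
        int count[BINS] = {0};

        for (int i = 0; i < n; i++) {
            int b = (int)(runtimes[i] * BINS);   /* values assumed in [0, 1) */
            if (b >= BINS) b = BINS - 1;
            count[b]++;
        }
        for (int b = 0; b < BINS; b++) {
            printf("[%.1f, %.1f) ", b / (double)BINS, (b + 1) / (double)BINS);
            for (int k = 0; k < count[b]; k++) putchar('*');
            putchar('\n');
        }
        return 0;
    }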
Find the Model
Paired data points (x, y). Correlation statistic r: .83 < r < .87, a fairly strong positive correlation. But the model is quite different in each case!
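The correlation coefficient is a one-number summary, so it should always be paired with a scatter plot of the raw pairs. A minimal sketch of the Pearson r computation on a small hypothetical sample (not the datasets behind the slide):

    /* Pearson correlation coefficient r for paired samples. The data are
     * hypothetical; the point is that very differently shaped relationships
     * can produce nearly the same r, so plot the raw pairs as well. */
    #include <math.h>
    #include <stdio.h>

    static double pearson_r(const double *x, const double *y, int n) {
        double xbar = 0.0, ybar = 0.0;
        for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
        xbar /= n; ybar /= n;

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - xbar) * (y[i] - ybar);
            sxx += (x[i] - xbar) * (x[i] - xbar);
            syy += (y[i] - ybar) * (y[i] - ybar);
        }
        return sxy / sqrt(sxx * syy);
    }

    int main(void) {
        double x[] = {1, 2, 3, 4, 5, 6, 7, 8};
        double y[] = {1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.1, 7.6};
        printf("r = %.3f\n", pearson_r(x, y, 8));
        return 0;
    }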
Algorithmic Experiments: The Good News
- Near total control of the experiment.
- Tons of data.
- Fairly simple causal models (input-output).
- Relatively strong theory (for runtime analysis, anyway).
Algorithmic Experiments: The Bad News
- High expectations from data analysis.
- Unusual and complex relationships.
- Standard statistical techniques do not address common questions.
Methods for Algorithmic Experiments
Emphasizing descriptive, exploratory, and graphical techniques. Focus on exploratory (vs. designed) experiments.
I. How not to do it (anonymized).
II. One methodological question: What should I measure?
How Not to Do It 1
``As conjectured, algorithm cost is asymptotically O(n), except for a slight increase at large n. [1]''
[1] CPU time increase is probably due to paging at large problem sizes.
CPU Times Are Noisy
One program, the same 9 instances (ordered by size), run on 12 common platforms. (11 other platforms couldn't run all instances.) Source: DIMACS TSP Challenge.
What Not To Do 1
- Don't choose the wrong cost metric: CPU time ≠ dominant operation count (a counting sketch follows below).
- Don't fit regression lines to CPU times: step functions, increasing constants, skew, bimodality, etc. violate the model assumptions.
- Don't naively extrapolate from regression fits to asymptopia.
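One way to avoid leaning on noisy CPU times is to instrument the code with a dominant-operation counter and record both. A minimal sketch using insertion sort and its comparison count (the algorithm and counter are illustrative choices, not from the talk):

    /* Record an abstract cost (comparison count) alongside CPU time.
     * Insertion sort is a stand-in; the pattern is the point: pick the
     * dominant operation, count it, and report it next to the noisier
     * clock measurement. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static long comparisons;   /* dominant-operation counter */

    static void insertion_sort(int *a, int n) {
        for (int i = 1; i < n; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && (comparisons++, a[j] > key)) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    int main(void) {
        int n = 20000;
        int *a = malloc(n * sizeof *a);
        for (int i = 0; i < n; i++) a[i] = rand();

        comparisons = 0;
        clock_t start = clock();
        insertion_sort(a, n);
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("n = %d  comparisons = %ld  cpu seconds = %.3f\n",
               n, comparisons, secs);
        free(a);
        return 0;
    }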
How Not to Do It 2
My experiments using Culberson's Iterated Greedy code for graph coloring. Look at: variations in the Vertex rule. Inputs: from a DIMACS Challenge (1993).

    G = (Input Graph)
    C = (Initial Coloring)
    Loop
        Reorder colors (Color rule)
        Reorder vertices (Vertex rule)
        C = GreedyColor[G]            // save best
        C = Kempe reduction (k iterations)
        C = Revert rule (r iterations)
    until Stopping rule (r, max)
    Report C*, the best coloring found.
DIMACS Challenge Graph Coloring Inputs
(REG) = register interference graphs from a compiler application. Optimal colorings not known. This is the input family used here.
Culberson's Iterated Greedy: Compare Five Vertex Rules
Scores for one input, reported after every 100 iterations. Score = color count + `niceness' term.
- Two rules look like random walks.
- Three rules converge after 1 iteration.
Why do they converge so quickly? ...
Why do they converge so quickly?
- Because the REG graphs are trivial to solve.
- Optimality of one pass of simple greedy can be verified by eye!
[Figure: adjacency matrix showing two cliques chained together.]
DIMACS Challenge Graph Coloring Inputs
(REG) = register interference graphs from a compiler application. Optimal colorings not known... because nobody tried running the simple greedy algorithm on these problems.
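A one-pass greedy coloring is cheap to implement, which is exactly why it makes a useful baseline. A minimal sketch on an adjacency matrix (the graph encoding and the tiny input are hypothetical, not the DIMACS REG files):

    /* One pass of simple greedy coloring on an adjacency-matrix graph:
     * visit vertices in index order and give each the smallest color not
     * already used by an earlier neighbor. A baseline worth running before
     * anything fancier. The 5-vertex graph is made up for illustration. */
    #include <stdio.h>

    #define N 5

    int main(void) {
        int adj[N][N] = {
            {0,1,1,0,0},
            {1,0,1,1,0},
            {1,1,0,1,0},
            {0,1,1,0,1},
            {0,0,0,1,0},
        };
        int color[N];
        int max_color = 0;

        for (int v = 0; v < N; v++) {
            int used[N + 1] = {0};
            for (int u = 0; u < v; u++)     /* neighbors colored so far */
                if (adj[v][u]) used[color[u]] = 1;
            int c = 0;
            while (used[c]) c++;            /* smallest free color */
            color[v] = c;
            if (c + 1 > max_color) max_color = c + 1;
        }

        for (int v = 0; v < N; v++)
            printf("vertex %d -> color %d\n", v, color[v]);
        printf("colors used: %d\n", max_color);
        return 0;
    }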
How Not to Do It 2
- Don't assume that good performance is due to your wonderful new algorithm. It could be due to easy inputs.
- Don't overlook simple/baseline algorithms.
- Don't just look at the final result; look at the mechanics.
- Don't pull parameters (iteration count) out of your hat. My runtimes were 700x bigger than necessary.
How Not to Do It 3
The First Fit heuristic for Bin Packing. L = list of n weights uniformly distributed on (0, u). FF: consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it. How well does FF pack them into unit-capacity bins, as a function of u, asymptotically in n?
First Fit Bin Packing
Experiments circa 1978: pack n = 1000 weights, u = .25, .5, .75, 1; 20 trials each. Measure B: bins used in the packing. Cost = Bins / OPT; since OPT is at least the sum of the weights, Cost ≤ Bins / Weightsum.
Known: FF is optimal at u = 1. Conjecture: FF is optimal for all u.
First Fit Bin Packing
Ten years later: n = 100,000 weights, u = .2, .22, ..., .98, 1.0. Measure empty space: E = B − Weightsum. Observe: empty space is not linear in u! The ``peak'' grows as n increases; the valley disappears. New conjecture: FF is optimal at u = 1, but nowhere else.
How Not To Do It 3
- Don't assume you are looking at asymptopia.
- Don't look at just one cost metric. Here, a difference gives a better view than a ratio: Cost = Bins / Weightsum vs. EmptySpace = Bins − Weightsum. (A small simulation follows below.)
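For a concrete feel, here is a small sketch in that spirit: pack uniform weights with First Fit and report both metrics. The parameters and the naive leftmost-bin scan are illustrative choices, not the original experiments:

    /* First Fit packing of n weights drawn uniformly from (0, u), reporting
     * both metrics from the slides:
     *   ratio       = Bins / Weightsum   (an upper bound on Bins / OPT)
     *   empty space = Bins - Weightsum
     * The naive scan is O(n * B); a tournament tree over bin gaps makes
     * First Fit O(n log n) for serious problem sizes. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int n = 10000;
        srand(12345);                            /* fixed seed for repeatability */

        for (double u = 0.2; u <= 1.0001; u += 0.1) {
            double *gap = malloc(n * sizeof *gap);   /* remaining room per bin */
            int bins = 0;
            double weightsum = 0.0;

            for (int i = 0; i < n; i++) {
                double w = u * rand() / ((double)RAND_MAX + 1.0);
                weightsum += w;
                int b = 0;
                while (b < bins && gap[b] < w) b++;  /* first bin that fits */
                if (b == bins) gap[bins++] = 1.0;    /* open a new unit bin */
                gap[b] -= w;
            }
            printf("u = %.2f  bins = %5d  ratio = %.4f  empty space = %8.2f\n",
                   u, bins, bins / weightsum, bins - weightsum);
            free(gap);
        }
        return 0;
    }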
Don't Be That Guy: What to Avoid
- Looking at the wrong metric.
- Looking at just one metric.
- Looking at summarized instead of raw data.
- Reasoning incorrectly about cause and effect.
- Thinking the model is a true description of the data.
Use exploratory experiments to build understanding.
Experimental Algorithm Evaluation
I. How not to do it.
II. What should I measure? Matching goals to choices. Reducing design complexity. Variance reduction techniques.
Factors that affect experimental outcomes in algorithmics
- Metrics. Quantities that are measured as performance indicators, e.g. time, solution quality. (Today's topic!)
- Input. Category, provenance, size n, more.
- Algorithm/code. Data structure, tuning, rules.
- Environment. Language, compiler, memory hierarchy, etc.
What Should I Measure?
Theory: the ocean is at most O(n^3) feet deep when n feet from shore.
Practice: New York Harbor is 24.1532 feet deep at its entrance. The Atlantic Ocean is 12,881.82892 feet deep in one spot. It is 102.03901 feet deep in another...
What Should I Measure?
Match the performance indicator to the research goals. There is usually a tradeoff between accurate and precise:
Theory = accurate (always true). Practice = precise (many decimal places).
Flavors of Experiments
- Field experiments: observe real-world phenomena, describe results, classify outcomes.
- Laboratory experiments: isolate components, manipulate parameters, establish cause/effect, build models.
Levels of Instantiation: Algorithm, Program, Process
Algorithm. Quicksort A[lo, hi]: x is an element of A; partition A around x; recur to the left of x; recur to the right of x.
Program.

    void Qsort(int A[], int lo, int hi) {
        if (lo >= hi) return;
        int p = Partition(A, lo, hi);   /* partition A[lo..hi] around a pivot */
        Qsort(A, lo, p - 1);
        Qsort(A, p + 1, hi);
    }

... Paradigm ... Algorithm ... Data Structures ... Source ... Object ... Process ...
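The Program-level fragment leaves Partition unspecified. A self-contained completion with a Lomuto-style partition (one illustrative choice; Hoare partitioning or a randomized pivot would do as well) and a tiny driver:

    /* The Qsort from the slide plus a Lomuto-style Partition: use A[hi] as
     * the pivot, move it to its final position p, and return p. */
    #include <stdio.h>

    static int Partition(int A[], int lo, int hi) {
        int pivot = A[hi];
        int p = lo;
        for (int i = lo; i < hi; i++) {
            if (A[i] < pivot) {
                int t = A[i]; A[i] = A[p]; A[p] = t;
                p++;
            }
        }
        int t = A[p]; A[p] = A[hi]; A[hi] = t;
        return p;
    }

    static void Qsort(int A[], int lo, int hi) {
        if (lo >= hi) return;
        int p = Partition(A, lo, hi);
        Qsort(A, lo, p - 1);
        Qsort(A, p + 1, hi);
    }

    int main(void) {
        int A[] = {5, 2, 9, 1, 7, 3};
        int n = sizeof A / sizeof A[0];
        Qsort(A, 0, n - 1);
        for (int i = 0; i < n; i++) printf("%d ", A[i]);
        printf("\n");
        return 0;
    }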
What Should I Measure?
Lab experiment: isolated components of the algorithm, abstract costs, simple generated inputs. Accurate = dominant operation counts.
Field experiment: instantiated code, whole costs, realistic inputs. Precise = CPU times.
Time Performance Indicators
- Theory's dominant operation
- Data structure ops
- Function calls
- Main loop iterations
- Code block counts
- CPU time
- Memory accesses
- Cache & page misses
- Wall clock time
... experimenters have many choices!
How to Choose a PI
- Enough precision to distinguish outcomes among competing algorithms.
- Choose PIs that are directly comparable across several algorithms.
- Lab: the PI should isolate the interesting factors and ignore irrelevant factors.
- Field: the PI should measure the whole cost.
- Choose indicators that match the literature.
- OK to use more than one.
Multiple Performance Indicators
x = number of edge crossings, y = aspect ratio, z = user task score.
[Figure: algorithms A, B, C plotted on a worst-to-best scale.]
Which algorithm is best?
Multiple Performance Indicators
Two strategies for reducing dimensionality in data (a sketch follows below):
1. Transform numerical to categorical data (low / med / high) and stratify.
2. Project to a lower dimension.
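A minimal sketch of both strategies on hypothetical per-algorithm scores: bucket one indicator into low/med/high strata, and project three indicators onto a single axis with made-up weights:

    /* Two dimensionality-reduction strategies on hypothetical PI values:
     * (1) stratify a numeric indicator into low/med/high categories;
     * (2) project (crossings, aspect ratio, task score) to one scalar with
     *     weights. Cutoffs and weights are illustrative, not recommended. */
    #include <stdio.h>

    static const char *stratum(double v, double lo_cut, double hi_cut) {
        if (v < lo_cut) return "low";
        if (v < hi_cut) return "med";
        return "high";
    }

    int main(void) {
        /* one row per algorithm: edge crossings, aspect ratio, task score */
        const char *name[] = {"A", "B", "C"};
        double crossings[] = {120, 45, 80};
        double aspect[]    = {1.8, 3.2, 1.2};
        double task[]      = {0.70, 0.85, 0.90};

        for (int i = 0; i < 3; i++) {
            const char *s = stratum(crossings[i], 60, 100);   /* strategy 1 */
            double score = 0.01 * crossings[i] + 0.5 * aspect[i]
                         + 2.0 * (1.0 - task[i]);             /* strategy 2 */
            printf("%s: crossings stratum = %-4s  projected score = %.2f\n",
                   name[i], s, score);
        }
        return 0;
    }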