

  1. Experimental Algorithmics. Dagstuhl Seminar: Empirical Evaluation for Graph Drawing, January 2015. Catherine C. McGeoch, Amherst College and D-Wave Systems Inc.

  2. Algorithms and Statistics Mismatch
  Statistics-type question: With certainty at least 95%, is C(100) < C(200)? (Assumes cost is normally distributed and variance is constant in n.)
  Algorithm-type question: Is the algorithm cost C(n) a member of the set O(n log n)? (An upper bound on the leading term of the cost function in problem size n.)

  3. Categories of Statistics
  - Confirmatory: Assume a model, form a hypothesis about parameters of the model, measure confidence in the conclusions.
  - Descriptive: Find a concise summary of data: location, spread, distribution, trend lines.
  - Exploratory (EDA): Find patterns and relationships, generate hypotheses and conjectures about the model.
  - Graphical EDA: Emphasizes visualization tools.

  4. Experimental Algorithmics: Mostly Exploratory
  [Figure: three scatter plots over the same axes, one per analysis style.]
  - Modeling: Find the ``true'' underlying model. Exploratory + graphical methods.
  - Comparison: Bigger/smaller, location/spread. Confirmatory + descriptive methods.
  - Trend Lines: Apply a concise model. Extrapolate. Interpolate. Confirmatory + descriptive methods.
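As a concrete illustration of the comparison style (location and spread, summarized with descriptive methods), here is a minimal Python sketch; the cost samples are made-up placeholders, not data from the talk.

```python
import statistics

def summarize(label, costs):
    """Descriptive comparison: report location (median) and spread (interquartile range)."""
    q1, _, q3 = statistics.quantiles(costs, n=4)
    print(f"{label}: median={statistics.median(costs):.1f}, IQR=[{q1:.1f}, {q3:.1f}]")

# Placeholder cost samples for two algorithms run on the same instances.
costs_a = [103, 98, 110, 95, 107, 101, 99, 104]
costs_b = [121, 119, 133, 117, 140, 122, 125, 129]

summarize("Algorithm A", costs_a)
summarize("Algorithm B", costs_b)
```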

  5. Find the Model. First Fit algorithm for Bin Packing (ccm 1989). e(u, N) = empty space in bins, as a function of item size u and item count N. Contrary to previous conjectures: e(u, N) is not linear in u for large N; e(u, N) is not linear in N for all u < 1.

  6. Find the Model. Min cuts in randomly weighted grid graphs. The mean occurs with probability zero. The result: discovery of bimodality and an explanation of the observed mean runtime.

  7. Find the Model. Paired data points (x, y). Correlation statistic r: .83 < r < .87. Fairly strong positive correlation. But the model is quite different in each case!
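To see how one correlation value can hide very different structure, the sketch below builds three synthetic data sets with a linear, a quadratic, and a threshold relationship; all three report a strong positive Pearson r even though the underlying models differ. (The data is synthetic, not the data behind the slide.)

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
xs = [i / 10 for i in range(1, 51)]

# Three very different relationships, all with strong positive correlation.
linear    = [2.0 * x + random.gauss(0, 1.5) for x in xs]
quadratic = [0.5 * x * x + random.gauss(0, 1.5) for x in xs]
threshold = [(0.0 if x < 2.5 else 8.0) + random.gauss(0, 1.5) for x in xs]

for name, ys in [("linear", linear), ("quadratic", quadratic), ("threshold", threshold)]:
    print(f"{name:9s} r = {pearson_r(xs, ys):.2f}")
```

Plotting the three data sets, not just reporting r, is what exposes the difference.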

  8. Algorithmic Experiments: The Good News Near total control of the experiment. Tons of data. Fairly simple causal models (input-output). Relatively strong theory (for runtime analysis anyway).

  9. Algorithmic Experiments: The Bad News High expectations from data analysis. Unusual and complex relationships. Standard statistical techniques do not address common questions.

  10. Methods for Algorithmic Experiments. Emphasizing descriptive, exploratory, and graphical techniques; focus on exploratory (vs. designed) experiments. I. How not to do it (anonymized). II. One methodological question: What should I measure?

  11. How Not to Do It 1. ``As conjectured, algorithm cost is asymptotically O(n), except for a slight increase at large n.[1]'' [1]: The CPU time increase is probably due to paging at large problem sizes.

  12. CPU Times Are Noisy. The same program and the same 9 instances (ordered by size), run on 12 common platforms. (11 other platforms couldn't run all instances.) Source: DIMACS TSP Challenge.
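A quick way to see this kind of noise on a single machine is to time the identical computation repeatedly and look at the spread; the workload below is just a stand-in, not the TSP codes from the slide.

```python
import random
import statistics
import time

def workload(xs):
    """Stand-in computation: sort a copy of the input."""
    return sorted(xs)

random.seed(0)
instance = [random.random() for _ in range(200_000)]

# Time the identical instance repeatedly on one platform and record CPU times.
times = []
for _ in range(20):
    t0 = time.process_time()
    workload(instance)
    times.append(time.process_time() - t0)

print(f"min={min(times):.4f}s  median={statistics.median(times):.4f}s  "
      f"max={max(times):.4f}s  relative spread={(max(times) - min(times)) / min(times):.0%}")
```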

  13. What Not To Do 1
  - Don't choose the wrong cost metric: CPU time ≠ dominant operation.
  - Don't fit regression lines to CPU times: step functions, increasing constants, skew, bimodality, etc. violate the model assumptions.
  - Don't naively extrapolate from regression fits to asymptopia.
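One way to act on the first two points is to count the dominant operation directly and inspect doubling ratios instead of regressing noisy CPU times. The sketch below uses insertion sort purely as a stand-in algorithm; a ratio C(2n)/C(n) near 4 is what a quadratic cost predicts.

```python
import random

def insertion_sort_comparisons(xs):
    """Sort a copy of xs and return the number of key comparisons (the dominant operation)."""
    a = list(xs)
    count = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            count += 1                 # one key comparison
            if a[j] <= key:
                break
            a[j + 1] = a[j]            # shift larger element right
            j -= 1
        a[j + 1] = key
    return count

random.seed(0)
prev = None
for n in (500, 1000, 2000, 4000):
    c = insertion_sort_comparisons([random.random() for _ in range(n)])
    ratio = "" if prev is None else f"   C(2n)/C(n) = {c / prev:.2f}"
    print(f"n={n:5d}  comparisons={c:9d}{ratio}")
    prev = c
```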

  14. How Not to Do It 2
  My experiments using Culberson's Iterated Greedy code for graph coloring. Look at: variations in the Vertex Rule. Inputs: from a DIMACS Challenge (1993). The loop structure:
  G = (Input Graph)
  C = (Initial Coloring)
  Loop
    Reorder colors (Color rule)
    Reorder vertices (Vertex rule)
    C = GreedyColor[G]   // save best
    C = Kempe reduction (k iterations)
    C = Revert rule (r iterations)
  until Stopping rule (r, max)
  Report C*, the best coloring found.
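For orientation only, here is a heavily simplified Python paraphrase of that loop skeleton; it is not Culberson's code. The greedy_color routine and the class-shuffling "vertex rule" are placeholder choices, and the Kempe-reduction and revert steps are omitted.

```python
import random

def greedy_color(graph, order):
    """Give each vertex, in the given order, the smallest color unused by its neighbors.
    graph: dict mapping each vertex to a collection of its neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def iterated_greedy(graph, iterations=100, seed=0):
    """Simplified skeleton of the iterated-greedy loop sketched on the slide."""
    rng = random.Random(seed)
    best = greedy_color(graph, list(graph))
    for _ in range(iterations):
        # Placeholder vertex rule: list vertices by color class, shuffle the class order.
        # Greedy applied in color-class order never uses more colors than the coloring
        # that produced the order, so the saved coloring can only improve or stay equal.
        classes = {}
        for v, c in best.items():
            classes.setdefault(c, []).append(v)
        class_order = list(classes)
        rng.shuffle(class_order)
        order = [v for c in class_order for v in classes[c]]
        candidate = greedy_color(graph, order)
        if max(candidate.values()) <= max(best.values()):
            best = candidate           # save best
    return best
```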

  15. DIMACS Challenge Graph Coloring Inputs. (REG) = register interference graphs from a compiler application. Optimal colorings not known. The experiments here use one of these inputs.

  16. Culberson's Iterated Greedy: Compare Five Vertex Rules. Algorithm scores for one input, reported after every 100 iterations. Score = color count + 'niceness' term. Two rules look like random walks; three rules converge after 1 iteration. Why do they converge so quickly?

  17. Why do they converge so quickly?
  - Because the REG graphs are trivial to solve.
  - Optimality of one pass of simple greedy can be verified by eye!
  [Figure: adjacency matrix of a REG input: two cliques chained together.]
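That check is easy to reproduce in a sketch: build a toy graph of two cliques joined by a short chain (a guess at the slide's picture, not the actual REG instance) and confirm that a single greedy pass already meets the clique-size lower bound.

```python
def greedy_color(graph, order):
    """Smallest available color per vertex, processed in the given order."""
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def add_clique(graph, vertices):
    for v in vertices:
        graph.setdefault(v, set())
    for v in vertices:
        graph[v].update(u for u in vertices if u != v)

def add_chain(graph, vertices):
    for v in vertices:
        graph.setdefault(v, set())
    for a, b in zip(vertices, vertices[1:]):
        graph[a].add(b)
        graph[b].add(a)

# Hypothetical stand-in for a REG-like input: two 10-cliques joined by a short chain.
g = {}
add_clique(g, [f"a{i}" for i in range(10)])
add_clique(g, [f"b{i}" for i in range(10)])
add_chain(g, ["a0", "c0", "c1", "c2", "b0"])

coloring = greedy_color(g, sorted(g))
print("colors used by one greedy pass:", max(coloring.values()) + 1)
print("clique size (a lower bound on any coloring):", 10)
```

Since the greedy pass uses exactly as many colors as the largest clique, the coloring is optimal, and further iteration cannot improve it.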

  18. DIMACS Challenge Graph Coloring Inputs. (REG) = register interference graphs from a compiler application. Optimal colorings not known... because nobody had tried running the simple greedy algorithm on these problems.

  19. How Not to Do It 2
  - Don't assume that good performance is due to your wonderful new algorithm. It could be due to easy inputs.
  - Don't overlook simple/baseline algorithms.
  - Don't just look at the final result; look at the mechanics.
  - Don't pull parameters (iteration count) out of your hat. My runtimes were 700x bigger than necessary.

  20. How Not to Do It 3. The First Fit heuristic for Bin Packing. L = a list of n weights uniformly distributed on (0, u). FF: Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it. How well does FF pack them into unit-capacity bins, as a function of u, asymptotically in n?
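A minimal Python sketch of the heuristic as stated on the slide, using the naive leftmost-bin scan (experiments at large n would typically need a faster search structure):

```python
import random

def first_fit(weights):
    """Pack weights into unit-capacity bins: each weight goes into the
    leftmost bin that still has room, opening a new bin if none does."""
    bins = []                       # bins[i] = remaining capacity of bin i
    for w in weights:
        for i, free in enumerate(bins):
            if w <= free:
                bins[i] = free - w
                break
        else:
            bins.append(1.0 - w)    # no existing bin had room
    return len(bins)

# Tiny usage example; the parameters here are arbitrary.
random.seed(42)
u, n = 0.8, 5_000
weights = [random.uniform(0, u) for _ in range(n)]
print("bins used:", first_fit(weights))
```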

  21. First Fit Bin Packing. Experiments circa 1978: Pack n = 1000 weights, u = .25, .5, .75, 1, with 20 trials each. Measure B: bins used in the packing. Cost = Bins / OPT ≤ Bins / Weightsum (since OPT is at least the total weight). Known: FF is optimal at u = 1. Conjecture: FF is optimal for all u.

  22. First Fit Bin Packing. Ten years later: n = 100,000 weights, u = .2, .22, ..., .98, 1.0. Measure empty space: E = B - Weightsum. Observation: empty space is not linear in u! The ``peak'' grows as n increases; the valley disappears. New conjecture: FF is optimal at u = 1, but nowhere else.
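The sketch below repeats the measurement idea at a much smaller n, so the naive packing loop stays quick, and reports the ratio and the difference metric side by side; at this scale the non-linear shape in u is only hinted at, which is why the later experiments used n = 100,000.

```python
import random

def first_fit_bins(weights):
    """Number of unit-capacity bins used by First Fit (leftmost bin with room)."""
    bins = []
    for w in weights:
        for i, free in enumerate(bins):
            if w <= free:
                bins[i] = free - w
                break
        else:
            bins.append(1.0 - w)
    return len(bins)

random.seed(7)
n = 5_000                       # far smaller than the talk's 100,000
print("  u    bins/weightsum    empty space (bins - weightsum)")
for u in (0.2, 0.4, 0.6, 0.8, 1.0):
    weights = [random.uniform(0, u) for _ in range(n)]
    b = first_fit_bins(weights)
    wsum = sum(weights)
    print(f"{u:4.1f}   {b / wsum:14.4f}    {b - wsum:10.1f}")
```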

  23. How Not To Do It 3
  - Don't assume you are looking at asymptopia.
  - Don't look at just one cost metric. Here, a difference gives a better view than a ratio: Cost = Bins / Weightsum vs. EmptySpace = Bins - Weightsum.

  24. Don't Be That Guy: What to Avoid
  - Looking at the wrong metric.
  - Looking at just one metric.
  - Looking at summarized instead of raw data.
  - Reasoning incorrectly about cause and effect.
  - Thinking the model is a true description of the data.
  Use exploratory experiments to build understanding.

  25. Experimental Algorithm Evaluation I. How not to do it. II. What should I measure? Matching goals to choices. Reducing design complexity. Variance reduction techniques.

  26. Factors that affect experimental outcomes in algorithmics
  - Metrics. Quantities that are measured as performance indicators: e.g. time, solution quality. (Today's topic!)
  - Input. Category, provenance, size n, more.
  - Algorithm/code. Data structure, tuning, rules.
  - Environment. Language, compiler, memory hierarchy, etc.

  27. What Should I Measure? Theory: The ocean is at most O(n^3) feet deep when n feet from shore. Practice: New York Harbor is 24.1532 feet deep at its entrance. The Atlantic Ocean is 12,881.82892 feet deep in one spot. It is 102.03901 feet deep in another...

  28. What Should I Measure? Match the performance indicator to the research goals. There is usually a tradeoff: Theory = accurate (always true); Practice = precise (many decimal places).

  29. Flavors of Experiments
  Field experiments: Observe real-world phenomena. Describe results. Classify outcomes.
  Laboratory experiments: Isolate components. Manipulate parameters. Cause/effect. Build models.

  30. Levels of Instantiation
  Algorithm: Quicksort A[lo,hi]: x is an element of A; partition A around x; recur to the left of x; recur to the right of x.
  Program:
    void Qsort(A, lo, hi) {
      if (lo >= hi) return;
      int p = Partition(A, lo, hi);   // partition A[lo..hi] around a pivot element x
      Qsort(A, lo, p-1);
      Qsort(A, p+1, hi);
    }
  ... Paradigm ... Algorithm ... Data Structures ... Source ... Object ... Process ...

  31. What Should I Measure?
  Lab experiment: Isolated components of the algorithm, abstract costs, simple generated inputs. Accurate = dominant operation counts.
  Field experiment: Instantiated code, whole costs, realistic inputs. Precise = CPU times.

  32. Time Performance Indicators
  - Theory's dominant op.
  - Data structure ops.
  - Function calls.
  - Main loop iterations.
  - Code block counts.
  - CPU time.
  - Memory accesses.
  - Cache & page misses.
  - Wall clock time.
  ... experimenters have many choices!
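As a small illustration of collecting several of these indicators in one run, here is a Python re-sketch of the quicksort from slide 30 (not the talk's code), instrumented for comparison counts, call counts, CPU time, and wall-clock time.

```python
import random
import time

def quicksort(a, lo, hi, counters):
    """Lomuto-partition quicksort, instrumented with several performance indicators."""
    counters["calls"] += 1                     # function calls
    if lo >= hi:
        return
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        counters["comparisons"] += 1           # theory's dominant operation
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1, counters)
    quicksort(a, i + 1, hi, counters)

random.seed(3)
data = [random.random() for _ in range(20_000)]
counters = {"calls": 0, "comparisons": 0}

t_wall, t_cpu = time.perf_counter(), time.process_time()
quicksort(data, 0, len(data) - 1, counters)
print(f"comparisons={counters['comparisons']}  calls={counters['calls']}  "
      f"cpu={time.process_time() - t_cpu:.3f}s  wall={time.perf_counter() - t_wall:.3f}s")
```

Cache and page statistics are not directly visible from a portable program like this; they come from platform profiling tools.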

  33. How to Choose a PI
  - Enough precision to distinguish outcomes among competing algorithms.
  - Choose PIs that are directly comparable across several algorithms.
  - Lab: the PI should isolate the interesting factors and ignore irrelevant factors.
  - Field: the PI should measure the whole cost.
  - Choose indicators that match the literature.
  - OK to use more than one.

  34. Multiple Performance Indicators
  x = number of edge crossings, y = aspect ratio, z = user task score.
  [Figure: algorithms A, B, and C plotted on a worst-to-best scale in these three dimensions.]
  Which algorithm is best?

  35. Multiple Performance Indicators
  Two strategies for reducing dimensionality in data:
  1. Transform numerical to categorical data (e.g. low / med / high) and stratify.
  2. Project to a lower dimension.
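To make the two strategies concrete, here is a small Python sketch on made-up measurements for three layout algorithms; the category cut points and the weights in the combined score are arbitrary illustration choices.

```python
import statistics

# Hypothetical per-drawing measurements for three layout algorithms.
runs = [
    {"alg": "A", "crossings": 12, "aspect": 1.1, "task_score": 0.81},
    {"alg": "A", "crossings": 30, "aspect": 1.4, "task_score": 0.74},
    {"alg": "B", "crossings": 55, "aspect": 1.0, "task_score": 0.69},
    {"alg": "B", "crossings": 48, "aspect": 2.3, "task_score": 0.62},
    {"alg": "C", "crossings": 21, "aspect": 3.0, "task_score": 0.77},
    {"alg": "C", "crossings": 19, "aspect": 2.8, "task_score": 0.70},
]

def bucket(value, lo, hi):
    """Strategy 1: turn a numeric indicator into a low/med/high category."""
    return "low" if value < lo else "med" if value < hi else "high"

# Stratify: summarize task score within each crossings category.
for r in runs:
    r["crossings_cat"] = bucket(r["crossings"], 20, 40)
for cat in ("low", "med", "high"):
    scores = [r["task_score"] for r in runs if r["crossings_cat"] == cat]
    if scores:
        print(f"crossings={cat:4s}  mean task score = {statistics.fmean(scores):.2f}")

# Strategy 2: project the three indicators onto a single score (weights are arbitrary).
for r in runs:
    r["combined"] = r["task_score"] - 0.01 * r["crossings"] - 0.1 * abs(r["aspect"] - 1.0)
best = max(runs, key=lambda r: r["combined"])
print("best run by combined score: algorithm", best["alg"])
```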
