BBM 202 - ALGORITHMS
DEPT. OF COMPUTER ENGINEERING

ANALYSIS OF ALGORITHMS
Feb. 16, 2017

Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.

TODAY
‣ Analysis of Algorithms
‣ Observations
‣ Mathematical models
‣ Order-of-growth classifications
‣ Dependencies on inputs
‣ Memory

Cast of characters
• Programmer needs to develop a working solution.
• Client wants to solve problem efficiently.
• Theoretician wants to understand.
• Basic blocking and tackling is sometimes necessary. [this lecture]
• Student might play any or all of these roles someday.

Running time
“As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise—By what course of calculation can these results be arrived at by the machine in the shortest time?”
Charles Babbage (1864)

Analytic Engine: how many times do you have to turn the crank?
Reasons to analyze algorithms
• Predict performance.
• Compare algorithms.
• Provide guarantees.
  [the three above: this course (BBM 202)]
• Understand theoretical basis. [Analysis of algorithms (BBM 408)]

Primary practical reason: avoid performance bugs.
[client gets poor performance because programmer did not understand performance characteristics]

Some algorithmic successes
Discrete Fourier transform.
• Break down waveform of N samples into periodic components.
• Applications: DVD, JPEG, MRI, astrophysics, ….
• Brute force: $N^2$ steps.
• FFT algorithm: $N \log N$ steps, enables new technology. [Friedrich Gauss, 1805]
• sFFT: Sparse Fast Fourier Transform algorithm (Hassanieh et al., 2012). A faster Fourier transform: $k \log N$ steps (with k sparse coefficients).
[Figure: time vs. size (1K to 8K) for quadratic, linearithmic, and linear growth]

Some algorithmic successes
N-body simulation.
• Simulate gravitational interactions among N bodies.
• Brute force: $N^2$ steps.
• Barnes-Hut algorithm: $N \log N$ steps, enables new research. [Andrew Appel, PU '81]
[Figure: time vs. size (1K to 8K) for quadratic, linearithmic, and linear growth]

The challenge
Q. Will my program be able to solve a large practical input?
   Why is my program so slow?
   Why does it run out of memory?

Key insight. [Knuth 1970s] Use scientific method to understand performance.
Scientific method applied to analysis of algorithms
A framework for predicting performance and comparing algorithms.

Scientific method.
• Observe some feature of the natural world.
• Hypothesize a model that is consistent with the observations.
• Predict events using the hypothesis.
• Verify the predictions by making further observations.
• Validate by repeating until the hypothesis and observations agree.

Principles. Experiments must be reproducible. Hypotheses must be falsifiable.

Feature of the natural world = computer itself.

Example: 3-sum
3-sum. Given N distinct integers, how many triples sum to exactly zero?

   % more 8ints.txt
   8
   30 -40 -20 -10 40 0 10 5

   % java ThreeSum 8ints.txt
   4

        a[i]   a[j]   a[k]   sum
   1     30    -40     10     0
   2     30    -20    -10     0
   3    -40     40      0     0
   4    -10      0     10     0

Context. Deeply related to problems in computational geometry.

3-sum: brute-force algorithm

   // In and StdOut come from stdlib.jar.
   public class ThreeSum
   {
      public static int count(int[] a)
      {
         int N = a.length;
         int count = 0;
         for (int i = 0; i < N; i++)
            for (int j = i+1; j < N; j++)
               for (int k = j+1; k < N; k++)      // check each triple
                  if (a[i] + a[j] + a[k] == 0)    // for simplicity, ignore integer overflow
                     count++;
         return count;
      }

      public static void main(String[] args)
      {
         int[] a = In.readInts(args[0]);
         StdOut.println(count(a));
      }
   }
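The 8ints.txt example can be checked without stdlib.jar. The sketch below is not part of the original slides: the class name ThreeSumTriples and the method triples are hypothetical, and the input array is hard-coded from the file contents shown above. It uses the same triple loop as ThreeSum.count but records each zero-sum triple instead of only counting it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class (not from the slides): lists the zero-sum triples.
public class ThreeSumTriples {

    // Same brute-force loop as ThreeSum.count; integer overflow is ignored.
    public static List<int[]> triples(int[] a) {
        List<int[]> found = new ArrayList<>();
        int n = a.length;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        found.add(new int[] {a[i], a[j], a[k]});
        return found;
    }

    public static void main(String[] args) {
        // Values copied from 8ints.txt as shown on the slide.
        int[] a = {30, -40, -20, -10, 40, 0, 10, 5};
        List<int[]> found = triples(a);
        for (int[] t : found)
            System.out.printf("%4d %4d %4d%n", t[0], t[1], t[2]);
        System.out.println(found.size() + " triples");
    }
}
```

Because the loops visit indices in increasing order, the triples come out in the same order as the rows of the table on the slide.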
Measuring the running time
Q. How to time a program?
A. Manual.

   % java ThreeSum 1Kints.txt
   70
   % java ThreeSum 2Kints.txt
   528
   % java ThreeSum 4Kints.txt
   4039

[Animation: a stopwatch ticks while each run completes; the number of ticks grows sharply each time the input size doubles]

Measuring the running time
Q. How to time a program?
A. Automatic.

public class Stopwatch (part of stdlib.jar)
          Stopwatch()      create a new stopwatch
   double elapsedTime()    time since creation (in seconds)

client code:

   public static void main(String[] args)
   {
      int[] a = In.readInts(args[0]);
      Stopwatch stopwatch = new Stopwatch();
      StdOut.println(ThreeSum.count(a));
      double time = stopwatch.elapsedTime();
   }

implementation (part of stdlib.jar):

   public class Stopwatch
   {
      private final long start = System.currentTimeMillis();

      public double elapsedTime()
      {
         long now = System.currentTimeMillis();
         return (now - start) / 1000.0;
      }
   }

Empirical analysis
Run the program for various input sizes and measure running time.

        N    time (seconds) †
      250      0.0
      500      0.0
    1,000      0.1
    2,000      0.8
    4,000      6.4
    8,000     51.1
   16,000      ?
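The empirical analysis above can be automated. The following is a minimal sketch under stated assumptions, not the course's code: the class DoublingExperiment and the helpers randomInput and timeTrial are hypothetical names, inputs are pseudo-random integers rather than the Kints.txt files (so they are not guaranteed distinct), and System.nanoTime is swapped in for the System.currentTimeMillis call used by Stopwatch. Measured times will differ from the table, since they depend on the machine.

```java
import java.util.Random;

// Hypothetical harness (not from the slides): time 3-sum while doubling N.
public class DoublingExperiment {

    // Brute-force 3-sum count, as in ThreeSum.count.
    public static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    // Assumption: n pseudo-random ints in [-1000000, 1000000); duplicates are possible.
    public static int[] randomInput(int n, long seed) {
        Random rng = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++)
            a[i] = rng.nextInt(2_000_000) - 1_000_000;
        return a;
    }

    // Returns the elapsed wall-clock time of one run, in seconds.
    public static double timeTrial(int n, long seed) {
        int[] a = randomInput(n, seed);
        long start = System.nanoTime();
        int c = count(a);
        long elapsed = System.nanoTime() - start;
        if (c < 0) throw new IllegalStateException("count cannot be negative");
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        // Kept to N <= 2000 so the whole run finishes in a few seconds.
        for (int n = 250; n <= 2000; n += n)
            System.out.printf("%7d %8.3f%n", n, timeTrial(n, 42L));
    }
}
```

A fixed seed keeps the inputs reproducible between runs, in line with the principle that experiments must be reproducible; the timings themselves still vary with system load.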
Data analysis
Standard plot. Plot running time T(N) vs. input size N.
[Figure: standard plot of running time T(N) vs. problem size N, 1K to 8K]

Data analysis
Log-log plot. Plot running time T(N) vs. input size N using log-log scale.
[Figure: log-log plot of lg(T(N)) vs. lg N, 1K to 8K; straight line of slope 3]

   $\lg T(N) = b \lg N + c$, with $b = 2.999$ and $c = -33.2103$
   $T(N) = a N^{b}$, where $a = 2^{c}$   [power law; b is the slope]

Regression. Fit straight line through data points: $a N^{b}$.
Hypothesis. The running time is about $1.006 \times 10^{-10} \times N^{2.999}$ seconds.

Prediction and validation
Hypothesis. The running time is about $1.006 \times 10^{-10} \times N^{2.999}$ seconds.
["order of growth" of running time is about $N^3$ [stay tuned]]

Predictions.
• 51.0 seconds for N = 8,000.
• 408.1 seconds for N = 16,000.

Observations.

        N    time (seconds) †
    8,000     51.1
    8,000     51.0
    8,000     51.1
   16,000    410.8

validates hypothesis!

Doubling hypothesis
Doubling hypothesis. Quick way to estimate b in a power-law relationship.
Run program, doubling the size of the input.

        N    time (seconds) †    ratio    lg ratio
      250      0.0                 -         -
      500      0.0                4.8       2.3
    1,000      0.1                6.9       2.8
    2,000      0.8                7.7       2.9
    4,000      6.4                8.0       3.0
    8,000     51.1                8.0       3.0

seems to converge to a constant b ≈ 3

Hypothesis. Running time is about $a N^{b}$ with b = lg ratio.
Caveat. Cannot identify logarithmic factors with doubling hypothesis.
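The regression step can be sketched as an ordinary least-squares line fit on the points (lg N, lg T(N)). This example is an illustration, not the method used to produce the slides: the class PowerLawFit and its methods are hypothetical names, and it assumes the fit uses only the four nonzero timings from the empirical table (N = 1,000 to 8,000), because lg 0 is undefined.

```java
// Hypothetical sketch (not from the slides): fit lg T = b lg N + c by least squares.
public class PowerLawFit {

    // Base-2 logarithm.
    public static double lg(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Returns {b, c}: slope and intercept of the line through (lg n[i], lg t[i]).
    public static double[] fit(double[] n, double[] t) {
        int m = n.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < m; i++) {
            meanX += lg(n[i]) / m;
            meanY += lg(t[i]) / m;
        }
        double sxy = 0, sxx = 0;
        for (int i = 0; i < m; i++) {
            double dx = lg(n[i]) - meanX;
            sxy += dx * (lg(t[i]) - meanY);
            sxx += dx * dx;
        }
        double b = sxy / sxx;
        double c = meanY - b * meanX;
        return new double[] {b, c};
    }

    // Predicted running time T(N) = a N^b with a = 2^c.
    public static double predict(double[] bc, double n) {
        return Math.pow(2, bc[1]) * Math.pow(n, bc[0]);
    }

    public static void main(String[] args) {
        double[] n = {1000, 2000, 4000, 8000};
        double[] t = {0.1, 0.8, 6.4, 51.1};   // seconds, from the empirical table
        double[] bc = fit(n, t);
        System.out.printf("b = %.3f, c = %.4f, a = %.3e%n",
                          bc[0], bc[1], Math.pow(2, bc[1]));
        System.out.printf("predicted T(16000) = %.1f seconds%n", predict(bc, 16000));
    }
}
```

On these four points the fitted slope and intercept should land close to the slide's b = 2.999 and c = -33.2103; small differences in the prediction for N = 16,000 come from rounding in the tabulated timings.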
Doubling hypothesis
Doubling hypothesis. Quick way to estimate b in a power-law hypothesis.

Q. How to estimate a (assuming we know b)?
A. Run the program (for a sufficiently large value of N) and solve for a.

        N    time (seconds) †
    8,000     51.1
    8,000     51.0
    8,000     51.1

   $51.1 = a \times 8000^{3}$
   $\Rightarrow a = 0.998 \times 10^{-10}$

Hypothesis. Running time is about $0.998 \times 10^{-10} \times N^{3}$ seconds.
[almost identical hypothesis to one obtained via linear regression]

Experimental algorithmics
System independent effects. [determines exponent b in power law]
• Algorithm.
• Input data.

System dependent effects. [determines constant a in power law]
• Hardware: CPU, memory, cache, …
• Software: compiler, interpreter, garbage collector, …
• System: operating system, network, other applications, …

Bad news. Difficult to get precise measurements.
Good news. Much easier and cheaper than other sciences. [e.g., can run huge number of experiments]

In practice, constant factors matter too!
Q. How long does this program take as a function of N?

   String s = StdIn.readString();
   int N = s.length();
   ...
   for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
         distance[i][j] = ...
   ...

   Jenny ~ $c_1 N^2$ seconds        Kenny ~ $c_2 N$ seconds

        N    time                        N    time
    1,000    0.11                      250    0.5
    2,000    0.35                      500    1.1
    4,000    1.6                     1,000    1.9
    8,000    6.5                     2,000    3.9
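The doubling hypothesis and the "solve for a" step can be applied directly to the timing tables above. The sketch below is illustrative only: the class DoublingRatio and the methods lgRatios and solveForA are hypothetical names, and the timings are copied from the slides rather than measured. It estimates b as the lg of the ratio between consecutive timings, then solves $T = a N^{b}$ for a at the largest N.

```java
// Hypothetical sketch (not from the slides): estimate b and a from doubling data.
public class DoublingRatio {

    // lg of the ratio between consecutive timings; assumes N doubles at each step.
    public static double[] lgRatios(double[] t) {
        double[] r = new double[t.length - 1];
        for (int i = 1; i < t.length; i++)
            r[i - 1] = Math.log(t[i] / t[i - 1]) / Math.log(2);
        return r;
    }

    // Solves time = a * n^b for the constant a.
    public static double solveForA(double time, double n, double b) {
        return time / Math.pow(n, b);
    }

    private static void report(String name, double[] t) {
        System.out.print(name + " lg ratios:");
        for (double r : lgRatios(t))
            System.out.printf(" %.2f", r);
        System.out.println();
    }

    public static void main(String[] args) {
        // ThreeSum at N = 8,000 with b = 3, as on the slide.
        System.out.printf("ThreeSum a = %.3e%n", solveForA(51.1, 8000, 3));

        // Jenny: N = 1,000 to 8,000. Kenny: N = 250 to 2,000.
        double[] jenny = {0.11, 0.35, 1.6, 6.5};
        double[] kenny = {0.5, 1.1, 1.9, 3.9};
        report("Jenny", jenny);
        report("Kenny", kenny);

        // Constants taken at the largest N, assuming b = 2 and b = 1 respectively.
        System.out.printf("c1 = %.3e, c2 = %.3e%n",
                          solveForA(6.5, 8000, 2), solveForA(3.9, 2000, 1));
    }
}
```

With only four points per table the lg ratios are noisy, but the last ratio in each table sits near 2 for Jenny and near 1 for Kenny, which is consistent with the quadratic and linear models stated on the slide. Comparing the resulting c1 and c2 shows why the constant matters: at N = 1,000 the quadratic program is still the faster one in the tables.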