Putting the “Science” in Computer Science


  1. Putting the “Science” in Computer Science What makes for a good program, and how can we measure / evaluate programs for “goodness”? Write as many definitions of “good” as you can, and describe how you would measure each one. Firstname Lastname Th. 10 / 13 (Your response)

  2. Given a computational problem: Is there a solution? What is it? How good is it?

  3. Is it efficient?

  4.–13. Data: which algorithm is best? [A sequence of scatter plots comparing algorithms by cost (lower is better) against Problem Size, re-plotted at successively larger ranges: up to about 8, 20, 40, 80, 120, 150, 200, 300, and 800.]

  14. Interpreting empirical data
  Key take-away: it’s messy and incomplete! We can measure
  • a particular algorithm
  • written in a particular language
  • as a particular program
  • compiled using a particular version of a particular compiler
  • with particular settings (e.g., enabling / disabling optimizations)
  • running on a particular data set, of a particular size
  • on a particular computer
  • with particular resources (CPUs, memory, hard drive, …)
  • under a particular version of a particular operating system
  • in a particular environment, e.g., with other programs running in the background
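
  One way such an empirical measurement might be collected, as a minimal Racket sketch (the elapsed-ms helper and the particular input size are illustrative assumptions, not part of the slides):

      #lang racket

      ;; Illustrative helper: run a thunk once and report elapsed wall-clock milliseconds.
      ;; The result depends on this machine, this Racket version, whatever else is running, etc.
      (define (elapsed-ms thunk)
        (define start (current-inexact-milliseconds))
        (thunk)
        (- (current-inexact-milliseconds) start))

      ;; One particular program (the sum function used later in the deck) ...
      (define (sum n)
        (if (= n 0) 0 (+ n (sum (- n 1)))))

      ;; ... measured on one particular input size; the number varies from run to run.
      (elapsed-ms (lambda () (sum 100000)))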

  15. Interpreting a theoretical model
  Key take-away: it’s lossy! A theory abstracts away certain details.
  A cost metric:
  • corresponds to one “step”
  • highlights the essence of the work, e.g., multiplications, comparisons, function calls…
  • serves as a proxy for an empirical measurement
  Instead of measuring time, we count steps. e.g., “This algorithm costs n² multiplications.”
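
  A minimal Racket sketch of counting a chosen step instead of measuring time (counted-* and pairwise-products are hypothetical names, not from the slides):

      #lang racket

      ;; Count every multiplication performed, instead of timing the program.
      (define multiplications 0)

      (define (counted-* a b)
        (set! multiplications (add1 multiplications))
        (* a b))

      ;; Multiplies every pair drawn from a list of length n: n² multiplications.
      (define (pairwise-products xs)
        (for*/list ([x xs] [y xs])
          (counted-* x y)))

      (pairwise-products '(1 2 3 4))
      multiplications   ; => 16, i.e., n² multiplications for n = 4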

  16. good data + good theory = good science: we can make predictions, and we can communicate with other scientists.

  17. From most general to most specific:
  • Decidability (CS 81): e.g., Can it be solved at all?
  • Complexity Class (CS 140, MA 167): e.g., Can it be solved in polynomial time?
  • Asymptotic Analysis, “Big O” (CS 42, CS 70): e.g., O(n) time, where n is list size
  • Exact Theory: recurrence relation, e.g., 7n + 2 multiplications, where n is list size
  • Empirical Data (CS 105, HPC): e.g., This run took 17.3 seconds on this data.

  18. Asymptotic Analysis (Big O)

  19. Asymptotic analysis
  We’re always answering the same question: How does the cost scale (when we try larger and larger inputs)?
  Not:
  • Exactly how many steps will it execute?
  • How many seconds will it take?
  • How many megabytes of memory will it need?

  20. The informal definition of “Big O”
  A reasonable upper bound on (an abstraction of) a problem’s difficulty or a solution’s performance, for reasonably large input sizes.

  21. In the limit (for VERY LARGE inputs)
  O(1): The running time is bounded regardless of the input size.
  O(n): An input twice as big takes no more than twice as long.
  O(n²): An input twice as big takes no more than four times as long.
  O(2ⁿ): An input one bigger takes no more than twice as long.
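
  A small sketch, assuming idealized step counts T(n) = n² and T2(n) = 2ⁿ, to check the doubling and plus-one claims above (the function names are illustrative, not from the slides):

      #lang racket

      ;; Idealized quadratic and exponential step counts.
      (define (T n) (* n n))
      (define (T2 n) (expt 2 n))

      (/ (T 20) (T 10))    ; => 4   (twice as big → four times as long)
      (/ (T 200) (T 100))  ; => 4
      (/ (T2 11) (T2 10))  ; => 2   (one bigger → twice as long)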

  22. If We Only Care About Scalability… What are the consequences?
  Constant factors can be ignored: n and 6n and 200n scale identically (“linearly”).
  Small summands can be ignored: n² and n² + n + 999999 are indistinguishable when n is huge.

  23. Grouping Algorithms by Scalability
  O(1): takes 6 steps; takes 1 (big) step; no more than 4000 steps; somewhere between 2 and 47 steps, depending on the input
  O(n): takes 100n + 3 steps; takes n/20 + 10,000,000 steps; anywhere between 3 and 68 steps per item, for n items
  O(n²): takes 2n² + 100n + 3 steps; takes n²/17 steps; somewhere between 1 and 40 steps per item, for n² items; anywhere between 1 and 7n steps per item, for n items

  24. How hard is the problem?
  Intractable problems (exponential): O(nⁿ), O(n!), O(2ⁿ)
  Tractable problems (polynomial): O(n³), O(n²), O(n log(n)), O(n), O(√n)
  No problem!: O(log(n)), O(1)

  25. logs aren’t scary! They’re our friends.
  log is the inverse of exponentiation: How many times can I cut N in half? Can I avoid looking at all the input?!
  log₂(1) = 0   // 2⁰ = 1
  log₂(2) = 1   // 2¹ = 2
  log₂(3) ≈ 1.58
  log₂(4) = 2   // 2² = 4
  log₂(5) ≈ 2.32
  log₂(6) ≈ 2.58
  log₂(7) ≈ 2.81
  log₂(8) = 3   // 2³ = 8
  [Chart: growth plotted for n = 1 to 63.] (image: s-media-cache-ak0.pinimg.com/736x/5d/f7/6d/5df76d1672ccdffc74af2e2bf55330aa.jpg)

  26. How hard are these problems?

                   cost metric       cost
      double       multiplications   ?
      sum          additions         ?
      half-count   divisions         ?

  27. How hard are these problems?

                   cost metric       cost
      double       multiplications   O(1)
      sum          additions         O(n)
      half-count   divisions         O(log n)

  28. What’s the cost, T, for each function?
  For each of T(0), T(1), T(2), T(3), T(4), …, T(n), count the multiplications performed by double, the additions performed by sum, and the divisions performed by half-count (n/a where a function is undefined for that input).

      (define (double n)
        (* n 2))

      (define (sum n)
        (if (= n 0)
            0
            (+ n (sum (- n 1)))))

      (define (half-count n)
        (if (= n 1)
            0
            (+ 1 (half-count (quotient n 2)))))

  29. What’s the cost, T, for each function?
  For the definitions of double, sum, and half-count above:

             multiplications   additions   divisions
             (double)          (sum)       (half-count)
      T(0)   1                 0           n/a
      T(1)   1                 1           0
      T(2)   1                 2           1
      T(3)   1                 3           1
      T(4)   1                 4           2
      …      …                 …           …
      T(n)   1                 n           ⌊log₂ n⌋
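
  A small Racket sketch that cross-checks the table by returning the number of additions or divisions the original functions would perform (these counting variants are illustrative, not from the slides):

      #lang racket

      ;; Additions performed by (sum n): expected n.
      (define (sum-additions n)
        (if (= n 0)
            0
            (+ 1 (sum-additions (- n 1)))))

      ;; Divisions performed by (half-count n): expected ⌊log₂ n⌋.
      (define (half-count-divisions n)
        (if (= n 1)
            0
            (+ 1 (half-count-divisions (quotient n 2)))))

      (sum-additions 4)          ; => 4
      (half-count-divisions 4)   ; => 2, i.e., ⌊log₂ 4⌋
      (half-count-divisions 3)   ; => 1, i.e., ⌊log₂ 3⌋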

  30. Recurrence Relations 
 (translating code to math)

  31. Translating recursion to recurrence relations
  For a given cost metric (here: additions):
  1. Translate the base case(s), using specific input sizes. How many steps does this base case take?
  2. Translate the recursive case(s), using input size N. Define T(N) in terms of a smaller cost.
  For (define (sum n) (if (= n 0) 0 (+ n (sum (- n 1))))), the recurrence relation is:
  base case → T(0) = 1
  recursive case → T(N) = 3 + T(N-1)

  32. Translating recursion to recurrence relations
  For a given cost metric (here: additions):
  1. Translate the base case(s), using specific input sizes. How many steps does this base case take?
  2. Translate the recursive case(s), using input size N. Define T(N) in terms of a smaller cost.
  For (define (sum n) (if (= n 0) 0 (+ n (sum (- n 1))))), the recurrence relation is:
  base case → T(0) = 0
  recursive case → T(N) = 1 + T(N-1)
  Unrolling the recurrence:
  T(N) = 1 + T(N-1)                 = 1·1 + T(N-1)
       = 1 + 1 + T(N-2)             = 2·1 + T(N-2)
       = 1 + 1 + 1 + T(N-3)         = 3·1 + T(N-3)
       …
       = 1 + 1 + … + 1 + T(N-N)     = N·1 + T(N-N) = N
  closed form: T(N) = N
  asymptotic form: T(N) ∈ O(N)
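
  A small Racket sketch, evaluating the recurrence T(0) = 0, T(N) = 1 + T(N-1) directly to confirm the closed form T(N) = N (an illustrative check, not from the slides):

      #lang racket

      ;; The recurrence itself, written as a recursive function.
      (define (T N)
        (if (= N 0)
            0
            (+ 1 (T (- N 1)))))

      ;; Compare the recurrence's values with the closed form for small N.
      (for/list ([N (in-range 6)])
        (list N (T N)))
      ;; => '((0 0) (1 1) (2 2) (3 3) (4 4) (5 5)), matching T(N) = N, so T(N) ∈ O(N)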

  33. Translating recursion to recurrence relations
  For a given cost metric (here: arithmetic operations and comparisons):
  1. Translate the base case(s), using specific input sizes. How many steps does this base case take?
  2. Translate the recursive case(s), using input size N. Define T(N) in terms of a smaller cost.
  For (define (sum n) (if (= n 0) 0 (+ n (sum (- n 1))))), the recurrence relation is:
  base case → T(0) = 1
  recursive case → T(N) = 2 + T(N-1)
