Median Finding Test Cases What's Next 1. Median finding, part 2 2. Why we write test cases 3. What's next?
A problem • Find the (upper) median of a list of n items. • Upper median means "if the list has an even number of items, pick the one that's n/2 + 1 from the bottom, rather than n/2 from the bottom" • Obvious solution: sort, then pick the middle item. • Seems like more work than is needed. • Generalize ('strengthen the recursion'): SELECT(k, S): find the kth smallest in a set S of n items. • Illustrate with sets of numbers, ordered smallest to largest
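The "obvious solution" on this slide can be sketched in a few lines. This is a minimal illustration in Python (not the course's language); the name `upper_median` is made up for the example, and the list is assumed to be non-empty.

```python
def upper_median(items):
    """Return the upper median of a non-empty list by sorting it first."""
    if not items:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(items)  # O(n log n): more work than the problem needs
    # 0-based index n // 2 is the (n/2 + 1)th smallest when n is even,
    # and the true middle item when n is odd.
    return ordered[len(ordered) // 2]


print(upper_median([7, 1, 5, 3]))  # even length: sorted order is [1, 3, 5, 7]
print(upper_median([9, 2, 4]))     # odd length: sorted order is [2, 4, 9]
```

In SELECT terms, the upper median is the kth smallest item with k = floor(n/2) + 1.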
A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) • median-of-medians, analysis from hell • complicated, hard to believe it's worth implementing
Surprising simpler algorithm • RandSelect(k, S) • Pick a random item x in your set, S • Partition into L, the set of numbers less than x, and G, the set of those greater than x • If L has at least k items: RandSelect(k, L) • If L has k-1 items: return x • Otherwise, RandSelect(k - |L| - 1, G) • Works in "expected linear time" because on average, the size of the larger partition is ¾ the size of the set. Work is (roughly) $n + \tfrac{3}{4}n + \left(\tfrac{3}{4}\right)^2 n + \cdots \le 4n$.
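A minimal sketch of the RandSelect idea, written in Python for illustration. The names `rand_select`, `pivot`, `smaller`, and `larger` are choices made for this example, and the code assumes the items are distinct (a set), as on the slide.

```python
import random


def rand_select(k, items):
    """Return the k-th smallest (1-based) of a collection of distinct items."""
    items = list(items)
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of items")
    pivot = random.choice(items)                  # the random item
    smaller = [x for x in items if x < pivot]     # everything below the pivot
    larger = [x for x in items if x > pivot]      # everything above the pivot
    if len(smaller) >= k:
        return rand_select(k, smaller)            # answer lies below the pivot
    if len(smaller) == k - 1:
        return pivot                              # the pivot is the k-th smallest
    # Skip the smaller items and the pivot itself, then look above the pivot.
    return rand_select(k - len(smaller) - 1, larger)


data = [31, 4, 15, 9, 26, 5, 35, 8]
print(rand_select(len(data) // 2 + 1, data))      # upper median of data
```

Each call does linear work to partition and then recurses on only one side, which is where the roughly geometric total comes from.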
Challenges • Where do you get a random number in a functional programming language? • Once you have it, how do you test a procedure that depends on randomness?
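One common way to approach both challenges, sketched here in Python as an assumption rather than as the course's answer: pass the source of randomness in as an argument, and test against a slow but obviously correct oracle (sorting). Seeding the generator makes any failure reproducible. The names `rand_select` and `check_against_sorting` are hypothetical.

```python
import random


def rand_select(k, items, rng):
    """k-th smallest (1-based) of distinct items; rng supplies the randomness."""
    items = list(items)
    pivot = rng.choice(items)
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    if len(smaller) >= k:
        return rand_select(k, smaller, rng)
    if len(smaller) == k - 1:
        return pivot
    return rand_select(k - len(smaller) - 1, larger, rng)


def check_against_sorting(trials, seed):
    """Compare rand_select with the sort-based answer on random inputs."""
    rng = random.Random(seed)                   # seeded, so runs are repeatable
    for _ in range(trials):
        n = rng.randint(1, 30)
        data = rng.sample(range(1000), n)       # n distinct integers
        expected = sorted(data)
        for k in range(1, n + 1):
            if rand_select(k, data, rng) != expected[k - 1]:
                return False
    return True


print(check_against_sorting(trials=50, seed=17))
```

The key point is that the result of RandSelect does not depend on the random choices, only the running time does, so an ordinary equality check is still a valid test.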
Big idea! (More in CS18) • Randomized algorithms are often simpler than deterministic ones • Deep philosophical question: why does adding a stream of randomness make tasks easier?
http://www.eatingwell.com/recipe/267339/citrus-sherbet/
Why test-cases? • Jack Wrenn, PhD thesis proposal
• If a student's misconception is consistently reflected both in their examples and in their implementation… • …then it will not be detected by adapting those examples as test cases. • That point motivates Jack's dissertation work (Executable examples), but…
• If a student's misconception is reflected in their implementation… • …and the examples are created post-implementation, then • …the examples are certain to enforce the misconception.
What's next?
Where we are now
Programming • Basic constructs like "procedure" • if-then-else and cond • let-expressions ("local values") • Recursion in multiple forms • Functions as first-class entities (lambda) • Higher-order procedures • Modules as a way to gather types/data/procedures together
Data structures • Lists • Recursive definition leads to recursive code structure • Recursive definition leads to recurrence relations in analysis • Tuples • Trees • Recursive definition leads to recurrence relations in analysis, often with a factor of 2 • Balanced or ordered trees can help speed things up
Analysis • Go from code to recurrence • "Solve" a few classes of recurrence relations • Use plug-n-chug to guess solution • Use big-O to represent 'fairly equivalent' program performance
Algorithms • Fast-reverse • Insertion, Selection, and Merge Sort • Exhaustion (subsets!) • Minimax • Tree-search • General trees • BSTs • Tree traversal
Problem approaches • Design Recipe • Recursion • Recursion Diagrams • Recursion on any kind of structured data • Natural numbers • Lists • Trees • Divide-and-Conquer • Recursion is a special case • Data-hiding (via modules) • Decomposition • Using helper procedures to achieve a larger goal
What's next?
What's next? Programming • Problems get bigger • Hundreds or thousands of lines of code in a program • Programs themselves become complex objects worthy of study! • Software engineering • Programming techniques that support large and complex software • Object-oriented programming (CS18) • Event-driven programming (most web stuff) • …
What's next? Data structures • Lists, trees are very simple • Amenable to recursion approaches • Build on these: heaps, priority queues, … • Generalize: • Directed acyclic graphs • Prerequisite structure in course requirements makes a good example • Directed graphs • Streets in a city (some of them one-way) for example • Edges often "labelled" with data like "how long to traverse this one block stretch?" • Problems like "find shortest path" (i.e., quickest route from here to there) • … [CS1570]
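To make the "streets in a city" example concrete, here is a small sketch in Python of a directed graph with labelled edges and a shortest-path search. It uses Dijkstra's algorithm with a priority queue, which is one standard choice and not necessarily the one a later course would present first. The graph, the intersection names, and `shortest_travel_time` are all invented for illustration.

```python
import heapq


def shortest_travel_time(streets, start, goal):
    """Quickest total time from start to goal, or None if goal is unreachable.

    streets maps each intersection to a list of (neighbor, minutes) pairs;
    a one-way street simply appears in one direction only.
    """
    best = {start: 0}                  # best known time to each intersection
    frontier = [(0, start)]            # priority queue ordered by time so far
    while frontier:
        minutes, here = heapq.heappop(frontier)
        if here == goal:
            return minutes
        if minutes > best.get(here, float("inf")):
            continue                   # stale queue entry; a faster route was found
        for there, cost in streets.get(here, []):
            candidate = minutes + cost
            if candidate < best.get(there, float("inf")):
                best[there] = candidate
                heapq.heappush(frontier, (candidate, there))
    return None


streets = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2)],                   # one-way: there is no edge from B back to C
    "B": [("D", 5)],
    "D": [],
}
print(shortest_travel_time(streets, "A", "D"))
print(shortest_travel_time(streets, "D", "A"))
```

Note that the priority queue here is exactly the kind of structure the slide lists under "build on these".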
What's next? Analysis • Analysis of probabilistic programs like RandSelect • Analysis of performance of more complicated data structures • Analysis of algorithms like shortest-path • Study of "effective" solutions to (some instances of) provably hard problems
What's next? Algorithms • How does Google work? • How does Facebook choose which ads to show you? • How do we recognize unusual behaviors? • Securities fraud • Crime • How do you make a drone deliver a package? • How does Disney/Pixar make Frozen II?
A shift in style • In CS17, we've been very concrete: let's sort this list of numbers, let's find an integer in a tree with int-values at nodes, etc. • ADTs moved away from this a little: we have a Dictionary, but we don't know the details, only the runtime-performance • In general CS work, the gap between the real world and the code is much greater
A conceptual gap • The internet consists of a bunch of computers tied together by network connections from computers to routers (specialized computers that can pass data from one machine to another) • The routers are interconnected as well • The connections come and go; some are permanent, some are very temporary • How do we get data from my computer to yours? • We'll work out an algorithm in which we somehow represent what a router is or can do, but in discussing the algorithm, we'll just draw pictures, etc. • Leave implementation for later
Example problem and algorithm • We have a bunch of data: • We'd like to "classify" it into clusters (red dots could be cluster centers)
Idea • First, decide how many clusters (by hand?) • really annoying assumption, relieved by fancier methods • For our example, pick k = 2. • grab ANY TWO points in the dataset as "centers"
Divide the data into those closer to each point
For each group, find the "mean"
Using these new means, reclassify!
Repeat until stabilized
What didn't I mention? • How to find distances • Are data points stored in a list? An array? A tree? • What are the piles we created? • Are data points lists of ints? of floats? Are they tuples? • Are they all 2-dimensional? Could this work in 3D? in 10D?
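One possible way to turn the pictures from the preceding slides into code, sketched in Python. It commits to one set of answers to the questions above, all of them assumptions for this example: points are tuples of floats of any fixed dimension, they live in a plain list, distance is Euclidean, and the "piles" are lists. The names `k_means` and `mean_point` are hypothetical, and the procedure is the one usually called k-means clustering.

```python
import math


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def k_means(points, centers, max_rounds=100):
    """Alternate 'classify' and 'find the mean' until the centers stop moving."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_rounds):
        # Divide the data into those closer to each center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # For each group, find the mean (an empty group keeps its old center).
        new_centers = [mean_point(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:     # stabilized
            break
        centers = new_centers          # reclassify using the new means
    return centers, groups


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
# "Grab ANY TWO points": here, simply the first two in the list.
centers, groups = k_means(points, centers=points[:2])
print(groups)
```

Because `math.dist` and `mean_point` work for tuples of any length, the same sketch runs unchanged in 3D or 10D.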
Skills • Whatever math is needed • Whatever else is needed • For graphics: physics, • An ability to guess some representation of the problem that might work • The ability to translate a pictorial record of a discussion into an actual algorithm ("pseudocode") and then a real program ("code") • Analysis (during and after the fact)