median finding

Median Finding: 1. Testing iroot 2. Analyze backboneSimilar 3. Median finding (PowerPoint PPT Presentation)



  1. Median Finding 1. Testing iroot 2. Analyze backboneSimilar 3. Median finding

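  2. Testing iroot • On the interval 1, 2, 3, 4, 5, suppose the function values for some procedure f are 7, -2, -8, 5, -3 • checkExpect(iroot(1, 4, f), ???) has no single right answer: f changes sign between 1 and 2, between 3 and 4, and between 4 and 5 • Test the property instead: let k = iroot(1, 4, f); checkExpect(f(k)*f(k+1) <= 0, true);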
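The property-based check can be sketched in Python. This is only an illustration: the slides do not show how iroot is implemented, so the bisection below, and the precondition that f(lo) and f(hi + 1) do not share a sign, are assumptions.

```python
def iroot(lo, hi, f):
    """Return an integer k with lo <= k <= hi and f(k) * f(k + 1) <= 0.

    Assumed contract (not stated on the slide): f(lo) * f(hi + 1) <= 0.
    """
    a, b = lo, hi + 1
    if f(a) * f(b) > 0:
        raise ValueError("f(lo) and f(hi + 1) must not have the same sign")
    # Invariant: f(a) * f(b) <= 0, so a sign change lies between a and b.
    while b - a > 1:
        mid = (a + b) // 2
        if f(a) * f(mid) <= 0:
            b = mid
        else:
            a = mid
    return a


# Function values from the slide: f(1..5) = 7, -2, -8, 5, -3.
values = {1: 7, 2: -2, 3: -8, 4: 5, 5: -3}
f = values.get

k = iroot(1, 4, f)
# Test the property, not a specific k: 1, 3 and 4 are all valid answers.
assert 1 <= k <= 4
assert f(k) * f(k + 1) <= 0
```

Checking the property keeps the test valid even if the implementation changes which sign change it finds.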

  3. Analyze backboneSimilar • Let B(n) be the max number of operations involved in evaluating backboneSimilar(t1, t2), where n is the number of nodes/leaves in the larger of t1 and t2.

  4. Analyze backboneSimilar • We're going to apply backboneSimilar to the left and right subtrees. If the left subtree has k items, the right has n-k-1 (the -1 for the item at the current node!) • But we don't know what k is • Could be any number from 1 to n-1 • $B(n) \le c + \max_k \big(B(k) + B(n-k-1)\big)$

  5. Analyze backboneSimilar • $B(n) \le c + \max_k \big(B(k) + B(n-k-1)\big)$ • How much work are we really doing? How often do we "visit" each node of t1? At most once, right? And all we do is test whether it has children or not! • Seems like total work $B(n) \le cn$, at least as long as $a \le c$ (where $a = B(1)$); if not, we can increase c to make it at least as big as a.

  6. Usual well-ordering proof • Suppose that $B(1) = a$, $B(n) \le c + \max_k \big(B(k) + B(n-k-1)\big)$, and $a \le c$. Then I claim that (*) $B(n) \le cn$ for $n = 1, 2, \dots$ • Let $S$ be the set of all natural numbers for which (*) is false. Observe that 1 is not in $S$. Suppose $S$ is nonempty, and we'll arrive at a contradiction. • Let $h$ be the least element of $S$ (well-ordering). Then (*) holds for $n = 1, \dots, h-1$.

  7. Usual well-ordering proof • Suppose that $B(1) = a$, $B(n) \le c + \max_k \big(B(k) + B(n-k-1)\big)$, and $a \le c$. Claim (*): $B(n) \le cn$ for $n = 1, 2, \dots$ • Let $S$ be the set of all natural numbers for which (*) is false. Let $h$ be the least element of $S$ (well-ordering). Then (*) holds for $n = 1, \dots, h-1$. • What's $B(h)$? Well, $B(h) \le c + \max_k \big(B(k) + B(h-k-1)\big) \le c + \max_k \big(ck + c(h-k-1)\big) = c + \max_k \big(ck + c(h-1) - ck\big) = c + \max_k c(h-1) = c + c(h-1) = ch$. • So (*) holds for $h$ after all, yet $h$ is in $S$. Contradiction!
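The bound can also be checked numerically. The sketch below takes the recurrence with equality (the worst case) and confirms B(n) ≤ cn for small n. The constants a = 2 and c = 3, the convention B(0) = 0 for an empty subtree, and letting k range over 0, …, n-1 are all assumptions made for illustration, not values from the slides.

```python
from functools import lru_cache

A = 2  # hypothetical cost B(1) = a
C = 3  # hypothetical per-node cost c, chosen so that a <= c


@lru_cache(maxsize=None)
def B(n):
    """Worst case of the recurrence, taking the bound with equality."""
    if n == 0:
        return 0  # assumption: an empty subtree costs nothing
    if n == 1:
        return A
    # Left subtree has k nodes, right subtree has n - k - 1.
    return C + max(B(k) + B(n - k - 1) for k in range(n))


# The claim (*) from the proof: B(n) <= c * n for every n checked.
assert all(B(n) <= C * n for n in range(1, 200))
```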

  8. Median-finding

  9. Warmup: ceilings • For […], we have […], hence […] for […]. • For […], we have […] and […]. Reason: apply previous result to […] and […]. • For […], we have […]. Reason: apply previous result to […] to get […]. Then apply part 2 to […] to get […].

  10. • For […], we have […] • For […], we have […] • For […], we have […] • For […], we have […]

  11. Last facts about ceilings • For […], we have […]. In particular […] • For […], we have […]

  12. A problem • Find the (upper) median of a list of n items. • Upper median means "if the list has an even number of items, pick the one that's $n/2 + 1$ from the bottom, rather than $n/2$ from the bottom" • Obvious solution: sort, then pick the middle item. • Seems like more work than is needed. • Generalize ('strengthen the recursion'): SELECT($k$, $S$): find the $k$th smallest in a set $S$ of $n$ items. • Illustrate with sets of numbers, ordered smallest to largest • MEDIAN($S$) is now just SELECT($\lceil (n+1)/2 \rceil$, $S$).
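The "obvious solution" is short enough to write out. A minimal sketch, assuming the items are in a Python list; the function name upper_median is chosen here for illustration.

```python
def upper_median(items):
    """Upper median by sorting: the (n // 2 + 1)-th smallest item."""
    ordered = sorted(items)
    # 0-based index n // 2 is the middle item for odd n and the upper
    # of the two middle items for even n.
    return ordered[len(ordered) // 2]


assert upper_median([5, 1, 3]) == 3
assert upper_median([4, 1, 3, 2]) == 3
```

Sorting costs on the order of n log n comparisons, which is the extra work the SELECT approach tries to avoid.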

  13. A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) • Input: a nonempty set $S$ of $n$ numbers, and an index $k$, $1 \le k \le n$ • Output: The $k$th smallest of the numbers. 1. If $n = 1$ (one item set), return that item. 2. Divide input into $\lfloor n/5 \rfloor$ groups of five, and at most one group of remaining items. 3. Find the (upper) medians of each of these $\lceil n/5 \rceil$ groups. 4. Find the median $x$ of these medians (recursively) 5. Partition the input around this median. Let $m$ be the number of elements on the low side. • Low side: all items less than or equal to $x$. High side: items greater than $x$. 6. If $k \le m$, find the $k$th smallest item on the low side; otherwise find the $(k-m)$th smallest item in the high side (recursively)

  14. Group into 5s; median of medians; partition; recur on appropriate piece • Input: 1 5 2 9 8 3 7 4 11 22 27 14 6 21 31 13 12; find the 14th-smallest item. • Groups: (1 5 2 9 8), (3 7 4 11 22), (27 14 6 21 31), (13 12) • Medians of the groups: 5, 7, 21, 13; sorted: 5 7 13 21; median of medians: 13 • Low side (≤ 13): 1 5 2 9 8 3 7 4 11 6 12 13 • High side (> 13): 22 27 14 21 31 • 12 items less than or equal to median of medians; want 14th item. So SELECT(2, upper group), recursively.
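The steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the lecture's own code: inputs of five or fewer items are sorted directly, and the partition is three-way (less than, equal to, greater than the pivot) instead of the two-way "≤ / >" split described on the slide, so that every recursive call is on a strictly smaller list.

```python
def select(k, items):
    """Return the k-th smallest (1-indexed) item of a nonempty list."""
    items = list(items)
    if len(items) <= 5:
        # Assumption: small inputs are handled by sorting directly.
        return sorted(items)[k - 1]

    # Group into fives (plus at most one smaller group).
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    # Upper median of each group.
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Upper median of the medians, found recursively.
    x = select(len(medians) // 2 + 1, medians)

    # Three-way partition around x (a deviation from the slide).
    low = [v for v in items if v < x]
    equal = [v for v in items if v == x]
    high = [v for v in items if v > x]

    if k <= len(low):
        return select(k, low)
    if k <= len(low) + len(equal):
        return x
    return select(k - len(low) - len(equal), high)


# The 17-number input from the worked example; its 14th smallest is 21.
data = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
assert select(14, data) == 21
```

On this input the median of medians is 13, so the search continues in the five-element high side for its 2nd smallest item, matching the slide.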

  15. A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan) • Input: a set of $n$ numbers, and an index $k$ • Output: The $k$th smallest of the numbers. 1. Group into 5s; find medians of each (at a cost of $a$ for each); find median of medians 2. Partition around median of medians. Recur. • Fictitious, experimental analysis. Suppose that each "part" was no larger than ¾ of input. Then we'd have […] and combine with previous term (replace […] with similar […]): […]

  16. […] I then claim this looks consistent with […] for all […]. For […], ignoring the "ceilings" for a moment, we'd then get […]
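One way the fictitious analysis might go, offered as a hedged sketch and not as the slide's own formulas: assume the simplified recurrence is $T(n) \le cn + T(n/5) + T(3n/4)$ with ceilings ignored, and guess a linear bound $T(n) \le sn$, where the constant $s$ is a name introduced here for illustration. Substituting the guess into the right-hand side gives:

```latex
\begin{align*}
T(n) &\le cn + T(n/5) + T(3n/4) \\
     &\le cn + s\,\tfrac{n}{5} + s\,\tfrac{3n}{4} \\
     &= cn + \tfrac{19}{20}\,sn \\
     &\le sn \quad \text{whenever } c \le \tfrac{s}{20}.
\end{align*}
```

Under these assumptions the guess is self-consistent as long as $s \ge 20c$, because the two recursive pieces together cover only $1/5 + 3/4 = 19/20$ of the input.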

  17. In practice, it's a little messier than this. • Warning: Some of the following steps look like magic. • Carefully crafted to make the algebra as simple as possible. • Recall from warmup: For […], we have […] • Critical step: show that in the recursive call, the partition piece we recur on is not too big. • ¾ was almost too large • We'll show it's more like 70%, but with a slight adjustment.

  18. Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost) • [Figure: MEDIANS]

  19. Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost) • [Figure: MEDIANS, SORTED; median of medians]

  20. Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost) • [Figure: MEDIANS, SORTED; median of medians, $x$; values less than $x$; values greater than $x$] • All but two columns (first and last) have 3 elts greater than $x$ • At least $3\left(\left\lceil \tfrac{1}{2} \lceil n/5 \rceil \right\rceil - 2\right) \ge 3n/10 - 6$ elts in "greater than" pile • "Greater than" pile is no larger than the other • Contains at least half of the $\lceil n/5 \rceil$ medians • At most $7n/10 + 6$ elts in "≤" pile (or greater-than pile)
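The pile-size claim can be spot-checked empirically. The sketch below partitions random inputs of distinct numbers around the median of medians and checks the two bounds from the slide. The helper names, and computing the median of medians by sorting rather than recursively, are choices made here for brevity; a spot check is not a proof.

```python
import random


def median_of_medians(items):
    """Upper median of the upper medians of the groups of five."""
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = sorted(g[len(g) // 2] for g in groups)
    return medians[len(medians) // 2]


def pile_sizes(items):
    """Sizes of the "<= x" pile and the "> x" pile, x = median of medians."""
    x = median_of_medians(items)
    low = sum(1 for v in items if v <= x)
    return low, len(items) - low


rng = random.Random(0)  # fixed seed so the check is repeatable
for n in (1, 2, 5, 6, 17, 100, 1000):
    items = rng.sample(range(10 * n), n)  # n distinct numbers
    low, high = pile_sizes(items)
    assert high >= 3 * n / 10 - 6
    assert low <= 7 * n / 10 + 6
```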

  21. Recurrence • Let $T(n)$ be the max number of operations involved in "Select" on any input of size $n$. • Group into fives: $cn$ • Find medians of each group: $a\lceil n/5 \rceil$ • Median of medians: $T(\lceil n/5 \rceil)$ • Partition around median element: $bn$ (combine: $c' = c + b$) • Recur on appropriate piece: at most $7n/10 + 6$ elts in pile, so operation count $\le T(\lceil 7n/10 + 6 \rceil)$ • Total: $T(n) \le c'n + a\lceil n/5 \rceil + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$

  22. Algebra • […] (Note: […]) • Since n is at least 1, we can write […] (Note: […])

  23. Summary (replacing […] with $c$) • For […], $T(n) \le cn + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$

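  24. Algebraic Cleverness • For […], $T(n) \le cn + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$ 1. For $n \le 160$, compute $T(n)$ explicitly, and pick a number $s$ with $T(n) \le sn$ for $n$ in this range. 1. Let $s = \max_{n = 1, \dots, 160} T(n)/n$, for instance! 2. Pick $s \ge 20c$. (!) 1. Because $s \ge 20c$, we have $c \le s/20$. Also: […] 3. I claim that for all $n$, $T(n) \le sn$. 4. For $n \le 160$ we have $T(n) \le sn$. (Item 2: […]) 5. Still need to handle the case $n > 160$ 6. Why 160? Because it's large enough to make the argument work!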

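  25. Claim: $T(n) \le sn$ for all $n$ • Suppose it's false for some minimum value $k$, but true for all smaller $n$. • Then $k > 160$ (because we already showed it true for 160 and less). Hence $k/20 > 8$ (used later). Also $\lceil k/5 \rceil$ and $\lceil 7k/10 + 6 \rceil$ are both smaller than $k$, so the claim applies to them. • $T(k) \le ck + T(\lceil k/5 \rceil) + T(\lceil 7k/10 + 6 \rceil)$ • $\le ck + s\lceil k/5 \rceil + s\lceil 7k/10 + 6 \rceil$ • $\le ck + s(k/5 + 1) + s(7k/10 + 6 + 1)$ • $= ck + s\,\tfrac{k}{5} + s + s\,\tfrac{7k}{10} + 7s$ • $= ck + s\,\tfrac{9k}{10} + 8s$ • $\le \tfrac{s}{20}k + s\,\tfrac{9k}{10} + 8s$ • $= s\,\tfrac{19k}{20} + 8s$ • $= sk + 8s - \tfrac{1}{20}sk$ • $= sk + s\left(8 - \tfrac{k}{20}\right)$ [By note above, $k/20 > 8$, so $0 > 8 - k/20$] • $\le sk$ • Contradiction! Hence claim is true for all $n$.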
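The claim can be sanity-checked numerically. The sketch below takes the recurrence with equality above the cutoff 160 and verifies T(n) ≤ sn over a range of n. The constant c = 1 and the explicit costs 3n for n ≤ 160 are hypothetical values picked for illustration; only the shape of the recurrence and the choice s ≥ 20c come from the slides.

```python
from functools import lru_cache

C = 1         # hypothetical constant c in the recurrence
CUTOFF = 160  # at or below this, T(n) is "computed explicitly"


def base_cost(n):
    """Hypothetical explicit costs T(n) for n <= 160."""
    return 3 * n


# s must cover the explicit range and also satisfy s >= 20c.
S = max(20 * C, max(base_cost(n) / n for n in range(1, CUTOFF + 1)))


def ceil_div(p, q):
    """Integer ceiling of p / q for positive q."""
    return -(-p // q)


@lru_cache(maxsize=None)
def T(n):
    """Worst case of T(n) = cn + T(ceil(n/5)) + T(ceil(7n/10 + 6))."""
    if n <= CUTOFF:
        return base_cost(n)
    return C * n + T(ceil_div(n, 5)) + T(ceil_div(7 * n, 10) + 6)


assert all(T(n) <= S * n for n in range(1, 5001))
```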

  26. Why piles of five? • If you try more or fewer, the sum of $1/5$ and $7/10$ ends up changing to something…a bit larger than 1 instead of a bit less than 1 • So 5 is a "sweet spot" for this algorithm!

  27. Surprising simpler algorithm • RandSelect($k$, $S$) • Pick a random item $x$ in your set, $S$ • Partition into set $L$ of numbers less than $x$, and set $G$ of those greater than $x$ • If $L$ has at least $k$ items: RandSelect($k$, $L$) • If $L$ has $k-1$ items: return $x$ • Otherwise, RandSelect($k - |L| - 1$, $G$) • Works in "expected linear time" because on average, the size of the larger partition is ¾ size of the set. Work is (roughly) $n + \tfrac{3}{4}n + \left(\tfrac{3}{4}\right)^2 n + \cdots \le 4n$.
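A minimal Python sketch of RandSelect, assuming the input is a list of distinct numbers (a set, as on the slide). The function and parameter names are chosen here for illustration.

```python
import random


def rand_select(k, items, rng=random):
    """Return the k-th smallest (1-indexed) of a list of distinct numbers."""
    x = rng.choice(items)                     # random pivot
    less = [v for v in items if v < x]
    greater = [v for v in items if v > x]
    if len(less) >= k:
        return rand_select(k, less, rng)      # answer is below the pivot
    if len(less) == k - 1:
        return x                              # the pivot is the answer
    # Skip the smaller items and the pivot itself.
    return rand_select(k - len(less) - 1, greater, rng)


data = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
assert rand_select(14, data) == 21
```

The result does not depend on the random choices; only the running time does.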

  28. Big idea! (More in CS18) • Randomized algorithms are often simpler than deterministic ones • Deep philosophical question: why does adding a stream of randomness make tasks easier?
