CS 3343 – Fall 2011
Order Statistics
Carola Wenk
Slides courtesy of Charles Leiserson with small changes by Carola Wenk
10/6/11 CS 3343 Analysis of Algorithms
Order statistics
Select the $i$th smallest of $n$ elements (the element with rank $i$).
• $i = 1$: minimum;
• $i = n$: maximum;
• $i = \lfloor (n+1)/2 \rfloor$ or $\lceil (n+1)/2 \rceil$: median.
Naive algorithm: Sort and index the $i$th element.
Worst-case running time $= \Theta(n \log n + 1) = \Theta(n \log n)$, using merge sort or heapsort (not quicksort).
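As a baseline, the naive algorithm can be sketched in a few lines. This is a minimal illustration, assuming a Python list and a 1-indexed rank `i`; the name `naive_select` is hypothetical. Python's built-in `sorted` is Timsort, a merge-sort variant with an $O(n \log n)$ worst case, so it fits the "merge sort, not quicksort" remark above.

```python
def naive_select(a: list[int], i: int) -> int:
    """Return the i-th smallest element (1-indexed rank) by sorting first."""
    if not 1 <= i <= len(a):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    # sorted() uses Timsort: O(n log n) comparisons in the worst case.
    return sorted(a)[i - 1]


data = [6, 10, 13, 5, 8, 3, 2, 11]
print(naive_select(data, 1))          # minimum: 2
print(naive_select(data, len(data)))  # maximum: 13
```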
Randomized divide-and-conquer algorithm
RAND-SELECT(A, p, q, i)   ⊳ ith smallest of A[p . . q]
  if p = q then return A[p]
  r ← RAND-PARTITION(A, p, q)
  k ← r – p + 1   ⊳ k = rank(A[r])
  if i = k then return A[r]
  if i < k
    then return RAND-SELECT(A, p, r – 1, i)
    else return RAND-SELECT(A, r + 1, q, i – k)
[Figure: A[p . . q] after partitioning around A[r]: A[p . . r – 1] holds elements ≤ A[r], A[r] has rank k, and A[r + 1 . . q] holds elements ≥ A[r].]
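The pseudocode above can be sketched in Python as follows. This is an illustrative translation under a few assumptions: arrays are 0-indexed Python lists, `p` and `q` are inclusive bounds, and RAND-PARTITION is taken to be a partition around a uniformly random pivot that is first swapped to position `p` (one common variant; the slides do not fix the partition scheme). The two tail-recursive calls are written as a loop, which changes nothing about the algorithm.

```python
import random


def rand_partition(a: list[int], p: int, q: int) -> int:
    """Partition a[p..q] around a uniformly random pivot; return the pivot's final index."""
    r = random.randint(p, q)          # randint is inclusive on both ends
    a[p], a[r] = a[r], a[p]           # move the pivot to the front
    pivot = a[p]
    j = p                             # a[p+1..j] holds elements < pivot
    for m in range(p + 1, q + 1):
        if a[m] < pivot:
            j += 1
            a[j], a[m] = a[m], a[j]
    a[p], a[j] = a[j], a[p]           # place the pivot between the two sides
    return j


def rand_select(a: list[int], p: int, q: int, i: int) -> int:
    """Return the i-th smallest (1-indexed) element of a[p..q]; permutes a in place."""
    while True:
        if p == q:
            return a[p]
        r = rand_partition(a, p, q)
        k = r - p + 1                 # k = rank of a[r] within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1                 # answer lies in the lower part
        else:
            p, i = r + 1, i - k       # answer is the (i-k)th of the upper part


data = [6, 10, 13, 5, 8, 3, 2, 11]
print(rand_select(list(data), 0, len(data) - 1, 7))  # 11
```

The result does not depend on which pivots are drawn; only the running time does.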
Example
Select the $i = 7$th smallest:
6 10 13 5 8 3 2 11   ($i = 7$, pivot = 6)
Partition:
2 5 3 6 8 13 10 11   ($k = 4$)
Select the $7 - 4 = 3$rd smallest recursively.
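The arithmetic of this example can be checked directly. The sketch below is a hypothetical helper, assuming distinct elements and the slide's pivot 6; it computes the pivot's rank $k$ and then the $(i-k)$th smallest of the upper part. The order of elements within each side may differ from the slide, since that depends on the partition procedure.

```python
def split_around(a: list[int], pivot: int) -> tuple[list[int], list[int]]:
    """Return (elements < pivot, elements > pivot); assumes distinct elements."""
    lower = [x for x in a if x < pivot]
    upper = [x for x in a if x > pivot]
    return lower, upper


a = [6, 10, 13, 5, 8, 3, 2, 11]
i = 7
lower, upper = split_around(a, pivot=6)  # pivot 6, as on the slide
k = len(lower) + 1                       # rank of the pivot
print(k)                                 # 4
print(sorted(upper)[i - k - 1])          # 3rd smallest of the upper part: 11
```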
Intuition for analysis
(All our analyses today assume that all elements are distinct.)
Splits produced by RAND-PARTITION:
Lucky: $T(n) = T(9n/10) + dn = \Theta(n)$ by CASE 3 of the master theorem, since $n^{\log_{10/9} 1} = n^0 = 1$.
Unlucky: $T(n) = T(n-1) + dn = \Theta(n^2)$ (arithmetic series). Worse than sorting!
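To see the gap between the two cases numerically, both recurrences can be evaluated with $d = 1$. The base case $T(0) = 0$ and the use of $\lfloor 9n/10 \rfloor$ for integer sizes are assumptions made for this sketch. The lucky recurrence satisfies $T(n) \le 10n$ (a geometric series with ratio $9/10$), while the unlucky one is exactly $n(n+1)/2$.

```python
def lucky(n: int) -> int:
    """T(n) = T(floor(9n/10)) + n with T(0) = 0 (d = 1), evaluated iteratively."""
    total = 0
    while n > 0:
        total += n
        n = 9 * n // 10
    return total


def unlucky(n: int) -> int:
    """T(n) = T(n - 1) + n with T(0) = 0 (d = 1): an arithmetic series."""
    return sum(range(1, n + 1))  # equals n(n + 1)/2


for n in (10, 100, 1000, 10_000):
    print(n, lucky(n), unlucky(n))
```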
Analysis of expected time
The analysis follows that of randomized quicksort, but it’s a little different.
Let $T(n)$ be the random variable for the running time of RAND-SELECT on an input of size $n$, assuming random numbers are independent.
For $k = 0, 1, \ldots, n-1$, define the indicator random variable
$$X_k = \begin{cases} 1 & \text{if PARTITION generates a } k : n-k-1 \text{ split},\\ 0 & \text{otherwise.} \end{cases}$$
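Why every split is equally likely: with $n$ distinct elements and a uniformly random pivot, a pivot of rank $j$ produces a $(j-1) : (n-j)$ split, so each $k$ corresponds to exactly one pivot and $E[X_k] = \Pr[X_k = 1] = 1/n$. The sketch below (hypothetical helper name) just enumerates this with exact fractions.

```python
from fractions import Fraction


def split_distribution(n: int) -> dict[int, Fraction]:
    """Pr[X_k = 1] for each k when the pivot is uniform over n distinct elements.

    A pivot of rank j (1..n) leaves j - 1 elements below it and n - j above,
    i.e. a (j - 1) : (n - j) split.
    """
    dist: dict[int, Fraction] = {}
    for j in range(1, n + 1):
        k = j - 1
        dist[k] = dist.get(k, Fraction(0)) + Fraction(1, n)
    return dist


print(split_distribution(5))  # every k in 0..4 has probability 1/5
```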
Analysis (continued)
To obtain an upper bound, assume that the $i$th element always falls in the larger side of the partition:
$$T(n) = \begin{cases} T(\max\{0, n-1\}) + dn & \text{if } 0 : n-1 \text{ split},\\ T(\max\{1, n-2\}) + dn & \text{if } 1 : n-2 \text{ split},\\ \quad\vdots & \\ T(\max\{n-1, 0\}) + dn & \text{if } n-1 : 0 \text{ split} \end{cases}$$
$$= \sum_{k=0}^{n-1} X_k \bigl(T(\max\{k, n-k-1\}) + dn\bigr)$$
$$\le \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2X_k \bigl(T(k) + dn\bigr).$$
The equality holds because exactly one $X_k$ equals 1. The final inequality is only used after taking expectations: every $X_k$ has $E[X_k] = 1/n$, and each value $\max\{k, n-k-1\} \ge \lfloor n/2 \rfloor$ arises for at most two values of $k$.
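The counting fact behind the last step can be checked mechanically: as $k$ runs over $0, \ldots, n-1$, the larger side $\max\{k, n-k-1\}$ takes exactly the values $\lfloor n/2 \rfloor, \ldots, n-1$, each at most twice. The helper name below is hypothetical.

```python
from collections import Counter


def larger_side_counts(n: int) -> Counter:
    """How often each value max(k, n - k - 1) occurs as k ranges over 0..n-1."""
    return Counter(max(k, n - k - 1) for k in range(n))


for n in (6, 7):
    print(n, sorted(larger_side_counts(n).items()))
# 6 [(3, 2), (4, 2), (5, 2)]
# 7 [(3, 1), (4, 2), (5, 2), (6, 2)]
```

For odd $n$ the middle value $\lfloor n/2 \rfloor$ occurs only once, so doubling every term gives an upper bound rather than an equality.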
Calculating expectation
$$E[T(n)] \le E\left[\sum_{k=\lfloor n/2 \rfloor}^{n-1} 2X_k \bigl(T(k) + dn\bigr)\right]$$
Take expectations of both sides.
$$= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2E\bigl[X_k \bigl(T(k) + dn\bigr)\bigr]$$
Linearity of expectation.
$$= \sum_{k=\lfloor n/2 \rfloor}^{n-1} 2E[X_k] \cdot E\bigl[T(k) + dn\bigr]$$
Independence of $X_k$ from other random choices.
$$= \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} dn$$
Linearity of expectation; $E[X_k] = 1/n$.
$$= \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn$$
(The second sum has $n - \lfloor n/2 \rfloor$ terms, so this is exact for even $n$; for odd $n$ the last term is $d(n+1)$, which only changes the constant $c$ in the bound below.)
Hairy recurrence
(But not quite as hairy as the quicksort one.)
$$E[T(n)] = \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn$$
Prove: $E[T(n)] \le cn$ for constant $c > 0$.
• The constant $c$ can be chosen large enough so that $E[T(n)] \le cn$ for the base cases.
Use fact: $\sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le \frac{3}{8}n^2$ (exercise).
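Before proving the fact, it can be sanity-checked for small $n$. The sketch below uses the integer form $8 \sum k \le 3n^2$ to avoid fractions; the helper name is hypothetical, and a finite check is of course not a proof.

```python
def upper_half_sum(n: int) -> int:
    """Sum of k for k = floor(n/2), ..., n - 1."""
    return sum(range(n // 2, n))


# Integer form of the fact: 8 * sum <= 3 * n^2, checked for n = 1..999.
assert all(8 * upper_half_sum(n) <= 3 * n * n for n in range(1, 1000))
print(upper_half_sum(8), 3 * 8 * 8 / 8)  # 22 24.0
```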
Substitution method
$$E[T(n)] \le \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + dn$$
Substitute inductive hypothesis.
$$\le \frac{2c}{n}\left(\frac{3}{8}n^2\right) + dn$$
Use fact.
$$= cn - \left(\frac{cn}{4} - dn\right)$$
Express as desired − residual.
$$\le cn,$$
if $c \ge 4d$ (so that the residual $cn/4 - dn$ is nonnegative).
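The conclusion can also be observed numerically by evaluating the hairy recurrence itself with $d = 1$ and the assumed base case $E[T(0)] = 0$, then checking that the values stay below $cn$ with $c = 4d = 4$. The function name and the base case are assumptions of this sketch; prefix sums keep the evaluation linear in $n$.

```python
def expected_bound_table(n_max: int, d: float = 1.0) -> list[float]:
    """Evaluate f(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} f(k) + d*n for n = 1..n_max,
    with the assumed base case f(0) = 0."""
    f = [0.0] * (n_max + 1)
    prefix = [0.0] * (n_max + 2)   # prefix[m] = f(0) + ... + f(m - 1)
    for n in range(1, n_max + 1):
        window = prefix[n] - prefix[n // 2]  # f(floor(n/2)) + ... + f(n - 1)
        f[n] = 2.0 / n * window + d * n
        prefix[n + 1] = prefix[n] + f[n]
    return f


f = expected_bound_table(10_000)
print(max(f[n] / n for n in range(1, len(f))))  # stays below c/d = 4
```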
Summary of randomized order-statistic selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But the worst case is very bad: $\Theta(n^2)$.
Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.
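An empirical illustration of "linear expected time": the sketch below counts element comparisons made by randomized selection of the median and averages over repeated runs. The seed, input size, and trial count are arbitrary choices for this example, and the partition is the same random-pivot variant assumed earlier. Since a partition of $m$ elements makes $m - 1$ comparisons, the analysis with $d = 1$ predicts an average below $4n$; the measurement illustrates the bound but does not prove it.

```python
import random


def count_select_comparisons(a: list[int], i: int) -> int:
    """Run randomized selection for rank i (1-indexed) on a copy of a and
    return the number of element comparisons made by the partitions."""
    a = list(a)
    p, q, comparisons = 0, len(a) - 1, 0
    while p < q:
        r = random.randint(p, q)
        a[p], a[r] = a[r], a[p]       # random pivot moved to the front
        pivot, j = a[p], p
        for m in range(p + 1, q + 1):
            comparisons += 1
            if a[m] < pivot:
                j += 1
                a[j], a[m] = a[m], a[j]
        a[p], a[j] = a[j], a[p]
        k = j - p + 1                 # rank of the pivot within a[p..q]
        if i == k:
            break
        if i < k:
            q = j - 1
        else:
            p, i = j + 1, i - k
    return comparisons


random.seed(3343)
n, trials = 1001, 200
avg = sum(count_select_comparisons(random.sample(range(10 * n), n), (n + 1) // 2)
          for _ in range(trials)) / trials
print(avg / n)  # average comparisons per element
```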
Worst-case linear-time order statistics
SELECT(i, n)
1. Divide the $n$ elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median $x$ of the $\lfloor n/5 \rfloor$ group medians to be the pivot.
3. Partition around the pivot $x$. Let $k = \text{rank}(x)$.
4. (Same as RAND-SELECT.)
   if $i = k$ then return $x$
   elseif $i < k$
     then recursively SELECT the $i$th smallest element in the lower part
     else recursively SELECT the $(i-k)$th smallest element in the upper part
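The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's code: it builds new lists instead of partitioning in place, sorts each small group to find its median "by rote", and also takes the median of a final partial group, so it uses $\lceil n/5 \rceil$ group medians rather than the slide's $\lfloor n/5 \rfloor$. A three-way split makes it tolerate duplicates, although the lecture assumes distinct elements.

```python
def select(a: list[int], i: int) -> int:
    """Return the i-th smallest (1-indexed) element of a using groups-of-5 pivots."""
    if not 1 <= i <= len(a):
        raise ValueError("rank i must satisfy 1 <= i <= len(a)")
    if len(a) <= 5:
        return sorted(a)[i - 1]                  # base case: constant-size list
    # Step 1: groups of 5; each group's median is found by sorting the group.
    groups = [a[g:g + 5] for g in range(0, len(a), 5)]
    medians = [sorted(grp)[(len(grp) - 1) // 2] for grp in groups]
    # Step 2: recursively select the median x of the group medians as the pivot.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x.
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    equal = len(a) - len(lower) - len(upper)     # copies of x (1 if distinct)
    k = len(lower) + 1                           # rank of x
    # Step 4: same case analysis as RAND-SELECT.
    if i < k:
        return select(lower, i)
    if i <= len(lower) + equal:
        return x
    return select(upper, i - len(lower) - equal)


print(select([6, 10, 13, 5, 8, 3, 2, 11], 7))  # 11
```

Because $x$ is the median of the group medians, both `lower` and `upper` are guaranteed to be a constant fraction smaller than the input, which is what makes the worst case linear.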
Choosing the pivot
1. Divide the $n$ elements into groups of 5.