Order Statistics. Carola Wenk. Slides courtesy of Charles Leiserson.


  1. CMPS 6610/4610, Fall 2016. Order Statistics. Carola Wenk. Slides courtesy of Charles Leiserson, with additions by Carola Wenk. CMPS 6610/4610 Algorithms

  2. Order statistics
     Select the $i$th smallest of $n$ elements (the element with rank $i$).
     • $i = 1$: minimum;
     • $i = n$: maximum;
     • $i = \lfloor (n+1)/2 \rfloor$ or $\lceil (n+1)/2 \rceil$: median.
     Naive algorithm: Sort and index the $i$th element. Worst-case running time $= \Theta(n \log n + 1) = \Theta(n \log n)$, using merge sort (not quicksort).
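
The naive algorithm can be sketched in a few lines. This is a minimal illustration, not part of the slides: the name `naive_select` is hypothetical, and Python's built-in `sorted` uses Timsort rather than merge sort, although it also runs in O(n log n) time in the worst case.

```python
def naive_select(a, i):
    """Return the i-th smallest (1-based rank) element of a by sorting a copy."""
    if not 1 <= i <= len(a):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    return sorted(a)[i - 1]  # sorting dominates the cost; indexing is O(1)


data = [6, 10, 13, 5, 8, 3, 2, 11]
n = len(data)
print(naive_select(data, 1))             # minimum
print(naive_select(data, n))             # maximum
print(naive_select(data, (n + 1) // 2))  # lower median, rank floor((n+1)/2)
```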

  3. Randomized divide-and-conquer algorithm
     RAND-SELECT(A, p, q, i)    ▷ i-th smallest of A[p . . q]
        if p = q then return A[p]
        r ← RAND-PARTITION(A, p, q)
        k ← r - p + 1    ▷ k = rank(A[r])
        if i = k then return A[r]
        if i < k
           then return RAND-SELECT(A, p, r - 1, i)
           else return RAND-SELECT(A, r + 1, q, i - k)
     [Figure: A[p . . q] after partitioning. Elements ≤ A[r] lie left of index r, elements ≥ A[r] lie right of it; k counts the positions from p through r.]
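
A runnable sketch of RAND-SELECT follows, assuming 0-based Python lists and inclusive bounds `p` and `q`. The slide does not spell out RAND-PARTITION, so the version here (swap a uniformly random element to the front, then do a single left-to-right partitioning pass) is one common choice and an assumption, not necessarily the one used in the course.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] (inclusive) around a random pivot; return the pivot's final index."""
    s = random.randint(p, q)          # pivot position chosen uniformly at random
    a[p], a[s] = a[s], a[p]           # move the pivot to the front
    pivot = a[p]
    i = p                             # a[p+1..i] holds elements <= pivot
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]           # place the pivot between the two parts
    return i


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-based) element of a[p..q]; rearranges a in place."""
    if p == q:
        return a[p]
    r = rand_partition(a, p, q)
    k = r - p + 1                     # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [6, 10, 13, 5, 8, 3, 2, 11]
print(rand_select(data[:], 0, len(data) - 1, 7))  # 7th smallest of the 8 elements
```

The call passes a copy (`data[:]`) because the function permutes its input.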

  4. Example
     Select the i = 7th smallest:
        6 10 13 5 8 3 2 11      (pivot: 6)
     Partition:
        2 5 3 6 8 13 10 11      (k = 4)
     Select the 7 - 4 = 3rd smallest recursively.
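
The example can be replayed in code. The slide does not say which partitioning scheme it uses; the sketch below assumes a single-pass partition with the first element as pivot (the helper name `partition_first` is hypothetical), which happens to reproduce the arrangement shown.

```python
def partition_first(a, p, q):
    """Partition a[p..q] (inclusive) around a[p]; return the pivot's final index."""
    pivot = a[p]
    i = p                             # a[p+1..i] holds elements <= pivot
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


a = [6, 10, 13, 5, 8, 3, 2, 11]
r = partition_first(a, 0, len(a) - 1)
k = r - 0 + 1                         # rank of the pivot 6
print(a, k)                           # [2, 5, 3, 6, 8, 13, 10, 11] 4

upper = a[r + 1:]                     # the recursion continues here with i - k = 3
print(sorted(upper)[3 - 1])           # 11, the 7th smallest overall
```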

  5. Intuition for analysis
     (All our analyses today assume that all elements are distinct.)
     Lucky: $T(n) = T(3n/4) + dn = \Theta(n)$, by Case 3 of the master theorem, since $n^{\log_{4/3} 1} = n^0 = 1$.
     Unlucky: $T(n) = T(n-1) + dn = \Theta(n^2)$ (arithmetic series). Worse than sorting!
     Here $dn$ is the running time of RAND-PARTITION.
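
Both recurrences can also be unrolled by hand. The worked version below ignores floors and assumes a base case of $T(1) = d$, which the slide does not state.

```latex
\begin{align*}
\text{Lucky: }   T(n) &\le dn + d\,\tfrac{3}{4}n + d\left(\tfrac{3}{4}\right)^{2} n + \cdots
                      \le dn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} = 4dn, \\
\text{Unlucky: } T(n) &= dn + d(n-1) + \cdots + d \cdot 1 = d\,\frac{n(n+1)}{2}.
\end{align*}
```

Since $T(n) \ge dn$ in both cases, the lucky case is $\Theta(n)$, while the unlucky case is $\Theta(n^2)$.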

  6. Analysis of expected time
     • Call a pivot good if its rank lies in $[n/4, 3n/4]$.
     • How many good pivots are there? $n/2$ ⇒ A random pivot has a 50% chance of being good.
     • Let $T(n, s)$ be the runtime random variable:
       $T(n, s) \le T(3n/4, s) + X(s) \cdot dn$
       Here $X(s) \cdot dn$ is the time to reduce the array size to $\le 3n/4$, $X(s)$ is the number of attempts it takes to find a good pivot, and $dn$ is the runtime of partition.

  7. Analysis of expected time
     Lemma: A fair coin needs to be tossed an expected number of 2 times until the first “heads” is seen.
     Proof: Let $E(X)$ be the expected number of tosses until the first “heads” is seen.
     • We need at least one toss; if it is “heads” we are done.
     • If it is “tails” (probability ½) we need to repeat.
     ⇒ $E(X) = 1 + \tfrac{1}{2} E(X)$ ⇒ $E(X) = 2$
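
The lemma is easy to check empirically. The following is a small simulation sketch, not part of the slides; the function names, the fixed seed, and the number of trials are arbitrary choices made for illustration.

```python
import random


def tosses_until_heads(rng):
    """Toss a fair coin until the first heads; return the number of tosses."""
    count = 1
    while rng.random() >= 0.5:        # treat values >= 0.5 as tails (probability 1/2)
        count += 1
    return count


def average_tosses(trials, seed=0):
    """Average number of tosses over the given number of independent trials."""
    rng = random.Random(seed)         # seeded so the run is reproducible
    return sum(tosses_until_heads(rng) for _ in range(trials)) / trials


print(average_tosses(100_000))        # should be close to the expected value 2
```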

  8. Analysis of expected time
     $T(n, s) \le T(3n/4, s) + X(s) \cdot dn$
     Here $X(s) \cdot dn$ is the time to reduce the array size to $\le 3n/4$, $X(s)$ is the number of attempts it takes to find a good pivot, and $dn$ is the runtime of partition.
     ⇒ $E(T(n, s)) \le E(T(3n/4, s)) + E(X(s) \cdot dn)$    (linearity of expectation)
     ⇒ $E(T(n, s)) \le E(T(3n/4, s)) + E(X(s)) \cdot dn$    (linearity of expectation)
     ⇒ $E(T(n, s)) \le E(T(3n/4, s)) + 2 \cdot dn$    (Lemma)
     ⇒ $T_{\exp}(n) \le T_{\exp}(3n/4) + \Theta(n)$
     ⇒ $T_{\exp}(n) \in \Theta(n)$
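
The last step can be made explicit by unrolling the recurrence. This is a sketch that ignores floors and treats constant-size base cases as absorbed into the sum.

```latex
\begin{align*}
T_{\exp}(n) &\le T_{\exp}(3n/4) + 2dn \\
            &\le 2dn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} \\
            &= 2dn \cdot \frac{1}{1 - 3/4} = 8dn .
\end{align*}
```

Because at least one partition step of cost $dn$ is always performed, $T_{\exp}(n) \ge dn$ as well, which gives $T_{\exp}(n) \in \Theta(n)$.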

  9. Summary of randomized order-statistic selection
     • Works fast: linear expected time.
     • Excellent algorithm in practice.
     • But the worst case is very bad: $\Theta(n^2)$.
     Q. Is there an algorithm that runs in linear time in the worst case?
     A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
     IDEA: Generate a good pivot recursively.
     This algorithm has large constants, though, and is therefore not efficient in practice.

  10. Worst-case linear-time order statistics
      SELECT(i, n)
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.
      3. Partition around the pivot x. Let k = rank(x).
      4. if i = k then return x
         elseif i < k
            then recursively SELECT the ith smallest element in the lower part
            else recursively SELECT the (i-k)th smallest element in the upper part
      (Step 4 is the same as in RAND-SELECT.)
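
The four steps can be sketched in Python as follows. This is an illustrative version under stated assumptions rather than the course's implementation: it builds new lists instead of partitioning in place, sorts each 5-element group to find its median, handles inputs of at most 5 elements by sorting, leaves any leftover group of fewer than 5 elements out of the pivot selection, and relies on the slides' assumption that all elements are distinct.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of the list a (distinct elements assumed)."""
    if len(a) <= 5:
        return sorted(a)[i - 1]       # constant-size base case

    # Step 1: split into full groups of 5 and take each group's median.
    full = len(a) - len(a) % 5
    medians = [sorted(a[j:j + 5])[2] for j in range(0, full, 5)]

    # Step 2: recursively select the median x of the floor(n/5) group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x and compute k = rank(x).
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1

    # Step 4: return x or recurse into the part that contains rank i.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


data = [6, 10, 13, 5, 8, 3, 2, 11]
print(select(data, 7))                # 7th smallest of the 8 elements
```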

  11. Choosing the pivot

  12. Choosing the pivot
      1. Divide the n elements into groups of 5.

  13. Choosing the pivot
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      [Figure labels: lesser, greater]

  14. Choosing the pivot
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.
      [Figure labels: x, lesser, greater]

  15. Developing the recurrence
      SELECT(i, n)    [total cost: $T(n)$]
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    [cost: $\Theta(n)$]
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.    [cost: $T(n/5)$]
      3. Partition around the pivot x. Let k = rank(x).    [cost: $\Theta(n)$]
      4. if i = k then return x
         elseif i < k
            then recursively SELECT the ith smallest element in the lower part
            else recursively SELECT the (i-k)th smallest element in the upper part
         [cost: $T(?)$]

  16. Analysis
      (Assume all elements are distinct.)
      At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
      [Figure labels: x, lesser, greater]

  17. Analysis
      (Assume all elements are distinct.)
      At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
      • Therefore, at least $3 \lfloor n/10 \rfloor$ elements are $\le x$.
      [Figure labels: x, lesser, greater]

  18. Analysis
      (Assume all elements are distinct.)
      At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
      • Therefore, at least $3 \lfloor n/10 \rfloor$ elements are $\le x$.
      • Similarly, at least $3 \lfloor n/10 \rfloor$ elements are $\ge x$.
      [Figure labels: x, lesser, greater]

  19. Analysis
      (Assume all elements are distinct.)
      We need an “at most” bound for the worst-case runtime.
      • At least $3 \lfloor n/10 \rfloor$ elements are $\le x$ ⇒ at most $n - 3 \lfloor n/10 \rfloor$ elements are $> x$.
      • At least $3 \lfloor n/10 \rfloor$ elements are $\ge x$ ⇒ at most $n - 3 \lfloor n/10 \rfloor$ elements are $< x$.
      • The recursive call to SELECT in Step 4 is executed on at most $n - 3 \lfloor n/10 \rfloor$ elements.

  20. Analysis
      (Assume all elements are distinct.)
      • Use the fact that $\lfloor a/b \rfloor \ge (a - (b-1))/b$ (page 51).
      • $n - 3 \lfloor n/10 \rfloor \le n - 3 \cdot (n-9)/10 = (10n - 3n + 27)/10 \le 7n/10 + 3$
      • The recursive call to SELECT in Step 4 is executed on at most $7n/10 + 3$ elements.
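
The floor bound and the resulting size estimate can be spot-checked numerically. This is only a sanity check over a finite range, not a proof; the function name and the tested range are arbitrary choices. All comparisons are scaled by 10 so that they stay in exact integer arithmetic.

```python
def bounds_hold(n):
    """Check floor(n/10) >= (n - 9)/10 and n - 3*floor(n/10) <= 7n/10 + 3 for one n."""
    floor_term = n // 10
    size = n - 3 * floor_term                     # elements passed to the recursive call, at most
    floor_fact = 10 * floor_term >= n - 9         # floor(a/b) >= (a - (b-1))/b with a = n, b = 10
    middle_step = 10 * size <= 7 * n + 27         # n - 3*floor(n/10) <= (10n - 3n + 27)/10
    final_bound = 10 * size <= 7 * n + 30         # ... <= 7n/10 + 3
    return floor_fact and middle_step and final_bound


print(all(bounds_hold(n) for n in range(1, 100_001)))
```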

  21. Developing the recurrence
      SELECT(i, n)    [total cost: $T(n)$]
      1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    [cost: $\Theta(n)$]
      2. Recursively SELECT the median x of the $\lfloor n/5 \rfloor$ group medians to be the pivot.    [cost: $T(n/5)$]
      3. Partition around the pivot x. Let k = rank(x).    [cost: $\Theta(n)$]
      4. if i = k then return x
         elseif i < k
            then recursively SELECT the ith smallest element in the lower part
            else recursively SELECT the (i-k)th smallest element in the upper part
         [cost: $T(7n/10 + 3)$]

  22. Solving the recurrence (with $dn$ for $\Theta(n)$)
      $T(n) \le T\left(\tfrac{1}{5} n\right) + T\left(\tfrac{7}{10} n + 3\right) + dn$
      Big-Oh induction: claim $T(n) \le c(n - 3)$. (Subtracting 3 is a technical trick.)
      $T(n) \le c\left(\tfrac{1}{5} n - 3\right) + c\left(\tfrac{7}{10} n + 3 - 3\right) + dn$
      $= \tfrac{9}{10} cn - 3c + dn$
      $= c(n - 3) - \tfrac{1}{10} cn + dn$
      $\le c(n - 3)$, if $c$ is chosen large enough, e.g., $c = 10d$.
      This shows that $T(n) \in O(n)$.
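
The induction claim can be checked numerically on a concrete instance of the recurrence. The sketch below makes several assumptions that are not in the slides: it sets d = 1, uses floors for the subproblem sizes, and stops the recursion at sizes below 50 with cost dn (the cutoff is chosen so that 7n/10 + 3 < n always holds when recursing).

```python
from functools import lru_cache

D = 1            # constant in the dn term (assumed value)
C = 10 * D       # the constant c = 10d suggested on the slide
BASE = 50        # assumed cutoff below which the cost is taken to be D * n


@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = T(floor(n/5)) + T(floor(7n/10) + 3) + D*n with the assumed base case."""
    if n < BASE:
        return D * n
    return T(n // 5) + T(7 * n // 10 + 3) + D * n


# Check the claimed bound T(n) <= c(n - 3) over a finite range of n.
print(all(T(n) <= C * (n - 3) for n in range(4, 20_001)))
```

The range starts at n = 4 because the bound c(n - 3) is not positive for smaller n.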

  23. Conclusions
      • Since the work at each level of recursion is basically a constant fraction (9/10) smaller, the work per level is a geometric series dominated by the linear work at the root.
      • In practice, this algorithm runs slowly, because the constant in front of n is large.
      • The randomized algorithm is far more practical.
      Exercise: Try to divide into groups of 3 or 7.
