Order Statistics. Carola Wenk. Slides courtesy of Charles Leiserson.

CS 3343 – Fall 2011
Order Statistics
Carola Wenk
Slides courtesy of Charles Leiserson with small changes by Carola Wenk
10/6/11, CS 3343 Analysis of Algorithms

Order statistics
Select the i-th smallest of n elements (the element with rank i).
• i = 1: minimum;
• i = n: maximum;
• i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉: median.
Naive algorithm: Sort and index the i-th element.
Worst-case running time = Θ(n log n + 1) = Θ(n log n), using merge sort or heapsort (not quicksort).
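The naive algorithm above can be sketched in a few lines of Python. This is only an illustration: the function name `naive_select` is made up here, and it relies on Python's built-in `sorted`, which runs in O(n log n) worst-case time, rather than on merge sort or heapsort specifically.

```python
def naive_select(items, i):
    """Return the i-th smallest element (rank i, 1-indexed) by sorting first."""
    if not 1 <= i <= len(items):
        raise IndexError("rank i must satisfy 1 <= i <= n")
    # Sort a copy, then index: Theta(n log n) for the sort plus Theta(1) for the lookup.
    return sorted(items)[i - 1]


data = [6, 10, 13, 5, 8, 3, 2, 11]
n = len(data)
minimum = naive_select(data, 1)                   # i = 1
maximum = naive_select(data, n)                   # i = n
lower_median = naive_select(data, (n + 1) // 2)   # i = floor((n+1)/2)
upper_median = naive_select(data, -(-(n + 1) // 2))  # i = ceil((n+1)/2)
```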

Randomized divide-and-conquer algorithm
RAND-SELECT(A, p, q, i)    ⊳ i-th smallest of A[p . . q]
    if p = q then return A[p]
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1    ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)
After partitioning, A[p . . r–1] ≤ A[r] ≤ A[r+1 . . q], and A[r] has rank k within A[p . . q].
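A minimal Python sketch of RAND-SELECT follows. Some details are assumptions, since the slide does not spell out RAND-PARTITION: here it swaps a uniformly random element to the front and then does a single left-to-right scan. Indices are 0-based and inclusive, the rank `i` stays 1-based as on the slide, and the two tail-recursive calls are written as a loop.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] (inclusive) around a random pivot; return the pivot's final index."""
    s = random.randint(p, q)          # uniformly random pivot position
    a[p], a[s] = a[s], a[p]           # move the pivot to the front
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:                 # grow the "<= pivot" region
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]           # put the pivot between the two regions
    return i


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-indexed) element of a[p..q]; reorders a in place."""
    while p < q:
        r = rand_partition(a, p, q)
        k = r - p + 1                 # k = rank of a[r] within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1                 # continue in the lower part
        else:
            p, i = r + 1, i - k       # continue in the upper part
    return a[p]
```

For example, `rand_select(list(values), 0, len(values) - 1, i)` agrees with `sorted(values)[i - 1]` for any valid rank `i`.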

Example
Select the i = 7th smallest:
    6 10 13 5 8 3 2 11        (pivot: 6, i = 7)
Partition:
    2 5 3 6 8 13 10 11        (k = 4)
Select the 7 – 4 = 3rd smallest recursively.
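The example can be replayed in Python. The sketch below assumes the pivot is the first element and uses a single left-to-right scan; the slide does not state the partition scheme, but this choice happens to reproduce the arrangement shown. The name `partition_first` is illustrative only.

```python
def partition_first(a, p, q):
    """Partition a[p..q] (inclusive) around a[p]; return the pivot's final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


a = [6, 10, 13, 5, 8, 3, 2, 11]
r = partition_first(a, 0, len(a) - 1)
k = r + 1                      # rank of the pivot (p = 0, so k = r - p + 1)
upper = a[r + 1:]              # the 7th smallest lies here because 7 > k
answer = sorted(upper)[7 - k - 1]   # the (7 - k)-th smallest of the upper part
```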

Intuition for analysis
(All our analyses today assume that all elements are distinct.)
Lucky: T(n) = T(9n/10) + dn, where dn is the cost of RAND-PARTITION. Since $n^{\log_{10/9} 1} = n^0 = 1$, CASE 3 of the master method gives T(n) = Θ(n).
Unlucky: T(n) = T(n – 1) + dn = Θ(n²) (arithmetic series). Worse than sorting!
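Both recurrences can also be unrolled directly, as a sanity check on the bounds above. This is a sketch that ignores floors and treats the base-case cost T(0) as a constant; neither simplification is stated on the slide.

```latex
\begin{align*}
\text{Lucky:}\quad T(n) &= dn + d\left(\tfrac{9}{10}\right)n + d\left(\tfrac{9}{10}\right)^{2} n + \cdots
   \;\le\; dn \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j} = 10\,dn = \Theta(n), \\
\text{Unlucky:}\quad T(n) &= dn + d(n-1) + \cdots + d \cdot 1 + T(0)
   \;=\; d\,\frac{n(n+1)}{2} + T(0) = \Theta(n^{2}).
\end{align*}
```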

Analysis of expected time
The analysis follows that of randomized quicksort, but it’s a little different.
Let T(n) = the random variable for the running time of RAND-SELECT on an input of size n, assuming random numbers are independent.
For k = 0, 1, …, n–1, define the indicator random variable
\[ X_k = \begin{cases} 1 & \text{if PARTITION generates a } k : n-k-1 \text{ split}, \\ 0 & \text{otherwise}. \end{cases} \]
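To make the indicator variables concrete, the sketch below enumerates every possible pivot in a small array of distinct elements and records the resulting k (the size of the lower side). Each k in 0, …, n–1 arises from exactly one pivot, so under a uniformly random pivot choice each split has probability 1/n. The helper name `split_sizes` is made up for this illustration.

```python
from collections import Counter


def split_sizes(values):
    """For each possible pivot, return k = number of elements smaller than that pivot."""
    return [sum(1 for v in values if v < pivot) for pivot in values]


values = [6, 10, 13, 5, 8, 3, 2, 11]   # distinct elements, n = 8
counts = Counter(split_sizes(values))   # how many pivots produce each k
probabilities = {k: c / len(values) for k, c in counts.items()}  # Pr{X_k = 1}
```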

Analysis (continued)
To obtain an upper bound, assume that the i-th element always falls in the larger side of the partition:
\[ T(n) = \begin{cases} T(\max\{0,\, n-1\}) + dn & \text{if } 0 : n-1 \text{ split}, \\ T(\max\{1,\, n-2\}) + dn & \text{if } 1 : n-2 \text{ split}, \\ \quad\vdots \\ T(\max\{n-1,\, 0\}) + dn & \text{if } n-1 : 0 \text{ split}, \end{cases} \]
\[ = \sum_{k=0}^{n-1} X_k \bigl( T(\max\{k,\, n-k-1\}) + dn \bigr) \]
\[ \le 2 \sum_{k=\lfloor n/2 \rfloor}^{n-1} X_k \bigl( T(k) + dn \bigr). \]

Calculating expectation
Take expectations of both sides:
\[ E[T(n)] \le E\left[ 2 \sum_{k=\lfloor n/2 \rfloor}^{n-1} X_k \bigl( T(k) + dn \bigr) \right] \]
Linearity of expectation:
\[ = 2 \sum_{k=\lfloor n/2 \rfloor}^{n-1} E\bigl[ X_k \bigl( T(k) + dn \bigr) \bigr] \]
Independence of X_k from other random choices:
\[ = 2 \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[X_k] \cdot E[T(k) + dn] \]
Linearity of expectation; E[X_k] = 1/n:
\[ = \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} dn \]
\[ = \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn. \]
(The last step is exact for even n; for odd n the second sum contributes (n+1)d, which affects only the constant.)

Hairy recurrence
(But not quite as hairy as the quicksort one.)
\[ E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + dn \]
Prove: E[T(n)] ≤ cn for constant c > 0.
• The constant c can be chosen large enough so that E[T(n)] ≤ cn for the base cases.
Use fact: \[ \sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le \frac{3}{8} n^2 \] (exercise).
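The fact is left as an exercise, but it is easy to spot-check numerically before proving it. The sketch below compares both sides using integer arithmetic only (multiplying through by 8 to avoid fractions); the function names are illustrative. A finite check like this is evidence, not a proof.

```python
def upper_half_sum(n):
    """Sum of k for k = floor(n/2), ..., n-1."""
    return sum(range(n // 2, n))


def fact_holds(n):
    """Check sum_{k=floor(n/2)}^{n-1} k <= (3/8) n^2 without floating point."""
    return 8 * upper_half_sum(n) <= 3 * n * n


all_hold = all(fact_holds(n) for n in range(1, 2001))
```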

Substitution method
Substitute inductive hypothesis:
\[ E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + dn \]
Use fact:
\[ \le \frac{2c}{n} \left( \frac{3}{8} n^2 \right) + dn \]
Express as desired − residual:
\[ = cn - \left( \frac{cn}{4} - dn \right) \]
\[ \le cn, \quad \text{if } c \ge 4d. \]
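The conclusion can be checked numerically by solving the recurrence with equality and comparing against cn for c = 4d. The sketch below uses exact rational arithmetic; the base case E[T(0)] = 0 and the function name are assumptions made for the illustration, not taken from the slides.

```python
from fractions import Fraction


def recurrence_values(n_max, d=1):
    """Solve E(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} E(k) + d*n exactly, with E(0) = 0."""
    e = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        tail = sum(e[k] for k in range(n // 2, n))
        e[n] = Fraction(2, n) * tail + d * n
    return e


d = 1
values = recurrence_values(200, d)
# The substitution method predicts E(n) <= c*n with c = 4d.
within_bound = all(values[n] <= 4 * d * n for n in range(len(values)))
```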

Summary of randomized order-statistic selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But the worst case is very bad: Θ(n²).
Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.

Worst-case linear-time order statistics
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
       then recursively SELECT the i-th smallest element in the lower part
       else recursively SELECT the (i–k)-th smallest element in the upper part
   (Step 4 is the same as in RAND-SELECT.)
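The four steps can be sketched in Python as follows. This is one possible rendering, not the slides' exact procedure: it builds new lists instead of partitioning in place, it also takes the median of a final group with fewer than 5 elements (the slide uses only the ⌊n/5⌋ full groups), it sorts each small group instead of finding its median "by rote", and it tolerates duplicate values even though the analysis assumes distinct elements.

```python
def select(items, i):
    """Return the i-th smallest (1-indexed) element using median-of-medians pivots."""
    a = list(items)
    if not 1 <= i <= len(a):
        raise IndexError("rank i must satisfy 1 <= i <= n")
    while True:
        if len(a) <= 5:
            return sorted(a)[i - 1]           # small base case
        # Step 1: groups of 5 and the median of each group.
        groups = [a[j:j + 5] for j in range(0, len(a), 5)]
        medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
        # Step 2: recursively select the median of the group medians as pivot x.
        x = select(medians, (len(medians) + 1) // 2)
        # Step 3: partition around x; k is the (smallest) rank of x.
        lower = [v for v in a if v < x]
        upper = [v for v in a if v > x]
        equal = len(a) - len(lower) - len(upper)
        k = len(lower) + 1
        # Step 4: return x or continue in the side that contains rank i.
        if i < k:
            a = lower
        elif i < k + equal:
            return x
        else:
            a, i = upper, i - (k + equal - 1)
```

With distinct elements `equal` is 1, so the last branch continues with rank i – k, as in step 4 above.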

Choosing the pivot
[Figure: the n elements, before grouping.]

Choosing the pivot
1. Divide the n elements into groups of 5.
