Randomized Sampling
Anil Maheshwari
anil@scs.carleton.ca
School of Computer Science, Carleton University
Outline
1. Input
2. Sampling algorithm
3. Problems
4. Sorting in Parallel
5. Selection
Problem
Input: A set S of n distinct numbers. Let the elements of S be $x_1 < x_2 < \cdots < x_n$.
Let $R = \{y_1 < y_2 < \cdots < y_{|R|}\} \subseteq S$. The R-partition of $S \setminus R$ into $|R| + 1$ (open) subsets is:
$S_0 = \{x \in S : x < y_1\}$,
$S_i = \{x \in S : y_i < x < y_{i+1}\}$ for $i = 1, 2, \ldots, |R| - 1$, and
$S_{|R|} = \{x \in S : x > y_{|R|}\}$.
Problem: Find a GOOD partition.
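A minimal Python sketch of the R-partition, assuming S and R are Python iterables of distinct numbers with R ⊆ S; the function name `r_partition` is hypothetical. Since no x ∈ S \ R equals any $y_j$, the number of sample elements smaller than x (found by binary search) is exactly the index i of the open interval $S_i$ that contains x.

```python
from bisect import bisect_left


def r_partition(S, R):
    """Return [S_0, S_1, ..., S_|R|], the R-partition of S minus R.

    Assumes S holds distinct numbers and R is a subset of S.
    """
    ys = sorted(R)
    in_sample = set(ys)
    parts = [[] for _ in range(len(ys) + 1)]
    for x in sorted(S):
        if x in in_sample:
            continue  # sample elements belong to no open interval
        # bisect_left counts the y_j smaller than x, i.e. the index i of S_i.
        parts[bisect_left(ys, x)].append(x)
    return parts


print(r_partition(range(1, 11), [7, 3]))
# [[1, 2], [4, 5, 6], [8, 9, 10]]
```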
r-Sampling Algorithm
Fix an integer r with 1 < r < n.

Algorithm RANDOMSAMPLE(S, r)
    p = r/n; R = ∅;
    for each x ∈ S
        do with probability p, add x to R
    endfor;
    Sort the elements of R;
    Compute the open intervals S_0, S_1, ..., S_|R|;
    Return(R, S_0, S_1, ..., S_|R|)

r-sampling is GOOD if
1. $1 \le |R| \le 2r$, and
2. for each i with $0 \le i \le |R|$, the open interval $S_i$ contains at most $\frac{2n \ln r}{r}$ elements of S.
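One way to turn RANDOMSAMPLE and the GOOD test into runnable Python, offered as a sketch: the names `random_sample` and `is_good` are hypothetical, S is assumed to be a list of distinct numbers, and a seeded `random.Random` is used only to make runs reproducible.

```python
import math
import random
from bisect import bisect_left


def random_sample(S, r, rng):
    """RANDOMSAMPLE(S, r): keep each x in S independently with probability p = r/n.

    Returns the sorted sample R and the open intervals S_0, ..., S_|R|.
    """
    n = len(S)
    p = r / n
    R = sorted(x for x in S if rng.random() < p)
    in_sample = set(R)
    intervals = [[] for _ in range(len(R) + 1)]
    for x in sorted(S):
        if x not in in_sample:
            # bisect_left counts the sample elements below x: the index i of S_i.
            intervals[bisect_left(R, x)].append(x)
    return R, intervals


def is_good(n, r, R, intervals):
    """GOOD test: 1 <= |R| <= 2r and each S_i has at most 2 n ln r / r elements."""
    limit = 2 * n * math.log(r) / r
    return 1 <= len(R) <= 2 * r and all(len(s) <= limit for s in intervals)


rng = random.Random(42)
S = list(range(5000))
trials = 100
good = sum(is_good(len(S), 100, *random_sample(S, 100, rng)) for _ in range(trials))
print(f"good samples: {good}/{trials}")
```

The printed fraction is only an empirical estimate for one choice of n and r; Problems 8 and 9 below bound the probability of a good sample analytically.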
Good Sample
A good sample R is
1. non-empty,
2. at most twice as large as the sample size r we are aiming for, and
3. such that the elements of $S \setminus R$ are approximately evenly distributed over the open intervals (up to the $\ln r$ factor).
Problem 1
Compute the expected size $E(|R|)$ of the set R returned by Algorithm RANDOMSAMPLE(S, r).
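A quick Monte Carlo sketch for checking an answer to Problem 1, assuming only that each of the n elements is kept independently with probability p = r/n; the helper names are hypothetical and the seed is arbitrary.

```python
import random


def sample_size(n, r, rng):
    """|R| for one run of RANDOMSAMPLE on n elements with p = r/n."""
    p = r / n
    return sum(1 for _ in range(n) if rng.random() < p)


def average_sample_size(n, r, trials, seed=0):
    """Empirical mean of |R| over independent runs."""
    rng = random.Random(seed)
    return sum(sample_size(n, r, rng) for _ in range(trials)) / trials


# By linearity of expectation each element contributes p = r/n to E(|R|),
# so the empirical average should be close to n * p = r.
print(average_sample_size(n=1000, r=50, trials=500))
```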
Problem 2
Prove that $\Pr(R = \emptyset) \le e^{-r}$.
(Hint: Recall that $1 - z \le e^{-z}$ for all real numbers z.)
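A small numeric check of Problem 2, as a sketch: R is empty exactly when every one of the n elements is skipped, which happens with probability $(1 - r/n)^n$; the function name `prob_empty` and the (n, r) pairs are illustrative choices.

```python
import math


def prob_empty(n, r):
    """Exact Pr(R is empty): each of the n elements is skipped with prob. 1 - r/n."""
    return (1 - r / n) ** n


for n, r in [(10, 2), (100, 5), (1000, 30), (10**6, 10)]:
    exact = prob_empty(n, r)
    # Using 1 - z <= e^{-z} with z = r/n gives (1 - r/n)^n <= e^{-r}.
    print(f"n={n:>7} r={r:>2}  exact={exact:.3e}  e^-r={math.exp(-r):.3e}")
```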
Problem 3
Use the Chernoff bound to show that $\Pr(|R| > 2r) \le e^{-r/3}$.
Let $X_1, \ldots, X_n$ be i.i.d. 0-1 random variables, and let $X = \sum_{i=1}^{n} X_i$. Chernoff bounds estimate the probability of X deviating from $(1 \pm \epsilon) E[X]$, for $0 \le \epsilon \le 1$:
$P(X \ge (1 + \epsilon) E[X]) \le \exp(-\epsilon^2 E[X] / 3)$
$P(X \le (1 - \epsilon) E[X]) \le \exp(-\epsilon^2 E[X] / 2)$.
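Since |R| is a sum of n independent 0-1 variables with success probability p = r/n, it is Binomial(n, r/n). The sketch below (hypothetical helper `binomial_upper_tail`) computes the exact tail $\Pr(|R| > 2r)$ for a few illustrative (n, r) pairs and compares it with the Chernoff bound $e^{-r/3}$ obtained with $\epsilon = 1$.

```python
import math


def binomial_upper_tail(n, p, t):
    """Exact Pr(X > t) for X ~ Binomial(n, p), with 0 < p < 1.

    The pmf is built iteratively, pmf(j+1) = pmf(j) * (n-j)/(j+1) * p/(1-p),
    which avoids huge binomial coefficients.
    """
    pmf = (1 - p) ** n  # Pr(X = 0)
    tail = 0.0
    for j in range(n + 1):
        if j > t:
            tail += pmf
        pmf *= (n - j) / (j + 1) * p / (1 - p)
    return tail


# E[|R|] = r; Chernoff with epsilon = 1 bounds Pr(|R| >= 2r), hence Pr(|R| > 2r).
for n, r in [(100, 2), (200, 10), (1000, 30)]:
    exact = binomial_upper_tail(n, r / n, 2 * r)
    print(f"n={n:>4} r={r:>2}  Pr(|R| > 2r)={exact:.3e}  e^(-r/3)={math.exp(-r / 3):.3e}")
```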
Problem 4
Consider the sorted sequence $x_1 < x_2 < \cdots < x_n$ of the elements of S. Let the integer k divide n. Partition S into n/k subsets $B_1, B_2, \ldots, B_{n/k}$, each containing k elements:
$B_1$ contains $x_1, \ldots, x_k$;
$B_2$ contains $x_{k+1}, \ldots, x_{2k}$; and so on.
Think of the $B_i$'s as buckets. Bucket $B_i$ is empty if $B_i \cap R = \emptyset$.
Argue that the following is true: If each bucket is non-empty, then each open interval contains at most 2k elements of S.
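An empirical check of the claim in Problem 4, not a proof. Only ranks matter, so the sketch takes S = {1, ..., n}, assumes k divides n, and uses hypothetical helper names; it counts the trials in which every bucket is non-empty and how many of those violate the 2k bound.

```python
import random


def max_open_interval(n, R):
    """Largest |S_i| when S = {1, ..., n} and R is the sorted list of sampled ranks."""
    bounds = [0] + R + [n + 1]
    return max(b - a - 1 for a, b in zip(bounds, bounds[1:]))


def all_buckets_nonempty(n, k, R):
    """Bucket B_j holds ranks (j-1)k+1, ..., jk; check that every bucket meets R."""
    return len({(y - 1) // k for y in R}) == n // k


def check_bucket_claim(n, k, p, trials, seed=0):
    """Return (trials with all buckets non-empty, violations of the 2k claim)."""
    rng = random.Random(seed)
    premise, violations = 0, 0
    for _ in range(trials):
        R = [x for x in range(1, n + 1) if rng.random() < p]
        if all_buckets_nonempty(n, k, R):
            premise += 1
            if max_open_interval(n, R) > 2 * k:
                violations += 1
    return premise, violations


print(check_bucket_claim(n=1200, k=60, p=0.1, trials=200))
```

A violation count of zero is consistent with the claim; the argument itself is what Problem 4 asks for.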
Problem 5
Prove the following:
$\Pr(\text{each bucket is non-empty}) \ge 1 - \frac{n}{k}(1 - p)^k$
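For intuition, a sketch comparing the bound of Problem 5 with the exact probability. Each bucket is empty with probability $(1-p)^k$, and because the buckets are disjoint these events are independent, so the exact value is a product; the problem only needs the weaker union-bound form. Function names and parameter triples are illustrative.

```python
def exact_all_nonempty(n, k, p):
    """Exact Pr(every bucket is non-empty); disjoint buckets are independent."""
    return (1 - (1 - p) ** k) ** (n // k)


def union_bound(n, k, p):
    """Lower bound 1 - (n/k)(1-p)^k: n/k buckets, each empty with prob. (1-p)^k."""
    return 1 - (n / k) * (1 - p) ** k


for n, k, p in [(1000, 50, 0.1), (1200, 60, 0.05), (10**5, 1000, 0.01)]:
    print(f"n={n} k={k} p={p}: exact={exact_all_nonempty(n, k, p):.4f} "
          f">= bound={union_bound(n, k, p):.4f}")
```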
Problem 6
Show that
$\Pr(\text{each open interval contains at most } 2k \text{ elements of } S) \ge 1 - \frac{n}{k}(1 - p)^k$
Problem 7
Recall that $p = r/n$. Let $k = \frac{n \ln r}{r}$. Prove that
$\Pr\left(\text{at least one open interval contains more than } \frac{2n \ln r}{r} \text{ elements of } S\right) \le \frac{1}{\ln r}$
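A numeric sanity check of Problem 7, sketched under one simplification: the slides implicitly take k to be an integer dividing n, whereas the code below treats $k = \frac{n \ln r}{r}$ as a real number. The helper name `failure_bound` is hypothetical.

```python
import math


def failure_bound(n, r):
    """(n/k)(1-p)^k with p = r/n and k = n ln r / r (k treated as a real number)."""
    p = r / n
    k = n * math.log(r) / r
    return (n / k) * (1 - p) ** k


for n, r in [(500, 3), (1000, 10), (10**6, 100)]:
    # (1-p)^k <= e^{-pk} = e^{-ln r} = 1/r and n/k = r / ln r,
    # so the product is at most 1/ln r.
    print(f"n={n} r={r}: {failure_bound(n, r):.4f} <= 1/ln r = {1 / math.log(r):.4f}")
```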
Problem 8
Show that $\Pr(\text{the sample } R \text{ is bad}) \le e^{-r} + e^{-r/3} + \frac{1}{\ln r}$, where a sample is bad if it is not GOOD.
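A sample fails to be GOOD exactly when it is empty, has more than 2r elements, or leaves some open interval with more than $\frac{2n\ln r}{r}$ elements. One way to write the union-bound step that combines Problems 2, 3 and 7:

```latex
\Pr(R \text{ is bad})
  \le \Pr(R = \emptyset) + \Pr(|R| > 2r)
    + \Pr\Big(\text{some } S_i \text{ has more than } \tfrac{2n\ln r}{r} \text{ elements}\Big)
  \le e^{-r} + e^{-r/3} + \frac{1}{\ln r}
```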
Problem 9
Show that if r is chosen sufficiently large, then $\Pr(\text{the sample } R \text{ is GOOD}) \ge \frac{1}{2}$.
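How large is "sufficiently large"? A sketch that searches for the smallest integer r at which the bound from Problem 8 drops to 1/2. It uses only that bound, so the true probability of a good sample may already exceed 1/2 for smaller r; the function names are hypothetical.

```python
import math


def bad_bound(r):
    """Upper bound e^{-r} + e^{-r/3} + 1/ln r on Pr(R is bad), from Problem 8."""
    return math.exp(-r) + math.exp(-r / 3) + 1 / math.log(r)


def smallest_r_with_good_half():
    """Smallest integer r >= 2 for which the bound guarantees Pr(GOOD) >= 1/2.

    Every term of bad_bound decreases in r, so all larger r also qualify.
    """
    r = 2
    while bad_bound(r) > 0.5:
        r += 1
    return r


r0 = smallest_r_with_good_half()
print(r0, round(bad_bound(r0), 4))  # 10 0.47
```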
Application I (Optional)
Leslie G. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM 33(8): 103–111 (1990).
3-attribute Bulk-Synchronous Parallel (BSP) computer:
1. Components: processor/memory
2. Router: point-to-point messages between components
3. Synchronization mechanism for all components after every L units of time
BSP Computation
Computation proceeds in supersteps.
In a superstep, each component
- first receives messages,
- performs local computation, and
- prepares messages for transmission in the next superstep.
Routers realize h-relations: each component sends and receives at most h messages.
BSP Sort (Sketch)
Assume ρ processors. At the start, processor $P_i$ holds n/ρ data items, denoted by $S_i$.
1. Each processor chooses each of its elements with uniform probability r/n. (Let $R_i$ denote the elements selected at processor $P_i$.)
2. Route all selected elements to all the processors. Let $R = R_1 \cup R_2 \cup \cdots \cup R_\rho$.
3. Each processor sorts the items in R and partitions its set $S_i$.
4. . . .
5. Each processor $P_i$ routes the partitions appropriately.
6. Each processor sorts its items.
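A sequential Python simulation of the sketch above, meant only to make the data flow concrete. The slides leave open how the |R| + 1 intervals are mapped to the ρ processors; here we assume, as in common sample-sort variants, that ρ − 1 evenly spaced elements of the sorted R act as splitters, so processor j receives the j-th range. The function name `bsp_sample_sort` and all parameters are hypothetical, and no superstep costs are modelled.

```python
import random
from bisect import bisect_left


def bsp_sample_sort(data, rho, r, seed=0):
    """Simulate the BSP sort sketch on rho 'processors' (distinct values assumed).

    Assumption: rho - 1 evenly spaced elements of sorted R are the splitters.
    Returns (sorted output, per-processor loads after routing).
    """
    rng = random.Random(seed)
    n = len(data)
    p = r / n
    # Processor P_i starts with roughly n/rho items S_i.
    local = [data[i * n // rho:(i + 1) * n // rho] for i in range(rho)]
    # Step 1: each processor samples its own items with probability r/n.
    samples = [[x for x in S_i if rng.random() < p] for S_i in local]
    # Steps 2-3: every processor receives all samples and sorts R.
    R = sorted(x for R_i in samples for x in R_i)
    splitters = [R[j * len(R) // rho] for j in range(1, rho)] if R else []
    # Routing: each item goes to the processor that owns its splitter range.
    inbox = [[] for _ in range(rho)]
    for S_i in local:
        for x in S_i:
            inbox[bisect_left(splitters, x)].append(x)
    # Final step: each processor sorts its items; concatenation is sorted.
    output = [x for box in inbox for x in sorted(box)]
    return output, [len(box) for box in inbox]


rng = random.Random(7)
data = rng.sample(range(10**6), 20000)
out, loads = bsp_sample_sort(data, rho=8, r=400)
print(out == sorted(data), loads)
```

The loads list shows how evenly the splitters divided the work; the analysis on the next slide is about bounding exactly this quantity.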
BSP Sort Analysis
Think about how random sampling is used. What is r in terms of ρ and n? How are the quantities ρ, k, and r connected? (Left as an exercise; this needs some BSP background.)
Correctness: Why does it sort?
Number of supersteps: O(1).
Work done in each superstep: if $n \gg \rho$, $O(\frac{n}{\rho} \log \rho)$.
h-relation: if $n \gg \rho$, $h = O(\frac{n}{\rho} \log \rho)$ suffices.
Application II (Optional)
Robert Floyd and Ronald Rivest. Expected Time Bounds for Selection. Communications of the ACM 18(3): 165–172 (1975).
Rajeev Raman. Random Sampling Techniques in Parallel Computation. IPPS/SPDP'98 Workshop, Lecture Notes in Computer Science 1388: 351–360 (1998).
General Framework
Two main paradigms:
Partitioning: Partition into independent subproblems that can be solved in parallel (e.g., BSP Sort).
Pruning: Use an inefficient algorithm on a small random sample of the input, and use that solution to reduce the size of the problem. How?
1. Obtain a random sample R of instance I of the problem.
2. Use an inefficient algorithm to solve R.
3. Use the solution of R to discard irrelevant parts of I, and obtain the reduced problem I′.
4. Use an inefficient algorithm to solve I′.
Selection Problem
Input: A set I of n distinct values and an integer k, 1 ≤ k ≤ n.
Output: The element of rank k in I; here we consider the median element of I.
Standard methods: sort and report the (n/2)-th element in O(n log n) time, or use the linear-time selection algorithm, which takes O(n) time.

Algorithm SELECTION-BY-PRUNING(I)
1. Choose $n^{3/4}$ elements of I uniformly at random to form the random sample R.
2. Sort R.
3. Let l (resp. r) be the element of rank $|R|/2 - \sqrt{n}$ (resp. $|R|/2 + \sqrt{n}$) in R.
4. Let I′ ⊆ I be all elements of I between l and r.
5. Let $n_l$ be the number of elements of I that are < l.
6. Sort I′.
7. Return the element of rank $n/2 - n_l$ of I′ as the median element.
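A Python sketch of SELECTION-BY-PRUNING for the median of a non-empty list of distinct values. Several details are our assumptions rather than the slides': the sample is drawn without replacement, sample ranks are clamped to valid indices, the median is taken as the element of rank ⌈n/2⌉ (which equals n/2 for even n), and in the rare case that the median does not lie between l and r the sketch falls back to sorting all of I.

```python
import math
import random


def select_by_pruning(I, rng):
    """Median of a non-empty list I of distinct values (sketch).

    Assumptions: sampling without replacement, clamped ranks, median rank
    ceil(n/2), and a full-sort fallback if the median is not in I'.
    """
    n = len(I)
    k = (n + 1) // 2  # 1-indexed rank of the median
    m = max(1, round(n ** 0.75))
    R = sorted(rng.sample(I, m))          # steps 1-2
    offset = math.isqrt(n)                # floor(sqrt(n))
    l = R[max(0, m // 2 - offset)]        # step 3
    r = R[min(m - 1, m // 2 + offset)]
    I_prime = sorted(x for x in I if l <= x <= r)  # steps 4 and 6
    n_l = sum(1 for x in I if x < l)               # step 5
    if n_l < k <= n_l + len(I_prime):
        return I_prime[k - n_l - 1]                # step 7
    return sorted(I)[k - 1]  # rare failure: median not between l and r


data = list(range(1, 10001))
random.Random(3).shuffle(data)
print(select_by_pruning(data, random.Random(5)))  # 5000
```

Thanks to the fallback, this version always returns the correct median; only its running time is random, which matches the Las Vegas flavour of the correctness remark on the next slide.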
Remarks
1. Sorting R takes time $O(n^{3/4} \log n) = O(n)$ if $n^{1/4} \ge \log n$.
2. Exercise: Between two consecutive elements of R there are, on average, approximately $n^{1/4}$ elements of I.
3. There are $2\sqrt{n}$ samples between l and r.
4. The expected number of elements of I between l and r is $2n^{1/2} \cdot n^{1/4}$.
5. Thus, $E[|I'|] = O(n^{3/4})$.
6. This ensures a running time of O(n).
7. For correctness: Estimate the probability that the median element is not between l and r. Using Chernoff bounds, it can be shown that this probability is exponentially small (in $n^{1/4}$).
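A Monte Carlo sketch for Remarks 2 to 5 at one illustrative size, n = 10000. It assumes I = {1, ..., n} (only ranks matter), draws the sample of size round($n^{3/4}$) without replacement, and uses the ranks $|R|/2 \mp \sqrt{n}$ for l and r as in Algorithm SELECTION-BY-PRUNING; the helper name `pruning_stats` is hypothetical.

```python
import math
import random


def pruning_stats(n, trials, seed=0):
    """Average gap between consecutive sample elements and average |I'|.

    Assumes I = {1, ..., n} and a sample of round(n^(3/4)) distinct elements.
    """
    rng = random.Random(seed)
    m = round(n ** 0.75)
    offset = math.isqrt(n)
    gap_total, size_total = 0.0, 0
    for _ in range(trials):
        R = sorted(rng.sample(range(1, n + 1), m))
        # Average number of elements of I strictly between consecutive samples.
        gap_total += sum(b - a - 1 for a, b in zip(R, R[1:])) / (m - 1)
        lo = R[max(0, m // 2 - offset)]
        hi = R[min(m - 1, m // 2 + offset)]
        size_total += hi - lo + 1  # |I'| when I = {1, ..., n}
    return gap_total / trials, size_total / trials


avg_gap, avg_size = pruning_stats(n=10000, trials=50)
print(f"avg gap ~ {avg_gap:.1f} (n^(1/4) = 10), avg |I'| ~ {avg_size:.0f} (2 n^(3/4) = 2000)")
```

The measured gap comes out slightly below $n^{1/4}$ because the m sampled elements themselves are excluded, which is consistent with the "approximately" in Remark 2.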