
Algorithm Engineering (aka. How to Write Fast Code) CS260 - PowerPoint PPT Presentation

Algorithm Engineering (aka. How to Write Fast Code), CS260 Lecture 6, Yan Gu. I/O Algorithms and Parallel Samplesort. Topics: The I/O Model; Sampling in Algorithm Design; Parallel Samplesort.


  1. Algorithm Engineering (aka. How to Write Fast Code), CS260 – Lecture 6, Yan Gu. I/O Algorithms and Parallel Samplesort

  2. CS260: Algorithm Engineering, Lecture 6. Outline: The I/O Model; Sampling in Algorithm Design; Parallel Samplesort


  4. Last week - The I/O model • The I/O model has two special memory transfer instructions: • Read transfer: load a block from slow memory • Write transfer: write a block to slow memory • The complexity of an algorithm in the I/O model (its I/O complexity) is measured by: #(read transfers) + #(write transfers) • [Figure: CPU attached to a fast memory of M/B blocks, each of size B, connected to a slow memory]
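
A quick worked example of counting transfers in this model (a standard fact, my addition rather than something on the slide): a sequential scan loads each block of the array once, while accesses in random order may pay one transfer each.

```latex
Q_{\text{scan}}(n) = O\!\left(\lceil n/B \rceil\right) \text{ I/Os}, \qquad
Q_{\text{random accesses}}(n) = O(n) \text{ I/Os}
```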

  5. Cache-Oblivious Algorithms • Algorithms not parameterized by B or M • These algorithms are unaware of the parameters of the memory hierarchy • Analyze in the ideal cache model: the same as the I/O model, except that optimal replacement is assumed • [Figure: the same fast/slow memory diagram as before]

  6. CS260: Algorithm Engineering, Lecture 6. Outline: The I/O Model; Sampling in Algorithm Design; Parallel Samplesort

  7. Why Sampling? • Yan has an array {a₁, a₂, …, a_n} such that aᵢ = 0 or 1, and Yan wants to know how many 1s are in the array • Scan: linear work, and it can be parallelized • Sounds like a good idea?

  8. Why Sampling? • Yan has an array {a₁, a₂, …, a_n} and a function f(⋅) such that f(aᵢ) = 0 or 1, and Yan wants to know for how many i we have f(aᵢ) = 1

  9. Why Sampling? • Yan has an array {a₁, a₂, …, a_n} and n functions f₁(⋅), …, f_n(⋅) such that f_j(aᵢ) = 0 or 1, and Yan wants to know, for each j, for how many i we have f_j(aᵢ) = 1 • This takes quadratic work, which does not scale to reasonable input sizes • Examples: • Find the median m of the aᵢ: take f_m(aᵢ) = "aᵢ < m" and check whether #(f_m(aᵢ) = 1) is n/2 • Find a good pivot p in quicksort (e.g., n/4 ≤ #(f_p(aᵢ) = 1) ≤ 3n/4) • Guarantee all sorts of properties in graph, geometry, and other algorithms

  10. Approximate Solution: Sampling • Yan has an array {a₁, a₂, …, a_n} and a function f(⋅) such that f(aᵢ) = 0 or 1, and Yan wants to know for how many i we have f(aᵢ) = 1 • Uniformly at random, pick k elements, count the f(aᵢ) = 1 cases among them (denote this count by k₁), and estimate the answer by n⋅k₁/k • As long as k is sufficiently large, we are "confident" in our estimate • On the other hand, when k is small, the result can be essentially random • When is the estimate good? • What is "good"?
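
Below is a minimal C++ sketch of this estimator. The function name estimate_count and the RNG plumbing are my own illustrative choices, not from the lecture; it assumes a non-empty array and samples with replacement.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Estimate |{i : f(a_i) = 1}| by sampling k indices uniformly at random
// (with replacement) and scaling the hit count k1 by n/k.
// Assumes a is non-empty; names are illustrative, not from the lecture.
template <typename T, typename Pred>
double estimate_count(const std::vector<T>& a, Pred f, std::size_t k,
                      std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
  std::size_t k1 = 0;                       // number of sample hits
  for (std::size_t s = 0; s < k; ++s)
    if (f(a[pick(rng)])) ++k1;
  return static_cast<double>(a.size()) * k1 / k;  // estimate = n * k1 / k
}
```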

  11. Approximate Solution: Sampling • What is "good"? • With high probability (informal): happens with probability 1 − n^(−c) for any constant c > 0 • This is large when n is reasonably large, like n > 10⁶ • When is the estimate good? • Claim: when k₁ is Ω(log n) • How far can reality be off from the estimate?

  12. Approximate Solution: Sampling • When is the estimate good? • Claim: when k₁ is Ω(log n) • How far can reality be off from the estimate? • Assume there are z elements with f(aᵢ) = 1, and we take k samples with k₁ hits. The expected number of hits is E[k₁] = kz/n • The probability that this is off by 100% (i.e., k₁ > 2kz/n) is at most e^(−kz/(3n)) • Chernoff bound: for n independent random variables in {0, 1}, let X be their sum and μ = E[X]; then for any 0 < δ ≤ 1, Pr[X ≥ (1 + δ)μ] ≤ e^(−δ²μ/3)
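
To spell out the step the slide is using (a worked substitution, not new material): apply the Chernoff bound with X = k₁, μ = E[k₁] = kz/n, and δ = 1.

```latex
\Pr\!\left[k_1 \ge 2\,\tfrac{kz}{n}\right]
  = \Pr\!\left[X \ge (1+1)\,\mu\right]
  \le e^{-1^2\,\mu/3}
  = e^{-kz/(3n)}
```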

  13. Approximate Solution: Sampling • When is the estimate good? • Claim: when k₁ is Ω(log n) • How far can reality be off from the estimate? • Assume there are z elements with f(aᵢ) = 1, and we take k samples with k₁ hits. The expected number of hits is E[k₁] = kz/n • The probability that this is off by 100% (i.e., k₁ > 2kz/n) is at most e^(−kz/(3n)) • Since k₁ ≈ kz/n, e^(−kz/(3n)) is n^(−c) when k₁ = Ω(log n), because e^(−kz/(3n)) ≈ e^(−k₁/3) < e^(−c′ log₂ n) = n^(−c)

  14. Approximate Solution: Sampling • When is the estimate good? • Claim: when k₁ is Ω(log n) • How far can reality be off from the estimate? • Assume there are z elements with f(aᵢ) = 1, and we take k samples with k₁ hits. The expected number of hits is E[k₁] = kz/n • The probability that this is off by 1% (i.e., k₁ > 1.01⋅kz/n) is at most e^(−δ²kz/(3n)) with δ = 0.01 • Since k₁ ≈ kz/n, e^(−δ²kz/(3n)) is n^(−c) when k₁ = Ω(log n), because e^(−δ²kz/(3n)) ≈ e^(−k₁/(3⋅100²)) < e^(−c′ log₂ n) = n^(−c) • Chernoff bound: for n independent random variables in {0, 1}, let X be their sum and μ = E[X]; then for any 0 < δ < 1, Pr[X ≥ (1 + δ)μ] ≤ e^(−δ²μ/3)

  15. Rules of Thumb for Sampling • Example applications: • Find the median m of the aᵢ: take f_m(aᵢ) = "aᵢ < m" and check whether #(f_m(aᵢ) = 1) is n/2 • Find a good pivot p in quicksort (e.g., n/4 ≤ #(f_p(aᵢ) = 1) ≤ 3n/4) • Guarantee all sorts of properties in graph, geometry, and other algorithms • Take some samples! Uniformly at random pick k elements, count the f(aᵢ) = 1 cases (denoted k₁), and estimate by n⋅k₁/k • 4 sample hits give you a reasonable result • 20 sample hits give you confidence • 100 sample hits are sufficient! • Remember: only hits count
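
A rough numeric check of these rules of thumb (my arithmetic, plugging k₁ ≈ E[k₁] into the bound e^(−k₁/3) for a 100% error from the previous slides):

```latex
k_1 = 4:\;   e^{-4/3}   \approx 0.26, \qquad
k_1 = 20:\;  e^{-20/3}  \approx 1.3\times 10^{-3}, \qquad
k_1 = 100:\; e^{-100/3} \approx 3\times 10^{-15}
```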

  16. CS260: Algorithm Engineering, Lecture 6. Outline: The I/O Model; Sampling in Algorithm Design; Parallel Samplesort

  17. Parallel and I/O-efficient Sorting Algorithms • Classic sorting algorithms are easy to parallelize • Quicksort: find a "good" pivot, apply a partition (filter) to separate the elements smaller and larger than the pivot, and recurse • Mergesort: apply parallel merge for log₂ n rounds • But these are not I/O-efficient, since they need log₂ n rounds of global data movement (see the cost comparison below) • We now introduce samplesort, which is both highly parallel and I/O-efficient
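
For reference, the costs behind this remark (standard I/O-model bounds, e.g., the Aggarwal-Vitter sorting lower bound; my addition, not stated on the slide): binary mergesort pays a full pass of n/B transfers per round, while the optimal bound has a much larger logarithm base.

```latex
Q_{\text{mergesort}}(n) = O\!\left(\tfrac{n}{B}\log_2\tfrac{n}{M}\right), \qquad
Q_{\text{sort}}(n) = \Theta\!\left(\tfrac{n}{B}\log_{M/B}\tfrac{n}{B}\right)
```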

  18. Sample-sort outline • Analogous to multiway quicksort • 1. Split the input array into √n contiguous subarrays of size √n; sort the subarrays recursively • [Figure: the input array split into √n subarrays, each of size √n and sorted]

  19. Sample-sort outline • Analogous to multiway quicksort • 1. Split the input array into √n contiguous subarrays of size √n; sort the subarrays recursively (sequentially) • [Figure: the input array split into √n sorted subarrays]

  20. Sample-sort outline • 2. Choose √n − 1 "good" pivots p₁ ≤ p₂ ≤ ⋯ ≤ p_{√n−1} • 3. Distribute the subarrays into buckets according to the pivots • [Figure: Bucket 1 (elements ≤ p₁), Bucket 2 (elements between p₁ and p₂), …, Bucket √n (elements ≥ p_{√n−1}); each bucket has size ≈ √n]

  21. Sample-sort outline • 4. Recursively sort the buckets • 5. Copy the concatenated buckets back to the input array, which is now sorted • [Figure: Bucket 1 ≤ p₁ ≤ Bucket 2 ≤ p₂ ≤ ⋯ ≤ p_{√n−1} ≤ Bucket √n]

  22. Choosing good pivots based on sampling • 2. Choose √n − 1 "good" pivots p₁ ≤ p₂ ≤ ⋯ ≤ p_{√n−1} • This can be achieved by picking c√n log n random samples, sorting them, and taking every (c log n)-th element (a sketch follows below) • This step is fast
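
A sequential C++ sketch of this pivot-selection step. The oversampling constant c = 8 and the helper name choose_pivots are illustrative assumptions; the slide only specifies c√n log n samples and keeping every (c log n)-th one.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Pick c*sqrt(n)*log(n) random samples, sort them, and keep every
// (c*log n)-th one, which yields roughly sqrt(n) - 1 pivots.
std::vector<int> choose_pivots(const std::vector<int>& a, std::mt19937& rng) {
  const std::size_t n = a.size();
  const std::size_t c = 8;  // oversampling factor (assumed, not from the slide)
  const std::size_t logn = static_cast<std::size_t>(std::log2(n)) + 1;
  const std::size_t stride = c * logn;
  const std::size_t num_samples =
      stride * static_cast<std::size_t>(std::sqrt(static_cast<double>(n)));

  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  std::vector<int> samples(num_samples);
  for (auto& s : samples) s = a[pick(rng)];
  std::sort(samples.begin(), samples.end());

  std::vector<int> pivots;  // every (c*log n)-th sample becomes a pivot
  for (std::size_t i = stride; i < num_samples; i += stride)
    pivots.push_back(samples[i]);
  return pivots;            // about sqrt(n) - 1 pivots
}
```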

  23. Sequential local sorts (e.g., call std::sort) • 1. Split the input array into √n contiguous subarrays of size √n; sort the subarrays recursively (sequentially) • 4. Recursively sort the buckets (sequentially) • [Figure: the sorted subarrays and the buckets Bucket 1 ≤ p₁ ≤ ⋯ ≤ p_{√n−1} ≤ Bucket √n]

  24. Key Part: the Distribution Phase • 3. Distribute the subarrays into buckets according to the pivots (sketched below) • [Figure: √n sorted subarrays distributed into Bucket 1 ≤ p₁ ≤ Bucket 2 ≤ p₂ ≤ ⋯ ≤ p_{√n−1} ≤ Bucket √n, each of size ≈ √n]
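
A sequential C++ sketch of the distribution phase, exploiting the fact that each subarray is already sorted: the bucket boundaries inside a subarray can be found with binary searches against the pivots. This serial form only shows the idea; a parallel version would typically precompute all bucket sizes and offsets first so that the subarrays can scatter concurrently.

```cpp
#include <algorithm>
#include <vector>

// Split each sorted subarray against the sorted pivots and append the
// pieces to their buckets. Bucket b receives the elements <= pivots[b],
// and the last bucket receives the rest. Names are illustrative.
void distribute(const std::vector<std::vector<int>>& subarrays,
                const std::vector<int>& pivots,
                std::vector<std::vector<int>>& buckets) {
  buckets.assign(pivots.size() + 1, {});
  for (const auto& sub : subarrays) {
    std::size_t lo = 0;
    for (std::size_t b = 0; b <= pivots.size(); ++b) {
      // End of bucket b's range inside this sorted subarray:
      std::size_t hi = (b < pivots.size())
          ? static_cast<std::size_t>(
                std::upper_bound(sub.begin() + lo, sub.end(), pivots[b]) -
                sub.begin())
          : sub.size();
      buckets[b].insert(buckets[b].end(), sub.begin() + lo, sub.begin() + hi);
      lo = hi;
    }
  }
}
```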
