CS137: Electronic Design Automation
Day 12: February 6, 2006
Sorting
CALTECH CS137 Winter2006 -- DeHon

Today
• Sequential Sorting
• Building on Parallel Prefix
• Systolic
  – Sort
  – Priority Queue
• Streaming Sort
• Mesh Sort (Shear Sort)
• Sorting Networks
• Parallel Merge Sort

Sequential Sort
• What’s your favorite sequential sort?
• Runtime?

Sequential Merge Sort
• Observe: can merge two sorted lists of length $N$ in $O(N)$ time
• Start with $N$ lists of length 1
• Merge to form $N/2$ lists of length 2
• Merge to form $N/4$ lists of length 4
• …how many times?
• Each merge?

Sequential Merge Sort
• Observe: can merge two sorted lists of length $N$ in $O(N)$ time
• Merge successively longer lists
• $\log(N)$ merges
• Each takes time $O(N)$
• Sort in: $O(N \log(N))$

Parallel Sorting
prefix
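The bottom-up merge sort outlined in the Sequential Merge Sort slides can be sketched as follows. This is a minimal illustration, not code from the lecture; the function names `merge` and `merge_sort` and the returned round count are assumptions made for the example.

```python
def merge(a, b):
    """Merge two sorted lists in O(len(a) + len(b)) time."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def merge_sort(items):
    """Bottom-up merge sort. Returns (sorted list, number of merge rounds)."""
    runs = [[x] for x in items]  # start with N lists of length 1
    rounds = 0
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out, carried to next round
        runs = merged
        rounds += 1
    return (runs[0] if runs else []), rounds
```

For a power-of-two input length the round count equals $\log_2(N)$, and each round touches every element once, which gives the $O(N \log(N))$ total.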
Rank Finding [Day 9]
• Looking for $I$’th ordered element
• Do a prefix-sum on high-bit only
  – Know $m$ = number of things > 01111111…
• High-low search on result
  – I.e. if number > $I$, recurse on half with leading zero
  – If number < $I$, search for $(I-m)$’th element in half with high-bit true
• Find $I$’th element in $\log^2(N)$ time

Rank-based Sort
• In $O(\log^2(N))$ time on $N$ processors can find the $I$’th element
• Use separate groups of $N$ processors to find the 1st, 2nd, 3rd, … element in parallel
• Also count the number of such elements in $O(\log(N))$ time using parallel prefix
  – Give each unique offset
• Send each element to its correct position
• ⇒ $O(\log^2(N))$ sorting algorithm with $O(N^2)$ processors

Rank Sort Analysis
• Area $N^2$
• Time $\log^2(N)$
• Work: $(N \log(N))^2$ ⇒ square of sequential work

Systolic
One Dimensional Array

Sort as Data Arrives
• Often receive data as a sequential stream
• Can I sort the data as it arrives?
• Build a systolic solution?
  – Use only local interconnect

Linear Systolic Sort
• Often receive data as a sequential stream
• Can I sort the data as it arrives?
• Build a systolic solution?
  – Use only local interconnect
[Figure: Cell traps largest value]
[Basic approach from Leighton]
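The linear systolic sort, in which each cell traps the largest value it has seen and passes the smaller one to its neighbor, can be simulated in software. The sketch below is an assumption-laden model rather than the lecture's design: the name `systolic_sort`, the one-value-per-step input, and the single register per link are all choices made for illustration.

```python
def systolic_sort(stream):
    """Simulate a linear array of len(stream) cells, one value entering per step.

    Returns (held, steps): held[i] is the value trapped in cell i (largest in
    cell 0) and steps is the number of synchronous steps simulated.
    """
    values = list(stream)
    n = len(values)
    held = [None] * n        # value trapped in each cell
    link = [None] * (n + 1)  # link[i]: value arriving at cell i this step
    fed = 0
    steps = 0
    while fed < n or any(v is not None for v in link):
        if fed < n:
            link[0] = values[fed]  # next input enters at the first cell
            fed += 1
        nxt = [None] * (n + 1)
        for i in range(n):
            x = link[i]
            if x is None:
                continue
            if held[i] is None:
                held[i] = x                        # empty cell traps the value
            elif x > held[i]:
                held[i], nxt[i + 1] = x, held[i]   # keep larger, pass smaller on
            else:
                nxt[i + 1] = x                     # incoming value is smaller
        link = nxt
        steps += 1
    return held, steps
```

Each cell only talks to its right neighbor, and the simulation finishes within about $2N$ steps, consistent with the $O(N)$ time on $N$ cells.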
Linear Systolic Sort Analysis
• Area $N$
• Time $N$
• Work: $N^2$

Priority Queue
• Insert top
• Extract Largest
• With $O(N)$ cells
• $O(1)$ Extract
• Allows interleave insert/delete

Priority Queue Idea
• Trap Largest
  – Like Linear Sort
• Largest always at front
  – Always immediately available
• On extract
  – Shift up Next Largest
[Figure labels: Largest, New value]

Priority Queue Cell
• If (Cin=insert)
    Alocal ← largest
    Bout ← smallest
• If (Cin=extract)
    Alocal ← Ain
    Bout ← Bin
• Cout ← Cin

Streaming Sort
• Can we sort streaming data with $O(\log(N))$ hardware?
• How do you sort efficiently in SCORE?
  – Pipe-and-filter System Architecture?

Streaming Merge Sort
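The priority queue cell rule (keep the larger value on insert, shift the next-largest value up on extract) can be modeled functionally. This sketch assumes each command ripples through the whole array before the next one is issued, so it does not model the pipelined propagation of Cin to Cout; the class and method names are hypothetical.

```python
class SystolicPriorityQueue:
    """Functional model of a chain of priority queue cells (front is cells[0])."""

    def __init__(self, capacity):
        self.cells = [None] * capacity

    def insert(self, value):
        if self.cells and self.cells[-1] is not None:
            raise OverflowError("all cells are occupied")
        if not self.cells:
            raise OverflowError("queue has no cells")
        carry = value
        for i in range(len(self.cells)):
            if carry is None:
                break
            local = self.cells[i]
            # Cell keeps the larger value and forwards the smaller one.
            if local is None or carry > local:
                self.cells[i], carry = carry, local

    def extract_largest(self):
        largest = self.cells[0]
        if largest is None:
            raise IndexError("queue is empty")
        # Every cell takes its neighbor's value: shift up the next largest.
        self.cells = self.cells[1:] + [None]
        return largest
```

In this model the largest value is always in the front cell, so reading it is a constant-time operation, and inserts and extracts can be interleaved freely.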
Build Merge Tree
• Merge Sort stream
After $\log(N)$ merges, output stream is sorted.

Streaming Sort
Observe: early merges run at lower frequency than later…

Streaming Sort
• Buffer lengths grow by 2× each stage.
• Total memory: $2\times(N/2) + 2\times(N/4) + 2\times(N/8) + \dots \le 2N$

Streaming Sort Analysis
• Area $\log(N)$ compare/switch
  – $O(N)$ memory
  – [also true of sequential case]
• Time $O(N)$
• Work: $O(N \log(N))$
  – Work efficient

Mesh Sort

Mesh Sort
• Start with $N$ items in $\sqrt{N}\times\sqrt{N}$ mesh
• Sort into specified order
• Nearest-neighbor communication only
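The streaming merge sort, a chain of $\log(N)$ merge stages whose buffers double at each stage, can be sketched with Python generators standing in for the pipeline stages. The names `merge_stage` and `streaming_sort` are assumptions for this example, and buffering both runs in lists is a simplification of what a hardware stage would do.

```python
from itertools import islice


def merge_stage(stream, run_len):
    """One stage: consume sorted runs of run_len, emit sorted runs of 2 * run_len."""
    it = iter(stream)
    while True:
        a = list(islice(it, run_len))  # buffer for the first run
        if not a:
            return
        b = list(islice(it, run_len))  # buffer for the second run
        i = j = 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                yield a[i]
                i += 1
            else:
                yield b[j]
                j += 1
        yield from a[i:]
        yield from b[j:]


def streaming_sort(stream, n):
    """Chain merge stages until runs cover n items. Returns (stream, stage count)."""
    run_len = 1
    stages = 0
    while run_len < n:
        stream = merge_stage(stream, run_len)
        run_len *= 2
        stages += 1
    return stream, stages
```

For example, `streaming_sort(iter(data), len(data))` returns a lazy sorted stream plus the number of stages. Each stage holds two buffers of its input run length, so the buffers sum to at most $2N$ items as stated in the slides.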
Observation 1
• Can sort $m$ things on linear array in $O(m)$ time
  – Perform Parallel Bubble sort in $m$ steps
  – i.e. alternate odd/even swap pairings

Shearsort
• Algorithm: alternate sorting rows and columns for $\log(N)+1$ steps
  – i.e. sort rows on odd steps; columns on even steps
  – Sort odd rows ascending, even rows descending
  – Can use even/odd swapping for row/column sorts
• $O(\sqrt{N}\log(N))$

Simplifying Lemma
• 0-1 Sorting Lemma: If an oblivious comparison-exchange algorithm sorts all input sets consisting of solely 0’s and 1’s, then it sorts all input sets with arbitrary values
  – proof in Leighton
• Odd/even swapping is an oblivious comparison-exchange

Shearsort Works?
• General form after column sort:
  – 0 rows
  – Mixed (dirty) rows
  – 1 rows
• Consider all row pairs:
  – 3 cases
    • More zeros, more ones, equal number
  – Row sort puts all zeros on one side, ones on other
  – Column sort ⇒ one of the pair ends up all ones/zeros
  – Therefore, each row/column sort cuts the number of “dirty” rows in half

Shearsort Works?
• Consider all row pairs:
  – 3 cases
    • More zeros, more ones, equal number
  – Row sort puts all zeros on one side, ones on other
  – Column sort ⇒ one of the pair ends up all ones/zeros
  – Therefore, each row/column sort cuts the number of “dirty” rows in half
Example (one row pair):
  start      after row sort   after column sort
  10001000   00000011         00000000
  10101001   11110000         11110011
[Figure: Dirty rows after column sort]

Rounding up Steps
• Each sort $m=\sqrt{N}$ steps
• $\log(\sqrt{N})$ row/column sorts to remove dirty rows
• $2\log(\sqrt{N}) = \log(N)$
• Total steps: $\sqrt{N}\log(N)$
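Shearsort as described above (alternating row and column sorts, with alternate rows sorted in opposite directions and each line sorted by odd/even swapping) can be sketched as follows. The function names and the 0-based row numbering are assumptions for illustration; the phase count `2 * ceil(log2(m)) + 1` equals $\log(N)+1$ when the side length $m$ is a power of two.

```python
import math


def odd_even_sort(line, descending=False):
    """Odd-even transposition sort: len(line) rounds of neighbor compare-exchange."""
    m = len(line)
    for step in range(m):
        for i in range(step % 2, m - 1, 2):  # alternate the swap pairings
            if descending:
                out_of_order = line[i] < line[i + 1]
            else:
                out_of_order = line[i] > line[i + 1]
            if out_of_order:
                line[i], line[i + 1] = line[i + 1], line[i]
    return line


def shearsort(grid):
    """Sort an m x m grid (list of row lists) in place into snake order."""
    m = len(grid)
    phases = 2 * math.ceil(math.log2(m)) + 1 if m > 1 else 1
    for phase in range(phases):
        if phase % 2 == 0:
            # Row phase: alternate rows are sorted in opposite directions.
            for r in range(m):
                odd_even_sort(grid[r], descending=(r % 2 == 1))
        else:
            # Column phase: smallest values move toward the top row.
            for c in range(m):
                col = odd_even_sort([grid[r][c] for r in range(m)])
                for r in range(m):
                    grid[r][c] = col[r]
    return grid


def snake(grid):
    """Read the grid in snake order (reverse every second row)."""
    out = []
    for r, row in enumerate(grid):
        out.extend(row if r % 2 == 0 else reversed(row))
    return out
```

The sorted result is read out in snake order, which is why `snake` is included as a helper. Because every comparison is between neighbors in a row or column, the sketch uses only nearest-neighbor communication.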
Shear Sort Analysis
• Area $N$
• Time $\sqrt{N}\log(N)$
  – Best could hope to do is $\sqrt{N}$ w/ nearest-neighbor connections in 2D world
  – Asymptotically in any 2D world
• Work: $N^{1.5}\log(N)$

Mesh Sort
• Can do Mesh sort in $O(\sqrt{N})$ steps
  – Best could hope to do
• More complicated…see Leighton

Extend to 3D Array
• Can sort $N$ numbers on $N^{1/3} \times N^{1/3} \times N^{1/3}$ array in $O(N^{1/3})$ steps
  – Sort zx into zx order
  – Sort yz into zy-order
  – Sort xy into yx-order (reversing order on every-other plane)
  – Two-steps of odd/even merging within each z-line
  – Sort xy into yx-order

Sorting ∝ Movement
• If you believe
  – We only have 3 dimensions
  – Signal transport is bounded by speed of light
• This is asymptotically tight
  – Cannot do any better.
  – Will take $O(N^{1/3})$ time just to transport an item from start location to destination

Sorting Networks

Sorting Network
• Build a spatial sorting network:
(from Knuth)
Too big, too fast? ⇒ bit serial datapath elements?
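The movement argument and the shear sort work figure can be written out as a short derivation. It assumes unit-time nearest-neighbor hops on a $d$-dimensional mesh with $N^{1/d}$ cells per side; the general-$d$ form is an assumption that covers the 2D and 3D cases in the slides.

```latex
\begin{align*}
T_{\text{sort}} &\ge \text{(corner-to-corner distance)}
   = d\,\bigl(N^{1/d} - 1\bigr) = \Omega\bigl(N^{1/d}\bigr)
   \quad \text{for fixed } d \\
d = 2 &: \quad T_{\text{sort}} = \Omega\bigl(\sqrt{N}\bigr) \\
d = 3 &: \quad T_{\text{sort}} = \Omega\bigl(N^{1/3}\bigr) \\
\text{Shear sort work} &= \text{Area} \times \text{Time}
   = N \cdot \sqrt{N}\log(N) = N^{1.5}\log(N)
\end{align*}
```

The bound holds because some input places an item in one corner whose sorted position is the opposite corner.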
Systematic Construction: Step 1: Merge Network
• Recursively swap large/small elements from halves of network
  – Merge in $\log(N)$ steps
[Figure: merge network with inputs A0, A1, A2, A3, B3, B2, B1, B0]

Systematic Construction: Sorting Network
• Perform recursive merging
  – $\log(N)$ merge networks
    • Of depth $\log(N)$, $\log(N)-1$…
  – Depth: $O(\log^2(N))$
  – Area: $O(N \log^2(N))$
• Can be used in pipelined fashion
  – Only using $O(N)$ hardware exclusively per step

Parallel Merge Sort

Parallel Merge Sort
• With $O(N)$ processors
• Sort in $O(\log^2(N))$ steps
• Sequentially executing the $O(\log^2(N))$ pairwise swaps of the sorting network
• Randomized algorithm
  – Works in $O(\log(N))$ steps
    • With high probability
• …see Leighton

Admin
• Wednesday, Friday: NC
• Project: two things due in two weeks
  – Sequential baseline
  – Proposed plan of attack
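The systematic construction (merge networks of depth 1, 2, …, $\log(N)$ stacked into a sorting network) can be illustrated with a bitonic network in the style of Batcher. Treating the slides' network as bitonic is an assumption based on the A0…A3, B3…B0 input ordering; the function names are hypothetical.

```python
from itertools import product


def bitonic_network(n):
    """Build comparator layers for n inputs (n must be a power of two).

    Each comparator (lo, hi) leaves the smaller value at index lo and the
    larger at index hi. Comparators within one layer touch disjoint wires.
    """
    layers = []
    k = 2
    while k <= n:          # merge sorted blocks of size k/2 into blocks of size k
        j = k // 2
        while j >= 1:      # one merge network has log2(k) layers
            layer = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    if i & k == 0:
                        layer.append((i, partner))   # ascending block
                    else:
                        layer.append((partner, i))   # descending block
            layers.append(layer)
            j //= 2
        k *= 2
    return layers


def apply_network(layers, values):
    """Run the comparator network on a list of values and return the result."""
    v = list(values)
    for layer in layers:
        for lo, hi in layer:
            if v[lo] > v[hi]:
                v[lo], v[hi] = v[hi], v[lo]
    return v


def sorts_all_zero_one_inputs(layers, n):
    """Check the network on every 0/1 input, which suffices by the 0-1 lemma."""
    return all(
        apply_network(layers, bits) == sorted(bits)
        for bits in product([0, 1], repeat=n)
    )
```

For $N = 8$ this yields $1 + 2 + 3 = 6$ layers of $N/2$ comparators each, matching the $O(\log^2(N))$ depth and $O(N \log^2(N))$ area. Executing one layer per step on $O(N)$ processors gives the $O(\log^2(N))$-step parallel sort mentioned on the Parallel Merge Sort slide.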