Average-Case and Distributional Analysis of Java 7’s Dual Pivot Quicksort
Markus E. Nebel
based on joint work with Ralph Neininger and Sebastian Wild
AofA 2013, Menorca, Spain
Markus E. Nebel Java 7’s Dual Pivot Quicksort 2013/18/5 1 / 22
Sorting Algorithms in Practice
Sorting methods listed on Wikipedia: many inventions by the algorithms community.
Sorting methods of standard libraries for random-access data (C, C++, Java 6, .NET, Haskell, Python): few methods are successful in practice, namely Quicksort (plus a Mergesort variant as stable sort) and Timsort.
History of Quicksort in Practice
1961/62 Hoare: first publication, average-case analysis
1969 Singleton: median-of-three & Insertionsort on small subarrays
1975-78 Sedgewick: detailed analysis of many optimizations
1993 Bentley, McIlroy: Engineering a Sort Function
1997 Musser: O(n log n) worst case by bounded recursion depth
→ Basic algorithm settled since 1961; latest tweaks from the 1990s.
Since then: almost identical in all programming libraries!
Until 2009: Java 7 switches to a new dual pivot Quicksort!
Sept. 2009 Vladimir Yaroslavskiy announced the algorithm on the Java core library mailing list
→ July 2011 public release of Java 7 with Yaroslavskiy’s Quicksort.
Running Time Experiments
Why switch to a new, unknown algorithm? Because it is faster!
[Figure: normalized Java runtimes (runtime in ms divided by 10^−6 · n ln n) for Java 6 Library, Java 7 Library, Classic Quicksort and Yaroslavskiy, plotted against n up to 2 · 10^6; average and standard deviation of 1000 random permutations per size.]
This remains true for the basic variants of the algorithms: Yaroslavskiy vs. classic Quicksort!
Dual Pivot Quicksort
High-level algorithm:
1. Partition the array around two pivots p ≤ q.
2. Sort the 3 subarrays recursively.
How to do partitioning?
1. For each element x, determine its class by comparing x to p and/or q:
   small for x < p, medium for p < x < q, large for q < x.
2. Arrange the elements according to their classes: small | p | medium | q | large.
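The two high-level steps can be illustrated with a minimal, non-in-place Java sketch. It assumes, purely for illustration, that the first and last elements serve as pivots and that elements equal to a pivot go to the medium class (p ≤ x < q) or the large class (x ≥ q); the class name `DualPivotSketch` is hypothetical. This is not the in-place partitioning used in Java 7, only the "classify, then arrange" idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: dual pivot Quicksort on lists (not in place).
final class DualPivotSketch {

    static List<Integer> sort(List<Integer> xs) {
        if (xs.size() < 2) return new ArrayList<>(xs);
        // Assumption: first and last element are the two pivots, ordered so p <= q.
        int p = xs.get(0), q = xs.get(xs.size() - 1);
        if (p > q) { int t = p; p = q; q = t; }

        // Step 1: determine the class of every non-pivot element.
        List<Integer> small = new ArrayList<>();
        List<Integer> medium = new ArrayList<>();
        List<Integer> large = new ArrayList<>();
        for (int i = 1; i < xs.size() - 1; i++) {
            int x = xs.get(i);
            if (x < p) small.add(x);          // small:  x < p
            else if (x < q) medium.add(x);    // medium: p <= x < q
            else large.add(x);                // large:  q <= x
        }

        // Step 2: sort the three classes recursively and arrange them around p and q.
        List<Integer> out = new ArrayList<>(sort(small));
        out.add(p);
        out.addAll(sort(medium));
        out.add(q);
        out.addAll(sort(large));
        return out;
    }
}
```

For example, `DualPivotSketch.sort(Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6))` returns `[1, 1, 2, 3, 4, 5, 6, 9]`. Collecting the classes into separate lists keeps the classification step explicit at the price of extra memory.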
Dual Pivot Quicksort – Previous Work
Robert Sedgewick, 1975: in-place dual pivot Quicksort implementation; more comparisons and swaps than classic Quicksort.
Pascal Hennequin, 1991: comparisons for list-based Quicksort with r pivots.
- r = 2 → same #comparisons as classic Quicksort; in one partitioning step: $\frac{5}{3}$ comparisons per element.
- r > 2 → very small savings, but complicated partitioning.
→ Using two pivots does not pay off, and ...
... there is no theoretical explanation for the impressive speedup.
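A quick worked check of Hennequin's r = 2 result. It relies on two facts taken as given here: the solution $C_n \sim \frac{6}{5} a \, n \ln n$ of the dual pivot recurrence for a partitioning cost $c_n \sim a\,n$ (as stated in the analysis part of this talk), and the well-known $\sim 2 n \ln n$ average comparison count of classic Quicksort:

$$ c_n \sim \tfrac{5}{3}\, n \quad\Longrightarrow\quad C_n \sim \tfrac{6}{5} \cdot \tfrac{5}{3}\, n \ln n = 2\, n \ln n, $$

which matches the leading term of classic Quicksort, so two pivots with $\frac{5}{3}$ comparisons per element bring no savings in comparisons.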
Dual Pivot Quicksort – Comparisons
How many comparisons are needed to determine the classes (small, medium or large)?
Assume we first compare with p.
→ small elements need 1 comparison, all others 2.
On average, $\frac{1}{3}$ of all elements are small → $\frac{1}{3} \cdot 1 + \frac{2}{3} \cdot 2 = \frac{5}{3}$ comparisons per element.
If the inputs are uniform random permutations, the classes of two elements x and y are independent.
→ Does any partitioning method need at least $\frac{5}{3}(n-2) \sim \frac{20}{12}\, n$ comparisons on average? No! (Stay tuned . . . )
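The $\frac{5}{3}(n-2)$ figure for the fixed order "compare with p first" can be checked exactly. In a uniform random permutation the two pivots form a uniformly random pair of ranks $1 \le p < q \le n$; for each pair there are $p-1$ small elements (1 comparison each) and $n-p-1$ other non-pivot elements (2 comparisons each). The sketch below, with hypothetical names, sums this over all pairs using integer arithmetic.

```java
// Exact check of the 5/3 (n - 2) count for the fixed order "compare with p first".
final class FixedOrderComparisons {

    /** Sum, over all pivot pairs 1 <= p < q <= n, of the comparisons needed to
     *  classify the n - 2 non-pivot elements when each is first compared with p. */
    static long totalOverPivotPairs(int n) {
        long total = 0;
        for (int p = 1; p <= n; p++) {
            for (int q = p + 1; q <= n; q++) {
                long small = p - 1;                 // ranks 1 .. p-1
                long others = (n - 2) - small;      // medium and large elements
                total += small * 1 + others * 2;    // small: 1 comparison, others: 2
            }
        }
        return total;
    }

    /** Number of pivot pairs, i.e. n choose 2. */
    static long pivotPairs(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 10, 100, 1000}) {
            double avg = (double) totalOverPivotPairs(n) / pivotPairs(n);
            System.out.printf("n=%d  average=%.4f  5/3(n-2)=%.4f%n", n, avg, 5.0 * (n - 2) / 3);
        }
    }
}
```

For n = 10 the total over all 45 pivot pairs is 600, i.e. an average of $\frac{40}{3} = \frac{5}{3} \cdot 8$ comparisons.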
Beating the “Lower Bound”
$\sim \frac{20}{12}\, n$ comparisons are only needed if there is a single comparison location (giving rise to a fixed order such as "first compare with p"); then the checks for x and y are independent.
But: there can be several comparison locations!
Here: assume two locations $C_1$ and $C_2$ such that
- $C_1$ first compares with p; $C_1$ is executed often iff p is large.
- $C_2$ first compares with q; $C_2$ is executed often iff q is small.
$C_1$ executed often iff there are many small elements iff there is a good chance that $C_1$ needs only one comparison ($C_2$ similar)
→ fewer than $\frac{5}{3}$ comparisons per element on average.
Yaroslavskiy’s Quicksort: 2 comparison locations
C_k handles pointer k, C_g handles pointer g.

    while k ≤ g
        if A[k] < p                                              ▷ C_k
            Swap A[k] and A[ℓ]; ℓ := ℓ + 1
        else if A[k] ≥ q                                         ▷ C'_k
            while A[g] > q and k < g do g := g − 1 end while     ▷ C_g
            Swap A[k] and A[g]; g := g − 1
            if A[k] < p                                          ▷ C'_g
                Swap A[k] and A[ℓ]; ℓ := ℓ + 1
            end if
        end if
        k := k + 1
    end while

C_k first checks < p; C'_k checks ≥ q if needed.
C_g first checks > q; C'_g checks < p if needed.
Invariant: | < p | p ≤ ◦ ≤ q | ? | > q |, where ℓ and k move right (→) and g moves left (←).
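A runnable Java sketch of a basic dual pivot Quicksort built around this partitioning loop. It assumes, without the tuning of the actual JDK 7 code, that A[left] and A[right] are taken as pivots, swapped if necessary so that p ≤ q, and moved to their final positions after the loop. The class `YaroslavskiySketch` and its comparison counter are hypothetical additions for illustration; comments mark the locations C_k, C'_k, C_g and C'_g.

```java
// Sketch of basic Yaroslavskiy partitioning plus recursion (not the tuned JDK 7 code).
final class YaroslavskiySketch {

    static long comparisons = 0;  // key comparisons, accumulated across calls

    private static boolean less(int x, int y) {
        comparisons++;
        return x < y;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int left, int right) {
        if (right - left < 1) return;
        // Assumption: outermost elements are the pivots, ordered so that p <= q.
        int p = a[left], q = a[right];
        if (less(q, p)) { int t = p; p = q; q = t; }

        int l = left + 1, g = right - 1, k = l;
        while (k <= g) {
            if (less(a[k], p)) {                       // C_k: first check "< p"
                swap(a, k, l); l++;
            } else if (!less(a[k], q)) {               // C'_k: check ">= q"
                while (less(q, a[g]) && k < g) g--;    // C_g: first check "> q"
                swap(a, k, g); g--;
                if (less(a[k], p)) {                   // C'_g: check "< p"
                    swap(a, k, l); l++;
                }
            }
            k++;
        }
        // Move the pivots into their final positions.
        l--; g++;
        a[left] = a[l]; a[l] = p;
        a[right] = a[g]; a[g] = q;

        sort(a, left, l - 1);    // small elements
        sort(a, l + 1, g - 1);   // medium elements
        sort(a, g + 1, right);   // large elements
    }
}
```

Resetting `comparisons` to 0 before sorting random permutations and averaging afterwards gives empirical counts that can be held against the asymptotic analysis; no particular values are claimed here.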
Analysis of Yaroslavskiy’s Algorithm
In this talk:
- leading-term asymptotics of comparisons (we have results for swaps and Java bytecodes too)
- distribution and correlation of costs
- effect of pivot sampling
$C_n$: expected #comparisons to sort a random permutation of $\{1, \ldots, n\}$.
$C_n$ satisfies the recurrence relation
$$ C_n = c_n + \frac{2}{n(n-1)} \sum_{1 \le p < q \le n} \bigl( C_{p-1} + C_{q-p-1} + C_{n-q} \bigr), $$
with $c_n$ the expected #comparisons in the first partitioning step.
The recurrence is solvable by standard methods: a linear $c_n \sim a \cdot n$ yields $C_n \sim \frac{6}{5}\, a \cdot n \ln n$.
→ We need to compute $c_n$.
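The recurrence can also be evaluated numerically. By symmetry, each of the three sums over pivot pairs equals $\sum_{k=0}^{n-2} (n-1-k)\, C_k$, so $C_n = c_n + \frac{6}{n(n-1)} \sum_{k=0}^{n-2} (n-1-k)\, C_k$. The sketch below assumes a hypothetical exact linear toll $c_n = a \cdot n$ for $n \ge 2$ (the true $c_n$ of a partitioning method has lower-order terms) and $C_0 = C_1 = 0$; the class name `DualPivotRecurrence` is made up for illustration.

```java
// Numerical evaluation of the dual pivot Quicksort recurrence (hypothetical toll c_n = a * n).
final class DualPivotRecurrence {

    /** Returns C[0..nMax] with C_0 = C_1 = 0 and toll c_n = a * n for n >= 2. */
    static double[] solve(int nMax, double a) {
        double[] c = new double[nMax + 1];
        double sumC = 0;   // sum of C_k   for k = 0 .. n-2
        double sumKC = 0;  // sum of k*C_k for k = 0 .. n-2
        for (int n = 2; n <= nMax; n++) {
            int m = n - 2;                      // newest index entering the sums
            sumC += c[m];
            sumKC += m * c[m];
            double s = (n - 1) * sumC - sumKC;  // = sum_{k=0}^{n-2} (n-1-k) * C_k
            c[n] = a * n + 6.0 * s / ((double) n * (n - 1));
        }
        return c;
    }

    public static void main(String[] args) {
        double a = 5.0 / 3;  // e.g. 5/3 comparisons per element
        double[] c = solve(1_000_000, a);
        for (int n : new int[] {1_000, 10_000, 100_000, 1_000_000}) {
            System.out.printf("n=%d  C_n / (6/5 a n ln n) = %.4f%n",
                    n, c[n] / (1.2 * a * n * Math.log(n)));
        }
    }
}
```

The printed ratio approaches 1 only slowly, since $C_n$ also has a term linear in n that the leading-term asymptotics ignore. Maintaining the running sums of $C_k$ and $k\,C_k$ keeps the evaluation linear in $n_{\max}$ instead of quadratic.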
Analysis of Yaroslavskiy’s Algorithm
- first comparison for all elements (at $C_k$ or $C_g$) → $\sim n$ comparisons
- second comparison for some elements at $C'_k$ resp. $C'_g$ . . . but how often are $C'_k$ resp. $C'_g$ reached?
  - $C'_k$: all non-small elements reached by pointer k.
  - $C'_g$: all non-large elements reached by pointer g.
- second comparison for medium elements not avoidable → $\sim \frac{1}{3} n$ comparisons in expectation
→ It remains to count large elements reached by k and small elements reached by g.
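The counting argument can be checked empirically with a small Monte Carlo sketch (a hypothetical helper, not part of the talk). It runs one partitioning step of the loop above on random permutations, with A[left] and A[right] as pivots, and counts every key comparison, including the one that orders the pivots. Following the argument above, the estimate of $c_n / n$ should lie above $1 + \frac{1}{3}$ (first comparisons plus unavoidable second comparisons for medium elements) and below $\frac{5}{3}$.

```java
import java.util.Random;

// Monte Carlo estimate of c_n / n for one Yaroslavskiy partitioning step (illustrative sketch).
final class PartitionCostEstimate {

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    /** Runs one partitioning step on a (modifying it) and returns the number of key comparisons.
     *  Pivots are not moved into place afterwards, since only comparisons are counted. */
    static long partitionComparisons(int[] a) {
        long cmp = 1;                                  // comparison that orders the pivots
        int left = 0, right = a.length - 1;
        int p = a[left], q = a[right];
        if (q < p) { int t = p; p = q; q = t; }
        int l = left + 1, g = right - 1, k = l;
        while (k <= g) {
            cmp++;                                     // C_k: A[k] < p ?
            if (a[k] < p) {
                swap(a, k, l); l++;
            } else {
                cmp++;                                 // C'_k: A[k] >= q ?
                if (a[k] >= q) {
                    while (true) {
                        cmp++;                         // C_g: A[g] > q ?
                        if (a[g] <= q || k >= g) break;
                        g--;
                    }
                    swap(a, k, g); g--;
                    cmp++;                             // C'_g: A[k] < p ?
                    if (a[k] < p) { swap(a, k, l); l++; }
                }
            }
            k++;
        }
        return cmp;
    }

    /** Average comparisons per element over random permutations of {1, ..., n}. */
    static double estimate(int n, int trials, long seed) {
        Random rnd = new Random(seed);
        int[] a = new int[n];
        long total = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) a[i] = i + 1;
            for (int i = n - 1; i > 0; i--) swap(a, i, rnd.nextInt(i + 1)); // Fisher-Yates shuffle
            total += partitionComparisons(a);
        }
        return (double) total / trials / n;
    }

    public static void main(String[] args) {
        System.out.printf("estimated c_n / n for n = 10000: %.4f%n", estimate(10_000, 1_000, 42L));
    }
}
```

Splitting `cmp` into four counters, one per location, would show directly how many second comparisons come from large elements reached by k and small elements reached by g.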