Engineering a Sort Function Jim Royer CIS 351 February 4, 2019 Royer (CIS 351) Engineering a Sort Function February 4, 2019 1 / 16
Bentley and MacIlroy, 1993 Engineering a Sort Function JON L. BENTLEY M. DOUGLAS McILROY AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A. SUMMARY We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra’s Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design techniques apply in domains beyond sorting. From: Software Practice and Experience , Vol. 23 (1993) 1249–1265. http://www.skidmore.edu/~meckmann/2009Spring/cs206/papers/spe862jb.pdf Royer (CIS 351) Engineering a Sort Function February 4, 2019 2 / 16
Why rewrite Unix’s quicksort? In ancient days of yore ( ≈ 1991): The old quicksort ( qsort ) had be in use for ≈ 20 years and was stable and usually fast. A colleague found that qsort ran in Θ ( n 2 ) time inputs with certain structures, e.g., on pipe-organ arrays of 2 n integers: 1,2,3,4,...,n,n,...,4,3,2,1 . They found that all the then competitors of qsort could also be driven to Θ ( n 2 ) on certain reasonable inputs. “Users complain when easy inputs don’t sort quickly.” So it was time for a new systems-level quicksort. Royer (CIS 351) Engineering a Sort Function February 4, 2019 3 / 16
Bentley and MacIlroy’s version of CLRS’s Quicksort void iqsort0(int *a, int n) { int i, j; if (n <= 1) return; for (i = 1, j = 0; i < n; i++) if (a[i] < a[0]) swap(++j, i, a); swap(0, j, a); iqsort0(a, j); iqsort0(a+j+1, n-j-1); } Program 2. A toy Quicksort, unfit for general use more efficient (and more familiar) partitioning method uses On nearly sorted arrays, the above makes ≈ n 2 2 many comparisons! Royer (CIS 351) Engineering a Sort Function February 4, 2019 4 / 16
The qsort interface (for the moment) void qsort(char *a, int n, int es, int (*cmp)()); Parameters = the array’s starting location *a n = the number of elements = the size (in bytes) of each element es cmp = the comparison function For “ char* ” think “byte addresses.” Royer (CIS 351) Engineering a Sort Function February 4, 2019 5 / 16
Insertion sort using the qsort interface isort void isort(char *a, int n, int es, int (*cmp)()) { char *pi, *pj; for (pi = a + es; pi < a + n*es; pi += es) for (pj = pi; pj > a && cmp(pj-es, pj) > 0; pj -= es) swap(pj, pj-es, es); } function swap(i,j,n) , defined in Program 1, interchanges n -byte fields pointed to A simple, straightforward, and troublesome swap void swap(char *i, char *j, int n) { do { char c = *i; *i++ = *j; *j++ = c; } while (--n > 0); } Royer (CIS 351) Engineering a Sort Function February 4, 2019 6 / 16
Bentley and MacIlroy’s starting quicksort void qsort1(char *a, int n, int es, int (*cmp)()) { int j; char *pi, *pj, *pn; if (n <= 1) return; pi = a + (rand() % n) * es; swap(a, pi, es); pi = a; pj = pn = a + n * es; for (;;) { do pi += es; while (pi < pn && cmp(pi, a) < 0); do pj -= es; while (cmp(pj, a) > 0); if (pj < pi) break; swap(pi, pj, es); } swap(a, pj, es); j = (pj - a) / es; qsort1(a, j, es, cmp); qsort1(a + (j+1)*es, n-j-1, es, cmp); } Program 4. A simple qsort the cost of about forty common sorting operations. Table I shows the cost of Royer (CIS 351) Engineering a Sort Function February 4, 2019 7 / 16
qsort1 ’s invariants As the partition process is running T ≤ T ? ≥ T 0 i j n-1 around the element a[0] , which we abbreviate as T . Increment When the partitioning is done ≤ T T ≥ T 0 j i n-1 recursively on the subarrays a[0..j-1] and a[j+1..n-1] Royer (CIS 351) Engineering a Sort Function February 4, 2019 8 / 16
Cost of basic operations (on a VAX 8550) Table I On a modern CPU the actual CPU Microseconds times will be vastly smaller. Int Operations 0.20 i1 = i2 + i3 0.20 My guess that the proportions i1 = i2 - i3 are roughly the same. Pointer Operations 0.17 p1 -= es 0.16 Note that under “Swap p1 += es Control Structures Functions” both lines are about 0.32 if (p1 == p2) i1++ swapping 4-byte int s. 0.26 while (p1 < p2) i1++ Comparison Functions MIX (standard cost models): 2.37 i1 = intcomp(&i2, &i3) 3.67 overhead ≈ comparisons < swaps i1 = floatcomp(&f2, &f3) 3.90 i1 = dblcomp(&d2, &d3) 8.74 i1 = strcmp(s2, s3) qsort (what is going on here): Swap Functions overhead < swaps ∗ < comparisons 11.50 swap(p1, p2, 4) 0.84 t = *i1, *i1 = *i2, *i2 = t ∗ done right inline swaps for integer-sized objects and a function Royer (CIS 351) Engineering a Sort Function February 4, 2019 9 / 16
Getting rid of the randomness — Median of three (Systems folk do not care for randomized algorithms.) a:b C n = expected number of comparisons for a size- n input < > when pivoting around a b:c b:c random elm: C n ≈ 1.386 n log 2 n < > < > when pivoting around the abc cba a:c a:c median of 3 random elms: C n ≈ 1.188 n log 2 n < > < > Program 5 makes 8/3 acb cab bac bca comparisons (on average). static char *med3(char *a, char *b, char *c, int (*cmp)()) { return cmp(a, b) < 0 ? (cmp(b, c) < 0 ? b : cmp(a, c) < 0 ? c : a) : (cmp(b, c) > 0 ? b : cmp(a, c) > 0 ? c : a); } Program 5. Decision tree and program for median of three Royer (CIS 351) Engineering a Sort Function February 4, 2019 10 / 16
Getting rid of the randomness — Ninther ninther = a median of three medians, at most 12 comparisons pm = a + (n/2)*es; /* Small arrays, middle element */ if (n > 7) { pl = a; pn = a + (n-1)*es; if (n > 40) { /* Big arrays, pseudomedian of 9 */ s = (n/8)*es; pl = med3(pl, pl+s, pl+2*s, cmp); pm = med3(pm-s, pm, pm+s, cmp); pn = med3(pn-2*s, pn-s, pn, cmp); } pm = med3(pl, pm, pn, cmp); /* Mid-size, med of 3 */ } Results of experiments on a modified program 2: Behaved well on non-random inputs On random input arrays C n ≈ 1.362 n log 2 n − 1.41 n Royer (CIS 351) Engineering a Sort Function February 4, 2019 11 / 16
Take advantage of repeated elements void iqsort2(int *x, int n) During partitioning { int a, b, c, d, l, h, s, v; = < ? > = if (n <= 1) return; v = x[rand() % n]; a b c d a = b = 0; c = d = n-1; After partitioning for (;;) { while (b <= c && x[b] <= v) { if (x[b] == v) iswap(a++, b, x); = < > = b++; } a c b d while (c >= b && x[c] >= v) { After copying if (x[c] == v) iswap(d--, c, x); c--; < > = } if (b > c) break; iswap(b++, c--, x); } s = min(a, b-a); for(l = 0, h = b-s; s; s--) iswap(l++, h++, x); s = min(d-c, n-1-d); for(l = b, h = n-s; s; s--) iswap(l++, h++, x); iqsort2(x, b-a); iqsort2(x + n-(d-c), d-c); } Program 6. An integer qsort with split-end partitioning Quicksort with split-end partitioning (Program 7) is about twice as fast as the Royer (CIS 351) Engineering a Sort Function February 4, 2019 12 / 16
Other Tricks Used A more efficient swap macro. On small arrays ( n < 7), use insertion sort in place of quicksort. Avoid recursive calls when n = 1. Added a vecswap function to do the copying of the equal elements to the middle. Tuned the constants via testing and measurement. Not Used Replaced the recursions with stacks. A fancier insertion sort. Sentinels on the array ends. Fancier sampling schemes. Special tuning for different machines/compilers. Royer (CIS 351) Engineering a Sort Function February 4, 2019 13 / 16
Final code, part 1 void qsort(char *a, size_t n, size_t es, int (*cmp)()) { char *pa, *pb, *pc, *pd, *pl, *pm, *pn, *pv; int r, swaptype; WORD t, v; size_t s; SWAPINIT(a, es); if (n < 7) { /* Insertion sort on smallest arrays */ for (pm = a + es; pm < a + n*es; pm += es) for (pl = pm; pl > a && cmp(pl-es, pl) > 0; pl -= es) swap(pl, pl-es); return; } pm = a + (n/2)*es; /* Small arrays, middle element */ if (n > 7) { pl = a; pn = a + (n-1)*es; if (n > 40) { /* Big arrays, pseudomedian of 9 */ s = (n/8)*es; pl = med3(pl, pl+s, pl+2*s, cmp); pm = med3(pm-s, pm, pm+s, cmp); pn = med3(pn-2*s, pn-s, pn, cmp); } pm = med3(pl, pm, pn, cmp); /* Mid-size, med of 3 */ } PVINIT(pv, pm); /* pv points to partition value */ pa = pb = a; Royer (CIS 351) Engineering a Sort Function February 4, 2019 14 / 16
Final code, part 2 } PVINIT(pv, pm); /* pv points to partition value */ pa = pb = a; pc = pd = a + (n-1)*es; for (;;) { while (pb <= pc && (r = cmp(pb, pv)) <= 0) { if (r == 0) { swap(pa, pb); pa += es; } pb += es; } while (pc >= pb && (r = cmp(pc, pv)) >= 0) { if (r == 0) { swap(pc, pd); pd -= es; } pc -= es; } if (pb > pc) break; swap(pb, pc); pb += es; pc -= es; } pn = a + n*es; s = min(pa-a, pb-pa ); vecswap(a, pb-s, s); s = min(pd-pc, pn-pd-es); vecswap(pb, pn-s, s); if ((s = pb-pa) > es) qsort(a, s/es, es, cmp); if ((s = pd-pc) > es) qsort(pn-s, s/es, es, cmp); } Royer (CIS 351) Engineering a Sort Function February 4, 2019 15 / 16
Recommend
More recommend