Comparison Sorting


11/25/2016

CSE373: Data Structures and Algorithms
Comparison Sorting
Steve Tanimoto
Autumn 2016
This lecture material represents the work of multiple instructors at the University of Washington. Thank you to all who have contributed!

Introduction to Sorting
• Stacks, queues, priority queues, and dictionaries all focused on providing one element at a time
• But often we know we want “all the things” in some order
  – Humans can sort, but computers can sort fast
  – Very common to need data sorted somehow
    • Alphabetical list of people
    • List of countries ordered by population
    • Search engine results by relevance
    • …
• Algorithms have different asymptotic and constant-factor trade-offs
  – No single “best” sort for all scenarios
  – Knowing one way to sort just isn’t enough

More Reasons to Sort
General technique in computing: Preprocess data to make subsequent operations faster
Example: Sort the data so that you can
  – Find the kth largest in constant time for any k
  – Perform binary search to find elements in logarithmic time
Whether the performance of the preprocessing matters depends on
  – How often the data will change (and how much it will change)
  – How much data there is

Why Study Sorting in this Class?
• You might never need to reimplement a sorting algorithm yourself
  – Standard libraries will generally implement one or more (Java implements 2)
• You will almost certainly use sorting algorithms
  – Important to understand relative merits and expected performance
• Excellent set of algorithms for practicing analysis and comparing design techniques
  – Classic part of a data structures class, so you’ll be expected to know it

The main problem, stated carefully
For now, assume we have n comparable elements in an array and we want to rearrange them to be in increasing order
Input:
  – An array A of data records
  – A key value in each data record
  – A comparison function (consistent and total)
Effect:
  – Reorganize the elements of A such that for any i and j, if i < j then A[i] ≤ A[j]
  – (Also, A must have exactly the same data it started with)
  – Could also sort in reverse order, of course
An algorithm doing this is a comparison sort

Variations on the Basic Problem
1. Maybe elements are in a linked list (could convert to array and back in linear time, but some algorithms needn’t do so)
2. Maybe ties need to be resolved by “original array position”
  – Sorts that do this naturally are called stable sorts
  – Others could tag each item with its original position and adjust comparisons accordingly (non-trivial constant factors)
3. Maybe we must not use more than O(1) “auxiliary space”
  – Sorts meeting this requirement are called in-place sorts
4. Maybe we can do more with elements than just compare
  – Sometimes leads to faster algorithms
5. Maybe we have too much data to fit in memory
  – Use an “external sorting” algorithm
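
Example (not from the original slides): a minimal Python sketch of variation 2, tagging each item with its original position so that an unstable sort resolves ties by “original array position”. The names selection_sort and stable_sort_by_key are hypothetical, and the selection sort here is only a stand-in for any unstable comparison sort.

```python
def selection_sort(items, less):
    """Unstable in-place selection sort driven by a less(a, b) comparison."""
    n = len(items)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):
            if less(items[j], items[smallest]):
                smallest = j
        # This long-distance swap is what can reorder equal keys.
        items[i], items[smallest] = items[smallest], items[i]


def stable_sort_by_key(records, key):
    """Get stable behavior from the unstable sort by tagging with positions."""
    # Tag: (key value, original position, record).
    tagged = [(key(r), pos, r) for pos, r in enumerate(records)]
    # Adjusted comparison: key first, original position breaks ties.
    selection_sort(tagged, lambda a, b: (a[0], a[1]) < (b[0], b[1]))
    return [r for _, _, r in tagged]


records = [("b", 2), ("a", 2), ("c", 1)]

plain = list(records)
selection_sort(plain, lambda x, y: x[1] < y[1])
print(plain)    # ties reordered: ("a", 2) now precedes ("b", 2)

print(stable_sort_by_key(records, key=lambda r: r[1]))
# ties keep input order: ("b", 2) precedes ("a", 2)
```

The extra tuple per record and the two-part comparison are the “non-trivial constant factors” mentioned above.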

Sorting: The Big Picture
Surprising amount of neat stuff to say about sorting:
• Simple algorithms, O(n^2): Insertion sort, Selection sort, Shell sort, …
• Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
• Comparison lower bound: Ω(n log n)
• Specialized algorithms, O(n): Bucket sort, Radix sort
• Handling huge data sets: External sorting

Insertion Sort
• Idea: At step k, put the kth element in the correct position among the first k elements
• Alternate way of saying this:
  – Sort first two elements
  – Now insert 3rd element in order
  – Now insert 4th element in order
  – …
• “Loop invariant”: when loop index is i, first i elements are sorted
• Let’s see a visualization (http://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html)
• Time?
  Best-case O(n) (start sorted)
  Worst-case O(n^2) (start reverse sorted)
  “Average” case O(n^2) (see text)

Selection sort
• Idea: At step k, find the smallest element among the not-yet-sorted elements and put it at position k
• Alternate way of saying this:
  – Find smallest element, put it 1st
  – Find next smallest element, put it 2nd
  – Find next smallest element, put it 3rd
  – …
• “Loop invariant”: when loop index is i, first i elements are the i smallest elements in sorted order
• Let’s see a visualization (http://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html)
• Time?
  Best-case O(n^2)
  Worst-case O(n^2)
  “Average” case O(n^2)
  Always T(1) = 1 and T(n) = n + T(n-1)

Insertion Sort vs. Selection Sort
• Different algorithms
• Solve the same problem
• Have the same worst-case and average-case asymptotic complexity
  – Insertion-sort has better best-case complexity; preferable when input is “mostly sorted”
• Other algorithms are more efficient for large arrays that are not already almost sorted
  – Insertion sort may do well on small arrays
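
Example (not from the original slides): a minimal Python sketch of insertion sort and selection sort as described above. Both functions sort in place and return the number of key comparisons, which is an addition made here only to make the best-case difference visible; the function names are hypothetical.

```python
def insertion_sort(arr):
    """Sort arr in place; return the number of key comparisons made."""
    comparisons = 0
    for k in range(1, len(arr)):
        item = arr[k]
        i = k - 1
        # Invariant: arr[0..k-1] is already sorted.
        while i >= 0:
            comparisons += 1
            if arr[i] <= item:
                break              # item's slot found; stop early
            arr[i + 1] = arr[i]    # shift the larger element right
            i -= 1
        arr[i + 1] = item
    return comparisons


def selection_sort(arr):
    """Sort arr in place; return the number of key comparisons made."""
    comparisons = 0
    n = len(arr)
    for k in range(n):
        # Invariant: arr[0..k-1] holds the k smallest elements in order.
        smallest = k
        for j in range(k + 1, n):
            comparisons += 1
            if arr[j] < arr[smallest]:
                smallest = j
        arr[k], arr[smallest] = arr[smallest], arr[k]
    return comparisons


print(insertion_sort([1, 2, 3, 4, 5]))   # already sorted input
print(insertion_sort([5, 4, 3, 2, 1]))   # reverse sorted input
print(selection_sort([1, 2, 3, 4, 5]))   # same scan regardless of input
```

On five already-sorted elements insertion sort makes 4 comparisons (linear), while on five reverse-sorted elements it makes 10. Selection sort makes 10 on either input, since it always scans the whole not-yet-sorted part.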

Aside: We Will Not Cover Bubble Sort
• It is not, in my opinion, what a “normal person” would think of
• It doesn’t have good asymptotic complexity: O(n^2)
• It’s not particularly efficient with respect to constant factors
Basically, almost everything it is good at some other algorithm is at least as good at
  – Perhaps people teach it just because someone taught it to them?
Fun, short, optional read: Bubble Sort: An Archaeological Algorithmic Analysis, Owen Astrachan, SIGCSE 2003, http://www.cs.duke.edu/~ola/bubble/bubble.pdf

The Big Picture
Surprising amount of juicy computer science: 2-3 lectures…
• Simple algorithms, O(n^2): Insertion sort, Selection sort, Shell sort, …
• Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort (avg), …
• Comparison lower bound: Ω(n log n)
• Specialized algorithms, O(n): Bucket sort, Radix sort
• Handling huge data sets: External sorting

Heap sort
• Sorting with a heap is easy:
  – insert each arr[i], or better yet use buildHeap
  – for i in range(len(arr)):
        arr[i] = deleteMin()
• Worst-case running time: O(n log n)
• We have the array-to-sort and the heap
  – So this is not an in-place sort
  – There’s a trick to make it in-place…

In-place heap sort
  – Treat the initial array as a heap (via buildHeap)
  – When you delete the ith element, put it at arr[n-i]
    • That array location isn’t needed for the heap anymore!
But this reverse sorts: how would you fix that?
  heap part: 4 7 5 9 8 6 10 | sorted part: 3 2 1
  arr[n-i] = deleteMin()
  heap part: 5 7 6 9 8 10 | sorted part: 4 3 2 1

“AVL sort”
• We can also use a balanced tree to:
  – insert each element: total time O(n log n)
  – Repeatedly deleteMin: total time O(n log n)
    • Better: in-order traversal O(n), but still O(n log n) overall
• But this cannot be made in-place and has worse constant factors than heap sort
  – both are O(n log n) in worst, best, and average case
  – neither parallelizes well
  – heap sort is better

“Hash sort”???
• Don’t even think about trying to sort with a hash table!
• Finding min item in a hashtable is O(n), so this would be a slower, more complicated selection sort
• And we’ve already seen that selection sort is pretty bad!
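
Example (not from the original slides): a minimal Python sketch of in-place heap sort. It assumes one common answer to the “reverse sorts” question, namely using a max-heap so that each deleted maximum lands at the end of the array and the result is ascending; the slides do not say which fix they intend. The names percolate_down and heap_sort are hypothetical, and indices are 0-based.

```python
def percolate_down(arr, i, size):
    """Restore the max-heap property below index i within arr[:size]."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and arr[left] > arr[largest]:
            largest = left
        if right < size and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest


def heap_sort(arr):
    """Sort arr in ascending order using O(1) auxiliary space."""
    n = len(arr)
    # buildHeap: percolate down every internal node, last one first.
    for i in range(n // 2 - 1, -1, -1):
        percolate_down(arr, i, n)
    # Delete the max repeatedly; the slot freed at the end of the
    # heap part becomes the front of the sorted part.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        percolate_down(arr, 0, end)


data = [4, 7, 5, 9, 8, 6, 10, 3, 2, 1]
heap_sort(data)
print(data)
```

Here `end` plays the role of n-i: the first deleted element goes to the last slot, the second to the slot before it, and so on.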
