Roadmap for the next few lectures:
• Heapsort
• Priority queues using heaps
• Quicksort
• Some probability theory
• Randomised Quicksort
• Analysing Quicksort
Heapsort
The heap data structure is a linear array that stores a binary tree. Heaps don't store arbitrary trees, only nearly complete trees:
• all levels except perhaps the lowest one are full
• the bottom level is filled from left to right
Suppose array A stores (or represents) a binary heap. It has two major attributes:
• length(A) is the number of elements of array A, i.e. the array size
• heap-size(A) is the number of elements of the heap (the elements stored within array A)
Only A[1], ..., A[heap-size(A)] actually store elements of the heap, hence heap-size(A) ≤ length(A).
Assignment of tree vertices to array elements is very easy:
• the root is A[1]
• given the index i of some node, we have
– Parent(i) = ⌊i/2⌋
– Left(i) = 2i
– Right(i) = 2i + 1
The implementation is straightforward:
• i → 2i: left-shift the bit string representing i by one
• i → 2i + 1: left-shift by one, then set the LSB to 1
• i → ⌊i/2⌋: right-shift by one, dropping the previous LSB
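The shift-based index arithmetic can be sketched in Python as follows, keeping the 1-based indexing of the slides. The function names `parent`, `left` and `right` are illustrative stand-ins for Parent, Left and Right.

```python
def parent(i: int) -> int:
    """Parent(i) = floor(i/2): shift right by one, dropping the LSB."""
    return i >> 1


def left(i: int) -> int:
    """Left(i) = 2i: shift left by one."""
    return i << 1


def right(i: int) -> int:
    """Right(i) = 2i + 1: shift left by one, then set the LSB to 1."""
    return (i << 1) | 1


# Node 3 of a heap has children 6 and 7; node 5 has parent 2.
print(left(3), right(3), parent(5))  # 6 7 2
```

Since both children of i map back to i under `parent`, the three functions are mutually consistent for every index i ≥ 1.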
[Figure: a nearly complete binary tree with 12 vertices and its array representation A[1..12] = k, b, l, h, i, a, c, g, f, j, d, e]
There are two kinds: min-heaps and max-heaps.
• A max-heap satisfies the max-heap property: for every node i other than the root, A[Parent(i)] ≥ A[i]. That is, the value of a node is at most the value of its parent, and the largest value is stored at the root.
• A min-heap satisfies the min-heap property: for every node i other than the root, A[Parent(i)] ≤ A[i]. That is, the value of a node is at least the value of its parent, and the smallest value is stored at the root.
Suppose we are given the values 1, 2, 4, 6, 6, 7, 8, 12, 14, 15, 19, 21.
[Figure: a max-heap and a min-heap on these values; numbers inside vertices are values, numbers outside are array indices.]
Max-heap: A[1..12] = 21, 19, 12, 15, 6, 2, 8, 14, 7, 6, 4, 1
Min-heap: A[1..12] = 1, 2, 12, 4, 7, 15, 14, 6, 6, 8, 19, 21
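Both heap properties can be checked mechanically. The sketch below is a direct transcription of the two definitions, applied to the example arrays read off the figure. The helper names are illustrative, and the code assumes a 0-based Python list, so the 1-based node i is stored at `a[i - 1]` and its parent at `a[i // 2 - 1]`.

```python
def is_max_heap(a: list) -> bool:
    """Check A[Parent(i)] >= A[i] for every non-root node i (1-based)."""
    return all(a[i // 2 - 1] >= a[i - 1] for i in range(2, len(a) + 1))


def is_min_heap(a: list) -> bool:
    """Check A[Parent(i)] <= A[i] for every non-root node i (1-based)."""
    return all(a[i // 2 - 1] <= a[i - 1] for i in range(2, len(a) + 1))


max_heap = [21, 19, 12, 15, 6, 2, 8, 14, 7, 6, 4, 1]
min_heap = [1, 2, 12, 4, 7, 15, 14, 6, 6, 8, 19, 21]

print(is_max_heap(max_heap), is_min_heap(min_heap))  # True True
print(sorted(max_heap) == sorted(min_heap))           # True: same multiset of values
```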
Note: for a fixed set of values there are many possible proper min-heaps and max-heaps; only the value at the root is determined.
We define the height of a vertex as the number of edges on the longest simple downward path from the vertex to a leaf.
[Figure: the 12-vertex heap annotated with vertex heights; in array order these are 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0.]
The height of a heap is the height of its root. Homework: a heap of n elements is based on a nearly complete binary tree, so its height is Θ(log n).
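Vertex heights in a heap can be computed from the index alone. A small sketch, assuming 1-based indices and a hypothetical helper `node_height`: in a nearly complete tree the leftmost downward path from a node is a longest one, so it suffices to follow left children (i → 2i) while they exist.

```python
def node_height(i: int, n: int) -> int:
    """Height of node i (1-based) in an n-element heap."""
    h = 0
    while 2 * i <= n:  # the left child 2i exists
        i *= 2
        h += 1
    return h


n = 12
print([node_height(i, n) for i in range(1, n + 1)])
# [3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The height of the heap is the height of the root, i.e. floor(log2 n).
assert all(node_height(1, m) == m.bit_length() - 1 for m in range(1, 10_000))
```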
Important basic procedures (here only for max-heaps):
1. Max-Heapify: restores the max-heap property when the element at the root of a subtree may be too small, e.g. after a new element has been placed in the root (runs in time O(log n))
2. Build-Max-Heap: produces a max-heap from unsorted data (runs in time O(n))
3. Heap-Extract-Max: returns and deletes the maximum element of the heap (runs in time O(log n))
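Max-Heapify
Maintains the max-heap property. Inputs are an array A and an index i.
Assumption: the subtrees rooted at Left(i) and Right(i) are proper max-heaps, but A[i] may be smaller than its children.
[Figure: a tree with root 9, children 12 and 6, and grandchildren 10, 7 (below 12) and 5, 3 (below 6); as an array: 9, 12, 6, 10, 7, 5, 3.]
The task of Max-Heapify is to let A[i] float down in the max-heap below it so that the subtree rooted at i becomes a proper max-heap.
[Figure: the same tree after Max-Heapify at the root; as an array: 12, 10, 6, 9, 7, 5, 3.]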
Max-Heapify(A, i)
1: ℓ ← Left(i)
2: r ← Right(i)
3: if ℓ ≤ heap-size(A) and A[ℓ] > A[i] then
4:   largest ← ℓ
5: else
6:   largest ← i
7: end if
8: if r ≤ heap-size(A) and A[r] > A[largest] then
9:   largest ← r
10: end if
11: if largest ≠ i then
12:   exchange A[i] ↔ A[largest]
13:   Max-Heapify(A, largest)
14: end if
Idea: Lines 3–10 find the largest of the elements A[i], A[ℓ], and A[r]. Lines 11–14
1. first check whether there is anything to be done at all,
2. and if so, (a) move the bad guy one level down, (b) and make a recursive call one level deeper.
We know that after the exchange the largest of A[i], A[ℓ], and A[r] is in position i, so among these three everything is OK. However, there may still be problems further down!
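A minimal Python sketch of Max-Heapify, assuming a 0-based list (so the children of i are 2i + 1 and 2i + 2) and passing heap-size(A) as an explicit parameter `heap_size`. The function name `max_heapify` is illustrative; the comments map back to the pseudocode lines.

```python
def max_heapify(a: list, i: int, heap_size: int) -> None:
    """Let a[i] float down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps.
    Only a[0:heap_size] is treated as part of the heap.
    """
    l, r = 2 * i + 1, 2 * i + 2
    # Lines 3-10: find the largest of a[i], a[l], a[r].
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    # Lines 11-14: if a child is larger, swap and recurse one level deeper.
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


a = [9, 12, 6, 10, 7, 5, 3]  # the example above: the root violates the property
max_heapify(a, 0, len(a))
print(a)  # [12, 10, 6, 9, 7, 5, 3]
```

The recursion depth is at most the height of the node, so the recursive form is harmless here; an iterative loop would work equally well.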
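[Figure: Max-Heapify(A, 2) on A = 21, 12, 12, 15, 6, 2, 8, 14, 7, 6, 4, 1. The 12 at index 2 is exchanged with its child 15 at index 4, giving 21, 15, 12, 12, 6, 2, 8, 14, 7, 6, 4, 1; then it is exchanged with 14 at index 8, giving the max-heap 21, 15, 12, 14, 6, 2, 8, 12, 7, 6, 4, 1.]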
The running time of Max-Heapify on a subtree of size n rooted at i is
• Θ(1) for finding the largest element and possibly swapping,
• plus the time to run Max-Heapify on the subtree rooted at one of the children of i.
The size of a child's subtree is about (n − 1)/2 if the tree is complete. We allow for "nearly" complete trees, but even then each child's subtree has size at most ⌈2n/3⌉. This worst case occurs when the last level is exactly half full.
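The 2n/3 bound can be checked numerically. The sketch below (the helper `subtree_size` is hypothetical, indices are 1-based) counts the nodes below a given node level by level: at depth d below i the descendants occupy the index range i·2^d, ..., i·2^d + 2^d − 1, clipped to n. It then checks the bound for both children of the root.

```python
def subtree_size(i: int, n: int) -> int:
    """Number of nodes in the subtree rooted at node i (1-based) of an n-element heap."""
    size, first, width = 0, i, 1
    while first <= n:
        # Nodes first, ..., first + width - 1 on this level, clipped to n.
        size += min(n, first + width - 1) - first + 1
        first, width = 2 * first, 2 * width
    return size


# The children of the root are nodes 2 and 3: check 3 * size <= 2 * n.
for n in range(2, 5000):
    assert 3 * subtree_size(2, n) <= 2 * n
    assert 3 * subtree_size(3, n) <= 2 * n

# n = 5: the bottom level (nodes 4..7) is exactly half full.
print(subtree_size(2, 5), subtree_size(3, 5))  # 3 1
```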
This gives T(n) ≤ T(2n/3) + Θ(1). With $a = 1$ and $b = 3/2$ we have $n^{\log_b a} = n^0 = 1$ and $f(n) = \Theta(1)$, so case 2 of the Master Theorem gives T(n) = O(log n).
Alternatively: the running time of Max-Heapify on a node of height h (counted from the bottom!) is O(h).
Building a heap
This is easy using Max-Heapify. Suppose we are given an unordered array A[1], ..., A[n] with n = length(A).
One can show (homework) that the elements A[⌊n/2⌋ + 1], A[⌊n/2⌋ + 2], ..., A[n] always represent leaves of the heap. Since 1-element heaps are always proper heaps, it is OK to run Max-Heapify "on top" of them, once for each non-leaf element.
Build-Max-Heap(A)
1: heap-size(A) ← length(A)
2: for i ← ⌊length(A)/2⌋ downto 1 do
3:   Max-Heapify(A, i)
4: end for
[Figure: Build-Max-Heap on the input array A = 4, 1, 3, 2, 16, 9, 10, 14, 8, 7. The panels show the binary tree representing A before each call Max-Heapify(A, i) for i = 5, 4, 3, 2, 1, and the result A = 16, 14, 10, 8, 7, 9, 3, 2, 4, 1.]
Correctness
Loop invariant: at the start of each iteration, each node i + 1, i + 2, ..., n is the root of a proper max-heap.
Initialisation: initially i = ⌊n/2⌋. Nodes ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, ..., n are leaves, and leaves are always roots of trivial max-heaps.
Maintenance: observe that the children of node i are numbered higher than i. By the invariant, both are roots of max-heaps. Thus we can call Max-Heapify(A, i), and afterwards i is the root of a max-heap (Max-Heapify works correctly!). Decrementing i reestablishes the invariant for the next iteration.
Termination: the loop ends with i = 0, so by the invariant each of the nodes 1, 2, ..., n, in particular the root 1, is the root of a max-heap.
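To make the loop invariant concrete, here is a sketch of Build-Max-Heap that can optionally assert the invariant at the start of every iteration, run on the input from the example. It assumes 0-based lists (the last non-leaf is then index n // 2 − 1), and the names `build_max_heap`, `is_max_heap_at` and `check_invariant` are illustrative. The invariant check is quadratic and intended for small inputs only.

```python
def max_heapify(a: list, i: int, heap_size: int) -> None:
    """0-based Max-Heapify: the children of i are 2i + 1 and 2i + 2."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def is_max_heap_at(a: list, i: int, heap_size: int) -> bool:
    """True if the subtree rooted at i (0-based) satisfies the max-heap property."""
    for c in (2 * i + 1, 2 * i + 2):
        if c < heap_size and (a[c] > a[i] or not is_max_heap_at(a, c, heap_size)):
            return False
    return True


def build_max_heap(a: list, check_invariant: bool = False) -> None:
    """Turn a into a max-heap in place, from the last non-leaf up to the root."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        if check_invariant:
            # Loop invariant: every node i + 1, ..., n - 1 roots a max-heap.
            assert all(is_max_heap_at(a, j, n) for j in range(i + 1, n))
        max_heapify(a, i, n)


a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]  # input from the example
build_max_heap(a, check_invariant=True)
print(a)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```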
Running time
Simple bound: each call to Max-Heapify costs O(log n), and there are O(n) calls, thus O(n log n).
The truth is Θ(n). Some simple observations:
1. The time for Max-Heapify depends on the height of the node (clearly).
2. An n-element heap has height ⌊log n⌋.
3. It has at most $\lceil n/2^{h+1} \rceil$ nodes of any height h.
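Observations 2 and 3 can be spot-checked by brute force for small heaps. This is a sketch under the assumption of 1-based indices; `node_height` and `check_observations` are hypothetical helpers, and the ceiling is computed with integer arithmetic to avoid floating-point rounding.

```python
from collections import Counter


def node_height(i: int, n: int) -> int:
    """Height of node i (1-based) in an n-element heap: follow left children."""
    h = 0
    while 2 * i <= n:
        i, h = 2 * i, h + 1
    return h


def check_observations(n: int) -> None:
    counts = Counter(node_height(i, n) for i in range(1, n + 1))
    # Observation 2: the heap (i.e. its root) has height floor(log2 n).
    assert node_height(1, n) == n.bit_length() - 1
    # Observation 3: at most ceil(n / 2^(h+1)) nodes have height h.
    for h, c in counts.items():
        p = 2 ** (h + 1)
        assert c <= (n + p - 1) // p, (n, h, c)


for n in range(1, 1025):
    check_observations(n)

print(sorted(Counter(node_height(i, 12) for i in range(1, 13)).items()))
# [(0, 6), (1, 3), (2, 2), (3, 1)]
```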
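The time required by Max-Heapify on a node of height h is O(h). Thus the running time of Build-Max-Heap is upper-bounded by
$$T(n) \le \sum_{h=0}^{\lfloor \log n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot O(h) = O\!\left( n \sum_{h=0}^{\lfloor \log n \rfloor} \frac{h}{2^h} \right).$$
By
$$\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2} \qquad (|x| < 1)$$
and substituting x = 1/2, we obtain
$$\sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{1/2}{(1-1/2)^2} = 2,$$
and thus
$$T(n) = O\!\left( n \sum_{h=0}^{\lfloor \log n \rfloor} \frac{h}{2^h} \right) \le O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^h} \right) = O(n).$$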
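The key step is that the finite sums $\sum_{h=0}^{H} h/2^h$ never exceed the infinite sum 2. A quick exact check with Python's `fractions.Fraction`; the closed form $2 - (H+2)/2^H$ for the partial sum is a standard identity used here only as a cross-check, and `partial_sum` is an illustrative name.

```python
from fractions import Fraction


def partial_sum(H: int) -> Fraction:
    """Exact value of sum_{h=0}^{H} h / 2^h."""
    return sum((Fraction(h, 2 ** h) for h in range(H + 1)), Fraction(0))


for H in range(40):
    s = partial_sum(H)
    assert s == 2 - Fraction(H + 2, 2 ** H)  # closed form of the partial sum
    assert s < 2                             # always below the infinite sum

print(partial_sum(3))  # 11/8
```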
Heap-Extract-Max
Easy: take the element out of the root, move the last element of the heap into the root, decrease heap-size(A) by one, and call Max-Heapify on the root, i.e. Max-Heapify(A, 1).
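A possible Python version of Heap-Extract-Max, as a sketch. It assumes the heap fills the whole 0-based list, so heap-size(A) is simply `len(a)` and shrinking the heap is a `pop()`; this simplifies the slides' separate heap-size attribute. The names and the `IndexError` on an empty heap are illustrative choices.

```python
def max_heapify(a: list, i: int, heap_size: int) -> None:
    """0-based Max-Heapify: the children of i are 2i + 1 and 2i + 2."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def heap_extract_max(a: list):
    """Remove and return the largest element of the max-heap stored in a."""
    if not a:
        raise IndexError("heap underflow")
    largest = a[0]
    last = a.pop()  # take out the last element; the heap shrinks by one
    if a:           # if anything is left, move it into the root and repair
        a[0] = last
        max_heapify(a, 0, len(a))
    return largest


heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(heap_extract_max(heap))  # 16
print(heap)                    # [14, 8, 10, 4, 7, 9, 3, 2, 1]
```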
Finally, Heapsort
Now it's very easy to actually write down Heapsort. The idea is as follows:
1. Given an unsorted array A, build a heap on A using Build-Max-Heap.
2. Extract the largest element (it is in A[1]) by exchanging it with A[n], i.e. moving it to the back.
3. Now the smallest, ..., 2nd-largest elements are in A[1], ..., A[n − 1], with perhaps the root not satisfying the heap property (that is where the element formerly in A[n] now is).
4. Rebuild the heap on the remaining elements using Max-Heapify.
5. Extract the 2nd-largest element (again from A[1]), and so on...
Heapsort(A)
1: Build-Max-Heap(A)
2: for i ← length(A) downto 2 do
3:   exchange A[1] ↔ A[i]
4:   heap-size(A) ← heap-size(A) − 1
5:   Max-Heapify(A, 1)
6: end for
Running time: O(n log n) (Build-Max-Heap takes O(n), and then there are O(n) rounds with O(log n) each).
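Putting the pieces together, here is a sketch of Heapsort in Python, sorting a 0-based list in place. It includes its own Max-Heapify and Build-Max-Heap so that it runs on its own; all names are illustrative, and `heap_size` plays the role of heap-size(A).

```python
def max_heapify(a: list, i: int, heap_size: int) -> None:
    """0-based Max-Heapify: the children of i are 2i + 1 and 2i + 2."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def build_max_heap(a: list) -> None:
    for i in range(len(a) // 2 - 1, -1, -1):  # last non-leaf down to the root
        max_heapify(a, i, len(a))


def heapsort(a: list) -> None:
    """Sort a in place in ascending order."""
    build_max_heap(a)
    heap_size = len(a)
    for i in range(len(a) - 1, 0, -1):  # 1-based: i = length(A) downto 2
        a[0], a[i] = a[i], a[0]         # move the current maximum behind the heap
        heap_size -= 1
        max_heapify(a, 0, heap_size)


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(data)
print(data)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Heapsort sorts in place using only O(1) extra space besides the recursion in `max_heapify`, but it is not stable: equal keys may change their relative order.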