We conclude elementary data structures

We conclude elementary data structures by discussing and implementing arrays, lists, pointers and trees. We also consider binary search. Reading from CLRS for week 7: Chapter 10, Sections 10.2, 10.3, 10.4.

Arrays are the most fundamental data structure: An array A is a static data structure, with a fixed length n ∈ ℕ₀, holding n objects of the same type. Access to elements happens via A[i] for indices i, typically 0-based (C-based languages), that is, i ∈ {0, ..., n−1}, or 1-based, that is, i ∈ {1, ..., n}. This access, called random access, happens in constant time, and can be used for reading and writing. Due to the fixed length of arrays, one cannot really speak of "insertion" and "deletion" for arrays. Search in general is slow (one has to run through all elements in the worst case), but fast in sorted arrays, via "binary search".

Vectors: The dynamic form of an array (i.e., it can grow) can be called a vector (as in C++; or "dynamic array"). The vector grows by internally holding an array and, when the need arises, allocating a new, bigger array, copying the old content, and deleting the old array. When this is done "infrequently", insertions (and deletions) at the end of the vector require only amortised constant time; see the tutorial. However, insertions and deletions at the beginning of the vector (or somewhere else) need time linear in the current size of the vector, since the elements need to be shifted. A vector with additional structure, where insertions and deletions at the beginning also happen in amortised constant time, is typically called a deque (a "double-ended queue").
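
The growth scheme just described can be sketched in Java as follows. This is only a minimal illustration, assuming that "infrequently" is realised by doubling the capacity; the class name IntVector and its methods are hypothetical and not taken from the tutorial.

```java
// Minimal sketch of a growable int array ("vector"); all names are illustrative.
final class IntVector {
    private int[] data = new int[1]; // the internal fixed-length array
    private int size = 0;            // number of elements currently in use

    int size() { return size; }

    int capacity() { return data.length; }

    int get(final int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return data[i];
    }

    // Append at the end: amortised constant time, assuming capacity doubling.
    void pushBack(final int x) {
        if (size == data.length) {
            final int[] bigger = new int[2 * data.length]; // allocate a new, bigger array
            System.arraycopy(data, 0, bigger, 0, size);    // copy the old content
            data = bigger;                                 // the old array is left to the garbage collector
        }
        data[size++] = x;
    }

    // Insert at the front: linear time, since all elements are shifted by one.
    void pushFront(final int x) {
        pushBack(0); // make room at the end (may trigger growth)
        System.arraycopy(data, 0, data, 1, size - 1); // shift the old elements to the right
        data[0] = x;
    }
}
```

With doubling, n calls of pushBack copy fewer than 2n elements in total, which is where the amortised constant time per insertion comes from.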

Searching in sorted vectors: Searching in general vectors takes linear time (running through all elements). However, if the vector is sorted (we assume, as it is the default, ascending order), then searching can be done in logarithmic time (in the length n of the vector). We present the Java function binary_search, which searches for an element x in an array A.

Instead of just returning true or false (for found or not), it is more informative to return an index i with A[i] = x, if found, and to return -1 otherwise. Since it might not be so easy to (efficiently) form sub-arrays, our version of binary search allows one to specify a sub-array by its indices begin and end. As is usually best, this so-called "range" is right-open, i.e., the beginning is included, but the ending is excluded. The role model for that is begin = 0 and end = n.

class BinarySearch {
  // Searches for x in the sorted range A[begin], ..., A[end-1] (right-open);
  // returns an index i with A[i] == x, or -1 if x is not found.
  public static int binary_search(final int[] A, int begin, int end, final int x) {
    if (A == null) return -1;
    if (begin == end) return -1;
    while (true) {
      final int mid = (begin+end)/2;
      if (A[mid] == x) return mid;
      if (begin+1 == end) return -1;
      if (A[mid] < x) {
        begin = mid+1;
        if (begin == end) return -1;
      }
      else end = mid;
    }
  }
}

Binary search with assertions:
  // Java has no chained comparisons, so each range condition is written with &&.
  // Assertions are only checked when the JVM is started with -ea.
  public static int binary_search(final int[] A, int begin, int end, final int x) {
    if (A == null) return -1;
    assert 0 <= begin && begin <= end && end <= A.length;
    if (begin == end) return -1;
    while (true) {
      assert 0 <= begin && begin < end && end <= A.length;
      final int mid = (begin+end)/2;
      assert begin <= mid && mid < end;
      if (A[mid] == x) return mid;
      if (begin+1 == end) return -1;
      assert begin < mid;
      if (A[mid] < x) {
        begin = mid+1;
        if (begin == end) return -1;
      }
      else end = mid;
    }
  }

  public static int binary_search(final int[] A, final int x) {
    if (A == null) return -1;
    return binary_search(A, 0, A.length, x);
  }
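
For comparison, the Java standard library offers java.util.Arrays.binarySearch, which uses the same right-open range convention (fromIndex included, toIndex excluded). The sketch below only illustrates the return-value convention discussed above; the class name LibraryBinarySearchDemo and the wrapper find are hypothetical. The library returns some negative value (encoding the insertion point) when the key is absent, which the wrapper maps to -1.

```java
import java.util.Arrays;

final class LibraryBinarySearchDemo {
    // Hypothetical wrapper: maps the library's negative "not found" results to -1,
    // matching the convention used in the text.
    static int find(final int[] A, final int begin, final int end, final int x) {
        if (A == null) return -1;
        final int i = Arrays.binarySearch(A, begin, end, x); // searches the range [begin, end)
        return i >= 0 ? i : -1;
    }

    public static void main(final String[] args) {
        final int[] A = {2, 3, 5, 7, 11, 13}; // sorted in ascending order
        System.out.println(find(A, 0, A.length, 7)); // whole array
        System.out.println(find(A, 0, A.length, 4)); // element not present
        System.out.println(find(A, 0, 3, 7));        // present, but outside the range
        System.out.println(find(A, 0, 0, 2));        // empty range
    }
}
```

The four calls exercise the interesting cases: a hit, a miss, a hit excluded by the right-open range, and the empty range begin = end.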

Analysing binary search: We have a divide-and-conquer algorithm, with the characteristic recurrence T(n) = T(n/2) + 1. We divide the array into two (nearly) equal parts, so b = 2 in the standard form of the recurrence for the Master Theorem. We only need to investigate one of the two parts (due to the sorting!), so a = 1. Finally, the work done for splitting happens in constant time, and thus c = 0.

We obtain the second case of the Master Theorem (log_2(1) = 0 = c), whence T(n) = Θ(lg n). Recall that this actually only implies an upper bound for the run-time of binary search: the lower bound implied by the implicit Ω holds only for the recurrence, but not necessarily for the run-time. However, it is not too hard to see that also for the algorithm, and actually for every possible comparison-based search algorithm, we need at least about lg(n) comparisons in the worst case.
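
The result of the Master Theorem can also be checked by unrolling the recurrence directly. The following is only a sketch under simplifying assumptions not stated in the text: n = 2^k is a power of two, and T(1) is some constant.

```latex
\begin{align*}
T(n) &= T(n/2) + 1 \\
     &= T(n/4) + 2 \\
     &\;\;\vdots \\
     &= T(n/2^k) + k \\
     &= T(1) + \lg n = \Theta(\lg n).
\end{align*}
```

Each step halves the range and adds one unit of work, and after k = lg n steps the range has length 1.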

Removing random access from vectors, gaining fast general insertion and deletion: Linked lists. Like a vector, the elements of a list are arranged in a linear order. With vectors we obtain random access (via indices, which are just natural numbers, and thus arbitrary arithmetic can be performed with them) due to the contiguous and uniform storage scheme: underlying is an array, which is stored as one contiguous block of memory cells, all of the same size. But to maintain contiguity, only deletions and insertions at the end of the vector are efficient (amortised constant-time). If we give up contiguity, then we lose random access, but we gain efficient arbitrary deletions and insertions: (linked) lists. Lists formally implement a dictionary (search, insertion, deletion), but, different from "real" dictionaries, search is slow, while insertion and deletion are very fast, i.e., constant-time (given a pointer to the position in the list).

Pointers to next and previous elements: The basic idea here is that each element contains a pointer to the next and the previous element of the list. So a list-object x is a triple: x.prev is a pointer to the previous element in the list; x.next is a pointer to the next element in the list; x.key contains the key (or the data, if there is no "key"). For the first element of the list, x.prev is NIL, and for the last element, x.next is NIL. The whole list is represented by a pointer L to the first element (as usual, NIL if the list is empty).
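
The representation just described might look as follows in Java. This is a minimal sketch in the spirit of CLRS Section 10.2, assuming int keys and using null for NIL; the names DoublyLinkedList, Node, head, search, insert and delete are illustrative choices, with head playing the role of the pointer L.

```java
// Sketch of a doubly linked list with int keys; null plays the role of NIL.
final class DoublyLinkedList {
    static final class Node {
        Node prev, next; // pointers to the previous and the next element
        final int key;
        Node(final int key) { this.key = key; }
    }

    Node head = null; // pointer to the first element; null if the list is empty

    // Linear-time search: returns the first node with key k, or null.
    Node search(final int k) {
        Node x = head;
        while (x != null && x.key != k) x = x.next;
        return x;
    }

    // Constant-time insertion at the front of the list.
    Node insert(final int k) {
        final Node x = new Node(k);
        x.next = head;
        if (head != null) head.prev = x;
        head = x;
        return x;
    }

    // Constant-time deletion, given a pointer to the node to be removed.
    void delete(final Node x) {
        if (x.prev != null) x.prev.next = x.next;
        else head = x.next; // x was the first element
        if (x.next != null) x.next.prev = x.prev;
    }
}
```

Note that delete is only constant-time because it receives a pointer to the node; deleting by key first needs a linear-time search.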

