Dictionaries 1/19/2005 11:37 PM

Dictionaries and Hash Tables

Dictionary ADT (§2.5.1)
- The dictionary ADT models a searchable collection of key-element items.
- The main operations of a dictionary are searching, inserting, and deleting items.
- Multiple items with the same key are allowed.
- Applications:
  - address book
  - credit card authorization
  - mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
- Dictionary ADT methods:
  - findElement(k): if the dictionary has an item with key k, returns its element; otherwise, returns the special element NO_SUCH_KEY
  - insertItem(k, o): inserts item (k, o) into the dictionary
  - removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element; otherwise, returns the special element NO_SUCH_KEY
  - size(), isEmpty()
  - keys(), elements()

Log File (§2.5.1)
- A log file is a dictionary implemented by means of an unsorted sequence.
  - We store the items of the dictionary in a sequence (based on a doubly linked list or a circular array), in arbitrary order.
- Performance:
  - insertItem takes $O(1)$ time, since we can insert the new item at the beginning or at the end of the sequence.
  - findElement and removeElement take $O(n)$ time, since in the worst case (the item is not found) we traverse the entire sequence looking for an item with the given key.
- The log file is effective only for dictionaries of small size, or for dictionaries on which insertions are the most common operations while searches and removals are rarely performed (e.g., a historical record of logins to a workstation).

Hash Functions and Hash Tables (§2.5.2)
- A hash function $h$ maps keys of a given type to integers in a fixed interval $[0, N-1]$.
- Example: $h(x) = x \bmod N$ is a hash function for integer keys.
- The integer $h(x)$ is called the hash value of key $x$.
- A hash table for a given key type consists of:
  - a hash function $h$
  - an array (called the table) of size $N$
- When implementing a dictionary with a hash table, the goal is to store item $(k, o)$ at index $i = h(k)$.

Example
- We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer.
- Our hash table uses an array of size $N = 10{,}000$ and the hash function $h(x)$ = last four digits of $x$ (that is, $h(x) = x \bmod 10{,}000$).

  Cell    Item
  0       ∅
  1       025-612-0001
  2       981-101-0002
  3       ∅
  4       451-229-0004
  …       …
  9997    ∅
  9998    200-751-9998
  9999    ∅

Hash Functions (§2.5.3)
- A hash function is usually specified as the composition of two functions:
  - Hash code map: $h_1$: keys → integers
  - Compression map: $h_2$: integers → $[0, N-1]$
- The hash code map is applied first, and the compression map is applied next on the result, i.e., $h(x) = h_2(h_1(x))$.
- The goal of the hash function is to "disperse" the keys in an apparently random way.
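
The Log File slide can be illustrated with a short Java sketch (Java being the language the slides use for their hash code examples). It is a minimal sketch under stated assumptions, not the textbook's implementation: the class name LogFileDictionary, the inner Item class, the sentinel object standing in for NO_SUCH_KEY, and the use of java.util.LinkedList as the doubly linked sequence are all hypothetical; only the method names come from the Dictionary ADT.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

// Hypothetical log-file dictionary: items live in an unsorted doubly linked
// list in insertion order. Method names follow the dictionary ADT.
class LogFileDictionary<K, V> {
    // Unique sentinel standing in for the special element NO_SUCH_KEY.
    static final Object NO_SUCH_KEY = new Object();

    // One (key, element) item.
    private final class Item {
        final K key;
        final V element;
        Item(K key, V element) { this.key = key; this.element = element; }
    }

    private final LinkedList<Item> items = new LinkedList<>();

    // O(1): append the new item at the end of the sequence (duplicate keys allowed).
    void insertItem(K k, V o) {
        items.addLast(new Item(k, o));
    }

    // O(n) worst case: an absent key forces a scan of the entire sequence.
    Object findElement(K k) {
        for (Item item : items) {
            if (Objects.equals(item.key, k)) return item.element;
        }
        return NO_SUCH_KEY;
    }

    // O(n) worst case: scan for the first match, then unlink it in O(1).
    Object removeElement(K k) {
        Iterator<Item> iter = items.iterator();
        while (iter.hasNext()) {
            Item item = iter.next();
            if (Objects.equals(item.key, k)) {
                iter.remove();
                return item.element;
            }
        }
        return NO_SUCH_KEY;
    }

    int size() { return items.size(); }

    boolean isEmpty() { return items.isEmpty(); }
}
```

Inserting (5, "a"), (7, "b") and (5, "c") and then calling findElement(5) returns "a", the first match in insertion order; removeElement(5) unlinks that item, after which findElement(5) returns "c". A key that was never inserted costs a full scan before NO_SUCH_KEY comes back, which is the $O(n)$ worst case of the log file.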

Hash Code Maps (§2.5.3)
- Memory address:
  - We reinterpret the memory address of the key object as an integer (the default hash code of all Java objects).
  - Good in general, except for numeric and string keys.
- Integer cast:
  - We reinterpret the bits of the key as an integer.
  - Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java).
- Component sum:
  - We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows).
  - Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java).

Hash Code Maps (cont.)
- Polynomial accumulation:
  - We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0, a_1, \ldots, a_{n-1}$.
  - We evaluate the polynomial
    $$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$$
    at a fixed value $z$, ignoring overflows.
  - Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words).
- Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner's rule:
  - The following polynomials are successively computed, each from the previous one in $O(1)$ time:
    $$p_0(z) = a_{n-1}$$
    $$p_i(z) = a_{n-i-1} + z\,p_{i-1}(z) \qquad (i = 1, 2, \ldots, n-1)$$
  - We have $p(z) = p_{n-1}(z)$.

Compression Maps (§2.5.4)
- Division:
  - $h_2(y) = y \bmod N$
  - The size $N$ of the hash table is usually chosen to be a prime.
  - The reason has to do with number theory and is beyond the scope of this course.
- Multiply, Add and Divide (MAD):
  - $h_2(y) = (ay + b) \bmod N$
  - $a$ and $b$ are nonnegative integers such that $a \bmod N \neq 0$.
  - Otherwise, every integer would map to the same value, $b \bmod N$.

Collision Handling (§2.5.5)
- Collisions occur when different elements are mapped to the same cell.
- Chaining: let each cell in the table point to a linked list of the elements that map there.

  Cell    Chain
  0       ∅
  1       025-612-0001
  2       ∅
  3       ∅
  4       451-229-0004 → 981-101-0004

- Chaining is simple, but requires additional memory outside the table.

Linear Probing (§2.5.5)
- Open addressing: the colliding item is placed in a different cell of the table.
- Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
- Each table cell inspected is referred to as a "probe".
- Colliding items lump together, so that future collisions cause longer sequences of probes.
- Example:
  - $h(x) = x \bmod 13$
  - Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.

  Cell    0   1   2   3   4   5   6   7   8   9  10  11  12
  Key            41          18  44  59  32  22  31  73

Search with Linear Probing
- Consider a hash table A that uses linear probing.
- findElement(k):
  - We start at cell h(k).
  - We probe consecutive locations until one of the following occurs:
    - an item with key k is found, or
    - an empty cell is found, or
    - N cells have been unsuccessfully probed.

  Algorithm findElement(k)
    i ← h(k)
    p ← 0
    repeat
      c ← A[i]
      if c = ∅
        return NO_SUCH_KEY
      else if c.key() = k
        return c.element()
      else
        i ← (i + 1) mod N
        p ← p + 1
    until p = N
    return NO_SUCH_KEY
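
The findElement pseudocode from the Search with Linear Probing slide translates almost line for line into Java. The sketch below is hypothetical in its details: it assumes int keys, String elements, $h(k) = k \bmod N$, a null array cell playing the role of the empty cell ∅, and a sentinel object for NO_SUCH_KEY. The insertItem method and the keyAt helper are illustrative additions, and removal is left out because the slides do not cover deletion under linear probing.

```java
// Hypothetical open-addressing table for int keys with linear probing and
// h(k) = k mod N. Removal is omitted (not covered by the slides).
class LinearProbingTable {
    // Unique sentinel standing in for the special element NO_SUCH_KEY.
    static final Object NO_SUCH_KEY = new Object();

    private static final class Item {
        final int key;
        final String element;
        Item(int key, String element) { this.key = key; this.element = element; }
    }

    private final int N;
    private final Item[] A; // a null cell plays the role of the empty cell

    LinearProbingTable(int n) {
        N = n;
        A = new Item[n];
    }

    // Hash function h(k) = k mod N; floorMod keeps the index in [0, N-1].
    private int h(int k) { return Math.floorMod(k, N); }

    // Place (k, o) in the first empty cell at or after h(k), wrapping around.
    void insertItem(int k, String o) {
        int i = h(k);
        for (int p = 0; p < N; p++) {
            if (A[i] == null) {
                A[i] = new Item(k, o);
                return;
            }
            i = (i + 1) % N;
        }
        throw new IllegalStateException("table is full");
    }

    // findElement(k) from the pseudocode: stop at key k, at an empty cell,
    // or after N unsuccessful probes.
    Object findElement(int k) {
        int i = h(k);
        int p = 0;
        do {
            Item c = A[i];
            if (c == null) {
                return NO_SUCH_KEY;
            } else if (c.key == k) {
                return c.element;
            } else {
                i = (i + 1) % N;
                p = p + 1;
            }
        } while (p != N);
        return NO_SUCH_KEY;
    }

    // Key stored in cell i, or -1 if the cell is empty (for inspecting layouts).
    int keyAt(int i) { return A[i] == null ? -1 : A[i].key; }
}
```

Feeding the Linear Probing example (N = 13, keys 18, 41, 22, 44, 59, 32, 31, 73 in that order) through insertItem leaves 41 in cell 2 and 18, 44, 59, 32, 22, 31, 73 in cells 5 through 11. A search for 73 starts at cell 8 and succeeds on its fourth probe, and an unsuccessful search for 70 (which hashes to cell 5) probes cells 5 through 12 before reaching the empty cell, which shows how lumped-together items lengthen probe sequences.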
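
Polynomial accumulation evaluated with Horner's rule, as described under Hash Code Maps (cont.), can be sketched as follows. Assumptions: each char of a Java String is treated as one 16-bit component $a_i$, with $a_0$ being the first character; PolyHash is a hypothetical class name; and overflow is ignored simply by letting Java's 32-bit int arithmetic wrap around.

```java
// Hypothetical polynomial hash code: p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1},
// with a_i = s.charAt(i), evaluated by Horner's rule in O(n) time.
// Java int multiplication and addition wrap on overflow ("ignoring overflows").
class PolyHash {
    static int hash(String s, int z) {
        int n = s.length();
        if (n == 0) return 0;                // empty key: p(z) is the empty sum
        int p = s.charAt(n - 1);             // p_0(z) = a_{n-1}
        for (int i = 1; i < n; i++) {
            p = s.charAt(n - i - 1) + z * p; // p_i(z) = a_{n-i-1} + z * p_{i-1}(z)
        }
        return p;                            // p(z) = p_{n-1}(z)
    }
}
```

For example, hash("ab", 33) evaluates $a_0 + a_1 z = 97 + 98 \cdot 33 = 3331$. With $z = 31$ and the string reversed, the result equals Java's String.hashCode(), which is documented as $s_0 \cdot 31^{n-1} + s_1 \cdot 31^{n-2} + \cdots + s_{n-1}$ in int arithmetic; the two formulas just assign the coefficients in opposite orders.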
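
The chaining scheme from the Collision Handling slide combines naturally with the MAD compression map from Compression Maps. This is a minimal sketch under assumptions: the hash code map $h_1$ is taken to be the key's hashCode(), new items are prepended to their chain, and the class name ChainedHashTable, the chainLength helper and any concrete values of $a$, $b$ and $N$ are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

// Hypothetical separate-chaining table with h(x) = h2(h1(x)):
// h1 = hashCode() (hash code map), h2(y) = (a*y + b) mod N (MAD compression).
class ChainedHashTable<K, V> {
    // Unique sentinel standing in for the special element NO_SUCH_KEY.
    static final Object NO_SUCH_KEY = new Object();

    private final class Item {
        final K key;
        final V element;
        Item(K key, V element) { this.key = key; this.element = element; }
    }

    private final int N;    // table size, usually chosen prime
    private final int a, b; // MAD parameters: nonnegative, with a mod N != 0
    private final List<LinkedList<Item>> table = new ArrayList<>();

    ChainedHashTable(int n, int a, int b) {
        if (n <= 0 || a < 0 || b < 0 || a % n == 0) {
            throw new IllegalArgumentException("need N > 0, a, b >= 0 and a mod N != 0");
        }
        this.N = n;
        this.a = a;
        this.b = b;
        for (int i = 0; i < n; i++) table.add(new LinkedList<>());
    }

    // MAD compression map; long arithmetic avoids int overflow in a*y + b,
    // and floorMod keeps the result in [0, N-1] for negative hash codes.
    int h2(int y) { return (int) Math.floorMod(a * (long) y + b, (long) N); }

    // Hash function h(x) = h2(h1(x)) with h1 = hashCode().
    int h(K k) { return h2(k.hashCode()); }

    // Prepend to the chain of cell h(k); duplicate keys are allowed.
    void insertItem(K k, V o) { table.get(h(k)).addFirst(new Item(k, o)); }

    // Scan only the chain of cell h(k).
    Object findElement(K k) {
        for (Item item : table.get(h(k))) {
            if (Objects.equals(item.key, k)) return item.element;
        }
        return NO_SUCH_KEY;
    }

    int chainLength(int i) { return table.get(i).size(); }
}
```

With N = 13, a = 1 and b = 0 (so that $h_2$ reduces to division, $y \bmod 13$), the Integer keys 18 and 44 both land in cell 5 and share one chain. The constructor rejects parameters with $a \bmod N = 0$, since then every key would fall into the single cell $b \bmod N$.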