dictionaries and hash tables

Dictionaries and Hash Tables - PowerPoint PPT Presentation



  1. Dictionaries and Hash Tables
     [Figure: a hash table with cells 0 to 4; cell 1 holds 025-612-0001, cell 2 holds 981-101-0002, cell 4 holds 451-229-0004; cells 0 and 3 are empty (∅).]

  2. Dictionary ADT (§8.1.1)
     The dictionary ADT models a searchable collection of key-element items. The main operations of a dictionary are searching, inserting, and deleting items. Multiple items with the same key are allowed.
     Applications:
     - address book
     - credit card authorization
     - mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
     Dictionary ADT methods:
     - find(k): if the dictionary has an item with key k, returns the position of this element, else, returns a null position.
     - insertItem(k, o): inserts item (k, o) into the dictionary.
     - removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element. An error occurs if there is no such element.
     - size(), isEmpty()
     - keys(), elements()

  3. Log File (§8.1.2)
     A log file is a dictionary implemented by means of an unsorted sequence. We store the items of the dictionary in a sequence (based on a doubly linked list or a circular array), in arbitrary order.
     Performance:
     - insertItem takes O(1) time, since we can insert the new item at the beginning or at the end of the sequence.
     - find and removeElement take O(n) time, since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key.
     The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation).
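As a rough illustration of the log file idea, the following minimal sketch keeps the items in a plain Python list. The class name `LogFile` and the snake_case method names are assumptions made for this example, and `find` returns the element (or `None`) instead of a position object, which is a simplification of the ADT described in the slides.

```python
class LogFile:
    """Dictionary stored as an unsorted sequence of (key, element) pairs."""

    def __init__(self):
        self._items = []  # arbitrary order, duplicates of a key allowed

    def insert_item(self, k, o):
        # Appending at the end of the sequence: amortized O(1).
        self._items.append((k, o))

    def find(self, k):
        # Worst case O(n): the whole sequence is scanned when k is absent.
        for key, element in self._items:
            if key == k:
                return element
        return None  # stands in for the "null position"

    def remove_element(self, k):
        # Worst case O(n): scan for the first item with key k, then delete it.
        for index, (key, element) in enumerate(self._items):
            if key == k:
                del self._items[index]
                return element
        raise KeyError(k)  # the ADT says an error occurs if k is missing

    def size(self):
        return len(self._items)

    def is_empty(self):
        return not self._items
```

A list works well here because insertions dominate; every lookup still pays for a linear scan.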

  4. Hash Functions and Hash Tables (§8.2)
     A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]. Example: h(x) = x mod N is a hash function for integer keys. The integer h(x) is called the hash value of key x.
     A hash table for a given key type consists of:
     - a hash function h
     - an array (called table) of size N
     When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k).

  5. Example
     We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer. Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.
     [Figure: the table with cells 0 to 9999; cell 1 holds 025-612-0001, cell 2 holds 981-101-0002, cell 4 holds 451-229-0004, cell 9998 holds 200-751-9998; cells 0, 3, 9997 and 9999 are empty (∅).]
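The example can be sketched in a few lines of Python. The keys are the ones shown in the slide's figure; the helper `parse_key` and the placeholder names stored with each key are assumptions introduced only for illustration.

```python
N = 10_000  # table size from the example


def h(x: int) -> int:
    """Hash value of x: its last four decimal digits."""
    return x % N


def parse_key(text: str) -> int:
    # Hypothetical helper: drop the dashes and read the digits as an integer.
    return int(text.replace("-", ""))


table = [None] * N
for key_text in ("025-612-0001", "981-101-0002", "451-229-0004", "200-751-9998"):
    # The stored name is a placeholder; the slide does not list names.
    table[h(parse_key(key_text))] = (key_text, "name placeholder")

occupied = [i for i, cell in enumerate(table) if cell is not None]
print(occupied)  # [1, 2, 4, 9998]
```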

  6. Hash Functions (§8.2.2)
     A hash function is usually specified as the composition of two functions:
     - Hash code map: $h_1$: keys → integers
     - Compression map: $h_2$: integers → [0, N − 1]
     The hash code map is applied first, and the compression map is applied next on the result, i.e., $h(x) = h_2(h_1(x))$. The goal of the hash function is to “disperse” the keys as uniformly as possible.

  7. Hash Code Maps (§8.2.3)
     - Memory address: we reinterpret the memory address of the key object as an integer.
     - Integer cast: we reinterpret the bits of the key as an integer. Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines).
     - Component sum: we partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows). Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines).
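Two of these hash code maps can be sketched in Python, under the assumption of a 32-bit integer type. The function names are hypothetical; `struct` is used only to reinterpret the bits of a 32-bit float, and the masking step models "ignoring overflows".

```python
import struct

MASK32 = 0xFFFFFFFF  # assumed width of the integer type: 32 bits


def integer_cast_float32(value: float) -> int:
    """Integer cast: reinterpret the 32 bits of a float as an unsigned int."""
    packed = struct.pack("<f", value)
    return struct.unpack("<I", packed)[0]


def component_sum64(key: int) -> int:
    """Component sum: split a 64-bit key into two 32-bit halves and add them."""
    high = (key >> 32) & MASK32
    low = key & MASK32
    return (high + low) & MASK32  # the mask discards the overflow bit


print(hex(integer_cast_float32(1.0)))       # 0x3f800000
print(component_sum64(0x0000000100000002))  # 3
```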

  8. Hash Code Maps (cont.)
     Polynomial accumulation:
     - We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0 a_1 \ldots a_{n-1}$.
     - We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \ldots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows.
     - Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words).
     Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
     - The following polynomials are successively computed, each from the previous one in O(1) time: $p_0(z) = a_{n-1}$ and $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ for i = 1, 2, …, n − 1.
     - We have $p(z) = p_{n-1}(z)$.
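Horner's rule translates almost directly into a loop. The sketch below assumes the components are the UTF-8 bytes of a string and that the integer type is 32 bits wide; both choices, and the function name `polynomial_hash`, are assumptions made for the example.

```python
def polynomial_hash(key: str, z: int = 33, bits: int = 32) -> int:
    """Evaluate p(z) = a_0 + a_1*z + ... + a_{n-1}*z^(n-1) by Horner's rule."""
    mask = (1 << bits) - 1  # keeps only the low bits, i.e. ignores overflow
    result = 0
    # Process a_{n-1} first, so each step computes p_i = a_{n-i-1} + z * p_{i-1}.
    for component in reversed(key.encode("utf-8")):
        result = (component + z * result) & mask
    return result


print(polynomial_hash("ab"))  # 97 + 98 * 33 = 3331
```

Each iteration does a constant amount of work, so a key with n components is hashed in O(n) time.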

  9. Compression Maps (§8.2.4)
     - Division: $h_2(y) = y \bmod N$. The size N of the hash table is usually chosen to be a prime.
     - Multiply, Add and Divide (MAD): $h_2(y) = (ay + b) \bmod N$, where a and b are nonnegative integers such that a mod N ≠ 0. Otherwise, every integer would map to the same value b.
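Both compression maps are one-liners; the sketch below also shows why the condition a mod N ≠ 0 matters. The function names and the sample values of a, b and N are chosen only for illustration.

```python
def division(y: int, N: int) -> int:
    """Division compression map: h2(y) = y mod N."""
    return y % N


def mad(y: int, N: int, a: int, b: int) -> int:
    """Multiply, Add and Divide: h2(y) = (a*y + b) mod N, with a mod N != 0."""
    if a % N == 0:
        raise ValueError("a mod N must be nonzero")
    return (a * y + b) % N


print(division(27, 13))   # 1
print(mad(27, 13, 3, 5))  # (81 + 5) mod 13 = 8

# With a = 13 and N = 13 (so a mod N == 0) every input collapses to b = 5.
collapsed = {(13 * y + 5) % 13 for y in range(100)}
print(collapsed)  # {5}
```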

  10. Collision Handling (§8.2.5)
      Collisions occur when different elements are mapped to the same cell.
      Chaining: let each cell in the table point to a linked list of elements that map there. Chaining is simple, but requires additional memory outside the table.
      [Figure: cells 0 to 4; cell 1 holds 025-612-0001, cell 4 points to a list holding 451-229-0004 and 981-101-0004; cells 0, 2 and 3 are empty (∅).]
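A minimal chaining sketch follows. It assumes integer keys with h(k) = k mod N and uses a Python list per cell as a stand-in for the linked list; the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; assumes integer keys and h(k) = k mod N."""

    def __init__(self, N: int = 13):
        self._N = N
        self._buckets = [[] for _ in range(N)]  # one chain per cell

    def _h(self, k: int) -> int:
        return k % self._N

    def insert_item(self, k: int, o) -> None:
        # Colliding items simply share the chain of their cell.
        self._buckets[self._h(k)].append((k, o))

    def find(self, k: int):
        for key, element in self._buckets[self._h(k)]:
            if key == k:
                return element
        return None  # stands in for the "null position"

    def remove_element(self, k: int):
        chain = self._buckets[self._h(k)]
        for index, (key, element) in enumerate(chain):
            if key == k:
                del chain[index]
                return element
        raise KeyError(k)

    def chain_length(self, cell: int) -> int:
        return len(self._buckets[cell])
```

With N = 13, the keys 18, 44 and 31 all hash to cell 5 and end up in the same chain.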

  11. Linear Probing
      Open addressing: the colliding item is placed in a different cell of the table. Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell. Each table cell inspected is referred to as a “probe”. Colliding items lump together, causing future collisions to cause a longer sequence of probes.
      Example: h(x) = x mod 13. Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.
      Resulting table (cell: key): 2: 41, 5: 18, 6: 44, 7: 59, 8: 32, 9: 22, 10: 31, 11: 73; cells 0, 1, 3, 4 and 12 are empty.
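The example can be replayed with a short sketch. It stores bare integer keys (no elements) in a Python list, with `None` marking an empty cell; the function name is an assumption.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using h(x) = x mod N and linear probing; return its cell."""
    N = len(table)
    i = key % N
    for _ in range(N):          # at most N probes
        if table[i] is None:    # first free cell, scanning circularly
            table[i] = key
            return i
        i = (i + 1) % N
    raise OverflowError("table is full")


table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    linear_probe_insert(table, key)

print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Keys 44, 32, 31 and 73 all land to the right of their home cell, which is the clustering effect the slide describes.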

  12. Search with Linear Probing
      Consider a hash table A that uses linear probing. find(k):
      - We start at cell h(k).
      - We probe consecutive locations until one of the following occurs: an item with key k is found, or an empty cell is found, or N cells have been unsuccessfully probed.

          Algorithm find(k)
            i ← h(k)
            p ← 0
            repeat
              c ← A[i]
              if c = ∅
                return Position(null)
              else if c.key() = k
                return Position(c)
              else
                i ← (i + 1) mod N
                p ← p + 1
            until p = N
            return Position(null)
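A direct Python transcription of the search might look as follows. It assumes integer keys stored directly in the cells, h(k) = k mod N, and `None` for an empty cell; it returns the cell index instead of a position object. The sample table is the one obtained by inserting 18, 41, 22, 44, 59, 32, 31, 73 with h(x) = x mod 13.

```python
def find(A: list, k: int):
    """Return the index of key k in A, or None if k is not present."""
    N = len(A)
    i = k % N
    for _ in range(N):        # plays the role of the counter p in the pseudocode
        c = A[i]
        if c is None:         # an empty cell ends the search
            return None
        if c == k:            # an item with key k is found
            return i
        i = (i + 1) % N       # probe the next cell, wrapping around
    return None               # N cells probed without success


A = [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
print(find(A, 31))  # 10
print(find(A, 5))   # None
```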

  13. Updates with Linear Probing
      To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements.
      removeElement(k):
      - We search for an item with key k.
      - If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return the position of this item.
      - Else, we return a null position.
      insertItem(k, o):
      - We throw an exception if the table is full.
      - We start at cell h(k).
      - We probe consecutive cells until one of the following occurs: a cell i is found that is either empty or stores AVAILABLE, or N cells have been unsuccessfully probed.
      - We store item (k, o) in cell i.
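The role of the AVAILABLE marker is easier to see in code. This is a minimal sketch assuming integer keys and h(k) = k mod N; the class name is hypothetical, and `remove_element` returns the removed element (or `None`) where the slide returns a position.

```python
AVAILABLE = object()  # sentinel that replaces deleted items


class LinearProbingTable:
    def __init__(self, N: int = 13):
        self._N = N
        self._cells = [None] * N  # None marks a never-used (empty) cell
        self._n = 0               # number of stored items

    def _find_index(self, k: int):
        i = k % self._N
        for _ in range(self._N):
            c = self._cells[i]
            if c is None:
                return None       # a truly empty cell ends the search
            if c is not AVAILABLE and c[0] == k:
                return i
            i = (i + 1) % self._N  # AVAILABLE cells are skipped, not stopped at
        return None

    def find(self, k: int):
        i = self._find_index(k)
        return None if i is None else self._cells[i][1]

    def insert_item(self, k: int, o) -> int:
        if self._n == self._N:
            raise OverflowError("table is full")
        i = k % self._N
        for _ in range(self._N):
            c = self._cells[i]
            if c is None or c is AVAILABLE:  # reuse deleted cells
                self._cells[i] = (k, o)
                self._n += 1
                return i
            i = (i + 1) % self._N
        raise OverflowError("table is full")

    def remove_element(self, k: int):
        i = self._find_index(k)
        if i is None:
            return None           # stands in for the "null position"
        element = self._cells[i][1]
        self._cells[i] = AVAILABLE
        self._n -= 1
        return element
```

Writing `None` instead of AVAILABLE on removal would break later searches: a key stored past the deleted cell would become unreachable because the probe sequence stops at the first empty cell.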

  14. Double Hashing
      Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + jd(k)) mod N for j = 0, 1, … , N − 1, where i = h(k). The secondary hash function d(k) cannot have zero values. The table size N must be a prime to allow probing of all the cells.
      Common choice of compression map for the secondary hash function: $d_2(k) = q - k \bmod q$, where q < N and q is a prime. The possible values for $d_2(k)$ are 1, 2, … , q.

  15. Example of Double Hashing
      Consider a hash table storing integer keys that handles collision with double hashing:
      - N = 13, q = 7
      - h(k) = k mod 13
      - d(k) = 7 − k mod 7
      - probe sequence: (h(k) + jd(k)) mod N for j = 0, 1, …
      Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.

      | k  | h(k) | d(k) | Probes  |
      |----|------|------|---------|
      | 18 | 5    | 3    | 5       |
      | 41 | 2    | 1    | 2       |
      | 22 | 9    | 6    | 9       |
      | 44 | 5    | 5    | 5, 10   |
      | 59 | 7    | 4    | 7       |
      | 32 | 6    | 3    | 6       |
      | 31 | 5    | 4    | 5, 9, 0 |
      | 73 | 8    | 4    | 8       |

      Resulting table (cell: key): 0: 31, 2: 41, 5: 18, 6: 32, 7: 59, 8: 73, 9: 22, 10: 44; cells 1, 3, 4, 11 and 12 are empty.
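The probe sequences in this example can be reproduced with a short sketch. It stores bare integer keys in a list with `None` for empty cells; the function name and the choice to return the list of probed cells are assumptions for illustration.

```python
def double_hash_insert(table: list, k: int, q: int = 7) -> list:
    """Insert k with h(k) = k mod N and d(k) = q - k mod q; return the probes."""
    N = len(table)
    h = k % N
    d = q - k % q                 # always in 1..q, so never zero
    probes = []
    for j in range(N):
        i = (h + j * d) % N
        probes.append(i)
        if table[i] is None:      # first available cell of the series
            table[i] = k
            return probes
    raise OverflowError("no free cell found")


table = [None] * 13
history = {}
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    history[key] = double_hash_insert(table, key)

print(history[44])  # [5, 10]
print(history[31])  # [5, 9, 0]
print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
```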

  16. Performance of Hashing
      In the worst case, searches, insertions and removals on a hash table take O(n) time. The worst case occurs when all the keys inserted into the dictionary collide. The load factor α = n/N affects the performance of a hash table. Assuming that the keys are random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1/(1 − α).
      The expected running time of all the dictionary ADT operations in a hash table is O(1). In practice, hashing is very fast provided the load factor is not close to 100%.
      Applications of hash tables:
      - small databases
      - compilers
      - browser caches
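To get a feel for the 1/(1 − α) estimate, the sketch below evaluates it for a few load factors. The function name and the sample values of n and N are chosen only for illustration.

```python
def expected_probes(n: int, N: int) -> float:
    """Expected probes for an insertion with open addressing: 1 / (1 - alpha)."""
    alpha = n / N                 # load factor
    if alpha >= 1:
        raise ValueError("the table is full; the estimate does not apply")
    return 1 / (1 - alpha)


for n, N in ((1, 4), (1, 2), (3, 4), (9, 10), (99, 100)):
    print(f"alpha = {n / N:.2f}: about {expected_probes(n, N):.1f} probes")
```

At half load the estimate is 2 probes and at 75% load it is 4; it grows without bound as the load factor approaches 100%, which matches the slide's advice to keep the table from filling up.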
