hashing introduction

Hashing - Introduction (PowerPoint PPT Presentation)



  1. Hashing - Introduction
     Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH.
     Examples:
       - a symbol table created by a compiler
       - a phone book
       - an actual dictionary
     Hash table = a data structure good at implementing dictionaries.

  2. Hashing - Introduction
     Why not just use an array with direct addressing (where each array cell corresponds to a key)?
       - Direct addressing guarantees O(1) worst-case time for Insert/Delete/Search.
       - BUT sometimes the number K of keys actually stored is very small compared to the number N of possible keys. Using an array of size N would waste space.
       - We'd like to use a structure that takes up Θ(K) space and O(1) average-case time for Insert/Delete/Search.
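The trade-off on this slide can be made concrete with a minimal sketch of a direct-address table. The class name `DirectAddressTable` and the parameter `universe_size` (the N of the slide) are illustrative assumptions, not names from the slides.

```python
class DirectAddressTable:
    """Direct addressing: one array cell per possible key in range(universe_size)."""

    def __init__(self, universe_size):
        # Space is proportional to N, however few keys are actually stored.
        self.cells = [None] * universe_size

    def insert(self, key, value):
        self.cells[key] = value      # O(1) worst case

    def delete(self, key):
        self.cells[key] = None       # O(1) worst case

    def search(self, key):
        return self.cells[key]       # O(1) worst case; None means "absent"


table = DirectAddressTable(universe_size=1000)
table.insert(42, "answer")
stored = sum(cell is not None for cell in table.cells)
print(stored, "of", len(table.cells), "cells in use")
```

With K = 1 stored key and N = 1000 possible keys, almost the whole array is wasted, which is the motivation for hashing.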

  3. Hashing
     Hashing =
       - use a table (array/vector) of size m to store elements from a set of much larger size
       - given a key k, use a function h to compute the slot h(k) for that key.
     Terminology:
       - h is a hash function
       - k hashes to slot h(k)
       - the hash value of k is h(k)
       - collision: when two keys have the same hash value

  4. Hashing
     What makes a good hash function?
       - It is easy to compute
       - It satisfies uniform hashing
     hash = to chop into small pieces (Merriam-Webster)
          = to chop any patterns in the keys so that the results are uniformly distributed (cs311)

  5. Hashing
     What if the key is not a natural number?
       - We must find a way to represent it as a natural number.
     Examples:
       - key i → Use its ASCII decimal value, 105
       - key inx → Combine the individual ASCII values in some way, for example, 105*128^2 + 110*128 + 120 = 1734520
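The radix-128 combination in the example above can be sketched as follows. The function name `string_to_natural` is an illustrative assumption, and the sketch assumes every character is plain ASCII (code below 128), so that each character acts as one base-128 digit.

```python
def string_to_natural(key, radix=128):
    """Interpret the characters of `key` as the digits of a base-`radix` number."""
    number = 0
    for ch in key:
        # Shift the digits seen so far one position left, then add the new digit.
        number = number * radix + ord(ch)
    return number


print(string_to_natural("i"))    # 105
print(string_to_natural("inx"))  # 105*128**2 + 110*128 + 120 = 1734520
```

The resulting natural number can then be fed to any of the hash functions on the following slides.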

  6. Hashing - hash functions
     Truncation
       - Ignore part of the key and use the remaining part directly as the index.
       - Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digits could make up the hash value.
       - Not a very good method: does not distribute keys uniformly
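A minimal sketch of the truncation example, assuming keys are non-negative integers of at most 8 digits (shorter keys are left-padded with zeros). The function name and the sample keys are made up for illustration.

```python
def truncation_hash(key):
    """Keep the 1st, 4th and 8th digits of an 8-digit key as a 3-digit slot."""
    digits = f"{key:08d}"
    return int(digits[0] + digits[3] + digits[7])


# Two keys that differ only in the ignored digits land in the same slot.
print(truncation_hash(62538194))  # digits 6, 3, 4 -> 634
print(truncation_hash(61439284))  # digits 6, 3, 4 -> 634
```

Because five of the eight digits never influence the result, any pattern in those digits is invisible to the hash function, which is why the distribution tends to be poor.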

  7. Hashing
     Folding
       - Break up the key into parts and combine them in some way.
       - Example: if the keys are 8-digit numbers and the hash table has 1000 entries, break up a key into three, three and two digits, add them up and, if necessary, truncate the sum.
       - Better than truncation.
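The folding example can be sketched like this. The slide does not say how to truncate a sum that exceeds three digits; keeping the last three digits (the sum modulo the table size) is an assumption made here, as are the function name and the sample key.

```python
def folding_hash(key, table_size=1000):
    """Split an 8-digit key into 3 + 3 + 2 digits, add the parts, truncate."""
    digits = f"{key:08d}"
    parts = [int(digits[0:3]), int(digits[3:6]), int(digits[6:8])]
    # Assumption: "truncate" means keeping the low-order digits of the sum.
    return sum(parts) % table_size


print(folding_hash(62538194))  # 625 + 381 + 94 = 1100 -> 100
print(folding_hash(12345678))  # 123 + 456 + 78 = 657
```

Unlike truncation, every digit of the key contributes to the slot.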

  8. Hashing
     Division
       - If the hash table has m slots, define h(k) = k mod m
       - Fast
       - Not all values of m are suitable for this. For example, powers of 2 should be avoided.
       - Good values for m are prime numbers that are not very close to powers of 2.
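A small sketch of the division method that also illustrates why a power of 2 can be a poor choice for m: the slot then depends only on the low-order bits of the key. The key set (multiples of 16) and the table sizes 16 and 23 are chosen here purely for illustration.

```python
def division_hash(key, m):
    """Division method: h(k) = k mod m."""
    return key % m


keys = [16 * i for i in range(8)]   # 0, 16, 32, ..., 112

# m = 16 is a power of 2: only the low 4 bits matter, so every key collides.
print([division_hash(k, 16) for k in keys])

# m = 23 is a prime between 16 and 32: the same keys spread over distinct slots.
print([division_hash(k, 23) for k in keys])
```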

  9. Hashing
     Multiplication
       - $h(k) = \lfloor m \, (k c - \lfloor k c \rfloor) \rfloor$, $0 < c < 1$
       - In English:
           - Multiply the key k by a constant c, 0 < c < 1
           - Take the fractional part of k*c
           - Multiply that by m
           - Take the floor of the result
       - The value of m does not make a difference
       - Some values of c work better than others
       - A good value is $(\sqrt{5} - 1)/2$

  10. Hashing
      Multiplication
      Example: Suppose the size of the table, m, is 1301 (the values below correspond to $c = (\sqrt{5} - 1)/2$).
        For k = 1234, h(k) = 850
        For k = 1235, h(k) = 353
        For k = 1236, h(k) = 1157
        For k = 1237, h(k) = 660
        For k = 1238, h(k) = 164
        For k = 1239, h(k) = 968
        For k = 1240, h(k) = 471
      The pattern in the keys is broken and the distribution is fairly uniform.
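The multiplication method can be sketched in a few lines. The sketch assumes c = (√5 − 1)/2, the value suggested on the previous slide, and uses ordinary floating-point arithmetic, which is accurate enough for keys of this size; the function name is illustrative.

```python
import math

C = (math.sqrt(5) - 1) / 2   # about 0.618, the constant suggested on the slide


def multiplication_hash(key, m, c=C):
    """Multiplication method: floor(m * fractional_part(key * c))."""
    fractional = (key * c) % 1.0
    return math.floor(m * fractional)


# Slots for the consecutive keys of the example, with table size m = 1301.
for k in range(1234, 1241):
    print(k, multiplication_hash(k, 1301))
```

Consecutive keys end up far apart in the table, which is the "pattern broken" effect the example points out.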

  11. Hashing
      Universal Hashing
        - Worst-case scenario: the chosen keys all hash to the same slot. This can be avoided if the hash function is not fixed:
            - Start with a collection of hash functions
            - Select one at random and use that.
        - Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.

  12. Hashing
      Universal Hashing
        - Let $H$ be a collection of hash functions that map a given universe $U$ of keys into the range $\{0, 1, \ldots, m-1\}$.
        - If for each pair of distinct keys $k, l \in U$ the number of hash functions $h \in H$ for which $h(k) = h(l)$ is $|H|/m$, then $H$ is called universal. (Many textbooks relax this condition to "at most $|H|/m$".)
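The slides do not name a concrete universal family. As an illustration only, the sketch below uses a well-known textbook family, h(k) = ((a*k + b) mod p) mod m with p prime and a, b chosen at random, which is universal in the "at most |H|/m" sense. The function name, the default prime 2^31 − 1, and the requirement that keys be non-negative integers smaller than p are assumptions of this sketch.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random from the family.

    p must be a prime larger than every key; 2**31 - 1 is prime.
    """
    a = rng.randrange(1, p)   # a in 1 .. p-1
    b = rng.randrange(0, p)   # b in 0 .. p-1

    def h(key):
        return ((a * key + b) % p) % m

    return h


h = make_universal_hash(m=1301)
print([h(k) for k in (1234, 1235, 1236)])
```

Each call to `make_universal_hash` selects a new function, so no fixed set of keys is bad for every choice; once selected, the same function must be used for the lifetime of the table.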

  13. Hashing
        - Given a hash table with m slots and n elements stored in it, we define the load factor of the table as λ = n/m
        - The load factor gives us an indication of how full the table is.
        - The possible values of the load factor depend on the method we use for resolving collisions.

  14. Hashing - resolving collisions
      Chaining, a.k.a. closed addressing
        - Idea: put all elements that hash to the same slot in a linked list (chain). The slot contains a pointer to the head of the list.
        - The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1.

  15. Hashing - resolving collisions
      Chaining
        - Insert: O(1)
            - worst case
        - Delete: O(1)
            - worst case
            - assuming doubly-linked list
            - it's O(1) after the element has been found
        - Search: ?
            - depends on length of chain.
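A minimal sketch of a chained hash table, under several assumptions that are not in the slides: the class and method names are illustrative, slots are computed with Python's built-in `hash` followed by the division method, and chains are singly linked for brevity. With a singly-linked chain, deletion has to walk the chain first; the O(1) delete quoted on the slide assumes a doubly-linked list and a reference to the element.

```python
class Node:
    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next = key, value, next_node


class ChainedHashTable:
    def __init__(self, m=7):
        self.slots = [None] * m   # each slot holds the head of a chain (or None)
        self.n = 0                # number of stored elements

    def _slot(self, key):
        # Assumption: built-in hash() plus the division method.
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        # O(1): push at the head of the chain; duplicates are not checked.
        i = self._slot(key)
        self.slots[i] = Node(key, value, self.slots[i])
        self.n += 1

    def search(self, key):
        # Time proportional to the length of the chain in this slot.
        node = self.slots[self._slot(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key):
        # Walk the chain to find the node, then unlink it.
        i = self._slot(key)
        prev, node = None, self.slots[i]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.slots[i] = node.next
        else:
            prev.next = node.next
        self.n -= 1
        return True

    def load_factor(self):
        return self.n / len(self.slots)


table = ChainedHashTable(m=7)
for k in (10, 17, 24, 3):   # in CPython small ints hash to themselves: all slot 3
    table.insert(k, str(k))
print(table.search(17).value, table.load_factor())
```

All four keys collide in one slot, so a search may have to traverse the whole chain even though the load factor is below 1.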
