hash tables

Hash Tables Direct-Address Tables Hash Functions Universal Hashing - PowerPoint PPT Presentation



  1. Hash Tables: Direct-Address Tables, Hash Functions, Universal Hashing, Chaining, Open Addressing
     CS 5633 Analysis of Algorithms, Chapter 11

  2. Direct-Address Tables
     Let $U = \{0, \ldots, m-1\}$ be the set of possible keys.
     Use an array $T[0 \ldots m-1]$ as a direct-address table. This implies a 1-1 correspondence between keys and slots.

     Direct-Address-Search(T, k)
         return T[k]
     Direct-Address-Insert(T, x)
         T[x.key] ← x
     Direct-Address-Delete(T, x)
         T[x.key] ← nil

     Advantage: operations are $\Theta(1)$.
     Disadvantage: $\Theta(|U|)$ space required.
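
The three direct-address operations can be sketched in Python as follows. This is a minimal illustration, not the course's code: the class name `DirectAddressTable` and the `(key, value)` interface are assumptions standing in for the slide's element `x` with field `x.key`.

```python
class DirectAddressTable:
    """Direct-address table over keys 0..m-1 (illustrative names, not from the slides)."""

    def __init__(self, m):
        # One slot per possible key: Theta(|U|) space.
        self.slots = [None] * m

    def search(self, key):
        # Theta(1): the key is the array index.
        return self.slots[key]

    def insert(self, key, value):
        # Theta(1): overwrite the slot that belongs to this key.
        self.slots[key] = value

    def delete(self, key):
        # Theta(1): None plays the role of nil.
        self.slots[key] = None
```

For example, after `t = DirectAddressTable(8)` and `t.insert(3, "a")`, the call `t.search(3)` returns `"a"`, and it returns `None` again once `t.delete(3)` has run.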

  3. Hash Tables
     Let $K$ be the set of keys to be stored.
     Goal: use $\Theta(|K|)$ space and $\Theta(1)$ time per operation.
     Idea: use an array $T[0 \ldots m-1]$ as a hash table, and use a $\Theta(1)$ hash function $h$, where $h : U \to \{0, \ldots, m-1\}$ maps from keys to slots.
     A collision occurs when two keys map to the same slot.

  4. Good Hash Functions
     Division method: $h(k) = k \bmod m$, where $m$ is prime and not close to any $2^i$.
     Division variation: $h(k) = (k \bmod M) \bmod m$, where $M$ is prime, much smaller than $|U|$, and not close to any $2^i$, and $m$ is much smaller than $M$.
     Multiplication method: $h(k) = \lfloor m\,((kA) \bmod 1) \rfloor$, where $m$ is a power of 2 and $A = (\sqrt{5} - 1)/2$.
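
As a rough sketch, the three hash functions on this slide could be written in Python like this. The function names are made up for illustration. The multiplication method uses floating-point arithmetic here, which is an assumption that only holds up for moderately sized keys; very large keys lose precision.

```python
import math

# The constant A = (sqrt(5) - 1) / 2 from the slide, roughly 0.618.
A = (math.sqrt(5) - 1) / 2


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_division_two_stage(k, M, m):
    """Division variation: h(k) = (k mod M) mod m."""
    return (k % M) % m


def hash_multiplication(k, m):
    """Multiplication method: h(k) = floor(m * ((k * A) mod 1)).

    Floating-point sketch; m is assumed to be a power of 2 as on the slide.
    """
    frac = (k * A) % 1.0  # fractional part of k * A
    return int(m * frac)
```

For instance, `hash_division(100, 13)` gives 9, and `hash_division_two_stage(1000, 101, 13)` gives 0 because 1000 mod 101 is 91, which is a multiple of 13.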

  5. Horner's Method for Division Hash Function
     If $k = \langle k[1], \ldots, k[l] \rangle$ and $0 \le k[i] < r$, then compute the hash function by:

     h ← k[1] mod m
     for i ← 2 to l
         do h ← (r h + k[i]) mod m
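
A short Python sketch of the loop above, assuming the key is given as a non-empty sequence of digits in radix `r`, most significant digit first. The function name `horner_hash` is hypothetical.

```python
def horner_hash(digits, r, m):
    """Compute (the radix-r number spelled by digits) mod m without building the big number.

    Assumes digits is non-empty and every digit d satisfies 0 <= d < r.
    """
    h = digits[0] % m
    for d in digits[1:]:
        # Reducing mod m at every step keeps intermediate values below r * m.
        h = (r * h + d) % m
    return h
```

With digits `[1, 2, 3]`, `r = 10`, and `m = 7`, the loop produces 1, then 5, then 4, which matches 123 mod 7. A byte string can be hashed the same way with `r = 256`, for example `horner_hash(list(b"abc"), 256, 101)`.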

  6. Universal Hashing
     Let $H$ be a set of hash functions. $H$ is universal if, for any two distinct keys $k \ne k'$, a function $h$ chosen at random from $H$ satisfies $h(k) = h(k')$ with probability at most $1/m$.
     Example: let $m$ be a prime number and $k = \langle k[1], \ldots, k[l] \rangle$, where $0 \le k[i] < m$.
     Assign $a[i] \leftarrow$ Random$(0, m-1)$ for each $i$, and define
     $h(k) = \left( \sum_{i=1}^{l} a[i] \cdot k[i] \right) \bmod m$
     The set of possible functions $h$ is universal: for distinct keys, $h(k) = h(k')$ with probability $1/m$. If $k[i] \ne k'[i]$, then $(a[i] \cdot (k[i] - k'[i])) \bmod m$ takes each value in $\{0, \ldots, m-1\}$ with equal probability.
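
The family above might be sketched in Python as follows. The names `dot_hash` and `make_universal_hash` are illustrative assumptions; keys are taken to be tuples of length `l` with entries in `0..m-1`, and `m` is assumed prime.

```python
import random


def dot_hash(a, k, m):
    """h(k) = (sum over i of a[i] * k[i]) mod m, for a fixed coefficient vector a."""
    return sum(ai * ki for ai, ki in zip(a, k)) % m


def make_universal_hash(l, m, rng=random):
    """Pick one function from the family by drawing each a[i] uniformly from 0..m-1."""
    a = [rng.randrange(m) for _ in range(l)]
    return lambda k: dot_hash(a, k, m)
```

For a small case the collision probability can be counted exhaustively. With `m = 5` and keys `(1, 2)` and `(3, 2)`, exactly 5 of the 25 coefficient vectors make the two keys collide, which is the fraction 1/m.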

  7. Chaining
     In chaining, each slot is a linked list of the elements that hash to that slot, i.e., the collisions.
     Consider $m$ slots, $n$ elements, and load factor $\alpha = n/m$.
     Worst case: $\Theta(n)$, if all elements hash to the same slot.
     Best case: $\Theta(1 + \alpha)$, when each slot has $\lfloor \alpha \rfloor$ or $\lceil \alpha \rceil$ elements.
     Average case: assume each slot is equally likely.
     Unsuccessful search: $\Theta(1 + \alpha)$, because the average slot length is $\alpha$.
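
A minimal chaining sketch in Python, under two assumptions that are not in the slides: each chain is a Python list rather than a linked list, and the slot is chosen with the built-in `hash(key) % m`. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.n = 0
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        # Assumption: Python's built-in hash() stands in for the hash function h.
        return self.slots[hash(key) % self.m]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: replace its value
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Cost is proportional to the length of one chain: Theta(1 + alpha) on average.
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m
```

Inserting the integers 0 through 19 into a table with `m = 5` gives a load factor of 4.0, and because small integers hash to themselves in CPython, every chain ends up with exactly 4 elements (the best case described above).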

  8. Chaining, Part 2
     Successful search: $\Theta(1 + \alpha)$
     Before the $i$th element is inserted, the average list length is $(i-1)/m$.
     The expected position of the $i$th element is $1 + (i-1)/m$.
     The expected search length is the summation over the $n$ elements to search for, where the probability of searching for the $i$th element is $1/n$:
     $\sum_{i=1}^{n} \frac{1}{n} \left( 1 + \frac{i-1}{m} \right) = 1 + \frac{\alpha}{2} - \frac{1}{2m}$
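
The closed form of this summation can be checked with exact rational arithmetic. The sketch below is only a sanity check of the algebra; the function names are hypothetical.

```python
from fractions import Fraction


def expected_successful_probes(n, m):
    """Sum over i = 1..n of (1/n) * (1 + (i - 1)/m), computed exactly."""
    return sum(Fraction(1, n) * (1 + Fraction(i - 1, m)) for i in range(1, n + 1))


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)
```

For `n = 10` and `m = 5` both functions return 19/10, since the inner sum of `(i - 1)/m` equals `n(n - 1)/(2m)`.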

  9. Open-Address Hashing
     In open addressing, when a collision occurs, probe for an empty slot and insert the new element there.
     The hash function becomes $h : U \times \{0, \ldots, m-1\} \to \{0, \ldots, m-1\}$.
     The probe sequence $\langle h(k,0), \ldots, h(k,m-1) \rangle$ should include all the slots.

  10. Open-Address Hashing, Part 2
      Hash-Insert(T, x)
          for i ← 0 to m − 1
              do j ← h(x.key, i)
                 if T[j] = nil
                     then T[j] ← x
                          return j
          error "hash table overflow"

      Hash-Delete marks the slot as deleted.
      Hash-Search must continue past deleted slots.
      Hash-Insert can put new elements in deleted slots.
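
The insert, search, and delete rules above can be sketched together in Python. Several details are assumptions for illustration only: the class name, the `DELETED` sentinel object, storing bare keys instead of elements, and the use of linear probing with the built-in `hash()` as the probe function. Like the pseudocode, `insert` does not check whether the key is already present.

```python
DELETED = object()  # sentinel marking a slot whose element was removed


class OpenAddressTable:
    """Open-addressing hash table storing keys directly in the slot array."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None plays the role of nil

    def _probe(self, key, i):
        # Assumption: linear probing, h(k, i) = (h'(k) + i) mod m.
        return (hash(key) + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            # New elements may reuse slots that were marked deleted.
            if self.slots[j] is None or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None  # a never-used slot ends the probe sequence
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
            # Otherwise the slot is deleted or holds another key: keep probing.
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED  # mark, do not reset to None
        return j
```

With `m = 7`, inserting 3, 10, and 17 fills slots 3, 4, and 5. After `delete(10)`, a search for 17 still succeeds at slot 5 because it continues past the deleted slot 4, and a later `insert(24)` reuses slot 4.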

  11. Uniform Hashing Analysis
      Uniform hashing assumes each open-address probe sequence is equally likely.
      Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
      Let $p_i$ = probability that exactly $i$ probes find full slots.
      Let $q_i$ = probability that the first $i$ probes find full slots.
      $p_i = q_i - q_{i+1}$
      $q_1 = n/m = \alpha$ and $q_2 = \left(\frac{n}{m}\right)\left(\frac{n-1}{m-1}\right) < \alpha^2$
      $q_i = \prod_{k=0}^{i-1} \frac{n-k}{m-k} \le \left(\frac{n}{m}\right)^i = \alpha^i$

  12. Uniform Hashing Analysis, Part 2
      Average number of probes is:
      $1 + \sum_{i=1}^{n} i\,p_i = 1 + \sum_{i=1}^{n} q_i \le \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha}$
      Successful search: $\Theta\left(\frac{1}{\alpha}\ln\frac{1}{1-\alpha}\right)$
      Inserting the $i$th element costs the same as an unsuccessful search with $i-1$ elements in the table.
      Average number of probes is:
      $\frac{1}{n}\sum_{i=1}^{n} \frac{1}{1-(i-1)/m} \le \frac{1}{\alpha}\ln\frac{1}{1-\alpha}$
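
Both bounds can be spot-checked numerically. The sketch below evaluates the two sums exactly with rational arithmetic so they can be compared against $1/(1-\alpha)$ and $(1/\alpha)\ln(1/(1-\alpha))$. The function names are hypothetical, and $n < m$ is assumed so that $\alpha < 1$.

```python
import math
from fractions import Fraction


def unsuccessful_probes(n, m):
    """1 + sum_{i=1}^{n} q_i, where q_i = prod_{k=0}^{i-1} (n - k)/(m - k)."""
    total, q = Fraction(1), Fraction(1)
    for i in range(1, n + 1):
        q *= Fraction(n - (i - 1), m - (i - 1))  # extend the product by one factor
        total += q
    return total


def successful_probes(n, m):
    """(1/n) * sum_{i=1}^{n} 1 / (1 - (i - 1)/m), using 1/(1 - (i-1)/m) = m/(m - i + 1)."""
    return sum(Fraction(m, m - (i - 1)) for i in range(1, n + 1)) / n
```

A possible check for a half-full table (`n = 50`, `m = 100`, so $\alpha = 1/2$):

```python
alpha = 50 / 100
print(float(unsuccessful_probes(50, 100)), "<=", 1 / (1 - alpha))
print(float(successful_probes(50, 100)), "<=", (1 / alpha) * math.log(1 / (1 - alpha)))
```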

  13. Performance of Practical Methods
      Linear probing: $h(k,i) = (h'(k) + i) \bmod m$
      Successful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
      Unsuccessful search: $\Theta\left(\frac{1}{(1-\alpha)^2}\right)$
      Linear probing suffers from primary clustering, that is, from long runs of occupied slots. An empty slot preceded by $i$ full slots gets filled next with probability $(i+1)/m$.
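
The $(i+1)/m$ claim can be illustrated by counting, for a given occupancy pattern, how many of the $m$ equally likely initial positions $h'(k)$ lead linear probing to each empty slot. This is only a sketch; the function name and the boolean-list representation of the table are assumptions.

```python
def next_fill_counts(occupied):
    """For each slot, count the initial probe positions that end up filling it.

    occupied is a list of booleans; linear probing wraps around the table.
    Dividing a count by len(occupied) gives the probability that the slot is filled next.
    """
    m = len(occupied)
    if all(occupied):
        raise ValueError("table is full")
    counts = [0] * m
    for start in range(m):
        j = start
        while occupied[j]:
            j = (j + 1) % m  # step to the next slot, as linear probing does
        counts[j] += 1
    return counts
```

For the pattern full, full, full, empty, empty, full, empty, empty (`m = 8`), the counts are `[0, 0, 0, 4, 1, 0, 2, 1]`: slot 3 follows a run of 3 full slots and is filled next with probability 4/8, while slot 4 follows no full slots and has probability 1/8. Long runs therefore tend to grow longer.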

  14. Performance of Practical Methods
      Quadratic probing, assuming $m$ is a power of 2: $h(k,i) = \left(h'(k) + \frac{i}{2} + \frac{i^2}{2}\right) \bmod m$
      Successful search: $\Theta\left(\frac{1}{\alpha}\ln\frac{1}{1-\alpha}\right)$
      Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
      Double hashing, where $m$ is prime and $1 \le h_2(k) \le m-1$: $h(k,i) = (h_1(k) + i\,h_2(k)) \bmod m$
      Successful search: $\Theta\left(\frac{1}{\alpha}\ln\frac{1}{1-\alpha}\right)$
      Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
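
Both probe functions can be sketched in a few lines of Python. The function names are hypothetical, and the initial hash values are passed in directly rather than computed from a key.

```python
def quadratic_probe(h0, i, m):
    """h(k, i) = (h'(k) + i/2 + i^2/2) mod m, where h0 = h'(k); m is assumed a power of 2.

    i + i*i is always even, so integer division by 2 is exact.
    """
    return (h0 + (i + i * i) // 2) % m


def double_hash_probe(h1, h2, i, m):
    """h(k, i) = (h1(k) + i * h2(k)) mod m; m is assumed prime and 1 <= h2 <= m - 1."""
    return (h1 + i * h2) % m


def probe_sequence(probe, m):
    """List the m slots visited by a one-argument probe function i -> slot."""
    return [probe(i) for i in range(m)]
```

For `m = 8` and `h0 = 0`, the quadratic sequence is `[0, 1, 3, 6, 2, 7, 5, 4]`, which visits every slot once. With the stated conditions on `m`, both schemes produce a permutation of the slots, as the probe sequence is required to do.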
