CS 758/858: Algorithms http://www.cs.unh.edu/~ruml/cs758 Searching


  1. CS 758/858: Algorithms
     http://www.cs.unh.edu/~ruml/cs758
     Searching, Hash Tables, Hash Functions
     Wheeler Ruml (UNH), Class 4, CS 758

  2. Searching
     ■ Dictionaries

  4. Dictionaries
     ‘associative array’, ‘map’, ‘look-up table’, ‘set’
     n items, key length k

     Structure                  Find   Insert   Delete
     List (unsorted)
     List (sorted)
     Array (unsorted)
     Array (sorted)
     Heap
     Hash table
     Binary tree (unbalanced)
     Binary tree (balanced)

  5. Hash Tables
     ■ Hash Tables
     ■ Time Complexity
     ■ More Collisions
     ■ Open Addressing
     ■ Break

  6. Hash Tables
     applications:
       1. dictionaries
       2. object method tables
       3. string matching
       4. set operations: ∪, ∩, −
     first methods:
       1. direct-address tables: small key range. e.g., bit vectors.
       2. chaining: deletion?
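The two "first methods" on the slide above can be sketched roughly as follows. This is a minimal illustration, not code from the course: the class names `BitVectorSet` and `ChainedHashTable` and the default bucket count are assumptions, and Python's built-in `hash` stands in for the hash function. It also shows one answer to the "deletion?" prompt for chaining: remove the entry from its bucket's list.

```python
class BitVectorSet:
    """Direct-address set for integer keys in range(universe_size): one bit per key."""

    def __init__(self, universe_size):
        self.bits = bytearray((universe_size + 7) // 8)

    def add(self, key):
        self.bits[key >> 3] |= 1 << (key & 7)

    def discard(self, key):
        # Clear the key's bit; the 0xFF mask keeps the value in byte range.
        self.bits[key >> 3] &= ~(1 << (key & 7)) & 0xFF

    def __contains__(self, key):
        return bool(self.bits[key >> 3] & (1 << (key & 7)))


class ChainedHashTable:
    """Hash table with chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):  # hypothetical default size
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Python's built-in hash() is used here purely for illustration.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        # With chaining, deletion only touches the key's own bucket.
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Every operation on `ChainedHashTable` scans a single bucket, so its cost is proportional to that bucket's length.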

  9. Time Complexity
     n items in m buckets
     time complexity of search = number of items per bucket
     assume nice hash: $P(h(i) = x) = 1/m$
     let $X_i$ be 1 iff $h(i) = x$, 0 otherwise

     $$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{m} = \frac{n}{m}$$

     let $\alpha = n/m$, the ‘load factor’
     expected number of items per bucket is $\alpha$
     expected time is $\Theta(1 + \alpha)$
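The expectation above can be checked empirically with a small simulation. This is a sketch under the slide's "nice hash" assumption, modelled here by drawing each item's bucket uniformly at random. The function name, trial count, and seed are illustrative choices, not part of the lecture.

```python
import random


def mean_items_in_bucket(n, m, trials=2000, seed=0):
    """Estimate the expected number of the n items that land in one fixed bucket."""
    rng = random.Random(seed)
    x = 0  # the fixed bucket we watch
    total = 0
    for _ in range(trials):
        # X_i = 1 iff item i hashes to bucket x; sum the indicators.
        total += sum(1 for _ in range(n) if rng.randrange(m) == x)
    return total / trials


if __name__ == "__main__":
    n, m = 200, 100
    print("load factor alpha =", n / m)
    print("simulated mean    =", mean_items_in_bucket(n, m))
```

The simulated mean should settle near the load factor $n/m$ as the number of trials grows.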

  10. More Collisions
      probability that exactly k of n elements land in a given one of m bins:
      let $\alpha = n/m$, the ‘load factor’

      $$\binom{n}{k} \left(\frac{1}{m}\right)^{k} \left(1 - \frac{1}{m}\right)^{n-k} \approx \frac{\alpha^{k}}{e^{\alpha}\, k!}$$

      if $n = m$, this is $\approx \frac{1}{e\, k!}$:

      k      probability
      0      0.37
      1      0.37
      2      0.18
      3      0.06
      4      0.015
      5      0.003
      > 5    0.0006 total
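A short script can compare the exact binomial probability with the approximation on this slide. The function names are illustrative, and the choice n = m = 1000 is an assumption made only so that the load factor is 1.

```python
from math import comb, exp, factorial


def exact_probability(k, n, m):
    """Binomial probability that exactly k of n items land in one given bin of m."""
    return comb(n, k) * (1 / m) ** k * (1 - 1 / m) ** (n - k)


def poisson_approximation(k, n, m):
    """The approximation alpha**k / (e**alpha * k!) with alpha = n / m."""
    alpha = n / m
    return alpha ** k / (exp(alpha) * factorial(k))


if __name__ == "__main__":
    n = m = 1000  # load factor 1
    for k in range(6):
        print(k, exact_probability(k, n, m), poisson_approximation(k, n, m))
    tail = 1 - sum(poisson_approximation(k, n, m) for k in range(6))
    print("> 5", tail)
```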

  11. Open Addressing
      1. linear probing: $h(k, i) = (h_1(k) + i) \bmod m$ for increasing i
         ■ the runs (clusters of consecutively occupied slots)
      2. double hashing: $h(k, i) = (h_1(k) + i\, h_2(k)) \bmod m$ for increasing i
         ■ requires: $h_2(k) \neq 0$, $h_2(k)$ and m relatively prime
         ■ e.g., m prime and $0 < h_2(k) < m$
         ■ or, $m = 2^x$ and $h_2(k)$ odd
      3. cuckoo hashing: lookups $O(1)$, insertions amortized expected $O(1)$
      moral: low load factor
      deletion?
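The probe sequences above, and one common answer to the "deletion?" prompt, can be sketched as follows. All names here are hypothetical. The tombstone marker is a standard technique that is assumed for illustration and is not taken from the slides, and Python's built-in `hash` stands in for $h_1$.

```python
def linear_probe_sequence(h1, m):
    """Slots visited by linear probing from home slot h1 in a table of size m."""
    return [(h1 + i) % m for i in range(m)]


def double_hash_probe_sequence(h1, h2, m):
    """Slots visited by double hashing; covers all m slots only if gcd(h2, m) == 1."""
    return [(h1 + i * h2) % m for i in range(m)]


_EMPTY = object()    # slot never used
_DELETED = object()  # tombstone: slot used once, then deleted


class LinearProbingSet:
    """Open-addressing set using linear probing and tombstones for deletion."""

    def __init__(self, m=8):
        self.slots = [_EMPTY] * m

    def _probe(self, key):
        m = len(self.slots)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m

    def add(self, key):
        first_free = None
        for j in self._probe(key):
            slot = self.slots[j]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = j
                break  # key cannot be stored beyond an empty slot
            if slot is _DELETED:
                if first_free is None:
                    first_free = j  # remember a reusable tombstone
            elif slot == key:
                return  # already present
        if first_free is None:
            raise RuntimeError("table full")
        self.slots[first_free] = key

    def __contains__(self, key):
        for j in self._probe(key):
            slot = self.slots[j]
            if slot is _EMPTY:
                return False
            if slot is not _DELETED and slot == key:
                return True
        return False

    def remove(self, key):
        for j in self._probe(key):
            slot = self.slots[j]
            if slot is _EMPTY:
                return False
            if slot is not _DELETED and slot == key:
                self.slots[j] = _DELETED  # keep the run intact for later lookups
                return True
        return False
```

With m = 8, a step of $h_2 = 4$ revisits only two slots, while $h_2 = 3$ (odd, hence relatively prime to 8) reaches all eight. Simply emptying a slot on deletion would cut a run in two and hide the keys stored after it, which is why this sketch leaves a tombstone instead.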

  12. Break
      ■ asst 2
      ■ asst 3

  13. Hash Functions
      ■ Hash Functions
      ■ Hash Functions
      ■ Basic Hash
      ■ Better Hash
      ■ EOLQs

  15. Hash Functions
      $h : \text{key} \rightarrow 0..m-1$
      1. mediocre is easy, good takes effort
      2. want time (at most) linear in key size
      3. perfect hashing is possible (and efficient) if keys known
         ■ linear time to construct, linear space to store
      4. minimal perfect hashing is possible!
      bad news (|keys| is the number of possible keys):
      ■ if |keys| > m, there must be collisions
      ■ if |keys| ≥ n · m, then ∃ set of n that map to same bin
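To make "perfect hashing is possible if keys known" concrete, here is a deliberately simple sketch. It is not the linear-space or minimal construction the slide refers to. It assumes distinct integer keys smaller than the prime `P`, uses a table of quadratic size $m = |keys|^2$, and retries random functions of the form $((a k + b) \bmod P) \bmod m$ until none of the known keys collide. The names `find_perfect_hash`, `P`, `seed`, and `max_tries` are all illustrative.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1; keys are assumed to be smaller than this


def find_perfect_hash(keys, seed=0, max_tries=1000):
    """Search for a collision-free hash function on a known set of integer keys.

    Returns (h, m), where h maps every key to a distinct slot in range(m).
    The table size m = len(keys)**2 is quadratic, an assumption made to keep
    the search short; it is not a linear-space construction.
    """
    m = len(keys) ** 2
    rng = random.Random(seed)
    for _ in range(max_tries):
        a = rng.randrange(1, P)
        b = rng.randrange(P)

        def h(key, a=a, b=b):
            return ((a * key + b) % P) % m

        if len({h(key) for key in keys}) == len(keys):
            return h, m  # no two known keys share a slot
    raise RuntimeError("no collision-free function found")


if __name__ == "__main__":
    keys = [3, 17, 42, 1000, 65537]
    h, m = find_perfect_hash(keys)
    print(m, sorted(h(k) for k in keys))
```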

  16. Hash Functions
      Desiderata:
      ■ make collisions unlikely
        ◆ spread keys across all hashes
        ◆ for each key, each hash equally likely
      ■ similar keys get different hashes
        ◆ all bits of key affect the hash
        ◆ every bit of key affects every bit of hash
      ■ no input always gives worst-case behavior
      ■ fast to compute
      ■ low memory requirement
      ■ easy to implement

  17. Basic Multiplicative Hashing
      1. hash ← 0
      2. for each byte of key
      3.     hash ← (hash × multiplier) + byte
      4. return hash mod table-size
      want multiplier to smear bits, not shift them (to avoid interaction with table size)
      multiplier = 31 or 127
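The pseudocode above translates almost line for line into Python. One assumption is added and flagged in the code: the running hash is masked to 32 bits to mimic a fixed-width machine word, since Python integers would otherwise grow without bound. The function name and the mask width are illustrative.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: emulate a 32-bit unsigned accumulator


def multiplicative_hash(key: bytes, table_size: int, multiplier: int = 31) -> int:
    """Basic multiplicative hash of a byte string into range(table_size)."""
    h = 0
    for byte in key:
        # hash <- (hash * multiplier) + byte, truncated to the assumed word size
        h = (h * multiplier + byte) & WORD_MASK
    return h % table_size


if __name__ == "__main__":
    for word in (b"ab", b"ba", b"hash table"):
        print(word, multiplicative_hash(word, 1000))
```

Because each step multiplies before adding, `b"ab"` and `b"ba"` hash differently, so byte order matters.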

  18. Tabulation Hashing
      assume we have a table of 256 random integers
      1. hash ← 0
      2. for each byte of key
      3.     rotate the bits in hash by 1
      4.     hash ← hash xor table[byte]
      5. return hash mod table-size
      each byte affects all bits
      rotate makes order matter
      universal class of hash functions: for any two distinct keys, a function chosen at random from the class has $P(\text{collision}) \leq 1/m$
      good on average case (over inputs) ≠ good average case on any input
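A minimal Python version of the five steps above might look like this. The 32-bit word size, the left rotation direction, and the seeded table generator `make_table` are assumptions made for the sketch. The slide only says "a table of 256 random integers" and "rotate the bits in hash by 1".

```python
import random

WORD_BITS = 32                # assumed word size
MASK = (1 << WORD_BITS) - 1


def make_table(seed=0):
    """Build a table of 256 random WORD_BITS-bit integers (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.getrandbits(WORD_BITS) for _ in range(256)]


def rotate_left(x, bits=1):
    """Rotate a WORD_BITS-bit value left by the given number of bits."""
    return ((x << bits) | (x >> (WORD_BITS - bits))) & MASK


def tabulation_hash(key: bytes, table, table_size: int) -> int:
    """Rotate-and-xor hash driven by a 256-entry table, reduced mod table_size."""
    h = 0
    for byte in key:
        h = rotate_left(h)   # the rotation makes byte order matter
        h ^= table[byte]     # each byte selects a full-width table entry
    return h % table_size


if __name__ == "__main__":
    table = make_table(seed=1)
    for word in (b"ab", b"ba", b"hello"):
        print(word, tabulation_hash(word, table, 97))
```

Choosing a different seed for `make_table` selects a different function from the family, which is how the random choice of function would be made in practice.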

  19. EOLQs
      ■ What’s still confusing?
      ■ What question didn’t you get to ask today?
      ■ What would you like to hear more about?
      Please write down your most pressing question about algorithms and put it in the box on your way out.
      Thanks!
