CSE 326: Data Structures: Hash Tables


  1. CSE 326: Data Structures
     Hash Tables
     Hal Perkins, Spring 2007, Lecture 16

     Dictionary Implementations So Far
     Comparison table (cell values not shown):
     • Columns: Unsorted linked list, Sorted Array, BST, AVL, Splay (amortized)
     • Rows: Insert, Find, Delete

     Hash Tables
     • Constant time accesses!
     • A hash table is an array of some fixed size; the size is usually a prime number.
     • General idea: a hash function h(K) maps each key in the key space (e.g., integers, strings) to a slot of the hash table, numbered 0 … TableSize–1.

     Example
     • key space = integers
     • TableSize = 10
     • h(K) = K mod 10
     • Insert: 7, 18, 41, 94
     • Hash table slots: 0 through 9
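The Example slide can be traced with a few lines of Java. This is a minimal sketch, not code from the lecture: the class name `ModHashDemo` and its methods are made up for illustration, and it assumes the inserted keys never collide, which holds for 7, 18, 41, 94 with TableSize = 10.

```java
import java.util.Arrays;

class ModHashDemo {
    // h(K) = K mod tableSize; floorMod keeps the index non-negative for negative keys.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Places each key at index h(key). Collisions are not resolved here,
    // so the method throws if two keys land in the same slot.
    static Integer[] insertAll(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            int index = hash(key, tableSize);
            if (table[index] != null) {
                throw new IllegalStateException("collision at index " + index);
            }
            table[index] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[] {7, 18, 41, 94}, 10);
        System.out.println(Arrays.toString(table));
        // prints [null, 41, null, null, 94, null, null, 7, 18, null]
    }
}
```

The keys land in slots 7, 8, 1, and 4, each found again with a single array access.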

  2. Another Example
     • key space = integers
     • TableSize = 6
     • h(K) = K mod 6
     • Insert: 7, 18, 41, 34
     • Hash table slots: 0 through 5

     Hash Functions
     1. simple/fast to compute,
     2. avoid collisions,
     3. have keys distributed evenly among cells.
     Perfect Hash function:

     Sample Hash Functions
     • key space = strings
     • $s = s_0 s_1 s_2 \ldots s_{k-1}$
     1. $h(s) = s_0 \bmod \text{TableSize}$
     2. $h(s) = \left(\sum_{i=0}^{k-1} s_i\right) \bmod \text{TableSize}$
     3. $h(s) = \left(\sum_{i=0}^{k-1} s_i \cdot 37^{i}\right) \bmod \text{TableSize}$

     Collision Resolution
     Collision: when two keys map to the same location in the hash table.
     Two ways to resolve collisions:
     1. Separate Chaining
     2. Open Addressing (linear probing, quadratic probing, double hashing)
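The three sample string hash functions can be written out as below. This is an illustrative sketch: the class and method names are hypothetical, characters are taken as their Java `char` values, and returning 0 for the empty string in the first function is an assumption the slide does not address.

```java
class StringHashDemo {
    // 1. h(s) = s_0 mod tableSize: only the first character matters.
    static int firstChar(String s, int tableSize) {
        return s.isEmpty() ? 0 : s.charAt(0) % tableSize;
    }

    // 2. h(s) = (sum of all s_i) mod tableSize: anagrams always collide.
    static int charSum(String s, int tableSize) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum = (sum + s.charAt(i)) % tableSize;
        }
        return sum;
    }

    // 3. h(s) = (sum of s_i * 37^i) mod tableSize.
    static int polynomial(String s, int tableSize) {
        long hash = 0;
        // Horner's rule from the last character down, so s_i ends up multiplied by 37^i.
        for (int i = s.length() - 1; i >= 0; i--) {
            hash = (hash * 37 + s.charAt(i)) % tableSize;
        }
        return (int) hash;
    }
}
```

With TableSize = 11, `charSum` sends both "ab" and "ba" to slot 8, while `polynomial` separates them (slots 5 and 2), which is the reason for weighting each character by its position.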

  3. Separate Chaining
     • Insert: 10, 22, 107, 12, 42
     • Hash table slots: 0 through 9
     • Separate chaining: All keys that map to the same hash value are kept in a list (or “bucket”).

     Analysis of find
     • Defn: The load factor, λ, of a hash table is the ratio $\lambda = N / M$, where N is the no. of elements and M is the table size.
       For separate chaining, λ = average # of elements in a bucket
     • Unsuccessful find:
     • Successful find:

     tableSize: Why Prime?
     • Suppose
       – data stored in hash table: 7160, 493, 60, 55, 321, 900, 810
       – tableSize = 10: data hashes to 0, 3, 0, 5, 1, 0, 0
       – tableSize = 11: data hashes to 10, 9, 5, 0, 2, 9, 7
     Real-life data tends to have a pattern.
     Being a multiple of 11 is usually not the pattern ☺

     How big should the hash table be?
     • For Separate Chaining:
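Separate chaining and the load factor can be sketched as follows. This is a minimal illustration under stated assumptions: the class `ChainedHashSet` is hypothetical, keys are integers hashed with K mod TableSize, and new keys are added at the front of their bucket (the slide does not say where in the list a key goes).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private int size = 0;

    ChainedHashSet(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // All keys with the same hash value share one list ("bucket").
    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    // Returns false if the key was already present.
    boolean insert(int key) {
        LinkedList<Integer> bucket = bucketFor(key);
        if (bucket.contains(key)) {
            return false;
        }
        bucket.addFirst(key);
        size++;
        return true;
    }

    boolean find(int key) {
        return bucketFor(key).contains(key);
    }

    // Load factor: number of elements N divided by table size M.
    double loadFactor() {
        return (double) size / buckets.size();
    }

    List<Integer> bucket(int index) {
        return buckets.get(index);
    }
}
```

Inserting 10, 22, 107, 12, 42 into a table of size 10 puts 10 in bucket 0, 107 in bucket 7, and 22, 12, 42 together in bucket 2, for a load factor of 5/10 = 0.5.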

  4. Open Addressing
     • Insert: 38, 19, 8, 109, 10
     • Hash table slots: 0 through 9
     • Linear Probing: after checking spot h(k), try spot h(k)+1, if that is full, try h(k)+2, then h(k)+3, etc.

     Terminology Alert!
     • “Open Hashing” equals “Separate Chaining”
     • “Closed Hashing” equals “Open Addressing”
     Weiss

     Linear Probing
     f(i) = i
     • Probe sequence:
       0th probe = h(k) mod TableSize
       1st probe = (h(k) + 1) mod TableSize
       2nd probe = (h(k) + 2) mod TableSize
       . . .
       ith probe = (h(k) + i) mod TableSize

     Linear Probing – Clustering
     [Figure: insertions labeled "no collision", "collision in small cluster", and "collision in large cluster". Credit: R. Sedgewick]
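The linear probing sequence above can be sketched as follows. The class `LinearProbingTable` is hypothetical, h(k) is assumed to be k mod TableSize, and duplicate keys and deletion are deliberately left out.

```java
class LinearProbingTable {
    private final Integer[] slots;

    LinearProbingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // Returns the slot where the key was stored, or -1 if every slot is full.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i) % slots.length; // ith probe = (h(k) + i) mod TableSize
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1;
    }

    Integer slot(int index) {
        return slots[index];
    }
}
```

For the Open Addressing slide's keys with TableSize = 10, this places 38 in slot 8 and 19 in slot 9, then 8 wraps around to slot 0, 109 goes to slot 1, and 10 goes to slot 2. The last three keys all extend one run of occupied slots, which is the clustering effect the figure shows.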

  5. Quadratic Probing
     $f(i) = i^2$
     Less likely to encounter Primary Clustering
     • Probe sequence:
       0th probe = h(k) mod TableSize
       1st probe = (h(k) + 1) mod TableSize
       2nd probe = (h(k) + 4) mod TableSize
       3rd probe = (h(k) + 9) mod TableSize
       . . .
       ith probe = $(h(k) + i^2)$ mod TableSize

     Load Factor in Linear Probing
     • For any λ < 1, linear probing will find an empty slot
     • Expected # of probes (for large table sizes)
       – successful search: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
       – unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
     • Linear probing suffers from primary clustering
     • Performance quickly degrades for λ > 1/2

     Quadratic Probing
     • Insert: 89, 18, 49, 58, 79
     • Hash table slots: 0 through 9

     Quadratic Probing Example
     Table of size 7 (slots 0 through 6):
     • insert(76): 76%7 = 6
     • insert(40): 40%7 = 5
     • insert(48): 48%7 = 6
     • insert(5): 5%7 = 5
     • insert(55): 55%7 = 6
     But…
     • insert(47): 47%7 = 5
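The size-7 example, including the "But…" case, can be reproduced with a short sketch. The class `QuadraticProbingTable` is hypothetical and h(k) is assumed to be k mod TableSize. Stopping after TableSize probes loses nothing, because (i + TableSize)² and i² leave the same remainder mod TableSize, so later probes only revisit earlier slots.

```java
class QuadraticProbingTable {
    private final Integer[] slots;

    QuadraticProbingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // Returns the slot where the key was stored, or -1 if no probe finds an empty slot.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            // ith probe = (h(k) + i^2) mod TableSize; long arithmetic avoids overflow in i * i
            int index = (int) ((home + (long) i * i) % slots.length);
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1;
    }
}
```

With TableSize = 7, the keys 76, 40, 48, 5, 55 end up in slots 6, 5, 0, 2, 3. Inserting 47 then probes only slots 5, 6, 2, and 0 over and over and fails, even though slots 1 and 4 are still empty; the table is already more than half full.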

  6. Quadratic Probing: Success guarantee for λ < ½
     • If size is prime and λ < ½, then quadratic probing will find an empty slot in size/2 probes or fewer.
       – show for all 0 ≤ i,j ≤ size/2 and i ≠ j:
         $(h(x) + i^2) \bmod \text{size} \neq (h(x) + j^2) \bmod \text{size}$
       – by contradiction: suppose that for some i ≠ j:
         $(h(x) + i^2) \bmod \text{size} = (h(x) + j^2) \bmod \text{size}$
         ⇒ $i^2 \bmod \text{size} = j^2 \bmod \text{size}$
         ⇒ $(i^2 - j^2) \bmod \text{size} = 0$
         ⇒ $[(i + j)(i - j)] \bmod \text{size} = 0$
         BUT size does not divide (i-j) or (i+j): size is prime, so it would have to divide one of the two factors, yet both are nonzero and smaller than size in absolute value.

     Quadratic Probing: Properties
     • For any λ < ½, quadratic probing will find an empty slot; for bigger λ, quadratic probing may find a slot
     • Quadratic probing does not suffer from primary clustering: keys hashing to the same area are not bad
     • But what about keys that hash to the same spot?
       – Secondary Clustering!

     Double Hashing
     f(i) = i * g(k)
     where g is a second hash function
     • Probe sequence:
       0th probe = h(k) mod TableSize
       1st probe = (h(k) + g(k)) mod TableSize
       2nd probe = (h(k) + 2*g(k)) mod TableSize
       3rd probe = (h(k) + 3*g(k)) mod TableSize
       . . .
       ith probe = (h(k) + i*g(k)) mod TableSize

     Quadratic Probing Works for λ < 1/2
     • If HSize is prime then
       $(h(x) + i^2) \bmod \text{HSize} \neq (h(x) + j^2) \bmod \text{HSize}$
       for i ≠ j and 0 < i,j < HSize/2.
     • Proof
       $(h(x) + i^2) \bmod \text{HSize} = (h(x) + j^2) \bmod \text{HSize}$
       $[(h(x) + i^2) - (h(x) + j^2)] \bmod \text{HSize} = 0$
       $(i^2 - j^2) \bmod \text{HSize} = 0$
       $(i-j)(i+j) \bmod \text{HSize} = 0$
       ⇒⇐ HSize does not divide (i-j) or (i+j)
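The distinctness claim behind the quadratic probing guarantee can be checked numerically. The helper below is a hypothetical sketch: it counts how many different slots the probes i = 0 … size/2 (integer division) reach from a given starting slot. For a prime size, every one of those probes should reach a different slot.

```java
import java.util.HashSet;
import java.util.Set;

class QuadraticProbeCheck {
    // Counts the distinct slots visited by probes i = 0 .. size/2, starting from slot `home`.
    static int distinctEarlyProbes(int home, int size) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i <= size / 2; i++) {
            seen.add((int) ((home + (long) i * i) % size));
        }
        return seen.size();
    }
}
```

For size 7 the four probes i = 0..3 reach 4 distinct slots, and for size 11 the six probes i = 0..5 reach 6. For the non-prime size 8, the five probes i = 0..4 reach only 3 distinct slots, so the guarantee does not carry over.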

  7. Resolving Collisions with Double Hashing
     Hash Functions:
       H(K) = K mod M
       $H_2(K)$ = 1 + ((K/M) mod (M-1))
       M = table size (the slide's table has slots 0 through 9)
     Insert these values into the hash table in this order. Resolve any collisions with double hashing:
       13, 28, 33, 147, 43

     Double Hashing Example
     h(k) = k mod 7 and g(k) = 5 – (k mod 5)
     Table contents after each insert:

     Slot     76   93   40   47   10   55
     0
     1                       47   47   47
     2             93   93   93   93   93
     3                            10   10
     4                                 55
     5                  40   40   40   40
     6        76   76   76   76   76   76
     Probes   1    1    1    2    1    2

     Rehashing
     Idea: When the table gets too full, create a bigger table (usually 2x as large) and hash all the items from the original table into the new table.
     • When to rehash?
       – half full (λ = 0.5)
       – when an insertion fails
       – some other threshold
     • Cost of rehashing?

     Java hashCode() Method
     • Class Object defines a hashCode method
       – Intent: returns a suitable hashcode for the object
       – Result is arbitrary int; must scale to fit a hash table (e.g. obj.hashCode() % nBuckets, adjusted when the result is negative)
       – Used by collection classes like HashMap
     • Classes should override with calculation appropriate for instances of the class
       – Calculation should involve semantically “significant” fields of objects
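The Double Hashing Example can be replayed in code. This sketch hard-codes the example's functions h(k) = k mod 7 and g(k) = 5 – (k mod 5); the class name `DoubleHashingTable` is hypothetical, and keys are assumed to be non-negative.

```java
class DoubleHashingTable {
    private final Integer[] slots = new Integer[7];

    static int h(int k) {
        return k % 7;
    }

    // Second hash function: always in the range 1..5, so the step is never 0.
    static int g(int k) {
        return 5 - (k % 5);
    }

    // Returns the number of probes used, or -1 if no empty slot was found.
    int insert(int key) {
        for (int i = 0; i < slots.length; i++) {
            int index = (h(key) + i * g(key)) % slots.length; // ith probe
            if (slots[index] == null) {
                slots[index] = key;
                return i + 1;
            }
        }
        return -1;
    }

    Integer slot(int index) {
        return slots[index];
    }
}
```

Inserting 76, 93, 40, 47, 10, 55 in that order takes 1, 1, 1, 2, 1, 2 probes, matching the Probes row. 47 collides with 40 at slot 5 and steps by g(47) = 3 to slot 1; 55 collides with 76 at slot 6 and steps by g(55) = 5 to slot 4.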

  8. hashCode() and equals()
     • To work right, particularly with collection classes like HashMap, hashCode() and equals() must obey this rule:
         if a.equals(b) then it must be true that a.hashCode() == b.hashCode()
       – Why?
     • Reverse is not required

     Hashing Summary
     • Hashing is one of the most important data structures.
     • Hashing has many applications where operations are limited to find, insert, and delete.
     • Dynamic hash tables have good amortized complexity.
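A class that follows the hashCode()/equals() rule might look like the sketch below. `Point` and `bucketIndex` are hypothetical names chosen for illustration. `bucketIndex` uses `Math.floorMod` rather than `%`, because `hashCode()` may return a negative int and `%` would then produce a negative index.

```java
import java.util.Objects;

final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Built from the same fields that equals() compares, so equal points get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    // Scales an arbitrary int hash code into the range 0 .. nBuckets-1.
    static int bucketIndex(Object key, int nBuckets) {
        return Math.floorMod(key.hashCode(), nBuckets);
    }
}
```

With both methods overridden, a `HashMap<Point, String>` finds a value stored under `new Point(1, 2)` when looked up with a different but equal `Point` instance. If only `equals()` were overridden, the two instances would usually hash to different buckets and the lookup would miss, which is one answer to the slide's "Why?".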
