CS 270 Algorithms, Week 9: Hash tables
Oliver Kullmann

Outline
1. Generalising arrays
2. Direct addressing
3. Hashing in general
4. Hashing through chaining
5. Hash functions
6. Tutorial

General remarks

We continue data structures by discussing hash tables.

Reading from CLRS for week 7:
1. Chapter 11, Sections 11.1, 11.2, 11.3.

Recall: Dictionaries

Recall the three operations for a dictionary:
1. INSERT(x) (input: pointer x to the element to be inserted)
2. SEARCH(k) (input: key k; returns a pointer)
3. DELETE(x) (input: pointer x to the element to be deleted)

Via binary search trees (last week) we get such a dictionary:
- We actually get a full-fledged implementation of dynamic sets (including the four order-related operations).

Hashing is a technique specialised for dictionaries (not supporting the four order-related operations). It is usually faster for dictionaries (only).

Applications of dictionaries

A standard application is, for example, in a compiler:
1. We have many different “identifiers”, for example for variables, functions and classes.
2. For such an identifier, for example the class-name BreadthFirst, a lot of information needs to be stored.
3. The dictionary now translates the character sequence “BreadthFirst” into a pointer to the data associated with this class.

But dictionaries are everywhere: they appear whenever you have to associate data with some “keys”! Can you think of some examples?

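The compiler example can be sketched with the three dictionary operations. This is only a minimal illustration: the stored fields (`kind`, `line`) and the function names are hypothetical, and Python's built-in `dict` (itself a hash table) stands in for the data structure developed in this lecture.

```python
# Hypothetical symbol table: identifier name -> information about it.
symbol_table = {}

def insert(name, info):
    """INSERT: associate the data 'info' with the key 'name'."""
    symbol_table[name] = info

def search(name):
    """SEARCH: return the stored data, or None (playing the role of NIL)."""
    return symbol_table.get(name)

def delete(name):
    """DELETE: remove the entry for 'name' if it is present."""
    symbol_table.pop(name, None)

# The fields below are made up for illustration.
insert("BreadthFirst", {"kind": "class", "line": 12})
info = search("BreadthFirst")
```

Unlike the slide's INSERT(x) and DELETE(x), which take a pointer to an element, this sketch passes the key directly; that is a simplification.
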
The fastest implementation: keys as array indices

With (balanced) binary search trees we can achieve worst-case logarithmic time for the three dictionary operations.

We now want constant time for the three operations: on average, and if we provide enough space.

The basic idea for this is to use arrays:
- If we can use the keys as array indices, we are done (mostly).

Hashing is the process of handling arbitrary key-spaces K as if they were array indices.

The basic idea of hashing

We consider the key-space K (an arbitrary set), and we can use an array of length m. The basic idea is to use a hash function

    h : K → {0, ..., m − 1},

which translates key-values into indices.

The simplest case is when h is injective, i.e., maps different keys to different indices.
- Injective hash functions are called perfect.
- For that to be possible we need |K| ≤ m, i.e., there are at most m different keys.

Otherwise we have to handle collisions.

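To make "perfect" concrete, here is a small sketch that tests whether a given function is injective on a given set of keys. The helper name `is_perfect`, the example keys and the choice h(k) = k mod m are assumptions made for illustration only.

```python
def is_perfect(h, keys, m):
    """Return True if h maps the keys injectively into {0, ..., m-1}."""
    seen = set()
    for k in keys:
        i = h(k)
        # Fail if the index is out of range or already used by another key.
        if not 0 <= i < m or i in seen:
            return False
        seen.add(i)
    return True

keys = [3, 10, 18, 26]                                   # hypothetical key-space K
perfect_9 = is_perfect(lambda k: k % 9, keys, 9)         # indices 3, 1, 0, 8
perfect_7 = is_perfect(lambda k: k % 7, keys, 7)         # 3 and 10 both map to 3
perfect_3 = is_perfect(lambda k: k % 3, keys, 3)         # |K| = 4 > m = 3
```

Here `perfect_9` is True, while `perfect_7` and `perfect_3` are False; the last case cannot succeed for any h, since four keys do not fit injectively into three slots.
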
The simplest case of hashing

The simplest case of hashing is when for h we can use the identity, that is,
- the keys are natural numbers ≥ 0, i.e., K ⊂ {0, 1, 2, ...};
- they lie within a feasible range, i.e., m = max(K) + 1 is not too large (note that in general max(K), i.e., the maximum of possible indices, is much larger than |K|).

The array is called a direct-address (hash) table; the book uses the letter T (for “table”).

Key or not

A basic problem is how to show that an element is not there:
1. Conceptually simplest is to use pointers, where the NIL-pointer then shows that the element is not there.
2. Alternatively we can use a special “singular” key-value.
3. Or, for example, an additional boolean array.

Using pointers

How to implement a dynamic set by a direct-address table T:
- The keys of the key-space K = {0, 1, ..., 9} = U are used as indices in the table.
- The empty slots in the table contain NIL.

The basics of the implementation

Search(T, k)
    return T[k]

Insert(T, x)
    T[x.key] = x

Delete(T, x)
    T[x.key] = NIL

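The three procedures translate almost line by line into Python. The following is a minimal sketch, not the book's code: the class names `Element` and `DirectAddressTable` and the `data` field are hypothetical, and `None` plays the role of NIL.

```python
from dataclasses import dataclass

@dataclass
class Element:
    key: int
    data: str  # hypothetical satellite data attached to the key

class DirectAddressTable:
    def __init__(self, m):
        # One slot per possible key; None marks an empty slot (NIL).
        self.slots = [None] * m

    def search(self, k):
        return self.slots[k]

    def insert(self, x):
        self.slots[x.key] = x

    def delete(self, x):
        self.slots[x.key] = None

T = DirectAddressTable(10)   # key-space {0, ..., 9}
x = Element(3, "three")
T.insert(x)
found = T.search(3)          # the element x itself
missing = T.search(4)        # None, since slot 4 is empty
```

Each operation is a single array access, so all three run in constant time; the price is one slot for every possible key.
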
Examples where some simple translation is needed

If K is small, then we typically can find a nice injective (i.e., perfect) hash function h, for example:
- The keys are integers in a known range: just shift them.
- The keys are (arbitrary) images with 20 pixels: use binary encoding.

Do you know other examples?

In principle we can always use an injective hash function:
- If we have enough memory, then this is very fast.
- However, in practice this is often not feasible, for example for strings, and so we need to handle “collisions”, that is, cases where the hash function yields the same index for different keys.

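Both translations can be sketched in a few lines. The concrete range (−50 to 49) is made up, and the sketch assumes that each pixel is black or white, i.e., one bit per pixel, so that an image with 20 pixels corresponds to a number below 2^20; the function names are hypothetical.

```python
LOW, HIGH = -50, 49          # hypothetical known range of integer keys

def h_range(k):
    """Shift keys from {LOW, ..., HIGH} to indices {0, ..., HIGH - LOW}."""
    return k - LOW

def h_image(pixels):
    """Read 20 pixels (each 0 or 1, an assumption) as a binary number."""
    index = 0
    for bit in pixels:
        index = index * 2 + bit
    return index

m_range = HIGH - LOW + 1     # table size for the integer keys: 100
m_image = 2 ** 20            # table size for the images: 1048576

first = h_range(-50)                 # 0
last = h_range(49)                   # 99
all_white = h_image([0] * 20)        # 0
all_black = h_image([1] * 20)        # 2**20 - 1
```

Both functions are injective, so no collisions can occur; a table of about a million slots for the images is still feasible, which is why such a small key-space allows a perfect hash function.
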
General hash tables

As said already, the general idea of “hashing” is to use a “hash function”

    h : K → {0, ..., m − 1}

(here using “K” instead of “U” as in the book).
- m is the size of the hash table.
- An element with key k hashes to slot h(k).
- h(k) is the hash value of key k.

General hash tables (cont.)

We see (in the figure) that we have a collision for keys k₂ and k₅.

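A collision of this kind is easy to reproduce. The sketch below groups keys by their slot and reports every slot that receives more than one key; the key values, the table size m = 8 and the choice h(k) = k mod m are assumptions, picked so that the second and fifth key collide.

```python
from collections import defaultdict

def collisions(h, keys):
    """Return {slot: [keys]} for every slot that receives more than one key."""
    by_slot = defaultdict(list)
    for k in keys:
        by_slot[h(k)].append(k)
    return {slot: ks for slot, ks in by_slot.items() if len(ks) > 1}

m = 8
keys = [12, 25, 30, 47, 41]                 # hypothetical k1, ..., k5
# Slots: 12 -> 4, 25 -> 1, 30 -> 6, 47 -> 7, 41 -> 1
clashes = collisions(lambda k: k % m, keys)
```

The result is `{1: [25, 41]}`: the keys 25 and 41 both hash to slot 1, while the other three keys have a slot to themselves.
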
How to handle collisions?

In general, |K| is much bigger than m:
- The hash function should be as “random” as possible.
- That is, it should hash “unpredictably”.
- That is, it should be independent of our choices of keys.
- That is, we should get as few collisions as possible.

However we have to handle collisions nevertheless!