Advanced Programming - Dictionaries, Hash Tables

Dictionaries (Maps)
Hash tables

ADT Dictionary or Map
It has the following operations:
- INSERT: inserts a new element, associated with a unique value of a field (the key)
- SEARCH: searches for an element with a given key value; if it exists, it is returned
- DELETE: removes the element with the given key, if it exists
Uses of dictionaries
- Symbol table in a compiler
  - Key: name of the identifier
  - Values: type, context
- Citizens of a country
  - Key: social security number
  - Values: name, surname, age, address

Associative array
A dictionary is easily implemented with an associative array, i.e. an array indexed by the key instead of by position.
Ex:
- Citizens = {{"jr50", "john", "red"}, {"bg40", "bill", "green"}}
- Citizens["jr50"] = {"jr50", "john", "red"}
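As an illustration, Python's built-in `dict` is an associative array (internally implemented as a hash table), so the citizens example can be sketched directly. Storing each record as a tuple (code, name, surname) is an assumption made for this sketch.

```python
# A Python dict is an associative array: values are indexed by key, not by position.
citizens = {
    "jr50": ("jr50", "john", "red"),
    "bg40": ("bg40", "bill", "green"),
}

print(citizens["jr50"])      # ('jr50', 'john', 'red')
print(citizens.get("zz99"))  # None: no citizen with this key
```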
Goal
Complexity of insert/search/delete:
- O(1) in the average case
- Θ(n) in the worst case

Hash tables
An implementation of associative arrays: an array containing the elements, where the position of each element is computed from its key by a hash function, in O(1) time.
Ex:
- Hash("jr50") = 117: the element "john red" is stored in position 117 of the array
Associative array
[Figure: a direct-address table T with slots 0 to 9, one for each key of the universe U (all keys); only the slots of the used keys K hold a value, the others are NIL.]

Dictionary implemented with an associative array
- T: associative array; key: the key; x: the element (value)
- SEARCH(T, key)
    return T[key]
- INSERT(T, x)
    T[key[x]] ← x
- DELETE(T, x)
    T[key[x]] ← NIL
- Time complexity O(1); memory O(|U|), where |U| is the number of different possible values of the key
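The direct-address dictionary above can be sketched in Python as follows. The class name `DirectAddressTable` is hypothetical, and, unlike the slide's INSERT(T, x), this sketch passes the key and the value separately; keys are assumed to be integers in {0, ..., u-1}.

```python
from __future__ import annotations
from typing import Any, Optional


class DirectAddressTable:
    """Dictionary as a direct-address table: one slot for every key in U = {0, ..., u-1}."""

    def __init__(self, u: int) -> None:
        # Memory O(|U|): every possible key gets a slot, initially NIL (None).
        self.slots: list[Optional[Any]] = [None] * u

    def search(self, key: int) -> Optional[Any]:
        return self.slots[key]  # O(1)

    def insert(self, key: int, value: Any) -> None:
        self.slots[key] = value  # O(1); keys are assumed to be unique

    def delete(self, key: int) -> None:
        self.slots[key] = None  # O(1)


t = DirectAddressTable(10)  # U = {0, ..., 9}
t.insert(3, "three")
print(t.search(3), t.search(7))  # three None
```

The memory cost is visible in `__init__`: the list has one entry per possible key, whether or not that key is ever used.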
Assumptions
Two assumptions are needed:
- No two elements have the same key (keys are unique)
- The size of T equals the number of possible key values, |U|
  - This is critical: if |U| is large, the array is unfeasible
  - Ex: key = SSN, 10 characters, |U| = 24^10 ≈ 6.3 × 10^13 (assuming a 24-symbol alphabet)
  - But the citizens of a country number in the order of 10^7 to 10^9
- It is essential that the size of the array be O(|K|) and not O(|U|)

Hash tables
- A kind of associative array with size O(|K|) instead of O(|U|)
- Insert/search/delete are O(1) on average
- However, the index must be computed from the key in a different way: by a hash function
Hash function
- A hash table is an array of size m (m << |U|)
- A hash function h maps a key to a position (index) in the array:
  h: U → {0, 1, ..., m-1}
- Element x is stored in T[h(key[x])]

Hash function
[Figure: keys k1, ..., k5 of the universe U are mapped by h into the slots 0, ..., m-1 of table T; h(k2) = h(k5), so k2 and k5 fall into the same slot.]
Collision
- A collision occurs when h(k_i) = h(k_j) and k_i ≠ k_j
- It is essential to:
  - Minimize the number of collisions (this depends on the hash function)
  - Manage collisions

Example
The key is a string of characters. Hash function:
$h(k) = \left(\sum_i c_i\right) \bmod m$
where
- c_i is the ASCII code of the i-th character of string k
- m is the number of elements (size) of array T
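A minimal sketch of this hash function in Python; `sum_hash` is a hypothetical name, and `ord` returns the Unicode code point, which coincides with the ASCII code for the plain ASCII strings used in the examples.

```python
def sum_hash(key: str, m: int) -> int:
    """h(k) = (sum of the character codes of k) mod m."""
    return sum(ord(c) for c in key) % m


for name in ["pippo", "pluto", "paperino", "topolino", "paperoga"]:
    print(name, sum_hash(name, 15))
# pippo 12, pluto 9, paperino 7, topolino 14, paperoga 7
```

Because a sum ignores the order of the characters, any two anagrams (e.g. "listen" and "silent") always collide with this function.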
Ex (II)
Collision between the strings "paperino" and "paperoga", with m = 15:
- h("pippo") = (112+105+112+112+111) mod 15 = 552 mod 15 = 12
- h("pluto") = (112+108+117+116+111) mod 15 = 564 mod 15 = 9
- h("paperino") = (112+97+112+101+114+105+110+111) mod 15 = 862 mod 15 = 7
- h("topolino") = (116+111+112+111+108+105+110+111) mod 15 = 884 mod 15 = 14
- h("paperoga") = (112+97+112+101+114+111+103+97) mod 15 = 847 mod 15 = 7

Ex (III)
m = 15:
- h("Mickey") = (77+105+99+107+101+121) mod 15 = 610 mod 15 = 10
- h("Minnie") = (77+105+110+110+105+101) mod 15 = 608 mod 15 = 8
- h("Donald") = (68+111+110+97+108+100) mod 15 = 594 mod 15 = 9
- h("Daisy") = (68+97+105+115+121) mod 15 = 506 mod 15 = 11
- h("foo") = (102+111+111) mod 15 = 324 mod 15 = 9
- h("bar") = (98+97+114) mod 15 = 309 mod 15 = 9
Collision among the strings "Donald", "foo" and "bar" (all three hash to 9)
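Collisions like those above can be found mechanically by grouping keys by their hash value. This sketch uses the same sum-of-codes hash; the helper name `buckets` is hypothetical.

```python
from __future__ import annotations
from collections import defaultdict


def buckets(keys: list[str], m: int) -> dict[int, list[str]]:
    """Group keys by h(k) = (sum of character codes) mod m; a bucket with >1 key is a collision."""
    table: dict[int, list[str]] = defaultdict(list)
    for k in keys:
        table[sum(map(ord, k)) % m].append(k)
    return dict(table)


names = ["Mickey", "Minnie", "Donald", "Daisy", "foo", "bar"]
collisions = {slot: ks for slot, ks in buckets(names, 15).items() if len(ks) > 1}
print(collisions)  # {9: ['Donald', 'foo', 'bar']}
```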
Collision mitigation
The best hash functions distribute the |K| elements as uniformly (randomly) as possible among the m available positions.
Typical strategies:
- pick m as a prime number
- manipulate the bits of k

Collision management
- Chaining
- Open addressing
Chaining (I)
Position i can contain more than one element. This can be implemented with a linked list.

Chaining (II)
[Figure: keys from U are hashed into table T (slots 0 to m-1); keys that collide, such as k2 and k5, are stored in the same slot, chained in a linked list.]
Chaining (III)
- T[i] is a pointer to a list, initially NIL
- CHAINED-HASH-INSERT(T, x)
    insert x at the head of list T[h(key[x])]
- CHAINED-HASH-SEARCH(T, k)
    search for an element with key k in list T[h(k)]
- CHAINED-HASH-DELETE(T, x)
    remove x from list T[h(key[x])]

Chaining - Complexity
- Assumption: unordered, singly linked lists
- Insert: O(1)
- Search: O(length of the list)
- Delete: O(length of the list)
  - it requires a search
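The three chained operations can be sketched with a hand-written singly linked list. The names `Node` and `ChainedHashTable` are hypothetical, the hash function is passed in as a parameter, and `delete` takes a key rather than a pointer x, so it has to search the list (keeping track of the predecessor, as a singly linked list requires).

```python
from __future__ import annotations
from typing import Any, Callable, Optional


class Node:
    """Element of a singly linked list: a (key, value) pair and a pointer to the next node."""

    def __init__(self, key: Any, value: Any, nxt: Optional[Node] = None) -> None:
        self.key, self.value, self.next = key, value, nxt


class ChainedHashTable:
    def __init__(self, m: int, h: Callable[[Any, int], int]) -> None:
        self.m, self.h = m, h
        self.T: list[Optional[Node]] = [None] * m  # every list is initially NIL

    def insert(self, key: Any, value: Any) -> None:
        # O(1): insert at the head of list T[h(key)] (no check for duplicate keys)
        i = self.h(key, self.m)
        self.T[i] = Node(key, value, self.T[i])

    def search(self, key: Any) -> Optional[Any]:
        # O(length of list T[h(key)])
        node = self.T[self.h(key, self.m)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key: Any) -> bool:
        # Requires a search; the predecessor is needed to unlink the node.
        i = self.h(key, self.m)
        prev, node = None, self.T[i]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.T[i] = node.next
        else:
            prev.next = node.next
        return True


table = ChainedHashTable(15, lambda k, m: sum(map(ord, k)) % m)
for name in ["Donald", "foo", "bar", "Daisy"]:  # Donald, foo and bar all hash to 9
    table.insert(name, len(name))
print(table.search("foo"))  # 3
table.delete("foo")
print(table.search("foo"))  # None
```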
Search (hash + chaining) - complexity
We have:
- n: number of elements in hash table T
- m: size of hash table T
- α = n/m: load factor of hash table T
- With chaining, α can be greater than 1
- What happens if m, n → ∞ (with the same α)?

Search (hash + chaining) - complexity (II)
Search:
- Worst case: all elements end up in a single unordered linked list
  - time to compute h(k) + time to traverse the list, Θ(n)
- Average case: depends on how uniformly h(k) distributes the elements
  - Let's assume h(k) performs simple uniform hashing (it distributes the keys in a perfectly uniform way)
  - (for the result to be O(1), the table must grow with the elements, so that α remains constant)
Search (hash + chaining) - complexity (III)
Search:
- Time to compute h(k): O(1)
- Time to traverse the list: depends on the length of list T[h(k)] and on whether the element is found or not
  - In both cases the expected complexity is Θ(1 + α)
- Summing up: O(1) + Θ(1 + α) = O(1), since α is constant

Open addressing
- T[i] can contain only one element
- In case of collision, another free cell is searched for: the next one, the one after it, etc.
- Therefore α ≤ 1 (at most m elements can be stored)
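The two cases can be summarized as follows. The successful-search expression is the standard textbook result under simple uniform hashing (with every stored element equally likely to be searched for), quoted as background rather than derived on the slide.

```latex
\begin{aligned}
\mathbb{E}\big[\text{length of } T[h(k)]\big] &= \frac{n}{m} = \alpha
  && \text{(key } k \text{ not in the table)} \\
\text{unsuccessful search:}\quad \Theta(1) + \alpha &= \Theta(1+\alpha) \\
\text{successful search:}\quad 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} &= \Theta(1+\alpha)
  && \text{(expected elements examined)} \\
\alpha = O(1) \;\Rightarrow\; \Theta(1+\alpha) &= O(1)
\end{aligned}
```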
Hash-Insert
HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return
            else i ← i + 1
    until i = m
    error "hash table overflow"

Hash-Search
HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
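A runnable Python sketch of HASH-INSERT and HASH-SEARCH, with `None` playing the role of NIL. The probe function h(k, i) is passed in; the linear-probing choice h'(k) = k mod m and the table size m = 7 in the usage are assumptions for illustration. Unlike the pseudocode, `hash_insert` returns the index it used.

```python
from typing import Callable, Optional


class HashTableOverflow(Exception):
    pass


def hash_insert(T: list, k: int, h: Callable[[int, int], int]) -> int:
    """HASH-INSERT: probe h(k, 0), h(k, 1), ... until a free (None) cell is found."""
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] is None:
            T[j] = k
            return j
    raise HashTableOverflow("hash table overflow")


def hash_search(T: list, k: int, h: Callable[[int, int], int]) -> Optional[int]:
    """HASH-SEARCH: index of k, or None once an empty cell is met or m probes are done."""
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] == k:
            return j
        if T[j] is None:
            return None
    return None


m = 7


def linear(k: int, i: int) -> int:
    return (k % m + i) % m  # linear probing with assumed h'(k) = k mod m


T = [None] * m
for k in (10, 3, 17):  # all three keys have h'(k) = 3
    hash_insert(T, k, linear)
print(T)                           # [None, None, None, 10, 3, 17, None]
print(hash_search(T, 17, linear))  # 5
print(hash_search(T, 24, linear))  # None
```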
Re-hash functions
- Linear probing
  $h(k, i) = (h'(k) + i) \bmod m$
- Quadratic probing
  $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
- Double hashing
  $h(k, i) = (h_1(k) + i\, h_2(k)) \bmod m$

Ex - insert
- m = 10, with positions numbered 1 to 10 (after position 10 the probe sequence wraps around to 1)
- Open addressing with linear probing. Sequence of hash values:
  h(A)=5, h(B)=4, h(C)=9, h(D)=4, h(E)=8, h(F)=8, h(G)=10
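The three probe sequences can be sketched as Python functions. The auxiliary hashes are assumptions, not taken from the slides: h'(k) = h1(k) = k mod m, h2(k) = 1 + (k mod (m-1)), and the constants c1 = 1, c2 = 3 are arbitrary example values; m = 11 is chosen prime.

```python
def linear_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h'(k) + i) mod m, with assumed h'(k) = k mod m
    return (k % m + i) % m


def quadratic_probe(k: int, i: int, m: int, c1: int = 1, c2: int = 3) -> int:
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m; c1 = 1, c2 = 3 are example constants
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h1(k) + i*h2(k)) mod m, with assumed h1(k) = k mod m and
    # h2(k) = 1 + (k mod (m - 1)): h2 is never 0 and, with m prime, it is coprime
    # to m, so the sequence visits all m cells.
    return (k % m + i * (1 + k % (m - 1))) % m


m = 11
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    print(probe.__name__, [probe(14, i, m) for i in range(5)])
# linear_probe [3, 4, 5, 6, 7]
# quadratic_probe [3, 7, 6, 0, 0]
# double_hash_probe [3, 8, 2, 7, 1]
```

Note the repeated cell 0 in the quadratic sequence: with arbitrary constants, quadratic probing is not guaranteed to examine every cell.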
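Ex - insert (II)
- A (h = 5): cell 5 is free ⇒ A in 5
- B (h = 4): cell 4 is free ⇒ B in 4
- C (h = 9): cell 9 is free ⇒ C in 9
- D (h = 4): cells 4, 5 occupied ⇒ D in 6
- E (h = 8): cell 8 is free ⇒ E in 8
- F (h = 8): cells 8, 9 occupied ⇒ F in 10
- G (h = 10): cell 10 occupied, wrap around ⇒ G in 1
Final table: 1:G, 4:B, 5:A, 6:D, 8:E, 9:C, 10:F

Ex - search (III)
- D (h(D) = 4): read 4, read 5, read 6 ⇒ found
- G (h(G) = 10): read 10, read 1 ⇒ found
- M (h(M) = 4): read 4, read 5, read 6, read 7 (empty) ⇒ not found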
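The example can be replayed in Python. The hash values are taken from the example itself (no real hash function is computed); representing the table as a dict from position to key, with a missing position meaning an empty cell, is an assumption of this sketch, as are the helper names.

```python
from __future__ import annotations
from typing import Optional


def probe(h: int, i: int, m: int) -> int:
    """i-th linear probe on positions 1..m: start at h and wrap from m back to 1."""
    return (h - 1 + i) % m + 1


def insert_1based(table: dict[int, str], key: str, h: int, m: int) -> int:
    for i in range(m):
        pos = probe(h, i, m)
        if pos not in table:  # free cell
            table[pos] = key
            return pos
    raise RuntimeError("hash table overflow")


def search_1based(table: dict[int, str], key: str, h: int, m: int) -> tuple[Optional[int], list[int]]:
    """Return (position of key or None, positions read)."""
    reads: list[int] = []
    for i in range(m):
        pos = probe(h, i, m)
        reads.append(pos)
        if table.get(pos) == key:
            return pos, reads
        if pos not in table:  # empty cell: the key is absent
            return None, reads
    return None, reads


H = {"A": 5, "B": 4, "C": 9, "D": 4, "E": 8, "F": 8, "G": 10, "M": 4}  # from the example
table: dict[int, str] = {}
for key in "ABCDEFG":
    insert_1based(table, key, H[key], 10)
print(sorted(table.items()))
# [(1, 'G'), (4, 'B'), (5, 'A'), (6, 'D'), (8, 'E'), (9, 'C'), (10, 'F')]
for key in "DGM":
    print(key, search_1based(table, key, H[key], 10))
# D (6, [4, 5, 6])   G (1, [10, 1])   M (None, [4, 5, 6, 7])
```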
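Delete
Deletion is very complex: emptying a cell can break the probe (collision) sequence of other keys that passed through that cell when they were inserted, so later searches for them would stop too early. In practice, open addressing is used only when no deletions are needed.

Complexity
With uniform hashing (every probe sequence equally likely; an idealization that linear probing, for instance, does not achieve) and load factor α < 1:
- The expected number of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$; an insertion has the same complexity
- The expected number of probes in a successful search is at most $\frac{1}{\alpha}\ln\frac{1}{1-\alpha} + \frac{1}{\alpha}$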
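To get a feel for these bounds, they can simply be evaluated for a few load factors. This sketch only computes the formulas stated above (it does not simulate a table); the function names are hypothetical.

```python
import math


def unsuccessful_bound(alpha: float) -> float:
    """Upper bound on the expected probes of an unsuccessful search (and of an insertion)."""
    return 1 / (1 - alpha)


def successful_bound(alpha: float) -> float:
    """Upper bound on the expected probes of a successful search, as stated above."""
    return (1 / alpha) * math.log(1 / (1 - alpha)) + 1 / alpha


for alpha in (0.5, 0.75, 0.9):
    print(f"alpha={alpha}: unsuccessful <= {unsuccessful_bound(alpha):.2f}, "
          f"successful <= {successful_bound(alpha):.2f}")
```

The unsuccessful-search bound grows from 2 probes at α = 0.5 to 10 probes at α = 0.9, which is why open-addressing tables are kept well below full.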
Hash functions

Uniform hashing
The best hash functions perform uniform hashing: if key k occurs with probability P(k), every slot j should receive the same total probability,
$\sum_{k\,:\,h(k)=j} P(k) = \frac{1}{m}, \quad j = 0, 1, \dots, m-1$
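The uniform-hashing condition can be checked directly by summing P(k) per slot. In this sketch the key distribution (keys 0..99, all equally likely) and the two division hashes are assumptions chosen for illustration; exact fractions avoid rounding.

```python
from __future__ import annotations
from fractions import Fraction
from typing import Callable


def slot_probabilities(P: dict[int, Fraction], h: Callable[[int], int], m: int) -> list[Fraction]:
    """For each slot j, return the sum of P(k) over all keys k with h(k) = j."""
    totals = [Fraction(0)] * m
    for k, p in P.items():
        totals[h(k)] += p
    return totals


P = {k: Fraction(1, 100) for k in range(100)}  # assumed: keys 0..99, equally likely
by_10 = slot_probabilities(P, lambda k: k % 10, 10)
by_8 = slot_probabilities(P, lambda k: k % 8, 8)
print(all(p == Fraction(1, 10) for p in by_10))  # True: every slot gets 1/m
print(all(p == Fraction(1, 8) for p in by_8))    # False: slots 0-3 get 13/100, slots 4-7 get 12/100
```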
Keys are not uniform
However, keys are often not uniformly distributed (e.g. words of a language, names and surnames). A good hash function should therefore:
- use all the characters
- amplify the differences between keys

Keys as numbers
Keys are usually strings of characters. The easiest approach is to treat them as integers:
- Ex: "abc" becomes 'a'·256² + 'b'·256 + 'c'
However, with very long strings this is impractical (the integers become huge), so variants have to be used.
In the following, the key is assumed to be an integer.
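A sketch of the radix-256 conversion, plus one common variant for long strings: apply Horner's rule and reduce mod m at every step, which yields the same result as reducing the full integer. The variant is an assumption here (the slides do not say which variants are meant), and characters are assumed to have codes below 256.

```python
def key_to_int(s: str) -> int:
    """Radix-256 value of s: "abc" -> 'a'*256**2 + 'b'*256 + 'c' (codes assumed < 256)."""
    n = 0
    for c in s:
        n = n * 256 + ord(c)  # Horner's rule
    return n


def key_mod(s: str, m: int) -> int:
    """Same value mod m, reducing at every step so intermediate values stay below 256*m."""
    h = 0
    for c in s:
        h = (h * 256 + ord(c)) % m
    return h


print(key_to_int("abc"))                               # 6382179
print(key_mod("abc", 701) == key_to_int("abc") % 701)  # True
```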
Hash function: mod m
- k is an integer:
  h(k) = k mod m
- Requires m ≥ n/α (m: table size, n: number of elements, α: target load factor)

Choice of m
- Avoid:
  - Powers of 2: k mod 2^p keeps only the p lowest bits of k, so the high bits are lost
  - Powers of 10: same problem, if k is a decimal number (only its last digits count)
- Use:
  - A prime number
  - far from powers of 2
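A small demonstration of why powers of 2 are a poor choice: the keys below are constructed (hypothetically) to differ only in bits above the 10th, so m = 1024 = 2^10 maps them all to the same slot, while m = 701, a prime lying between 512 and 1024 and far from both, spreads them out.

```python
keys = [i * 1024 + 5 for i in range(10)]  # keys that differ only in bits above the 10th

print(sorted({k % 1024 for k in keys}))   # [5]: m = 2^10 sees only the low 10 bits
print(len({k % 701 for k in keys}))       # 10: prime m = 701 spreads them out
```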