Hash Tables
EECS 268 Programming II
• Outline
  - Definition
  - Hash functions
  - Open hashing
  - Closed hashing
    - collision resolution techniques
  - Efficiency

Overview
• An implementation style for the Table ADT that is good in a wide range of situations is the hash table
  - efficient Insert, Delete, and Search operations
  - difficult Sorted Traversal
  - efficient unsorted traversal
• Good approach as long as sorted output is comparatively rare in the total set of hash table operations

Definition
• A hash table is defined by:
  - set of records R = { r_1, r_2, ..., r_n } stored by the table
  - set of input keys K = { k_1, k_2, ..., k_n }, n >= 0, that can be associated with records (k_x, r_y)
  - Array of buckets B[0 ... m-1]: each array element is capable of holding one or more (k_x, r_y) pairs
  - Hash Function H: K -> {0, 1, ..., m-1}
    - for any given (k_x, r_y), B[H(k_x)] is the designated storage location for (k_x, r_y)
  - Collision resolution scheme
    - when (k_x, r_y) and (k_a, r_b) map to the same bucket under H, this scheme determines where the second record is stored

Definitions
• An Array of buckets B[0 ... m-1] holds all data managed by the hash table
• Open or External Hashing
  - bucket locations store pointers (references) to record pairs (k_x, r_y)
  - colliding records stored in a linked list
• Closed or Internal Hashing
  - buckets store actual objects
  - colliding records stored in other bucket locations
• Note that the associated keys may be implicit rather than explicitly stored
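
To make the definition concrete, here is a minimal sketch of how a hash function H assigns each (key, record) pair a designated bucket, and how two pairs can end up with the same one. The modulo hash, the table size m = 7, and the helper names `designated_buckets` and `collisions` are assumptions chosen for illustration; they are not part of the slides' definition.

```python
from collections import defaultdict


def designated_buckets(pairs, m):
    """Group (key, record) pairs by their designated bucket index H(key).

    H(key) = key mod m is an assumed hash function for this sketch.
    """
    by_bucket = defaultdict(list)
    for key, record in pairs:
        by_bucket[key % m].append((key, record))
    return dict(by_bucket)


def collisions(pairs, m):
    """Return only the buckets that more than one pair maps to."""
    return {
        index: group
        for index, group in designated_buckets(pairs, m).items()
        if len(group) > 1
    }


# Hypothetical records; the keys 64 and 8 both map to bucket 1 when m = 7.
pairs = [(64, "r1"), (26, "r2"), (8, "r3")]
clashes = collisions(pairs, m=7)
```

Any bucket that appears in `clashes` is a place where the collision resolution scheme has to decide where the second record goes.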
Hash Functions
• H(i) = i
  - reduces the hash table to an array
• Selecting digits
  - choose some subset of digits in a large number
  - specific slice or positions
• Folding
  - take digits or slices of a number and add them together with roll-over
• H(i) = i modulo m
  - where m is the hash table size
• several other options possible

Hash Function -- 2
• Strings are a common search key in many cases
  - convert string to an integer
• Approaches
  - add characters or slices of characters together as n-bit unsigned numbers, with the sum rolling over within x bits
  - bit shifting to form numbers is possible
  - x bits chosen for the table size, or take x modulo m

Open Hashing
• Example: take a hash table size of 7 (prime) and a hash function h(x) = x mod 7
  - insert 64, 26, 56, 72, 8, 36, 42
• If the data set is large compared to the hash table size, or the hash function clusters data, then the length of the list holding the bucket contents can be significant
  - a sorted list will reduce the average failure time
    - can identify failure before the end of the list
  - use a binary search tree instead of a list
    - why not a BST for the whole data set?
  - use a second hash table

Open Hashing -- 2
• Advantages of Open Hashing with chaining
  - simple in concept and implementation
  - insertion is always possible
• Disadvantages of hashing with chaining
  - unbalanced distribution decreases efficiency
    - O(n) for a linked list, O(log n) for a BST
  - greater memory overhead
  - higher execution overhead of stepping through pointers
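
The string-folding approach and the open hashing example above can be sketched together. This is a minimal illustration, not the course's implementation: the names `fold_string` and `ChainedHashTable` are hypothetical, the 16-bit roll-over width is an assumed default, and a Python list stands in for the linked list that the slides describe for each bucket.

```python
def fold_string(text, bits=16):
    """Add character codes as unsigned numbers, rolling over within `bits` bits."""
    mask = (1 << bits) - 1
    total = 0
    for ch in text:
        total = (total + ord(ch)) & mask
    return total


class ChainedHashTable:
    """Open hashing: each bucket holds a chain of (key, record) pairs."""

    def __init__(self, m=7):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _bucket(self, key):
        # String keys are first converted to an integer, then reduced modulo m.
        number = fold_string(key) if isinstance(key, str) else key
        return self.buckets[number % self.m]

    def insert(self, key, record):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: replace its record
                chain[i] = (key, record)
                return
        chain.append((key, record))   # insertion is always possible

    def search(self, key):
        for k, record in self._bucket(key):
            if k == key:
                return record
        return None                   # reached the end of the chain: not found

    def delete(self, key):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(m=7)
for key in [64, 26, 56, 72, 8, 36, 42]:
    table.insert(key, f"record-{key}")
chains = [[k for k, _ in chain] for chain in table.buckets]
# chains is [[56, 42], [64, 8, 36], [72], [], [], [26], []]
```

Bucket 1 ends up holding three of the seven keys, which is the kind of unbalanced distribution the disadvantages list warns about.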
Closed Hashing
• Closed hashing with Open addressing
  - storing all data items within a single hash table, with colliding items placed in other table locations
• A hash table of size m can hold at most m items
  - but only if the m items map to m different table elements
  - collisions will generally occur before the table is full
• Collision resolution is thus crucial to efficient use of closed hash tables

Closed Hashing -- Collision Resolution
• Create a sequence of collision resolution functions
  - h_0(x) is the base hash function
  - h_1(x) is used to find the first alternate storage location after a collision
  - h_2(x) is used to find the next alternate if the first alternate is occupied
• Each h_i(x) must be guaranteed to choose different table locations
• The hash function series should ideally check all table locations

Collision Resolution -- Linear Probing
• Search the hash table sequentially, starting from the original location specified by the hash function
  - h_i(x) = (h_0(x) + i) mod m
• Insert 64, 26, 56, 72, 8, 36, 42 in an empty table of size 7
• Fragile: causes primary clusters by occupying adjacent table locations
  - similar to long chains in open hashing

Collision Resolution -- Quadratic Probing
• Spread probed locations across the table
  - h_i(x) = (h_0(x) + i^2) mod m
• Example: Insert 64, 26, 56, 72, 8, 36, 42
• The series of probed locations is not guaranteed to cover the whole table without duplication
• Closed hashing schemes can fail even though the table is not full
  - if the probing scheme will not visit all table locations
  - and secondary clusters may form
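
The two probe sequences can be compared on the slides' example keys with a small sketch. It assumes the base hash h_0(x) = x mod m and the usual forms h_i(x) = (h_0(x) + i) mod m for linear probing and h_i(x) = (h_0(x) + i^2) mod m for quadratic probing; the function names are hypothetical.

```python
def linear(h0, i, m):
    """i-th location of the linear probe sequence."""
    return (h0 + i) % m


def quadratic(h0, i, m):
    """i-th location of the quadratic probe sequence."""
    return (h0 + i * i) % m


def insert_all(keys, m, probe):
    """Insert keys into an empty closed hash table of size m.

    Returns (table, failed), where failed lists keys whose probe sequence
    found no free location. Trying i = 0 .. m-1 is enough for both schemes,
    because i and i + m produce the same location modulo m.
    """
    table = [None] * m
    failed = []
    for key in keys:
        h0 = key % m
        for i in range(m):
            index = probe(h0, i, m)
            if table[index] is None:
                table[index] = key
                break
        else:
            failed.append(key)   # every probed location was occupied
    return table, failed


keys = [64, 26, 56, 72, 8, 36, 42]
linear_table, linear_failed = insert_all(keys, 7, linear)
# linear_table is [56, 64, 72, 8, 36, 26, 42], linear_failed is []
quadratic_table, quadratic_failed = insert_all(keys, 7, quadratic)
# quadratic_table is [56, 64, 72, 8, 42, 26, None], quadratic_failed is [36]
```

With quadratic probing, key 36 only ever probes locations 1, 2, 3, and 5, all of which are occupied by then, so the insert fails while locations 4 and 6 are still free. Under linear probing the keys 64, 72, 8, and 36 pile up in adjacent locations 1 through 4, a small primary cluster.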
Collision Resolution -- Linear Probing with Fixed Increment
• h_i(x) = (h_0(x) + i * FI) mod m
• FI is relatively prime to m
  - linear probing will visit all table locations without repeats
  - X is relatively prime to Y iff GCD(X, Y) = 1

Collision Resolution -- Double Hashing
• Use a second hash function h'(x) to generate the probe sequence used after a collision
  - h_i(x) = (h_0(x) + i * h'(x)) mod m
  - h'(x) = R - (x mod R), where R < m is prime
• Example: m = 7, R = 5, insert 64, 26, 56, 72, 8, 36, 42

Closed Hashing -- Deletions
• Example: Insert 64, 56, 72, 8 using linear probing
  - delete 64; delete 8
  - this creates a problem because the empty cell could be there for two reasons
    - no further elements exist along this probing sequence
    - deletion of an item along the sequence took place
• Two types of empty buckets
  - bucket has always been empty (AE) (flag 0)
  - bucket emptied by deletion (ED) (flag 1)

Closed Hashing -- Deletions
• During a probing sequence,
  - if an AE bucket is found, searching can stop
  - if an ED bucket is found, searching must continue
• Closed hashing thus degrades as deletions accumulate
  - as cells are deleted, probing sequences generally lengthen as the probability of encountering ED cells increases
  - failed searches get more expensive because they cannot terminate until
    - an AE cell is found, or
    - all cells of the table have been visited
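
The AE/ED flags and the double hashing example can be sketched in one small class. This is an illustration under stated assumptions rather than the course's code: the class name `ClosedHashTable`, the extra flag value 2 for an occupied bucket, the base hash x mod m, and the table size of 7 in the deletion example are all choices made here. The `step` parameter supplies the increment: a constant 1 gives linear probing, and R - (x mod R) gives double hashing.

```python
AE, ED, FULL = 0, 1, 2   # flags 0 and 1 follow the slides; 2 (occupied) is assumed


class ClosedHashTable:
    def __init__(self, m, step=lambda key: 1):
        self.m = m
        self.step = step              # increment used after a collision
        self.keys = [None] * m
        self.flags = [AE] * m

    def _probe(self, key):
        h0, inc = key % self.m, self.step(key)
        for i in range(self.m):
            yield (h0 + i * inc) % self.m

    def find(self, key):
        for index in self._probe(key):
            if self.flags[index] == AE:
                return None           # always-empty bucket: search can stop
            if self.flags[index] == FULL and self.keys[index] == key:
                return index
            # ED bucket or a different key: search must continue
        return None                   # every location was visited

    def insert(self, key):
        """Return False if key is already present or no free bucket is reachable."""
        if self.find(key) is not None:
            return False
        for index in self._probe(key):
            if self.flags[index] != FULL:   # AE and ED buckets are both reusable
                self.keys[index] = key
                self.flags[index] = FULL
                return True
        return False

    def delete(self, key):
        index = self.find(key)
        if index is None:
            return False
        self.keys[index] = None
        self.flags[index] = ED        # mark as emptied by deletion
        return True


# Deletion example with linear probing (table size 7 assumed).
linear = ClosedHashTable(7)
for key in [64, 56, 72, 8]:
    linear.insert(key)
linear.delete(64)
found_at = linear.find(8)             # 3: the ED flag at location 1 keeps the search going

# Double hashing example: m = 7, R = 5.
double = ClosedHashTable(7, step=lambda key: 5 - key % 5)
for key in [64, 26, 56, 72, 8, 36, 42]:
    double.insert(key)
# double.keys is [56, 64, 72, 8, 42, 26, 36]
```

If location 1 were simply reset to AE when 64 is deleted, the search for 8 would stop there and wrongly report that 8 is absent.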
Closed Hashing
• Advantages of Closed Hashing with Open Addressing
  - lower execution overhead, as addresses are calculated rather than read from pointers in memory
  - lower memory overhead, as pointers are not stored
• Disadvantages
  - more complex than chaining
  - can degenerate into linear search due to primary or secondary clustering
  - Delete and Find operations are more complex
  - Insert is not always possible even though the table is not full
  - Delete can increase probe sequence length by making search termination conditions ambiguous

The Efficiency of Hashing
• An analysis of the average-case efficiency
• Load factor α
  - ratio of the current number of items in the table to the maximum size of the table array
  - measures how full a hash table is
  - should not exceed 2/3
• Hashing efficiency for a particular search also depends on whether the search is successful
  - unsuccessful searches generally require more time than successful searches

Summary
• Hash tables are useful and efficient data structures in a wide range of applications
• Open hashing with chaining is simple, easy to implement, and usually efficient
  - length of the chains is key to performance
• Closed hashing with various approaches to generating a probe sequence can also be efficient
  - lower space and computation overhead
  - more complex implementation
  - performance is sensitive to probe sequence
• Monitoring load factor and other hash-table behavior parameters is important in maintaining performance
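
The load factor and the gap between successful and unsuccessful searches can be measured on a small table. The sketch below assumes linear probing with h(x) = x mod m, a table of size 7 holding the first five example keys, and the hypothetical helper names shown; it counts probes rather than deriving the average-case formulas.

```python
def build_linear_table(keys, m):
    """Insert keys with linear probing; assumes fewer than m distinct keys."""
    if len(keys) >= m:
        raise ValueError("this sketch needs at least one empty location")
    table = [None] * m
    for key in keys:
        index = key % m
        while table[index] is not None:
            index = (index + 1) % m
        table[index] = key
    return table


def load_factor(table):
    """Ratio of stored items to table size."""
    return sum(slot is not None for slot in table) / len(table)


def probes_to_find(table, key):
    """Count locations examined until key or an empty location is reached."""
    m = len(table)
    index = key % m
    for count in range(1, m + 1):
        if table[index] is None or table[index] == key:
            return count
        index = (index + 1) % m
    return m


stored = [64, 26, 56, 72, 8]
table = build_linear_table(stored, 7)          # [56, 64, 72, 8, None, 26, None]
alpha = load_factor(table)                     # 5/7, already above 2/3
successful = sum(probes_to_find(table, k) for k in stored) / len(stored)
# Keys 0..6 are absent and start at each of the 7 home locations once.
unsuccessful = sum(probes_to_find(table, k) for k in range(7)) / 7
# successful is 1.4 probes on average; unsuccessful is 18/7, about 2.57
```

Even on this small table, a failed search costs nearly twice as many probes as a successful one, because it must run to the end of a cluster before it reaches an empty location.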