Introduction to Algorithms
Hash Tables
CSE 680
Prof. Roger Crawfis

Motivation
• Arrays provide an indirect way to access a set.
• Many times we need an association between two sets, or a set of keys and associated data.
• Ideally we would like to access this data directly with the keys.
• We would like a data structure that supports fast search, insertion, and deletion.
  • Do not usually care about sorting.
• The abstract data type is usually called a Dictionary or Partial Map.
  • float googleStockPrice = stocks["Goog"].CurrentPrice;

Dictionaries
• What is the best way to implement this?
  • Linked lists?
  • Doubly linked lists?
  • Queues?
  • Stacks?
  • Multiple indexed arrays (e.g., data[key[i]])?
• To answer this, ask what the complexity of each operation is:
  • Insertion
  • Deletion
  • Search

Direct Addressing
• Let's look at an easy case. Suppose:
  • The range of keys is 0..m-1
  • Keys are distinct
• Possible solution
  • Set up an array T[0..m-1] in which
    • T[i] = x if x is in the set and key[x] = i
    • T[i] = NULL otherwise
  • This is called a direct-address table
    • Operations take O(1) time!
    • So what's the problem?
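The direct-address table described on the slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the class name DirectAddressTable and its method names are assumptions, and None stands in for NULL.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in the range 0..m-1.

    Illustrative sketch; class and method names are assumptions.
    """

    def __init__(self, m):
        # One slot per possible key; None plays the role of NULL.
        self.slots = [None] * m

    def insert(self, key, value):
        self.slots[key] = value   # O(1): the key is the array index

    def search(self, key):
        return self.slots[key]    # O(1): None means "not present"

    def delete(self, key):
        self.slots[key] = None    # O(1)


table = DirectAddressTable(m=10)
table.insert(3, "three")
print(table.search(3))  # three
table.delete(3)
print(table.search(3))  # None
```

Note that the constructor allocates and initializes all m slots up front, which is exactly the cost that becomes a problem when the key range is large.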
Direct Addressing
• Direct addressing works well when the range m of keys is relatively small.
• But what if the keys are 32-bit integers?
  • Problem 1: the direct-address table will have 2^32 entries, more than 4 billion.
  • Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be prohibitive.
• Solution: map keys to a smaller range 0..p-1
  • Desire p = O(m).

Hash Table
• Hash tables provide expected O(1) support for all of these operations!
• The key idea: rather than indexing an array directly, index it through some function h(x), called a hash function.
  • myArray[h(index)]
• Key questions:
  • What is the set that x comes from?
  • What is h() and what is its range?

Hash Table
• Consider this problem:
  • If I know a priori the m keys from some finite set U, is it possible to develop a function h(x) that will uniquely map the m keys onto the set of numbers 0..m-1?

Hash Functions
• In general, a difficult problem. Try something simpler.
[Figure: the universe of keys U contains the set K of actual keys k1..k5; h maps each key to a slot of a table indexed 0..m-1, at h(k1), h(k4), h(k2) = h(k5), and h(k3).]
Hash Functions
• A collision occurs when h(x) maps two keys to the same location.
[Figure: the same mapping from U and K into a table indexed 0..p-1; k2 and k5 collide because h(k2) = h(k5).]

Hash Functions
• A hash function, h, maps keys of a given type to integers in a fixed interval [0, N − 1].
• Example: h(x) = x mod N is a hash function for integer keys.
• The integer h(x) is called the hash value of x.
• A hash table for a given key type consists of
  • A hash function h
  • An array (called the table) of size N
• The goal is to store item (k, o) at index i = h(k).

Example
• We design a hash table storing employee records using their Social Security number (SSN) as the key.
  • An SSN is a nine-digit positive integer.
• Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.
[Figure: table with slots 0..9999; slot 0 = ∅, slot 1 = 025-612-0001, slot 2 = 981-101-0002, slot 3 = ∅, slot 4 = 451-229-0004, …, slot 9997 = ∅, slot 9998 = 200-751-9998, slot 9999 = ∅.]

Example
• Our hash table uses an array of size N = 100.
• We have n = 49 employees.
• Need a method to handle collisions.
• As long as the chance of collision is low, we can achieve this goal.
• Setting N = 10,000 and looking at the last four digits will reduce the chance of collision.
[Figure: the same table; 200-751-9998 and 176-354-9998 both hash to slot 9998, a collision.]
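As a rough illustration of the SSN example, the sketch below implements h(x) = last four digits of x as x mod 10,000 and stores the keys shown in the figure. Representing SSNs as dash-separated strings and the helper name h are assumptions made for this sketch. The table here stores only the key, with no collision handling.

```python
N = 10_000  # table size from the example


def h(ssn: str) -> int:
    """Hash an SSN such as '025-612-0001' to its last four digits."""
    return int(ssn.replace("-", "")) % N


table = [None] * N  # None plays the role of the empty slot

for ssn in ["025-612-0001", "981-101-0002", "451-229-0004", "200-751-9998"]:
    table[h(ssn)] = ssn  # store at index i = h(k); no collision handling yet

print(h("025-612-0001"))                       # 1
print(h("176-354-9998") == h("200-751-9998"))  # True: these two keys collide
```

The last line shows why a collision strategy is needed: two different SSNs share the same last four digits and therefore the same slot.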
Collisions
• Can collisions be avoided?
  • In general, no. See perfect hashing for the case where the set of keys is static (not covered).
• Two primary techniques for resolving collisions:
  • Chaining: keep a collection at each key slot.
  • Open addressing: if the current slot is full, use the next open one.

Chaining
• Chaining puts elements that hash to the same slot in a linked list:
[Figure: keys k1..k8 from K (a subset of the universe U) hash into table slots; the slots hold the chains k1 → k4, k5 → k2 → k7, k3, and k8 → k6, and the remaining slots are empty.]

Chaining
• How do we insert an element?
[Figure: same chained table as above.]

Chaining
• How do we delete an element?
  • Do we need a doubly-linked list for efficient delete?
[Figure: same chained table as above.]
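One way to make the chaining questions concrete is the sketch below. It is not the lecture's implementation: the class name ChainedHashTable is an assumption, and each chain is a Python list rather than a linked list, so delete costs time proportional to the chain length. Python's built-in hash() stands in for h.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs.

    Illustrative sketch: Python lists replace the linked lists on the slides.
    """

    def __init__(self, n_slots=8):
        self.slots = [[] for _ in range(n_slots)]  # one empty chain per slot

    def _chain(self, key):
        # hash() stands in for h; the modulus maps it into 0..n_slots-1.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))     # otherwise extend the chain

    def search(self, key):
        for k, v in self._chain(key):  # walk only the chain for this slot
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


t = ChainedHashTable(n_slots=8)
t.insert(1, "a")
t.insert(9, "b")         # 9 % 8 == 1, so this key shares slot 1 with key 1
print(len(t.slots[1]))   # 2
print(t.search(9))       # b
```

With a true linked list, deleting a node you already hold a pointer to is O(1) only if the list is doubly linked, which is the point of the question on the slide.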
Chaining
• How do we search for an element with a given key?
[Figure: same chained table; the slots hold the chains k1 → k4, k5 → k2 → k7, k3, and k8 → k6.]

Open Addressing
• Basic idea:
  • To insert: if a slot is full, try another slot, …, until an open slot is found (probing).
  • To search: follow the same sequence of probes as would be used when inserting the element.
    • If we reach an element with the correct key, return it.
    • If we reach a NULL pointer, the element is not in the table.
• Good for fixed sets (adding but no deletion)
  • Example: spell checking

Open Addressing
• The colliding item is placed in a different cell of the table.
  • No dynamic memory.
  • Fixed table size.
• Load factor: n/N, where n is the number of items to store and N is the size of the hash table.
  • Clearly, n ≤ N, or n/N ≤ 1.
  • To get reasonable performance, n/N < 0.5.

Probing
• The key question is: what should the next cell to try be?
• Random would be great, but we need to be able to repeat it.
• Three common techniques:
  • Linear probing (useful for discussion only)
  • Quadratic probing
  • Double hashing
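The simplest of the three techniques, linear probing, can be sketched as follows. This is an illustrative example under stated assumptions: the class name LinearProbingTable and its methods are invented for the sketch, hash() stands in for h, and deletion is deliberately omitted to match the "adding but no deletion" use case.

```python
class LinearProbingTable:
    """Open addressing with linear probing; fixed size, no deletion.

    Illustrative sketch; names are assumptions, not from the lecture.
    """

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # None marks an open slot
        self.count = 0              # number of stored items (n)

    def _probe(self, key):
        # Visit h(k), h(k)+1, h(k)+2, ... wrapping around the table once.
        start = hash(key) % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None:
                self.slots[i] = (key, value)
                self.count += 1
                return
            if self.slots[i][0] == key:    # same key: overwrite the value
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def search(self, key):
        for i in self._probe(key):         # same probe sequence as insert
            if self.slots[i] is None:
                return None                # open slot reached: not in table
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

    def load_factor(self):
        return self.count / self.size      # n / N


t = LinearProbingTable(size=8)
t.insert(1, "a")
t.insert(9, "b")          # 9 % 8 == 1 is taken, so the item lands in slot 2
print(t.search(9))        # b
print(t.load_factor())    # 0.25
```

Quadratic probing and double hashing would change only the _probe method, replacing the step of 1 with a different repeatable sequence.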