Something very different
https://nextstrain.org/narratives/ncov/sit-rep/2020-03-04
http://data-science-sequencing.github.io/Win2018/lectures/lecture7/
http://virological.org/t/response-to-on-the-origin-and-continuing-evolution-of-sars-cov-2/418
Back to hashing
((ax + b) mod p) mod m
Warmup: Find the largest set of keys that collide
hash(x) = (3x + 2) mod 9
hash(x) = (3x + 2) mod 11
Which is a better hash function?
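One way to explore the warmup is to bucket a range of keys under each function and look at the fullest bucket. This is only a sketch: the key universe 0..98 and the helper name `largest_collision_set` are assumptions chosen for illustration, not part of the lecture.

```python
from collections import defaultdict


def largest_collision_set(a, b, m, keys):
    """Group keys by (a*x + b) mod m and return the largest group."""
    buckets = defaultdict(list)
    for x in keys:
        buckets[(a * x + b) % m].append(x)
    return max(buckets.values(), key=len)


keys = range(99)  # assumed key universe, chosen only for illustration
mod9 = largest_collision_set(3, 2, 9, keys)
mod11 = largest_collision_set(3, 2, 11, keys)

# With mod 9 only slots 2, 5 and 8 are ever used, because 3 divides 9.
print(len(mod9), len(mod11))  # 33 9
```

Since 3 and 11 share no common factor, the mod 11 version reaches every slot, which is why its largest bucket is so much smaller on this key range.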
Hashing with chaining
Store multiple keys in each array slot
How?
• We will consider linked lists
• Any dictionary ADT could be used provided ...
Result (using linked list)
• We can hash more than m things into an array of size m
• Worst case runtime depends on length of largest chain
• Memory is allocated on each insert
Figure: array with slots 0 to 6; slot 1 holds the chain AT, GA; slot 4 holds CT; slot 6 holds AA, TA; the other slots are empty.
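A minimal sketch of chaining, assuming Python lists as a stand-in for the linked lists and Python's built-in `hash` as the hash function. The class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a chain of (key, value) pairs."""

    def __init__(self, m=7):
        # Python lists stand in for the linked lists used in the lecture.
        self.slots = [[] for _ in range(m)]

    def _index(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))  # memory grows on each new insert

    def find(self, key):
        # Cost is proportional to the length of this one chain.
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None


table = ChainedHashTable(m=7)
for key in range(20):  # more than m items fit, since chains can grow
    table.insert(key, key * key)
```

After the loop, 20 items live in an array of size 7, so some chain must hold at least 3 of them.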
Access time for chaining
Load factor: α = n/m = (# items hashed) / (size of array)
Assuming a uniform hash function, i.e. the probability of hashing to any slot is equal
Search cost:
• Unsuccessful search examines α items
• Successful search examines 1 + (n − 1)/(2m) = 1 + α/2 − α/(2n) items
For good performance we want a small load factor
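The search costs can be evaluated directly for concrete n and m. This sketch assumes the standard result that an unsuccessful search examines α items on average under uniform hashing. The function name `chaining_search_costs` is hypothetical.

```python
def chaining_search_costs(n, m):
    """Expected number of items examined under uniform hashing with chaining."""
    alpha = n / m                        # load factor
    unsuccessful = alpha                 # scan one whole chain of average length alpha
    successful = 1 + (n - 1) / (2 * m)   # equals 1 + alpha/2 - alpha/(2n)
    return alpha, unsuccessful, successful


alpha, miss, hit = chaining_search_costs(n=14, m=7)
print(alpha, miss, round(hit, 3))  # 2.0 2.0 1.929
```

Both costs grow linearly with α, which is why a small load factor gives good performance.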
Open addressing
Each array element contains one item. The hash function specifies a sequence of elements to try.
Insert: If the first slot is occupied, check the next location in the hash function sequence.
Find: If the slot does not match, keep trying the next slot in the sequence until either the item is found or an empty slot is visited (item not found).
Remove: Find the item and replace it with a tombstone.
Result:
• Cannot hash more than m items, by the pigeonhole principle
• Hash table memory allocated once
• Performance will depend on how many times we check slots
Figure: array with slots 0 to 6 holding TA (0), AT (1), GA (2), CT (4), AA (6); slots 3 and 5 are empty.
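The insert, find and remove rules can be sketched as follows. The probe sequence used here (start at `hash(key) mod m`, then step by 1) is an assumption made so the example runs. The names `OpenAddressingSet`, `EMPTY` and `TOMBSTONE` are hypothetical.

```python
EMPTY = object()      # slot never used
TOMBSTONE = object()  # slot whose item was removed


class OpenAddressingSet:
    def __init__(self, m=7):
        self.m = m
        self.slots = [EMPTY] * m  # memory allocated once

    def _probe(self, key):
        # Assumed probe sequence: start at hash(key) mod m and step by 1.
        start = hash(key) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def find(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:  # empty slot reached: item not found
                return False
            if slot is not TOMBSTONE and slot == key:
                return True
        return False

    def insert(self, key):
        if self.find(key):
            return True
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is TOMBSTONE:
                self.slots[idx] = key
                return True
        return False  # table is full: cannot hold more than m items

    def remove(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot == key:
                self.slots[idx] = TOMBSTONE  # keep later probes reachable
                return True
        return False


t = OpenAddressingSet(m=7)
for key in (3, 10, 17):  # all three start probing at slot 3
    t.insert(key)
t.remove(10)
found_after_remove = t.find(17)  # the tombstone does not stop the search

full = OpenAddressingSet(m=7)
results = [full.insert(k) for k in range(8)]  # the eighth insert cannot fit
```

Replacing a removed item with a plain empty slot would cut the probe path to 17, which is the reason for the tombstone.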
Linear probing
Try (h(k) + i) mod m for i = 0, 1, 2, ..., m − 1
For this example h(k) = k mod 7 and m = 7
Figure: empty table with slots 0 to 6.
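A short sketch of the linear probing sequence with h(k) = k mod 7 and m = 7. The inserted keys are assumptions picked to force a few collisions, not the keys used in the lecture. The function names are hypothetical.

```python
def linear_probe_sequence(k, m=7):
    """Slots tried for key k: (h(k) + i) mod m with h(k) = k mod m."""
    h = k % m
    return [(h + i) % m for i in range(m)]


def insert_all_linear(keys, m=7):
    """Insert each key into the first free slot of its probe sequence."""
    table = [None] * m
    for k in keys:
        for idx in linear_probe_sequence(k, m):
            if table[idx] is None:
                table[idx] = k
                break
    return table


print(linear_probe_sequence(10))             # [3, 4, 5, 6, 0, 1, 2]
print(insert_all_linear([10, 17, 3, 6, 13]))  # [13, None, None, 10, 17, 3, 6]
```

Keys 10, 17 and 3 all hash to slot 3, so they end up in the consecutive slots 3, 4 and 5. This clustering is the main weakness of linear probing.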
Double hashing
Try (h(k) + i · h2(k)) mod m for i = 0, 1, 2, ..., m − 1
For this example h(k) = k mod 7, h2(k) = 5 − (k mod 5) and m = 7
Figure: empty table with slots 0 to 6.
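The same kind of sketch for double hashing, reading the second hash as h2(k) = 5 − (k mod 5) so that the step is always between 1 and 5. The inserted keys are assumptions for illustration. The function names are hypothetical.

```python
def double_hash_sequence(k, m=7):
    """Slots tried for key k: (h(k) + i * h2(k)) mod m."""
    h1 = k % m
    h2 = 5 - (k % 5)  # step size in 1..5, never 0
    return [(h1 + i * h2) % m for i in range(m)]


def insert_all_double(keys, m=7):
    """Insert each key into the first free slot of its probe sequence."""
    table = [None] * m
    for k in keys:
        for idx in double_hash_sequence(k, m):
            if table[idx] is None:
                table[idx] = k
                break
    return table


print(double_hash_sequence(10))              # [3, 1, 6, 4, 2, 0, 5]
print(insert_all_double([10, 17, 3, 6, 13]))  # [6, 13, None, 10, None, 3, 17]
```

Because m = 7 is prime, every step size from 1 to 5 visits all seven slots. Keys that share a first slot (10, 17 and 3 here) usually take different steps, so they spread out instead of clustering.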
Rehashing
Sometimes we need to resize the hash table
• For open addressing this will have to happen when we fill the table
• For separate chaining we want to do this when the load factor gets big
To resize we:
• Resize the hash table (Θ(1) amortized time if doubling)
• Get a new hash function
Result:
• Spread the keys out
• Remove tombstones (open addressing)
• Allows arbitrarily large tables
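A minimal sketch of rehashing by doubling, using linear probing on integer keys. The resize threshold (load factor above 1/2) is an assumption for illustration; the lecture only requires a resize once an open addressing table fills. The "new hash function" here is simply k mod the new size. The class name `ResizingTable` is hypothetical.

```python
class ResizingTable:
    """Open addressing (linear probing) that doubles when half full."""

    def __init__(self, m=7):
        self.m = m
        self.slots = [None] * m
        self.n = 0  # number of stored keys

    def _place(self, key):
        idx = key % self.m  # hash function depends on the current size
        while self.slots[idx] is not None:
            if self.slots[idx] == key:
                return
            idx = (idx + 1) % self.m
        self.slots[idx] = key
        self.n += 1

    def _rehash(self, new_m):
        live_keys = [k for k in self.slots if k is not None]
        self.m = new_m
        self.slots = [None] * new_m
        self.n = 0
        for k in live_keys:  # reinsert every key under the new hash function
            self._place(k)

    def insert(self, key):
        if self.n + 1 > self.m // 2:  # assumed threshold: load factor 1/2
            self._rehash(2 * self.m)  # doubling keeps inserts O(1) amortized
        self._place(key)

    def contains(self, key):
        idx = key % self.m
        while self.slots[idx] is not None:
            if self.slots[idx] == key:
                return True
            idx = (idx + 1) % self.m
        return False


t = ResizingTable(m=7)
for key in range(0, 200, 10):  # 20 distinct keys
    t.insert(key)
print(t.m, t.n)  # 56 20
```

The table grows 7, 14, 28, 56 as keys arrive. Only live keys are copied during a rehash, so in a version with deletion any tombstones would be dropped at the same time.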
Hashing summary
What collision resolution strategy is best?
What is the best implementation of a dictionary ADT?
Why did we talk about trees?
More in-depth info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf
Something new
What is interesting about this tree?
Figure: tree with keys 2, 5, 6, 9, 8, 7, 14, 29, 21, 42, 15, 33.