1 / 22 Inf 2B: Hash Tables. Lecture 4 of the ADS thread. Kyriakos Kalorkoti, School of Informatics, University of Edinburgh.
2 / 22 Dictionaries. A Dictionary stores key–element pairs, called items. Several elements might have the same key. Provides three methods: ◮ findElement(k): If the dictionary contains an item with key k, then return its element; otherwise return the special element NO_SUCH_KEY. ◮ insertItem(k, e): Insert an item with key k and element e. ◮ removeItem(k): If the dictionary contains an item with key k, then delete it and return its element; otherwise return NO_SUCH_KEY.
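As a concrete reference point, here is a minimal Java sketch of this ADT. The interface and method names follow the slide; restricting keys to int and representing NO_SUCH_KEY by null are assumptions made only to keep the sketch short.

// Minimal sketch of the Dictionary ADT (keys assumed to be ints,
// NO_SUCH_KEY represented by null).
public interface Dictionary<E> {
    E findElement(int k);          // element of some item with key k, or null
    void insertItem(int k, E e);   // add item (k, e); several items may share a key
    E removeItem(int k);           // delete one item with key k and return its element, or null
}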
3 / 22 List Dictionaries ◮ Items are stored in a singly linked list (in any order). ◮ Algorithms for all methods are straightforward (see the sketch below). ◮ Running Time: insertItem: Θ(1), findElement: Θ(n), removeItem: Θ(n). (n always denotes the number of items stored in the dictionary.)
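A sketch of such a list dictionary, assuming the Dictionary interface above: insertItem prepends in Θ(1), while findElement and removeItem scan the list, which is Θ(n) in the worst case.

// Sketch: dictionary backed by an unordered singly linked list.
public class ListDictionary<E> implements Dictionary<E> {
    private static class Node<E> {
        int key; E elem; Node<E> next;
        Node(int key, E elem, Node<E> next) { this.key = key; this.elem = elem; this.next = next; }
    }
    private Node<E> head;

    public void insertItem(int k, E e) {   // Theta(1): prepend
        head = new Node<>(k, e, head);
    }
    public E findElement(int k) {          // Theta(n) worst case: linear scan
        for (Node<E> p = head; p != null; p = p.next)
            if (p.key == k) return p.elem;
        return null;                       // NO_SUCH_KEY
    }
    public E removeItem(int k) {           // Theta(n) worst case: scan, then unlink
        Node<E> prev = null;
        for (Node<E> p = head; p != null; prev = p, p = p.next) {
            if (p.key == k) {
                if (prev == null) head = p.next; else prev.next = p.next;
                return p.elem;
            }
        }
        return null;                       // NO_SUCH_KEY
    }
}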
4 / 22 Direct Addressing Suppose: ◮ Keys are integers in the range 0, ..., N − 1. ◮ All elements have distinct keys. A data structure realising Dictionary (sometimes called a direct address table): ◮ Elements are stored in an array B of length N. ◮ The element with key k is stored in B[k]. ◮ Running Time: Θ(1) for all methods.
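A sketch under the slide's assumptions (keys in 0, ..., N − 1, all distinct); an empty slot is represented by null, which again stands in for NO_SUCH_KEY.

// Sketch of a direct address table: one array slot per possible key.
public class DirectAddressTable<E> {
    private final Object[] B;
    public DirectAddressTable(int N) { B = new Object[N]; }   // keys are 0..N-1

    @SuppressWarnings("unchecked")
    public E findElement(int k) { return (E) B[k]; }           // Theta(1)
    public void insertItem(int k, E e) { B[k] = e; }           // Theta(1)
    @SuppressWarnings("unchecked")
    public E removeItem(int k) { E e = (E) B[k]; B[k] = null; return e; }  // Theta(1)
}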
5 / 22 Bucket Arrays Suppose: ◮ Keys are integers in the range 0, ..., N − 1. ◮ Several elements might have the same key, so collisions may occur. What do we do about these collisions? Store them all together in a List pointed to by B[k] (sometimes called chaining).
6 / 22 Bucket Arrays Bucket array implementation of Dictionary: ◮ Bucket array B of length N holding Lists. ◮ Element with key k is stored in the List B[k]. ◮ Methods of Dictionary are implemented using insertFirst(), first(), and remove(p) of List. Running Time: Θ(1) for all methods (with a linked list implementation of List, p is always the first position, so we can easily keep track of it). ◮ Works because findElement(k) and removeItem(k) only need one item with key k. A good solution if N is not much larger than the number of keys (a small constant multiple).
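A bucket-array sketch, reusing the ListDictionary above as the bucket type (an assumption; the slide's List with insertFirst/first/remove would do equally well). Since every item stored in bucket k has key k, each list operation only ever touches the front of its list, so all three methods run in Θ(1).

// Sketch: bucket array with chaining; keys lie in 0..N-1, duplicates allowed.
public class BucketArray<E> {
    private final ListDictionary<E>[] B;

    @SuppressWarnings("unchecked")
    public BucketArray(int N) {
        B = new ListDictionary[N];
        for (int i = 0; i < N; i++) B[i] = new ListDictionary<>();
    }
    // Every item in B[k] has key k, so the scans inside ListDictionary
    // stop at the first node: Theta(1) per method.
    public void insertItem(int k, E e) { B[k].insertItem(k, e); }
    public E findElement(int k)        { return B[k].findElement(k); }
    public E removeItem(int k)         { return B[k].removeItem(k); }
}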
7 / 22 Hash Tables Dictionary implementation for arbitrary keys (not necessarily all distinct). Two components: ◮ Hash function h mapping keys to integers in the range 0, ..., N − 1 (for some suitable N ∈ ℕ). ◮ Bucket array B of length N to hold the items. Item (key–element pair) with key k is stored in the bucket B[h(k)].
8 / 22 Issues for Hash Tables ◮ Need to consider collision handling. (Here we might have h(k_1) = h(k_2) even for k_1 ≠ k_2, so the List implementation is more complicated.) ◮ Analyse the running time. ◮ Find good hash functions. ◮ Choose an appropriate N.
9 / 22 Implementation Problem: Elements with distinct keys might go into the same bucket. Solution: Let buckets be list dictionaries storing the items (key–element pairs). The methods: Algorithm findElement(k) 1. Compute h(k) 2. return B[h(k)].findElement(k)
10 / 22 Implementation Algorithm insertItem(k, e) 1. Compute h(k) 2. B[h(k)].insertItem(k, e) Algorithm removeItem(k) 1. Compute h(k) 2. return B[h(k)].removeItem(k)
11 / 22 Implementation Running time? Depends on the list methods ◮ B[h(k)].findElement(k), ◮ B[h(k)].insertItem(k, e), and ◮ B[h(k)].removeItem(k). Assume we insert at the front (or end): ◮ Θ(1) time for B[h(k)].insertItem(k, e).
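Putting the pieces together, a sketch of the whole table using the ListDictionary buckets from earlier; the concrete hash function h(k) = k mod N is only a placeholder so that the sketch compiles, not a recommendation.

// Sketch: hash table with chaining; each bucket is a ListDictionary.
public class HashTable<E> {
    private final ListDictionary<E>[] B;
    private final int N;

    @SuppressWarnings("unchecked")
    public HashTable(int N) {
        this.N = N;
        B = new ListDictionary[N];
        for (int i = 0; i < N; i++) B[i] = new ListDictionary<>();
    }
    // Placeholder hash function; any h mapping keys to 0..N-1 would do.
    private int h(int k) { return Math.floorMod(k, N); }

    public void insertItem(int k, E e) { B[h(k)].insertItem(k, e); }      // hash + Theta(1) prepend
    public E findElement(int k)        { return B[h(k)].findElement(k); } // hash + scan of one bucket
    public E removeItem(int k)         { return B[h(k)].removeItem(k); }  // hash + scan of one bucket
}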
12 / 22 Analysis ◮ Let T_h be the running time required for computing h (more precisely: T_h(n_key), where n_key is the size of the key). ◮ Let m be the maximum size of a bucket. Then the running time of the hash table methods is: insertItem: T_h + Θ(1), findElement: T_h + Θ(m), removeItem: T_h + Θ(m). Worst case: m = n. ◮ m depends on the hash function and on the input distribution of keys.
13 / 22 Hash functions Hash function h maps keys to {0, ..., N − 1}. Criteria for a good hash function: (H1) h evenly distributes the keys over the range of buckets (we can only hope that the input keys are reasonably well distributed to begin with). (H2) h is easy to compute.
14 / 22 Hash functions ◮ Simpler if we start with keys that are already integers. ◮ Trickier if the original key is not an integer type (e.g. String). One approach: split the hash function into ◮ a hash code and ◮ a compression map. [Diagram: arbitrary objects --(hash code)--> integers --(compression map)--> {0, ..., N − 1}]
15 / 22 Hash Codes ◮ Keys (of any type) are just sequences of bits in memory. ◮ Basic idea: Convert the bit representation of the key to a binary integer, giving the hash code of the key. ◮ But computer integers have bounded length (say 32 bits). ◮ Consider the bit representation of the key as a sequence of 32-bit integers a_0, ..., a_{ℓ−1}. ◮ Summation method: Hash code is a_0 + ··· + a_{ℓ−1} mod N. ◮ Polynomial method: Hash code is a_0 + a_1·x + a_2·x^2 + ··· + a_{ℓ−1}·x^{ℓ−1} mod N (for some integer x). Sometimes N = 2^32.
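A sketch of the summation method for a key already split into 32-bit words a_0, ..., a_{ℓ−1}; reducing mod N after every addition (and widening to long) is just a precaution against overflow. The polynomial method is sketched after the next slide, where Horner's rule makes it cheap.

// Sketch: summation hash code over the 32-bit words of the key.
public static int summationHash(int[] a, int N) {
    long h = 0;
    for (int ai : a) {
        // treat each 32-bit word as unsigned and reduce mod N as we go
        h = (h + (ai & 0xFFFFFFFFL)) % N;
    }
    return (int) h;
}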
16 / 22 Evaluating Polynomials Horner's Rule:
a_0 + a_1·x + a_2·x^2 + ··· + a_{ℓ−1}·x^{ℓ−1}  [Θ(ℓ^2) operations if each power of x is computed from scratch]
= a_0 + a_1·x + a_2·x·x + ··· + a_{ℓ−1}·x·x···x
= a_0 + x(a_1 + x(a_2 + ··· + x(a_{ℓ−2} + x·a_{ℓ−1}) ··· ))  [Θ(ℓ) operations]
Horner's rule has been proved to be best possible. Note: It is sensible to reduce mod N after each operation. Warning: Deciding what is a "good hash function" is something of a "black art". Polynomials look good because it is harder to see regularities (many keys mapping to the same hash value). Warning: we haven't proved anything! For some situations there are bad regularities, usually due to a bad choice of N.
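A sketch of the polynomial hash code evaluated with Horner's rule, reducing mod N after every step as the note suggests; x and N are left as parameters rather than fixed choices.

// Sketch: a_0 + a_1*x + ... + a_{l-1}*x^{l-1} mod N by Horner's rule.
public static int polyHash(int[] a, int x, int N) {
    long h = 0;
    for (int i = a.length - 1; i >= 0; i--) {
        // work from the innermost bracket outwards, reducing mod N each time
        h = (h * x + (a[i] & 0xFFFFFFFFL)) % N;
    }
    return (int) h;
}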
17 / 22 Hash functions for character strings Characters are 7-bit numbers (0, ..., 127). ◮ x = 128, N = 96: bad for small words, because gcd(128, 96) = 32, so x and N are NOT coprime (every power 128^i mod 96 with i ≥ 1 is a multiple of 32, so most characters can only shift the bucket index by a multiple of 32). ◮ x = 128, N = 97: good. ◮ x = 127, N = 96: good.
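A small experiment one could run to see the effect, using the polyHash sketch above on all two-letter lowercase words (an arbitrary test set chosen here, not from the slides): it reports how many buckets each choice of x and N actually uses and how full the fullest bucket gets.

// Sketch: compare how evenly the three (x, N) choices above spread
// all two-letter lowercase words over the buckets.
import java.util.HashMap;
import java.util.Map;

public class StringHashDemo {
    // polynomial hash with Horner's rule, as sketched on the previous slide
    static int polyHash(int[] a, int x, int N) {
        long h = 0;
        for (int i = a.length - 1; i >= 0; i--) h = (h * x + a[i]) % N;
        return (int) h;
    }

    public static void main(String[] args) {
        int[][] choices = { {128, 96}, {128, 97}, {127, 96} };
        for (int[] c : choices) {
            int x = c[0], N = c[1];
            Map<Integer, Integer> load = new HashMap<>();   // bucket -> number of words
            for (char c1 = 'a'; c1 <= 'z'; c1++)
                for (char c2 = 'a'; c2 <= 'z'; c2++)
                    load.merge(polyHash(new int[]{c1, c2}, x, N), 1, Integer::sum);
            int maxLoad = load.values().stream().max(Integer::compare).orElse(0);
            System.out.printf("x=%d, N=%d: %d of %d buckets used, largest bucket %d%n",
                              x, N, load.size(), N, maxLoad);
        }
    }
}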
18 / 22 Compression Map Integer k is mapped to |ak + b| mod N, where a, b are randomly chosen integers. The whole point of hashing is to "compress" (evenly). Works particularly well if a, N are coprime (experimental observation only).
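A sketch of this compression map; drawing a and b once per table, using floorMod in place of |·| to keep the result in 0, ..., N − 1, and the particular ranges chosen for a and b are all illustrative assumptions.

// Sketch: compression map k -> (a*k + b) mod N with randomly chosen a, b.
import java.util.Random;

public class CompressionMap {
    private final long a, b;
    private final int N;

    public CompressionMap(int N, Random rnd) {
        this.N = N;
        this.a = 1 + rnd.nextInt(Integer.MAX_VALUE - 1);  // random non-zero multiplier
        this.b = rnd.nextInt(Integer.MAX_VALUE);          // random shift
    }

    public int compress(int hashCode) {
        // floorMod keeps the result non-negative, playing the role of |.| mod N
        return (int) Math.floorMod(a * hashCode + b, (long) N);
    }
}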
19 / 22 Quick quiz question Consider the hash function h(k) = 3k mod 9. Suppose we use h to hash exactly one item for every key k = 0, ..., 9M − 1 (for some big M) into a bucket array with 9 buckets B[0], B[1], ..., B[8]. How many items end up in bucket B[5]? 1. 0. 2. M. 3. 2M. 4. 4M. Answer: 0, because 3k mod 9 is always a multiple of 3, so only buckets B[0], B[3] and B[6] are ever used.
20 / 22 Load Factors and Re-hashing Number of items: n. ◮ Length of bucket array: N. Load factor: n/N. ◮ A high load factor (definitely) causes many collisions (large buckets). A low load factor wastes memory space. Good compromise: load factor around 3/4. ◮ Choose N to be a prime number around (4/3)n. ◮ If the load factor gets too high or too low, re-hash (amortised analysis similar to dynamic arrays).
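A sketch of the two decisions involved: when to re-hash and what the new table length should be. The concrete thresholds and the use of BigInteger.nextProbablePrime to find a prime near (4/3)n are assumptions made for illustration only.

// Sketch: re-hashing policy helpers.
import java.math.BigInteger;

public class Rehashing {
    // Pick a prime close to (4/3)*n, so the load factor n/N is about 3/4.
    static int newTableLength(int n) {
        return BigInteger.valueOf((4L * n) / 3)
                         .nextProbablePrime()
                         .intValueExact();
    }

    // Re-hash when the table gets too full or too empty (thresholds illustrative).
    static boolean needsRehash(int n, int N) {
        double loadFactor = (double) n / N;
        return loadFactor > 0.9 || (loadFactor < 0.5 && N > 16);
    }
}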
21 / 22 Java's HashMap ◮ No duplicate keys. ◮ Will hash many different types of key. ◮ User can specify: initial capacity (default N = 16) and load factor (default 3/4). ◮ Dynamic hash table: a "re-hash" takes place frequently behind the scenes. ◮ Different hash functions for different key domains. For String, uses a polynomial hash code with x = 31. ◮ Hashtable is more-or-less identical.
22 / 22 Reading and Resources ◮ If you have [GT]: The “Maps and Dictionaries” chapter. ◮ If you have [CLRS]: The “Hash tables” chapter. Nicest: “Algorithms in Java”, by Robert Sedgewick (3rd ed), chapter 14. ◮ Two nice exercises on Lecture Note 4 (handed out).