Hash tables

Most data structures that we're going to see are about storing and manipulating data. When only the dictionary operations Insert, Search and Delete are needed, hash tables can be quite good.

There are many variations of hash tables (or rather of the functions implementing them), from not-so-fast but simple to extremely fast but complicated.

Elements are pairs (key, data); keys are distinct.

Intuition: you have some, say, “clever array”, and
• Insert(elem) inserts elem somewhere into the array
• Search(elem) knows where elem is stored and returns the corresponding data
• Delete(elem) also knows where elem is and removes it
Actual “positions” (somehow) depend on keys.

Important: we want to maintain a dynamic set (insertions and deletions).

Given universe U = {0, . . . , u − 1} for some (typically large) u; keys are from U.

Simple approach: an array of size |U|; operations are straightforward, the element with key k is stored in slot k (sketch below).

But what if K, the set of keys actually stored, is much, much smaller than U? Waste of memory (but time-efficient).
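A toy Python sketch of this simple approach (my own illustration, with a made-up tiny universe):

    u = 16                               # universe U = {0, ..., 15}
    T = [None] * u                       # one slot per possible key

    def insert(k, data): T[k] = data     # element with key k goes into slot k
    def search(k): return T[k]           # O(1)
    def delete(k): T[k] = None           # O(1)

    # All three operations are O(1), but T always occupies |U| slots:
    # wasteful when the set K of stored keys is much smaller than U.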
What we want is to reduce the size of the table.

Hashing: the element with key k is stored in slot h(k); we use a hash function h to compute the slot.

We hope to be able to reduce the size of the table to some m:

    h : U → {0, . . . , m − 1} for some m ≪ |U|

We say the element with key k hashes into slot h(k), and that h(k) is the hash value of k.

But. . . two or more keys may hash to the same slot (collisions).
Best idea: just avoid collisions; tailor the hash function accordingly.

However: by assumption |U| > m, so there must be at least two keys with the same hash value; complete avoidance is impossible.

Thus: whatever h we choose, we still need some form of collision resolution.
Hashing with chaining

The simplest of all collision-resolution protocols. It does just what you'd expect: each slot really is a list. When elements collide, just insert the new guy into the list (“the chain”).

Suppose T is your hash table and h your hash function:

Chained-Hash-Insert(T, x)
    insert x at the head of list T[h(key[x])]

Chained-Hash-Search(T, k)
    search for an element with key k in list T[h(k)]

Chained-Hash-Delete(T, x)
    delete x from list T[h(key[x])]
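A minimal Python sketch of chaining (class and method names are my own; it uses plain Python lists as chains, and its delete takes a key rather than a reference to the element, so unlike Chained-Hash-Delete(T, x) it must search first; see the running-time discussion on the next slide):

    class ChainedHashTable:
        def __init__(self, m=8):
            self.m = m                               # number of slots
            self.slots = [[] for _ in range(m)]      # one chain per slot

        def _h(self, key):
            return hash(key) % self.m                # placeholder hash function

        def insert(self, key, data):
            # assumes key is not yet in the table; insert at head of chain: O(1)
            self.slots[self._h(key)].insert(0, (key, data))

        def search(self, key):
            # walk the chain: time proportional to the chain's length
            for k, d in self.slots[self._h(key)]:
                if k == key:
                    return d
            return None                              # unsuccessful search

        def delete(self, key):
            chain = self.slots[self._h(key)]
            for i, (k, _) in enumerate(chain):       # key-based, so a search is needed
                if k == key:
                    del chain[i]
                    return

    t = ChainedHashTable()
    t.insert(42, "answer")
    print(t.search(42))                              # prints: answer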
What about the running times?
• Insert: clearly O(1), under the assumption that the element is not yet in the table; otherwise search first
• Search: proportional to the length of the list; more details to come
• Delete: note that the argument is x, not k, so we have constant-time access, then another O(1) if lists are doubly linked. If the argument were a key, a search would be necessary. If lists are singly linked, essentially a search is still necessary (we need the predecessor of x)
Given a hash table T with m slots that stores n elements.

Def.: the load factor is α = n/m (the average list length); note it is not necessarily greater than one! The analysis is in terms of α.

Clear: the worst-case performance is poor: if all n keys hash to the same slot, we might just as well have used a single list.

Average performance depends on how well the hash function h (which we still haven't chosen) distributes keys, on average.
We'll see more details, but for now a (very strong) assumption:

Any given element is equally likely to hash into any of the m slots, independently of where other elements hash to.

This assumption is called simple uniform hashing.

Two intuitions come to mind:
1. the input is some random sample, and the hash function is fixed
2. the input is fixed, and the hash function is somehow randomised
For j ∈ {0, . . . , m − 1} let n_j = length(T[j]). Clearly, n_0 + n_1 + · · · + n_{m−1} = n.

Also, the average value of n_j is E[n_j] = α = n/m (recall: “equally likely. . . ”).

Another assumption (not necessarily true): the hash function h can be evaluated in O(1) time.

Thus, the time required to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)].
We consider unsuccessful searches (no element in the table has key k) and successful searches.

Theorem. Under simple uniform hashing, with collision resolution by chaining, an unsuccessful search takes expected time Θ(1 + α), where α = n/m.

Proof.
• any key k not already in the table (recall: unsuccessful) is equally likely to hash to any of the m slots (read: they all look the same to us)
• the expected time to search unsuccessfully for k is the expected time to search to the end of T[h(k)]
• T[h(k)] has expected length E[n_{h(k)}] = α
• thus the expected number of examined elements is α
• add 1 for the evaluation of h

Recall: α could be very small, so Θ(1 + α) does make sense!
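A small simulation (my own sketch, not part of the slides) of the unsuccessful case: distribute n keys over m slots uniformly at random, then scan one random chain end to end; the average number of examined elements approaches α:

    import random

    random.seed(1)
    m, n, trials = 101, 500, 2_000
    examined = 0
    for _ in range(trials):
        lengths = [0] * m
        for _ in range(n):
            lengths[random.randrange(m)] += 1     # simple uniform hashing
        examined += lengths[random.randrange(m)]  # scan one whole chain
    print(examined / trials, n / m)               # observed average vs. alpha ~ 4.95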
For successful searches, not all lists are equally likely to be searched: the probability that a list is searched is proportional to the number of elements it contains (under certain assumptions).

We assume that the element being searched for is equally likely to be any of the n elements in the table. Then we get:

Theorem. Under simple uniform hashing, with collision resolution by chaining, a successful search takes expected time Θ(1 + α), where α = n/m.

Proof.
• the number of elements examined is 1 more than the number of elements before x in x's list
• the elements before x were inserted after x itself (new elements are placed at the front)
Let x_i be the i-th element inserted into the table, 1 ≤ i ≤ n, and let k_i = key(x_i).

For keys k_i, k_j, define the Bernoulli r.v. X_ij = 1 iff h(k_i) = h(k_j).

Under simple uniform hashing,

    P(X_ij = 1) = Σ_{z=0}^{m−1} P(h(k_i) = z) · P(h(k_j) = z) = Σ_{z=0}^{m−1} (1/m)² = 1/m

Thus E[X_ij] = P(X_ij = 1) = 1/m, and the expected number of elements examined in a successful search is

    E[ (1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} X_ij) ]
      = (1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} E[X_ij])
      = (1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} 1/m)
      = 1 + (1/(nm)) Σ_{i=1}^{n} (n − i)
      = 1 + (1/(nm)) (n² − n(n+1)/2)
      = 1 + (n − 1)/(2m)
      = 1 + α/2 − α/(2n)
      = Θ(1 + α)
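The same setup allows an empirical check of this bound (again my own sketch): in a chain of length L, its elements cost 1, 2, . . . , L comparisons to find, so the average successful-search cost in one table is the sum of L(L + 1)/2 over all chains, divided by n:

    import random

    random.seed(0)
    m, n, trials = 101, 500, 2_000
    total = 0.0
    for _ in range(trials):
        lengths = [0] * m
        for _ in range(n):
            lengths[random.randrange(m)] += 1     # simple uniform hashing
        # elements in a chain of length L cost 1, 2, ..., L to find
        total += sum(L * (L + 1) / 2 for L in lengths) / n
    alpha = n / m
    print(total / trials)                         # observed average cost
    print(1 + alpha / 2 - alpha / (2 * n))        # theoretical value, ~ 3.47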
Consequence: if m (# slots) is at least proportional to n (# elements), then n = O(m) and α = n/m = O(1), so searching takes constant time on average!

Insertion and deletion also take constant time (even worst-case, if doubly-linked lists are used), thus all operations take constant time on average!

(However: we need the assumption of simple uniform hashing.)
So far, we haven't seen a single hash function. What makes a good hash function?

It satisfies (more or less) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where other keys hash to.

However, this is typically impossible to guarantee, certainly depending on how the keys are chosen (think of an evil adversary).

Sometimes we know the key distribution. Ex: if the keys are random real numbers k ∈ [0, 1), independently and uniformly chosen, then h(k) = ⌊k · m⌋ satisfies the condition.
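A quick illustrative check (my addition) that h(k) = ⌊k · m⌋ spreads uniform keys evenly:

    import random

    random.seed(2)
    m = 10
    counts = [0] * m
    for _ in range(100_000):
        k = random.random()              # uniform key in [0, 1)
        counts[int(k * m)] += 1          # h(k) = floor(k * m)
    print(counts)                        # each entry close to 10_000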
Usual assumption: the universe of keys is {0, 1, 2, . . .}, i.e., we somehow interpret real keys as natural numbers (“usually” easy enough. . . ).

Two very simple hash functions:

1. Division method: h(k) = k mod m

Ex: the hash table has size 25 and the key is k = 234; then h(k) = 234 mod 25 = 9.

Quite fast, but with drawbacks: we want to avoid certain values of m, e.g. powers of 2. Why? If m = 2^p, then h(k) = k mod 2^p is just the p lowest-order bits of k.

Ex: m = 2^5 = 32, k = 168 = (10101000)_2, h(k) = 168 mod 32 = 8 = (1000)_2

Better to make the hash depend on all bits of the key. A good choice (usually) for m: a prime not too close to a power of two.
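A short Python sketch (the sample keys are made up for illustration) of why m = 2^p is bad: keys that agree in their p low-order bits always collide, while a prime m lets the higher-order bits influence the slot:

    keys = [168, 40, 8, 1192]            # all end in ...01000 in binary
    for m in (32, 31):                   # 32 = 2^5 vs. the prime 31
        print(m, [k % m for k in keys])
    # m = 32: [8, 8, 8, 8]; only the 5 low-order bits matter
    # m = 31: [13, 9, 8, 14]; all bits of the key now influence h(k)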
2. Multiplication method: h(k) = ⌊m(kA mod 1)⌋

Uh, what's that?
• A is a constant with 0 < A < 1
• thus kA is a real number with 0 ≤ kA < k
• kA mod 1 is the fractional part of kA, i.e., kA − ⌊kA⌋; in other words, kA mod 1 ∈ [0, 1)

Ex: A = 0.23, k = 234, then kA = 53.82 and kA mod 1 = 0.82

• therefore m(kA mod 1) ∈ [0, m), and ⌊m(kA mod 1)⌋ ∈ {0, 1, . . . , m − 1}

Voilà!

Advantage: the value of m is not critical. Typically a power of two (no good with the division method!), since then the implementation is easy (some comments in the textbook).
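A direct Python transcription (a sketch; the default A = (√5 − 1)/2 ≈ 0.618 is Knuth's often-suggested constant, not something fixed by these slides):

    import math

    def mult_hash(k, m, A=(math.sqrt(5) - 1) / 2):
        # h(k) = floor(m * (k*A mod 1)); m may safely be a power of two
        frac = (k * A) % 1.0             # fractional part kA mod 1, in [0, 1)
        return int(m * frac)             # scaled into {0, 1, ..., m - 1}

    print(mult_hash(234, 25, A=0.23))    # the slide's example: floor(25 * 0.82) = 20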