Hashing
Anil Maheshwari
anil@scs.carleton.ca
School of Computer Science
Carleton University
Canada
Outline
1. Setting
2. Balls & Bins Connections
3. 2-Universal Hash Function
4. Perfect Hashing
5. Proofs
Setting

Input
$U$ = Universe of size $u$
$S$ = A subset of $U$ consisting of $m$ elements

Objective
Construct a hash map (a data structure) $h : U \to [n]$, where $n = O(|S|) = O(m)$, such that for every $S \subseteq U$ of size $m$, the number of memory accesses required for a lookup is $O(1)$ per element.
Possible Approaches
1. Use a binary search tree to store the elements of $S$.
2. Maintain a membership bit for each element in $U$ to indicate its membership in $S$.
3. . . .
Collisions

The number of hash functions of type $h : U \to [n]$ is $n^{|U|} = n^u$.

Possible Strategy:
1. Pick a random function $h$ among the $n^u$ such functions.
2. Initialize an array $A$ (Hash Table) of size $n$. Each element of $A$ also stores a linked list.
3. Insert($x$): Set $A[h(x)] = 1$ and append $x$ to the linked list stored at $A[h(x)]$.
   Locate($x$): If $A[h(x)] = 0$, report $x \notin S$; else check whether $x$ is stored in the linked list at $A[h(x)]$.

Space = $O(n + u \log n)$
Time = $O\left(\frac{\log n}{\log \log n}\right)$ per element (w.h.p.)
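The strategy above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the class name `ChainedHashTable` and its methods are hypothetical, and the random function $h$ is sampled lazily (a value $h(x)$ is drawn the first time $x$ is seen and then remembered), which is an assumption made so that the full table of $u$ values never has to be materialized up front. An empty chain plays the role of $A[h(x)] = 0$.

```python
import random


class ChainedHashTable:
    """Hash table with chaining, driven by a lazily sampled random function."""

    def __init__(self, n, seed=None):
        self.n = n
        self.table = [[] for _ in range(n)]  # table[i] is the chain stored at index i
        self._h = {}                         # remembered values of h (lazy sampling is an assumption)
        self._rng = random.Random(seed)

    def h(self, x):
        # Draw h(x) uniformly from {0, ..., n-1} on first use, then keep it fixed.
        if x not in self._h:
            self._h[x] = self._rng.randrange(self.n)
        return self._h[x]

    def insert(self, x):
        chain = self.table[self.h(x)]
        if x not in chain:
            chain.append(x)

    def locate(self, x):
        # An empty chain corresponds to A[h(x)] = 0.
        return x in self.table[self.h(x)]


t = ChainedHashTable(n=8, seed=1)
for key in [3, 17, 42]:
    t.insert(key)
print(t.locate(17), t.locate(5))  # True False
```

Remembering every sampled value of $h$ is what makes the space bound include the $u \log n$ term: in the worst case the dictionary holds one $\log n$-bit value per universe element.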
2-Universal Family of Hash Functions

A random hash function $h : U \to [n]$ requires $u \log n$ bits.

Required Property: $\forall x, y \in U$ ($x \neq y$) and $i, j \in [n]$,
$$\Pr(h(x) = i \wedge h(y) = j) = \Pr(h(x) = i)\,\Pr(h(y) = j) = \frac{1}{n^2}.$$

Any family of hash functions that satisfies this property is called a 2-Universal Family.

Can we construct a 2-Universal Family that requires less space?
2-Universal Families

1. Let $X_1, X_2$ be uniform r.v. on $\{0, 1, \ldots, p-1\}$, where $p$ is prime. Define $Y_i = X_1 + iX_2 \pmod{p}$.
   Claim: $\{Y_0, Y_1, \ldots, Y_{p-1}\}$ are pairwise independent, i.e. for $i \neq j$,
   $$\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\,\Pr(Y_j = b) = \frac{1}{p^2}.$$
   Space Used: $O(\log p)$

2. Let $X = \{x_1, \ldots, x_k\}$ be a set of $k$ random bits. Consider the $2^k - 1$ (non-empty) subsets of $X$. For each subset $S \subseteq X$, generate a bit
   $$Y_S = \sum_{x \in S} x \pmod{2}.$$
   Claim: The $Y$ bits are pairwise independent.
   Space Used: $O(k)$
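The first claim can be checked exhaustively for a small modulus. The sketch below enumerates all $p^2$ choices of $(X_1, X_2)$ and computes the exact joint distribution of $(Y_i, Y_j)$; the function names are illustrative only. It also shows why primality matters: for a composite modulus such as 6 the check fails.

```python
from fractions import Fraction
from itertools import product


def joint_distribution(p, i, j):
    """Exact distribution of (Y_i, Y_j) over all p*p choices of (X1, X2)."""
    counts = {}
    for x1, x2 in product(range(p), repeat=2):
        yi = (x1 + i * x2) % p
        yj = (x1 + j * x2) % p
        counts[(yi, yj)] = counts.get((yi, yj), 0) + 1
    return {pair: Fraction(c, p * p) for pair, c in counts.items()}


def is_pairwise_independent(p):
    """True if every pair (Y_i, Y_j), i != j, is uniform on all p*p value pairs."""
    target = Fraction(1, p * p)
    for i in range(p):
        for j in range(i + 1, p):
            dist = joint_distribution(p, i, j)
            if len(dist) != p * p or any(v != target for v in dist.values()):
                return False
    return True


print(is_pairwise_independent(5))  # True: 5 is prime
print(is_pairwise_independent(6))  # False: 6 is not prime
```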
2-Universal Families Contd.

3. Let $U = \{0,1\}^{\log u}$ and index set $I = \{0,1\}^{\log n}$.
   The hash function family is the set of random Boolean matrices $H$ of dimension $\log n \times \log u$. For example, for $U = \{0,1\}^6$ and $I = \{0,1\}^4$ (i.e., $n = 2^4$):
   $$
   \begin{pmatrix}
   1 & 0 & 1 & 1 & 0 & 1 \\
   0 & 1 & 1 & 0 & 1 & 1 \\
   0 & 1 & 0 & 1 & 0 & 0 \\
   1 & 0 & 0 & 1 & 1 & 0
   \end{pmatrix}
   \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
   =
   \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}
   \pmod{2}
   $$
   The matrix maps $101100 \in U$ to index $(1110)_2 = 14$.
   Claim: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \neq y \in U$.
   Space Used: $O(\log u \times \log n)$
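The matrix example can be reproduced with a few lines of Python. The sketch below uses the $4 \times 6$ matrix and the input $101100$ from the example; the helper names `random_matrix` and `matrix_hash` are hypothetical, and the resulting bit string is read as a binary number with the first row as the most significant bit.

```python
import random


def random_matrix(rows, cols, rng=random):
    """Sample a rows x cols Boolean matrix with independent uniform bits."""
    return [[rng.randrange(2) for _ in range(cols)] for _ in range(rows)]


def matrix_hash(H, x_bits):
    """Return Hx (mod 2) as a list of bits, one per row of H."""
    return [sum(h & x for h, x in zip(row, x_bits)) % 2 for row in H]


# The matrix and input vector from the example.
H = [
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
]
x = [1, 0, 1, 1, 0, 0]

index_bits = matrix_hash(H, x)
index = int("".join(map(str, index_bits)), 2)
print(index_bits, index)  # [1, 1, 1, 0] 14
```

Drawing a fresh function from the family amounts to calling `random_matrix(4, 6)`, which stores only $\log n \times \log u = 24$ bits.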
2-level Hash Table

Input
$U$ = Universe of size $u$
$S$ = A subset of $U$ consisting of $m$ elements

1st Level: Apply a random hash function from a 2-Universal Hash Family to map the elements of $S$ to a Hash Table of size $n = O(m)$.

2nd Level: If $s_i$ elements are mapped to index $i$ of the Hash Table, create a secondary Hash Table of size $s_i^2$ for these elements, using another random hash function.
2-level Hash Table Contd.

E[# of Collisions in 1st Level] $= \binom{m}{2} \frac{1}{n} = O(m)$

E[# of Collisions when $s_i$ elements are mapped to a table of size $s_i^2$] $= \binom{s_i}{2} \frac{1}{s_i^2} = \frac{s_i^2 - s_i}{2 s_i^2} < \frac{1}{2}$

Claim: $E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$

$$
\begin{aligned}
E\left[\sum_{i=1}^{n} s_i^2\right]
&= E\left[\sum_{i=1}^{n} \left(s_i + 2\binom{s_i}{2}\right)\right] \\
&= m + 2\,E\left[\sum_{i=1}^{n} \binom{s_i}{2}\right] \\
&= m + 2\,E[\text{\# of collisions in 1st Level}] \\
&= O(m)
\end{aligned}
$$
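The identity $s_i^2 = s_i + 2\binom{s_i}{2}$ behind the derivation can be checked numerically. The sketch below is a simulation under a simplifying assumption: each of the $m$ elements is thrown into one of $n$ buckets uniformly and independently, as a stand-in for a hash function drawn from a 2-universal family. The function names are illustrative only.

```python
import random
from math import comb


def bucket_sizes(m, n, rng):
    """Throw m elements into n buckets uniformly at random; return the sizes s_i."""
    sizes = [0] * n
    for _ in range(m):
        sizes[rng.randrange(n)] += 1
    return sizes


def average_sum_of_squares(m, n, trials, seed=0):
    """Average of sum(s_i^2) over several trials, checking the identity each time."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = bucket_sizes(m, n, rng)
        collisions = sum(comb(si, 2) for si in s)  # colliding pairs in the 1st level
        squares = sum(si * si for si in s)
        assert squares == m + 2 * collisions       # the identity holds exactly
        total += squares
    return total / trials


m = n = 1000
print(average_sum_of_squares(m, n, trials=200))  # close to m + 2 * C(m, 2) / n
```

With $n = m$ the prediction $m + 2\binom{m}{2}\frac{1}{n}$ equals $2m - 1$, so the total second-level space stays linear in $m$.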
2-level Hash Table Contd.

Expected Lookup Time:
E[Time for 1st Level + Time for 2nd Level] $= 1 + O(1) = O(1)$

Expected Space Used:
E[Hash functions + 1st Level + 2nd Level] $= (n + 1) + m + E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$

Suppose E[Space Used] $\leq 6m$. By Markov's inequality,
$$\Pr(\text{Actual Space Used} > 12m) \leq \frac{6m}{12m} = \frac{1}{2}.$$
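A minimal sketch of the two-level construction in Python follows. Several choices here are assumptions rather than part of the slides: the hash functions are drawn from the family $h(x) = ((ax + b) \bmod P) \bmod \text{size}$ with a fixed prime $P$ (a standard universal family, swapped in for the generic 2-universal family), keys are assumed to be integers in $[0, P)$, the first level uses $n = m$, and a level is simply redrawn when it turns out badly. The retry thresholds follow the Markov-style argument: each redraw succeeds with probability at least $1/2$. The class name `TwoLevelTable` is hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; keys are assumed to lie in [0, P)


def random_hash(size, rng):
    """Draw h(x) = ((a*x + b) mod P) mod size from a universal family (assumed family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % size


class TwoLevelTable:
    def __init__(self, keys, seed=None):
        rng = random.Random(seed)
        keys = list(set(keys))
        m = len(keys)
        self.n = max(1, m)
        # 1st level: redraw until the total 2nd-level size sum(s_i^2) is at most 4m.
        while True:
            self.h1 = random_hash(self.n, rng)
            buckets = [[] for _ in range(self.n)]
            for x in keys:
                buckets[self.h1(x)].append(x)
            if sum(len(b) ** 2 for b in buckets) <= 4 * m:
                break
        # 2nd level: a table of size s_i^2 per bucket, redrawn until collision-free.
        self.h2, self.slots = [], []
        for bucket in buckets:
            size = len(bucket) ** 2
            h, slots = None, []
            while size > 0:
                h = random_hash(size, rng)
                slots = [None] * size
                if self._place(bucket, h, slots):
                    break
            self.h2.append(h)
            self.slots.append(slots)

    @staticmethod
    def _place(bucket, h, slots):
        """Try to store every key of the bucket in its own slot; False on a collision."""
        for x in bucket:
            k = h(x)
            if slots[k] is not None:
                return False
            slots[k] = x
        return True

    def __contains__(self, x):
        # Two memory probes: one per level.
        i = self.h1(x)
        if not self.slots[i]:
            return False
        return self.slots[i][self.h2[i](x)] == x


table = TwoLevelTable([12, 99, 2024, 31337], seed=3)
print(99 in table, 100 in table)  # True False
```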
References
1. Probability and Computing (Chapter 13), Mitzenmacher and Upfal, Cambridge Univ. Press, 2005.
2. Introduction to Algorithms (Chapter 11), Cormen, Leiserson, Rivest and Stein, MIT Press, 2009.
Missing Details
Example I: 2-Universal Family

Let $X_1, X_2$ be uniform r.v. on $\{0, 1, \ldots, p-1\}$, where $p$ is prime.
Define: $Y_i = X_1 + iX_2 \pmod{p}$.

Claim: $\{Y_0, Y_1, \ldots, Y_{p-1}\}$ are pairwise independent r.v.

To Show: for $i \neq j$,
$$\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\,\Pr(Y_j = b) = \frac{1}{p^2}.$$

1. $\Pr(Y_i = a) = \frac{1}{p}$: For a fixed $X_2$, $Y_i$ is equally likely to take any of the values $\{0, \ldots, p-1\}$ as $X_1$ varies over $\{0, \ldots, p-1\}$.
2. Given $Y_i = a = X_1 + iX_2$ and $Y_j = b = X_1 + jX_2$ (all arithmetic mod $p$),
   $$\implies X_2 = (a - b)(i - j)^{-1}, \qquad X_1 = a - i(a - b)(i - j)^{-1}.$$
   The inverse always exists in this setting, since $p$ is prime and $i \neq j$, so $i - j \not\equiv 0 \pmod{p}$.

The pair $(X_1, X_2)$ can take $p^2$ possible values, but for $Y_i = a$ and $Y_j = b$ there is exactly one choice. Thus,
$$\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\,\Pr(Y_j = b) = \frac{1}{p^2}.$$

Storage Requirement: Need to store $p$, $X_1$, $X_2$.
Example II: 2-Universal Family

Let $X = \{x_1, \ldots, x_k\}$ be a set of $k$ random bits.
Consider the $2^k - 1$ subsets of $X$ (excluding the empty set).
For each subset $s \subseteq X$, generate a bit
$$y_s = \sum_{x \in s} x \pmod{2},$$
i.e. the sum of the bits in $s$ modulo 2.

Claim: All the $y$-bits corresponding to the $2^k - 1$ subsets of $X$ are pairwise independent.

Consider any two bits $y_s$ and $y_{s'}$, where $s \neq s'$.

$\Pr(y_s = 0) = \Pr(y_s = 1) = \frac{1}{2}$, since even after fixing all but one of the random bits in $s$, the value of $y_s$ is decided by the remaining uniformly random bit.

Since $s \neq s'$: either $s \cap s' = \emptyset$ or $s \cap s' \neq \emptyset$.

If $s \cap s' = \emptyset$, then $y_s$ and $y_{s'}$ are independent, as they depend on disjoint sets of random bits.
Example II: 2-Universal Family contd.

Consider $s \cap s' \neq \emptyset$ and w.l.o.g. assume $\exists\, x_i \in s - s'$.

Since bit $x_i$ is random, $\Pr(y_s = \alpha \mid y_{s'} = \beta) = \frac{1}{2}$ for any $\alpha, \beta \in \{0, 1\}$.

$$\Pr(y_s = \alpha \wedge y_{s'} = \beta) = \Pr(y_s = \alpha \mid y_{s'} = \beta)\,\Pr(y_{s'} = \beta) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

$\implies y_s$ and $y_{s'}$ are independent.

Storage Requirements: A set $X$ of $k$ bits suffices to generate $2^k - 1$ random pairwise independent bits.

Question: Is it a 3-Universal family?
Consider $k = 3$. There are 7 non-empty subsets of the three random bits $\{x_1, x_2, x_3\}$. The bits $y_{\{x_1\}}$ and $y_{\{x_2\}}$ completely determine the bit $y_{\{x_1, x_2\}}$, so these three bits are not mutually independent.
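Both statements about the subset-parity bits can be confirmed by brute force for $k = 3$. The sketch below enumerates all $2^k$ assignments of the random bits; subsets are represented as tuples of positions, so `(0, 1)` stands for $\{x_1, x_2\}$. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product


def subset_parities(bits):
    """Map each non-empty subset of positions to the parity of its bits."""
    k = len(bits)
    return {
        s: sum(bits[i] for i in s) % 2
        for r in range(1, k + 1)
        for s in combinations(range(k), r)
    }


def pairwise_independent(k):
    """True if every pair of distinct parity bits is uniform on {0,1} x {0,1}."""
    outcomes = [subset_parities(bits) for bits in product((0, 1), repeat=k)]
    subsets = list(outcomes[0])
    for s, t in combinations(subsets, 2):
        for alpha, beta in product((0, 1), repeat=2):
            hits = sum(1 for o in outcomes if o[s] == alpha and o[t] == beta)
            if Fraction(hits, len(outcomes)) != Fraction(1, 4):
                return False
    return True


def third_bit_is_determined(k=3):
    """True if the parity of {x1, x2} always equals y_{x1} XOR y_{x2}."""
    outcomes = (subset_parities(bits) for bits in product((0, 1), repeat=k))
    return all(o[(0, 1)] == (o[(0,)] ^ o[(1,)]) for o in outcomes)


print(pairwise_independent(3))     # True
print(third_bit_is_determined())   # True, hence no 3-wise independence
```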
Example III: 2-Universal Family

$U = \{0,1\}^{\log u}$ and index set $I = \{0,1\}^{\log n}$.

The hash function family is the set of random Boolean matrices of dimension $\log n \times \log u$.

For example, for $U = \{0,1\}^6$ and $n = 2^4$, we may have
$$
\begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\pmod{2}
$$

The matrix maps $101100 \in U$ to the index $(1110)_2 = 14$.

Property: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \neq y \in U$.

Space $= |H| = O(\log u \log n)$