Randomness in Computing
LECTURE 16
Last time
• Hashing
• Universal hash families
Today
• Using universal hash families
• Perfect hashing
• Bloom filters
3/24/2020 Sofya Raskhodnikova; Randomness in Computing
Static dictionary problem
Motivating example: a password checker that prevents people from using common passwords.
• $S$ is the set of common passwords.
• Universe: set $U$.
• $S \subseteq U$ and $m = |S|$.
• $m \ll |U|$.
Goal: a data structure for storing $S$ that supports the search query "Does $x \in S$?" for all words $x \in U$.
Solutions
Deterministic solutions
• Store $S$ as a sorted array (or as a binary search tree).
 Search time: $O(\log m)$; space: $O(m)$.
• Store an array that has, for each $x \in U$, a 1 if $x \in S$ and a 0 otherwise.
 Search time: $O(1)$; space: $O(|U|)$.
A randomized solution
• Hashing
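A minimal Python sketch of the two deterministic solutions, for comparison with hashing. The class names are hypothetical; the direct-access version assumes the universe is the integers $\{0, \dots, u-1\}$ and, for simplicity, spends one byte per universe element rather than one bit.

```python
from bisect import bisect_left


class SortedArrayDictionary:
    """Stores S as a sorted array: O(log m) search, O(m) space."""

    def __init__(self, words):
        self.items = sorted(set(words))

    def contains(self, x):
        # Binary search for the leftmost position where x could appear.
        i = bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x


class DirectAccessDictionary:
    """Stores one flag per universe element: O(1) search, O(|U|) space.

    Assumes the universe U is {0, 1, ..., universe_size - 1}.
    """

    def __init__(self, universe_size, keys):
        self.flags = bytearray(universe_size)  # one byte per element, all 0
        for y in keys:
            self.flags[y] = 1

    def contains(self, x):
        return self.flags[x] == 1


common = SortedArrayDictionary(["123456", "password", "qwerty"])
print(common.contains("password"), common.contains("correct horse"))  # prints: True False
```

The sorted array pays $O(\log m)$ per query; the direct-access array answers in $O(1)$ but needs space proportional to $|U|$, which is prohibitive when $m \ll |U|$.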
Chain hashing
• Hash table: $n$ bins; words that fall into the same bin are chained into a linked list.
• Hash function: $h: U \to [n]$.
• To construct the table, hash all elements of $S$.
• To search for word $x$, check whether $x$ is in bin $h(x)$.
[Figure: elements of $S$ hashed into bins $1, 2, \dots, n$.]
Desiderata for $h$:
• $O(1)$ evaluation time.
• $O(1)$ space to store $h$.
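A minimal sketch of chain hashing in Python, assuming integer keys. The class name is hypothetical, Python lists stand in for linked lists, and bins are numbered $0, \dots, n-1$ rather than $1, \dots, n$. The fixed function $h(x) = x \bmod 7$ in the usage lines is only a placeholder chosen to make collisions visible; it is not drawn from any family and carries none of the guarantees discussed in this lecture.

```python
class ChainedHashTable:
    """Chain hashing: n bins, each holding a list of the keys that hash to it."""

    def __init__(self, keys, n, h):
        self.h = h                             # h maps keys to {0, ..., n-1}
        self.bins = [[] for _ in range(n)]
        for y in keys:                         # construction: hash every key of S
            self.bins[h(y)].append(y)

    def contains(self, x):
        # Search only the chain stored in bin h(x).
        return x in self.bins[self.h(x)]


# Illustrative, deliberately non-random hash function h(x) = x mod 7.
table = ChainedHashTable([3, 10, 17, 25], 7, lambda x: x % 7)
print(table.bins[3])                           # prints [3, 10, 17]
print(table.contains(10), table.contains(24))  # prints True False
```

A search costs $O(1)$ for evaluating $h$ plus time proportional to the length of one chain, which is why the number of collisions is what the analysis has to control.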
Universal hash family
• A set $\mathcal{H}$ of hash functions is universal if for every pair of distinct $x_1, x_2 \in U$, when $h$ is chosen uniformly from $\mathcal{H}$,
 $\Pr[h(x_1) = h(x_2)] \le \frac{1}{n}$.
Constructing a universal hash family
• Fix a prime $p \ge |U|$, identify $U$ with a subset of $\{0, 1, \dots, p-1\}$, and think of the range as $\{0, 1, \dots, n-1\}$.
• Define $h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$ and
 $\mathcal{H} = \{h_{a,b} \mid a \in [p-1],\ 0 \le b \le p-1\}$, where $[p-1] = \{1, \dots, p-1\}$.
Theorem: $\mathcal{H}$ is universal.
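A small Python sketch of the construction above, plus an exhaustive check of the universality bound for one tiny prime. The function names and the parameters $p = 13$, $n = 4$ are illustrative choices. For each pair $x_1 \ne x_2$ the check counts how many of the $|\mathcal{H}| = p(p-1)$ functions make them collide and confirms that the count is at most $|\mathcal{H}|/n$.

```python
import random


def random_universal_hash(p, n, rng=random):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod n uniformly from the family H.

    Assumes p is prime, p >= |U|, and U is identified with {0, ..., p-1}.
    """
    a = rng.randint(1, p - 1)   # a in {1, ..., p-1}
    b = rng.randint(0, p - 1)   # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % n


def colliding_functions(p, n, x1, x2):
    """Count the functions h_{a,b} in H with h_{a,b}(x1) == h_{a,b}(x2)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x1 + b) % p) % n == ((a * x2 + b) % p) % n
    )


# Exhaustive check for a small prime: for every pair of distinct x1, x2,
# at most |H| / n = p(p-1)/n of the functions make them collide.
p, n = 13, 4
assert all(
    colliding_functions(p, n, x1, x2) * n <= p * (p - 1)
    for x1 in range(p)
    for x2 in range(x1 + 1, p)
)

h = random_universal_hash(p, n, random.Random(2020))
print([h(x) for x in range(p)])  # bin indices, each in {0, 1, 2, 3}
```

A brute-force check for one small $p$ is only a sanity test of the theorem, not a proof. Note that storing $h_{a,b}$ takes just the two numbers $a$ and $b$, and evaluating it takes $O(1)$ arithmetic operations, matching the desiderata for $h$.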
Using a universal family
As before:
• If $x \notin S$, the expected number of words in bin $h(x)$ is at most $\frac{m}{n}$.
• If $x \in S$, the expected number of words in bin $h(x)$ is at most $1 + \frac{m-1}{n}$.
The previous guarantee on the maximum load no longer holds! (Universality only bounds the collision probability of each pair.)
Goal: given $S$, find a hash function with no collisions on the words in $S$.
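Both expectation bounds follow from linearity of expectation and universality; a short derivation:

```latex
% x \notin S: each of the m elements y \in S collides with x w.p. at most 1/n
\mathbb{E}\big[\,|\{y \in S : h(y) = h(x)\}|\,\big]
  = \sum_{y \in S} \Pr[h(y) = h(x)] \le \frac{m}{n}

% x \in S: x itself is in bin h(x); the other m - 1 elements collide w.p. at most 1/n
\mathbb{E}\big[\,|\{y \in S : h(y) = h(x)\}|\,\big]
  = 1 + \sum_{y \in S,\, y \ne x} \Pr[h(y) = h(x)] \le 1 + \frac{m-1}{n}
```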
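Perfect hashing: no collisions
A hash function $h$ is perfect for $S$ if it has no collisions on $S$.
Theorem: If $h: U \to \{0, 1, \dots, n-1\}$ is chosen uniformly at random from a universal hash family, then for every $S$ of size $m$ with $n \ge m^2$, $\Pr[h \text{ is perfect for } S] \ge \frac{1}{2}$.
Proof: Let $s_1, \dots, s_m$ be the elements of $S$, and let $X$ be the number of collisions, that is, pairs $i < j$ with $h(s_i) = h(s_j)$. By linearity of expectation and universality,
 $\mathbb{E}[X] = \sum_{i<j} \Pr[h(s_i) = h(s_j)] \le \binom{m}{2} \frac{1}{n} < \frac{m^2}{2n} \le \frac{1}{2}$.
By Markov's inequality, $\Pr[h \text{ is not perfect}] = \Pr[X \ge 1] \le \mathbb{E}[X] < \frac{1}{2}$.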
Perfect hashing
• Select $h \in \mathcal{H}$ until a perfect $h$ is found.
• Each try succeeds independently with probability at least $\frac{1}{2}$, so the expected number of tries is at most 2.
• Each try takes $O(m)$ time.
• Drawback: $\Omega(m^2)$ space.
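A Python sketch of this retry loop, assuming distinct integer keys from $\{0, \dots, p-1\}$ and the family $h_{a,b}$ defined earlier. The helper names and the prime $p = 1009$ in the usage lines are illustrative. Collisions are detected with a Python set, which is one way to keep each try at expected $O(m)$ time without allocating an array of $m^2$ bins.

```python
import random


def random_universal_hash(p, n, rng):
    """h_{a,b}(x) = ((a*x + b) mod p) mod n with a in {1..p-1}, b in {0..p-1}."""
    a = rng.randint(1, p - 1)
    b = rng.randint(0, p - 1)
    return lambda x: ((a * x + b) % p) % n


def is_perfect(h, keys):
    """True iff h sends the keys to pairwise distinct bins."""
    seen = set()
    for y in keys:
        v = h(y)
        if v in seen:
            return False
        seen.add(v)
    return True


def find_perfect_hash(keys, p, rng=None):
    """Resample h from H with n = m^2 bins until h is perfect for keys.

    Each try succeeds with probability >= 1/2, so the expected number of
    tries is at most 2.
    """
    if rng is None:
        rng = random.Random()
    n = len(keys) ** 2
    tries = 0
    while True:
        tries += 1
        h = random_universal_hash(p, n, rng)
        if is_perfect(h, keys):
            return h, n, tries


keys = [3, 17, 42, 99, 123, 500]          # distinct elements of {0, ..., p-1}
h, n, tries = find_perfect_hash(keys, 1009, random.Random(1))
print(n, sorted(h(y) for y in keys), tries)
```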
2-level scheme for perfect hashing
• Set $n = m$.
• Select $h \in \mathcal{H}$ until an $h$ with at most $m$ collisions is found.
• For each bin $i$ with collisions, that is, with $k_i > 1$ items:
 – select a new hash function $h_i$ with $k_i^2$ bins from a universal family until $h_i$ has no collisions.
[Figure: first-level table with bins $0, 1, 2, \dots, n-1$.]
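A Python sketch of the two stages, assuming distinct integer keys from $\{0, \dots, p-1\}$ and the family $h_{a,b}$ from before. The class and method names are hypothetical; `second_level_size` is included only so the total second-level space can be inspected.

```python
import random


def random_universal_hash(p, n, rng):
    """h_{a,b}(x) = ((a*x + b) mod p) mod n, drawn uniformly from H."""
    a = rng.randint(1, p - 1)
    b = rng.randint(0, p - 1)
    return lambda x: ((a * x + b) % p) % n


def count_collisions(h, keys):
    """Number of unordered pairs of keys that h sends to the same bin."""
    loads = {}
    for y in keys:
        v = h(y)
        loads[v] = loads.get(v, 0) + 1
    return sum(k * (k - 1) // 2 for k in loads.values())


class TwoLevelTable:
    """2-level perfect hashing for distinct integer keys in {0, ..., p-1}."""

    def __init__(self, keys, p, rng=None):
        rng = rng if rng is not None else random.Random()
        m = len(keys)
        n = max(m, 1)                     # Stage 1: n = m bins
        while True:                       # retry until at most m collisions
            self.h = random_universal_hash(p, n, rng)
            if count_collisions(self.h, keys) <= m:
                break
        buckets = [[] for _ in range(n)]
        for y in keys:
            buckets[self.h(y)].append(y)
        # Stage 2: a bin with k items gets a k^2-slot table and a new h_i,
        # redrawn until it has no collisions. (For k = 1 any h_i works; the
        # 1-slot table just keeps the code uniform.)
        self.second = []
        for bucket in buckets:
            k = len(bucket)
            if k == 0:
                self.second.append(None)
                continue
            while True:
                g = random_universal_hash(p, k * k, rng)
                slots = [None] * (k * k)
                for y in bucket:
                    if slots[g(y)] is not None:
                        break             # collision: redraw g
                    slots[g(y)] = y
                else:
                    break                 # no collision: keep g
            self.second.append((g, slots))

    def contains(self, x):
        # Two hash evaluations and one comparison: O(1) worst case.
        entry = self.second[self.h(x)]
        if entry is None:
            return False
        g, slots = entry
        return slots[g(x)] == x

    def second_level_size(self):
        return sum(len(e[1]) for e in self.second if e is not None)


keys = list(range(0, 2000, 37))           # 55 distinct keys
table = TwoLevelTable(keys, 10007, random.Random(7))
print(all(table.contains(y) for y in keys), table.contains(1))
print(table.second_level_size(), "slots for", len(keys), "keys")
```

The search path never loops: whatever the input, a query evaluates $h$, then at most one $h_i$, and compares against a single slot.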
Analysis of 2-level scheme
Theorem: The 2-level scheme achieves perfect hashing with $O(m)$ space.
Proof:
• Let $X$ be the number of collisions in Stage 1.
• We showed before: $\Pr\left[X > \frac{m^2}{n}\right] \le \frac{1}{2}$.
• Now $n = m$: $\Pr[X > m] \le \frac{1}{2}$.
• So at least half of the functions $h \in \mathcal{H}$ have at most $m$ collisions.
• Assume we found such an $h$.
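Analysis of 2-level scheme (continued)
Proof (continued):
• Let $k_i$ be the number of items in bin $i$ after Stage 1. Then $X = \sum_i \binom{k_i}{2} \le m$.
• Bin $i$ gets a second-level table with $k_i^2$ bins, so by the perfect-hashing theorem each choice of $h_i$ is perfect with probability at least $\frac{1}{2}$; in expectation it is found within 2 tries.
• The total size of the second-level tables is at most $\sum_i k_i^2 = \sum_i \left(2\binom{k_i}{2} + k_i\right) = 2X + m \le 3m$.
• Adding the first-level table ($n = m$ bins) and $O(1)$ space for each of the at most $m + 1$ hash functions gives $O(m)$ space in total.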
Conclusion: 2-level hashing
A solution for the static dictionary problem with:
• $O(1)$ worst-case search time.
• $O(m)$ space.
• Expected $O(m)$ preprocessing time.
Approximate solutions for the static dictionary problem
• False positives: if $x \in S$, the data structure must answer correctly; if $x \notin S$, it may err with small probability.
• E.g., we block all unsuitable passwords and, occasionally, some suitable ones too.
Fingerprints
• Use a hash function $h$.
• Store a sorted list $L$ of fingerprints $h(y)$, $y \in S$.
• To check whether $x \in S$, perform binary search for $h(x)$ in $L$.
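A Python sketch of the fingerprint approach. The choice of $h$ here is an assumption not made in the slides: SHA-256 truncated to `bits` bits, used as a practical stand-in for a random hash function. The class and function names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def fingerprint(word, bits=32):
    """Top `bits` bits of SHA-256(word): a stand-in for a random hash h."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


class FingerprintChecker:
    """Stores only the sorted fingerprints h(y), y in S (about m * bits bits)."""

    def __init__(self, words, bits=32):
        self.bits = bits
        self.fingerprints = sorted(fingerprint(y, bits) for y in words)

    def might_contain(self, x):
        # Binary search for h(x). Members are always found; a non-member is
        # accepted only if its fingerprint equals some stored one.
        f = fingerprint(x, self.bits)
        i = bisect_left(self.fingerprints, f)
        return i < len(self.fingerprints) and self.fingerprints[i] == f


checker = FingerprintChecker(["123456", "password", "qwerty", "letmein"])
print(checker.might_contain("password"))   # prints True
```

Shorter fingerprints save space but make false positives more likely, since a non-member is wrongly accepted whenever its fingerprint matches a stored one.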
Bloom filters
• Trade-off between space and false-positive probability.
• Parameters $k$, $n$.
• Bloom filter: array of $n$ bits $A[1], \dots, A[n]$.
 – Initially, all bits are 0.
 – $k$ independent random hash functions $h_1, \dots, h_k$ with range $[n]$.
• To represent set $S$:
 – For each $y \in S$ and $i \in [k]$, set bit $A[h_i(y)]$ to 1.
• To search for $x$:
 – If $A[h_i(x)] = 1$ for all $i \in [k]$, accept; otherwise, reject.
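A minimal Python sketch of the Bloom filter operations above. How the $k$ hash functions are obtained (salting SHA-256 with the index $i$) is an assumption for illustration; the slides call for $k$ independent random hash functions. Bits are indexed $0, \dots, n-1$ instead of $1, \dots, n$, and a Python list of ints stands in for a packed bit array.

```python
import hashlib


class BloomFilter:
    """Bloom filter: n bits A[0..n-1] and k hash functions h_1, ..., h_k.

    Assumption: h_i(x) is derived from SHA-256 of "i:x", a practical stand-in
    for k independent random hash functions with range {0, ..., n-1}.
    """

    def __init__(self, n, k):
        self.n, self.k = n, k
        self.bits = [0] * n                 # initially, all bits are 0

    def _positions(self, word):
        for i in range(1, self.k + 1):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, word):
        for j in self._positions(word):     # set A[h_i(y)] = 1 for all i
            self.bits[j] = 1

    def might_contain(self, word):
        # Accept iff A[h_i(x)] = 1 for all i in [k]; otherwise reject.
        return all(self.bits[j] == 1 for j in self._positions(word))


bf = BloomFilter(n=1024, k=4)
for y in ["123456", "password", "qwerty", "letmein"]:
    bf.add(y)
print(bf.might_contain("password"))        # prints True
```

Members of $S$ are always accepted. A non-member is accepted only if all $k$ of its positions happen to have been set by members, which is where false positives come from.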