15-853:Algorithms in the Real World Announcements: • HW2 due tomorrow noon. • Small correction made in the BWT question. • Naama’s office hour cancelled. Francisco holding additional office hours instead. 15-853 Page 1
15-853:Algorithms in the Real World Announcements: • Plan for the coming week: • I am away at ACM SOSP 2019 • Graph compression guest lecture on Oct 29 by Laxman Dhulipala • Cryptography-1 guest lecture on Oct 31 by Francisco Maturana • There will be a homework on Hashing + Cryptography modules by the end of first week of November 15-853 Page 2
15-853:Algorithms in the Real World Announcements: Course project: • 2-3 people teams • 3 types of projects • Survey of a topic: At least 2 papers per team member (state-of-the-art papers; can include surveys) • Read papers (at least 3) + lightweight “research-y” stuff (potentially implementation and comparison etc.) • Full-fledged research: typically based on one paper and addressing a research question 15-853 Page 3
15-853:Algorithms in the Real World Announcements: Course project: • By Friday Nov 8, team and project plan (which papers, what question etc.) should be finalized • Share through one Google doc per team • Use the class email list: • 15853f19-students@lists.andrew.cmu.edu • with subject beginning “project-team-finding” to ping your classmates to form teams 15-853 Page 4
Ideas for project topics ECC: • Coding for distributed storage systems (at least 2 potential project topics here) • Several additional metrics become important such as “reconstruction locality”, “reconstruction bandwidth” • Several new classes of codes have been proposed as alternatives to Reed-Solomon codes, e.g., • Local reconstruction codes • Regenerating codes • Piggyback codes • Some employed in Microsoft Azure cloud storage, some in Apache Hadoop Distributed File System, some in Ceph, etc. 15-853 Page 5
Ideas for project topics ECC (cont.) • Coding for latency sensitive streaming communication (at least 1 potential project topic here) • Sequential encoding and decoding • Strict latency constraints • A new class of codes called “streaming codes” 15-853 Page 6
Ideas for project topics Compression: • Quantization in neural networks • DNA compression • Latest compression algorithm Zstd developed by Facebook 15-853 Page 7
Ideas for project topics Hashing: • Several network applications − Used for network monitoring − Sketching using hashing 15-853 Page 8
15-853:Algorithms in the Real World Hashing: Concentration bounds Load balancing: balls and bins Hash functions (cont.) 15-853 Page 9
Recall: Hashing Concrete running application for this module: dictionary . Setting: • A large universe of keys (e.g., set of all strings of certain length): denoted by U • The actual dictionary S (subset of U) • Let |S| = N (typically N << |U|) Operations: • add(x): add a key x • query(q): is key q there? • delete(x): remove the key x 15-853 Page 10
Recall: Hashing “.... with high probability there are not too many collisions among elements of S” Over what is this probability calculated? Two approaches: 1. Input is random 2. Input is arbitrary, but the hash function is random Assuming a random input is not valid for many applications. So we will use 2. • We will assume a family of hash functions H. • When it is time to hash S, we choose a random function h ∈ H 15-853 Page 11
Recall: Hashing: Desired properties Let [M] = {0, 1, ..., M-1} We design a hash function h: U -> [M] 1. Small probability of distinct keys colliding: if x≠y ∈ S, P[h(x) = h(y)] is “small” 2. Small range, i.e., small M so that the hash table is small 3. Small number of bits to store h 4. h is easy to compute 15-853 Page 12
Recall: Ideal Hash Function Perfectly random hash function: For each x ∈ S, h(x) = a uniformly random location in [M] Properties: • Low collision probability: P[h(x) = h(y)] = 1/M for any x≠y • Even conditioned on hashed values for any other subset A of S, for any element x ∈ S, h(x) is still uniformly random over [M] 15-853 Page 13
Recall: Universal Hash functions Captures the basic property of non-collision. Due to Carter and Wegman (1979) Definition: A family H of hash functions mapping U to [M] is universal if for any x≠y ∈ U, P[h(x) = h(y)] ≤ 1/M Note: Must hold for every pair of distinct x and y ∈ U. 15-853 Page 14
Recall: Universal Hash functions A simple construction of universal hashing: Assume |U| = 2^u and M = 2^m. Let A be an m x u matrix with random binary entries. For any x ∈ U, view it as a u-bit binary vector, and define h(x) := Ax where the arithmetic is modulo 2. Theorem. The family of hash functions defined above is universal. 15-853 Page 15
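The theorem can be checked by brute force for tiny parameters. The following is a minimal sketch, not a proof: it enumerates every m x u binary matrix and computes the worst collision probability over all pairs of distinct keys. The names matrix_hash and worst_collision_probability are illustrative, and keys and matrix rows are both assumed to be encoded as u-bit integers.

```python
from fractions import Fraction
from itertools import product


def matrix_hash(rows, x):
    """Compute Ax over GF(2); rows[i] is row i of A packed into a u-bit integer."""
    out = 0
    for row in rows:
        # Inner product mod 2 = parity of the bitwise AND of the row and the key.
        parity = bin(row & x).count("1") % 2
        out = (out << 1) | parity
    return out


def worst_collision_probability(u, m):
    """Max over distinct x, y of P[h(x) = h(y)], with A uniform over all m x u matrices."""
    matrices = list(product(range(2 ** u), repeat=m))
    worst = Fraction(0)
    for x in range(2 ** u):
        for y in range(x + 1, 2 ** u):
            hits = sum(matrix_hash(A, x) == matrix_hash(A, y) for A in matrices)
            worst = max(worst, Fraction(hits, len(matrices)))
    return worst
```

For u = 3 and m = 2 (so M = 4) the worst case comes out to exactly 1/M, matching the universality bound.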
Recall: Addressing collisions in hash table One of the main applications of hash functions is in hash tables (for dictionary data structures) Handling collisions: Closed addressing Each location maintains some other data structure One approach: “ separate chaining ” Each location in the table stores a linked list with all the elements mapped to that location. Lookup time = length of the linked list To understand lookup time, we need to study the number of collisions. 15-853 Page 16
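A minimal sketch of separate chaining supporting the add / query / delete operations of the dictionary. Python lists stand in for the linked lists, and the salted built-in hash is only an illustrative stand-in for a randomly chosen h; it is an assumption of this sketch, not a universal family from the lecture. The class and method names are hypothetical.

```python
import random


class ChainedHashTable:
    """Dictionary (set of keys) with separate chaining."""

    def __init__(self, num_buckets, seed=0):
        self.num_buckets = num_buckets
        # Random salt, so that the hash function is chosen at construction time.
        self._salt = random.Random(seed).getrandbits(64)
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Stand-in for h(key); NOT one of the universal families from the slides.
        return hash((self._salt, key)) % self.num_buckets

    def add(self, key):
        chain = self.buckets[self._index(key)]
        if key not in chain:  # cost proportional to the chain length
            chain.append(key)

    def query(self, key):
        return key in self.buckets[self._index(key)]

    def delete(self, key):
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
```

Every operation scans a single chain, which is why the lookup time is governed by the number of collisions at that location.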
Recall: Addressing collisions in hash table Let us study the number of collisions: Let C(x) be the number of other elements mapped to the value where x is mapped to. Q: What is E[C(x)] ? E[C(x)] = (N-1)/M Hence if we use M = N = |S|, lookups take constant time in expectation . Item deletion is also easy. Let C = total number of collisions Q: What is E[C] ? (N choose 2) · 1/M = N(N-1)/(2M) 15-853 Page 17
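Both expectations follow from linearity of expectation. A sketch of the derivation, assuming h is drawn from a universal family (so each pair of distinct keys collides with probability at most 1/M) and writing X_{xy} for the indicator that x and y collide (a symbol introduced here, not in the slides):

```latex
\begin{align*}
E[C(x)] &= \sum_{y \in S,\; y \neq x} P[h(x) = h(y)] \;\le\; \frac{N-1}{M}, \\
E[C] &= \sum_{\{x,y\} \subseteq S,\; x \neq y} E[X_{xy}] \;\le\; \binom{N}{2} \cdot \frac{1}{M} \;=\; \frac{N(N-1)}{2M}.
\end{align*}
```

With M = N this gives E[C(x)] < 1 and E[C] < N/2. For a perfectly random hash function both inequalities hold with equality.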
Recall: Addressing collisions in hash table Can we design a collision-free hash table? Suppose we choose M >= N^2 Q: P[there exists a collision] = ? ≤ ½ Can easily find a collision-free hash table! Constant lookup time for all elements! (worst-case guarantee) But this is a large space requirement. (Space measured in terms of number of keys) Can we do better? O(N)? (while providing worst-case guarantee?) 15-853 Page 18
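Since a random h from a universal family is collision-free on S with probability at least 1/2 when M >= N^2, simply redrawing h until it works takes at most 2 tries in expectation. A minimal sketch, assuming distinct keys that fit in u bits and using the random binary matrix family h(x) = Ax; the function names are illustrative, and M is rounded up to a power of two because that family needs M = 2^m.

```python
import random


def matrix_hash(rows, x):
    """Compute Ax over GF(2); rows[i] is row i of A packed into a u-bit integer."""
    out = 0
    for row in rows:
        out = (out << 1) | (bin(row & x).count("1") % 2)
    return out


def find_collision_free_hash(keys, u, rng):
    """Redraw a random m x u matrix until it is injective on keys.

    Returns (rows, m, attempts), where 2**m >= len(keys)**2.
    Assumes the keys are distinct integers in [0, 2**u).
    """
    keys = list(keys)
    n = len(keys)
    m = max(1, (n * n - 1).bit_length())  # smallest m >= 1 with 2**m >= n*n
    attempts = 0
    while True:
        attempts += 1
        rows = [rng.getrandbits(u) for _ in range(m)]
        if len({matrix_hash(rows, x) for x in keys}) == n:
            return rows, m, attempts
```

The returned matrix gives worst-case constant-time lookups, at the cost of a table with 2^m >= N^2 slots.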
Application: Perfect hashing Handling collisions via “ two-level hashing ” First level hash table has size O(N) Each location in the hash table performs a collision-free hashing Let C(i) = number of elements mapped to location i in the first level table Q: For the second level table, what should the table size at location i be? C(i)^2 (We know that for this size, we can find a collision-free hash function) 15-853 Page 19
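Application: Perfect hashing Q: What is the total table space used in the second level? The sum over all locations i of C(i)^2, which is O(N) in expectation when the first level has size O(N) Q: What is the total table space? O(N) Collision-free and O(N) table space! 15-853 Page 20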
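The two-level scheme can be sketched as follows. This is an illustrative implementation under stated assumptions, not the lecture's reference code: keys are integers that fit in u bits, both levels use the random binary matrix family h(x) = Ax (so table sizes are rounded up to powers of two), and the first level is redrawn until the sum of C(i)^2 is at most 4N, which by Markov's inequality succeeds with probability above 1/2. All names are hypothetical.

```python
import random


def matrix_hash(rows, x):
    """Compute Ax over GF(2); rows[i] is row i of A packed into a u-bit integer."""
    out = 0
    for row in rows:
        out = (out << 1) | (bin(row & x).count("1") % 2)
    return out


def bits_for(size):
    """Smallest m >= 0 with 2**m >= size (for size >= 1)."""
    return (size - 1).bit_length()


class PerfectHashTable:
    def __init__(self, keys, u, seed=0):
        rng = random.Random(seed)
        keys = sorted(set(keys))
        n = max(len(keys), 1)
        m1 = bits_for(n)
        # First level: about N buckets, redrawn until the sum of C(i)^2 is O(N).
        while True:
            self.top = [rng.getrandbits(u) for _ in range(m1)]
            buckets = [[] for _ in range(2 ** m1)]
            for k in keys:
                buckets[matrix_hash(self.top, k)].append(k)
            if sum(len(b) ** 2 for b in buckets) <= 4 * n:
                break
        # Second level: bucket i gets a collision-free table of size >= C(i)^2.
        self.second = [self._build_bucket(b, u, rng) for b in buckets]

    @staticmethod
    def _build_bucket(bucket, u, rng):
        m2 = bits_for(max(len(bucket) ** 2, 1))
        while True:
            rows = [rng.getrandbits(u) for _ in range(m2)]
            slots = [None] * (2 ** m2)
            for k in bucket:
                i = matrix_hash(rows, k)
                if slots[i] is not None:
                    break  # collision: redraw this bucket's hash function
                slots[i] = k
            else:
                return rows, slots

    def query(self, key):
        rows, slots = self.second[matrix_hash(self.top, key)]
        return slots[matrix_hash(rows, key)] == key

    def space(self):
        """Total number of second-level slots."""
        return sum(len(slots) for _, slots in self.second)
```

Each query evaluates exactly two hash functions, so lookups are worst-case constant time, and in this sketch the total number of slots stays below 10N because of the power-of-two rounding.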
k-wise independent hash functions In addition to universality, certain independence properties of hash functions are useful in the analysis of algorithms Definition. A family H of hash functions mapping U to [M] is called k-wise independent if for any k distinct keys x_1, ..., x_k ∈ U and any k values a_1, ..., a_k ∈ [M] we have P[h(x_1) = a_1 ∧ ... ∧ h(x_k) = a_k] = 1/M^k The case k=2 is called “pairwise independent”. 15-853 Page 21
k-wise independent hash functions Properties: Suppose H is a k-wise independent family for k>=2. Then 1. H is also (k-1)-wise independent. 2. For any x ∈ U and a ∈ [M], P[h(x) = a] = 1/M. 3. H is universal. Q: Which is stronger: pairwise independent or universal? Pairwise independent is stronger. E.g.? The h(x) = Ax construction is universal but not pairwise independent, since P[h(0) = 0] = 1 15-853 Page 22
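Property 3 follows in one line once pairwise independence is written out. A short derivation, assuming the usual definition that P[h(x) = a ∧ h(y) = a'] = 1/M^2 for all distinct x, y ∈ U and all a, a' ∈ [M]:

```latex
\[
P[h(x) = h(y)]
  \;=\; \sum_{a \in [M]} P[h(x) = a \,\wedge\, h(y) = a]
  \;=\; \sum_{a \in [M]} \frac{1}{M^2}
  \;=\; \frac{1}{M}.
\]
```

The converse fails: universality only constrains the collision probability, so it cannot rule out a key whose hash value is fixed, as h(0) = 0 is under h(x) = Ax.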
Some constructions: 2-wise independent Construction 1 (variant of random matrix multiplication): Let A be an m x u matrix with uniformly random binary entries. Let b be an m-bit vector with uniformly random binary entries. h(x) := Ax + b where the arithmetic is modulo 2. Claim. This family of hash functions is 2-wise independent. Q: How many hash functions are in this family? 2^((u+1)m) Q: Number of bits to store? O(um) Can we do with fewer bits? 15-853 Page 23
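The claim, and the contrast with the plain h(x) = Ax family, can be checked exhaustively for tiny parameters. A minimal sketch (a check, not a proof): each hash function is materialised as a tuple of its values on the whole universe, and the checker compares every joint probability against 1/M^2. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def matrix_hash(rows, x):
    """Compute Ax over GF(2); rows[i] is row i of A packed into a u-bit integer."""
    out = 0
    for row in rows:
        out = (out << 1) | (bin(row & x).count("1") % 2)
    return out


def linear_family(u, m):
    """Every h(x) = Ax, as a tuple of hash values indexed by x."""
    for rows in product(range(2 ** u), repeat=m):
        yield tuple(matrix_hash(rows, x) for x in range(2 ** u))


def affine_family(u, m):
    """Every h(x) = Ax + b (mod 2), as a tuple of hash values indexed by x."""
    for rows in product(range(2 ** u), repeat=m):
        for b in range(2 ** m):  # adding b mod 2 is a bitwise XOR
            yield tuple(matrix_hash(rows, x) ^ b for x in range(2 ** u))


def is_pairwise_independent(family, universe_size, M):
    """True iff P[h(x) = a1 and h(y) = a2] = 1/M^2 for all x != y and all a1, a2."""
    tables = list(family)
    target = Fraction(1, M * M)
    for x in range(universe_size):
        for y in range(x + 1, universe_size):
            counts = {}
            for t in tables:
                counts[(t[x], t[y])] = counts.get((t[x], t[y]), 0) + 1
            for pair in product(range(M), repeat=2):
                if Fraction(counts.get(pair, 0), len(tables)) != target:
                    return False
    return True
```

For u = 3 and m = 2 the affine family has 2^((u+1)m) = 256 members and passes the check, while the linear family fails it because h(0) = 0 for every A.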
Some constructions: 2-wise independent Construction 2 (Using fewer bits): Let A be an m x u matrix. • Fill the first row and column with uniformly random binary entries. • Set A_{i,j} = A_{i-1,j-1} Let b be an m-bit vector with uniformly random binary entries. h(x) := Ax + b where the arithmetic is modulo 2. Claim. This family of hash functions is 2-wise independent. (HW) 15-853 Page 24
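A matrix that is constant along its diagonals is a Toeplitz matrix, so A is fixed by u + m - 1 random bits and the whole hash function by u + 2m - 1 bits, instead of (u+1)m. A minimal sketch of the construction only (it does not address the homework claim); the function names are illustrative, and bit j of the integer key x is taken as coordinate j of the vector.

```python
import random


def toeplitz_matrix(first_row, first_col):
    """Build the m x u matrix with A[i][j] = A[i-1][j-1] from its first row and column."""
    assert first_row[0] == first_col[0], "row and column share the corner entry"
    m, u = len(first_col), len(first_row)
    return [
        [first_row[j - i] if j >= i else first_col[i - j] for j in range(u)]
        for i in range(m)
    ]


def make_toeplitz_hash(u, m, rng):
    """Draw h(x) = Ax + b (mod 2) with A Toeplitz, using u + 2m - 1 random bits."""
    seed_bits = [rng.randrange(2) for _ in range(u + m - 1)]
    first_row = seed_bits[:u]
    first_col = [seed_bits[0]] + seed_bits[u:]
    A = toeplitz_matrix(first_row, first_col)
    b = [rng.randrange(2) for _ in range(m)]

    def h(x):
        x_bits = [(x >> j) & 1 for j in range(u)]
        return tuple(
            (sum(a * xb for a, xb in zip(row, x_bits)) + b_i) % 2
            for row, b_i in zip(A, b)
        )

    return h, A, b
```

For example, with u = 32 and m = 16 this stores 63 bits in place of the 528 bits needed by Construction 1.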
Some constructions: 2-wise independent Construction 3 (Using finite fields) Consider GF(2^u). Pick two random numbers a, b ∈ GF(2^u). For any x ∈ U, define h(x) := ax + b where the calculations are done over the field GF(2^u). Q: What is the domain and range of this mapping? U to U Q: Is it 2-wise independent? Yes (write as a matrix and invert) <board> 15-853 Page 25
Some constructions: 2-wise independent Construction 3 (Using finite fields) Consider GF(2^u). Pick two random numbers a, b ∈ GF(2^u). For any x ∈ U, define h(x) := ax + b where the calculations are done over the field GF(2^u). Q: What is the domain and range of this mapping? U to U Q: Is it 2-wise independent? Yes Q: How to change the range to [M]? Truncate the last u-m bits. It is still 2-wise independent. 15-853 Page 26
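A small worked instance of Construction 3. This sketch fixes u = 4 and represents GF(2^4) as polynomials over GF(2) modulo the irreducible polynomial x^4 + x + 1; that choice of field representation, and all names below, are assumptions of the example rather than anything specified in the slides. It then verifies by exhaustive enumeration that the truncated family is pairwise independent.

```python
U_BITS = 4
MODULUS = 0b10011  # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^4), each given as a 4-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^u) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> U_BITS:      # degree reached 4: reduce modulo the polynomial
            a ^= MODULUS
    return result


def field_hash(a, b, x, m):
    """h(x) = ax + b over GF(2^4), dropping the last u - m bits."""
    return (gf_mul(a, x) ^ b) >> (U_BITS - m)


def check_pairwise_independence(m):
    """True iff, over uniform (a, b), (h(x), h(y)) is uniform on [M]^2 for all x != y."""
    size, M = 2 ** U_BITS, 2 ** m
    expected = (size * size) // (M * M)
    for x in range(size):
        for y in range(x + 1, size):
            counts = {}
            for a in range(size):
                for b in range(size):
                    pair = (field_hash(a, b, x, m), field_hash(a, b, y, m))
                    counts[pair] = counts.get(pair, 0) + 1
            if len(counts) != M * M or any(c != expected for c in counts.values()):
                return False
    return True
```

The function is described by just the two field elements a and b, that is 2u bits, and the check passes both for the full range (m = 4) and after truncation to m = 2.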