Randomized QuickSort: High Probability Analysis

If there are $k$ levels of recursion, then there are at most $kn$ comparisons.

Fix an element $s \in A$. We will track it at each level.
Let $S_i$ be the partition containing $s$ at the $i$-th level, so $S_1 = A$ and $S_k = \{s\}$.
We call $s$ lucky in the $i$-th iteration if the split is balanced: $|S_{i+1}| \le (3/4)|S_i|$ and $|S_i \setminus S_{i+1}| \le (3/4)|S_i|$.
If $\rho$ = number of lucky rounds in the first $k$ rounds, then $|S_k| \le (3/4)^{\rho} n$.
For $|S_k| = 1$, $\rho = 4 \ln n \ge \log_{4/3} n$ suffices.

Chandra & Ruta (UIUC) CS473 Fall 2016

How many rounds before $4 \ln n$ lucky rounds?

Let $X_i = 1$ if $s$ is lucky in the $i$-th iteration, and $X_i = 0$ otherwise.
Observation: $X_1, \ldots, X_k$ are independent variables, and $\Pr[X_i = 1] = 1/2$. Why?
Clearly, $\rho = \sum_{i=1}^{k} X_i$. Let $\mu = E[\rho] = k/2$.
Set $k = 32 \ln n$ and $\delta = 3/4$, so $(1 - \delta) = 1/4$.
The probability of at most $4 \ln n$ lucky rounds out of $32 \ln n$ rounds is

$$\Pr[\rho \le 4 \ln n] = \Pr[\rho \le k/8] = \Pr[\rho \le (1 - \delta)\mu] \le e^{-\delta^2 \mu / 2} = e^{-9k/64} = e^{-4.5 \ln n} \le \frac{1}{n^4},$$

where the first inequality is the Chernoff bound.

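The arithmetic in the Chernoff step can be checked numerically. The following is a small sketch, not part of the original slides; the function name `lucky_round_tail_bound` and the dictionary keys are illustrative assumptions.

```python
import math

def lucky_round_tail_bound(n):
    """Evaluate the quantities used in the Chernoff step for a given n."""
    k = 32 * math.log(n)            # number of rounds tracked
    mu = k / 2                      # E[rho], since Pr[X_i = 1] = 1/2
    delta = 3 / 4
    threshold = (1 - delta) * mu    # equals 4 ln n = k / 8
    exponent = -delta ** 2 * mu / 2 # equals -9k/64 = -4.5 ln n
    return {
        "needed": math.log(n, 4 / 3),  # log_{4/3} n lucky rounds suffice
        "threshold": threshold,
        "exponent": exponent,
        "bound": math.exp(exponent),   # lower-tail Chernoff bound
    }

vals = lucky_round_tail_bound(1000)
```

For any $n > 1$ the `bound` entry is $n^{-4.5}$, which is below $1/n^4$, and `needed` stays below `threshold`.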
Randomized QuickSort w.h.p. Analysis

There are $n$ input elements. By the union bound over all of them, the probability that the depth of recursion in QuickSort exceeds $32 \ln n$ is at most $\frac{1}{n^4} \cdot n = \frac{1}{n^3}$.

Theorem
With high probability (i.e., $1 - \frac{1}{n^3}$) the depth of the recursion of QuickSort is $\le 32 \ln n$. Due to at most $n$ comparisons in each level, with high probability, the running time of QuickSort is $O(n \ln n)$.

Q: How to increase the probability?

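To get a feel for the theorem, one can measure the recursion depth directly. This is a minimal simulation sketch, assuming distinct keys and a uniformly random pivot; the function name `quicksort_depth` is hypothetical, and the partitions are processed level by level rather than recursively.

```python
import math
import random

def quicksort_depth(items, rng):
    """Return the number of recursion levels of randomized QuickSort.

    Assumes the items are distinct. Each pass of the while loop handles
    one level of the recursion tree.
    """
    depth = 0
    level = [list(items)]
    while level:
        depth += 1
        next_level = []
        for part in level:
            if len(part) <= 1:
                continue  # nothing left to split
            pivot = rng.choice(part)
            smaller = [x for x in part if x < pivot]
            larger = [x for x in part if x > pivot]
            next_level.extend(p for p in (smaller, larger) if p)
        level = next_level
    return depth

rng = random.Random(0)
n = 2000
depths = [quicksort_depth(range(n), rng) for _ in range(5)]
limit = 32 * math.log(n)
```

In runs like this the observed depths sit well below `limit`, which suggests the constant 32 is generous.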
Part II
Balls and Bins

Expected Bin Size

Problem
If $n$ balls are thrown independently and uniformly into $n$ bins, how many balls land in a bin in expectation (the expected size of a bin)?

Solution
Fix a bin, say $j$.
The random variable $X_{ij}$ is 1 if the $i$-th ball falls in the $j$-th bin, and 0 otherwise.
$E[X_{ij}] = \Pr[X_{ij} = 1] = 1/n$.
R.V. $Y_j$ = number of balls in the $j$-th bin $= \sum_{i=1}^{n} X_{ij}$.
$E[Y_j] = \sum_{i=1}^{n} E[X_{ij}] = n \cdot 1/n = 1$.

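The expectation $E[Y_j] = 1$ can be estimated empirically. The sketch below is an illustration only; `bin_loads` and `estimate_expected_load` are hypothetical helper names.

```python
import random

def bin_loads(n_balls, n_bins, rng):
    """Throw balls independently and uniformly; return the load of each bin."""
    loads = [0] * n_bins
    for _ in range(n_balls):
        loads[rng.randrange(n_bins)] += 1
    return loads

def estimate_expected_load(n, bin_index, trials, rng):
    """Average the load of one fixed bin over many independent experiments."""
    total = sum(bin_loads(n, n, rng)[bin_index] for _ in range(trials))
    return total / trials

estimate = estimate_expected_load(50, 0, 2000, random.Random(2))
```

With a few thousand trials the estimate should land close to 1.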
Expected Max Bin Size

Problem
If $n$ balls are thrown independently and uniformly into $n$ bins, what is the expected maximum bin size, $E\left[\max_{j=1}^{n} Y_j\right]$?

Possible Solution
R.V. $Z = \max_{j=1}^{n} Y_j$. Then $E[Z] = \sum_{k=1}^{n} \Pr[Z = k] \cdot k$.
How do we compute $\Pr[Z = k]$, i.e., count the configurations where no bin has more than $k$ balls and at least one bin has $k$ balls?
Too many to count!!

Expected Max Bin Size (Contd.)

Problem
What is the expected maximum bin size? R.V. $Z = \max_{j=1}^{n} Y_j$. Show $E[Z] \le O\left(\frac{\ln n}{\ln \ln n}\right)$?

Possible Solution
If $\Pr\left[Z > \frac{8 \ln n}{\ln \ln n}\right] \le 1/n^2$, then define $A = \frac{8 \ln n}{\ln \ln n}$:

$$E[Z] \le \sum_{k=1}^{A} \Pr[Z = k] \cdot A + \sum_{k=A+1}^{n} \Pr[Z = k] \cdot n \le A \cdot \Pr[Z \le A] + n \cdot \Pr[Z > A] \le A \cdot (1) + n \cdot (1/n^2) = O(A) = O\left(\frac{\ln n}{\ln \ln n}\right).$$

It remains to bound $\Pr\left[Z > \frac{8 \ln n}{\ln \ln n}\right]$.

Expected Max Bin Size (Contd.)

Bound $\Pr\left[Z > \frac{8 \ln n}{\ln \ln n}\right]$ using the Chernoff inequality.

Chernoff Ineq. We Saw
Let $X_1, \ldots, X_k$ be independent binary R.V., $X = \sum_{i=1}^{k} X_i$, and $\mu = E[X]$. Then for $0 < \delta < 1$,

$$\Pr[X \ge (1 + \delta)\mu] \le e^{-\delta^2 \mu / 3} \quad \text{and} \quad \Pr[X \le (1 - \delta)\mu] \le e^{-\delta^2 \mu / 2}.$$

Stronger Versions
For $\delta > 0$, $\Pr[X > (1 + \delta)\mu] < \left(\frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}}\right)^{\mu}$.
For $0 < \delta < 1$, $\Pr[X < (1 - \delta)\mu] < \left(\frac{e^{-\delta}}{(1 - \delta)^{(1 - \delta)}}\right)^{\mu}$.

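To see how the two upper-tail bounds compare, the sketch below evaluates them next to an exact binomial tail. The function names are illustrative assumptions, and the binomial setting (100 fair coin flips) is just one convenient example.

```python
import math

def chernoff_upper_simple(mu, delta):
    """Simple form: exp(-delta^2 * mu / 3), stated for 0 < delta < 1."""
    return math.exp(-delta ** 2 * mu / 3)

def chernoff_upper_strong(mu, delta):
    """Stronger form: (e^delta / (1 + delta)^(1 + delta))^mu, for delta > 0."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

def binomial_upper_tail(k, p, t):
    """Exact Pr[X >= t] for X ~ Binomial(k, p)."""
    return sum(
        math.comb(k, i) * p ** i * (1 - p) ** (k - i) for i in range(t, k + 1)
    )

# 100 fair coin flips: mu = 50, and (1 + 0.2) * mu = 60.
exact = binomial_upper_tail(100, 0.5, 60)
strong = chernoff_upper_strong(50, 0.2)
simple = chernoff_upper_simple(50, 0.2)
```

In this example the exact tail is below the stronger bound, which in turn is below the simple bound; both Chernoff forms are far from tight here.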
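Expected Max Bin Size (Contd.)

Problem
What is the expected maximum bin size? Let $Z = \max_{j=1}^{n} Y_j$. Show $E[Z] \le O\left(\frac{\ln n}{\ln \ln n}\right)$. → Show $\Pr\left[Z > \frac{8 \ln n}{\ln \ln n}\right] \le 1/n^2$.

Solution
Recall: $Y_j$ = number of balls in bin $j$, $E[Y_j] = 1$, and $A = \frac{8 \ln n}{\ln \ln n}$. By the stronger Chernoff bound with $\mu = 1$ and $1 + \delta = A$,

$$\Pr[Y_j > A] = \Pr[Y_j > A \cdot E[Y_j]] < \frac{e^{A-1}}{A^A} < \frac{e^{A}}{A^A} = \frac{n^{8/\ln \ln n}}{A^A}.$$

Since $\frac{8 \ln n}{\ln \ln n} \ge \sqrt{\ln n}$,

$$A^A = \left(\frac{8 \ln n}{\ln \ln n}\right)^{\frac{8 \ln n}{\ln \ln n}} \ge \left(\sqrt{\ln n}\right)^{\frac{8 \ln n}{\ln \ln n}} = (\ln n)^{\frac{4 \ln n}{\ln \ln n}} = e^{4 \ln n} = n^4.$$

Hence, for sufficiently large $n$, $\Pr\left[Y_j > \frac{8 \ln n}{\ln \ln n}\right] < 1/n^3$.
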
Expected Max Bin Size (Contd.)

Problem
What is the expected maximum bin size? Let $Z = \max_{j=1}^{n} Y_j$. Show $E[Z] \le O\left(\frac{\ln n}{\ln \ln n}\right)$. → Show $\Pr\left[Z > \frac{8 \ln n}{\ln \ln n}\right] \le 1/n^2$.

Solution
Recall: $Y_j$ = number of balls in bin $j$, $E[Y_j] = 1$, and $\Pr\left[Y_j > \frac{8 \ln n}{\ln \ln n}\right] \le 1/n^3$ (using Chernoff).
By the union bound,

$$\Pr\left[Z > \frac{8 \ln n}{\ln \ln n}\right] \le \sum_{j=1}^{n} \Pr\left[Y_j > \frac{8 \ln n}{\ln \ln n}\right] \le n \cdot 1/n^3 = 1/n^2.$$

Max bin size is at most $O\left(\frac{\ln n}{\ln \ln n}\right)$ with probability $1 - 1/n^2$.
$\Omega\left(\frac{\ln n}{\ln \ln n}\right)$ is a lower bound as well!

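A quick experiment shows how the observed maximum load compares with the threshold $8 \ln n / \ln \ln n$ used in the analysis. This is a rough sketch with hypothetical helper names `max_load` and `threshold`.

```python
import math
import random
from collections import Counter

def max_load(n, rng):
    """Throw n balls into n bins uniformly at random; return the fullest bin's load."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())

def threshold(n):
    """The value A = 8 ln n / ln ln n from the analysis (needs n >= 3)."""
    return 8 * math.log(n) / math.log(math.log(n))

n = 10_000
observed = max_load(n, random.Random(3))
limit = threshold(n)
```

For moderate $n$ the observed maximum is typically a small single-digit number, well under the threshold, so the constant 8 is far from tight.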
Balls and Bins → Hashing

Hashing
Storing elements in a table such that lookup takes $O(1)$ time.

Throwing numbered balls
Imagine that the $n$ balls have numbers coming from a universe $\mathcal{U}$, with $|\mathcal{U}| \gg n$.
Hashing: throw balls (elements) randomly into $n$ bins such that bin sizes are small and also lookup is easy!

Part III
Hash Tables

Dictionary Data Structure

1. $\mathcal{U}$: universe of keys with total order: numbers, strings, etc.
2. Data structure to store a subset $S \subseteq \mathcal{U}$.
3. Operations:
   1. Search/lookup: given $x \in \mathcal{U}$, is $x \in S$?
   2. Insert: given $x \notin S$, add $x$ to $S$.
   3. Delete: given $x \in S$, delete $x$ from $S$.
4. Static structure: $S$ is given in advance or changes very infrequently; the main operations are lookups.
5. Dynamic structure: $S$ changes rapidly, so inserts and deletes are as important as lookups.

Dictionary Data Structures

Common solutions:
1. Static:
   1. Store $S$ as a sorted array.
   2. Lookup: binary search in $O(\log |S|)$ time (comparisons).
2. Dynamic:
   1. Store $S$ in a balanced binary search tree.
   2. Lookup, Insert, Delete in $O(\log |S|)$ time (comparisons).

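The static solution can be sketched in a few lines using Python's standard `bisect` module for the binary search. The function name `sorted_lookup` is an illustrative choice.

```python
from bisect import bisect_left

def sorted_lookup(sorted_keys, x):
    """Return True if x occurs in the sorted list, using O(log |S|) comparisons."""
    i = bisect_left(sorted_keys, x)  # leftmost position where x could be inserted
    return i < len(sorted_keys) and sorted_keys[i] == x

S = sorted(["pear", "apple", "fig", "kiwi"])
found = sorted_lookup(S, "fig")
missing = sorted_lookup(S, "plum")
```

Note that this relies on the keys being totally ordered, which is exactly the assumption that hashing later drops.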
Dictionary Data Structures

Question: "Should Tables be Sorted?" (also the title of a famous paper by Turing Award winner Andy Yao)

Hashing is a widely used and powerful technique for dictionaries.

Motivation:
1. The universe $\mathcal{U}$ may not be (naturally) totally ordered.
2. Keys correspond to large objects (images, graphs, etc.) for which comparisons are very expensive.
3. Want to improve "average" performance of lookups to $O(1)$, even at the cost of extra space or errors with small probability: many applications for fast lookups in networking, security, etc.

Hashing and Hash Tables

Hash Table data structure:
1. A (hash) table/array $T$ of size $m$ (the table size).
2. A hash function $h : \mathcal{U} \to \{0, \ldots, m - 1\}$.
3. Item $x \in \mathcal{U}$ hashes to slot $h(x)$ in $T$.

Given $S \subseteq \mathcal{U}$: how do we store $S$ and how do we do lookups?

Ideal situation:
1. Each element $x \in S$ hashes to a distinct slot in $T$. Store $x$ in slot $h(x)$.
2. Lookup: given $y \in \mathcal{U}$, check if $T[h(y)] = y$. $O(1)$ time!

Collisions are unavoidable if $|T| < |\mathcal{U}|$. Several techniques exist to handle them.

Handling Collisions: Chaining

Collision: $h(x) = h(y)$ for some $x \ne y$.

Chaining to handle collisions:
1. For each slot $i$, store all items hashed to slot $i$ in a linked list. $T[i]$ points to the linked list.
2. Lookup: to find if $y \in \mathcal{U}$ is in $T$, check the linked list at $T[h(y)]$. Time is proportional to the size of the linked list.

This is also known as open hashing.

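A minimal sketch of chaining follows. It is an illustration under stated assumptions: the class name `ChainedHashTable` is hypothetical, the hash function `h` is supplied by the caller, and Python lists stand in for the linked lists of the slide.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of the keys hashed to it."""

    def __init__(self, m, h):
        self.m = m                          # table size
        self.h = h                          # hash function: key -> {0, ..., m-1}
        self.slots = [[] for _ in range(m)]

    def insert(self, x):
        chain = self.slots[self.h(x)]
        if x not in chain:                  # keep the stored keys a set
            chain.append(x)

    def lookup(self, y):
        # Cost is proportional to the length of the chain at slot h(y).
        return y in self.slots[self.h(y)]

    def delete(self, x):
        chain = self.slots[self.h(x)]
        if x in chain:
            chain.remove(x)

table = ChainedHashTable(7, lambda x: x % 7)
for key in (3, 10, 17):                     # all three keys collide in slot 3
    table.insert(key)
```

Here `x % 7` is a deliberately naive hash function chosen so that the three keys collide and end up in one chain.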
Handling Collisions

Several other techniques:
1. Cuckoo hashing. Every value has two possible locations. When inserting, insert in one of the locations; if it is occupied, kick the stored value to its other location. Repeat till stable. If no stability is reached, rebuild the table.
2. . . .
3. Others.

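The cuckoo hashing description above can be sketched as follows. Everything beyond the slide's four sentences is an assumption made for illustration: keys are non-negative integers below the prime `P`, the two hash functions have the form $((ax + b) \bmod P) \bmod m$ with random $a, b$, and "no stability" is approximated by a fixed kick limit `max_kicks`.

```python
import random

P = 2 ** 61 - 1  # a Mersenne prime; keys are assumed to be integers in [0, P)

class CuckooTable:
    """Cuckoo hashing sketch: every key has two candidate slots."""

    def __init__(self, m, seed=0, max_kicks=32):
        self.m = m
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.slots = [None] * m
        self._pick_hashes()

    def _pick_hashes(self):
        # Two hypothetical hash functions x -> ((a*x + b) mod P) mod m.
        self.params = [
            (self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(2)
        ]

    def _pos(self, x, which):
        a, b = self.params[which]
        return ((a * x + b) % P) % self.m

    def lookup(self, x):
        # Only the two candidate slots need to be inspected.
        return any(self.slots[self._pos(x, w)] == x for w in (0, 1))

    def insert(self, x):
        if self.lookup(x):
            return
        cur, pos = x, self._pos(x, 0)
        for _ in range(self.max_kicks):
            if self.slots[pos] is None:
                self.slots[pos] = cur
                return
            # Kick out the occupant and send it to its other location.
            cur, self.slots[pos] = self.slots[pos], cur
            p0, p1 = self._pos(cur, 0), self._pos(cur, 1)
            pos = p1 if pos == p0 else p0
        self._rebuild(cur)  # no stable placement found within the kick limit

    def _rebuild(self, pending):
        keys = [k for k in self.slots if k is not None] + [pending]
        self.slots = [None] * self.m
        self._pick_hashes()
        for k in keys:
            self.insert(k)

table = CuckooTable(m=97, seed=1)
keys = [3 * i + 7 for i in range(30)]
for k in keys:
    table.insert(k)
```

The sketch keeps the table well under half full; a practical implementation would also grow the table on rebuild, which is omitted here.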
Understanding Hashing

Does hashing give $O(1)$ time per operation for dictionaries?

Questions:
1. Complexity of evaluating $h$ on a given element?
2. Relative sizes of the universe $\mathcal{U}$ and the set to be stored $S$.
3. Size of table relative to size of $S$.
4. Worst-case vs average-case vs randomized (expected) time?
5. How do we choose $h$?

Understanding Hashing

1. Complexity of evaluating $h$ on a given element? Should be small.
2. Relative sizes of the universe $\mathcal{U}$ and the set to be stored $S$: typically $|\mathcal{U}| \gg |S|$.
3. Size of table relative to size of $S$. The load factor of $T$ is the ratio $n/m$ where $n = |S|$ and $m = |T|$. Typically $n/m$ is a small constant smaller than 1. Also known as the fill factor.

Main and interrelated questions:
1. Worst-case vs average-case vs randomized (expected) time?
2. How do we choose $h$?

Single hash function

1. $\mathcal{U}$: universe (very large).
2. Assume $N = |\mathcal{U}| \gg m$ where $m$ is the size of table $T$. In particular assume $N \ge m^2$ (very conservative).
3. Fix a hash function $h : \mathcal{U} \to \{0, \ldots, m - 1\}$.
4. $N$ items are hashed to $m$ slots. By the pigeonhole principle there is some $i \in \{0, \ldots, m - 1\}$ such that at least $N/m \ge m$ elements of $\mathcal{U}$ get hashed to $i$ (!).
5. This implies that there is a set $S \subseteq \mathcal{U}$ with $|S| = m$ such that all of $S$ hashes to the same slot. Ooops.

Lesson: For every hash function there is a very bad set. Bad set. Bad.

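The pigeonhole argument is constructive: given any fixed hash function, scanning the universe finds a bad set. The sketch below assumes a small universe `range(N)` with $N = m^2$ and a sample hash function chosen only for the demonstration; `bad_set` is a hypothetical helper name.

```python
from collections import defaultdict

def bad_set(h, universe, m):
    """Return m keys that all hash to one slot under h, or None if none is found."""
    buckets = defaultdict(list)
    for x in universe:
        bucket = buckets[h(x)]
        bucket.append(x)
        if len(bucket) == m:   # some slot has received m keys
            return bucket
    return None

m = 10
h = lambda x: (7 * x + 3) % m          # any fixed function into {0, ..., m-1}
S = bad_set(h, range(m * m), m)        # N = m^2 keys guarantee success
```

Storing the returned set `S` in a chained table of size $m$ puts all $m$ keys into a single chain.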
How many hash functions are there, anyway?

Let $\mathcal{H}$ be the set of all functions from $\mathcal{U} = \{1, \ldots, U\}$ to $\{1, \ldots, m\}$. The number of functions in $\mathcal{H}$ is
(A) $U + m$.
(B) $Um$.
(C) $U^m$.
(D) $m^U$.
(E) $\binom{U + m}{m}$.
(F) The answer is blowing in the wind.

How many bits does one need?

Let $\mathcal{H}$ be a set of functions from $\mathcal{U} = \{1, \ldots, U\}$ to $\{1, \ldots, m\}$. Specifying a function in $\mathcal{H}$ requires:
(A) $O(U + m)$ bits.
(B) $O(Um)$ bits.
(C) $O(U^m)$ bits.
(D) $O\left(m^U\right)$ bits.
(E) $O(\log |\mathcal{H}|)$ bits.
(F) Many many bits. At least two.

Picking a hash function

1. Hash functions are often chosen in an ad hoc fashion. The implicit assumption is that the input behaves well.
2. May work well for aircraft control. Susceptible to denial-of-service attacks in routing.

Parameters: $N = |\mathcal{U}|$, $m = |T|$, $n = |S|$
1. $\mathcal{H}$ is a family of hash functions: each function $h \in \mathcal{H}$ should be efficient to evaluate (that is, to compute $h(x)$).
2. $h$ is chosen randomly from $\mathcal{H}$ (typically uniformly at random). This implicitly assumes that $\mathcal{H}$ allows efficient sampling.
3. Randomized guarantee: $\mathcal{H}$ should have the property that for any fixed set $S \subseteq \mathcal{U}$ of size $m$, the expected number of collisions for a function chosen from $\mathcal{H}$ should be "small". Here the expectation is over the randomness in the choice of $h$.

Picking a hash function

Question: Why not let $\mathcal{H}$ be the set of all functions from $\mathcal{U}$ to $\{0, 1, \ldots, m - 1\}$?
1. Too many functions! A random function has high complexity! Number of functions: $M = m^{|\mathcal{U}|}$. Bits to encode such a function $\approx \log M = |\mathcal{U}| \log m$.

Question: Are there good and compact families $\mathcal{H}$?
1. Yes... but first we need to say what it means for $\mathcal{H}$ to be good and compact.

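To make the count $M = m^{|\mathcal{U}|}$ and the encoding cost concrete, the sketch below enumerates all functions for a tiny universe and evaluates $|\mathcal{U}| \log_2 m$ for a larger one. The function names and the example sizes (32-bit keys, a table of $2^{10}$ slots) are illustrative assumptions.

```python
import math
from itertools import product

def count_functions(universe_size, m):
    """Count all functions from a universe of the given size to m slots by enumeration."""
    # A function is a tuple listing the slot assigned to each universe element.
    return sum(1 for _ in product(range(m), repeat=universe_size))

def bits_to_encode(universe_size, m):
    """log2 of the number of functions: |U| * log2(m) bits."""
    return universe_size * math.log2(m)

tiny = count_functions(3, 4)              # 4^3 functions
bits = bits_to_encode(2 ** 32, 2 ** 10)   # 32-bit keys, 1024 slots
gib = bits / 8 / 2 ** 30                  # the same quantity in GiB
```

Under these example sizes a single arbitrary function already needs about 5 GiB to write down, which is why compact families are attractive.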
Uniform hashing

Question: What are good properties of $\mathcal{H}$ in distributing data?
1. Consider any element $x \in \mathcal{U}$. If $h \in \mathcal{H}$ is picked randomly, then $x$ should go into a random slot in $T$. In other words, $\Pr[h(x) = i] = 1/m$ for every $0 \le i < m$. (Uniform)
2. Consider any two distinct elements $x, y \in \mathcal{U}$. If $h \in \mathcal{H}$ is picked randomly, then the probability of a collision between $x$ and $y$ should be at most $1/m$. In other words, $\Pr[h(x) = h(y)] = 1/m$ (cannot be smaller).
3. The second property is stronger than the first and is the crucial issue.

Definition
A family of hash functions $\mathcal{H}$ is (2-)universal if for all distinct $x, y \in \mathcal{U}$, $\Pr_h[h(x) = h(y)] = 1/m$, where $m$ is the table size.

Note: The set of all hash functions satisfies stronger properties!

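As a concrete illustration, the family $h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$ with $p$ prime, $1 \le a < p$ and $0 \le b < p$ is a standard example whose collision probability is at most $1/m$. This family is not introduced in the slides above; it is brought in here as an assumption, and the sketch simply checks the collision bound by brute force for small $p$ and $m$.

```python
from itertools import combinations

def collision_count(x, y, p, m):
    """Number of pairs (a, b), 1 <= a < p and 0 <= b < p, under which x and y collide."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x + b) % p) % m == ((a * y + b) % p) % m
    )

p, m = 17, 4
family_size = (p - 1) * p
# Collision probability for a pair is collision_count / family_size.
worst = max(collision_count(x, y, p, m) for x, y in combinations(range(p), 2))
within_bound = worst * m <= family_size   # i.e. probability <= 1/m
```

The check uses integer arithmetic (`worst * m <= family_size`) to avoid any floating-point doubt about the comparison with $1/m$.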
Analyzing Universal Hashing

1. $T$ is a hash table of size $m$.
2. $S \subseteq \mathcal{U}$ is a fixed set of size $\le m$.
3. $h$ is chosen randomly from a universal hash family $\mathcal{H}$.
4. $x$ is a fixed element of $\mathcal{U}$.

Question: What is the expected time to look up $x$ in $T$ using $h$, assuming chaining is used to resolve collisions?

Analyzing Universal Hashing

Question: What is the expected time to look up $x$ in $T$ using $h$, assuming chaining is used to resolve collisions?
1. The time to look up $x$ is the size of the list at $T[h(x)]$: the same as the number of elements in $S$ that collide with $x$ under $h$.
2. Let $\ell(x)$ be this number. We want $E[\ell(x)]$.
3. For $y \in S$ let $A_y$ be the event that $x, y$ collide and $D_y$ be the corresponding indicator variable.

Analyzing Universal Hashing
Continued...

Number of elements colliding with $x$: $\ell(x) = \sum_{y \in S} D_y$.

\begin{align*}
\Rightarrow E[\ell(x)] &= \sum_{y \in S} E[D_y] && \text{linearity of expectation} \\
&= \sum_{y \in S} \Pr[h(x) = h(y)] \\
&= \sum_{y \in S} \frac{1}{m} && \text{since } \mathcal{H} \text{ is a universal hash family} \\
&= |S|/m \\
&\le 1 && \text{if } |S| \le m
\end{align*}

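The bound on $E[\ell(x)]$ can be checked exactly on a small instance by averaging over every function in a family. The sketch below assumes the family $h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$ (an assumption, not something defined in these slides), whose collision probability is at most $1/m$, so the computed expectation should be at most $|S|/m$. The set `S`, the element `x`, and the name `expected_chain_length` are hypothetical.

```python
def expected_chain_length(x, S, p, m):
    """Exact E[l(x)] over all h_{a,b}; x is assumed not to be in S."""
    total = 0
    for a in range(1, p):
        for b in range(p):
            slot = ((a * x + b) % p) % m
            # Count the elements of S that land in the same slot as x.
            total += sum(1 for y in S if ((a * y + b) % p) % m == slot)
    return total / ((p - 1) * p)

p, m = 101, 10
S = [3, 14, 15, 26, 35, 58, 79, 84, 93, 97]   # |S| = m
x = 0
expectation = expected_chain_length(x, S, p, m)
```

Because the average is taken over the whole family rather than sampled, the result involves no randomness.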
Analyzing Universal Hashing

Question: What is the expected time to look up $x$ in $T$ using $h$, assuming chaining is used to resolve collisions?
Answer: $O(n/m)$.

Comments:
1. $O(1)$ expected time also holds for insertion.
2. The analysis assumes a static set $S$ but holds as long as $S$ is a set formed with at most $O(m)$ insertions and deletions.
3. Worst-case: lookup time can be large! How large? $\Omega(\log n / \log \log n)$. [The lower bound holds even under stronger assumptions.]