Strong Randomness Properties of (Hyper-)Graphs Generated by Simple Hash Functions

Strong Randomness Properties of (Hyper-)Graphs Generated by Simple Hash Functions. Martin Aumüller, Technische Universität Ilmenau, Germany. AofA’15, Strobl, June 8, 2015. Joint work with Martin Dietzfelbinger and Philipp Woelfel.


  1. Strong Randomness Properties of (Hyper-)Graphs Generated by Simple Hash Functions
     Martin Aumüller, Technische Universität Ilmenau, Germany
     AofA’15, Strobl, June 8, 2015
     Joint work with Martin Dietzfelbinger and Philipp Woelfel.
     (Slide footer: M. Aumüller, Graphs Generated by Simple Hash Functions, 1/17)

  2. Example: Cuckoo Hashing (Pagh/Rodler, 2001/2004)
     A hashing-based implementation of the dictionary data type.
     Setting:
     - a set $S \subseteq U$ of $n$ keys
     - two tables $T_1[0..m-1]$ and $T_2[0..m-1]$, $m \ge (1+\varepsilon)n$
     - two hash functions $h_1, h_2$ with $h_i : U \to [m]$
     Rules:
     - each table cell can hold at most one key
     - a key $x$ must be stored either in $T_1[h_1(x)]$ or in $T_2[h_2(x)]$ (fast lookups and deletions!)
     Definition: If $S$ can be stored according to these rules, we call $(h_1, h_2)$ suitable for $S$.
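The setting and rules above can be sketched in a few lines of Python. This is only an illustration: the class name `CuckooTable`, the eviction loop, and the `max_kicks` cutoff are assumptions that follow the usual textbook insertion procedure for cuckoo hashing; the slides themselves only state where a key may be stored.

```python
class CuckooTable:
    """Two tables with m cells each; key x may live only in
    T1[h1(x)] or T2[h2(x)]. Names and the cutoff are illustrative."""

    def __init__(self, m, h1, h2, max_kicks=100):
        self.h = (h1, h2)
        self.tables = ([None] * m, [None] * m)
        self.max_kicks = max_kicks  # assumed limit on evictions per insert

    def lookup(self, x):
        # At most two probes, one per table.
        return any(self.tables[i][self.h[i](x)] == x for i in (0, 1))

    def insert(self, x):
        """Return None on success, or the key left without a cell."""
        if self.lookup(x):
            return None
        i = 0
        for _ in range(self.max_kicks):
            pos = self.h[i](x)
            # Place x and pick up whatever occupied the cell before.
            x, self.tables[i][pos] = self.tables[i][pos], x
            if x is None:
                return None
            i = 1 - i  # the evicted key must go to its cell in the other table
        return x  # eviction chain too long; (h1, h2) may be unsuitable
```

A key returned by `insert` is exactly the kind of key that a stash, as discussed on the following slides, would absorb.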

  4. Improving Cuckoo Hashing: Stash
     - Original analysis: $(h_1, h_2)$ is unsuitable with probability $O(1/n)$. In fact: $\Theta(1/n)$ (Schellbach ’09, Drmota/Kutzelnigg ’12).
     - Kirsch/Mitzenmacher/Wieder ’08: $\Theta(1/n)$ is too large. Proposal: up to $s = O(1)$ keys can be put into additional storage (the stash).
     - Theorem (K/M/W ’08): Let $S \subseteq U$ with $|S| = n$. If $(h_1, h_2)$ are fully random, then $\Pr((h_1, h_2) \text{ unsuitable for } S \text{ with stash size } s) = O(1/n^{s+1})$.
     - Again: $\Theta(1/n^{s+1})$ (Kutzelnigg ’10).

  13. Analysis of Cuckoo Hashing with a Stash
      - What is a criterion for $(h_1, h_2)$ being unsuitable for stash size $s$?
      - Tool: the cuckoo graph $G(S, h_1, h_2)$ (Devroye/Morin ’03). Its vertices are the table cells, and each key $x \in S$ contributes an edge between $T_1[h_1(x)]$ and $T_2[h_2(x)]$.
      - Excess (Janson et al. ’93): #edges − #vertices (in the pictured example: 3).
      - 3 more keys than table cells ⇒ at least 3 keys must be put into the stash.
      - Minimal “bad subgraph”: a $\mathrm{MOS}_s$ (pictured example: $s = 2$).

  14. Analysis of Cuckoo Hashing with a Stash
      - What is a criterion for $(h_1, h_2)$ being unsuitable for stash size $s$?
      - Tool: the cuckoo graph $G(S, h_1, h_2)$ (Devroye/Morin ’03).
      - Theorem (K/M/W ’08): Let $(V', E')$ consist of all connected components of $G(S, h_1, h_2)$ that have more than one cycle. Then the stash size needed is $|E'| - |V'|$.
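The theorem translates directly into a small computation. The sketch below builds the cuckoo graph with a union-find structure and sums the excess over all components that have more edges than vertices (these are exactly the components with more than one cycle). The function name `required_stash_size` and the vertex encoding `(table, position)` are illustrative choices, not taken from the talk.

```python
from collections import Counter


def required_stash_size(keys, h1, h2):
    """Sum of (#edges - #vertices) over all connected components of the
    cuckoo graph that contain more than one cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for x in keys:
        # One vertex per table cell, one edge per key.
        u, w = (1, h1(x)), (2, h2(x))
        edges.append((u, w))
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[ru] = rw

    vertices_in = Counter(find(v) for v in list(parent))
    edges_in = Counter(find(u) for u, _ in edges)
    # A connected component has more than one cycle iff #edges > #vertices.
    return sum(max(0, edges_in[r] - vertices_in[r]) for r in edges_in)
```

With this helper, $(h_1, h_2)$ is unsuitable for $S$ with stash size $s$ precisely when the returned value exceeds $s$.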

  21. The Quest
      - The analysis is well understood when the hash functions are fully random.
      - Replace fully random hash functions by an explicit, efficient construction of hash functions.
      - “Simple hash functions that work in as many applications as possible”
      - Other recent approaches: e.g., Thorup/Pǎtraşcu ’11, Reingold/Rothblum/Wieder ’14.
      - Focus on hashing-based algorithms and data structures that allow good enough bounds via the first-moment method (cuckoo hashing with a stash, generalized cuckoo hashing, load balancing, ...).
      - Generic approach?

  22. Key Ingredient: Linear Functions
      $h(x) = ((a \cdot x + b) \bmod p) \bmod m$, where $p \ge |U|$ is a prime, and $a$ and $b$ are chosen uniformly at random from $\{0, \dots, p-1\}$.
      → very simple structure!
      (Remark: This function is 2-wise independent, i.e., for any pair $x, y \in U$, $x \ne y$, $h(x)$ and $h(y)$ are fully random.)
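As a minimal sketch, the linear function can be written as follows. The helper name `make_linear_hash` and the `rng` parameter are illustrative assumptions; the caller is responsible for passing a prime $p \ge |U|$.

```python
import random


def make_linear_hash(p, m, rng=random):
    """Draw a and b uniformly from {0, ..., p-1} and return
    h(x) = ((a*x + b) mod p) mod m. The caller must supply a prime p >= |U|."""
    a = rng.randrange(p)
    b = rng.randrange(p)

    def h(x):
        return ((a * x + b) % p) % m

    return h
```

For example, `make_linear_hash(2**31 - 1, 1024)` yields a function on keys below the Mersenne prime $2^{31} - 1$ with values in $\{0, \dots, 1023\}$.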

  23. The Hash Class (Version for this Talk)
      For given $c, n \ge 1$, we combine linear functions with lookups in tables of size $\sqrt{n}$ filled with random values:
      $h_i(x) = f_i(x) \oplus \bigoplus_{j=1}^{c} z^{(i)}_j[g_j(x)], \quad i = 1, 2$
      The class of all these pairs $(h_1, h_2)$ of hash functions is called $\mathcal{Z}$.
      (Extension of the hash functions from Dietzfelbinger/Woelfel ’03.)
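One possible reading of this construction is sketched below. Several details are assumptions that the slide does not spell out: $f_1, f_2$ are taken to be linear functions into $[m]$, the $g_j$ are linear functions into $[\lceil\sqrt{n}\rceil]$ shared by $h_1$ and $h_2$, table entries are drawn uniformly from $[m]$, and $m$ is required to be a power of two so that XOR stays within $[m]$. The names `sample_pair` and `linear` are illustrative.

```python
import math
import random


def linear(p, r, rng):
    """Linear function x -> ((a*x + b) mod p) mod r with random a, b."""
    a, b = rng.randrange(p), rng.randrange(p)
    return lambda x: ((a * x + b) % p) % r


def sample_pair(n, m, c, p, rng=None):
    """Draw (h1, h2) with h_i(x) = f_i(x) XOR z_1^(i)[g_1(x)] XOR ... XOR z_c^(i)[g_c(x)]."""
    rng = rng or random.Random()
    # Assumption: m is a power of two, so XOR of values in [m] stays in [m].
    assert m >= 1 and m & (m - 1) == 0
    ell = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n >= 1: the table size
    # Assumption: the g_j are shared between h1 and h2 (they carry no index i).
    g = [linear(p, ell, rng) for _ in range(c)]

    def make_h():
        f = linear(p, m, rng)
        # c tables of size ell, filled with random values from [m].
        z = [[rng.randrange(m) for _ in range(ell)] for _ in range(c)]

        def h(x):
            value = f(x)
            for j in range(c):
                value ^= z[j][g[j](x)]
            return value

        return h

    return make_h(), make_h()
```

Evaluating either function costs one linear function for $f_i$, $c$ linear functions for the $g_j$, and $c$ table lookups, so the evaluation time is $O(c)$ and the tables for both functions together hold about $2c\sqrt{n}$ random values.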

  24. Example: Cuckoo Hashing with a Stash
      Main task: For given $S$ and stash size $s$, calculate $\Pr((h_1, h_2) \text{ unsuitable for } S \text{ with stash size } s)$.
      Minimal bad subgraph: $\mathrm{MOS}_s$ (pictured example: $s = 2$).

  25. Thus, we have
      $$\begin{aligned}
      &\Pr_{(h_1, h_2) \in \mathcal{Z}}\bigl((h_1, h_2) \text{ unsuitable for } S \text{ with stash size } s\bigr) \\
      &\quad = \Pr_{(h_1, h_2) \in \mathcal{Z}}\bigl(\exists T \subseteq S : G(T, h_1, h_2) \text{ forms a } \mathrm{MOS}_s\bigr) \\
      &\quad \le \sum_{T \subseteq S} \Pr_{(h_1, h_2) \in \mathcal{Z}}\bigl(G(T, h_1, h_2) \text{ forms a } \mathrm{MOS}_s\bigr)
      \end{aligned}$$
      If $(h_1, h_2)$ are fully random, we provide a direct counting argument that this is $O(1/n^{s+1})$, giving an alternative proof to the original analysis by Kirsch, Mitzenmacher and Wieder (who used machinery like Markov chain coupling).
