  1. Fast and Simple Compact Hashing via Bucketing
     Dominik Köppl, Simon J. Puglisi, Rajeev Raman

  2. dynamic associative map
     ● K, V: sets
     ● f maps a dynamic subset of size n of K to V
     ● common representations of f:
       – search tree
       – hash table

  3. setting
     ● K = [1..2^ω]
     ● V = [1..|V|]
     ● in case that ω ≤ 20:
       – use a plain array of 2^ω cells to represent f
       – space: 2^ω · lg|V| bits, i.e., at most lg|V|/8 MiB (MiB = 1024² bytes)
     ● for larger ω not feasible, e.g., |K| = 2^32 and |V| = 2^32 need 2^32 · 32 bits = 16 GiB
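A minimal C++ sketch of the plain-array representation, assuming 0-based keys in [0, 2^ω) and reserving the value 0 to mean "absent" (both are assumptions for illustration, not details from the talk). The helper `plain_array_bytes` (a hypothetical name) reproduces the space arithmetic of the setting.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain array representing f: cell k holds f(k).
// Assumption: keys are 0-based in [0, 2^omega) and value 0 means "k not in the map",
// so stored values must be >= 1 (matching V = [1..|V|]).
class DirectMap {
 public:
  explicit DirectMap(unsigned omega) : cells_(std::size_t{1} << omega, 0) {}
  void set(std::uint32_t k, std::uint32_t v) { cells_[k] = v; }
  std::uint32_t get(std::uint32_t k) const { return cells_[k]; }  // 0 = absent
  // Memory of the array in bytes: 2^omega cells of 32 bits each.
  std::size_t bytes() const { return cells_.size() * sizeof(std::uint32_t); }

 private:
  std::vector<std::uint32_t> cells_;
};

// Bytes needed by a plain array with 2^omega cells of value_bits bits each.
constexpr double plain_array_bytes(unsigned omega, unsigned value_bits) {
  return static_cast<double>(std::uint64_t{1} << omega) * value_bits / 8.0;
}
```

For example, `plain_array_bytes(20, 32)` is 4 MiB (lg|V|/8 MiB with lg|V| = 32), whereas `plain_array_bytes(32, 32)` is 16 GiB, which is why the plain array stops being an option for larger ω.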

  4. memory benchmark
     ● setting:
       – 32-bit keys
       – 32-bit values
       – randomly generated
     ● std: the C++ STL hash table std::unordered_map
       – closed addressing
       – n = 2^16 = 65536: more than 2 GiB RAM needed!

  5. closed addressing
     ● pointer array; buckets = linked lists
     ● example: bucket 1 holds 8: apple → 5: lemon → 7: kiwi, and bucket 3 holds 2: grapes → 1: apple
     ● inserting 3: pear with h(3) = 5 adds it to the list of bucket 5
     ● h: hash function
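The closed-addressing layout can be sketched as follows; the toy hash h(k) = k mod m and the class name are illustrative assumptions. Every element lives in its own list node, so each one pays for a next pointer and a heap allocation on top of its key and value.

```cpp
#include <cstddef>
#include <cstdint>
#include <forward_list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Separate chaining: an array of buckets, each a singly linked list of (key, value).
// Assumption: h(k) = k mod m is a toy hash function used only for illustration.
class ChainedMap {
 public:
  explicit ChainedMap(std::size_t m) : buckets_(m) {}

  void insert(std::uint32_t key, std::string value) {
    auto& list = buckets_[h(key)];
    for (auto& kv : list)
      if (kv.first == key) { kv.second = std::move(value); return; }  // overwrite
    list.emplace_front(key, std::move(value));  // one heap node per element
  }

  std::optional<std::string> find(std::uint32_t key) const {
    for (const auto& kv : buckets_[h(key)])  // walk the bucket's list
      if (kv.first == key) return kv.second;
    return std::nullopt;
  }

 private:
  std::size_t h(std::uint32_t key) const { return key % buckets_.size(); }
  std::vector<std::forward_list<std::pair<std::uint32_t, std::string>>> buckets_;
};
```

With m = 5, the keys 8 and 3 both hash to bucket 3 and share one list.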

  6. array list
     ● array: keys and values stored in a list
     ● ordered by insertion time

  7. array list
     ● example (key: value): 2: grapes, 8: apple, 5: lemon, 1: apple, 7: kiwi, 3: pear
     ● searching a key: O(n) time, e.g., search 3 → answer: pear
     ● if we sort, insertion becomes O(lg n) amortized time (not fast)
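A sketch of the unsorted array baseline, assuming distinct keys; the class name is hypothetical. Pairs are appended in insertion order into one contiguous array (no per-element pointers), so an insertion is amortized O(1), while a search has to scan the whole array.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// (key, value) pairs kept in insertion order in one contiguous array.
class ArrayList {
 public:
  // Append: amortized O(1) via std::vector growth.
  // Assumption: the key is not present yet (a full map would search first).
  void insert(std::uint32_t key, std::string value) {
    entries_.emplace_back(key, std::move(value));
  }
  // Search: scan all n entries, O(n) time.
  std::optional<std::string> find(std::uint32_t key) const {
    for (const auto& e : entries_)
      if (e.first == key) return e.second;
    return std::nullopt;
  }

 private:
  std::vector<std::pair<std::uint32_t, std::string>> entries_;
};
```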

  8. google sparse hash
     ● google:
       – open addressing
       – grouped into dynamic buckets
       – a bit vector addresses the buckets

  9. sparse hash table
     ● buckets = arrays that store only the occupied cells
     ● a bit vector marks which cells are occupied
     ● example: h(3) = 4, so 3: pear goes to cell 4; its bit is set and the pair is inserted into the bucket array
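The idea behind the sparse table can be sketched for one group of 64 cells. The group size, the names and the portable bit count are assumptions, and this is not google's actual implementation: a cell's value sits in the dense array at the index given by the number of set bits before it (its rank).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One group of a sparse table: 64 virtual cells, but only occupied cells take space.
// A 64-bit bit vector marks occupied cells; a dense array holds their values in cell order.
class SparseGroup {
 public:
  bool contains(unsigned i) const { return (bits_ >> i) & 1u; }
  // Rank: number of occupied cells before cell i = index into the dense array.
  unsigned rank(unsigned i) const { return popcount(bits_ & ((std::uint64_t{1} << i) - 1)); }

  void set(unsigned i, std::uint32_t value) {  // i in [0, 64)
    if (contains(i)) { values_[rank(i)] = value; return; }
    values_.insert(values_.begin() + rank(i), value);  // shift later values right
    bits_ |= std::uint64_t{1} << i;
  }
  const std::uint32_t* get(unsigned i) const {
    return contains(i) ? &values_[rank(i)] : nullptr;
  }
  std::size_t occupied() const { return values_.size(); }

 private:
  static unsigned popcount(std::uint64_t x) {  // portable bit count
    unsigned c = 0;
    for (; x; x &= x - 1) ++c;
    return c;
  }
  std::uint64_t bits_ = 0;
  std::vector<std::uint32_t> values_;
};
```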

  10. compact hashing (Cleary '84)
      ● open addressing
      ● φ: K → φ(K) bijection
        – φ(k) = (h(k), r(k))
        – φ⁻¹(h(k), r(k)) = k
      ● instead of k, store r(k) in cell h(k) (may need less space than k)

  11. compact hashing
      ● cell h(k) stores (r(k), value)
      ● example: φ(5) = (3, 2), so lemon is stored as (2: lemon) in cell 3
      ● cells 1 to 4 hold 2: kiwi, 1: apple, 2: lemon, 3: apple
      ● decoding cell 3: φ⁻¹(3, 2) = 5
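A sketch of one possible bijection φ for ω = 32, assuming multiplicative scrambling with an odd constant (the talk does not specify φ; the constant and names are assumptions). Multiplying by an odd A is invertible modulo 2^32, so the top q bits can serve as h(k) and the remaining 32 − q bits as r(k); φ⁻¹ reassembles the bits and multiplies by the inverse of A.

```cpp
#include <cstdint>
#include <utility>

// Quotienting for omega = 32: pi(k) = k * A mod 2^32 is a bijection for odd A.
// h(k) = top q bits of pi(k) (home cell in [0, 2^q)); r(k) = low 32 - q bits (stored quotient).
struct Quotienter {
  static constexpr std::uint32_t A = 0x9E3779B1u;  // odd multiplier (assumed choice)
  unsigned q;                                       // table has m = 2^q cells, 1 <= q <= 31

  static std::uint32_t inverse_of_A() {
    // Newton iteration mod 2^32: each step doubles the number of correct low bits.
    std::uint32_t inv = A;  // correct to 3 bits, since A * A == 1 (mod 8) for odd A
    for (int i = 0; i < 5; ++i) inv *= 2u - A * inv;
    return inv;
  }
  // phi(k) = (h(k), r(k))
  std::pair<std::uint32_t, std::uint32_t> phi(std::uint32_t k) const {
    const std::uint32_t p = k * A;  // wraps modulo 2^32
    return {p >> (32 - q), p & ((std::uint32_t{1} << (32 - q)) - 1)};
  }
  // phi^{-1}(h, r) = k
  std::uint32_t phi_inv(std::uint32_t h, std::uint32_t r) const {
    const std::uint32_t p = (h << (32 - q)) | r;
    return p * inverse_of_A();
  }
};
```

With q = 10 (m = 1024 cells), each cell stores a 22-bit r(k) instead of the full 32-bit key.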

  12. Cleary: linear probing
      ● φ(4) = (3, 1), but cells 3 and 4 are occupied (collision), so (1: pear) goes to cell 5
      ● read as a plain array, cell 5 decodes to φ⁻¹(5, 1) = 8 ≠ 4
      ● need displacement info: how far an entry lies from its home cell h(k) (here 5 − 3 = 2)
      ● storing the displacements as a plain array costs too much space!
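The collision problem can be reproduced with a toy compact table that uses linear probing. The quotienting h(k) = k mod m, r(k) = k / m (so k = r(k) · m + h(k)), the one-byte displacement per cell and all names are assumptions chosen for readability; the byte per cell is exactly the kind of plain displacement array that costs too much space. `key_at` can recover a key only by subtracting the stored displacement to find the home cell.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Toy compact table with linear probing. Each cell stores r(k), the value and the
// displacement d (cell = (h(k) + d) mod m). Assumption: h(k) = k mod m, r(k) = k / m.
class CompactLinearProbing {
 public:
  explicit CompactLinearProbing(std::uint32_t m) : cells_(m) {}

  bool insert(std::uint32_t k, std::uint32_t value) {
    const std::uint32_t m = size();
    for (std::uint32_t d = 0; d < m && d < 256; ++d) {
      const std::uint32_t pos = (k % m + d) % m;
      Cell& c = cells_[pos];
      if (!c.used) {
        c = Cell{true, static_cast<std::uint8_t>(d), k / m, value};
        return true;
      }
      if (key_at(pos) == k) { c.value = value; return true; }  // overwrite
    }
    return false;  // full, or displacement would not fit into one byte
  }

  std::optional<std::uint32_t> find(std::uint32_t k) const {
    const std::uint32_t m = size();
    for (std::uint32_t d = 0; d < m; ++d) {
      const std::uint32_t pos = (k % m + d) % m;
      if (!cells_[pos].used) return std::nullopt;  // empty cell ends the probe run
      if (key_at(pos) == k) return cells_[pos].value;
    }
    return std::nullopt;
  }

  // Recover the key in cell pos: home cell = pos - displacement (cyclically).
  std::uint32_t key_at(std::uint32_t pos) const {
    const std::uint32_t m = size();
    const Cell& c = cells_[pos];
    const std::uint32_t home = (pos + m - c.disp) % m;
    return c.r * m + home;
  }

 private:
  struct Cell {
    bool used = false;
    std::uint8_t disp = 0;
    std::uint32_t r = 0;
    std::uint32_t value = 0;
  };
  std::uint32_t size() const { return static_cast<std::uint32_t>(cells_.size()); }
  std::vector<Cell> cells_;
};
```

With m = 5, the keys 3, 8 and 13 all have home cell 3; 8 and 13 land in cells 4 and 0. Without the displacement, cell 4 would decode to 1 · 5 + 4 = 9 instead of 8.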

  13. displacement info
      ● m: image size of h (= # cells in H)
      ● representations:
        – Cleary '84: 2m bits
        – Poyias+ '15: Elias γ code, e.g., the displacements 20, 1, 0, 1, 9, 11 of cells 1 to 6 are stored as γ(d + 1): 000010101, 010, 1, 010, 0001010, 0001100
        – Poyias+ '15: layered array
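The Elias γ code can be sketched as below. Since a displacement can be 0, the sketch encodes d + 1, which matches the codes in the example (e.g., 20 ↦ γ(21) = 000010101); the function names are hypothetical, and bits are returned as a string only for readability.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Elias gamma code of x >= 1: (bit length - 1) zeros, then x in binary.
std::string elias_gamma(std::uint64_t x) {
  if (x == 0) throw std::invalid_argument("gamma code needs x >= 1");
  std::string bin;
  for (; x > 0; x >>= 1) bin.insert(bin.begin(), static_cast<char>('0' + (x & 1)));
  return std::string(bin.size() - 1, '0') + bin;
}

// Displacements may be 0, so encode d + 1.
std::string encode_displacement(std::uint64_t d) { return elias_gamma(d + 1); }
```

Small displacements thus cost few bits (0 takes a single bit), while a rare large one such as 20 takes 9 bits.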

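  14. displacement info
      ● representations:
        – Cleary '84: 2m bits
        – Poyias+ '15: Elias γ code or layered array
      ● layered array: a 4-bit integer array; a displacement that does not fit is marked with -1 and stored in a separate hash table
      ● example (insert: key 5, value 20): the array holds -1, 1, 0, 1, 9, 11 for cells 1 to 6; the displacement 20 of cell 1 does not fit into 4 bits and is stored in the hash table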
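A layered displacement array might look like the sketch below. The escape value 15 (standing in for the -1 in the example), the std::unordered_map overflow table and all names are assumptions. Displacements 0 to 14 fit into the 4-bit cells; larger ones go to the overflow table.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Layered displacements: 4 bits per cell, escape value for large displacements.
class LayeredDisplacements {
 public:
  explicit LayeredDisplacements(std::size_t m) : nibbles_((m + 1) / 2, 0) {}

  void set(std::size_t i, std::uint32_t d) {
    if (d < kEscape) { put(i, static_cast<std::uint8_t>(d)); overflow_.erase(i); }
    else { put(i, kEscape); overflow_[i] = d; }  // too large: store in hash table
  }
  std::uint32_t get(std::size_t i) const {
    const std::uint8_t v = nibble(i);
    return v == kEscape ? overflow_.at(i) : v;
  }
  std::size_t overflow_count() const { return overflow_.size(); }

 private:
  static constexpr std::uint8_t kEscape = 15;  // assumed escape marker
  std::uint8_t nibble(std::size_t i) const {
    return static_cast<std::uint8_t>((nibbles_[i / 2] >> (4 * (i % 2))) & 0xF);
  }
  void put(std::size_t i, std::uint8_t v) {
    std::uint8_t& byte = nibbles_[i / 2];
    const unsigned shift = 4 * (i % 2);
    byte = static_cast<std::uint8_t>((byte & ~(0xF << shift)) | (v << shift));
  }
  std::vector<std::uint8_t> nibbles_;                        // two 4-bit cells per byte
  std::unordered_map<std::size_t, std::uint32_t> overflow_;  // large displacements
};
```

Storing the displacements 20, 1, 0, 1, 9, 11 puts only the 20 into the overflow table.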

  15. memory benchmark
      ● c: compact
        – layered
        – max. load factor 0.5
      ● not space efficient!

  16. memory benchmark
      ● c+s: composition of compact with sparse
      ● competitive with array

  17. chain
      ● composition of
        – closed addressing
        – array
        – compact
      ● most space efficient (our contribution)

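  18. chain
      ● closed addressing
      ● buckets: instead of lists, use two arrays per bucket
        – key bucket: stores r(k) (like compact)
        – value bucket: stores the values (like array)
      ● example: φ(3) = (1, 2), so 3: pear goes to bucket 1; its key bucket gets r(3) = 2 and its value bucket gets pear (after apple, lemon, kiwi)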
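A minimal sketch of chain's bucket layout, assuming ω = 32, m = 2^q buckets and the plain bijection k = r(k) · m + h(k) with h(k) = k mod m (a real table would first scramble k with an invertible hash and bit-pack r(k) into 32 − q bits; the names are hypothetical). Each bucket keeps quotients and values in two parallel arrays, so there is no per-element pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// chain sketch: bucket h holds a key bucket (quotients r(k)) and a value bucket.
// Assumption: h(k) = low q bits of k, r(k) = k >> q, so k = r(k) * 2^q + h(k).
class ChainSketch {
 public:
  explicit ChainSketch(unsigned q)
      : q_(q), keys_(std::size_t{1} << q), values_(std::size_t{1} << q) {}

  void insert(std::uint32_t k, std::uint32_t v) {
    const std::uint32_t h = k & mask(), r = k >> q_;
    auto& kb = keys_[h];
    for (std::size_t i = 0; i < kb.size(); ++i)
      if (kb[i] == r) { values_[h][i] = v; return; }  // overwrite existing key
    kb.push_back(r);          // append the quotient to the key bucket
    values_[h].push_back(v);  // and the value to the value bucket
  }

  std::optional<std::uint32_t> find(std::uint32_t k) const {
    const std::uint32_t h = k & mask(), r = k >> q_;
    const auto& kb = keys_[h];
    for (std::size_t i = 0; i < kb.size(); ++i)  // scan: O(bucket size)
      if (kb[i] == r) return values_[h][i];
    return std::nullopt;
  }
  std::size_t bucket_size(std::uint32_t h) const { return keys_[h].size(); }

 private:
  std::uint32_t mask() const { return (std::uint32_t{1} << q_) - 1; }
  unsigned q_;
  std::vector<std::vector<std::uint32_t>> keys_, values_;
};
```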

  19. chain: space analysis
      ● a bucket costs O(ω) bits (pointer + length)
      ● for all buckets together to cost only O(n) bits ⇒ # buckets: O(n/ω)
      ● then m = n/ω (image size of h)
      ● r(k) of compact uses ~ ω − lg(n/ω) = ω − lg n + lg ω bits
      ● K = [1..2^ω], n: #elements
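Summing the terms of this analysis gives the bound listed for chain in the table of theoretic space bounds. The derivation below is a worked sketch with constants hidden in the O-terms; it uses n(ω − lg n) ≤ B = lg C(2^ω, n) ≤ n(ω − lg n + lg e) and counts the n lg|V| bits of the values separately.

```latex
\begin{align*}
\underbrace{n\,(\omega - \lg n + \lg \omega)}_{\text{quotients } r(k)}
  + \underbrace{\frac{n}{\omega}\cdot O(\omega)}_{\text{buckets}}
  &= n\,(\omega - \lg n) + O(n \lg \omega) \\
  &\le B + O(n \lg \omega)
\end{align*}
```

For example, with ω = 32 and n = 2^20 there are m = 2^15 buckets, and each quotient takes 32 − 15 = 17 bits instead of 32.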

  20. improve space
      ● want n buckets such that m = n
      ● but each bucket costs O(ω) bits!
      ● idea: maintain buckets in groups (similar to sparse)

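  21. chain → grp
      ● chain represents each bucket separately
      ● grp uses a bit vector to mark bucket boundaries
      ● example: the buckets 1: (8: apple, 5: lemon, 7: kiwi), 2: (empty), 3: (2: grapes, 1: apple) are stored consecutively as 8: apple, 5: lemon, 7: kiwi, 2: grapes, 1: apple with the bit vector 1 0 0 0 1 1 0 0 (each bucket is a 1 followed by a 0 per element)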

  22. rehashing
      ● chain: if a bucket reaches O(ω) elements ⇒ rehash
      ● grp: if a group reaches O(ω) elements ⇒ rehash
        – the group bit vector has O(ω) bits, so we scan it naively
      ● in practice, we set this maximum bucket/group size to 255 (the length costs a byte)
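One grp group can be sketched as follows, assuming the bit-vector layout of the chain → grp example (each bucket is a 1 followed by one 0 per element, giving 1 0 0 0 1 1 0 0) and the maximum group size of 255. The names and the naive scan are illustrative, not the authors' code; `insert` returns false when the group is full, which is where the table would rehash.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One grp group: buckets share one key array and one value array;
// the bit vector encodes each bucket as a 1 followed by one 0 per element.
class Group {
 public:
  static constexpr std::size_t kMaxSize = 255;  // max elements per group (length fits a byte)
  struct Range { std::size_t bit, first, count; };

  explicit Group(std::size_t buckets) : bits_(buckets, true) {}  // all buckets empty

  // Naive scan (precondition: b < number of buckets): bit position of bucket b's 1,
  // index of its first element, and its number of elements.
  Range range(std::size_t b) const {
    std::size_t ones = 0, zeros = 0, i = 0;
    for (; i < bits_.size(); ++i) {
      if (bits_[i]) { if (ones == b) break; ++ones; }
      else ++zeros;  // zeros before bucket b's 1 = elements of earlier buckets
    }
    std::size_t count = 0;
    for (std::size_t j = i + 1; j < bits_.size() && !bits_[j]; ++j) ++count;
    return {i, zeros, count};
  }

  bool insert(std::size_t b, std::uint32_t key, std::uint32_t value) {
    const Range r = range(b);
    for (std::size_t i = 0; i < r.count; ++i)
      if (keys_[r.first + i] == key) { values_[r.first + i] = value; return true; }
    if (keys_.size() >= kMaxSize) return false;  // group full: caller rehashes
    const std::size_t at = r.first + r.count;    // append at the end of bucket b
    bits_.insert(bits_.begin() + static_cast<std::ptrdiff_t>(r.bit + 1 + r.count), false);
    keys_.insert(keys_.begin() + static_cast<std::ptrdiff_t>(at), key);
    values_.insert(values_.begin() + static_cast<std::ptrdiff_t>(at), value);
    return true;
  }

  std::optional<std::uint32_t> find(std::size_t b, std::uint32_t key) const {
    const Range r = range(b);
    for (std::size_t i = 0; i < r.count; ++i)
      if (keys_[r.first + i] == key) return values_[r.first + i];
    return std::nullopt;
  }

  std::string bit_string() const {
    std::string s;
    for (bool bit : bits_) s += bit ? '1' : '0';
    return s;
  }

 private:
  std::vector<bool> bits_;
  std::vector<std::uint32_t> keys_, values_;
};
```

Inserting 8, 5, 7 into bucket 0 and 2, 1 into bucket 2 of a three-bucket group yields the bit vector 10001100.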

  23. insertion time
      ● chain: a bucket has O(ω) elements
      ● grp: a group has O(ω) elements
      ⇒ O(ω) worst-case time (assuming that we do not need to rehash)

  24. query time
      ● chain: a bucket has O(ω) elements ⇒ O(ω) worst-case time
      ● grp:
        – the bit vector has O(ω) bits ⇒ find the respective bucket in O(1) time
        – the bucket size is O(1) expected ⇒ O(1) expected time
      ● assume that Ω(ω) bits fit into a machine word
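When a group's bit vector fits into one 64-bit word, locating a bucket reduces to word operations: bucket b starts at the (b+1)-th set bit p, and its first element index is p − b, the number of zeros before p. The sketch below assumes that layout (bit i of the word is the i-th bit of the vector) and uses hypothetical names; its short loops stand in for the constant-time popcount/select instructions a tuned implementation would use.

```cpp
#include <cstdint>

struct BucketRange { unsigned first, count; };

// Position of the (b+1)-th set bit of w (select), assuming it exists.
inline unsigned select1(std::uint64_t w, unsigned b) {
  for (unsigned i = 0; i < b; ++i) w &= w - 1;  // clear the b lowest set bits
  unsigned pos = 0;
  while (!((w >> pos) & 1u)) ++pos;             // lowest remaining set bit
  return pos;
}

// Elements of bucket b in a len-bit vector (len <= 64): they start after the zeros
// preceding its 1 and run until the next 1 or the end of the vector.
inline BucketRange bucket_range(std::uint64_t w, unsigned len, unsigned b) {
  const unsigned p = select1(w, b);
  const unsigned first = p - b;  // zeros before p = elements of buckets 0..b-1
  unsigned next = p + 1;
  while (next < len && !((w >> next) & 1u)) ++next;
  return {first, next - p - 1};
}
```

For the bit vector 1 0 0 0 1 1 0 0 (the word 0x31 with bit 0 first), bucket 2 starts at element 3 and holds 2 elements.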

  25. theoretic space bounds
      ● to store n keys from K = [1..2^ω], we need at least B := lg (2^ω choose n) bits
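The information-theoretic bound B = lg C(2^ω, n) can be evaluated numerically with `std::lgamma`, which returns ln Γ(x) and so avoids computing the huge binomial coefficient itself. This is a sketch for intuition with a hypothetical function name; for n much smaller than 2^ω, B/n comes out close to ω − lg n + lg e.

```cpp
#include <cmath>

// B = lg C(2^omega, n) bits, via ln C(u, n) = lnGamma(u+1) - lnGamma(n+1) - lnGamma(u-n+1).
double lower_bound_bits(unsigned omega, double n) {
  const double u = std::ldexp(1.0, static_cast<int>(omega));  // 2^omega
  const double ln_binom = std::lgamma(u + 1) - std::lgamma(n + 1) - std::lgamma(u - n + 1);
  return ln_binom / std::log(2.0);  // convert nats to bits
}
```

For ω = 32 and n = 2^20, B/n is roughly 13.4 bits per key, compared with 32 bits for storing each key verbatim.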

  26. theoretic space bounds (ε ∈ (0,1] constant)

      hash table | expected space in bits     | construction time | query time
      cleary     | (1+ε)B + O(n)              | O(1/ε³) exp.      | O(1/ε²)
      elias      | (1+ε)B + O(n)              | O(1/ε) exp.       | O(1/ε)
      layered    | (1+ε)B + O(n lglglglglg n) | O(1/ε) exp.       | O(1/ε)
      chain      | B + O(n lg ω)              | O(ω) worst        | O(ω) worst
      grp        | B + O(n)                   | O(ω) worst        | O(1)

  27. average space per element
      ● layout: 32-bit keys, 8-bit values
      ● max. load factor = 0.95
      ● use sparse
      ● grp has the smallest space requirements
      ● cleary, chain, and elias are roughly equal
      ● google and layered are not as space economic

  28. construction time
      ● elias is very slow → omit it

  29. construction time
      ● google is fastest
      ● grp is always slower than chain
      ● cleary and layered are slow

  30. query time
      ● google is fastest
      ● grp is mostly slower than chain
      ● cleary and layered have spikes (happening at high load factors)

  31. experimental summary

      hash table | space   | construction time | query time
      google     | bad     | fast              | fast
      cleary     | good    | slow              | slow
      elias      | good    | very slow         | very slow
      layered    | average | slow              | fast
      chain      | good    | fast              | slow (but sometimes slower than grp at high loads)
      grp        | best    | fast              | slow

  32. proposed two hash tables
      ● characteristics:
        – no displacement info
        – memory-efficient
        – fast construction, but
        – slow query times
      ● techniques are a combination of
        – closed addressing
        – bucketing [Askitis'09]
        – compact hashing [Cleary'84]
        – a bit vector like in google's sparse table
      ● current research:
        – speed up queries with SIMD
        – overflow table for averaging the loads of the buckets
      thank you for watching!
