Fast and Simple Compact Hashing via Bucketing
Dominik Köppl, Simon J. Puglisi, Rajeev Raman
dynamic associative map
● K, V: sets
● f maps a dynamic subset of K of size n to V
● common representations of f:
– search tree
– hash table
setting
● K = [1..2^ω]
● V = [1..|V|]
● in case that ω ≤ 20:
– use a plain array to represent f
– space: |K| · lg|V| / 8 bytes (MiB = 1024² bytes)
● for larger ω not feasible
– example: |K| = 2^32, |V| = 2^32
memory benchmark
● setting:
– 32-bit keys
– 32-bit values
– randomly generated
● std: C++ STL hash table 「unordered_map」
– closed addressing
– n = 2^16 = 65536: more than 2 GiB RAM needed!
closed addressing
● a pointer array indexes buckets represented as linked lists
● h: hash function
(Diagram: h(3) = 5, so key 3 with value pear is appended to bucket 5; other buckets hold 8: apple, 5: lemon, 7: kiwi, 2: grapes, 1: apple.)
array list
● keys and values stored in a list
● ordered by insertion time
array list
● searching a key: O(n) time
● if we sort the list, insertion becomes O(lg n) amortized time (still not fast)
(Diagram: searching key 3 scans the list until the pair 3: pear is found.)
google sparse hash
● open addressing
● keys grouped into dynamic buckets
● a bit vector addresses the buckets
sparse hash table
● buckets = arrays; a bit vector marks which cells are occupied
(Diagram: h(3) = 4, so key 3 with value pear is inserted into the bucket addressed by cell 4; other entries: 8: apple, 7: lemon, 2: kiwi, 1: apple.)
compact hashing
● Cleary '84: open addressing
● φ: K → φ(K) bijection
– φ(k) = (h(k), r(k))
– φ⁻¹(h(k), r(k)) = k
● instead of k, store r(k) (may need less space than k)
compact hashing
● cell h(k) stores the pair (r(k), value)
(Diagram: φ(5) = (3,2), so the entry 5: lemon is stored in cell 3 as 2: lemon; φ⁻¹(3,2) = 5 recovers the key.)
Cleary: linear probing
● a collision displaces an entry from its home cell h(k)
(Diagram: φ(4) = (3,1), but cell 3 is occupied, so 1: pear is stored in cell 5; naively inverting gives φ⁻¹(5,1) = 8 ≠ 4.)
● we need displacement info to recover the key; storing it as a plain array costs too much space!
displacement info
● m: image size of h = # cells in H
● representations:
– Cleary '84: 2m bits
– Poyias+ '15: Elias γ code or layered array
(Diagram: displacements 1, 0, 1, 9, 11, 20 in cells 1..6, written as Elias γ codewords.)
displacement info: layered array
● Poyias+ '15: a 4-bit integer array stores the small displacements
● a displacement that does not fit (e.g. 20) goes into an extra hash table; its array cell is marked with a sentinel (-1)
(Diagram: inserting key 5 with displacement 20 writes -1 into the array and the pair into the hash table.)
memory benchmark
● c: compact
– layered displacement
– max. load factor 0.5
● not space efficient!
memory benchmark
● c+s: composition of compact with sparse
● competitive with array
chain
● composition of:
– closed addressing
– array
– compact
● most space efficient (our contribution)
chain
● closed addressing
● buckets: instead of linked lists, use two arrays per bucket
– key bucket: stores the quotients r(k) (compact)
– value bucket: stores the values (like array)
(Diagram: φ(3) = (1,2); the key bucket holds 8 5 7 2 and the value bucket apple lemon kiwi pear.)
chain: space analysis
● K = [1..2^ω], n: # elements
● a bucket costs O(ω) bits (pointer + length)
● want O(n lg n) bits of space for an improvement ⇒ # buckets: O(n/ω)
● then m = n/ω (image size of h)
● r(k) uses ~ ω − lg(n/ω) = ω − lg n + lg ω bits
improve space
● want n buckets such that m = n
● but each bucket costs O(ω) bits!
● idea: maintain buckets in a group (similar to sparse)
chain → grp
● chain represents each bucket separately
● grp stores the buckets of a group contiguously and uses a bit vector to mark bucket boundaries
(Diagram: buckets {8: apple, 5: lemon, 7: kiwi} and {2: grapes, 1: apple} concatenated, with bit vector 1 0 0 0 1 1 0 0 marking where each bucket starts.)
rehashing
● chain: rehash when a bucket reaches O(ω) elements
● grp: rehash when a group reaches O(ω) elements
– the group's bit vector has O(ω) bits, so we can scan it naively
● in practice we cap the maximum bucket/group size at 255 (so a length costs one byte)
insertion time
● chain: a bucket has O(ω) elements
● grp: a group has O(ω) elements
⇒ O(ω) worst-case time (assuming that we do not need to rehash)
query time
● chain: a bucket has O(ω) elements ⇒ O(ω) worst-case time
● grp:
– the bit vector has O(ω) bits ⇒ find the respective bucket in O(1) expected time
– the bucket size is O(1) expected ⇒ O(1) expected time overall
● assumes that Ω(ω) bits fit into a machine word
theoretic space bounds
● to store n keys from K = [1..2^ω] we need at least B := lg (2^ω choose n) bits (the information-theoretic lower bound)
theoretic space bounds (ε ∈ (0,1] constant)

hash table | expected space in bits          | construction time | query time
cleary     | (1+ε)B + O(n)                   | O(1/ε³) exp.      | O(1/ε²)
elias      | (1+ε)B + O(n)                   | O(1/ε) exp.       | O(1/ε)
layered    | (1+ε)B + O(n lg lg lg lg lg n)  | O(1/ε) exp.       | O(1/ε)
chain      | B + O(n lg ω)                   | O(ω) worst        | O(ω) worst
grp        | B + O(n)                        | O(ω) worst        | O(1)
average space per element
● setting: 32-bit keys, 8-bit values, max. load factor 0.95, sparse layout
● grp has the smallest space requirements
● cleary, chain, and elias are roughly equal
● google and layered are not as space-economic
construction time
● elias is very slow → omit it
construction time
● google is fastest
● grp is always slower than chain
● cleary and layered are slow
query time
● google is fastest
● grp is mostly slower than chain
● cleary and layered have spikes (happening at high load factors)
experimental summary

hash table | space   | construction time | query time
google     | bad     | fast              | fast
cleary     | good    | slow              | slow
elias      | good    | very slow         | very slow
layered    | average | slow              | fast
chain      | good    | fast              | slow*
grp        | best    | fast              | slow

* but sometimes slower than grp at high loads
proposed two hash tables
● techniques are a combination of:
– closed addressing
– bucketing [Askitis '09]
– compact hashing [Cleary '84]
– a bit vector like in google's sparse table
● characteristics:
– no displacement info
– memory-efficient
– fast construction but slow query times
● current research:
– speed up queries with SIMD
– overflow table for averaging the loads of the buckets
thank you for watching!