  1. Mitigating Asymmetric Read and Write Costs in Cuckoo Hashing for Storage Systems. Yuanyuan Sun, Yu Hua, Zhangyu Chen, Yuncheng Guo (Huazhong University of Science and Technology). USENIX ATC 2019.

  2-3. Query Services in Cloud Storage Systems
     • Large amounts of data
       - 300 new profiles and more than 208 thousand photos per minute [September 2018@Facebook]
     • Demand for low-latency, high-throughput queries …

  4-5. Hash Structures
     • Constant-time read performance
       - Widely used in key-value stores and relational databases
     • Drawback: high latency when handling hash collisions

  6-8. Cuckoo Hashing
     • Multi-choice hashing: each item has one candidate bucket in each of two tables, T1 and T2
     • Hash collisions are handled by kick-out operations
     [Figure: Insert(x) into T1 and T2, which already hold items a, n, k, m and b; x takes an occupied bucket and the displaced item is kicked out to its alternate bucket in the other table]
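The kick-out scheme on slides 6-8 can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes two tables with one item per bucket, derives the two hash functions by salting SHA-256 with the table number, and caps each insertion at an arbitrary `max_kicks` so that a failed insertion reports the displaced item instead of looping forever.

```python
import hashlib


class CuckooHashTable:
    """Two-table cuckoo hashing, one item per bucket (illustrative sketch)."""

    def __init__(self, size=8, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks  # assumed cap on kick-outs per insertion
        self.tables = [[None] * size, [None] * size]  # tables[0] is T1, tables[1] is T2

    def _index(self, t, key):
        # Two independent hash functions: SHA-256 salted with the table number.
        digest = hashlib.sha256(f"{t}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.size

    def find(self, key):
        # A read probes exactly two buckets: one in T1 and one in T2.
        return any(self.tables[t][self._index(t, key)] == key for t in (0, 1))

    def insert(self, key):
        """Return None on success, else the item left without a bucket."""
        if self.find(key):
            return None
        t = 0
        for _ in range(self.max_kicks):
            i = self._index(t, key)
            if self.tables[t][i] is None:
                self.tables[t][i] = key
                return None
            # Kick out the occupant; it must move to its bucket in the other table.
            self.tables[t][i], key = key, self.tables[t][i]
            t = 1 - t
        return key  # gave up: the caller must resize, rehash or stash this item
```

`insert` returns the leftover item rather than silently dropping it, because after a failed chain of kick-outs the homeless item is usually not the one the caller passed in.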

  9-15. Cuckoo Hashing (cont.)
     • For reads, only a limited number of positions is probed => O(1) time complexity
     • For writes, kick-outs may fall into an endless loop => slow write performance
     [Figure: Find(c) probes one candidate bucket in T1 and one in T2. Insert(x) starts a chain of kick-outs among items a, n, k, f, c, m and b that keeps revisiting the same buckets: an endless loop occurs]
     • Bottleneck: asymmetric read and write costs!
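A minimal reproduction of the endless loop (not the slide's exact layout): three items whose candidate buckets are both T1[0] and T2[0]. The fixed `positions` map is a hypothetical stand-in for two hash functions, and the kick-out budget of 16 is arbitrary. Lookups still touch two buckets, while the third insertion spends its whole budget and leaves one item homeless, which is the read/write asymmetry the slides point to.

```python
def insert_with_kicks(tables, positions, key, max_kicks=16):
    """Insert key into two one-slot-per-bucket tables, kicking out occupants.

    positions[item] is (index in T1, index in T2): a hypothetical fixed map
    standing in for two hash functions so that the collision is reproducible.
    Returns (kick-outs performed, item left homeless or None).
    """
    t = 0
    for step in range(max_kicks):
        i = positions[key][t]
        if tables[t][i] is None:
            tables[t][i] = key
            return step, None
        tables[t][i], key = key, tables[t][i]  # kick out the occupant
        t = 1 - t  # the evicted item goes to its bucket in the other table
    return max_kicks, key


def find(tables, positions, key):
    # A read still probes only two buckets, however long the write took.
    return any(tables[t][positions[key][t]] == key for t in (0, 1))


# Items a, n and x all hash to T1[0] and T2[0]: three items, two buckets.
positions = {"a": (0, 0), "n": (0, 0), "x": (0, 0)}
tables = [[None] * 2, [None] * 2]
print(insert_with_kicks(tables, positions, "a"))  # (0, None)
print(insert_with_kicks(tables, positions, "n"))  # (1, None): a is kicked to T2[0]
print(insert_with_kicks(tables, positions, "x"))  # (16, 'n'): the kicks cycle until the cap
```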

  16-18. Concurrency in Multi-core Systems
     • Existing concurrency strategy for cuckoo hashing
       - Lock two buckets before each kick-out operation (libcuckoo, EuroSys'14)
     • Challenges
       - Inefficient insertion performance
       - Limited scalability
     • Design goal
       - A high-throughput, concurrency-friendly cuckoo hash table
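The strategy of locking two buckets per kick-out can be modeled as follows. This is an illustrative Python sketch, not libcuckoo's implementation: the class name `LockedBuckets` is hypothetical, and taking the two locks in bucket-index order is an assumed (standard) way to avoid deadlock when two threads lock the same pair in opposite directions.

```python
import threading


class LockedBuckets:
    """Buckets with one lock each; a kick-out locks both buckets it touches."""

    def __init__(self, n):
        self.slots = [None] * n
        self.locks = [threading.Lock() for _ in range(n)]

    def move(self, src, dst):
        """Kick the item in bucket src over to bucket dst if dst is still free."""
        if src == dst:
            return False
        # Always lock the lower-numbered bucket first so that two threads
        # locking the same pair in opposite directions cannot deadlock.
        first, second = sorted((src, dst))
        with self.locks[first], self.locks[second]:
            if self.slots[src] is None or self.slots[dst] is not None:
                return False  # the path changed under us; the caller must retry
            self.slots[dst], self.slots[src] = self.slots[src], None
            return True
```

Every step of a kick-out path repeats this lock, check and move sequence, and threads whose paths cross contend for the same bucket locks; that is one way to read the insertion-cost and scalability challenges on the slide.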

  19. Our Approach: CoCuckoo
     • Pseudoforests to predetermine endless loops
     • An efficient concurrency strategy
       - A graph-grained locking mechanism
       - Concurrency optimizations that shorten the critical path
     • Higher throughput than the state-of-the-art scheme, libcuckoo

  20-25. Pseudoforest
     • Vertex: a bucket
     • Edge: an inserted item, directed from the vertex that stores it to its backup vertex
     • Identifying endless loops: a subgraph with #vertices = #edges is called maximal; every bucket in it is occupied, so kick-outs inside it can never free a slot
     [Figure: Insert(y) next to a maximal subgraph and a non-maximal subgraph that still has a vacancy; y is inserted into the non-maximal subgraph, items are kicked out toward the vacancy, and afterwards that subgraph is maximal as well]
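One way to track the maximal condition is a union-find structure that keeps #vertices and #edges per subgraph. The sketch below is an assumption-laden illustration rather than CoCuckoo's data structure: each bucket holds one item, buckets are created lazily as one-vertex subgraphs, and an insertion is rejected when it would land in a maximal subgraph or join two maximal ones.

```python
class Pseudoforest:
    """Union-find over buckets that counts vertices and edges per subgraph.

    Hypothetical sketch: each bucket holds one item, each item is an edge
    between its two candidate buckets, and a subgraph is maximal when
    #edges == #vertices (every bucket in it is occupied).
    """

    def __init__(self):
        self.parent, self.vertices, self.edges = {}, {}, {}

    def find(self, b):
        if b not in self.parent:  # first sighting: a one-vertex, zero-edge subgraph
            self.parent[b], self.vertices[b], self.edges[b] = b, 1, 0
        while self.parent[b] != b:
            self.parent[b] = self.parent[self.parent[b]]  # path halving
            b = self.parent[b]
        return b

    def is_maximal(self, b):
        r = self.find(b)
        return self.edges[r] == self.vertices[r]

    def add_item(self, b1, b2):
        """Record an item with candidate buckets b1 and b2.

        Returns False, leaving the counts unchanged, when the insertion is
        predetermined to loop endlessly; True otherwise.
        """
        r1, r2 = self.find(b1), self.find(b2)
        if r1 == r2:
            if self.is_maximal(r1):
                return False  # no vacancy left anywhere in this subgraph
            self.edges[r1] += 1  # closes the subgraph's single cycle: now maximal
            return True
        if self.is_maximal(r1) and self.is_maximal(r2):
            return False  # v1 + v2 buckets cannot hold v1 + v2 + 1 items
        if self.vertices[r1] < self.vertices[r2]:
            r1, r2 = r2, r1  # union by size
        self.parent[r2] = r1
        self.vertices[r1] += self.vertices[r2]
        self.edges[r1] += self.edges[r2] + 1
        return True


pf = Pseudoforest()
print(pf.add_item(("T1", 0), ("T2", 0)))  # True
print(pf.add_item(("T1", 0), ("T2", 0)))  # True: the subgraph is now maximal
print(pf.add_item(("T1", 0), ("T2", 0)))  # False: an endless loop is predetermined
```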

  26-30. Graph-grained Locking
     • EMPTY subgraph: buckets not yet represented in the pseudoforest
     [Figure: the pseudoforest's subgraphs over tables T1 and T2]
     • Subgraph states: EMPTY, non-maximal, maximal
     • Insertions are classified into 3 cases comprising 6 subcases
       - By the number of EMPTY subgraphs the item's two candidate buckets belong to: TwoEmpty, OneEmpty, ZeroEmpty
       - ZeroEmpty is split further by the subgraphs' states and by whether the two buckets share a subgraph: Diff_non_non, Same_non, Diff_non_max, Max
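The six subcases can be expressed as a small classification function. The sketch takes the subgraph numbers of an item's two candidate buckets (`EMPTY` when a bucket is not in the pseudoforest) and a map recording which subgraphs are maximal. The function name and the choice that `Max` covers every pair of maximal subgraphs, same or different, are assumptions, since the slides list only the subcase names.

```python
EMPTY = None  # subgraph number of a bucket that is not in the pseudoforest


def classify(s1, s2, maximal):
    """Return the insertion subcase for an item whose candidate buckets
    belong to subgraphs s1 and s2.

    maximal maps each non-EMPTY subgraph number to True (maximal) or False.
    """
    empties = (s1 is EMPTY) + (s2 is EMPTY)
    if empties == 2:
        return "TwoEmpty"
    if empties == 1:
        return "OneEmpty"
    # ZeroEmpty: both buckets already belong to subgraphs.
    m1, m2 = maximal[s1], maximal[s2]
    if m1 and m2:
        return "Max"  # no vacancy reachable: the insertion would loop endlessly
    if s1 == s2:
        return "Same_non"
    if m1 or m2:
        return "Diff_non_max"
    return "Diff_non_non"
```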

  31-33. TwoEmpty
     • Both candidate buckets are in EMPTY subgraphs
     • Insertion algorithm (the slide marks which steps run with graph-grained lock(s) and which are moved out of the critical path):
       1. Atomically assign an allocated subgraph number to the two buckets
       2. Insert the item
       3. Mark the subgraph as non-maximal
     [Figure: T1 and T2 before and after insertion]
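The TwoEmpty steps can be modeled as below. This is a hypothetical single-process Python sketch: `CoCuckooState`, the `itertools.count` allocator and the `_claim` lock that emulates an atomic two-bucket assignment are all assumptions, not CoCuckoo's actual primitives. The sketch also assumes the subgraph number and its lock are allocated before the graph-grained lock is taken.

```python
import itertools
import threading

EMPTY = None  # subgraph number of a bucket not yet in the pseudoforest


class CoCuckooState:
    """Per-bucket subgraph numbers plus per-subgraph state (hypothetical layout)."""

    def __init__(self, n_buckets):
        self.slots = [None] * n_buckets        # one item per bucket
        self.subgraph = [EMPTY] * n_buckets    # subgraph number of each bucket
        self.maximal = {}                      # subgraph number -> maximal?
        self.graph_locks = {}                  # subgraph number -> graph-grained lock
        self._next_id = itertools.count()      # assumed subgraph-number allocator
        self._claim = threading.Lock()         # emulates claiming two buckets atomically

    def insert_two_empty(self, item, b1, b2):
        # Allocate the subgraph number and its lock before taking any lock.
        sid = next(self._next_id)
        lock = self.graph_locks[sid] = threading.Lock()
        with lock:  # graph-grained lock on the new subgraph
            with self._claim:
                if self.subgraph[b1] is not EMPTY or self.subgraph[b2] is not EMPTY:
                    return False  # another thread took a bucket; re-classify and retry
                self.subgraph[b1] = self.subgraph[b2] = sid
            self.slots[b1] = item       # the item lives in b1; b2 is its backup
            self.maximal[sid] = False   # 2 vertices, 1 edge: non-maximal
        return True
```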

  34-35. OneEmpty
     • One candidate bucket is in an EMPTY subgraph; the other is in a non-maximal or maximal subgraph
     • Insertion algorithm: two atomic operations, without locks
       1. Assign the existing subgraph number to the new vertex
       2. Insert the item into the new vertex
     [Figure: T1 and T2 before and after insertion]
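The lock-free OneEmpty path amounts to two compare-and-swap operations. Python has no user-level CAS, so `AtomicCell` emulates one with a private lock; the cell type, the function name and the retry-on-failure convention are illustrative assumptions. The new vertex adds one bucket and the item adds one edge, so #vertices and #edges both grow by one and the subgraph keeps its state.

```python
import threading

EMPTY = None  # marks both "no subgraph number" and "no item"


class AtomicCell:
    """A value with compare-and-swap; the lock merely emulates a hardware CAS."""

    def __init__(self, value=EMPTY):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def insert_one_empty(subgraph, slots, item, new_b, old_b):
    """OneEmpty: new_b is in an EMPTY subgraph, old_b already belongs to one.

    subgraph and slots are lists of AtomicCell indexed by bucket.
    """
    sid = subgraph[old_b].load()
    if not subgraph[new_b].cas(EMPTY, sid):  # atomic op 1: join the existing subgraph
        return False  # new_b was claimed concurrently; re-classify and retry
    if not slots[new_b].cas(EMPTY, item):    # atomic op 2: store the item in the new vertex
        return False  # another writer filled the bucket first; re-classify and retry
    return True
```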

  36-40. ZeroEmpty (Diff_non_non)
     • The two candidate buckets are in two different non-maximal subgraphs
     • Insertion algorithm:
       1. Kick-out (with item insertion)
       2. Merge the two subgraphs
     [Figure: Insert(c). Before insertion there are two non-maximal subgraphs; c is inserted through kick-outs, and after insertion the merged subgraph is still non-maximal]
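A sketch of Diff_non_non under simplifying assumptions: one item per bucket, buckets named by (table, index) tuples, a hypothetical `positions` map giving each item's two candidate buckets, and no locking. A connected non-maximal subgraph has #edges = #vertices - 1, so it is a tree with exactly one vacancy, and the kick-out chain started at the item's bucket in the first subgraph always ends at that vacancy. The two subgraphs are then relabeled into one, which remains non-maximal.

```python
def kick_insert(slots, positions, item, start):
    """Store item in bucket start, pushing each displaced item to its other bucket.

    In a tree-shaped (non-maximal) subgraph each occupied bucket points to its
    item's other bucket and there is no cycle, so the chain ends at the single
    vacancy; the step bound only guards against misuse on a maximal subgraph.
    """
    bucket = start
    for _ in range(len(slots)):
        item, slots[bucket] = slots[bucket], item
        if item is None:
            return
        b1, b2 = positions[item]
        bucket = b2 if bucket == b1 else b1
    raise RuntimeError("no vacancy reached: subgraph was maximal")


def insert_diff_non_non(slots, positions, subgraph, maximal, item):
    """Diff_non_non: the item's buckets lie in two different non-maximal subgraphs."""
    b1, b2 = positions[item]
    s1, s2 = subgraph[b1], subgraph[b2]
    assert s1 != s2 and not maximal[s1] and not maximal[s2]
    kick_insert(slots, positions, item, b1)  # kick-outs stay inside s1 and fill its vacancy
    for b in subgraph:                        # merge: relabel s2's buckets as s1
        if subgraph[b] == s2:
            subgraph[b] = s1
    del maximal[s2]
    # e1 = v1 - 1 and e2 = v2 - 1, so e1 + e2 + 1 = v1 + v2 - 1: still non-maximal.
    maximal[s1] = False


A0, B0, A1, B1 = ("T1", 0), ("T2", 0), ("T1", 1), ("T2", 1)
slots = {A0: "a", B0: None, A1: "n", B1: None}
positions = {"a": (A0, B0), "n": (A1, B1), "c": (A0, B1)}
subgraph = {A0: 0, B0: 0, A1: 1, B1: 1}
maximal = {0: False, 1: False}
insert_diff_non_non(slots, positions, subgraph, maximal, "c")
print(slots)  # c took T1[0] and a was kicked out to T2[0]
```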

  41. ZeroEmpty (Same_non)
     • Both candidate buckets are in the same non-maximal subgraph
     • Insertion algorithm:
       1. Mark the subgraph as maximal
       2. Kick-out (with item insertion)
     [Figure: Insert(m) into a single non-maximal subgraph, before insertion]
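Same_non can reuse the same kick-out chain; the sketch below repeats the helper so it stands alone, under the same assumptions (one item per bucket, a hypothetical `positions` map of candidate buckets, no locking). The new item closes the only cycle the tree can hold, so the sketch follows the slide's order: mark the subgraph maximal, then insert the item through kick-outs.

```python
def kick_insert(slots, positions, item, start):
    """Store item in bucket start, pushing each displaced item to its other bucket."""
    bucket = start
    for _ in range(len(slots)):
        item, slots[bucket] = slots[bucket], item
        if item is None:
            return  # reached the subgraph's vacancy
        b1, b2 = positions[item]
        bucket = b2 if bucket == b1 else b1
    raise RuntimeError("no vacancy reached: subgraph was maximal")


def insert_same_non(slots, positions, subgraph, maximal, item):
    """Same_non: both of the item's buckets lie in one non-maximal subgraph."""
    b1, b2 = positions[item]
    sid = subgraph[b1]
    assert sid == subgraph[b2] and not maximal[sid]
    maximal[sid] = True  # the new edge makes #edges == #vertices
    kick_insert(slots, positions, item, b1)


# A three-bucket tree: a in T1[0], n in T2[0], vacancy at T1[1].
A0, B0, A1 = ("T1", 0), ("T2", 0), ("T1", 1)
slots = {A0: "a", B0: "n", A1: None}
positions = {"a": (A0, B0), "n": (A1, B0), "m": (A0, B0)}
subgraph = {A0: 0, B0: 0, A1: 0}
maximal = {0: False}
insert_same_non(slots, positions, subgraph, maximal, "m")
print(slots)  # m -> T1[0], a kicked to T2[0], n kicked to T1[1]
```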
