Cuckoo Hashing for Storage Systems Yuanyuan Sun, Yu Hua, Zhangyu - PowerPoint PPT Presentation

Mitigating Asymmetric Read and Write Costs in Cuckoo Hashing for Storage Systems
Yuanyuan Sun, Yu Hua, Zhangyu Chen, Yuncheng Guo
Huazhong University of Science and Technology
USENIX ATC 2019

Query Services in Cloud Storage Systems
• Large amounts of data
  - 300 new profiles and more than 208 thousand photos per minute [September 2018@Facebook]
• Demanding the support of low-latency and high-throughput queries

Hash Structures
• Constant-scale read performance
  - Widely used in key-value stores and relational databases
• ✗ High latency for handling hash collisions

Cuckoo Hashing
• Multi-choice hashing
• Handling hash collisions: kick-out operations
[Figure: Insert(x) into hash tables T1 and T2; occupied buckets are freed by kicking their items out to their alternate positions, and x is stored.]
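A minimal Python sketch of the kick-out insertion described above, assuming two tables with one item per bucket and toy hash functions (`key % size` for T1, `(key // size) % size` for T2) chosen only so the example can be traced by hand. `CuckooHash`, `find`, `insert`, and `max_kicks` are illustrative names, not taken from the talk.

```python
class CuckooHash:
    """Two-table cuckoo hash, one item per bucket (illustrative sketch)."""

    def __init__(self, size=8, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size, [None] * size]  # tables[0] is T1, tables[1] is T2

    def _pos(self, key, t):
        # Toy hash functions (assumption): easy to trace, not for real use.
        return key % self.size if t == 0 else (key // self.size) % self.size

    def find(self, key):
        # A lookup probes exactly one bucket per table: O(1).
        return any(self.tables[t][self._pos(key, t)] == key for t in (0, 1))

    def insert(self, key):
        """Return None on success, else the key left without a bucket after max_kicks."""
        if self.find(key):
            return None
        t = 0
        for _ in range(self.max_kicks):
            i = self._pos(key, t)
            if self.tables[t][i] is None:
                self.tables[t][i] = key
                return None
            # Kick out the occupant; it moves on to its bucket in the other table.
            key, self.tables[t][i] = self.tables[t][i], key
            t = 1 - t
        return key  # a real table would rehash or resize instead of giving up


h = CuckooHash()
for k in (1, 9, 17):  # all three map to bucket 1 of T1
    h.insert(k)
print(h.tables[0][1], h.tables[1][0], h.tables[1][1])  # 17 1 9
```

Each new key takes bucket 1 of T1 and pushes the previous occupant into its T2 bucket, which is the kick-out chain in its simplest form.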

Cuckoo Hashing (continued)
• For reads, only limited positions are probed => O(1) time complexity
  [Figure: Find(c) probes only c's candidate buckets in T1 and T2.]
• For writes, endless loops may occur! => slow write performance
  [Figure: Insert(x) kicks items back and forth between T1 and T2 without ever reaching a vacancy: an endless loop occurs!]
• Bottleneck: asymmetric reads and writes!
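To make the read/write asymmetry concrete, the sketch below counts the work of each operation under the same kind of toy two-table model (one item per bucket, hash functions and helper names are assumptions). Keys 1, 65, and 129 share both candidate buckets, so the third insertion can only cycle; a bounded kick counter (here the hypothetical `max_kicks`) stops the loop, but only after all those kick-outs have been spent.

```python
SIZE = 8


def positions(key):
    # Toy hash functions (assumption): candidate bucket in T1 and in T2.
    return key % SIZE, (key // SIZE) % SIZE


def find(tables, key):
    """A read probes at most two buckets, however full the table is."""
    p1, p2 = positions(key)
    return tables[0][p1] == key or tables[1][p2] == key


def insert(tables, key, max_kicks=32):
    """Return (key_left_without_a_bucket_or_None, number_of_kick_outs)."""
    t = 0
    for kicks in range(max_kicks):
        i = positions(key)[t]
        if tables[t][i] is None:
            tables[t][i] = key
            return None, kicks
        key, tables[t][i] = tables[t][i], key  # kick out the occupant
        t = 1 - t
    return key, max_kicks  # gave up: the kick-outs were cycling


tables = [[None] * SIZE, [None] * SIZE]
print(insert(tables, 1))    # (None, 0)
print(insert(tables, 65))   # (None, 1)
print(insert(tables, 129))  # (1, 32): 32 kick-outs wasted on a loop
print(find(tables, 65))     # True, after probing at most 2 buckets
```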

Concurrency in Multi-core Systems
• Existing concurrency strategy for cuckoo hashing
  - Lock two buckets before each kick-out operation (libcuckoo@EuroSys'14)
• Challenges:
  - Inefficient insertion performance
  - Limited scalability
• Design goal:
  - A high-throughput and concurrency-friendly cuckoo hash table
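A minimal sketch of the lock-two-buckets idea, assuming one item per bucket and one `threading.Lock` per bucket; `LockedBuckets`, `move`, and `shift_path` are hypothetical names, and this is not libcuckoo's code. Items are shifted starting from the tail of a kick-out path so that every single move targets an empty bucket, and each move takes two bucket locks, so a long path means many lock acquisitions per insertion.

```python
import threading


class LockedBuckets:
    """One item per bucket, one lock per bucket (illustrative sketch)."""

    def __init__(self, n):
        self.slots = [None] * n
        self.locks = [threading.Lock() for _ in range(n)]

    def move(self, src, dst):
        """Move the item in bucket src into the empty bucket dst under both locks."""
        if src == dst:
            return False
        first, second = sorted((src, dst))  # global lock order avoids deadlock
        with self.locks[first], self.locks[second]:
            # Re-check under the locks: another thread may have changed either bucket.
            if self.slots[src] is None or self.slots[dst] is not None:
                return False
            self.slots[dst], self.slots[src] = self.slots[src], None
            return True

    def shift_path(self, path):
        """Shift items along path[0] -> ... -> path[-1], where path[-1] is empty.

        Each move locks two buckets, so a path of k moves costs k lock pairs.
        """
        for j in range(len(path) - 1, 0, -1):
            if not self.move(path[j - 1], path[j]):
                return False  # path changed concurrently; caller must search again
        return True


b = LockedBuckets(4)
b.slots[0], b.slots[1] = "x", "y"
b.shift_path([0, 1, 2])
print(b.slots)  # [None, 'x', 'y', None]
```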

Our Approach: CoCuckoo
• Pseudoforests to predetermine endless loops
• Efficient concurrency strategy
  - A graph-grained locking mechanism
  - Concurrency optimization to reduce the length of the critical path
• Higher throughput than the state-of-the-art scheme, i.e., libcuckoo

Pseudoforest
• Vertex: a bucket
• Edge: an inserted item, directed from its storage vertex to its backup vertex
• Identify endless loops: a subgraph with #Vertices = #Edges is called maximal
[Figure: Insert(y). One of y's candidate buckets lies in a maximal subgraph, the other in a non-maximal subgraph that still has a vacancy. y is inserted through the non-maximal subgraph, kicked-out items fill the vacancy, and that subgraph becomes maximal.]
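A sketch of how a pseudoforest can predict an endless loop before any kick-out runs. It assumes one item per bucket and tracks #vertices and #edges per subgraph with union-find; union-find is a swapped-in bookkeeping technique here, and the talk does not say CoCuckoo uses it. Class and method names are hypothetical. With one slot per bucket, a connected subgraph can hold at most as many items as it has buckets, so a new item fits unless both of its candidate buckets already lie in maximal subgraphs.

```python
class Pseudoforest:
    """Union-find over buckets that tracks #vertices and #edges per subgraph."""

    def __init__(self, n_buckets):
        self.parent = list(range(n_buckets))
        self.vertices = [1] * n_buckets
        self.edges = [0] * n_buckets

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def is_maximal(self, v):
        r = self.find(v)
        return self.edges[r] == self.vertices[r]

    def can_insert(self, b1, b2):
        """Decide, before any kick-out, whether an item with buckets b1, b2 fits."""
        r1, r2 = self.find(b1), self.find(b2)
        if r1 == r2:
            return self.edges[r1] < self.vertices[r1]
        return not (self.is_maximal(r1) and self.is_maximal(r2))

    def add_item(self, b1, b2):
        """Record the item as an edge b1 -> b2; False means it would loop forever."""
        if not self.can_insert(b1, b2):
            return False
        r1, r2 = self.find(b1), self.find(b2)
        if r1 != r2:  # merge the smaller subgraph into the larger one
            if self.vertices[r1] < self.vertices[r2]:
                r1, r2 = r2, r1
            self.parent[r2] = r1
            self.vertices[r1] += self.vertices[r2]
            self.edges[r1] += self.edges[r2]
        self.edges[r1] += 1
        return True


pf = Pseudoforest(4)
print(pf.add_item(0, 2), pf.add_item(0, 2))  # True True: {0, 2} is now maximal
print(pf.add_item(0, 2))                     # False: a third item would loop forever
print(pf.add_item(1, 3), pf.add_item(0, 1))  # True True
```

Only the counts matter for the prediction, so the direction of each edge (storage to backup) is not stored in this sketch.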

Graph-grained Locking
• EMPTY subgraph: buckets not represented in the pseudoforest
  [Figure: example pseudoforest over tables T1 and T2.]
• Subgraph states: EMPTY, non-maximal, maximal
• Classify insertions into 3 cases, which include 6 subcases
  - According to the number of corresponding EMPTY subgraphs: TwoEmpty, OneEmpty, ZeroEmpty
  - ZeroEmpty, according to the states and the number of subgraphs: Diff_non_non, Same_non, Diff_non_max, Max
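The case split can be written as a small decision function. This is one reading of the subcase names, stated as an assumption: `Max` is taken to cover both candidate subgraphs being maximal (same or different subgraphs), and `Diff_non_max` one non-maximal and one maximal subgraph. `State` and `classify` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    EMPTY = "empty"
    NON_MAXIMAL = "non-maximal"
    MAXIMAL = "maximal"


def classify(state1, state2, same_subgraph):
    """Map the subgraphs of an item's two candidate buckets to an insertion subcase."""
    n_empty = (state1 is State.EMPTY) + (state2 is State.EMPTY)
    if n_empty == 2:
        return "TwoEmpty"
    if n_empty == 1:
        return "OneEmpty"
    # ZeroEmpty: both candidate buckets already belong to the pseudoforest.
    if State.NON_MAXIMAL not in (state1, state2):
        return "Max"
    if same_subgraph:  # one subgraph, so both states are non-maximal
        return "Same_non"
    if state1 is state2:
        return "Diff_non_non"
    return "Diff_non_max"
```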

TwoEmpty
• Two EMPTY subgraphs
• With graph-grained lock(s)
• Out of the critical path
• Insertion algorithm (critical path):
  1. Atomically assign the allocated subgraph number to the two buckets
  2. Insert the item
  3. Mark the subgraph as non-maximal
[Figure: tables T1 and T2 before and after the insertion.]
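A sketch of the TwoEmpty steps, assuming per-bucket subgraph numbers (0 meaning EMPTY) and a `threading.Lock` standing in for whatever atomic primitive or graph-grained lock a real implementation would use. `SubgraphTable` and `insert_two_empty` are hypothetical names. The new subgraph has two vertices and one edge, so it starts out non-maximal.

```python
import itertools
import threading

EMPTY = 0  # subgraph number of buckets that are not in the pseudoforest


class SubgraphTable:
    """Per-bucket subgraph numbers and items (hypothetical bookkeeping)."""

    def __init__(self, n_buckets):
        self.subgraph = [EMPTY] * n_buckets
        self.items = [None] * n_buckets
        self.maximal = {}                   # subgraph number -> maximal flag
        self._next_id = itertools.count(1)  # subgraph numbers start at 1
        self._atomic = threading.Lock()     # stand-in for an atomic primitive

    def _claim_pair(self, b1, b2, sid):
        # Atomically give both buckets the number sid, only if both are still EMPTY.
        with self._atomic:
            if self.subgraph[b1] != EMPTY or self.subgraph[b2] != EMPTY:
                return False
            self.subgraph[b1] = self.subgraph[b2] = sid
            return True

    def insert_two_empty(self, item, b1, b2):
        """TwoEmpty: returns False if another thread claimed b1 or b2 first."""
        sid = next(self._next_id)           # allocate a fresh subgraph number
        if not self._claim_pair(b1, b2, sid):
            return False                    # caller re-classifies and retries
        self.items[b1] = item               # b1 stores the item; b2 is its backup
        self.maximal[sid] = False           # 2 vertices, 1 edge: non-maximal
        return True
```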

OneEmpty
• One EMPTY subgraph (the other is non-maximal/maximal)
• Insertion algorithm: two atomic operations without locks
  1. Assign the existing subgraph number to the new vertex
  2. Insert the item into the new vertex
[Figure: tables T1 and T2 before and after the insertion.]
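Why no lock is needed here can be checked by counting: the insertion adds one vertex (the EMPTY bucket) and one edge (the item) to the existing subgraph, so #Edges - #Vertices is unchanged and a maximal subgraph stays maximal while a non-maximal one stays non-maximal. The single-threaded sketch below shows the two steps; the list/dict bookkeeping and the name `insert_one_empty` are assumptions.

```python
EMPTY = 0  # subgraph number of buckets that are not in the pseudoforest


def insert_one_empty(subgraph, items, maximal, item, b_new, b_existing):
    """OneEmpty sketch: b_new is EMPTY, b_existing already belongs to a subgraph.

    Returns the (unchanged) maximal flag of the enlarged subgraph.
    """
    if subgraph[b_new] != EMPTY or subgraph[b_existing] == EMPTY:
        raise ValueError("not a OneEmpty insertion")
    sid = subgraph[b_existing]
    subgraph[b_new] = sid  # atomic operation 1 (a compare-and-swap from EMPTY in a real table)
    items[b_new] = item    # atomic operation 2: store the item in the new vertex
    return maximal[sid]    # state needs no update: one vertex and one edge were added
```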

ZeroEmpty (Diff_non_non)
• Two different non-maximal subgraphs
• Insertion algorithm:
  1. Kick-out (with item insertion)
  2. Merge the two subgraphs
[Figure: Insert(c). Before insertion, c's candidate buckets lie in two different non-maximal subgraphs; after insertion, the merged subgraph is non-maximal.]
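The merged subgraph is always non-maximal: a connected non-maximal subgraph is a tree with #Edges = #Vertices - 1, so merging two of them with the new item gives (V1 - 1) + (V2 - 1) + 1 = (V1 + V2) - 1 edges. The sketch below shows only the merge step (the kick-out is not modeled), with hypothetical `(vertices, edges)` bookkeeping per subgraph number.

```python
def merge_diff_non_non(subgraph, stats, sid_a, sid_b):
    """Merge subgraph sid_b into sid_a after the new item is inserted.

    subgraph: per-bucket subgraph numbers; stats: number -> (vertices, edges).
    Returns the state of the merged subgraph.
    """
    va, ea = stats[sid_a]
    vb, eb = stats[sid_b]
    if ea >= va or eb >= vb:
        raise ValueError("both subgraphs must be non-maximal")
    for b, sid in enumerate(subgraph):
        if sid == sid_b:
            subgraph[b] = sid_a  # relabel the buckets of the absorbed subgraph
    del stats[sid_b]
    stats[sid_a] = (va + vb, ea + eb + 1)  # the new item adds one edge
    v, e = stats[sid_a]
    return "maximal" if e == v else "non-maximal"
```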

ZeroEmpty (Same_non)
• The same non-maximal subgraph
• Insertion algorithm:
  1. Mark the subgraph as maximal
  2. Kick-out (with item insertion)
[Figure: Insert(m). Before insertion, m's candidate buckets lie in the same non-maximal subgraph.]
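A non-maximal subgraph is a tree, so one more item closes its single cycle and makes it maximal. One plausible reason for marking it maximal before the kick-outs (an inference, not stated on the slide) is that publishing the new state first keeps a concurrent insertion from adding a second item to the same subgraph. The sketch assumes a per-subgraph lock; `Subgraph`, `insert_same_non`, and the `kick_out_and_insert` callback are hypothetical.

```python
import threading


class Subgraph:
    """Hypothetical per-subgraph state guarded by a graph-grained lock."""

    def __init__(self):
        self.maximal = False
        self.lock = threading.Lock()


def insert_same_non(sg, kick_out_and_insert):
    """Same_non sketch: both candidate buckets lie in the same non-maximal subgraph."""
    with sg.lock:
        if sg.maximal:
            return False  # another insertion closed the cycle first; re-classify
        sg.maximal = True  # 1. mark as maximal before moving any item
    kick_out_and_insert()  # 2. kick-out with item insertion (not modeled here)
    return True
```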
