LOG-STRUCTURED MERGE-TRIE PART 1
Xingbo Wu and Yuehai Xu, Wayne State University; Zili Shao, The Hong Kong Polytechnic University; Song Jiang, Wayne State University
Presented by: Joel Friberg

LSM-Trie Overview
■ 32MB HTable KV-item organization
■ Almost no index – hash based
■ Fixed-size buckets to match disk blocks (4KB)
■ Linear and exponential levels in the trie (112 sublevels total)
■ Bloom filters with 16 bits per item (5% false positive rate achieved)
■ 1 disk read necessary for Bloom filters (BloomCluster)
■ Optimized for up to 10TB store
https://www.researchgate.net/profile/Pasi_Fraenti/publication/321323711/figure/fig8/AS:576074708525076@1514358321712/Prefix-tree-example.png
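The 5% figure can be sanity-checked with the textbook Bloom filter estimate. The sketch below is only a back-of-the-envelope check. It assumes the standard formula (1 - e^(-k/b))^k for b bits per item and k hash functions, the usual choice k ≈ b ln 2, and a worst-case lookup that probes one filter in each of the 112 sublevels. These modelling choices and the function names are assumptions, not taken from the slides.

```python
import math


def bloom_false_positive_rate(bits_per_item: float) -> float:
    """Textbook estimate for one Bloom filter with a near-optimal hash count."""
    # Assumed: k is the rounded optimum b * ln 2 (11 for b = 16).
    k = max(1, round(bits_per_item * math.log(2)))
    return (1.0 - math.exp(-k / bits_per_item)) ** k


def expected_false_positives(bits_per_item: float, filters_probed: int) -> float:
    """Expected number of false positives when one lookup probes many filters."""
    return filters_probed * bloom_false_positive_rate(bits_per_item)


per_filter = bloom_false_positive_rate(16)
per_lookup = expected_false_positives(16, 112)
print(f"per filter: {per_filter:.6f}, per lookup over 112 sublevels: {per_lookup:.3f}")
```

Under these assumptions a single filter is wrong well under 0.1% of the time, and the per-lookup total across all 112 sublevels comes out at roughly 5%.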

Question 1
“In the meantime, for some KV stores, such as SILT [24], major efforts are made to optimize reads by minimizing metadata size, while write performance can be compromised without conducting multi-level incremental compactions”
Explain how high write amplifications are produced in SILT.
■ Single SortedStore on disk for everything
■ Entries in HashStore can cover a large key range
■ The data rewritten during a merge is large relative to the new data actually being written
http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
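The last point can be put into numbers with a deliberately simplified model. It assumes that every merge rewrites the whole SortedStore together with the incoming data. The function name and the sizes are hypothetical and chosen only for illustration.

```python
def merge_write_amplification(sorted_store_bytes: int, incoming_bytes: int) -> float:
    """Bytes written to disk per byte of new data, for one full-store merge.

    Simplifying assumption: the merge rewrites the entire sorted store
    plus the incoming data, and nothing is dropped or compressed.
    """
    if incoming_bytes <= 0:
        raise ValueError("incoming_bytes must be positive")
    return (sorted_store_bytes + incoming_bytes) / incoming_bytes


GIB = 2**30
# Hypothetical sizes: a 100 GiB sorted store absorbing 1 GiB of new items.
amplification = merge_write_amplification(100 * GIB, 1 * GIB)
print(amplification)
```

In this model the amplification grows linearly with the size of the single on-disk store, which is the effect that multi-level incremental compaction is meant to avoid.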

Question 2
“Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.”
Do FAWN and LevelDB support range search?
■ FAWN is hash based – no range search
■ LevelDB stores sorted KV pairs, indices are block ranges – can range search

Question 3
Use Figure 1 to explain the difference between linear and exponential growth patterns.

Question 4
“Because 4KB block is a disk access unit, it is not necessary to maintain a larger index to determine byte offset of each item in a block.”
Show how a lookup with a given key is carried out in LevelDB.
■ Binary search the MemTable
■ Level by level, binary search for the SSTables whose key range covers the key and check their Bloom filters
■ Retrieve the value
http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf
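The lookup steps above can be sketched as a toy model. This is not LevelDB's API: the class and function names are hypothetical, a dict stands in for the MemTable, a Python set stands in for each SSTable's Bloom filter, and level 0 is treated separately on the assumption that its tables may overlap.

```python
from bisect import bisect_left, bisect_right


class SSTable:
    """Toy sorted table: sorted (key, value) pairs plus a membership filter."""

    def __init__(self, items):
        self.items = sorted(items)
        self.keys = [k for k, _ in self.items]
        self.min_key, self.max_key = self.keys[0], self.keys[-1]
        self.filter = set(self.keys)  # stand-in for a Bloom filter

    def get(self, key):
        if key not in self.filter:  # filter says "definitely absent"
            return None
        i = bisect_left(self.keys, key)  # binary search inside the table
        if i < len(self.keys) and self.keys[i] == key:
            return self.items[i][1]
        return None


def lookup(key, memtable, level0, levels):
    """Return the newest value for key, or None if it is not stored."""
    if key in memtable:  # 1. newest data lives in memory
        return memtable[key]
    for table in level0:  # 2. level 0: newest first, ranges may overlap
        if table.min_key <= key <= table.max_key:
            value = table.get(key)
            if value is not None:
                return value
    for tables in levels:  # 3. deeper levels: sorted, non-overlapping
        i = bisect_right([t.min_key for t in tables], key) - 1
        if i >= 0 and key <= tables[i].max_key:
            value = tables[i].get(key)
            if value is not None:
                return value
    return None


memtable = {"d": "new"}
level0 = [SSTable([("a", "1"), ("d", "old")])]
levels = [[SSTable([("a", "0"), ("c", "3")]), SSTable([("m", "13"), ("z", "26")])]]
print(lookup("d", memtable, level0, levels), lookup("c", memtable, level0, levels))
```

The search stops at the first hit, so a newer version in the MemTable or in a shallower level shadows older versions further down.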

Question 5
“Instead, we first apply a cryptographic hash function, such as SHA-1, on the key, and then use the hashed key, or hashkey in short, to make the determination.”
Assuming a user-provided key has 160 bits, what’s the issue if LSM-trie used the user keys, instead of hashed keys, in its data structure and operations?
■ Cryptographic hash output is uniformly distributed, so the trie stays balanced
■ User keys may be skewed, which would leave the trie unbalanced
https://appliedgo.net/balancedtree/
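A small experiment illustrates the difference. The sketch below routes 160-bit keys to a trie child using the leading bits of either the raw key or its SHA-1 hashkey. Using 3 bits per trie level and the "user..." key pattern are illustrative assumptions, and the helper names are hypothetical.

```python
import hashlib
from collections import Counter


def child_index(key_bytes: bytes, bits: int = 3) -> int:
    """Pick a child from the top `bits` bits of the key's first byte."""
    return key_bytes[0] >> (8 - bits)


def hashkey(user_key: bytes) -> bytes:
    """160-bit SHA-1 digest of the user key."""
    return hashlib.sha1(user_key).digest()


# 20-byte (160-bit) user keys that all share the prefix "user".
user_keys = [f"user{i:016d}".encode() for i in range(10_000)]

raw_children = Counter(child_index(k) for k in user_keys)
hashed_children = Counter(child_index(hashkey(k)) for k in user_keys)

print("children used with raw keys:   ", len(raw_children))
print("children used with hashed keys:", len(hashed_children))
```

With raw keys every item shares the same leading bits and lands under a single child, while the hashed keys spread across all eight children in roughly equal shares.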

Question 6
“Among all compactions moving data from Lk to Lk+1, we must make sure their key ranges are not overlapped to keep any two SSTables at Level Lk+1 from having overlapped key ranges. However, this cannot be achieved with the LevelDB data organization …”
Please explain why LevelDB cannot achieve it.
■ SSTable has limited capacity
■ Key range size of SSTable highly variable
■ SSTables cover different ranges at each sublevel
http://ranger.uta.edu/~sjiang/CSE6350-spring-19/lecture-7.pdf

Question 7
Use Figures 2 and 3 to describe the LSM-trie’s structure and how compaction is performed in the trie.

Conclusion
■ Optimized for many small items
■ High-performance reads and writes
■ Hash based, with some indices used for large items
■ No range search
■ Uses 5 exponential levels, each divided into linear sublevels (8 per level on levels 1-4, 80 on level 5), to store up to 10TB of data
