A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores
Ting Yao¹, Jiguang Wan¹, Ping Huang², Xubin He², Qingxin Gui¹, Fei Wu¹, and Changsheng Xie¹
¹ Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
² Temple University
Outline
➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Result
➢ Summary
Background
Key-value stores are widespread in modern data centers, enabling better service quality and a responsive user experience.
The log-structured merge tree (LSM-tree) is widely deployed: RocksDB, Cassandra, HBase, PNUTS, and LevelDB.
➢ High write throughput
➢ Fast read performance
[Figure: LSM-tree write path — a put (key, value) is logged, buffered in the MemTable, turned into an immutable MemTable, flushed to an SSTable on disk, and migrated down the levels L0 to Ln by compaction.]
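For readers new to the structure, a minimal toy sketch of the write path shown in the figure follows; all names (ToyLSM, MEMTABLE_LIMIT, etc.) are illustrative and do not reflect LevelDB's actual API.

```python
import bisect

MEMTABLE_LIMIT = 4  # illustrative flush threshold (entries, not bytes)

class ToyLSM:
    """Toy LSM-tree: an in-memory memtable plus sorted on-"disk" runs per level."""
    def __init__(self):
        self.log = []                          # write-ahead log for durability
        self.memtable = {}                     # in-memory buffer for recent writes
        self.levels = [[] for _ in range(4)]   # each level holds sorted (key, value) runs

    def put(self, key, value):
        self.log.append((key, value))          # 1. log the write
        self.memtable[key] = value             # 2. buffer it in the memtable
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()                      # 3. flush as an immutable sorted table

    def _flush(self):
        run = sorted(self.memtable.items())    # immutable memtable -> sorted table at L0
        self.levels[0].append(run)
        self.memtable.clear()
        self.log.clear()

    def get(self, key):
        if key in self.memtable:               # newest data first
            return self.memtable[key]
        for level in self.levels:              # then search the levels top-down
            for run in reversed(level):        # newer runs before older ones
                keys = [k for k, _ in run]
                i = bisect.bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    return run[i][1]
        return None
```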
Background · LSM-tree compaction
① Read the victim table (a–c) from L1 and the overlapping tables a, b, c from L2.
② Sort and merge the data in memory.
③ Write the merged result back to L2 as new tables.
In this example, the overall read and write data size for one compaction is 8 tables; with a level size ratio of 10, each compaction reads and writes above 10x the compacted data.
This serious I/O amplification of compactions motivates our design!
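A back-of-the-envelope check of those numbers (my own arithmetic, assuming the amplification factor AF = 10 that the talk cites later as the level size ratio):

```latex
\[
\text{I/O per compaction} \;\approx\; \underbrace{(1 + AF)}_{\text{tables read}} \;+\; \underbrace{(1 + AF)}_{\text{tables written}}
\;=\; 2(1 + AF) \;=\; 22 \ \text{tables},
\qquad
\text{write amplification} \;\approx\; 1 + AF \;\approx\; 11\times .
\]
```

With the three overlapping tables of the figure (AF = 3), the same formula gives 2(1 + 3) = 8 tables, matching the slide.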
➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Result
➢ Summary
LWC-tree
Aim
➢ Alleviate the I/O amplification
➢ Achieve high write throughput
➢ No sacrifice of read performance
How
Keep the basic components of the LSM-tree: keep tables sorted and keep the multilevel structure.
➢ Light-weight compaction – reduce I/O amplification
➢ Metadata aggregation – reduce random reads in a compaction
➢ New table structure, DTable – improve lookup efficiency within a table
➢ Workload balance – keep the LWC-tree balanced
LWC-tree · Light-weight compaction
Aim
➢ Reduce I/O amplification
How: append the data and only merge the metadata.
➢ Read the victim table (a–c) from L1.
➢ Sort and divide its data by the key ranges of the overlapped DTables; merge the metadata.
➢ Overwrite and append the resulting segments (a', b', c') to the overlapped DTables in L2.
The overall read and write data size for a light-weight compaction: 2 tables (in the LSM-tree, the overall data size for a conventional compaction: 8 tables), reducing amplification by about 10x theoretically (AF = 10).
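A minimal Python sketch of the light-weight compaction idea as described on this slide; the class and field names (DTable, segments, overlapped_meta, key_range, etc.) are my own illustrative choices, not the paper's implementation.

```python
from bisect import bisect_right

class DTable:
    """Illustrative in-memory stand-in for an on-disk DTable."""
    def __init__(self, key_range, entries=None):
        self.key_range = key_range          # (low_key, high_key) covered by this table
        self.entries = dict(entries or {})  # the table's origin (base) data
        self.segments = []                  # appended, individually sorted segments
        self.overlapped_meta = []           # aggregated metadata of overlapped DTables

def light_weight_compaction(victim, children):
    """Push `victim` (level Li) down into its overlapped DTables `children` (level Li+1).

    Only the victim is read; each child merely gets one sorted segment appended
    and its metadata (per-segment index) merged -- the children's data is never
    rewritten. `children` must be ordered by key range.
    """
    # 1. Read the victim table: its origin data plus any segments appended earlier.
    items = dict(victim.entries)
    for seg in victim.segments:
        items.update(seg)

    # 2. Sort the data and divide it by the children's key ranges.
    bounds = [child.key_range[0] for child in children]   # low key of each child
    partitions = [dict() for _ in children]
    for key in sorted(items):
        idx = max(0, bisect_right(bounds, key) - 1)
        partitions[idx][key] = items[key]

    # 3. Append one segment per child; only the metadata is merged, not the data.
    for child, part in zip(children, partitions):
        if part:
            child.segments.append(part)

    # The victim's slot in Li is now free for reuse.
    victim.entries.clear()
    victim.segments.clear()
```

The point the sketch tries to capture: the children are append-only, so one compaction reads and writes roughly one table's worth of data (the 2-table figure above) instead of rewriting every overlapped table.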
LWC-tree · Metadata aggregation
Aim
➢ Reduce random reads in a compaction.
➢ Efficiently obtain the metadata of the overlapped DTables from Li.
How
➢ Cluster the metadata of the overlapped DTables into their corresponding victim DTable in Li.
➢ Re-aggregate the metadata after each light-weight compaction.
LWC-tree · DTable
Aim
➢ Support light-weight compaction.
➢ Keep lookups efficient within a DTable.
How
➢ Store the metadata of its corresponding overlapped DTables (metadata aggregation).
➢ Manage the data and block indexes per segment.
DTable layout: origin data blocks, appended segments (Segment 1, Segment 2, ...), filter blocks, the overlapped DTables' metadata, a meta-index block, an index block for the origin data plus one index block per segment, and a footer with the magic number.
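The layout can be summarized with a small, hedged Python model; every class and field name below is an illustrative stand-in for the on-disk format sketched on the slide, not the authors' actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class BlockHandle:
    """Illustrative (offset, size) pointer to a block inside the table file."""
    offset: int
    size: int

@dataclass
class SegmentIndex:
    """Index block for one segment: first key and handle of each data block."""
    first_keys: List[str] = field(default_factory=list)
    handles: List[BlockHandle] = field(default_factory=list)

@dataclass
class OverlappedMeta:
    """Aggregated metadata of one overlapped child DTable, kept in its parent."""
    key_range: Tuple[str, str]
    index: SegmentIndex

@dataclass
class DTableLayout:
    """Rough model of the DTable format sketched on this slide."""
    origin_data: List[BlockHandle] = field(default_factory=list)             # original sorted data blocks
    segments: List[List[BlockHandle]] = field(default_factory=list)          # data blocks of each appended segment
    filter_blocks: List[BlockHandle] = field(default_factory=list)           # e.g. Bloom filters
    overlapped_metadata: List[OverlappedMeta] = field(default_factory=list)  # metadata aggregation
    origin_index: SegmentIndex = field(default_factory=SegmentIndex)         # index over the origin data
    segment_indexes: List[SegmentIndex] = field(default_factory=list)        # one index block per segment
    meta_index: Optional[BlockHandle] = None                                 # locates the filter/meta blocks
    footer_magic: int = 0xD7AB1E                                             # placeholder magic number
```

The fields that matter for the two preceding slides are `segments`/`segment_indexes` (light-weight compaction appends a segment and its index instead of rewriting the table) and `overlapped_metadata` (metadata aggregation, so a compaction can read everything it needs from the victim alone).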
LWC-tree · Workload balance
Aim
➢ Keep the LWC-tree balanced.
➢ Improve operation efficiency.
How
➢ After a light-weight compaction, deliver part of the key range of an overly-full DTable to its siblings (range adjustment).
Advantage
➢ No data movement and no extra overhead: only key-range metadata changes.
[Figure: a histogram of data volume per DTable in level Li, and an example where, after a light-weight compaction, the overly-full DTable's key range shrinks from a–c to a–b while its sibling's range grows from d to c–d; the data in Li+1 stays in place.]
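A hedged sketch of the range-adjustment idea, reusing the illustrative DTable fields from the earlier sketches (`key_range`, `overlapped_meta`) and assuming each aggregated metadata entry carries a `size` field; the policy of handing the highest-keyed child to the right sibling is one possible reading of the slide, not the paper's exact algorithm.

```python
def volume_below(parent) -> int:
    """Data volume a parent DTable is responsible for (illustrative:
    assumes each aggregated metadata entry records the child's size)."""
    return sum(meta.size for meta in parent.overlapped_meta)

def rebalance_level(parents, threshold) -> None:
    """Range adjustment after a light-weight compaction.

    `parents` are the DTables of level Li, ordered by key range. An overly-full
    parent hands the key range (and aggregated metadata) of its highest-keyed
    child over to its right sibling. Only metadata changes; no data in Li+1 moves.
    """
    for left, right in zip(parents, parents[1:]):
        while volume_below(left) > threshold and len(left.overlapped_meta) > 1:
            moved = left.overlapped_meta.pop()                 # child covering the highest keys
            right.overlapped_meta.insert(0, moved)
            boundary = moved.key_range[0]
            left.key_range = (left.key_range[0], boundary)     # shrink the left range
            right.key_range = (boundary, right.key_range[1])   # grow the right range
```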
➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Result
➢ Summary
LevelDB on SMR Drives
SMR (Shingled Magnetic Recording)
➢ Overlapped tracks
➢ Bands and guard regions
➢ Random write constraint
[Figure from FAST 2015, "Skylight – A Window on Shingled Disk Operation".]
LevelDB on SMR
➢ Multiplicative I/O amplification
[Figure: write amplification (WA) of LevelDB and multiplicative write amplification (MWA) of LevelDB on SMR versus band size. WA stays around 9.7–10.1x across band sizes of 20–60 MB, while MWA grows from 25.22x (20 MB) to 76.59x (60 MB).]
With a 40 MB band size:
➢ WA (write amplification of LevelDB): 9.83x
➢ AWA (auxiliary write amplification of SMR): 5.39x
➢ MWA (multiplicative write amplification of LevelDB on SMR): 52.85x
This auxiliary I/O amplification of SMR motivates our implementation!
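Reading the MWA metric as the product of the two amplifications (my interpretation of how the slide's numbers relate, not an equation stated on it):

```latex
\[
\mathrm{MWA} \;=\; \mathrm{WA} \times \mathrm{AWA} \;\approx\; 9.83 \times 5.39 \;\approx\; 53\times
\quad \text{(40 MB band size)},
\]
```

which agrees with the measured ≈52.85x up to rounding.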
LWC-store on SMR drives
Aim
➢ Eliminate the auxiliary I/O amplification of SMR.
➢ Improve the overall performance.
How
➢ Map each DTable to one band of the SMR drive.
➢ Append segments to the band; an appended segment overwrites the out-of-date metadata.
➢ Equal division: when a DTable overflows its band, divide it into several sub-tables in the same level.
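A minimal sketch of the DTable-to-band mapping described above; the accounting model (data grows toward the band tail, metadata is rewritten at the tail after each append) and all names are my illustrative assumptions.

```python
BAND_SIZE = 40 * 1024 * 1024   # illustrative 40 MB band, matching the evaluation setup

class Band:
    """One shingled band, written sequentially; holds exactly one DTable."""
    def __init__(self):
        self.data_bytes = 0        # origin data plus appended segments
        self.meta_bytes = 0        # metadata/index currently at the band's tail

def try_append_segment(band: Band, segment_bytes: int, new_meta_bytes: int) -> bool:
    """Append one light-weight-compaction segment to the DTable's band.

    The segment is written where the stale metadata used to sit, so the old
    metadata is simply overwritten (no band cleaning), and fresh metadata is
    re-appended at the new tail. Returns False if the DTable overflows the band.
    """
    if band.data_bytes + segment_bytes + new_meta_bytes > BAND_SIZE:
        return False               # caller should fall back to equal division
    band.data_bytes += segment_bytes
    band.meta_bytes = new_meta_bytes
    return True

def split_equally(dtable, parts: int = 2):
    """Equal division: split an overflowing DTable's keys into `parts`
    sub-tables that stay in the same level, each mapped to its own band."""
    keys = sorted(dtable.entries)                # reuses the DTable sketch from earlier
    step = max(1, -(-len(keys) // parts))        # ceiling division
    return [keys[i:i + step] for i in range(0, len(keys), step)]
```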
➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Result
➢ Summary
Configuration
Experiment platform:
➢ Test machine: 16 Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20 GHz processors
➢ SMR drive: Seagate ST5000AS0011
➢ CMR HDD: Seagate ST1000DM003
➢ SSD: Intel P3700
Competitors:
1. LevelDB on HDDs (LDB-hdd)
2. LevelDB on SMR drives (LDB-smr)
3. SMRDB*
   • An SMR-drive-optimized key-value store
   • Reduces the LSM-tree to only two levels (i.e., L0 and L1)
   • Key ranges of tables at the same level may overlap
   • Matches the SSTable size with the band size
4. LWC-store on SMR drives (LWC-smr)
Experiment · Load (100 GB data)
Random load (random put) – triggers a large number of compactions:
➢ LWC-SMR: 9.80 MB/s, 9.80x better than LDB-SMR (1.00 MB/s)
➢ 4.67x better than LDB-HDD (2.10 MB/s)
➢ 5.76x better than SMRDB (1.70 MB/s)
Sequential load (sequential put) – no compaction, so the stores show similar sequential load performance.
[Figure: random and sequential load throughput (MB/s) for LDB-SMR, LDB-HDD, SMRDB, and LWC-SMR; sequential load values include 45.50, 42.90, and 12.20 MB/s.]
Experiment · Read (100K entries)
Look up 100K KV entries against a 100 GB randomly loaded database.
[Figure: sequential read throughput (MB/s) with values of 12.60, 15.50, 20.50, and 26.90 across LDB-SMR, LDB-HDD, SMRDB, and LWC-SMR; random read latency is nearly identical for all four stores, between 28.65 ms and 30.58 ms.]
Experiment · Compaction (randomly load 40 GB data)
Compaction performance under the microscope:
➢ LevelDB: the number of compactions is large.
➢ SMRDB: the data size of each compaction is large.
➢ LWC-tree: a small number of compactions, each with a small data size.
Overall compaction time (s): LDB-SMR 49298, LDB-HDD 35640, SMRDB 19227, LWC-SMR 5128.
LWC-SMR achieves the highest compaction efficiency.
Experiment · Write amplification
Competitors: LWC-SMR vs. LDB-SMR
Metrics:
➢ WA – write amplification of the KV store
➢ AWA – auxiliary write amplification of SMR
➢ MWA – multiplicative write amplification of the KV store on SMR
[Figure: WA and AWA versus band size (20–60 MB). LDB-SMR's WA stays around 9.7–10.1x and its AWA grows from 2.59x to 7.77x; LWC-SMR's WA stays around 1.4–2.2x and its AWA around 1.0–1.2x.]
[Figure: MWA versus band size. LDB-SMR's MWA grows from 25.22x (20 MB) to 76.59x (60 MB), while LWC-SMR's MWA stays between 1.63x and 2.28x (e.g., 38.12x lower).]
Experiment · LWC-store on HDD and SSD
➢ Background
➢ LWC-tree (Light-Weight Compaction tree)
➢ LWC-store on SMR drives
➢ Experiment Result
➢ Summary
Summary
LWC-tree: a variant of the LSM-tree
➢ Light-weight compaction – significantly reduces the I/O amplification of compaction.
LWC-store on SMR drives
➢ Data management inside the SMR drive – eliminates the auxiliary I/O amplification of the SMR drive.
Experiment results
➢ High compaction efficiency and high write efficiency.
➢ Fast read performance, the same as the LSM-tree.
➢ Wide applicability.
Thank you! QUESTIONS? Email: tingyao@hust.edu.cn