

  1. A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores
  Ting Yao¹, Jiguang Wan¹, Ping Huang², Xubin He², Qingxin Gui¹, Fei Wu¹, and Changsheng Xie¹
  ¹ Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
  ² Temple University

  2. Outline ➢ Background ➢ LWC-tree (Light-Weight Compaction tree) ➢ LWC-store on SMR drives ➢ Experimental Results ➢ Summary

  3. Background
   Key-value stores are widespread in modern data centers.
  • Better service quality
  • Responsive user experience
   The log-structured merge tree (LSM-tree) is widely deployed.
  • RocksDB, Cassandra, HBase, PNUTS, and LevelDB
  • High write throughput
  • Fast read performance
  [Figure: LSM-tree architecture. A write (key, value) goes to the MemTable in memory, becomes an Immutable MemTable, is flushed to an SSTable in the on-disk log (L0), and is pushed down through levels L1, L2, …, Ln by compactions.]

  4. Background · LSM-tree
   A conventional compaction: ① read the victim table in Li and the overlapped tables (a, b, c) in Li+1; ② sort and merge them in memory; ③ write the merged tables (a-c) back to Li+1.
   The overall read and write data size for one compaction: 8 tables.
   Write amplification: above 10x.
   This serious I/O amplification of compactions motivates our design!
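To make these numbers concrete, here is a minimal sketch in Python of the I/O bill for one merge-style compaction; the table counts, sizes, and function name are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the I/O cost of one conventional LSM-tree compaction.
# The counts below are illustrative assumptions: one victim table in L_i
# typically overlaps a few tables in L_(i+1).

def conventional_compaction_io(victim_tables=1, overlapped_tables=3, table_size_mb=2):
    """Return (read_mb, written_mb) for a merge-style compaction.

    Every involved table is read, merged in memory, and rewritten in full,
    so the total I/O is roughly twice the data touched.
    """
    read_mb = (victim_tables + overlapped_tables) * table_size_mb
    written_mb = read_mb  # the merged output is rewritten entirely to L_(i+1)
    return read_mb, written_mb

if __name__ == "__main__":
    read_mb, written_mb = conventional_compaction_io()
    tables_touched = (read_mb + written_mb) // 2  # 2 MB per table in this example
    print(f"read {read_mb} MB, wrote {written_mb} MB ({tables_touched} table-sized I/Os)")
```

Pushing a single table down one level already costs several tables of I/O (4 read + 4 written = 8 here), and repeating this across all levels is what drives the overall write amplification above 10x.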

  5. ➢ Background ➢ LWC-tree (Light-Weight Compaction tree) ➢ LWC-store on SMR drives ➢ Experimental Results ➢ Summary

  6. LWC-tree
   Aim
  • Alleviate the I/O amplification
  • Achieve high write throughput
  • No sacrifice of read performance
   How: keep the basic components of the LSM-tree (keep tables sorted, keep the multilevel structure), plus four techniques:
  ➢ Light-weight compaction – reduce I/O amplification
  ➢ Metadata aggregation – reduce random reads in a compaction
  ➢ New table structure, DTable – improve lookup efficiency within a table
  ➢ Workload balance – keep the LWC-tree balanced

  7. LWC-tree · Light-weight compaction
   Aim
  • Reduce I/O amplification
   How: append the data and only merge the metadata
  ➢ Read the victim table
  ➢ Sort and divide the data, merge the metadata
  ➢ Overwrite and append the segments to the overlapped DTables
   Reduces amplification by about 10x theoretically (AF = 10)
  [Figure: the overall read and write data size for a light-weight compaction is 2 tables; in the LSM-tree, the overall data size for a conventional compaction is 8 tables.]
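Below is a minimal sketch of the light-weight compaction idea; the DTable class, its fields, and the helper names are hypothetical simplifications for illustration, not the paper's implementation.

```python
import bisect
from collections import defaultdict

# Hypothetical, simplified DTable used only for illustration; the real on-disk
# DTable in the paper is richer (filter blocks, block indexes, footer, etc.).
class DTable:
    def __init__(self, low, high):
        self.low, self.high = low, high   # key range [low, high] of this DTable
        self.origin = []                  # original sorted (key, value) data
        self.segments = []                # sorted runs appended by compactions
        self.segment_index = []           # metadata: one index entry per segment

    def items(self):
        return sorted(self.origin + [kv for s in self.segments for kv in s])

    def append_segment(self, run):
        # Only the metadata is merged; existing data blocks are untouched.
        self.segments.append(run)
        self.segment_index.append((run[0][0], run[-1][0], len(run)))

def light_weight_compaction(victim, children):
    """Sketch of a light-weight compaction: read only the victim, divide its
    data by the children's key ranges, and append one segment per child."""
    runs = defaultdict(list)
    uppers = [c.high for c in children]               # children sorted by key range
    for key, value in victim.items():                 # 1) read the victim
        runs[bisect.bisect_left(uppers, key)].append((key, value))  # 2) divide
    for i, run in runs.items():
        children[i].append_segment(run)               # 3) append + merge metadata
    victim.origin, victim.segments, victim.segment_index = [], [], []

# Example: push one victim DTable down to two overlapped children.
victim = DTable(0, 19); victim.origin = [(3, "a"), (8, "b"), (15, "c")]
children = [DTable(0, 9), DTable(10, 19)]
light_weight_compaction(victim, children)
print([c.segment_index for c in children])  # [[(3, 8, 2)], [(15, 15, 1)]]
```

Because the children's existing data blocks are never rewritten and only their segment metadata is merged, roughly one table is read and one table's worth of data is appended, matching the 2-table cost on the slide.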

  8. LWC-tree · Metadata Aggregation
   Aim
  • Reduce random reads in a compaction
  • Efficiently obtain the metadata of the overlapped DTables
   How
  • Cluster the metadata of the overlapped DTables into their corresponding victim DTable after each compaction
  [Figure: a light-weight compaction pushing the victim DTable (a-c) in Li down to the overlapped DTables a, b, c in Li+1.]

  9. LWC-tree · Metadata Aggregation
   Aim
  • Reduce random reads in a compaction
  • Efficiently obtain the metadata of the overlapped DTables
   How
  • Cluster the metadata of the overlapped DTables into their corresponding victim DTable after each compaction
  [Figure: metadata aggregation after the light-weight compaction: the updated metadata of the overlapped DTables a', b', c' is clustered into their corresponding victim DTable in Li.]

  10. LWC-Tree · DTable
   Aim
  • Support light-weight compaction
  • Keep the lookup efficiency within a DTable
   How
  • Store the metadata of its corresponding overlapped DTables
  • Manage the data and the block indexes per segment
  [Figure: DTable layout: origin data blocks (data_block i, data_block i+1, …), appended segments (segment 1, segment 2, …), filter blocks, the meta-index block, the metadata of the overlapped DTables, the index blocks (the origin index plus one index block per segment), and a footer with a magic number.]
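To show why lookups stay efficient after several appends, here is a minimal sketch of the lookup path through a DTable's indexes; the structures and names are hypothetical, not the on-disk format, which also includes filter blocks, the aggregated metadata of overlapped DTables, and a footer.

```python
# Hypothetical in-memory stand-in for a DTable's indexes, for illustration only.
class DTableIndexes:
    def __init__(self, origin_index):
        self.origin_index = origin_index   # key -> block offset in the origin data
        self.segment_indexes = []          # one small index per appended segment

    def add_segment_index(self, seg_index):
        self.segment_indexes.append(seg_index)

    def lookup(self, key):
        # Newest segment first: later appends supersede older values.
        for seg_index in reversed(self.segment_indexes):
            if key in seg_index:
                return ("segment", seg_index[key])
        if key in self.origin_index:
            return ("origin", self.origin_index[key])
        return None

# Example: one origin index plus two appended segments.
dt = DTableIndexes({"a": 0, "b": 1, "c": 2})
dt.add_segment_index({"b": 10})    # segment 1 updates key "b"
dt.add_segment_index({"d": 20})    # segment 2 adds key "d"
assert dt.lookup("b") == ("segment", 10)
assert dt.lookup("a") == ("origin", 0)
```

Each segment carries its own small index, so a point lookup probes a handful of per-segment indexes plus the origin index instead of scanning appended data.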

  11. LWC-Tree · Workload Balance
   Aim
  • Keep the LWC-tree balanced
  • Improve operation efficiency
   How
  • Deliver part of the key range of the overly-full DTable to its siblings (range adjustment after a light-weight compaction)
   Advantage
  • No data movement and no extra overhead
  [Figures: data volume per data block across the DTables (1, 2, …, n) in level Li before and after the adjustment; a range-adjustment example in which the key ranges a-c and d in Li become a-b and c-d after a light-weight compaction, while the DTables a, b, c, d in Li+1 stay in place.]
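A minimal sketch of the range adjustment, assuming key ranges live purely in level metadata; the helper and its naive midpoint split are illustrative, not the paper's policy.

```python
# Adjusting the ranges is a pure metadata operation: the child DTables in
# L_(i+1) stay where they are; only the parent key ranges in L_i change.
def rebalance(ranges, volumes, threshold):
    """ranges:  list of (low, high) per DTable in level L_i, sorted by key.
    volumes: data volume (e.g., MB) currently attributed to each DTable.
    Moves the upper part of an overly-full DTable's range to its right sibling."""
    new_ranges = list(ranges)
    for i, vol in enumerate(volumes[:-1]):
        if vol > threshold:
            low, high = new_ranges[i]
            mid = (low + high) // 2                      # naive split point
            new_ranges[i] = (low, mid)                   # shrink the full table
            sib_low, sib_high = new_ranges[i + 1]
            new_ranges[i + 1] = (mid + 1, sib_high)      # sibling takes the rest
    return new_ranges

# Example: the first DTable (keys 0-99) is over the threshold.
print(rebalance([(0, 99), (100, 199)], [300, 40], threshold=200))
# -> [(0, 49), (50, 199)]
```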

  12. ➢ Background ➢ LWC-tree (Light-Weight Compaction tree) ➢ LWC-store on SMR drives ➢ Experimental Results ➢ Summary

  13. LevelDB on SMR Drives
   SMR (Shingled Magnetic Recording)
  • Overlapped tracks
  • Bands & guard regions
  • Random write constraint
   LevelDB on SMR
  • Multiplicative I/O amplification
  (Figure from FAST 2015, "Skylight – A Window on Shingled Disk Operation")

  14. LevelDB on SMR Drives
   SMR (Shingled Magnetic Recording): overlapped tracks, bands & guard regions, random write constraint
   LevelDB on SMR: multiplicative I/O amplification
  [Chart: amplification ratio vs. band size (MB). WA stays roughly flat at 9.7-10.1 across band sizes; MWA grows with band size: 25.22 at 20 MB, 39.89 at 30 MB, 52.85 at 40 MB, 62.14 at 50 MB, 76.59 at 60 MB.]
  At a 40 MB band size: WA (write amplification of LevelDB) 9.83x, AWA (auxiliary write amplification of SMR) 5.39x, MWA (multiplicative write amplification of LevelDB on SMR) about 52.85x.
   This auxiliary I/O amplification of SMR motivates our implementation!
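As the reported numbers suggest, the multiplicative amplification is simply the product of the key-value store's own amplification and the auxiliary amplification added by the SMR drive; a small worked relation under that assumption:

```latex
% Multiplicative write amplification of a KV store running on an SMR drive,
% assuming it is the product of the store's own write amplification and the
% auxiliary amplification added by the drive (consistent with the numbers above).
\[
  \mathrm{MWA} = \mathrm{WA} \times \mathrm{AWA},
  \qquad
  \mathrm{MWA}_{40\,\mathrm{MB}} \approx 9.83 \times 5.39 \approx 53 .
\]
```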

  15. LWC-store on SMR drives
   Aim
  • Eliminate the auxiliary I/O amplification of SMR
  • Improve the overall performance
   How
  • A DTable is mapped to a band in the SMR drive
  • Segments are appended to the band, overwriting the out-of-date metadata
  • Equal division: a DTable that overflows a band is divided into several sub-tables in the same level
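Here is a minimal sketch of this placement policy, assuming a fixed band size and hypothetical band/DTable helpers (not the actual LWC-store code): each DTable owns one band, segments are appended inside that band, and a DTable that would overflow its band is split into equal sub-tables.

```python
BAND_SIZE_MB = 40  # illustrative band size; real SMR band sizes vary by drive

# Hypothetical DTable-to-band mapping. Each DTable occupies exactly one band,
# so compaction segments become sequential appends inside that band and never
# trigger the drive's read-modify-write of neighbouring bands.
class BandDTable:
    def __init__(self, band_id, used_mb=0):
        self.band_id = band_id
        self.used_mb = used_mb

    def can_append(self, segment_mb):
        return self.used_mb + segment_mb <= BAND_SIZE_MB

    def append(self, segment_mb):
        self.used_mb += segment_mb

def place_segment(dtable, segment_mb, allocate_band):
    """Append a segment; if the DTable would overflow its band, split it into
    equal sub-tables in the same level (the 'equal division' on the slide)."""
    if dtable.can_append(segment_mb):
        dtable.append(segment_mb)
        return [dtable]
    total = dtable.used_mb + segment_mb
    parts = -(-total // BAND_SIZE_MB)            # ceiling division: bands needed
    share = total / parts                        # equal share of data per sub-table
    return [BandDTable(allocate_band(), share) for _ in range(parts)]

# Example usage with a trivial band allocator.
next_band = iter(range(100))
subs = place_segment(BandDTable(0, used_mb=35), 10, lambda: next(next_band))
print([(d.band_id, d.used_mb) for d in subs])   # two sub-tables of 22.5 MB each
```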

  16. ➢ Background ➢ LWC-tree (Light-Weight Compaction tree) ➢ LWC-store on SMR drives ➢ Experimental Results ➢ Summary

  17. Configuration
   Experiment parameters
  • Test machine: 16 Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz processors
  • SMR drive: Seagate ST5000AS0011
  • CMR HDD: Seagate ST1000DM003
  • SSD: Intel P3700
   Compared key-value stores
  1. LevelDB on HDDs (LDB-hdd)
  2. LevelDB on SMR drives (LDB-smr)
  3. SMRDB*
  • An SMR-drive-optimized key-value store
  • Reduces the LSM-tree levels to only two (i.e., L0 and L1)
  • The key ranges of the tables at the same level overlap
  • Matches the SSTable size with the band size
  4. LWC-store on SMR drives (LWC-smr)

  18. Experiment · Load (100 GB of data)
   Random load (random put)
  • Large number of compactions
  • LWC-store is 9.80x better than LDB-SMR, 4.67x better than LDB-HDD, and 5.76x better than SMRDB
  • Random-load throughput (MB/s): LDB-SMR 1.00, LDB-HDD 2.10, SMRDB 1.70, LWC-SMR 9.80
   Sequential load (sequential put)
  • No compaction
  • Similar sequential-load performance
  [Chart: sequential-load throughput (MB/s); the reported values are 12.20, 42.90, 45.00, and 45.50 across LDB-SMR, LDB-HDD, SMRDB, and LWC-SMR.]

  19. Experiment · Read (100K entries)
   Look up 100K KV entries against a 100 GB randomly loaded database
  [Charts: sequential-get throughput (MB/s), with reported values of 12.60, 15.50, 20.50, and 26.90 across LDB-SMR, LDB-HDD, SMRDB, and LWC-SMR; random-get latency (ms), with reported values of 28.65, 28.88, 28.91, and 30.58 across the same four stores.]

  20. Experiment · Compaction (randomly load 40 GB of data)
   Compaction performance under the microscope
  • LevelDB: the number of compactions is large
  • SMRDB: the data size of each compaction is large
  • LWC-tree: a small number of compactions and a small data size per compaction
   Overall compaction time
  • LWC-smr achieves the highest efficiency

  21. Experiment · Compaction (randomly load 40 GB of data)
   Compaction performance under the microscope
  • LevelDB: the number of compactions is large
  • SMRDB: the data size of each compaction is large
  • LWC-tree: a small number of compactions and a small data size per compaction
   Overall compaction time
  • LWC-smr achieves the highest efficiency
  [Chart: overall compaction time (s); the reported values are 49298, 35640, 19227, and 5128 across LDB-SMR, LDB-HDD, SMRDB, and LWC-SMR, with LWC-SMR the lowest.]

  22. Experiment · Write amplification
   Competitors: LWC-SMR and LDB-SMR
   Metrics
  • Write amplification (WA): the write amplification of the KV store
  • Auxiliary write amplification (AWA): the write amplification added by the SMR drive
  • Multiplicative write amplification (MWA): the write amplification of the KV store on SMR
  [Chart: WA and AWA vs. band size (20-60 MB). LDB-SMR.WA stays around 9.7-10.1 and LDB-SMR.AWA grows from 2.59 at 20 MB to 7.77 at 60 MB, while LWC-SMR.WA and LWC-SMR.AWA stay between about 1.0 and 2.3.]
  [Chart: MWA vs. band size. LDB-SMR.MWA grows from 25.22 at 20 MB to 76.59 at 60 MB, while LWC-SMR.MWA stays between about 1.63 and 2.28 (a 38.12x reduction is highlighted).]

  23. Experiment · LWC-store on HDD and SSD

  24. ➢ Background ➢ LWC-tree (Light-Weight Compaction tree) ➢ LWC-store on SMR drives ➢ Experimental Results ➢ Summary

  25. Summary
   LWC-tree: a variant of the LSM-tree
  • Light-weight compaction – significantly reduces the I/O amplification of compaction
   LWC-store on SMR drives
  • Data management on the SMR drive – eliminates the auxiliary I/O amplification from the SMR drive
   Experimental results
  • High compaction efficiency
  • High write efficiency
  • Read performance as fast as the LSM-tree
  • Wide applicability

  26. Thank you! QUESTIONS? Email: tingyao@hust.edu.cn
