Log-structured Merge Tree (LSM)
Big Data Indexing
We covered the two-layered global/local indexing scheme, which is ideal for static data.
Question: How do we update these indexes?
HDFS limitation: Random updates are not allowed.
Naïve approach: Rebuild the index after each (batch) insert.
A better approach: Log-structured Merge Tree.
DBMS Indexing
[Figure labels: New record, Index, Log]
Index Update
[Figure labels: New record, Randomly updated disk page(s), Append a disk page]
LSM Tree
Key idea: Use the log as the index.
Regularly: Merge the logs to consolidate the index (i.e., remove redundant entries).
[Figure labels: New records, Flush, Log, Merge, Bigger log]
Reference: O'Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.
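The merge step can be sketched as follows. This is a minimal illustration, not the algorithm from the O'Neil et al. paper: it assumes each log is an in-memory list of (key, value) pairs sorted by key with unique keys, and that logs are passed oldest first. The name `merge_logs` is hypothetical.

```python
import heapq


def merge_logs(logs):
    """Merge sorted logs into one bigger sorted log.

    Assumption: `logs` is ordered oldest to newest, and each log is a list
    of (key, value) pairs sorted by key with no duplicate keys.
    """
    # Tag every entry with the negated position of its log so that, for
    # equal keys, the entry from the newest log sorts first.
    tagged = (
        [(key, -age, value) for key, value in log]
        for age, log in enumerate(logs)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # older, redundant entry for a key already emitted
        merged.append((key, value))
    return merged


old_log = [(1, "a"), (3, "c")]
new_log = [(1, "A"), (2, "b")]
bigger_log = merge_logs([old_log, new_log])
```

Because every input is already sorted, the merge reads each log once from start to end and writes the result in order, which is what makes it compatible with append-only storage.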
LSM in Big Data
First major application: BigTable (Google).
[Figure: chart of citations (axis label: Citations), annotated with "BigTable paper" and "First report from Google mentioning LSM"]
LSM in Big Data
Buffer data in memory (memory component).
Flush records to disk into an LSM as a disk component (sequential write).
Disk components are sorted by key.
Compact (merge) disk components in the background (sequential read/write).
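The four steps above can be sketched in a few lines. This is a toy model under stated assumptions, not how BigTable or any real system is implemented: the class name `TinyLSM` and the `memory_limit` parameter are hypothetical, "disk" components are plain Python lists, and compaction runs on demand rather than in the background.

```python
import bisect


class TinyLSM:
    """Toy LSM index: one memory component plus sorted disk components."""

    def __init__(self, memory_limit=2):
        self.memory_limit = memory_limit  # hypothetical flush threshold
        self.memory = {}                  # memory component (buffer)
        self.disk = []                    # disk components, oldest first

    def put(self, key, value):
        # Buffer the record in memory; flush once the buffer is full.
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one new component, sorted by key.
        if self.memory:
            self.disk.append(sorted(self.memory.items()))
            self.memory = {}

    def get(self, key):
        # Newest data wins: check memory first, then newest component first.
        if key in self.memory:
            return self.memory[key]
        for component in reversed(self.disk):
            i = bisect.bisect_left(component, (key,))
            if i < len(component) and component[i][0] == key:
                return component[i][1]
        return None

    def compact(self):
        # Merge all disk components into one; newer values overwrite older.
        # Simplification: a real system would stream a sequential merge
        # instead of collecting everything in a dictionary.
        latest = {}
        for component in self.disk:
            latest.update(component)
        self.disk = [sorted(latest.items())] if latest else []


db = TinyLSM(memory_limit=2)
db.put("b", 1)
db.put("a", 2)   # buffer full: flushed as the first disk component
db.put("a", 3)
db.put("c", 4)   # buffer full: flushed as the second disk component
value_before = db.get("a")
db.compact()
value_after = db.get("a")
```

Existing components are never modified in place: a flush appends a new component and compaction replaces old components with a newly written one, which fits storage such as HDFS where random updates are not allowed.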