
Log-structured Merge Tree (LSM)



  1. Log-structured Merge Tree (LSM)

  2. Big Data Indexing. We covered the two-layered global/local indexing scheme, which is ideal for static data. Question: how do we update these indexes? HDFS limitation: random updates are not allowed. Naïve approach: rebuild the index after each (batch) insert. A better approach: the Log-structured Merge Tree.

  3. DBMS Indexing. (Diagram labels: New record; Index; Log.)

  4. Index Update. (Diagram labels: New record; Randomly updated disk page(s); Append a disk page.)

  5. LSM Tree. Key idea: use the log as the index. Regularly merge the logs to consolidate the index (i.e., remove redundant entries). (Diagram labels: New records; Flush; Log; Merge; Bigger log.) Reference: O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.

  6. LSM in Big Data. First major application: BigTable (Google). (Chart: citation counts, y-axis 0 to 120; annotations: "BigTable paper", "First report from Google mentioning LSM".)

  7. LSM in Big Data. Buffer data in memory (memory component). Flush records to disk into an LSM as a disk component (sequential write). Disk components are sorted by key. Compact (merge) disk components in the background (sequential read/write).
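
The steps on slides 5 and 7 (buffer in memory, flush as a sorted disk component, merge components to remove redundant entries) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the design from the slides or from BigTable: the class name `TinyLSM`, the `memory_limit` parameter, and the method names are hypothetical, "disk components" are plain in-memory lists standing in for immutable sorted files, and compaction is called explicitly rather than running in the background.

```python
import bisect
import heapq
from typing import Dict, List, Optional, Tuple

Component = List[Tuple[str, str]]  # one "disk component": (key, value) pairs sorted by key


class TinyLSM:
    """Toy LSM index (hypothetical names; lists stand in for on-disk files)."""

    def __init__(self, memory_limit: int = 2) -> None:
        self.memory_limit = memory_limit            # assumed flush threshold
        self.memtable: Dict[str, str] = {}          # memory component
        self.disk_components: List[Component] = []  # newest component first

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory; nothing on "disk" is updated in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memory_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memory component out as one sorted, immutable component.
        if not self.memtable:
            return
        self.disk_components.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Check newest data first: memory, then components from new to old.
        if key in self.memtable:
            return self.memtable[key]
        for component in self.disk_components:
            # (key,) sorts before any (key, value), so this finds the key's slot.
            i = bisect.bisect_left(component, (key,))
            if i < len(component) and component[i][0] == key:
                return component[i][1]
        return None

    def compact(self) -> None:
        # Merge all sorted components into one, keeping the newest value per key.
        streams = [
            [(k, age, v) for k, v in component]     # age 0 is the newest component
            for age, component in enumerate(self.disk_components)
        ]
        merged: Component = []
        for k, _age, v in heapq.merge(*streams):    # ordered by key, then age
            if not merged or merged[-1][0] != k:    # first hit per key is newest
                merged.append((k, v))
        self.disk_components = [merged] if merged else []


lsm = TinyLSM(memory_limit=2)
for k, v in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("d", "5")]:
    lsm.put(k, v)

print(len(lsm.disk_components))  # 2 components; "d" is still in memory
print(lsm.get("a"))              # 3 (the newer component shadows the older one)
lsm.compact()
print(lsm.disk_components)       # [[('a', '3'), ('b', '2'), ('c', '4')]]
```

Every write in this sketch is either an in-memory update or the creation of a whole new sorted component, which is why the approach fits a file system that disallows random updates; the cost is that a lookup may have to consult several components until compaction consolidates them.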
