NVRAMOS '14, 10.30.2014
Resolving Journaling of Journal Anomaly via Weaving Recovery Information into DB Page
Beomseok Nam, UNIST
Outline
• Motivation
  • Journaling of Journal Anomaly
• How to resolve Journaling of Journal anomaly
  • Multi-Version B-Tree (MVBT)
• Optimizations of Multi-Version B-Tree
  • Lazy Split
  • Reserved Buffer Space
  • Lazy Garbage Collection
  • Metadata Embedding
  • Disabling Sibling Redistribution
• Evaluation
• Conclusion
Storage I/O Problems in Android
• Performance Bottleneck
• Lifetime of Storage
• Cause = Excessive I/O
Android I/O Stack
• Apps: Insert/Update/Delete/Select
• SQLite: transaction journaling
• EXT4: write(), journaling
• Block Device Driver: read/write
• Misaligned interaction between the SQLite and EXT4 layers
Journaling of Journal Anomaly
Journaling in SQLite (TRUNCATE mode): insert a DB entry
• Open the rollback journal.
• Record the data to the journal.
• Put a commit mark in the journal.
• Insert the entry into the DB.
• Truncate the journal.
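The journaling mode named on this slide can be selected from application code. The snippet below is a minimal sketch using Python's standard sqlite3 module; the file name demo.db, the table t, and the 100-byte payload are assumptions made for illustration and do not come from the slides.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database file (not from the slides).
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "demo.db")

conn = sqlite3.connect(db_path)

# Select the rollback-journal variant discussed on the slide.
# The pragma returns the journal mode that is now in effect.
mode = conn.execute("PRAGMA journal_mode=TRUNCATE").fetchone()[0]

conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")

# One small insert: SQLite journals the affected page before changing it,
# and in TRUNCATE mode the journal is truncated at commit instead of deleted.
conn.execute("INSERT INTO t VALUES (?, ?)", (1, "x" * 100))
conn.commit()

rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()

print(mode, rows)
```

Each commit in this mode involves both the journal file and the database file, which is the behavior the following slides trace down into EXT4.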
EXT4 Journaling (ordered mode)
For each write() from SQLite, EXT4 performs:
• Write data block
• Write EXT4 journal
• Write journal metadata
• Write journal commit
Journaling of Journal Anomaly
• SQLite: one insert of 100 bytes
• EXT4: write data block, write EXT4 journal, write journal metadata, write journal commit
• Result: one insert of 100 bytes → 9 random writes of 4 KB
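The figures on this slide imply a large write amplification. The arithmetic can be sketched as follows, assuming "4KByte" means 4096 bytes; the function name write_amplification is hypothetical.

```python
PAGE_SIZE = 4 * 1024  # assumption: "4KByte" on the slide means 4096 bytes


def write_amplification(payload_bytes, page_writes, page_size=PAGE_SIZE):
    """Bytes written to storage divided by bytes the application inserted."""
    return page_writes * page_size / payload_bytes


# Numbers from the slide: one 100-byte insert causes 9 random 4 KB writes.
amplification = write_amplification(payload_bytes=100, page_writes=9)
print(f"{amplification:.1f}x")
```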
How to Resolve Journaling of Journal?
• SQLite: inserting a DB entry involves both database insertion and database journaling
• EXT4: journaling of journal anomaly
How to Resolve Journaling of Journal?
[Figure: fsync() calls on the journal and the DB]
Journal + DB = Version-based B-Tree (MVBT)
Version-based B-Tree (MVBT)
Versioning
• Time T1: insert 10 → 10 [T1, ∞)
• Time T2: update 10 with 20 → 10 [T1, T2) (dead entry), 20 [T2, ∞)
• Do not overwrite old version data
• Do not need a rollback journal
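The versioning idea on this slide can be illustrated with a single record rather than a full tree. This is a minimal sketch, not the authors' implementation; the class names Entry and VersionedRecord and the rollback method are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass

INF = math.inf


@dataclass
class Entry:
    value: int
    start: float      # version at which the entry becomes visible
    end: float = INF  # version at which the entry dies (exclusive)

    def live_at(self, version):
        return self.start <= version < self.end


class VersionedRecord:
    """One logical record kept as a list of versioned entries (hypothetical)."""

    def __init__(self):
        self.entries = []

    def write(self, value, version):
        # Never overwrite: close the currently live entry, then append.
        for e in self.entries:
            if e.end == INF:
                e.end = version
        self.entries.append(Entry(value, version))

    def read(self, version):
        for e in self.entries:
            if e.live_at(version):
                return e.value
        return None

    def rollback(self, version):
        # Undo every write made at or after `version` without a journal:
        # drop the new entries and reopen the ones they closed.
        self.entries = [e for e in self.entries if e.start < version]
        for e in self.entries:
            if e.end != INF and e.end >= version:
                e.end = INF


T1, T2 = 1, 2
rec = VersionedRecord()
rec.write(10, T1)   # 10 [T1, inf)
rec.write(20, T2)   # 10 [T1, T2) becomes dead, 20 [T2, inf)
old_value = rec.read(T1)
new_value = rec.read(T2)
rec.rollback(T2)    # as if the transaction at T2 had failed
after_rollback = rec.read(T2)
print(old_value, new_value, after_rollback)
```

Because the old entry is still present after the update, undoing the update only needs the version numbers, which is why the slide says no rollback journal is needed.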
Node Split in Multi-Version B-Tree
Insert key 25, ver=5
• P1 (full): 5 [3~∞), 10 [2~∞), 12 [4~∞), 40 [1~∞)
Node Split in Multi-Version B-Tree
Insert key 25, ver=5
• P1 (dead node): 5 [3~5), 10 [2~5), 12 [4~5), 40 [1~5)
Node Split in Multi-Version B-Tree
Insert key 25, ver=5
• P1: 5 [3~5), 10 [2~5), 12 [4~5), 40 [1~5)
Node Split in Multi-Version B-Tree
Insert key 25, ver=5
• P3: 5 [5~∞), 10 [5~∞)
• P2: 12 [5~∞), 40 [5~∞)
• P1: 5 [3~5), 10 [2~5), 12 [4~5), 40 [1~5)
Node Split in Multi-Version B-Tree
Insert key 25, ver=5
• P4 (parent): 10 [5~∞) : P3, ∞ [5~∞) : P2, ∞ [0~5) : P1
• P3: 5 [5~∞), 10 [5~∞)
• P1: 5 [3~5), 10 [2~5), 12 [4~5), 40 [1~5)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
Node Split in Multi-Version B-Tree
One more dirty page than the original B-Tree
• P4 (parent, dirty): 10 [5~∞) : P3, ∞ [5~∞) : P2, ∞ [0~5) : P1
• P3 (dirty): 5 [5~∞), 10 [5~∞)
• P1 (dirty): 5 [3~5), 10 [2~5), 12 [4~5), 40 [1~5)
• P2 (dirty): 12 [5~∞), 25 [5~∞), 40 [5~∞)
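The split walked through on the preceding slides can be reproduced in a few lines. This is a sketch that mirrors only the slide example (one full leaf, one new key); the function name legacy_split, the tuple layout (key, start, end), and the node labels are assumptions, and a complete MVBT has further rules that are not modeled here.

```python
import math

INF = math.inf


def legacy_split(entries, new_key, ver):
    """Split a full leaf the way the slides show for MVBT.

    entries: list of (key, start, end) tuples in the overflowing node.
    Returns (dead_node, left, right, parent_entries, dirty_pages).
    """
    live = sorted(e for e in entries if e[2] == INF)

    # 1. Every live entry in the old node is closed at `ver`: the node dies.
    dead_node = [(k, s, ver) if e == INF else (k, s, e) for (k, s, e) in entries]

    # 2. Live entries are copied with a fresh interval and split by key.
    copies = [(k, ver, INF) for (k, _, _) in live]
    mid = len(copies) // 2
    left, right = copies[:mid], copies[mid:]
    separator = left[-1][0]

    # 3. The new key goes into whichever new node covers it.
    target = left if new_key <= separator else right
    target.append((new_key, ver, INF))
    target.sort()

    # 4. The parent gets entries for both new nodes and the dead node.
    #    The start version 0 for the dead node is taken from the slide.
    parent_entries = [
        (separator, ver, INF, "left"),
        (INF, ver, INF, "right"),
        (INF, 0, ver, "old"),
    ]
    dirty_pages = {"old", "left", "right", "parent"}
    return dead_node, left, right, parent_entries, dirty_pages


P1 = [(5, 3, INF), (10, 2, INF), (12, 4, INF), (40, 1, INF)]
dead, left, right, parent, dirty = legacy_split(P1, new_key=25, ver=5)
print(len(dirty), "dirty pages")
```

The old node, both new nodes, and the parent all change, which is where the four dirty pages on the slide come from.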
I/O Traffic in MVBT
[Figure: pages in the DB buffer cache flushed by fsync() under MVBT]
Reduce the number of dirty pages!
I/O Traffic in MVBT
[Figure: pages in the DB buffer cache flushed by fsync() under MVBT versus LS-MVBT]
Optimizations in Android I/O
[Figure: read latency t_read versus write latency t_write, marked ">>"; multiple-insert versus single-insert transactions; write transactions 1, 2, 3]
Lazy Split Multi-Version B-Tree (LS-MVBT)
Optimizations in LS-MVBT
• Lazy Split
• Reserved Buffer Space
• Disabling Sibling Redistribution
• Lazy Garbage Collection
• Metadata Embedding
Optimization1: Lazy Split
Legacy split in MVBT: 4 dirty pages
• P4 (parent, dirty): 10 [5~∞) : P3, ∞ [5~∞) : P2, ∞ [0~5) : P1
• P3 (dirty): 5 [5~∞), 10 [5~∞)
• P1 (dirty): 5 [3~5), 10 [2~5), 12 [4~5), 40 [1~5)
• P2 (dirty): 12 [5~∞), 25 [5~∞), 40 [5~∞)
Optimization1: Lazy Split
Insert key 25, ver=5
• P1 (full): 5 [3~∞), 10 [2~∞), 12 [4~∞), 40 [1~∞)
Optimization1: Lazy Split
Lazy node: half-dead, half-live
Insert key 25, ver=5
• P1 (lazy node): 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5)
Optimization1: Lazy Split
Insert key 25, ver=5
• P1: 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5)
Optimization1: Lazy Split
Insert key 25, ver=5
• P1: 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5)
• P2: 12 [5~∞), 40 [5~∞)
Optimization1: Lazy Split
Insert key 25, ver=5
• P3 (parent): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1: 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
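The lazy split shown on these slides can be sketched in the same style as the node contents above. The function name lazy_split, the tuple layout (key, start, end), and the page labels are assumptions for illustration; only the slide example is reproduced.

```python
import math

INF = math.inf


def lazy_split(entries, new_key, ver):
    """Lazy split of a full leaf, following the slide example.

    The lower half of the live entries stays live in the old node;
    the upper half is marked dead in place and copied to one new node.
    Returns (lazy_node, new_node, parent_entries, dirty_pages).
    """
    live = sorted(e for e in entries if e[2] == INF)
    mid = len(live) // 2
    separator = live[mid - 1][0]

    lazy_node, new_node = [], []
    for (k, s, e) in entries:
        if e == INF and k > separator:
            lazy_node.append((k, s, ver))    # dies in place (half-dead)
            new_node.append((k, ver, INF))   # live copy in the new node
        else:
            lazy_node.append((k, s, e))      # stays live (half-live)

    if new_key > separator:
        new_node.append((new_key, ver, INF))
    else:
        # The lazy node is now over capacity; the later slides handle this
        # case with garbage collection or reserved buffer space.
        lazy_node.append((new_key, ver, INF))
    lazy_node.sort()
    new_node.sort()

    # Parent entries as on the slide: the old node is reachable both for
    # the live key range and for the old version range.
    parent_entries = [
        (separator, ver, INF, "lazy"),
        (INF, 0, ver, "lazy"),
        (INF, ver, INF, "new"),
    ]
    dirty_pages = {"lazy", "new", "parent"}
    return lazy_node, new_node, parent_entries, dirty_pages


P1 = [(5, 3, INF), (10, 2, INF), (12, 4, INF), (40, 1, INF)]
lazy, new, parent, dirty = lazy_split(P1, new_key=25, ver=5)
print(len(dirty), "dirty pages")
```

Only one new node is allocated, so the example touches three pages instead of the four dirtied by the legacy split.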
Optimization1: Lazy Split
Lazy node overflow
Insert key 8, ver=6
• P3 (parent): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1 (full): 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5), where 12 [4~5) and 40 [1~5) are dead entries
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
Optimization1: Lazy Split
Lazy node overflow → garbage collect dead entries
Insert key 8, ver=6
• P3 (parent): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1: 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
Optimization1: Lazy Split
Lazy node overflow → garbage collect dead entries
Insert key 8, ver=6
• P3 (parent): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1 (after garbage collection): 5 [3~∞), 8 [6~∞), 10 [2~∞)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
But what if the dead entries are being accessed by other transactions?
Optimization2: Reserved Buffer Space
Option 1. Wait for read transactions to finish
Insert key 8, ver=6
• P3 (parent): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1: 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
Optimization2: Reserved Buffer Space
Option 2. Split as in legacy MVBT split
Insert key 8, ver=6
• Parent (P3): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1: 5 [3~6), 10 [2~6), 12 [4~5), 40 [1~5)
• New node (P3): 5 [6~∞), 10 [6~∞)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
Optimization2: Reserved Buffer Space
Option 2. Split as in legacy MVBT split
Insert key 8, ver=6
• Parent (P3): 10 [5~6) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2, 10 [6~∞) : P3
• New node (P3): 5 [6~∞), 8 [6~∞), 10 [6~∞)
• P1: 5 [3~6), 10 [2~6), 12 [4~5), 40 [1~5)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
Optimization2: Reserved Buffer Space
Option 3. Pad some buffer space in tree nodes
Insert key 8, ver=6
• P3 (parent): 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2
• P1: 10 [2~∞), 12 [4~5), 40 [1~5); reserved buffer space: 9 [6~∞)
• P2: 12 [5~∞), 25 [5~∞), 40 [5~∞)
• Buffer space is used when the lazy node is full
• If the buffer space is also full, split as in legacy MVBT
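The choices described on the overflow slides (garbage collect, use the reserved buffer space, or fall back to a legacy split) can be combined into one decision routine. This is a sketch under assumptions: the function name insert_into_lazy_node, the capacity of 4 entries plus 1 reserved slot, and the list of active reader versions are all hypothetical and chosen to match the slide example.

```python
import math

INF = math.inf


def visible(entry, version):
    """True if a reader at `version` can still see this entry."""
    _, start, end = entry
    return start <= version < end


def insert_into_lazy_node(entries, key, ver, active_readers,
                          capacity=4, reserve=1):
    """Return (new_entries, action) for an insert into a leaf node.

    entries: list of (key, start, end); active_readers: versions of the
    read transactions that are currently running.
    """
    new_entry = (key, ver, INF)

    # Room left in the regular area: plain insert.
    if len(entries) < capacity:
        return sorted(entries + [new_entry]), "inserted"

    # Node is full: try to reclaim dead entries, unless a reader needs them.
    dead = [e for e in entries if e[2] != INF]
    in_use = any(visible(e, r) for e in dead for r in active_readers)
    if dead and not in_use:
        live = [e for e in entries if e[2] == INF]
        return sorted(live + [new_entry]), "garbage_collected"

    # Dead entries are still being read: spill into the reserved buffer.
    if len(entries) < capacity + reserve:
        return sorted(entries + [new_entry]), "buffered"

    # Buffer space is also full: the caller must split as in legacy MVBT.
    return entries, "needs_legacy_split"


P1 = [(5, 3, INF), (10, 2, INF), (12, 4, 5), (40, 1, 5)]

# No readers: dead entries are reclaimed, as on the garbage collection slide.
gc_entries, gc_action = insert_into_lazy_node(P1, 8, 6, active_readers=[])

# A reader at version 4 still sees 12 [4~5) and 40 [1~5): use the buffer.
buf_entries, buf_action = insert_into_lazy_node(P1, 8, 6, active_readers=[4])

# Buffer already used and the reader is still active: legacy split.
_, last_action = insert_into_lazy_node(buf_entries, 9, 7, active_readers=[4])

print(gc_action, buf_action, last_action)
```

Waiting for readers to finish (Option 1) is not modeled here; the sketch shows only the non-blocking paths.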
Rollback in LS-MVBT
• Similar to rollback in MVBT
• The number of dirty nodes touched by rollback in LS-MVBT is also smaller than in MVBT.
[Figure: Txn 5 crashes after the lazy split. Rollback returns parent P3 from 10 [5~∞) : P1, ∞ [0~5) : P1, ∞ [5~∞) : P2 to ∞ [0~∞) : P1, and returns P1 from 5 [3~∞), 10 [2~∞), 12 [4~5), 40 [1~5) to 5 [3~∞), 10 [2~∞), 12 [4~∞), 40 [1~∞).]
Optimization3: Lazy Garbage Collection
Periodic Garbage Collection
[Figure: Transaction 1 with pages P1, P2, P3 in the transaction buffer cache]
Optimization3: Lazy Garbage Collection
Periodic Garbage Collection
• Transaction 1 is stopped while garbage collection runs
• Garbage collection produces extra dirty pages (P1, P2, P3 in the GC buffer cache) that are flushed by fsync()
Optimization3: Lazy Garbage Collection
Lazy Garbage Collection
• Do not garbage collect if no space is needed
• [Figure: Transaction 1 inserts into P1 in the transaction buffer cache; fsync() flushes P1]
• No extra dirty page caused by GC
Optimization4: Metadata Embedding
• Version = “File Change Counter” in the DB header page
• Write transaction 6 commits: the counter in header page 1 is incremented from 5 to 6 (dirty)
• Page N is modified (dirty): 15 [6~∞), 25 [5~∞), 40 [1~∞)
• Read transaction 5 runs concurrently
• 2 dirty pages
Optimization4: Metadata Embedding
• Flush “File Change Counter” to the last modified page and RAMDISK
• Write transaction 6 commits: the counter changes from 5 to 6 in page N and in RAMDISK, not in header page 1
• Page N: 15 [6~∞), 25 [5~∞), 40 [1~∞)
• Read transaction 5 runs concurrently
Optimization4: Metadata Embedding
• Flush “File Change Counter” to the last modified page and RAMDISK
• Write transaction 6 commits: only page N is dirty, carrying counter 6: 1 dirty page
• Page N: 15 [6~∞), 25 [5~∞), 40 [1~∞)
• The RAMDISK copy of the counter (6) is volatile and is lost on CRASH
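A small model makes the dirty-page saving and the crash case concrete. This is a sketch only: the class EmbeddedCounterStore and its methods are hypothetical, and recovering the counter by taking the largest value embedded in any page is an assumption made for the example, since the slides do not spell out the recovery procedure.

```python
class EmbeddedCounterStore:
    """Toy model of a file change counter embedded in data pages."""

    def __init__(self):
        self.disk_pages = {}         # page number -> embedded counter (persistent)
        self.ramdisk_counter = None  # volatile copy of the counter

    def commit(self, modified_pages, new_counter):
        """Commit a write transaction and return the set of dirty pages."""
        last_modified = modified_pages[-1]
        # The counter rides along with a page that is dirty anyway,
        # so the header page is not touched.
        self.disk_pages[last_modified] = new_counter
        self.ramdisk_counter = new_counter
        return set(modified_pages)

    def crash(self):
        # RAMDISK is volatile: its copy of the counter disappears.
        self.ramdisk_counter = None

    def current_counter(self):
        if self.ramdisk_counter is None:
            # Assumption: rebuild the counter from the values on disk.
            self.ramdisk_counter = max(self.disk_pages.values(), default=0)
        return self.ramdisk_counter


PAGE_N = 7  # hypothetical page number standing in for "Page N"
store = EmbeddedCounterStore()
store.commit([PAGE_N], new_counter=5)
dirty = store.commit([PAGE_N], new_counter=6)   # write transaction 6
store.crash()
recovered = store.current_counter()
print(len(dirty), recovered)
```

With the counter kept in the header page instead, the same commit would dirty the header page as well as page N, which is the 2-dirty-page case on the earlier slide.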
Optimization5: Disable Sibling Redistribution
• Sibling redistribution hurts insertion performance
• [Figure: insertion of key 20 with sibling redistribution; parent P5 and leaves P2, P3, P4 become dirty: 4 dirty pages]
• Disable sibling redistribution → avoid dirtying 4 pages
Optimization5: Disable Sibling Redistribution
Disabled sibling redistribution
[Figure: insertion of key 20 with sibling redistribution disabled: 3 dirty pages]
Summary
Optimizations in LS-MVBT
• Lazy Split: avoid dirtying an extra page when a split occurs
• Reserved Buffer Space: reduce the probability of node split
• Disabling Sibling Redistribution: do not touch siblings to make search faster
• Lazy Garbage Collection: delete dead entries on the next mutation
• Metadata Embedding: avoid dirtying the header page