Authors: Joy Arulraj, Matthew Perron, Andrew Pavlo (Computer Science @ CMU)
Presenter: Devesh Kumar Singh
Outline: Background; Storage Devices; Write-Ahead Protocol; Write-Behind Protocol; Evaluation
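Durability of updates: persist the effects of committed transactions. Example: starting from A = 1, Tx executes A = A + 1 and commits; A = 2 must survive any later failure.
Failure atomicity: discard the effects of aborted transactions. Example: starting from A = 1, Tx executes A = A + 1 and then crashes or aborts; A must remain 1.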
Failure types: transaction failure (aborted by the DBMS or the application); system failure (hardware failure, bugs in the DBMS or OS); media failure (data loss, storage corruption).
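Steal: the buffer pool may take frames from uncommitted transactions and write their dirty pages to disk. Uncommitted writes can therefore reach disk and must be undone after a failure, but performance improves.
No Force: the DBMS does not force a transaction's updates to disk before the transaction commits. Durability is harder to guarantee, but performance improves.
Policy matrix: No Steal + Force is trivial to implement; Steal + No Force is the desired combination.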
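The policy matrix maps directly onto what recovery must be able to do: allowing steal means uncommitted updates may already be on disk, so recovery needs undo information; allowing no-force means committed updates may be missing from disk, so recovery needs redo information. A minimal sketch of that mapping (the function name `required_logging` is hypothetical):

```python
def required_logging(steal: bool, force: bool) -> set:
    """Return the recovery actions a buffer-pool policy requires."""
    needs = set()
    if steal:
        # Uncommitted updates may already be on disk: they must be undoable.
        needs.add("undo")
    if not force:
        # Committed updates may not be on disk yet: they must be redoable.
        needs.add("redo")
    return needs


for steal in (False, True):
    for force in (True, False):
        print(f"steal={steal} force={force} -> {sorted(required_logging(steal, force))}")
```

No Steal + Force needs neither action, which is why it is trivial; Steal + No Force needs both, which is the combination a write-ahead log with redo and undo information supports.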
Write-ahead logging: changes are first appended to a log on durable storage, and only then written to the database on durable storage.
Redo log: reapplies the updates of committed transactions.
Undo log: reverses the updates of failed transactions.
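A minimal sketch of how redo and undo information repairs a disk image after a crash, assuming a simplified log in which each update record carries the before and after image of a single key. It ignores checkpoints and the compensation records a full ARIES-style recovery would use; `LogRecord` and `recover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    tx_id: int
    kind: str                     # "update" or "commit"
    key: Optional[str] = None
    before: Optional[int] = None  # undo information (before image)
    after: Optional[int] = None   # redo information (after image)


def recover(disk: dict, log: list) -> dict:
    """Rebuild a consistent database from a stale and/or dirty disk image."""
    db = dict(disk)
    committed = {r.tx_id for r in log if r.kind == "commit"}
    updates = [r for r in log if r.kind == "update"]

    # Redo: reapply committed transactions' updates in log order.
    for r in updates:
        if r.tx_id in committed:
            db[r.key] = r.after

    # Undo: roll back failed transactions' updates in reverse log order.
    for r in reversed(updates):
        if r.tx_id not in committed:
            db[r.key] = r.before
    return db


log = [
    LogRecord(1, tx_id=1, kind="update", key="A", before=1, after=2),
    LogRecord(2, tx_id=2, kind="update", key="B", before=5, after=6),
    LogRecord(3, tx_id=1, kind="commit"),
]
# At crash time, committed T1's write never reached disk (no force),
# while uncommitted T2's write was flushed early (steal).
disk_at_crash = {"A": 1, "B": 6}
print(recover(disk_at_crash, log))  # {'A': 2, 'B': 5}
```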
HDD: based on magnetic storage platters. High data density and low price per unit of capacity. Random access is slower than sequential access. Slowest of the three devices because of its mechanical design.
SSD: based on NAND flash memory. Reads and writes are 100-1000x faster than on an HDD. Each storage cell endures only a fixed number of writes. 3-10x more expensive than an HDD.
NVM: combines the low latency and byte-addressable reads/writes of DRAM with the persistent writes and large storage capacity of HDDs/SSDs. Accessed at cache-line granularity, with high bandwidth and low latency to the CPU.
[Figure: Synchronized file write throughput to a 64 GB file (IOPS in thousands, log scale) for sequential and random writes on HDD, SSD, and NVM]
WAL record fields: LSN, Log Record Type, Transaction Commit Timestamp, Table ID, Insert Location, Delete Location, Before/After Images.
Bookkeeping tables: Dirty Page Table (DPT) and Active Transaction Table (ATT); fields shown: activeTxId, latestLSN; TxId, lastLSN, status.
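A minimal sketch of the WAL record layout and of how appending a record updates the ATT. The field names follow the slide; the explicit `tx_id` field, the record-type strings, and the class names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WALRecord:
    lsn: int
    record_type: str                       # illustrative: "INSERT", "DELETE", "COMMIT"
    tx_id: int                             # assumed link to the ATT
    commit_timestamp: Optional[int] = None
    table_id: Optional[int] = None
    insert_location: Optional[int] = None  # where the new tuple version lives
    delete_location: Optional[int] = None  # where the old tuple version lives
    before_image: Optional[bytes] = None   # used for undo
    after_image: Optional[bytes] = None    # used for redo


@dataclass
class ATTEntry:
    last_lsn: int
    status: str                            # "Active" or "Commit"


@dataclass
class WriteAheadLog:
    records: list = field(default_factory=list)
    att: dict = field(default_factory=dict)  # tx_id -> ATTEntry
    latest_lsn: int = 0

    def append(self, record_type: str, tx_id: int, **attrs) -> WALRecord:
        # Assign the next LSN, append the record, and update the ATT entry.
        self.latest_lsn += 1
        rec = WALRecord(self.latest_lsn, record_type, tx_id, **attrs)
        self.records.append(rec)
        status = "Commit" if record_type == "COMMIT" else "Active"
        self.att[tx_id] = ATTEntry(last_lsn=rec.lsn, status=status)
        return rec


wal = WriteAheadLog()
wal.append("INSERT", tx_id=1, table_id=7, insert_location=42, after_image=b"A=1")
print(wal.att[1])  # ATTEntry(last_lsn=1, status='Active')
wal.append("COMMIT", tx_id=1, commit_timestamp=100)
print(wal.att[1])  # ATTEntry(last_lsn=2, status='Commit')
```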
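Traditional DBMS: tracks both the DPT and the ATT; storage can contain changes from all transactions, committed or not.
In-memory DBMS: tracks only the ATT; storage contains changes from committed transactions only.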
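[Figure: WAL during a transaction. The database resides in DRAM along with an ATT (txId, lastLSN, status) holding entries for transaction 1 (Commit with lastLSN 28; Active with no lastLSN yet); log records rec1, rec2, rec3, the data, and the checkpoints are stored on NVM.]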
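An in-memory DBMS skips the undo phase during recovery, because its durable storage never contains changes from uncommitted transactions.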
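Because the checkpoints and log of an in-memory DBMS capture only committed work, recovery reduces to loading the latest checkpoint and redoing the log tail. A minimal sketch, assuming each transaction's records are written together at commit time (so no transaction straddles the checkpoint) and that a transaction with no COMMIT record was cut off by the crash; `LogRec` and `recover_in_memory` are hypothetical names.

```python
from typing import NamedTuple, Optional


class LogRec(NamedTuple):
    lsn: int
    tx_id: int
    kind: str                 # "UPDATE" or "COMMIT"
    key: Optional[str] = None
    value: Optional[int] = None


def recover_in_memory(snapshot: dict, checkpoint_lsn: int, log: list) -> dict:
    """Redo-only recovery: load the checkpoint, then replay committed work."""
    db = dict(snapshot)  # the checkpoint holds committed data only
    tail = [r for r in log if r.lsn > checkpoint_lsn]
    committed = {r.tx_id for r in tail if r.kind == "COMMIT"}
    # Analysis + redo only: there is no undo pass, because no uncommitted
    # change ever reached durable storage.
    for r in tail:
        if r.kind == "UPDATE" and r.tx_id in committed:
            db[r.key] = r.value
    return db


log = [
    LogRec(1, 1, "UPDATE", "A", 2),
    LogRec(2, 1, "COMMIT"),
    LogRec(3, 2, "UPDATE", "B", 7),
    LogRec(4, 2, "COMMIT"),
    LogRec(5, 3, "UPDATE", "C", 9),  # tx 3's COMMIT never reached the log: ignored
]
print(recover_in_memory({"A": 2}, checkpoint_lsn=2, log=log))  # {'A': 2, 'B': 7}
```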
WBL record fields: LSN, Log Record Type, Persisted Commit Timestamp (Cp), Dirty Commit Timestamp (Cd).
Dirty Tuple Table (DTT) fields: TX ID, Table ID, Tuple Location.
Cp: the commit timestamp of the latest committed transaction whose changes are persisted.
Cd: a commit timestamp that is not assigned to any transaction before the next group commit finishes.
Group commit: flushes a batch of log records in a single write to durable storage.
During execution, transaction changes are kept in DRAM and the changed tuples are tracked in the DTT.
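A minimal sketch of the write-behind commit path, assuming a single-threaded engine, whole-transaction write sets, and a fixed timestamp window between Cp and Cd (the class and method names and the `window` parameter are hypothetical). It shows the ordering that gives WBL its name: dirty tuples are written to the NVM database first, and only then is one (Cp, Cd) record appended for the whole batch. A crash between the two steps leaves tuples whose commit timestamps fall inside (Cp, Cd) of the last record, so recovery can ignore them.

```python
from dataclasses import dataclass


@dataclass
class WBLRecord:
    lsn: int
    cp: int  # persisted commit timestamp
    cd: int  # dirty commit timestamp


class WBLEngine:
    def __init__(self, window: int = 100):
        self.window = window  # hypothetical gap between Cp and Cd
        self.nvm = {}         # database on NVM: key -> (value, commit_ts)
        self.log = []         # WBL records on NVM
        self.dtt = []         # dirty tuple table in DRAM: (tx_id, key, value, commit_ts)
        self.next_ts = 1
        self.cp = 0
        self.cd = self.cp + window

    def commit(self, tx_id: int, writes: dict) -> int:
        ts = self.next_ts
        # Honor the promise recorded as Cd: never hand out a timestamp >= Cd.
        if ts >= self.cd:
            raise RuntimeError("timestamp window exhausted: wait for the next group commit")
        self.next_ts += 1
        for key, value in writes.items():
            self.dtt.append((tx_id, key, value, ts))
        return ts

    def group_commit(self, crash_before_log: bool = False) -> None:
        # Write-behind: first persist every dirty tuple to the database on NVM ...
        for _tx_id, key, value, ts in self.dtt:
            self.nvm[key] = (value, ts)
        if crash_before_log:
            return  # simulated crash: data is on NVM, but no new (Cp, Cd) record exists
        # ... then append a single small log record for the whole batch.
        self.cp = self.next_ts - 1  # latest committed, now persisted, timestamp
        self.cd = self.cp + self.window
        self.log.append(WBLRecord(len(self.log) + 1, self.cp, self.cd))
        self.dtt.clear()


db = WBLEngine(window=100)
db.commit(tx_id=1, writes={"A": 2})     # commit timestamp 1
db.group_commit()                       # logs (Cp=1, Cd=101)
db.commit(tx_id=2, writes={"B": 6})     # commit timestamp 2, inside (1, 101)
db.group_commit(crash_before_log=True)  # B reaches NVM, but its batch is never logged

last = db.log[-1]
visible = {k: v for k, (v, ts) in db.nvm.items() if not last.cp < ts < last.cd}
print(last)     # WBLRecord(lsn=1, cp=1, cd=101)
print(visible)  # {'A': 2}
```

Unlike WAL, the log record carries no before or after images: the data itself is already durable, and the (Cp, Cd) pair only tells recovery which timestamps to distrust.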
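[Figure: WBL during a transaction. DRAM holds the DTT with dirty tuples dt1 and dt2 (including those of a long-running transaction) and metadata; NVM holds the database data and the log of (Cp, Cd) records; checkpoints are not required (marked ✗).]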
[Figure: Dirty ranges across group commits 1-5 under garbage collection: {(101,199), (301,399)}, then {(101,199)}, {(101,199)}, { }, { }.]
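One way to read the dirty ranges: after a failure, any tuple whose commit timestamp lies inside a recorded gap (Cp, Cd) is treated as invisible, and a gap can be dropped once garbage collection has reclaimed every tuple stamped inside it. A minimal sketch under those assumptions; the helper names are hypothetical, and the example timestamps 150, 250, 350 are made up for illustration (only the two ranges come from the figure).

```python
def is_visible(commit_ts: int, dirty_ranges: list) -> bool:
    # A tuple version is ignored if its commit timestamp falls in any open gap (Cp, Cd).
    return not any(cp < commit_ts < cd for cp, cd in dirty_ranges)


def garbage_collect(dirty_ranges: list, live_commit_ts: list) -> list:
    """Keep only the ranges that still cover some tuple present on NVM."""
    return [
        (cp, cd) for cp, cd in dirty_ranges
        if any(cp < ts < cd for ts in live_commit_ts)
    ]


ranges = [(101, 199), (301, 399)]
print([ts for ts in (150, 250, 350) if is_visible(ts, ranges)])  # [250]
# Suppose the garbage collector has reclaimed the tuple stamped 350:
ranges = garbage_collect(ranges, live_commit_ts=[150, 250])
print(ranges)  # [(101, 199)]
# ... and later the tuple stamped 150:
ranges = garbage_collect(ranges, live_commit_ts=[250])
print(ranges)  # []
```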
Experimental setup: Intel PMEP hardware emulator; 128 GB DRAM; 128 GB NVM emulated from DRAM; 400 GB Intel DC S3700 SSD; 3 TB Seagate Barracuda HDD.
Workloads:
TPC-C: 5 transaction types, 88% reads, 12% updates, 100k tuples (1 GB).
Yahoo's YCSB: 1 table with 2 million tuples (2 GB); three mixtures: read-heavy (90% reads, 10% updates), balanced (50% reads, 50% updates), write-heavy (10% reads, 90% updates).
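The YCSB mixtures differ only in their read/update ratio. A minimal sketch of a request generator for them, assuming uniformly random keys over the 2 million tuples (the real YCSB client also supports skewed key distributions such as Zipfian); the function name and seed handling are hypothetical.

```python
import random

# Read/update proportions of the three YCSB mixtures used in the evaluation.
YCSB_MIXTURES = {
    "read-heavy": (0.90, 0.10),
    "balanced": (0.50, 0.50),
    "write-heavy": (0.10, 0.90),
}


def sample_ops(mixture: str, n: int, num_tuples: int = 2_000_000, seed: int = 0) -> list:
    """Generate n (operation, key) pairs; keys are drawn uniformly (an assumption)."""
    read_frac, _update_frac = YCSB_MIXTURES[mixture]
    rng = random.Random(seed)
    ops = []
    for _ in range(n):
        op = "read" if rng.random() < read_frac else "update"
        ops.append((op, rng.randrange(num_tuples)))
    return ops


ops = sample_ops("write-heavy", 10_000)
print(sum(op == "update" for op, _ in ops) / len(ops))  # close to 0.9
```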
[Figure: Throughput (Tx/sec, log scale) of WAL and WBL on HDD, SSD, and NVM]
[Figure: Recovery time (log scale) of WAL and WBL on HDD, SSD, and NVM]
Recommendations
More recommendations