Beomseok Nam UNIST (Ulsan National Institute of Science and Technology)
Non-Volatile Memory (NVM)

                    NAND              STT-MRAM         PCM              DRAM
Non-volatility      o                 o                o                x
Read (ns)           2.5 x 10^4        5 - 30           20 - 70          10
Write (ns)          2 x 10^5          10 - 100         150 - 220        10
Byte-addressable    x                 o                o                o
Density             185.8 Gbit/cm^2   0.36 Gbit/cm^2   13.5 Gbit/cm^2   9.1 Gbit/cm^2

Source: K. Suzuki and S. Swanson, "A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014", IMW 2015

Persistent Memory: non-volatile, low latency
When Granularity of Atomicity = Page
• Data is written in 4K pages with write() and made durable with fsync().

When Granularity of Atomicity = Cache Line
• Data is written with store instructions (store A, store B) and made durable with clflush.
• Because of memory level parallelism, B can be stored first.
• To enforce an order, each cache line needs a store followed by mfence and clflush.
• The legacy block IO interface requires too many barriers and clflushes unnecessarily.
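The reordering problem with store A, store B, and clflush can be sketched with a toy model. This is only an illustration, not real hardware behavior: the `ToyPM` class and its method names are hypothetical, and the mfence that follows clflush on real hardware appears only as a comment because the model runs sequentially.

```python
class ToyPM:
    """Toy model: stores land in a volatile cache; only written-back lines survive a crash."""

    def __init__(self):
        self.cache = {}       # volatile: cache-line name -> value
        self.persistent = {}  # durable: cache-line name -> value

    def store(self, line, value):
        self.cache[line] = value

    def clflush(self, line):
        # Write one dirty cache line back to persistent memory.
        if line in self.cache:
            self.persistent[line] = self.cache.pop(line)

    def evict(self, line):
        # The hardware may write back any dirty line on its own, in any order.
        self.clflush(line)

    def crash(self):
        # Everything still in the volatile cache is lost.
        self.cache.clear()
        return dict(self.persistent)


def unordered():
    pm = ToyPM()
    pm.store("A", "record")
    pm.store("B", "pointer to record")
    pm.evict("B")           # B happens to reach persistent memory first
    return pm.crash()


def ordered():
    pm = ToyPM()
    pm.store("A", "record")
    pm.clflush("A")         # on real hardware: clflush followed by mfence
    pm.store("B", "pointer to record")
    return pm.crash()       # crash before B is flushed


unordered_state = unordered()
ordered_state = ordered()
```

In the unordered run only B survives the crash, so the durable state points at a record that was never written. In the ordered run A is durable before B is even stored.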
fsync() vs. a group of mfence and clflush instructions
• Faster than flash memory, but there's room for improvement.
• Transactions need to be aware of the byte-addressability of NVM.

Atomicity • All or nothing
Consistency • Only valid data
Isolation • No interference
Durability • Data is recoverable
Single Copy
• Minimize redundant write operations
[Figure: the record "NORTH" appears in two places]
[Figure: persistent memory (PM) holds both the buffer cache and the block device storage. A query update(EAST) overwrites the record "NORTH" in the DB file in place; the DB file ends up holding "EARTH".]
• A system crash may result in inconsistent data.
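The NORTH / EAST / EARTH example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `in_place_update` and the `crash_after` parameter are made up for illustration, and copying one byte at a time is a simplification of how stores actually reach persistent memory.

```python
def in_place_update(old: bytes, new: bytes, crash_after: int) -> bytes:
    """Overwrite `old` with `new` in place, stopping after `crash_after` bytes."""
    buf = bytearray(old)
    for i, b in enumerate(new[:crash_after]):
        buf[i] = b
    return bytes(buf)


# The update is interrupted after two bytes: "EA" from the new value, "RTH" from the old.
torn = in_place_update(b"NORTH", b"EAST ", crash_after=2)

# Without a crash the whole record is replaced.
complete = in_place_update(b"NORTH", b"EAST ", crash_after=5)
```

The torn result is neither the old record nor the new one, which is why an in-place update on a single copy needs some form of failure atomicity.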
Slotted page (1024 bytes)
• Slot header (metadata): number of records and record offset array.
• Record content area: records are placed toward the end of the page; the space in front of them is free.

Before the insert
• Slot header: 2 records, record offset array [1000, 900]
• Record content area: Key = 50 at offset 900, Key = 30 at offset 1000
• Logical view of this page: 30, 50

Inserting Key = 40
• The record is written into free space at offset 800. The slot header still reads 2 records, [1000, 900], so the new record is invisible and the logical view is still 30, 50.
• The slot header is then updated to 3 records, record offset array [1000, 800, 900].
• Logical view of this page: 30, 40, 50
[Figure: a slotted page with its slot header (3 records) and its dirty record]
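The two-step insert into a slotted page (write the record into free space, then update the slot header) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `SlottedPage` class, the 18-byte header layout (a 2-byte record count plus eight 2-byte offsets), and the 4-byte record format are all invented here, and the sketch simply assumes that the header store is failure-atomic.

```python
import struct

PAGE_SIZE = 1024
MAX_SLOTS = 8                 # assumed capacity, for illustration only
HEADER_FMT = "<H8H"           # record count + 8 record offsets (assumed layout)


class SlottedPage:
    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        self._write_header([])

    def _read_header(self):
        count, *offsets = struct.unpack_from(HEADER_FMT, self.buf, 0)
        return list(offsets[:count])

    def _write_header(self, offsets):
        # Assumed to be failure-atomic: the header is rewritten in one step.
        padded = offsets + [0] * (MAX_SLOTS - len(offsets))
        struct.pack_into(HEADER_FMT, self.buf, 0, len(offsets), *padded)

    def key_at(self, offset):
        return struct.unpack_from("<I", self.buf, offset)[0]

    def write_record(self, offset, key):
        # Step 1: write into free space. No header entry refers to it yet,
        # so the record is invisible.
        struct.pack_into("<I", self.buf, offset, key)

    def publish(self, offset):
        # Step 2: add the offset and keep the offset array sorted by key.
        offsets = self._read_header() + [offset]
        offsets.sort(key=self.key_at)
        self._write_header(offsets)

    def logical_view(self):
        return [self.key_at(o) for o in self._read_header()]


page = SlottedPage()
for off, key in [(1000, 30), (900, 50)]:
    page.write_record(off, key)
    page.publish(off)

page.write_record(800, 40)            # a crash here leaves Key = 40 invisible
view_before = page.logical_view()
page.publish(800)
view_after = page.logical_view()
offsets_after = page._read_header()
```

If the system crashes between the two steps, the page still reads as 30, 50 and the bytes at offset 800 are just unused free space. The record becomes visible only once the header is replaced.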
Multi-page update: Key = 20 moves from Page A to Page B

Initial state
• Page A: 3 records, record offset array [900, 800, 1000]; Key = 20 at offset 800, Key = 10 at offset 900, Key = 30 at offset 1000
• Page B: 2 records, record offset array [1000, 900]; Key = 50 at offset 900, Key = 40 at offset 1000

① Writing the record
• Key = 20 is written into the free space of Page B. The slot header of Page B still reads 2 records, [1000, 900], so the record is invisible.

② Updating the slot header
• The slot header of Page A becomes 2 records, record offset array [900, 1000]; Key = 20 at offset 800 in Page A is now invisible.
• The slot header of Page B still reads 2 records, so Key = 20 is invisible there as well.
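Slot-header logging and commit
• Page A (3 records: 20, 10, 30) and Page B (2 records: 50, 40) each hold a dirty record.
• The slot headers of Page B and Page A are written to the log, followed by a commit mark.

Recovery
• The log holds the slot header of Page A, the slot header of Page B, and the commit mark; the dirty slotted pages A and B hold the records themselves.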
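One possible reading of the multi-page figures is sketched below: the record is written into free space, the new slot headers go to a log, and a commit mark decides whether recovery applies them. The dictionary-based page model, the function names, and the offset 800 chosen for the copy of Key = 20 in Page B are assumptions made for this sketch and are not taken from the slides.

```python
def make_pages():
    # Page contents follow the figures: header = record offset array.
    return {
        "A": {"header": [900, 800, 1000], "records": {900: 10, 800: 20, 1000: 30}},
        "B": {"header": [1000, 900], "records": {1000: 40, 900: 50}},
    }


def view(page):
    return [page["records"][off] for off in page["header"]]


def move_record(pages, log, crash_before_commit=False):
    # (1) Write the record into the free space of Page B; it stays invisible.
    pages["B"]["records"][800] = 20      # offset 800 is an assumption
    # (2) Log the new slot headers instead of overwriting the old ones.
    log.append(("header", "B", [800, 1000, 900]))
    log.append(("header", "A", [900, 1000]))
    if crash_before_commit:
        return
    # (3) The commit mark makes both header changes take effect together.
    log.append(("commit",))


def recover(pages, log):
    # Apply the logged headers only if the commit mark reached the log.
    if ("commit",) in log:
        for entry in log:
            if entry[0] == "header":
                _, name, header = entry
                pages[name]["header"] = list(header)
    log.clear()


# Crash before commit: both pages keep their old logical view.
crashed_pages, crashed_log = make_pages(), []
move_record(crashed_pages, crashed_log, crash_before_commit=True)
recover(crashed_pages, crashed_log)

# Committed: both header changes become visible together.
done_pages, done_log = make_pages(), []
move_record(done_pages, done_log)
recover(done_pages, done_log)
```

Only slot headers pass through the log in this sketch. The record itself is written once, directly into the destination page.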
                        NVWAL                  FASH                  FAST
Single page update      Differential logging   Slot-header logging   In-place commit
Multiple page update    Differential logging   Slot-header logging   Slot-header logging
Buffer cache            In DRAM                In PM                 In PM
Log                     In PM                  In PM                 In PM

[Figure: hybrid memory architecture (volatile buffer cache, WAL file, DB file) vs. PM-only architecture (persistent buffer cache)]
[Figure: performance comparison, annotated with 2.1x and 2.6x]
FAST and FASH consistently outperform NVWAL
• FAST and FASH do not duplicate write operations for records.
• NVWAL generates large log frames for large records.
• FASH calls more clflush instructions for small record sizes.
  The reason is that with smaller records, the slotted page can hold more records.
• FAST calls about 3 clflush instructions when the record is smaller than 64 bytes.
  The slot-header size of FAST must be less than 64 bytes.
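The link between record count, slot-header size, and the number of clflush calls can be made concrete with a small calculation. The header layout used here (a 2-byte record count followed by one 2-byte offset per record) is an assumption for illustration, and both function names are hypothetical; only the 64-byte cache line comes from the slides.

```python
CACHE_LINE = 64


def cache_lines_touched(offset: int, size: int) -> int:
    """Number of 64-byte cache lines covering the byte range [offset, offset + size)."""
    if size <= 0:
        return 0
    first = offset // CACHE_LINE
    last = (offset + size - 1) // CACHE_LINE
    return last - first + 1


def slot_header_size(num_records: int, count_bytes: int = 2, offset_bytes: int = 2) -> int:
    # Assumed layout: a record count followed by one offset per record.
    return count_bytes + offset_bytes * num_records


few_records = cache_lines_touched(0, slot_header_size(10))     # 22-byte header
many_records = cache_lines_touched(0, slot_header_size(100))   # 202-byte header
straddling = cache_lines_touched(60, 8)                        # crosses a line boundary
```

Under these assumptions a page with 10 records has a header that fits in one cache line, while a page with 100 small records has a header spanning four, and each of those lines needs its own flush when the header is logged.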
• The "failure-atomic slotted paging scheme" eliminates the need for redundant copies by integrating logging into database buffer caching.
• PM-only memory systems can perform faster than hybrid memory systems that consist of both PM and DRAM.
• Even with a small PM, we can significantly reduce IO traffic via slot-header journaling.