Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes
Pengfei Zuo, Yu Hua, Ming Zhao*, Wen Zhou, Yuncheng Guo
Huazhong University of Science and Technology (HUST), China
*Arizona State University (ASU), USA

Non-volatile Memory (NVM)
• Non-volatile memory is expected to replace or complement DRAM in the memory hierarchy
• Non-volatility, low power, high density, large capacity

                          PCM          ReRAM         DRAM
Read (ns)                 20-70        20-50         10
Write (ns)                150-220      70-140        10
Non-volatility            √            √             ×
Standby power             ~0           ~0            High
Endurance (write cycles)  10^7~10^9    10^8~10^12    10^15
Density (Gb/cm^2)         13.5         24.5          9.1

Sources:
C. Xu et al. “Overcoming the Challenges of Crossbar Resistive Memory Architectures”, HPCA, 2015.
K. Suzuki and S. Swanson. “A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014”, IMW, 2015.

Endurance and Security in Non-volatile Memory
• NVM typically has limited endurance
  – 10^7~10^9 writes for PCM, 10^8~10^12 for ReRAM
  – Writes have much higher latency than reads
  – Write reduction matters for NVM
• NVM is vulnerable to stolen-DIMM attacks
  – NVM retains data after the system powers down
  – An attacker can read data directly from a stolen NVM DIMM
  – Memory encryption matters for NVM

Encryption Increases Bit Flips to NVM
• Diffusion property of encryption
  – Changing one bit of the original data changes about half of the bits of the encrypted data

Without encryption:
  New data:        10000000…000000000000
  Old data in NVM: 00000000…000000000000
  Overwrite: 1 of 512 bits modified
With encryption:
  New data:        10101100…000100101001
  Old data in NVM: 01011010…000010110100
  Overwrite: 256 of 512 bits modified

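The diffusion effect on this slide can be reproduced with a small sketch. It assumes a 64-byte (512-bit) line and uses SHA-512 under a made-up key as a stand-in for the cipher, only because it has similar diffusion and is in the Python standard library; it is not reversible encryption and not the scheme used in the talk. The names `bit_flips` and `toy_encrypt` are hypothetical.

```python
import hashlib

LINE_BYTES = 64  # one 512-bit cache line


def bit_flips(old: bytes, new: bytes) -> int:
    """Number of bits that differ when `new` overwrites `old`."""
    return sum(bin(a ^ b).count("1") for a, b in zip(old, new))


def toy_encrypt(line: bytes, key: bytes = b"demo-key") -> bytes:
    """Stand-in for a cipher with strong diffusion (assumption).

    SHA-512 yields 512 output bits; it is NOT real, reversible encryption.
    """
    return hashlib.sha512(key + line).digest()


old_plain = bytes(LINE_BYTES)                       # 0000...0
new_plain = bytes([0x80]) + bytes(LINE_BYTES - 1)   # 1000...0, one bit set

plain_flips = bit_flips(old_plain, new_plain)
cipher_flips = bit_flips(toy_encrypt(old_plain), toy_encrypt(new_plain))
print(f"plaintext overwrite:  {plain_flips} of 512 bits modified")
print(f"ciphertext overwrite: {cipher_flips} of 512 bits modified")
```

The plaintext overwrite touches a single bit, while the ciphertext overwrite should touch roughly half of the 512 bits, which is why bit-level write reduction loses its effect once lines are encrypted.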
Encryption Increases Bit Flips to NVM
[Figure annotated "4X". Source: Young et al. “DEUCE: Write-efficient encryption for non-volatile memories”, in Proc. of ASPLOS, 2015.]
• Encryption renders existing bit-level write reduction techniques ineffective

Observation and Motivation
[Figure: two panels, PARSEC 2.1 and SPEC CPU2006]
• A large number of entire-line duplicates exist in real-world applications

DeWrite
• Lightweight cache-line-level deduplication for NVMM
  – Employ lightweight hashing
  – Leverage NVM read/write asymmetry
  – Eliminate a write at the cost of a read
• Efficient synergization between deduplication and encryption
  – Opportunistic parallelism
  – Metadata storage co-location
[Figure: Hardware Architecture. Labels: Last Level Cache; Memory Controller containing Dedup Logic, Metadata Cache, AES-ctr and OTP; Encrypted NVMM Storage holding Data (non-duplicate) and Metadata. Legend: Data: CME; Metadata: direct encryption.]

Prediction-based Parallelism
The direct way:
  A write request → detect duplication → is duplicate?
    Yes: cancel the write
    No: encrypt data → write to NVM
• Inefficient for non-duplicate writes
  – Serial execution latency
The parallel way:
  A write request → detect duplication and encrypt data in parallel → is duplicate?
    No: write to NVM
    Yes: discard the ciphertext
• Inefficient for duplicate writes
  – Unnecessary encryption
Prediction chooses between the two ways:
  Predicted duplicate: the direct way
  Predicted non-duplicate: the parallel way

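The trade-off between the two ways can be made concrete with a small latency model. This is only a sketch: the 91 ns detection bound and the 150 ns PCM write latency come from other slides in this deck, while the 40 ns encryption latency and all function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latency:
    detect_ns: int = 91    # worst-case duplicate detection (from the deck)
    encrypt_ns: int = 40   # hypothetical encryption latency (assumption)
    write_ns: int = 150    # lower end of PCM write latency (from the deck)


def direct_way(is_dup: bool, t: Latency) -> int:
    """Detect first; encrypt and write only if the line is not a duplicate."""
    if is_dup:
        return t.detect_ns
    return t.detect_ns + t.encrypt_ns + t.write_ns


def parallel_way(is_dup: bool, t: Latency) -> int:
    """Detect and encrypt at the same time; discard the ciphertext on a duplicate."""
    overlapped = max(t.detect_ns, t.encrypt_ns)
    if is_dup:
        return overlapped
    return overlapped + t.write_ns


def predicted_way(is_dup: bool, predicted_dup: bool, t: Latency) -> int:
    """Take the direct way when a duplicate is predicted, else the parallel way."""
    if predicted_dup:
        return direct_way(is_dup, t)
    return parallel_way(is_dup, t)


def wastes_encryption(is_dup: bool, predicted_dup: bool) -> bool:
    """The ciphertext is thrown away when a duplicate was predicted non-duplicate."""
    return is_dup and not predicted_dup


t = Latency()
print(direct_way(False, t), parallel_way(False, t))  # 281 241
```

Under these assumed numbers a correct prediction gives the non-duplicate write the shorter parallel latency and spares the duplicate write the wasted encryption; a misprediction only falls back to the cost of the other way.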
Prediction-based Parallelism
• How to know whether a cache line is duplicate beforehand?
• Observation: the duplication state of most memory writes is the same as that of the previous write
• A prediction scheme:
  [Animation: cache lines A, B, C and D are written from the CPU to memory one after another. The duplication state of each completed write (1: duplicate, 0: non-duplicate) is appended to a history window, and the window is used to predict the state of the next write.]
• Prediction accuracy figures shown: 92.1% and 93.6%
• Why can we achieve such a high prediction accuracy?
  – Rationale: a run of duplicate (or non-duplicate) data is usually much larger than a cache line
  – E.g., when a whole page (4KB) is duplicate or non-duplicate: 100% accuracy

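A history-window predictor of this kind could look like the following sketch. The slides do not say how the window contents are turned into a prediction, so the majority vote, the default of "non-duplicate" on an empty window, the window size of 3, and the class and function names are all assumptions.

```python
from collections import deque


class HistoryWindowPredictor:
    """Predicts the duplication state of the next write from recent writes."""

    def __init__(self, window: int = 3):
        # Each entry is 1 (duplicate) or 0 (non-duplicate); oldest entries drop out.
        self.history = deque(maxlen=window)

    def predict(self) -> bool:
        """Return True if the next write is predicted to be a duplicate."""
        if not self.history:
            return False  # assumption: predict non-duplicate with no history
        # Majority vote over the window (assumption, not stated on the slides).
        return sum(self.history) * 2 > len(self.history)

    def record(self, was_duplicate: bool) -> None:
        """Append the actual state of a completed write to the window."""
        self.history.append(1 if was_duplicate else 0)


def accuracy(states, window: int = 3) -> float:
    """Fraction of writes in `states` whose duplication state was predicted correctly."""
    predictor = HistoryWindowPredictor(window)
    correct = 0
    for actual in states:
        correct += predictor.predict() == actual
        predictor.record(actual)
    return correct / len(states)


# Synthetic trace: three 4KB pages of 64 lines each, alternating
# non-duplicate / duplicate / non-duplicate.
trace = [False] * 64 + [True] * 64 + [False] * 64
print(accuracy(trace, window=3), accuracy(trace, window=1))
```

On this synthetic trace the only mispredictions happen right after a page boundary, which matches the rationale above: when duplicate and non-duplicate data come in runs much longer than a cache line, the previous states are a good guide to the next one.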
Lightweight Deduplication for NVMM
• Traditional deduplication
  – Compute a SHA1/MD5 hash of the line and look for a match (Y: duplicate, N: non-duplicate)
  – Hash computation latency: >300ns ≈ NVM write latency
• DeWrite
  – Compute CRC-32 (15ns) and look for a match (N: non-duplicate)
  – On a CRC-32 match, read the stored data and compare (75ns+1ns) (Y: duplicate, N: non-duplicate)
  – The latency is 91ns at most

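The two-step check (cheap hash first, then read and compare) can be sketched in software as below. This is a toy model, not the hardware design: the `DedupStore` class and its fields are hypothetical, and it ignores overwrites of shared lines, reference counting and space reclamation. The worst-case latency on the slide follows from the same two steps: 15 + 75 + 1 = 91 ns.

```python
import zlib


class DedupStore:
    """Toy line store that eliminates writes of duplicate lines."""

    def __init__(self, hash_fn=zlib.crc32):
        self.hash_fn = hash_fn
        self.lines = []      # physical lines actually written
        self.by_hash = {}    # hash value -> indices into self.lines
        self.mapping = {}    # logical address -> physical index
        self.nvm_writes = 0

    def write(self, addr: int, data: bytes) -> bool:
        """Store `data` at `addr`; return True if the write was eliminated."""
        h = self.hash_fn(data)
        for idx in self.by_hash.get(h, []):
            # A hash match is only a hint: read the stored line and compare.
            if self.lines[idx] == data:
                self.mapping[addr] = idx
                return True
        # No real duplicate found: perform the write.
        self.lines.append(data)
        self.nvm_writes += 1
        new_idx = len(self.lines) - 1
        self.by_hash.setdefault(h, []).append(new_idx)
        self.mapping[addr] = new_idx
        return False

    def read(self, addr: int) -> bytes:
        return self.lines[self.mapping[addr]]


store = DedupStore()
line_a, line_b = b"A" * 64, b"B" * 64
print(store.write(0, line_a), store.write(1, line_a), store.write(2, line_b))
print(store.nvm_writes)
```

Because CRC-32 is short and not collision-resistant, the byte-for-byte comparison is what keeps the scheme correct; the hash only filters out most non-duplicates cheaply.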
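Metadata Colocation
• Encryption metadata: per-line counter
  [Figure: counter-mode encryption. AES-ctr takes the key, LineAddr and the counter and produces an OTP. Encryption: plaintext XOR OTP gives the ciphertext. Decryption: ciphertext XOR OTP gives the plaintext.]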
• Deduplication metadata: address mapping, reverted hash

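The role of the per-line counter can be illustrated with a minimal sketch of counter-mode encryption. It assumes 64-byte lines and uses keyed BLAKE2b from the standard library as a stand-in for the AES pad generator, so it shows the structure only and is not a vetted cipher construction. `one_time_pad`, `xor_with_pad` and `EncryptedLineStore` are hypothetical names.

```python
import hashlib

LINE_BYTES = 64


def one_time_pad(key: bytes, line_addr: int, counter: int) -> bytes:
    """Stand-in for AES-ctr pad generation (assumption): keyed BLAKE2b
    over the line address and the per-line counter."""
    seed = line_addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hashlib.blake2b(seed, digest_size=LINE_BYTES, key=key).digest()


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR is its own inverse, so this both encrypts and decrypts."""
    return bytes(d ^ p for d, p in zip(data, pad))


class EncryptedLineStore:
    def __init__(self, key: bytes):
        self.key = key
        self.counters = {}  # encryption metadata: one counter per line
        self.cipher = {}    # ciphertext as it would sit in NVM

    def write(self, addr: int, plaintext: bytes) -> None:
        # Bump the counter on every write so a pad is never reused for a line.
        counter = self.counters.get(addr, 0) + 1
        self.counters[addr] = counter
        pad = one_time_pad(self.key, addr, counter)
        self.cipher[addr] = xor_with_pad(plaintext, pad)

    def read(self, addr: int) -> bytes:
        pad = one_time_pad(self.key, addr, self.counters[addr])
        return xor_with_pad(self.cipher[addr], pad)


store = EncryptedLineStore(key=b"k" * 16)
line = b"x" * LINE_BYTES
store.write(7, line)
first_cipher = store.cipher[7]
store.write(7, line)                    # same plaintext, new counter
print(store.read(7) == line, store.cipher[7] != first_cipher)
```

Since the pad depends only on the key, the address and the counter, it can be generated without the data itself, and the counter is the metadata that must be kept for every line in order to decrypt it later.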