Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes
Pengfei Zuo, Yu Hua, Ming Zhao*, Wen Zhou, Yuncheng Guo
Huazhong University of Science and Technology (HUST), China
*Arizona State University (ASU), USA

Non-volatile Memory (NVM)
• Non-volatile memory is expected to replace or complement DRAM in the memory hierarchy
• Non-volatility, low power, high density, large capacity

                     PCM          ReRAM         DRAM
Read (ns)            20-70        20-50         10
Write (ns)           150-220      70-140        10
Non-volatility       √            √             ×
Standby power        ~0           ~0            High
Endurance (writes)   10^7~10^9    10^8~10^12    10^15
Density (Gb/cm^2)    13.5         24.5          9.1

C. Xu et al. "Overcoming the Challenges of Crossbar Resistive Memory Architectures", HPCA, 2015.
K. Suzuki and S. Swanson. "A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014", IMW 2015.

Endurance and Security in Non-volatile Memory
• NVM typically has limited endurance
  – 10^7~10^9 writes for PCM, 10^8~10^12 for ReRAM
  – Writes have much higher latency than reads
  – Write reduction matters for NVM
• NVM is vulnerable to stolen-DIMM attacks
  – NVM retains data after the system powers down
  – An attacker can read data directly from a stolen NVM module
  – Memory encryption matters for NVM

Encryption Increases Bit Flips to NVM
• Diffusion property of encryption
  – Changing one bit of the original data changes about half of the bits of the encrypted data
• Example with a 512-bit line
  – Without encryption: old data 00000000…000000000000 is overwritten by new data 10000000…000000000000, so 1 of 512 bits is modified
  – With encryption: old ciphertext 01011010…000010110100 is overwritten by new ciphertext 10101100…000100101001, so 256 of 512 bits are modified
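The effect can be reproduced with a small sketch. Real memory encryption would use AES; because Python's standard library has no AES, the hypothetical `pad` function below substitutes SHAKE-256 as a stand-in pseudorandom function. The key, line address, and counter values are made up for illustration, and the sketch assumes a counter-mode scheme in which the per-line counter advances on every write.

```python
import hashlib

LINE_BYTES = 64  # one 512-bit memory line, as in the slide's example


def pad(key: bytes, line_addr: int, counter: int) -> bytes:
    """Stand-in one-time pad (SHAKE-256 here; real hardware would use AES)."""
    seed = key + line_addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hashlib.shake_256(seed).digest(LINE_BYTES)


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length lines."""
    return bytes(x ^ y for x, y in zip(a, b))


def bit_flips(old: bytes, new: bytes) -> int:
    """Number of bit positions that differ between two lines."""
    return sum(bin(x ^ y).count("1") for x, y in zip(old, new))


key = b"demo-key"                             # hypothetical key
old_plain = bytes(LINE_BYTES)                 # 000...0
new_plain = b"\x80" + bytes(LINE_BYTES - 1)   # exactly one bit changed

# Unencrypted overwrite: only the changed bit flips.
plain_flips = bit_flips(old_plain, new_plain)

# Encrypted overwrite: the counter advances, the whole pad changes,
# and roughly half of the 512 stored bits flip.
old_cipher = xor(old_plain, pad(key, line_addr=0x40, counter=7))
new_cipher = xor(new_plain, pad(key, line_addr=0x40, counter=8))
cipher_flips = bit_flips(old_cipher, new_cipher)

print(plain_flips, cipher_flips)
```

The exact encrypted count depends on the stand-in function and inputs, but it should land near 256 of 512.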

Encryption Increases Bit Flips to NVM
[Figure, annotated "4X". Source: Young et al. "DEUCE: Write-efficient encryption for non-volatile memories", in Proc. of ASPLOS, 2015.]
• Encryption renders existing bit-level write reduction techniques ineffective

Observation and Motivation
[Figure panels: PARSEC 2.1, SPEC CPU2006]
• A large number of entire-line duplicates exist in real-world applications

DeWrite
• Lightweight cache-line-level deduplication for NVMM
  – Employ lightweight hashing
  – Leverage NVM read/write asymmetry
  – Eliminate a write at the cost of a read
• Efficient synergization between deduplication and encryption
  – Opportunistic parallelism
  – Metadata storage co-location
[Figure: Hardware Architecture. Labels: Last Level Cache, Memory Controller, Dedup Logic, Metadata Cache, AES-ctr (OTP), Non-duplicate data, Encrypted NVMM storage (Data, Metadata), CME, Direct encryption]

Prediction-based Parallelism
• The direct way: a write request → detect duplication → is it a duplicate?
  – Yes: cancel the write
  – No: encrypt the data, then write to NVM
  – Inefficient for non-duplicate writes: serial execution latency
• The parallel way: a write request → detect duplication and encrypt the data in parallel → is it a duplicate?
  – Yes: discard the ciphertext
  – No: write to NVM
  – Inefficient for duplicate writes: unnecessary encryption
• Prediction selects between the two: the direct way for writes predicted to be duplicates, the parallel way for writes predicted to be non-duplicates
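The trade-off between the two ways can be made concrete with a rough latency model. This is only a sketch: the function names are hypothetical, the 91 ns detection and 150 ns write figures are taken from elsewhere in these slides as illustrative inputs, and the 40 ns encryption latency is an assumed placeholder rather than a number from the talk.

```python
def direct_way(is_duplicate, t_detect, t_encrypt, t_write):
    """Detect first; encrypt and write only if the line is not a duplicate.

    Returns (latency_ns, wasted_encryptions).
    """
    if is_duplicate:
        return t_detect, 0                      # write cancelled
    return t_detect + t_encrypt + t_write, 0    # fully serial


def parallel_way(is_duplicate, t_detect, t_encrypt, t_write):
    """Start detection and encryption together.

    Simplification: for a duplicate, the request is considered done as soon
    as detection finishes, and the ciphertext is counted as wasted work.
    """
    if is_duplicate:
        return t_detect, 1                      # ciphertext discarded
    return max(t_detect, t_encrypt) + t_write, 0


def predicted_way(predicted_duplicate, is_duplicate, **latencies):
    """Pick the direct way when a duplicate is predicted, else the parallel way."""
    way = direct_way if predicted_duplicate else parallel_way
    return way(is_duplicate, **latencies)


# Illustrative inputs in ns; t_encrypt is an assumed value.
T = dict(t_detect=91, t_encrypt=40, t_write=150)

for actual in (False, True):
    print(actual, direct_way(actual, **T), parallel_way(actual, **T))
```

With a correct prediction, a non-duplicate write avoids the serial encryption step and a duplicate write avoids the wasted encryption.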

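Prediction-based Parallelism
• How to know beforehand whether a cache line is a duplicate?
• Observation: the duplication state of most memory writes is the same as that of the writes preceding them
• A prediction scheme: a history window records the duplication state (1: duplicate, 0: non-duplicate) of recent writes from the CPU to memory (A, B, C, D in the figure), and the state of the next write is predicted from the window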

Prediction-based Parallelism
• Prediction accuracy figures shown: 92.1% and 93.6%
• Why can we achieve such a high prediction accuracy?
• Rationale: the size of duplicate (non-duplicate) data is usually much larger than a cache line
  – E.g., when an entire page (4 KB) is duplicate or non-duplicate: 100% accuracy
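A history-window predictor of this kind can be sketched in a few lines. The slides do not spell out how the window is turned into a prediction, so the majority vote, the window size of 3, and the `DupPredictor` name below are all assumptions made for illustration.

```python
from collections import deque


class DupPredictor:
    """Predicts the next write's duplication state from a short history window."""

    def __init__(self, window: int = 3):          # window size is an assumption
        self.history = deque(maxlen=window)       # 1: duplicate, 0: non-duplicate

    def predict(self) -> bool:
        # Majority vote (assumed rule); an empty window predicts non-duplicate.
        return sum(self.history) * 2 > len(self.history)

    def update(self, was_duplicate: bool) -> None:
        self.history.append(1 if was_duplicate else 0)


def accuracy(outcomes, window: int = 3) -> float:
    """Fraction of writes whose duplication state was predicted correctly."""
    predictor = DupPredictor(window)
    correct = 0
    for outcome in outcomes:
        correct += predictor.predict() == outcome
        predictor.update(outcome)
    return correct / len(outcomes)


# A 4 KB page holds 64 cache lines of 64 bytes: one duplicate page
# followed by one non-duplicate page.
trace = [True] * 64 + [False] * 64
print(accuracy(trace))
```

In this toy trace the only mispredictions occur at the start and at the page boundary, which matches the rationale that long runs of equal duplication state make prediction easy.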

Lightweight Deduplication for NVMM
• Traditional deduplication
  – Compute a SHA-1/MD5 hash; a match means duplicate, no match means non-duplicate
  – Hash computation latency: >300 ns ≈ NVM write latency
• DeWrite
  – Compute CRC-32 (15 ns); no match means non-duplicate
  – On a CRC-32 match, read the data and compare (75 ns + 1 ns); a match means duplicate, no match means non-duplicate
  – The latency is 91 ns at most
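The CRC-32-then-compare flow can be sketched as follows. This is a simplified software model, not the hardware design: the `DedupStore` class and its fields are hypothetical, and it leaves out reference counting, overwrites of shared lines, and encryption.

```python
import zlib


class DedupStore:
    """Toy model of line-level deduplication with a weak hash plus verification."""

    def __init__(self):
        self.lines = {}      # physical slot -> stored line
        self.by_crc = {}     # CRC-32 value -> physical slots with that CRC
        self.mapping = {}    # logical address -> physical slot
        self.nvm_writes = 0  # writes that actually reached storage

    def write(self, addr: int, data: bytes) -> bool:
        """Store a line; return True if the write was eliminated as a duplicate."""
        crc = zlib.crc32(data)
        for slot in self.by_crc.get(crc, []):
            # CRC-32 can collide, so read the candidate and compare byte by byte.
            if self.lines[slot] == data:
                self.mapping[addr] = slot
                return True
        slot = len(self.lines)
        self.lines[slot] = data
        self.by_crc.setdefault(crc, []).append(slot)
        self.mapping[addr] = slot
        self.nvm_writes += 1
        return False

    def read(self, addr: int) -> bytes:
        return self.lines[self.mapping[addr]]


store = DedupStore()
line_a = b"A" * 64
line_b = b"B" * 64
first = store.write(0x00, line_a)    # new content, written
second = store.write(0x40, line_a)   # same content, write eliminated
third = store.write(0x80, line_b)    # new content, written
print(first, second, third, store.nvm_writes)
```

The byte-wise comparison is what makes a cheap hash acceptable: a CRC collision costs one extra read rather than a wrong deduplication.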

Metadata Colocation
• Encryption metadata: per-line counter
  – AES-ctr takes the key, the line address (LineAddr), and the counter and produces a one-time pad (OTP)
  – Encryption: plaintext XOR OTP gives the ciphertext; decryption: ciphertext XOR OTP gives the plaintext
• Deduplication metadata: address mapping, reverted hash
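Written out as equations, the counter-mode flow in the figure looks roughly like this. The notation is an assumption for illustration; in particular, how LineAddr and Counter are combined into the AES input is not specified on the slide.

```latex
\begin{aligned}
\mathrm{OTP} &= \mathrm{AES}_{\mathit{Key}}(\mathit{LineAddr},\ \mathit{Counter}) \\
\mathit{Ciphertext} &= \mathit{Plaintext} \oplus \mathrm{OTP} \\
\mathit{Plaintext} &= \mathit{Ciphertext} \oplus \mathrm{OTP}
\end{aligned}
```

Because the pad depends only on the key, the address, and the counter, the counter must be stored per line so that the same pad can be regenerated on a read.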
