LIBMPK: SOFTWARE ABSTRACTION FOR INTEL MEMORY PROTECTION KEYS (INTEL MPK)
Soyeon Park, Sangho Lee, Wen Xu, Hyungon Moon, and Taesoo Kim

INTRODUCTION
SECURITY-CRITICAL MEMORY REGIONS NEED PROTECTION
▸ JIT page
  “To achieve code execution, we can simply locate one of these RWX JIT pages and overwrite it with our own shellcode.” - [1]
▸ Personal information
▸ Password
▸ Private key
  “We confirmed that all individuals used only the Heartbleed exploit to obtain the private key.” - [2]
[1] Amy Burnett, et al. “Weaponization of a JavaScriptCore Vulnerability”, RET2 Systems Engineering Blog
[2] Nick Sullivan, “The Results of the CloudFlare Challenge”, CloudFlare Blog

INTRODUCTION
EXAMPLE 1 - HEARTBLEED ATTACK
[Figure: a Heartbleed request asks the web server to reply “HELLO” (1000 bytes). The reply carries 1000 bytes of server memory, so the leaked data includes the private key.]

INTRODUCTION
EXAMPLE 1 : EXISTING SOLUTION TO PROTECT MEMORY
▸ Process separation
[Figure: two processes, each with its own memory.]
[1] Chengyu Song, et al. “Exploiting and Protecting Dynamic Code Generation”, NDSS 2015
[2] James Litton, et al. “Light-Weight Contexts: An OS Abstraction for Safety and Performance”, OSDI 2016

INTRODUCTION
EXAMPLE 2 - EXISTING SOLUTION TO PROTECT JIT PAGE
▸ JIT page W^X protection
[Figure: a process calls mprotect(W) on the code cache, writes code into it, calls mprotect(RX), and then executes the code.]

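The mprotect-based W^X sequence on this slide can be sketched in C roughly as follows. This is a minimal illustration, not code from any particular JIT engine: the helper names `jit_cache_create` and `jit_emit` are hypothetical, and the sketch assumes a Linux system that permits anonymous executable mappings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate a page-aligned code cache, initially RX. */
static void *jit_cache_create(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: copy generated code into the cache through a short
 * write window. Returns 0 on success, -1 if an mprotect() call fails. */
static int jit_emit(void *cache, size_t cache_len,
                    const void *code, size_t len)
{
    assert(len <= cache_len);
    /* Open the write window. The page-table change is visible to every
     * thread in the process, not only to the one emitting code. */
    if (mprotect(cache, cache_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(cache, code, len);
    /* Close the window: the cache is never writable and executable at once. */
    if (mprotect(cache, cache_len, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return 0;
}
```

Each call to `jit_emit` costs two system calls, and while the write window is open any other thread in the process can also write to the cache.
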
INTRODUCTION
PROBLEMS OF EXISTING SOLUTIONS
▸ Process separation
  High overhead to spawn a new process and synchronize data
▸ W^X protection
  Repeated cost to change the permissions of multiple pages
  Race condition due to permission synchronization
This talk: using a hardware mechanism, Intel Memory Protection Keys (MPK), to address these challenges

OUTLINE
▸ Introduction
▸ Intel MPK Explained
▸ Challenges
▸ Design
▸ Implementation
▸ Evaluation
▸ Discussion
▸ Related Work
▸ Conclusion

INTEL MPK EXPLAINED
OVERVIEW
▸ Supports fast permission change for page groups with a single instruction
  ▸ Fast single invocation
  ▸ Fast permission change for multiple pages
[Figure: latency (ms) vs. number of pages (1,000 to 36,000) for mprotect (contiguous), mprotect (sparse), and Intel MPK. mprotect goes through the kernel; Intel MPK stays in userspace.]

INTEL MPK EXPLAINED
UNDERLYING IMPLEMENTATION
▸ Permissions are per CPU
▸ The 32-bit PKRU register contains keys/perm for 16 pkeys
▸ WRPKRU: write key/perm
▸ RDPKRU: read key/perm
▸ pkey_mprotect (kernel): associates pages with a pkey in the page table
[Figure: the page table maps page 120 to pkey 2 with permission R/W. Changing pkey 2 from R/W to R in PKRU changes the effective permission of page 120 from R/W to R.]

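As a rough illustration of how 16 keys fit in a 32-bit register, the sketch below models PKRU as two bits per key: an access-disable bit at position 2i and a write-disable bit at position 2i+1 for key i. That layout comes from Intel's architecture documentation rather than from the slide, and the helper names `pkru_get` and `pkru_set` are hypothetical. The functions only compute values; they do not execute RDPKRU or WRPKRU.

```c
#include <assert.h>
#include <stdint.h>

#define PKRU_AD 0x1u  /* access-disable: blocks data reads and writes */
#define PKRU_WD 0x2u  /* write-disable: blocks data writes only */

/* Return the 2-bit rights field of `pkey` (0..15) in a PKRU value. */
static uint32_t pkru_get(uint32_t pkru, unsigned pkey)
{
    assert(pkey < 16);
    return (pkru >> (2 * pkey)) & 0x3u;
}

/* Return a PKRU value whose rights field for `pkey` is replaced by `bits`. */
static uint32_t pkru_set(uint32_t pkru, unsigned pkey, uint32_t bits)
{
    assert(pkey < 16 && bits <= 0x3u);
    uint32_t shift = 2 * pkey;
    return (pkru & ~(0x3u << shift)) | (bits << shift);
}
```

In this model, the slide's change of pkey 2 from R/W to R corresponds to `pkru_set(pkru, 2, PKRU_WD)`.
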
INTEL MPK EXPLAINED
EXAMPLE - JIT PAGE W^X PROTECTION
function init()
    pkey = pkey_alloc()
    pkey_mprotect(code_cache, len, RWX, pkey)
function JIT()
    WRPKRU(pkey, W)          (grant permission)
    ...
    write code cache
    ...
    WRPKRU(pkey, R)          (revoke permission)
function fini()
    pkey_free(pkey)
[Figure: the code cache pages are RWX in the page table and tagged with pkey 1. The PKRU entry for pkey 1 is W while code is written and R afterwards.]

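The pseudocode above maps fairly directly onto the protection-key wrappers that glibc (2.27 and later) exposes on Linux. The sketch below is one possible rendering under that assumption. The function name `pkey_jit_demo` is hypothetical, `pkey_set` stands in for the slide's WRPKRU step, and the function reports -1 instead of failing hard when the CPU or kernel has no MPK support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical demo: write `len` bytes into a pkey-protected code cache.
 * Returns 0 on success, -1 if MPK is unavailable or any call fails. */
static int pkey_jit_demo(size_t cache_len, const void *code, size_t len)
{
    assert(code != NULL);
    if (len > cache_len)
        return -1;
    void *cache = mmap(NULL, cache_len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cache == MAP_FAILED)
        return -1;

    int rc = -1;
    /* init(): allocate a key that starts write-disabled for this thread,
     * then tag the pages with it. The page-table permission stays RWX. */
    int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    if (pkey >= 0 &&
        pkey_mprotect(cache, cache_len,
                      PROT_READ | PROT_WRITE | PROT_EXEC, pkey) == 0 &&
        pkey_set(pkey, 0) == 0) {              /* JIT(): grant write */
        memcpy(cache, code, len);
        /* JIT(): revoke write again for this thread. */
        rc = pkey_set(pkey, PKEY_DISABLE_WRITE) == 0 ? 0 : -1;
    }

    /* fini(): unmap before freeing, so no page still carries the key. */
    munmap(cache, cache_len);
    if (pkey >= 0)
        pkey_free(pkey);
    return rc;
}
```

Granting and revoking write access here touches only the calling thread's PKRU value, so no system call is needed inside the write window.
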
INTEL MPK EXPLAINED
EXAMPLE : EXECUTABLE-ONLY MEMORY
function init()
    pkey = pkey_alloc()
    pkey_mprotect(code_cache, len, RWX, pkey)
function JIT()
    WRPKRU(pkey, W)
    ...
    write code cache
    ...
    WRPKRU(pkey, None)
function fini()
    pkey_free(pkey)
▸ Revoking with WRPKRU(pkey, None) instead of WRPKRU(pkey, R) leaves the RWX code cache executable but neither readable nor writable
[Figure: the code cache pages stay RWX in the page table while the pkey's permission is None.]

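Why a key permission of None yields execute-only memory can be seen in a small model of the access check. The sketch assumes the documented hardware behaviour that protection keys gate data reads and writes but not instruction fetches. The names `access_allowed`, `PAGE_R`, `PAGE_W`, and `PAGE_X` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum access_kind { ACC_READ, ACC_WRITE, ACC_EXEC };

#define PAGE_R 0x1u
#define PAGE_W 0x2u
#define PAGE_X 0x4u

/* Model of a user-mode access check: data accesses need both the
 * page-table bit and the per-thread key rights; instruction fetches
 * consult the page-table bit only. */
static bool access_allowed(uint32_t page_perm, uint32_t pkru,
                           unsigned pkey, enum access_kind kind)
{
    assert(pkey < 16);
    bool ad = ((pkru >> (2 * pkey)) & 0x1u) != 0;      /* access disabled */
    bool wd = ((pkru >> (2 * pkey + 1)) & 0x1u) != 0;  /* write disabled  */
    switch (kind) {
    case ACC_READ:  return (page_perm & PAGE_R) != 0 && !ad;
    case ACC_WRITE: return (page_perm & PAGE_W) != 0 && !ad && !wd;
    case ACC_EXEC:  return (page_perm & PAGE_X) != 0;
    }
    return false;
}
```

With an RWX page and both disable bits set for its key, only `ACC_EXEC` passes the check, which is the execute-only state the slide describes.
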
OUTLINE
▸ Introduction
▸ Intel MPK Explained
▸ Challenges
  ▸ Non-scalable Hardware Resource
  ▸ Asynchronous Permission Change
▸ Design
▸ Implementation
▸ Evaluation
▸ Discussion
▸ Related Work
▸ Conclusion

CHALLENGES
NON-SCALABLE HARDWARE RESOURCE
▸ Only 16 keys are provided
[Figure: a process protects code caches 1 through 16 with pkeys 1 through 16, switching each between W and R. No pkey is left for code cache 17.]

CHALLENGES
ASYNCHRONOUS PERMISSION CHANGE - PROS
▸ Permission change with MPK is intrinsically per-thread
[Figure: threads in one process with per-thread permissions on the code cache. The thread that holds W on the pkey writes to the code cache while the other threads keep RX.]

CHALLENGES
ASYNCHRONOUS PERMISSION CHANGE - CONS
▸ Permission synchronization is necessary in some contexts
[Figure: one thread sets the pkey to None, so the code cache is execute-only (X) for that thread, but the other threads still hold RX and can read the code cache.]

DESIGN
REVISIT : CHALLENGES
▸ Non-scalable hardware resources
  Solved by key virtualization through key indirection
▸ Asynchronous permission change
  libmpk provides a permission synchronization API

DESIGN
KEY VIRTUALIZATION
▸ Decouples physical keys from the user interface
▸ Key indirection works like a cache
[Figure: the application uses virtual keys vkey 1 through vkey 17. The library maps them onto hardware keys pkey 1 through pkey 16 and evicts an existing mapping to serve vkey 17.]

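The cache-like key indirection can be illustrated with a small table that maps virtual keys onto a fixed pool of hardware keys. The sketch below is not libmpk's actual code: the names are hypothetical, least-recently-used eviction is an assumed policy that the slide does not specify, and the pool follows the slide's numbering of pkey 1 through pkey 16 (on Linux, pkey 0 is the default key, so fewer keys are available to allocate in practice).

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PKEYS 16  /* follows the slide's numbering: pkey 1..16 */

struct key_cache {
    int      vkey[NUM_PKEYS];     /* virtual key held by each slot, -1 if free */
    uint64_t last_use[NUM_PKEYS]; /* logical time of the most recent use */
    uint64_t clock;
    unsigned evictions;
};

static void key_cache_init(struct key_cache *c)
{
    for (int i = 0; i < NUM_PKEYS; i++) {
        c->vkey[i] = -1;
        c->last_use[i] = 0;
    }
    c->clock = 0;
    c->evictions = 0;
}

/* Return the hardware key (1..NUM_PKEYS) backing `vkey`. On a miss, take a
 * free slot if one exists, otherwise evict the least recently used mapping. */
static int key_cache_lookup(struct key_cache *c, int vkey)
{
    assert(vkey >= 0);
    int victim = 0;
    c->clock++;
    for (int i = 0; i < NUM_PKEYS; i++) {
        if (c->vkey[i] == vkey) {          /* hit: refresh and reuse */
            c->last_use[i] = c->clock;
            return i + 1;
        }
        if (c->last_use[i] < c->last_use[victim])
            victim = i;                    /* oldest (or free) slot so far */
    }
    if (c->vkey[victim] != -1)
        c->evictions++;  /* a real library would re-tag the victim's pages here */
    c->vkey[victim] = vkey;
    c->last_use[victim] = c->clock;
    return victim + 1;
}
```

A hit only updates bookkeeping, while a miss with a full pool pays for an eviction, which is why the hit rate matters for latency.
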
DESIGN
INTER-THREAD PERMISSION SYNCHRONIZATION
➊ call mpk_mprotect() (userspace)
➋ add hooks (kernel: task_work, pkey_sync)
➌ interrupt
➍ return
➎ update PKRU with WRPKRU (rescheduled)
[Figure: Thread A (running) and Thread B (running or sleeping) both change from RX to X.]

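One way to read the five steps is as a lazy update: every thread gets a pending hook, running threads are interrupted so the hook runs at once, and sleeping threads apply it when they are next scheduled. The single-threaded simulation below sketches that reading only. All names are hypothetical and nothing here is taken from the libmpk kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum thread_state { RUNNING, SLEEPING };

struct thread {
    enum thread_state state;
    uint32_t pkru;          /* simulated per-thread PKRU value */
    bool     hook_pending;  /* stand-in for a queued task_work item */
    uint32_t hook_pkru;     /* PKRU value the hook will install */
};

/* Run a queued hook, as a thread would on its way back to userspace. */
static void run_hook(struct thread *t)
{
    if (t->hook_pending) {
        t->pkru = t->hook_pkru;
        t->hook_pending = false;
    }
}

/* Steps 1 to 4: set the 2-bit rights field of `pkey` for every thread. */
static void sync_rights(struct thread *ts, size_t n,
                        unsigned pkey, uint32_t bits)
{
    assert(pkey < 16 && bits <= 0x3u);
    for (size_t i = 0; i < n; i++) {
        uint32_t base = ts[i].hook_pending ? ts[i].hook_pkru : ts[i].pkru;
        ts[i].hook_pkru = (base & ~(0x3u << (2 * pkey))) | (bits << (2 * pkey));
        ts[i].hook_pending = true;      /* step 2: add hook */
        if (ts[i].state == RUNNING)
            run_hook(&ts[i]);           /* step 3: interrupt a running thread */
    }
}

/* Step 5: a sleeping thread applies the hook when it is rescheduled. */
static void reschedule(struct thread *t)
{
    t->state = RUNNING;
    run_hook(t);
}
```

In this model a sleeping thread keeps a stale PKRU value for a while, which is harmless because it cannot touch memory until it runs again, and by then the hook has been applied.
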
IMPLEMENTATION
▸ libmpk is written in C/C++
▸ Userspace library: 663 LoC
▸ Kernel support: 1K LoC
▸ Permission synchronization
▸ Kernel module for managing metadata
▸ Userspace cannot fabricate metadata
▸ Open source at https://github.com/sslab-gatech/libmpk

IMPLEMENTATION
USE CASE - JIT PAGE W^X PROTECTION
function init()
    vkey = libmpk_mmap(&code_cache, len, RWX)      (key virtualization)
function JIT()
    libmpk_begin(vkey, W)
    ...
    write code cache
    ...
    libmpk_end(vkey)
    libmpk_mprotect(vkey, X)                       (permission synchronization)
[Figure: the code cache is RWX while code is written. After libmpk_mprotect(vkey, X), every thread sees it as X.]

OUTLINE
▸ Introduction
▸ Intel MPK Explained
▸ Challenges
▸ Design
▸ Implementation
▸ Evaluation
  ▸ Usability
  ▸ Overhead introduced by the design
  ▸ Use cases: applying libmpk for memory isolation and protection
▸ Discussion
▸ Related Work
▸ Conclusion

EVALUATION
LIBMPK IS EASY TO ADOPT
▸ OpenSSL (83 LoC): protecting private key
▸ Memcached (117 LoC): protecting slabs
▸ ChakraCore (10 LoC): protecting JIT pages

EVALUATION
LATENCY - KEY VIRTUALIZATION
▸ A cache miss incurs overhead due to eviction
[Figure: time (μs, 0.0 to 3.0) vs. hit rate (0 to 100), with levels marked for Miss, Hit, and mprotect.]
Reasonable overhead while providing similar functionality.

EVALUATION
LATENCY - INTER-THREAD PERMISSION SYNCHRONIZATION
▸ Performance
  ▸ 1,000 pages: 3.8x
  ▸ Single page: 1.7x
[Figure: latency (μs, 0 to 40) vs. number of threads (1 to 40) for mpk_mprotect, mprotect (4KB), and mprotect (4000KB).]
libmpk outperforms mprotect regardless of the number of pages.

EVALUATION
FAST MEMORY ISOLATION - OPENSSL & MEMCACHED
▸ OpenSSL
  ▸ request/sec: 0.53%
▸ For 1GB protection
  ▸ original vs mpk_inthread: 0.01% slowdown
  ▸ mpk_synch vs mprotect: 8.1x
[Figure: request/sec vs. size of each request (KB, 1 to 1024) for original and libmpk; Kbyte/sec vs. number of connections (250 to 1000) for original, mpk_inthread, mpk_synch, and mprotect.]

EVALUATION
FAST AND SECURE W ⊕ X - JIT COMPILATION
▸ ChakraCore
  ▸ mprotect-based protection allows a race-condition attack
  ▸ 4.39% performance improvement (31.11% at most)
[Figure: normalized score (0.70 to 1.30) of mprotect and libmpk on RICHARDS, DELTABLUE, CRYPTO, RAYTRACE, EARLEYBOYER, REGEXP, SPLAY, SPLAYLATENCY, NAVIERSTOKES, PDFJS, MANDREEL, MANDREELLATENCY, GAMEBOY, CODELOAD, BOX2D, ZLIB, and TYPESCRIPT.]

DISCUSSION
▸ Rogue data cache load (Meltdown)
  ▸ MPK is also affected by the Meltdown attack
  ▸ Requires hardware- or software-level mitigation
▸ Code reuse attack
  ▸ An arbitrarily executed WRPKRU may break the protection
  ▸ Mitigated by applying sandboxing or control-flow integrity
▸ Protection key use-after-free
  ▸ pkey_free does not completely free the protection key
  ▸ Pages are still associated with the pkey after the free

RELATED WORK
▸ ERIM [1]: secure wrapper of MPK
▸ Shadow Stack [2]: shadow stack protected by MPK
▸ XOM-Switch [3]: code-reuse attack prevention with execute-only memory supported by MPK
[1] Anjo Vahldiek-Oberwagner, et al. “ERIM: Secure, Efficient In-Process Isolation with Memory Protection Keys”, Security 2019
[2] Nathan Burow, et al. “Shining Light on Shadow Stacks”, Oakland 2019
[3] Mingwei Zhang, et al. “XOM-Switch: Hiding Your Code From Advanced Code Reuse Attacks in One Shot”, Black Hat Asia 2018

CONCLUSION
▸ libmpk is a secure, scalable, and synchronizable abstraction of MPK that supports fast memory protection and isolation with little effort.
THANKS!
https://github.com/sslab-gatech/libmpk