
Memory Defenses: The Elevation from Obscurity to Headlines - PowerPoint PPT Presentation



  1. Memory Defenses: The Elevation from Obscurity to Headlines. Rajeev Balasubramonian, School of Computing, University of Utah

  2. Image sources: pinterest, gizmodo

  3. Spectre Overview
     Victim code:
         if (x < array1_size)
             y = array2[ array1[x] ];
     x is controlled by the attacker; thanks to bpred, x can be anything.
     array1[ ] is the secret; the access pattern of array2[ ] betrays the secret.
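
The leak on slide 3 can be illustrated with a toy Python model (all names and sizes are illustrative, not a real exploit): the "cache" is just a set of touched line indices, the speculative access leaves a footprint indexed by the secret byte, and a Flush+Reload-style probe recovers it.

```python
# Toy model of the Spectre gadget: the attacker controls x; speculation
# past the bounds check touches array2[array1[x]], and the cache footprint
# of array2 betrays array1's contents.

array1 = b"UTAH"          # the secret
array1_size = 0           # any attacker x is architecturally out of bounds

def speculative_access(x, cache):
    # The branch predictor mispredicts "x < array1_size", so the body runs
    # speculatively and loads the array2 line indexed by the secret byte.
    cache.add(array1[x])  # footprint: which array2 line became hot

def probe(cache):
    # Flush+Reload style probe: "time" all 256 candidate lines; the hot
    # one is the secret byte.
    for line in range(256):
        if line in cache:
            return line

def leak(length):
    out = []
    for x in range(length):
        cache = set()             # flush
        speculative_access(x, cache)
        out.append(probe(cache))  # reload
    return bytes(out)

print(leak(len(array1)))  # → b'UTAH'
```

The model captures the essential point of the slide: the secret never flows through an architectural register the attacker can read; it flows through *which* line of array2 becomes cached.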

  4. What Did We Learn? Speculation + Specific Code + No side-channel defenses

  5. The Wake-Up Call: Say Yes to Side Channel Defenses

  6. Overview • Memory timing channels • The Fixed Service memory controller [MICRO 2015] • Memory access patterns • Near-data ORAM [HPCA 2018] • Memory integrity • Improving SGX with VAULT [ASPLOS 2018]

  7. Memory Timing Channels: two VMs sharing a processor and memory channel (diagram: VM 1, the victim, on Core 1 and VM 2, the attacker, on Core 2, both behind one memory controller)

  8. Possible Attacks (same two-VM setup). Attack 1: bits in a key influence memory accesses. Attack 2: a victim can betray secrets through memory activity. Attack 3: a covert channel attack.

  9. Covert Channel Attack: VM 1 runs a 3rd-party document reader over electronic health records; VM 2 is a conspirator; the shared memory channel forms a covert channel.

  10. Fixed Service Memory Controller: VM-1 has its data in Rank-1, VM-2 in Rank-2, ..., VM-8 in Rank-8. VMs begin memory accesses in a fixed rotation: VM-1 at cycle 0, VM-2 at cycle 7, ..., VM-8 at cycle 49, then VM-1 again at cycle 56.

  11. Fixed Service Details • Deterministic schedule • No resource contention • Dummy accesses if nothing pending • Lower bandwidth, higher latency • Why 7? DRAM timing parameters, worst-case • Rank partitioning: 7 cycle gap • Bank partitioning: 15 cycle gap • No partitioning: 43 cycle gap
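
The schedule on slides 10-11 can be sketched in a few lines of Python (assumptions: 8 VMs, one rank per VM, the 7-cycle rank-partitioned gap). Every VM's slots are fixed in advance, and a dummy access is issued when a VM has nothing pending, so the timing an attacker observes never depends on another VM's activity.

```python
# Minimal sketch of the Fixed Service memory controller's deterministic
# round-robin. Slot timing is a pure function of the cycle count, never
# of queue occupancy.
GAP = 7          # worst-case DRAM timing gap with rank partitioning
NUM_VMS = 8

def slot_owner(cycle):
    """Which VM owns the service slot beginning at this cycle."""
    return (cycle // GAP) % NUM_VMS

def service(cycle, pending):
    vm = slot_owner(cycle)
    if pending[vm]:
        return (vm, pending[vm].pop(0))   # real access
    return (vm, "dummy")                  # dummy access hides idleness

# VM-1 (index 0) gets cycles 0, 56, 112, ... regardless of load.
starts = [c for c in range(0, 120, GAP) if slot_owner(c) == 0]
print(starts)                             # [0, 56, 112]

pending = {vm: [] for vm in range(NUM_VMS)}
pending[0].append("read A")
print(service(0, pending))                # (0, 'read A')
print(service(7, pending))                # (1, 'dummy')
```

Note the trade-off stated on the slide: the fixed rotation plus dummy traffic buys non-interference at the cost of bandwidth and latency.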

  12. Overcoming Worst-Case • In one batch of requests, schedule all reads, followed by all writes (the worst-case read-to-write turnaround is encountered once per batch) • Impose constraints on the banks that can be accessed: triple bank alternation, with Red banks (bank-id mod 3 = 0), Blue banks (bank-id mod 3 = 1), and Green banks (bank-id mod 3 = 2) taking 15-cycle turns, so back-to-back accesses to the same group are 3 x 15 = 45 cycles apart, which exceeds the 43-cycle unpartitioned worst case.
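
The arithmetic behind slide 12's triple alternation can be checked directly (constants taken from slides 11-12):

```python
# With banks grouped by bank_id % 3 and a 15-cycle slot per group,
# two accesses to the same bank group are at least 3 * 15 = 45 cycles
# apart, clearing the 43-cycle same-bank worst-case turnaround.
SLOT = 15            # bank-partitioned gap per scheduling slot
GROUPS = 3           # red/blue/green = bank_id % 3
WORST_CASE = 43      # unpartitioned worst-case DRAM gap

def group(bank_id):
    return bank_id % GROUPS

same_group_gap = GROUPS * SLOT
print(same_group_gap, same_group_gap > WORST_CASE)  # 45 True
```

This is why alternation lets the controller run with the cheaper 15-cycle bank-partitioned slot instead of always budgeting the 43-cycle worst case.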

  13. Results (chart; performance normalized to a non-secure baseline of 1.0, bars grouped by no/bank/rank partitioning): FS 0.74; FS with RD/WR reordering 0.48; FS with triple alternation 0.43-0.40; temporal partitioning (TP) 0.20, with increased OS complexity.

  14. Overview • Memory timing channels • The Fixed Service memory controller [MICRO 2015] • Memory access patterns • Near-data ORAM [HPCA 2018] • Memory integrity • Improving SGX with VAULT [ASPLOS 2018]

  15. Oblivious RAM • Assumes that addresses are exposed • PHANTOM [CCS’13]: Memory bandwidth overhead of … (Image sources: vice.com)

  16. Oblivious RAM • Assumes that addresses are exposed • PHANTOM [CCS’13]: Memory bandwidth overhead of … 2560x (about 280x today)

  17. Path-ORAM (diagram: a binary tree of buckets in memory, plus a stash held on the processor)
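
Slide 17 only shows the tree-and-stash picture, so here is a minimal Path-ORAM sketch in Python: every access reads one whole root-to-leaf path into the stash, remaps the block to a fresh random leaf, and writes the path back, so the address trace is independent of which block is requested. Depth and bucket size are illustrative, and real designs add encryption and a fixed-size stash.

```python
import random

DEPTH, Z = 3, 2                     # tree depth, blocks per bucket
LEAVES = 2 ** DEPTH

def path(leaf):
    """Bucket indices from the root (1) down to the given leaf."""
    node = LEAVES + leaf            # heap-style indexing
    nodes = []
    while node >= 1:
        nodes.append(node)
        node //= 2
    return list(reversed(nodes))

tree = {}                           # bucket index -> {block_id: value}
posmap = {}                         # block_id -> leaf
stash = {}                          # block_id -> value (on-chip)

def access(block_id, new_value=None):
    leaf = posmap.get(block_id, random.randrange(LEAVES))
    # 1. Read the whole path into the stash (pattern independent of block_id).
    for b in path(leaf):
        stash.update(tree.pop(b, {}))
    # 2. Serve the request and remap the block to a fresh random leaf.
    if new_value is not None:
        stash[block_id] = new_value
    value = stash.get(block_id)
    posmap[block_id] = random.randrange(LEAVES)
    # 3. Write back: push each stash block as deep as its new leaf allows.
    for b in reversed(path(leaf)):
        bucket = {}
        for bid in list(stash):
            if len(bucket) < Z and b in path(posmap[bid]):
                bucket[bid] = stash.pop(bid)
        tree[b] = bucket
    return value

access("k", new_value="secret")
print(access("k"))  # → 'secret'
```

Each access moves an entire path (DEPTH+1 buckets of Z blocks), which is exactly where PHANTOM's large bandwidth overhead on slide 16 comes from.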

  18. A Distributed ORAM • ORAM operations shift from the processor to an authenticated buffer chip on each SDIMM • The ORAM traffic pattern shifts from the (exposed) memory bus to on-SDIMM “private” buses • Buffer chip and processor communication is encrypted

  19. The Independent ORAM Protocol 1. Each SDIMM handles a subtree of the ORAM tree. 2. The only traffic on the shared memory channel: CPU requests and leaf-id re-assignments. 3. As much parallelism as the number of SDIMMs.
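
A sketch of the routing implied by slide 19, with assumed parameters (4 SDIMMs, a 1024-leaf tree): each SDIMM owns one subtree, so the top bits of a block's leaf-id select the SDIMM that serves the entire path locally; only the request and leaf-id re-assignment cross the shared channel.

```python
# Independent ORAM protocol sketch: leaf-id high bits pick the owning
# SDIMM/subtree. Requests mapping to different SDIMMs can proceed in
# parallel, giving as much parallelism as there are SDIMMs.
NUM_SDIMMS = 4                      # assumed
LEAF_BITS = 10                      # assumed: 1024 leaves in the ORAM tree
SELECT_BITS = 2                     # log2(NUM_SDIMMS)

def sdimm_for_leaf(leaf):
    return leaf >> (LEAF_BITS - SELECT_BITS)

# Requests to distinct SDIMMs are serviceable in parallel.
def parallel_batches(leaves):
    batches, seen = [[]], set()
    for leaf in leaves:
        d = sdimm_for_leaf(leaf)
        if d in seen:               # same SDIMM -> must serialize
            batches.append([])
            seen = set()
        seen.add(d)
        batches[-1].append(leaf)
    return batches

print(sdimm_for_leaf(0), sdimm_for_leaf(1023))   # 0 3
print(parallel_batches([0, 300, 600, 900]))      # one batch, all 4 SDIMMs
```

This is the protocol's parallelism argument in miniature: four requests landing on four different subtrees complete in one "round".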

  20. The Split ORAM Protocol 1. Each SDIMM handles a subset of every node. 2. Only metadata is sent to the processor. 3. The processor tells the SDIMMs how to shuffle data. 4. Lower latency per ORAM request, but lower parallelism as well.

  21. ORAM Results Summary • Can combine the Independent and Split protocols to find the best balance of latency and parallelism • Bandwidth demands are reduced from 280x to 35x • Execution time overheads drop from 5.2x to 2.7x • Reduces memory energy by 2.5x

  22. Overview • Memory timing channels • The Fixed Service memory controller [MICRO 2015] • Memory access patterns • Near-data ORAM [HPCA 2018] • Memory integrity • Improving SGX with VAULT [ASPLOS 2018]

  23. Intel SGX Basics 1. Enclave data is protected from a malicious OS/operator. 2. A per-block integrity tree protects the EPC (96 MB). 3. A per-page integrity tree protects non-EPC sensitive memory. 4. This keeps the overheads (bandwidth and capacity) of the integrity tree low. 5. Entails frequent paging between EPC and non-EPC. (Diagram: Enclaves 1..N mapping into EPC, non-EPC sensitive, and non-EPC non-sensitive memory.)

  24. Intel SGX Basics (contd.) VAULT: Unify EPC and non-EPC to reduce paging. A new integrity tree for low bandwidth. Better metadata for capacity.

  25. SGX Overheads (chart)

  26. Bonsai Merkle Tree (diagram) • Root block held in the processor. • Intermediate levels: 512-bit blocks of eight 64-bit hashes (arity = 8). • Leaf hashes cover counter blocks: one shared 64-bit global counter plus 64 local 7-bit counters per 512-bit block (arity = 64). • Each data block is protected by a MAC over the data and its counter.
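
The arithmetic behind slide 26's diagram can be checked in a few lines (the 1 GB protected-memory figure is an illustrative assumption): a 512-bit block holds eight 64-bit hashes, while a split-counter block holds one shared 64-bit counter plus 64 local 7-bit counters, and the higher arity at the counter level makes the tree shallower.

```python
import math

BLOCK_BITS = 512

hash_arity = BLOCK_BITS // 64            # 8 hashes per 512-bit block
locals_per_block = 64                    # 7-bit local counters
assert 64 + locals_per_block * 7 == BLOCK_BITS   # shared + locals fill the block

def tree_depth(data_blocks, leaf_arity, inner_arity):
    """Levels above the data needed to reach a single root block."""
    nodes = math.ceil(data_blocks / leaf_arity)   # counter (leaf) level
    depth = 1
    while nodes > 1:
        nodes = math.ceil(nodes / inner_arity)
        depth += 1
    return depth

data = 2**24                             # 16M 64-byte blocks = 1 GB (assumed)
print(tree_depth(data, 64, 8), tree_depth(data, 8, 8))  # → 7 8
```

Every level shaved off the tree is one less metadata access per integrity verification, which is the bandwidth argument the next slides build on.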

  27. VAULT 1. Small linkage counters → high arity, a compact/shallow tree, better cacheability. 2. Variable counter width to manage overflow. 3. Reduces bandwidth overhead for integrity verification.
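
A sketch of the split-counter overflow behavior behind slide 27 (the mechanics here are my assumption of the standard shared+local scheme, not VAULT's exact design): each counter block keeps one shared counter and many small local counters; a write bumps a local counter, and on overflow the shared counter increments, all locals reset, and every covered block must be re-encrypted. That re-encryption cost is what variable counter widths are managing.

```python
LOCAL_BITS = 7
LOCAL_MAX = (1 << LOCAL_BITS) - 1        # 127

class CounterBlock:
    def __init__(self, arity=64):
        self.shared = 0
        self.local = [0] * arity
        self.reencryptions = 0           # blocks re-encrypted on overflow

    def bump(self, i):
        """Record a write to covered block i."""
        if self.local[i] == LOCAL_MAX:
            self.shared += 1                       # one shared bump...
            self.reencryptions += len(self.local)  # ...re-encrypts all covered blocks
            self.local = [0] * len(self.local)
        self.local[i] += 1

    def version(self, i):
        # The effective per-block counter is the (shared, local) pair.
        return (self.shared, self.local[i])

cb = CounterBlock()
for _ in range(130):                     # 130 writes to block 0
    cb.bump(0)
print(cb.version(0))                     # → (1, 3)
```

Small locals give the high arity (and shallow tree) of the previous slide; wider counters for hot blocks avoid paying the overflow penalty too often.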

  28. VAULT+SMC 1. MAC storage and bandwidth overheads are high. 2. Sharing a MAC among 4 blocks reduces storage, but increases bandwidth. 3. Instead, a block is compressed and the MAC is embedded in the block, which reduces both bandwidth and storage.
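
The embedding idea on slide 28 can be sketched as follows (details assumed; zlib and a truncated SHA-256 stand in for the real compression and MAC hardware): if a 64-byte block compresses enough to free 8 bytes, the MAC rides inside the block and data plus MAC arrive in one access; otherwise the design falls back to a separate (possibly shared) MAC.

```python
import hashlib
import zlib

BLOCK = 64           # cache-line-sized memory block, in bytes
MAC_BYTES = 8

def mac(data: bytes) -> bytes:
    # Truncated hash standing in for a keyed MAC.
    return hashlib.sha256(data).digest()[:MAC_BYTES]

def store(data: bytes):
    comp = zlib.compress(data)
    if len(comp) + MAC_BYTES <= BLOCK:
        # Compressed data + embedded MAC fit in one block:
        # integrity check needs no extra memory access.
        return ("embedded", comp + mac(data))
    # Incompressible block: MAC lives in a separate MAC block.
    return ("separate", data)

compressible = b"A" * BLOCK
random_like = bytes(range(BLOCK))        # no redundancy to exploit
print(store(compressible)[0], store(random_like)[0])
```

The common case (most blocks compress) gets both the storage and bandwidth savings; only incompressible blocks pay for an extra MAC access.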

  29. Integrity Results Summary • 3.7x performance improvement over SGX – primarily because of lower paging overheads • A large effective EPC is palatable – 4.7% storage overhead and a more scalable tree (34% better than the SGX tree) with VAULT+SMC

  30. Big Finish • Memory defenses were purely academic pursuits • Integrity trees are now a part of Intel SGX: overheads of 2x – 40x • VAULT improves integrity overhead to 1.5x – 2.5x • FS eliminates timing channels with an overhead of 2x • SDIMM improves ORAM overhead to 2.7x • An array of memory defenses is now commercially viable … and strategic given latent vulnerabilities. Acks: Ali Shafiee, Meysam Taassori, Akhila Gundu, Manju Shevgoor, Mohit Tiwari, Feifei Li, NSF, Intel.
