Non-transient Side Channels. Mengjia Yan, Fall 2020, 6.888


  1. Non-transient Side Channels Mengjia Yan Fall 2020 6.888 L5-Non-transient Side Channels 1

  2. Lab Assignment
     • Handout on course website
     • Each (regular) student will receive an email
     • Solo or 2-person group
     • Individual GitHub repo
     • Info about accessing a server machine
     • Listeners can send us an email if they want to try the lab
     • Advice: Start early. The first step is not to implement the attack, but to reverse engineer the machine.

  3. Recap: Prime+Probe
     [Figure: one set (# ways) of a shared cache, with the sender line and receiver lines shown over time. Step shown: Prime.]

  4. Recap: Prime+Probe
     [Figure: same cache set. Steps shown: Prime, then Wait, during which the sender may access its line.]

  5. Recap: Prime+Probe
     [Figure: same cache set. Steps shown: Prime, Wait, Probe.]
     • Receive “1” = 8 accesses → 1 miss
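The Prime, Wait, Probe sequence in the recap can be sketched as a small simulation. This is only a model under stated assumptions: an 8-way set with true LRU replacement, a receiver that probes in the reverse of its prime order, and hypothetical names (`LRUSet`, `receive_bit`) that do not come from the slides. Real hardware reports hits and misses only through latency.

```python
from collections import OrderedDict

WAYS = 8  # assumed associativity, matching the 8-way set in the recap


class LRUSet:
    """One cache set with LRU replacement (a simplifying assumption)."""

    def __init__(self, ways=WAYS):
        self.ways = ways
        self.lines = OrderedDict()  # oldest entry first

    def access(self, line):
        """Touch a line; return True on a hit, False on a miss."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)
        else:
            if len(self.lines) >= self.ways:
                self.lines.popitem(last=False)  # evict the least recently used line
            self.lines[line] = None
        return hit


def receive_bit(cache_set, receiver_lines, sender_bit):
    """Run one Prime+Probe round; return (received bit, number of probe misses)."""
    # Prime: fill the set with the receiver's own lines.
    for line in receiver_lines:
        cache_set.access(line)
    # Wait: the sender touches its line only when it wants to send a "1".
    if sender_bit:
        cache_set.access("sender")
    # Probe: re-access the receiver's lines in reverse order and count misses.
    misses = sum(not cache_set.access(line) for line in reversed(receiver_lines))
    return (1 if misses > 0 else 0), misses


shared_set = LRUSet()
receiver_lines = [f"r{i}" for i in range(WAYS)]
sent = [1, 0, 1, 1, 0]
received = [receive_bit(shared_set, receiver_lines, bit)[0] for bit in sent]
```

In this model a "1" shows up as exactly one probe miss out of 8 accesses, and a "0" as no misses.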

  6. Analogy: Bucket/Ball
     • Each cache set is a bucket that can hold 8 balls
     • How many cache lines in total in the system?
     • How to find the bucket used by the sender?
     [Figure: the sender's address and the receiver's addresses mapping into one set (# ways) of the shared cache.]

  7. Practical Cache Side Channels

  8. Cache Mapping – Direct-Mapped Cache
     • Cache mapping can be thought of as a hash table with limited size
     • Linear cache set mapping using modular arithmetic:
       Set Index = (Addr / Block Size) % Number of Sets
     [Figure: a 32-bit physical address (bits 31-0) indexing a cache with 8 entries (index 0-7); each entry holds a tag and 64 bytes of data.]

  9. Cache Mapping – Direct-Mapped Cache
     • Cache mapping can be thought of as a hash table with limited size
     • Linear cache set mapping using modular arithmetic
     • Assuming byte-addressable memory, the 32-bit physical address splits into:
       Tag (bits 31-9, high-order bits) | Set Index (bits 8-6, 3 bits) | Line offset (bits 5-0, 6 bits)
     • The tag distinguishes addresses in the same set
     • Number of bits for set index = log2(Number of sets)
     • Question: Given a 1MB L2 with 1024 sets, how many bits are used for set index?
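The address split above can be worked through in a few lines. This is a minimal sketch assuming 64-byte lines and a power-of-two number of sets; the function names are illustrative, not from the slides.

```python
LINE_SIZE = 64  # bytes per cache line, as on the slides


def set_index_bits(num_sets):
    """log2(num_sets), valid when num_sets is a power of two."""
    return num_sets.bit_length() - 1


def split_address(addr, num_sets, line_size=LINE_SIZE):
    """Split a byte address into (tag, set index, line offset)."""
    offset = addr % line_size
    set_index = (addr // line_size) % num_sets  # Set Index = (Addr / Block Size) % Number of Sets
    tag = addr // (line_size * num_sets)        # remaining high-order bits
    return tag, set_index, offset


bits_for_8_sets = set_index_bits(8)
bits_for_1024_sets = set_index_bits(1024)
example_split = split_address(0x1234, num_sets=8)
```

With 8 sets the index takes 3 bits, as in the figure; with 1024 sets it takes log2(1024) = 10 bits.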

  10. Cache Mapping – Set Associative Cache
     • Cache mapping can be thought of as a hash table with limited size
     • Linear cache set mapping using modular arithmetic
     • 2-way cache: each set (index 0-7) holds two (tag, data) entries; the physical address splits as before into Tag (bits 31-9) | Set Index (bits 8-6, 3 bits) | Line offset (bits 5-0, 6 bits)
     • Find eviction set == find addresses with the same set index bits
     • Question: How to decide which way to use? Answer: Cache replacement policy.
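"Find addresses with the same set index bits" can be illustrated directly when the mapping is the simple modular one above. The sketch below assumes physical addresses, 64-byte lines, and no slice hashing; `same_set` and `congruent_addresses` are hypothetical helper names.

```python
def same_set(addr_a, addr_b, num_sets, line_size=64):
    """True if both addresses have the same set index."""
    return (addr_a // line_size) % num_sets == (addr_b // line_size) % num_sets


def congruent_addresses(target, count, num_sets, line_size=64):
    """Return `count` addresses that share the target's set index.

    Adding a multiple of num_sets * line_size changes only the tag bits.
    """
    stride = num_sets * line_size
    return [target + i * stride for i in range(1, count + 1)]


target = 0x1C0  # line 7 of an 8-set cache
eviction_set = congruent_addresses(target, count=2, num_sets=8)  # 2-way cache
```

For a 2-way cache, two such addresses are enough to fill the target's set.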

  11. Address Translation (4KB page)
     • Programmer’s view, virtual address (48-bit): virtual page number | page offset (bits 11-0, 12 bits)
     • System’s view, physical address (32-bit): physical page number (bits 31-12) | page offset (bits 11-0, 12 bits)
     • The page table translates the virtual page number into the physical page number; the page offset is copied unchanged

  12. Find Eviction Set Using Virtual Addresses
     • 4KB page: the virtual address (48-bit) and the physical address (32-bit) share the same page offset (bits 11-0, 12 bits)
     • Cache mapping with 8 sets: Tag | Set Index (3 bits) | Line offset (6 bits)
     • Cache mapping with 256 sets: Tag | Set Index (8 bits) | Line offset (6 bits); the top 2 bits of the set index are not controllable via the virtual address

  13. Huge Pages
     • Huge page size: 2MB or 1GB
     • Number of bits for page offset?
     • 4KB page: virtual page number | page offset (bits 11-0, 12 bits)
     • 2MB page: virtual page number | page offset (bits 20-0, 21 bits)
     • Cache mapping with 256 sets: Tag | Set Index (8 bits) | Line offset (6 bits)
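The effect of page size on which set index bits an attacker controls can be computed from the bit widths alone. A minimal sketch, assuming power-of-two sizes and 64-byte lines; the function name is illustrative.

```python
def uncontrolled_index_bits(page_size, num_sets, line_size=64):
    """Number of set-index bits that lie above the page offset.

    Bits inside the page offset are identical in the virtual and physical
    address, so only index bits above it are outside the attacker's control.
    """
    page_offset_bits = page_size.bit_length() - 1
    line_offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    return max(0, line_offset_bits + index_bits - page_offset_bits)


KB, MB = 1024, 1024 * 1024
small_cache_4k = uncontrolled_index_bits(4 * KB, num_sets=8)
large_cache_4k = uncontrolled_index_bits(4 * KB, num_sets=256)
large_cache_2m = uncontrolled_index_bits(2 * MB, num_sets=256)
```

With 4KB pages, the 256-set cache leaves 2 index bits uncontrolled; with a 2MB huge page the 21-bit page offset covers all 14 line-offset and index bits.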

  14. Multi-level Caches
     • Motivation: A memory cannot be both large and fast. Add levels of cache to reduce the miss penalty.
     [Figure: each core has its own I-L1, D-L1 and L2; the cores share the LLC.]
     A typical configuration of Intel Ivy Bridge. Configurations differ across processor types.

                               L1-I/D cache   L2 cache   L3 cache (LLC)   DRAM
     Size                      32KB           256KB      1MB/core         16GB
     Associativity (# ways)    4 or 8         8          16               N/A
     Latency (cycles)          1-5            12         ~40              ~150
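The table's sizes and associativities determine how many sets each level has. The sketch below assumes 64-byte lines (as in the earlier cache mapping slides), takes the 8-way option for L1, and treats the 1MB/core LLC figure as one unit; these are assumptions for illustration, not statements about a specific processor.

```python
LINE_SIZE = 64  # bytes, assumed from the earlier slides


def num_sets(size_bytes, ways, line_size=LINE_SIZE):
    """Number of sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_size)


KB, MB = 1024, 1024 * 1024
l1_sets = num_sets(32 * KB, ways=8)
l2_sets = num_sets(256 * KB, ways=8)
llc_sets = num_sets(1 * MB, ways=16)
```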

  15. Multi-level Caches
     • Motivation: A memory cannot be both large and fast. Add levels of cache to reduce the miss penalty.
     • LLC is generally divided into multiple slices
     • A conflict happens if addresses map to the same slice and the same set
     • Address split: Tag | Set Index | Line offset
     • Slice ID = Hash(bits), an undocumented secret hash function

  16. Eviction Set Construction Algorithm
     [Figure: sender line and receiver lines in a shared cache over time. Label on the time axis: access candidate addresses.]
     Vila et al. Theory and Practice of Finding Eviction Sets. S&P’19

  17. Eviction Set Construction Algorithm
     [Figure: same layout. Labels on the time axis: access target address; access candidate addresses; wait.]
     Vila et al. Theory and Practice of Finding Eviction Sets. S&P’19

  18. Eviction Set Construction Algorithm
     [Figure: same layout. Labels on the time axis: access target address; access candidate addresses; wait; measure latency of each candidate address.]
     Vila et al. Theory and Practice of Finding Eviction Sets. S&P’19
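One simple way to shrink a large candidate pool into a minimal eviction set is a one-at-a-time reduction, sketched here against a toy simulated cache. This is not necessarily the procedure drawn on the slides or the optimized algorithm of Vila et al.; the cache parameters, the LRU policy, and all names are assumptions, and the simulator returns hit or miss directly where a real attack would have to infer it from latency.

```python
from collections import OrderedDict

LINE, SETS, WAYS = 64, 8, 4  # toy cache parameters (assumptions)


class ToyCache:
    """Set-associative cache with LRU replacement and modular set mapping."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        line = addr // LINE
        cache_set = self.sets[line % SETS]
        hit = line in cache_set
        if hit:
            cache_set.move_to_end(line)
        else:
            if len(cache_set) >= WAYS:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = None
        return hit


def evicts(cache, candidates, target):
    """Access the target, then the candidates; a later miss on the target means it was evicted."""
    cache.access(target)
    for addr in candidates:
        cache.access(addr)
    return not cache.access(target)


def reduce_eviction_set(cache, candidates, target):
    """Drop every candidate whose removal still leaves the target evicted."""
    working = list(candidates)
    i = 0
    while i < len(working):
        trial = working[:i] + working[i + 1:]
        if evicts(cache, trial, target):
            working = trial  # this address was not needed
        else:
            i += 1           # needed for eviction; keep it
    return working


cache = ToyCache()
target = 0
candidates = [i * LINE for i in range(1, 65)]  # lines 1..64
minimal_set = reduce_eviction_set(cache, candidates, target)
```

In this model the result has exactly as many addresses as the cache has ways, all mapping to the target's set.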

  19. Problems Due to Replacement Policy
     • Self-eviction due to replacement policy
     • An LRU (least recently used) example
     [Figure: an 8-way set through four stages: Initial; Prime (lines 1-8); Victim access (line 9); Probe (which line to evict?).]
     • A small trick: access addresses in reverse order
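The self-eviction problem and the reverse-order trick can be reproduced with a tiny LRU model. This is a sketch assuming true LRU replacement in an 8-way set; real replacement policies are often only approximations of LRU, and the function name is illustrative.

```python
from collections import OrderedDict


def probe_misses(probe_order, ways=8):
    """Prime lines 1..ways, let the victim access one extra line, then probe.

    Returns the number of misses seen during the probe.
    """
    lru = OrderedDict()  # oldest entry first

    def access(line):
        hit = line in lru
        if hit:
            lru.move_to_end(line)
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict least recently used
            lru[line] = None
        return hit

    for line in range(1, ways + 1):  # Prime: 1 2 3 4 5 6 7 8
        access(line)
    access(ways + 1)                 # Victim access: line 9 evicts line 1
    return sum(not access(line) for line in probe_order)


forward_misses = probe_misses(list(range(1, 9)))      # probe 1..8
reverse_misses = probe_misses(list(range(8, 0, -1)))  # probe 8..1
```

Probing in prime order makes every access miss, because each refill evicts the line that is about to be probed next. Probing in reverse order yields a single miss.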

  20. Measure Latency of Multiple Accesses
     • HW prefetcher + out-of-order execution
     • Measurement code:
         T1 = rdtsc()
         Dummy1 = Ld(Addr1)
         ……
         Dummy8 = Ld(Addr8)
         T2 = rdtsc()
         Latency = T2 - T1
     • What we expect: Ld A1, Ld A2, ……, Ld A7, Ld A8 execute one after another
     • What actually will happen: Ld A1, Ld A2, ……, Ld A7, Ld A8 overlap in time

  21. Out-of-Order Processor
     • Pipeline: Fetch → Decode → RegRead → Execute → Writeback (Commit)
     • RegRead: check whether the register to read is ready
     [Figure: Ld A1, Ld A2, ……, Ld A7, Ld A8 overlapping in time.]
     • Question: How to serialize data accesses?

  22. Serialize Data Accesses
     • A special instruction “mfence”: https://www.felixcloutier.com/x86/mfence
     • Add data dependency by creating a linked list
       Before: Dummy1 = Ld(Addr1), where the content at A1, A2, A3, …… is a dummy value
       After: Addr2 = Ld(Addr1), where the content at each address is a pointer to the next node
     • Doubly linked list to access addresses in reverse order
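The linked-list idea can be shown structurally: each location stores the address of the next one, so the next access cannot be issued until the current one returns. The sketch below uses Python dictionaries as a stand-in for memory and hypothetical names; it illustrates only the data dependency and the forward and backward chains, not real load timing, `rdtsc`, or `mfence`.

```python
def build_chain(addresses):
    """Build forward and backward links over the given addresses.

    next_of[a] plays the role of "the pointer stored at address a".
    """
    next_of = {a: b for a, b in zip(addresses, addresses[1:])}
    prev_of = {b: a for a, b in zip(addresses, addresses[1:])}
    return next_of, prev_of


def chase(start, links):
    """Follow the chain; each step needs the result of the previous lookup."""
    visited = [start]
    addr = start
    while addr in links:
        addr = links[addr]  # Addr2 = Ld(Addr1): the next address comes from this load
        visited.append(addr)
    return visited


addresses = [0x1000, 0x3000, 0x5000, 0x7000]  # illustrative addresses
next_of, prev_of = build_chain(addresses)
forward_order = chase(addresses[0], next_of)   # prime order
reverse_order = chase(addresses[-1], prev_of)  # probe order
```

Keeping both link directions is what allows the same set of addresses to be walked in reverse during the probe.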

  23. Handle Noise
     A real-world example: Square-and-Multiply Exponentiation
     • What you generally see in papers:
         for i = n-1 to 0 do
           r = sqr(r) mod n
           if e_i == 1 then
             r = mul(r, b) mod n
           end
         end
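A runnable version of the square-and-multiply loop makes the secret-dependent behavior visible. This sketch records an operation trace ("S" for square, "M" for multiply) purely for illustration; the trace and the parameter names are additions, not part of the slide's pseudocode.

```python
def square_and_multiply(b, e, n):
    """Compute b**e mod n, scanning the exponent bits from most to least significant.

    Returns (result, ops), where ops records which operations ran.
    """
    r = 1
    ops = []
    for i in range(e.bit_length() - 1, -1, -1):
        r = (r * r) % n      # always square
        ops.append("S")
        if (e >> i) & 1:     # multiply only when exponent bit i is 1
            r = (r * b) % n
            ops.append("M")
    return r, "".join(ops)


result, ops = square_and_multiply(7, 0b1011, 13)
```

The multiply runs only for 1 bits, so observing when the multiply function executes reveals the exponent.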

  24. The Multiply Function

  25. Raw Trace
     [Figure: access latencies measured in the probe operation in Prime+Probe.]
     A sequence of “01010111011001” can be deduced as part of the exponent.
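Once probe latencies have been classified into square and multiply events (a noisy step not shown here), turning the event sequence into exponent bits is mechanical. The sketch below assumes an idealized, noise-free trace written as a string of "S" and "M"; the encoding and function names are hypothetical.

```python
def ops_from_bits(bits):
    """Idealized trace for a bit string: "SM" for a 1 bit, "S" for a 0 bit."""
    return "".join("SM" if bit == "1" else "S" for bit in bits)


def decode_ops(ops):
    """Recover exponent bits: a square followed by a multiply is 1, a lone square is 0."""
    bits = []
    i = 0
    while i < len(ops):
        if ops[i] != "S":
            raise ValueError(f"unexpected operation at position {i}: {ops[i]!r}")
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return "".join(bits)


slide_bits = "01010111011001"  # the sequence quoted on the slide
recovered = decode_ops(ops_from_bits(slide_bits))
```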

  26. There may exist other problems
     • Tips for lab assignment:
     • Build the attack step-by-step
     • Recommended reading: “Last-Level Cache Side-Channel Attacks are Practical”
     • Ask questions via Piazza

  27. Defenses

  28. Micro-architecture Side Channels
     [Figure: the victim's secret-dependent execution passes through a channel (a micro-architecture structure) to the attacker.]
     • {Cache, DRAM, TLB, NoC, etc.} X {Transient, Non-transient}
     Kiriansky et al. DAWG: a defense against cache timing attacks in speculative execution processors. MICRO’18

  29. Micro-architecture Side Channels
     [Figure: the victim's secret-dependent execution passes through a channel (a micro-architecture structure) to the attacker.]
     Defenses:
     • Block creation of signals: Oblivious execution, speculative execution defenses, etc.
     • Block detection of signals: Randomization, etc.
     • Close the channel: Isolation, etc.
     Kiriansky et al. DAWG: a defense against cache timing attacks in speculative execution processors. MICRO’18
