Find Eviction Set Using Virtual Addresses
• Virtual address (48-bit): virtual page number (bits 47–12) | page offset (bits 11–0)
• Physical address (32-bit): physical page number (bits 31–12) | page offset (bits 11–0)
• With 4KB pages the page offset is 12 bits, and address translation leaves it unchanged.
• Cache mapping with 8 sets: tag | index (3 bits) | line offset (6 bits). The index (bits 8–6) lies inside the page offset, so it can be controlled via the virtual address.
• Cache mapping with 256 sets: tag | set index (8 bits) | line offset (6 bits). The top 2 bits of the set index (bits 13–12) come from the physical page number, so they are not controllable via the virtual address.
6.888 L5-Non-transient Side Channels 12
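The bit arithmetic on this slide can be checked with a minimal Python sketch. It assumes 64-byte lines and power-of-two set counts, as on the slide, and the helper names are hypothetical. The sketch counts how many set-index bits fall above the 12-bit page offset. Those are the bits an attacker cannot choose through a virtual address.

```python
def set_index_bits(num_sets, line_bytes=64):
    """Return (lowest, highest) bit position of the set-index field, inclusive."""
    offset_bits = line_bytes.bit_length() - 1      # 64-byte line -> 6 offset bits
    index_bits = num_sets.bit_length() - 1         # 256 sets -> 8 index bits
    return offset_bits, offset_bits + index_bits - 1

def uncontrolled_index_bits(num_sets, page_bytes=4096, line_bytes=64):
    """Set-index bits that lie in the page number, i.e. chosen by the OS, not the attacker."""
    page_offset_bits = page_bytes.bit_length() - 1  # 4KB page -> 12 offset bits
    _, highest = set_index_bits(num_sets, line_bytes)
    return max(0, highest - page_offset_bits + 1)

def set_index(addr, num_sets, line_bytes=64):
    """Set index of a (physical) address for a power-of-two cache geometry."""
    return (addr // line_bytes) % num_sets

# 8 sets: index = bits 8-6, all inside the page offset.
print(set_index_bits(8), uncontrolled_index_bits(8))        # (6, 8) 0
# 256 sets: index = bits 13-6; bits 13-12 come from the physical page number.
print(set_index_bits(256), uncontrolled_index_bits(256))    # (6, 13) 2
```

Consider two physical addresses with the same page offset, 0x1040 and 0x2040. Both land in set 1 of the 8-set cache. In the 256-set cache they land in different sets (65 and 129), because they differ in bits 12 and 13.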
Huge Pages
• Huge page size: 2MB or 1GB
• Number of bits for page offset?
  • 4KB page: 12 bits (virtual address = virtual page number, bits 47–12 | page offset, bits 11–0)
  • 2MB page: 21 bits (virtual address = virtual page number, bits 47–21 | page offset, bits 20–0)
  • 1GB page: 30 bits
• Cache mapping (256 sets): tag | set index (8 bits) | line offset (6 bits). With a 2MB page, the whole set index (bits 13–6) lies inside the page offset, so it is fully controllable via the virtual address.
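With a huge page, an attacker can list every offset inside the page that maps to a chosen set. Huge pages are naturally aligned, so the offset inside a 2MB page equals the low 21 bits of the physical address. The Python sketch below relies on this. It assumes a 256-set cache with 64-byte lines, and the function names are hypothetical.

```python
PAGE_4KB = 4 * 1024
PAGE_2MB = 2 * 1024 * 1024
PAGE_1GB = 1024 * 1024 * 1024
LINE_BYTES = 64

def page_offset_bits(page_bytes):
    """log2 of the page size: 4KB -> 12, 2MB -> 21, 1GB -> 30."""
    return page_bytes.bit_length() - 1

def same_set_offsets(target_set, num_sets=256, page_bytes=PAGE_2MB, line_bytes=LINE_BYTES):
    """Byte offsets inside one page whose set index equals target_set.

    Only meaningful when the whole set-index field lies inside the page offset.
    """
    stride = num_sets * line_bytes                 # addresses this far apart share a set
    if (stride - 1).bit_length() > page_offset_bits(page_bytes):
        raise ValueError("set-index bits extend into the page number")
    return list(range(target_set * line_bytes, page_bytes, stride))

offsets = same_set_offsets(5)
print(len(offsets), offsets[:3])   # 128 [320, 16704, 33088]
```

A single 2MB page therefore provides 128 same-set lines for a 256-set cache. With 4KB pages, the function refuses, because bits 13–12 of the set index are outside the page offset.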
Multi-level Caches
(Figure: several cores, each with private D-L1, I-L1 and L2 caches, sharing one LLC.)
• Motivation: a memory cannot be both large and fast. Adding a level of cache reduces the miss penalty.
A typical configuration of Intel Ivy Bridge (configurations differ across processor types):

                        L1-I/D cache   L2 cache   L3 cache (LLC)   DRAM
Size                    32KB           256KB      1MB/core         16GB
Associativity (# ways)  4 or 8         8          16               N/A
Latency (cycles)        1–5            12         ~40              ~150
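The table gives enough to derive each cache's set count: sets = size / (ways × line size). This Python sketch assumes 64-byte lines, which the table does not list. It also assumes that the "1MB/core" LLC capacity is one 16-way slice per core. It then reports how many set-index bits lie above a 4KB page offset.

```python
LINE_BYTES = 64   # assumption: 64-byte cache lines (not listed in the table)

def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """#sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)

def index_bits_above_page(sets, page_offset_bits=12, line_bytes=LINE_BYTES):
    """How many set-index bits lie above a 4KB page offset."""
    line_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return max(0, line_bits + index_bits - page_offset_bits)

for name, size, ways in [("L1-D (8-way)", 32 * 1024, 8),
                         ("L2", 256 * 1024, 8),
                         ("LLC slice", 1024 * 1024, 16)]:
    s = num_sets(size, ways)
    print(f"{name}: {s} sets, {index_bits_above_page(s)} index bits above the page offset")
# L1-D (8-way): 64 sets, 0 index bits above the page offset
# L2: 512 sets, 3 index bits above the page offset
# LLC slice: 1024 sets, 4 index bits above the page offset
```

The L1 set index fits entirely in the page offset. For the L2 and each LLC slice, the attacker must resolve 3 and 4 hidden physical bits respectively.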
Multi-level Caches (continued)
• The LLC is generally divided into multiple slices.
  • Address fields: tag | set index | line offset
  • Slice ID = Hash(address bits), where Hash is an undocumented, secret hash function
• A conflict happens only if addresses map to the same slice and the same set.
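The real slice hash is undocumented. Reverse-engineering studies of some Intel processors report that each slice-ID bit is an XOR of selected physical-address bits. This Python sketch assumes that form but uses made-up masks (`HYPOTHETICAL_MASKS`), 4 slices and 1024 sets per slice. Its only purpose is to show why two lines conflict only when both the slice and the set match.

```python
# HYPOTHETICAL masks for a 4-slice LLC; real slice hashes are undocumented and CPU-specific.
HYPOTHETICAL_MASKS = (0x5A5A0000, 0x33330000)
SETS_PER_SLICE = 1024          # 1MB slice, 16 ways, 64-byte lines
LINE_BYTES = 64

def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def slice_id(paddr, masks=HYPOTHETICAL_MASKS):
    """Slice-ID bit i = XOR of the physical-address bits selected by masks[i]."""
    return sum(parity(paddr & m) << i for i, m in enumerate(masks))

def set_index(paddr):
    return (paddr // LINE_BYTES) % SETS_PER_SLICE

def conflict(a, b):
    """Two lines compete for the same ways only if both slice and set match."""
    return slice_id(a) == slice_id(b) and set_index(a) == set_index(b)

print(slice_id(0), slice_id(1 << 16), conflict(0, 1 << 16), conflict(0, 1 << 18))
# 0 2 False True: both pairs share set 0, but only the second pair shares a slice
```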
Eviction Set Construction Algorithm
(Figure: timeline of a sender and a receiver sharing a cache; the legend distinguishes sender lines from receiver lines.)
• Receiver: access the candidate addresses.
• Sender: access the target address, while the receiver waits.
• Receiver: measure the latency of each candidate address. A slow candidate was evicted by the target access, so it conflicts with the target (same set, and same slice in a sliced LLC).
Vila et al. Theory and Practice of Finding Eviction Sets. S&P’19
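Here is a minimal simulation of this procedure. It uses a toy cache with 8 sets, 4 ways, true LRU replacement and 64-byte lines. An `is_fast` check stands in for a latency measurement, and all class and function names are hypothetical. The sketch adds one step of its own: it repeats the round and drops candidates already found, until it has one congruent line per way.

```python
from collections import OrderedDict

LINE = 64

class ToyLRUCache:
    """Set-associative cache with true LRU replacement; a simulation, not real hardware."""

    def __init__(self, num_sets, ways):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set(self, addr):
        return self.sets[(addr // LINE) % self.num_sets]

    def access(self, addr):
        """Load addr: on a miss, insert it and evict the least recently used line if full."""
        s, line = self._set(addr), addr // LINE
        if line in s:
            s.move_to_end(line)          # hit: becomes most recently used
            return
        if len(s) == self.ways:
            s.popitem(last=False)        # evict the LRU line
        s[line] = True

    def is_fast(self, addr):
        """Stand-in for 'measured latency is low': is the line still cached?"""
        return addr // LINE in self._set(addr)


def find_eviction_set(num_sets, ways, target, candidates):
    """Repeat receiver-access / sender-access / receiver-measure rounds until `ways`
    candidates evicted by the target have been collected."""
    found, remaining = [], list(candidates)
    while len(found) < ways:
        cache = ToyLRUCache(num_sets, ways)
        for a in remaining:                                   # receiver: access candidates
            cache.access(a)
        primed = [a for a in remaining if cache.is_fast(a)]
        cache.access(target)                                  # sender: access target
        slow = [a for a in primed if not cache.is_fast(a)]    # receiver: measure each one
        if not slow:
            break                                             # target evicted nothing
        found += slow
        remaining = [a for a in remaining if a not in slow]
    return found

print(find_eviction_set(8, 4, 0, [64 * i for i in range(1, 65)]))   # [2560, 2048, 1536, 1024]
```

Under LRU, each target access evicts only the least recently used line of its set, so every round finds just one congruent line. Vila et al. describe much more efficient reduction algorithms, which this sketch does not implement.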
Problems Due to Replacement Policy
• Self-eviction due to the replacement policy
• An LRU (least recently used) example with one 8-way set:
  • Prime: access 1 2 3 4 5 6 7 8; line 1 is now least recently used.
  • Victim access: 9 evicts 1, so the set holds 2 3 4 5 6 7 8 9.
  • Probe 1, 2, …, 8 in the same order: which to evict? Accessing 1 misses and evicts 2, accessing 2 misses and evicts 3, and so on, so every probe access misses even though the victim touched only one line.
• A small trick: access the addresses in reverse order. Probing 8, 7, …, 2 hits, and only 1 misses (evicting 9), so exactly one miss reveals the victim access.
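The two probe orders can be compared with a short Python simulation of the 8-way LRU example. It assumes true LRU replacement, which real LLC policies only approximate, and the function name is hypothetical.

```python
def probe_misses(probe_order, ways=8, victim=9):
    """Simulate one 8-way LRU set: prime 1..8, the victim touches `victim`, then probe.
    Returns how many probe accesses miss."""
    lru = []                                   # index 0 = least recently used

    def access(line):
        if line in lru:                        # hit: move to most-recently-used position
            lru.remove(line)
            lru.append(line)
            return False
        if len(lru) == ways:                   # miss in a full set: evict the LRU line
            lru.pop(0)
        lru.append(line)
        return True

    for line in range(1, ways + 1):            # prime
        access(line)
    access(victim)                             # victim access evicts line 1
    return sum(access(line) for line in probe_order)

print(probe_misses(range(1, 9)))       # forward probe: 8
print(probe_misses(range(8, 0, -1)))   # reverse probe: 1
```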
Measure Latency of Multiple Accesses
• Complications: the HW prefetcher and out-of-order execution
    T1 = rdtsc()
    Dummy1 = Ld(Addr1)
    ……
    Dummy8 = Ld(Addr8)
    T2 = rdtsc()
    Latency = T2 - T1
• What we expect: Ld A1, Ld A2, ……, Ld A7, Ld A8 execute one after another in time.
• What actually happens: the loads overlap in time, so T2 - T1 is not the sum of the individual load latencies.
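A toy timing model shows why overlapping loads hide information. Independent loads are assumed to issue one cycle apart and run concurrently. Dependent loads run strictly one after another. Latencies come from the Ivy Bridge table, the prefetcher is ignored, and the function names are hypothetical.

```python
def span_independent(latencies, issue_gap=1):
    """Toy OoO model: independent loads issue `issue_gap` cycles apart and overlap.
    The measured span is when the last one finishes."""
    return max(i * issue_gap + lat for i, lat in enumerate(latencies))

def span_dependent(latencies):
    """Pointer-chasing loads: each must finish before the next can start."""
    return sum(latencies)

all_hits = [40] * 8                  # every line hits in the LLC
one_miss = [40] * 7 + [150]          # one line was evicted and comes from DRAM
all_miss = [150] * 8                 # every line was evicted

print(span_independent(one_miss), span_independent(all_miss))   # 157 157
print(span_dependent(one_miss), span_dependent(all_miss))       # 430 1200
```

With overlapping loads, one eviction and eight evictions produce the same measured span in this model. Serialized loads keep the two cases apart.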
Out-of-Order Processor
• Pipeline: Fetch → Decode → RegRead → Execute → Writeback (Commit)
• RegRead: check whether the registers to read are ready. An instruction can execute as soon as its operands are ready, even if earlier instructions are still waiting.
• Ld A1, Ld A2, ……, Ld A7, Ld A8 do not depend on each other, so they overlap in time.
• Question: how to serialize data accesses?
Serialize Data Accesses
• A special instruction “mfence” (https://www.felixcloutier.com/x86/mfence): every load and store before it in program order becomes globally visible before any load or store after it.
• Add a data dependency by creating a linked list. Each node holds a pointer to the next node plus dummy content, so
    Dummy1 = Ld(Addr1)   becomes   Addr2 = Ld(Addr1)
  and the next load cannot start until the previous one has returned its address.
  A1 → A2 → A3 → ……
• Doubly linked list to access the addresses in reverse order: A1 ⇄ A2 ⇄ A3 ⇄ ……
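The following sketch shows how such a doubly linked list is laid out and traversed. Memory is simulated as a Python dict mapping each candidate address to its (next, prev) pointers. In real attack code, these pointers live inside the candidate cache lines themselves. Function names are hypothetical.

```python
def build_chain(addrs):
    """mem[a] = (next, prev): each candidate line stores its neighbours' addresses.
    The ends of the list point to None."""
    mem = {}
    for i, a in enumerate(addrs):
        nxt = addrs[i + 1] if i + 1 < len(addrs) else None
        prv = addrs[i - 1] if i > 0 else None
        mem[a] = (nxt, prv)
    return mem

def chase(mem, start, direction):
    """Pointer chase: the address of each load is the value returned by the previous load."""
    field = 0 if direction == "forward" else 1
    order, addr = [], start
    while addr is not None:
        order.append(addr)
        addr = mem[addr][field]        # Addr_{k+1} = Ld(Addr_k)
    return order

mem = build_chain([0x1000, 0x5000, 0x3000])
print([hex(a) for a in chase(mem, 0x1000, "forward")])    # ['0x1000', '0x5000', '0x3000']
print([hex(a) for a in chase(mem, 0x3000, "backward")])   # ['0x3000', '0x5000', '0x1000']
```

The backward pointers give the reverse probe order from the replacement-policy trick without rebuilding the list.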
Handle Noise
A real-world example: Square-and-Multiply Exponentiation
• What you generally see in papers (computing b^e mod N, where e has k bits e_{k-1} … e_0):
    r = 1
    for i = k-1 downto 0 do
        r = sqr(r) mod N
        if e_i == 1 then
            r = mul(r, b) mod N
        end
    end
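The leak comes from the sequence of squares (S) and multiplies (M): the multiply runs only for 1-bits. A Prime+Probe attacker tries to observe exactly that sequence through the multiply function's cache lines. This Python sketch computes b^e mod N left to right and records the S/M sequence. It then decodes the exponent from a noise-free trace. Real traces are noisy, and the function names are hypothetical.

```python
def square_and_multiply(b, e, N):
    """Left-to-right square-and-multiply; also returns the S/M operation sequence."""
    r, trace = 1, []
    for i in range(e.bit_length() - 1, -1, -1):
        r = (r * r) % N
        trace.append("S")
        if (e >> i) & 1:               # secret-dependent: multiply only for 1-bits
            r = (r * b) % N
            trace.append("M")
    return r, "".join(trace)

def bits_from_trace(trace):
    """Each S starts a new exponent bit; an M right after it means that bit is 1."""
    bits = []
    for op in trace:
        if op == "S":
            bits.append("0")
        else:
            bits[-1] = "1"
    return "".join(bits)

r, trace = square_and_multiply(7, 0b1011, 1000)
print(r, trace, bits_from_trace(trace))   # 743 SMSSMSM 1011
```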
The Multiply Function
Raw Trace
(Figure: access latencies measured in the probe operation of Prime+Probe.)
• A sequence “01010111011001” can be deduced as part of the exponent.
There may be other problems
• Tips for the lab assignment:
  • Build the attack step by step.
  • We recommend reading “Last-Level Cache Side-Channel Attacks are Practical”.
  • Ask questions via Piazza.
Defenses
Micro-architecture Side Channels
• Victim → secret-dependent execution → a channel (a micro-architecture structure) → attacker
• Channels: {Cache, DRAM, TLB, NoC, etc.} × {Transient, Non-transient}
Kiriansky et al. DAWG: a defense against cache timing attacks in speculative execution processors. MICRO’18
Micro-architecture Side Channels
• Victim → secret-dependent execution → a channel (a micro-architecture structure) → attacker
• Defenses:
  • Block creation of signals: oblivious execution, speculative execution defenses, etc.
  • Close the channel: isolation, etc.
  • Block detection of signals: randomization, etc.
Defense Design Considerations
• Security
• Performance
• Portability
The Problem: The ISA Abstraction
• Interface between HW and SW: the ISA
• Advantage: HW can be optimized without affecting usability or portability
  Software (branch, arithmetic instructions, load/store)
  ISA (instruction set architecture)
  Hardware (caches, DRAM, TLBs, etc.)
From https://www.felixcloutier.com/x86/index.html
The Problem: The ISA Abstraction
• Interface between HW and SW: the ISA
• The ISA specifies functionality, not performance/timing
  • Compare Intel Ivy Bridge and Cascade Lake processors
  • Example: DEC [addr] has the same architectural effect on both, but the ISA says nothing about its timing
  Software (branch, arithmetic instructions, load/store)
  ISA (instruction set architecture)
  Hardware (caches, DRAM, TLBs, etc.)
Data Oblivious/“Constant time” Programming
• Write programs without data-dependent behavior
• secret = confidential; addr1 = public; addr2 = public
Original:
    if ( secret )
        a = *(addr1);
    else
        a = *(addr2);
Data Oblivious:
    a ← load (addr1);
    b ← load (addr2);
    cmov a = ( secret ) ? a : b;
• Both loads always execute, so the sequence of memory accesses no longer depends on secret.
• Dataflow graph: addr1 → (a ← load addr1); addr2 → (b ← load addr2); secret, a, b → (cmov secret, b, a)
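The `cmov` step can be written as a branch-free select with a bit mask. The Python sketch below shows the logic for 64-bit unsigned values, with memory simulated as a dict and hypothetical function names. Python itself gives no timing guarantees. Real constant-time code relies on the compiler and hardware, for example an actual `cmov`, to keep the select data-independent.

```python
MASK64 = (1 << 64) - 1

def ct_select(secret_bit, a, b):
    """Return a if secret_bit == 1 else b, without a branch.
    mask is all ones when secret_bit is 1 and all zeros when it is 0."""
    mask = (-secret_bit) & MASK64
    return (a & mask) | (b & ~mask & MASK64)

def oblivious_read(memory, secret_bit, addr1, addr2):
    """Data-oblivious version of: a = memory[addr1] if secret else memory[addr2]."""
    a = memory[addr1]            # both loads always happen
    b = memory[addr2]
    return ct_select(secret_bit, a, b)

memory = {0x10: 111, 0x20: 222}
print(oblivious_read(memory, 1, 0x10, 0x20), oblivious_read(memory, 0, 0x10, 0x20))   # 111 222
```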
Programming in Circuit Abstraction
• Program = DAG (“circuit”)
  • Operations = nodes (“gates”)
  • Data transfers = edges (“wires”)
• Requirements:
  • The topology must be independent of confidential data
  • Each gate’s execution must hide its inputs
  • Each wire must hide the value it carries
(Figure: example DAG with gates op1, op2, op3, op4 connected by wires.)
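The circuit view can be made concrete in a minimal sketch: the data-oblivious select from the previous slide, written as a fixed list of gates evaluated in topological order. The gate names and the dict-based memory are hypothetical. Only the fixed topology holds by construction here. Whether each gate and wire hides its values is an assumption about the hardware, which Python cannot show.

```python
# Each gate: (output wire, operation, input wires), listed in a fixed topological order.
CIRCUIT = [
    ("a", "load", ("addr1",)),
    ("b", "load", ("addr2",)),
    ("out", "select", ("secret", "a", "b")),   # cmov: out = secret ? a : b
]

def evaluate(circuit, inputs, memory):
    """Evaluate gates in order; return the output and the trace of executed gates."""
    wires, trace = dict(inputs), []
    for out, op, srcs in circuit:
        vals = [wires[s] for s in srcs]
        if op == "load":
            wires[out] = memory[vals[0]]
        elif op == "select":
            s, x, y = vals
            wires[out] = s * x + (1 - s) * y    # arithmetic select; s must be 0 or 1
        else:
            raise ValueError(f"unknown gate {op}")
        trace.append((op, srcs))                # records topology only, never values
    return wires["out"], trace

memory = {0x10: 7, 0x20: 9}
out1, trace1 = evaluate(CIRCUIT, {"secret": 1, "addr1": 0x10, "addr2": 0x20}, memory)
out0, trace0 = evaluate(CIRCUIT, {"secret": 0, "addr1": 0x10, "addr2": 0x20}, memory)
print(out1, out0, trace1 == trace0)   # 7 9 True
```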
What assumptions underpin the model?
• Example: the branchy code (if ( secret ) a = *(addr1); else a = *(addr2);) becomes the circuit a ← load addr1, b ← load addr2, cmov secret, b, a; secret is confidential, addr1 and addr2 are public.
• Rule 1: instruction/gate execution is independent of confidential data
• Rule 2: data transfer/wire is independent of confidential data
• Rule 3: circuit/program topology is fixed
Today’s machines can violate these assumptions
(Same circuit: a ← load addr1, b ← load addr2, cmov secret, b, a; Rules 1–3 as above.)
Violations due to:
• Data-dependent instruction optimizations (e.g., zero-skip, early exit, microcode, silent stores, …)
• Data-at-rest optimizations (e.g., compression in the register file/uop fusion, caches, page tables, …)
• Speculative/OoO execution
HW Resource Partition
• Security vs. quality of service (QoS)
• Intel Cache Allocation Technology (CAT)
• Temporal partition vs. spatial partition
• Challenges today:
  • Determining security domains is tricky
  • Scalability: what if #domains > #partitions?
  • How to partition resources inside cores?
  • Why not execute applications on a single node?
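Intel CAT gives each class of service a capacity bitmask (CBM) over the LLC ways, and Intel requires the set bits of a CBM to be contiguous. The Python sketch below checks whether a set of per-domain masks gives a strict spatial partition. The function names are hypothetical, and it does not touch a real CAT interface. Real configuration goes through model-specific registers or the Linux resctrl interface.

```python
def is_contiguous(cbm):
    """True if the set bits of cbm form one contiguous, non-empty run."""
    if cbm == 0:
        return False
    low = cbm & -cbm                 # lowest set bit
    return ((cbm + low) & cbm) == 0  # adding it carries through the whole run

def check_partition(cbms, num_ways):
    """Each domain's mask must be contiguous, within the cache, and disjoint from the others."""
    full = (1 << num_ways) - 1
    used = 0
    for cbm in cbms:
        if not is_contiguous(cbm) or cbm & ~full:
            return False
        if used & cbm:
            return False             # overlap: domains could evict each other's lines
        used |= cbm
    return True

print(check_partition([0x000F, 0x00F0, 0xFF00], 16))   # True: 4 + 4 + 8 ways
print(check_partition([0x000F, 0x0018], 16))           # False: way 3 is shared
```

Each isolated domain needs at least one way of its own, so a 16-way LLC supports at most 16 strictly separated domains. This is one concrete form of the #domains > #partitions problem.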
Randomization/Fuzzing
• Introduce noise into time measurement, or make time measurement coarse-grained
  • Pros and cons?
    + Simple, with no performance overhead
    + Effective against a group of popular attacks
    ……
    - Not effective against attacks that do not measure time
    - Not effective when the victim causes a large timing difference
    - Affects usability if a benign application needs a fine-grained timer
• Randomize cache mapping functions
  • Pros and cons?
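Coarse-graining the timer hides small differences but not large ones. The Python sketch below models a clock that rounds down to a multiple of `resolution` cycles, using hit and miss latencies from the Ivy Bridge table. The function names are hypothetical.

```python
def coarse(t, resolution):
    """Coarse-grained clock: time rounded down to a multiple of `resolution` cycles."""
    return (t // resolution) * resolution

def measured(start, latency, resolution):
    """What an attacker reads for an operation of `latency` cycles starting at `start`."""
    return coarse(start + latency, resolution) - coarse(start, resolution)

HIT, MISS = 40, 150          # LLC hit vs DRAM access, from the Ivy Bridge table
for res in (1, 1000):
    print(res, measured(0, HIT, res), measured(0, MISS, res))
# 1 40 150   -> fine-grained timer: hit and miss are distinguishable
# 1000 0 0   -> coarse timer: the difference disappears
print(measured(0, 5000, 1000))   # 5000: a large victim timing difference still shows
```

This assumes each measurement starts exactly on a clock tick. In the same model, a measurement that straddles a tick can still separate the two cases: measured(900, MISS, 1000) is 1000, while measured(900, HIT, 1000) is 0.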
Recommend
More recommend