Spectre and Meltdown Clifford Wolf q/Talk 2018-01-30
Spectre and Meltdown
● Spectre (CVE-2017-5753 and CVE-2017-5715)
– Is an architectural security bug that affects most modern processors with speculative execution
– It allows a program to read memory locations in its own address space without architecturally accessing those locations.
– This is a problem for code running in “sandbox environments”, such as a web browser executing JavaScript code: the JavaScript code can access all data in the browser’s memory, such as login credentials for webpages.
● Meltdown (CVE-2017-5754)
– Is a related hardware vulnerability in some Intel x86, some IBM POWER, and some ARM processors.
– It allows an unprivileged process to read all memory in the system.
But how does it work?
● To answer this question we must first discuss some implementation details of modern speculating superscalar out-of-order processors.
– Scalar: execute one instruction per cycle
– Superscalar: execute >1 instruction per cycle
– Out-of-order: execute instructions in a different order than they appear in the program code
– Speculative execution: instead of waiting for the result of a computation, guess the result and keep executing. Roll back if the guess turned out to be incorrect. This helps avoid pipeline stalls in cases where it’s possible to make good guesses.
– The types of speculative execution important for understanding Spectre/Meltdown:
● Branch prediction: guess if a branch is taken or not
● Branch target prediction: guess the target of a dynamic jump
● Trap optimism: always guess that instructions will not cause traps
– When the guess was wrong we need to roll back the entire CPU state so that it looks to the software as if no code had been executed speculatively.
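The branch-prediction guesses listed above are typically driven by small saturating counters. Below is a minimal sketch in C, assuming the textbook 2-bit scheme (real predictors are far more elaborate); the names `predictor_t`, `predict`, and `update` are invented for this illustration:

```c
#include <assert.h>

/* Toy 2-bit saturating-counter branch predictor.
 * States 0-1 predict "not taken", states 2-3 predict "taken".
 * The two "strong" states provide hysteresis: a single surprising
 * outcome does not flip a well-trained predictor. */
typedef struct { int counter; } predictor_t;

int predict(const predictor_t *p) { return p->counter >= 2; }

void update(predictor_t *p, int taken) {
    if (taken  && p->counter < 3) p->counter++;
    if (!taken && p->counter > 0) p->counter--;
}
```

Spectre Variant 1 works precisely because of this training behavior: after many in-bounds calls the predictor is in a strong "taken" state, so the first out-of-bounds call is still predicted as in-bounds.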
What is pipelining?
● The more work we do in one cycle, the slower our circuit gets. A slow circuit does all the work (decode, load, exec, store) as one long combinational path between two flip-flops, so the clock period must cover all four tasks.
[Diagram: Data → FF → Task 1 → Task 2 → Task 3 → Task 4 → FF, driven by one common Clock]
● But we want high clock rates for CPUs! Splitting the same work into four pipeline stages, each separated by flip-flops, lets the circuit run at a 4x faster clock rate.
[Diagram: Data → FF → Stage 1 → FF → Stage 2 → FF → Stage 3 → FF → Stage 4 → FF, driven by one common Clock]
In-order pipeline stalls
● Program:
A: r1 ⊙ r2 → r3
B: r4 ⊙ r5 → r6
C: r7 ⊙ r8 → r9
D: r9 ⊙ r1 → r1
E: r10 ⊙ r11 → r12
F: r13 ⊙ r14 → r15
G: r16 ⊙ r17 → r18
[Diagram: instructions A..G moving through pipeline stages 1-4, cycle by cycle]
● D reads r9, the result of C, so D has to wait for C => the pipeline stalls.
Out-of-order execution to the rescue!
● Same program as on the previous slide, but now E, F, G are executed out-of-order to improve system performance: while D waits for C’s result (r9), the independent instructions E, F, G fill the issue slots that would otherwise be wasted.
[Diagram: A, B, C issue in order; E, F, G issue while D waits; D issues once r9 is ready]
● But what if D traps?
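The benefit of issuing independent instructions past a stalled one can be reproduced with a small simulation. This is a toy single-issue model under assumed rules (one issue slot per cycle, a result becomes usable LAT cycles after issue); the `insn_t` encoding and all constants are invented for the sketch:

```c
#include <assert.h>

#define NREG 32
#define LAT  3   /* assumed: a result is usable LAT cycles after issue */

typedef struct { int src1, src2, dst; } insn_t;

/* Returns the total number of cycles needed to issue all n instructions
 * (n <= 16) on a single-issue machine. In-order mode stalls whenever the
 * oldest remaining instruction's operands are not ready; out-of-order
 * mode issues the oldest *ready* instruction instead. */
int run(const insn_t *prog, int n, int out_of_order) {
    int ready[NREG] = {0};   /* cycle at which each register is ready */
    int done[16] = {0};
    int issued = 0, cycle = 0;
    while (issued < n) {
        int pick = -1;
        for (int i = 0; i < n; i++) {
            if (done[i]) continue;
            if (ready[prog[i].src1] <= cycle && ready[prog[i].src2] <= cycle)
                pick = i;
            if (pick >= 0) break;       /* found the oldest ready insn */
            if (!out_of_order) break;   /* in-order: may not skip ahead */
        }
        if (pick >= 0) {
            done[pick] = 1;
            ready[prog[pick].dst] = cycle + LAT;
            issued++;
        }
        cycle++;   /* one issue slot (or one stall) per cycle */
    }
    return cycle;
}
```

With the seven-instruction program from the slide (D reading C’s result r9), the in-order schedule takes 9 cycles while the out-of-order one takes 7, because E, F, G execute during the cycles D spends waiting.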
Out-of-order execution in modern CPUs
● Some instructions (such as memory loads) can stall for >100 cycles. We need very deep out-of-order execution to hide this latency.
– Without speculative execution it would be impossible to keep the processor busy for so many cycles. There is no way around speculative execution for modern high-speed processors.
● We need many more physical registers than are visible in the ISA to remember previous states (scoreboarding isn’t sufficient → register renaming, Tomasulo’s algorithm)
– We need previous states for rollback when instructions trap or branch prediction is wrong.
– And we need more registers because the dynamic instruction order may have significantly higher register pressure than the original instruction order.
● But there is more to the processor state than just general purpose registers. Clean rollback is incredibly hard!
– For memory writes there is a store buffer that holds the pending writes during speculative execution.
– CPU flags may be stored in shadow registers for each checkpoint we might need to roll back to.
– But there is no mechanism to roll back the state of the CPU caches.
● Caches are just a performance optimization, so it can’t hurt if information from speculative execution can be recovered from cache timings ... right? Unfortunately this is wrong.
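Register renaming, mentioned above, is what makes cheap rollback of the general-purpose registers possible: every write allocates a fresh physical register, so a checkpoint only needs to save the small mapping table. A minimal sketch in C; all names and sizes are invented, and free-list management (reclaiming physical registers, respecting the NPHYS bound) is omitted:

```c
#include <assert.h>
#include <string.h>

#define NARCH 8    /* architectural registers visible in the ISA */
#define NPHYS 32   /* physical registers in the implementation */

static int map[NARCH];   /* architectural -> physical mapping */
static int next_free;    /* next unused physical register */

void rename_init(void) {
    for (int i = 0; i < NARCH; i++) map[i] = i;
    next_free = NARCH;
}

int rename_src(int arch) { return map[arch]; }

/* A write never overwrites an old physical register; it allocates a
 * fresh one, so the previous value stays available for rollback. */
int rename_dst(int arch) { map[arch] = next_free++; return map[arch]; }

/* Checkpoint/rollback: restoring the mapping table is enough to bring
 * back the architectural register state after a mispredict or trap. */
void rename_checkpoint(int *save)     { memcpy(save, map, sizeof map); }
void rename_rollback(const int *save) { memcpy(map, save, sizeof map); }
```

The checkpoint is tiny (NARCH integers), which is why CPUs can keep one per outstanding branch or potentially trapping instruction.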
What is a CPU cache?
● Caches are local memories close to the CPU that have much faster access times than main memory.
– Addresses “in the cache” can be accessed quickly
– The first access to an address moves that memory location into the cache
– Addresses that haven’t been accessed in a while are evicted from the cache
– The granularity of this is aligned cache lines of usually 64 bytes each.
● There are special instructions to flush the CPU caches (e.g. clflush on x86).
● Even without those instructions we can access memory in a way that guarantees that all cache lines of interest are evicted from the cache (by accessing other memory locations that are mapped to the same cache slot).
● By measuring the access time to a memory location we can determine whether that location is in the cache or not.
● This allows us to detect which memory locations the CPU has accessed recently.
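The evict and probe primitives described above can be illustrated with a toy direct-mapped cache model in C. The sizes, the single-way design, and the explicit `is_in_cache()` check are all assumptions of the sketch; a real attacker only has access timing, and real caches are set-associative:

```c
#include <assert.h>

#define LINE   64   /* cache line size in bytes */
#define NSLOTS 8    /* 8 slots -> a 512-byte toy cache */

static long cache_tag[NSLOTS];   /* which memory line occupies each slot */
static int  cache_valid[NSLOTS];

static int  slot_of(long addr) { return (int)((addr / LINE) % NSLOTS); }
static long line_of(long addr) { return addr / LINE; }

/* A load brings the accessed line into its (direct-mapped) slot,
 * evicting whatever line was there before. */
void access_mem(long addr) {
    int s = slot_of(addr);
    cache_tag[s]   = line_of(addr);
    cache_valid[s] = 1;
}

/* Stands in for the timing measurement: fast access means cached. */
int is_in_cache(long addr) {
    int s = slot_of(addr);
    return cache_valid[s] && cache_tag[s] == line_of(addr);
}
```

Accessing an address that maps to the same slot (here, addr + NSLOTS*LINE) evicts the original line, which is exactly how an attacker clears the cache without any flush instruction.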
Spectre Variant 1 – CVE 2017-5753 (bounds check bypass, simplified explanation)
● Consider something like the following code:

    uint8_t unprotected_data[128];
    uint8_t protected_data[1];

    int peek(int i) {
        flush_or_evict_caches();
        if (slow_predicted_true(i < 128)) {
            int a = unprotected_data[i];
            int b = unprotected_data[64*(a&1)];
            return b;
        }
        return is_in_cache(&unprotected_data[64]);
    }

● peek(128) will return 1 if the least significant bit of protected_data[0] is set: the speculative out-of-bounds read of unprotected_data[128] actually loads protected_data[0], and the second load leaves a trace in the cache that survives the rollback.
● We have effectively bypassed the (i < 128) bounds check.
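The gadget above can be simulated end-to-end without real speculation: we model the speculative window by simply executing the out-of-bounds load, and the cache by a pair of touched-line flags. Everything here (the secret value, the cache model, the adjacent memory layout) is invented for the sketch; on real hardware the out-of-bounds read happens only transiently and nothing but the cache state survives the rollback:

```c
#include <assert.h>
#include <string.h>

static unsigned char unprotected_data[128];
static unsigned char protected_data[1] = { 0x41 };  /* secret byte, LSB = 1 */

static int line_touched[2];   /* toy cache: one flag per 64-byte line */

static void flush_or_evict_caches(void) {
    memset(line_touched, 0, sizeof line_touched);
}

int peek(int i) {
    flush_or_evict_caches();
    /* Speculative window: the predictor guesses (i < 128) is true and the
     * loads run anyway; we model the transient out-of-bounds read by
     * indexing protected_data directly when i >= 128. */
    unsigned char a = (i < 128) ? unprotected_data[i]
                                : protected_data[i - 128];
    line_touched[a & 1] = 1;  /* models loading unprotected_data[64*(a&1)] */
    /* The architectural rollback discards a, but the cache state stays: */
    return line_touched[1];   /* was unprotected_data[64]'s line loaded? */
}
```

Flipping the secret bit flips the probe result, which is the whole side channel: one bit of protected data per call, recovered purely from which cache line was touched.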
Spectre Variant 2 – CVE 2017-5715 (branch target injection)
● Variant 1 relies on tricking the branch predictor into making an incorrect guess on whether a branch is taken or not.
● But processors can also branch to dynamic locations:
– x86: jmp eax; jmp [eax]; ret; jmp dword ptr [0x12345678]
– ARM: MOV pc, r14
– MIPS: jr $ra
– RISC-V: jalr x0,x1,0
● Spectre Variant 2 tricks the branch target predictor into incorrectly guessing the destination of such dynamic jumps.
● This can be used to speculatively execute arbitrary code gadgets, similar to return-oriented programming (ROP).
● The data is then exfiltrated using the cache side channel, as in Variant 1.
Spectre and JIT Sandboxes
● Spectre only allows a process to read its own memory. So you might ask: “What is the problem?”
● The problem is JIT sandboxes, where we run JIT-compiled untrusted code in our process, assuming the bounds checks added by the JIT compiler will prevent the code from reading data it should not have access to.
– For example: a website running JavaScript code in your browser might access security credentials or other private data in your browser’s memory.
● But that means the JavaScript code must be tailored to the JIT compiler to yield the correct malicious machine code.
– For example, you can’t simply flush the CPU caches from JavaScript. Instead you must execute a memory access pattern that evicts the relevant cache lines.
– That’s why it said “simplified explanation” on the slide for Variant 1.
– The Spectre paper contains a JavaScript code snippet that demonstrates such an attack using the V8 JavaScript engine.
Meltdown – CVE 2017-5754
● The Meltdown attack exploits a privilege escalation vulnerability specific to some processors:
– At least sometimes, Intel processors don’t check memory protection during speculative execution.
– Instead, memory protection is checked after the fact, when instructions are committed. But at that point we have already exfiltrated data using the cache side channel.
– By adding a trapping instruction before the access to privileged memory we prevent the access from ever being committed. So “it never happened” and no access violation is reported to the OS. But the data that was read can still be reconstructed from the cache state.
● Affected: every Intel x86 / x86_64 processor since 1995
– The only exceptions afaik are Intel Atom processors from before 2013
● AMD x86 / x86_64 processors are not affected by Meltdown
● Very few ARM processors are affected, for example the ARM Cortex-A75
● IBM POWER and System Z are also affected by variants of Meltdown
Meltdown Mitigations
● Short-term mitigation for existing processors:
– Flush the TLBs when leaving kernel code
– This prevents speculative access to kernel memory
– But it also adds a performance penalty that can be significant for some workloads, especially on processors that do not support selective TLB flushing (most Intel processors before Haswell).
● Long-term fix:
– Better isolation of kernel and user-land page tables
– Probably at the cost of not allowing speculative execution into kernel code (such as system calls)
● In my opinion there is no doubt that Meltdown is a hardware bug that needs to be addressed in future hardware generations.
● But Intel says its processors “work as designed”, and calls the mitigation a “security feature” instead of a “bug fix”.