Weird machines: a model for code-reuse attacks
Sergey Bratus, Rebecca Shapiro, Anna Shubina
Dartmouth Trust Lab
Outline
Code re-use: unexpected computation, programming models
Containing computation: coarse intent-based ABI-level semantics / region-describing types
LangSec: co-design of data & code, via constrained input handlers & input languages
Terminology
"Code {re,ab}use" is unexpected computation
Classes of attacks are more: they are unexpected programming models
Essence of code reuse: code becomes part of an emergent programming model
Input data is the program
Strings are programs for regexps (DFAs)
Tape is the program for Turing machines
"Everything is an interpreter" (Greg Morrisett)
"Any complex enough input is indistinguishable from bytecode"
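The claim that strings are programs for DFAs can be made concrete with a toy interpreter. This is a minimal sketch, not taken from the slides: the automaton (even number of 1s), the state names, and `run_dfa` are all illustrative assumptions. The transition table plays the role of the fixed machine, and the input string is the only thing that drives it.

```python
# Toy DFA (hypothetical example): accepts binary strings with an even number of 1s.
# The transition table is the fixed "machine"; the input string is its "program".
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def run_dfa(program: str, start: str = "even", accepting=("even",)) -> bool:
    """Execute `program` one symbol at a time; each symbol is an 'instruction'."""
    state = start
    for symbol in program:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # symbol outside the alphabet: reject
        state = TRANSITIONS[key]
    return state in accepting
```

Nothing in `run_dfa` changes between runs; every difference in behavior comes from the input alone.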
Invisible machines: stack
Standard function prologues & epilogues are an automaton distributed through code
Data fragments on the stack are its programs
Implements the control flow graph
Aleph1 > Solar Designer > Newsham > gera > Nergal > ...
Return-oriented programming
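The stack machine can be illustrated with a toy model in which existing code fragments are chained purely by data on the stack. This is a sketch under loud assumptions: the gadget addresses, the two-register machine, and `run_chain` are hypothetical and do not model any real architecture or exploit.

```python
# Hypothetical "gadgets": each does a small piece of work and then "returns",
# which in this model means the loop pops the next address off the stack.
def pop_a(m):
    m["a"] = m["stack"].pop(0)


def pop_b(m):
    m["b"] = m["stack"].pop(0)


def add_ab(m):
    m["a"] += m["b"]


# Made-up addresses standing in for locations of existing code.
GADGETS = {0x1000: pop_a, 0x2000: pop_b, 0x3000: add_ab}


def run_chain(stack):
    """Interpret stack contents as a program over the fixed gadget set."""
    m = {"a": 0, "b": 0, "stack": list(stack)}
    while m["stack"]:
        addr = m["stack"].pop(0)  # the "ret": the next stack word selects the code
        if addr not in GADGETS:
            break  # no gadget at this address: the chain stops
        GADGETS[addr](m)
    return m["a"]


# The "program" is pure data: gadget addresses interleaved with operands.
chain = [0x1000, 40, 0x2000, 2, 0x3000]
```

Here `run_chain(chain)` computes 40 + 2 without any new code being introduced; only the stack data was chosen.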
Invisible machines: heap
Heap management code is a machine, heap metadata its programs
"Once upon a free" (Phrack 57:8), "Vudo malloc tricks" (Phrack 58:9)
ISA: aa4bmo, chunk->flink->blink = chunk->blink
Configured via a series of mallocs: "Heap Feng-shui" (Sotirov 2007), ..., starvation-based machines (Gorenc et al., Recon.cx 2015)
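The unlink "instruction" named on this slide can be simulated over a flat memory model. This is a minimal sketch: the field offsets, the dictionary standing in for memory, and the helper names are assumptions for illustration, not any real allocator's layout.

```python
FLINK, BLINK = 0, 4  # hypothetical field offsets inside a free chunk


def unlink(mem, chunk):
    """Remove `chunk` from a doubly linked free list, trusting its metadata."""
    fl = mem[chunk + FLINK]
    bl = mem[chunk + BLINK]
    mem[fl + BLINK] = bl  # chunk->flink->blink = chunk->blink
    mem[bl + FLINK] = fl  # chunk->blink->flink = chunk->flink (the mirrored write)


def write_via_unlink(target, value, chunk=0x100):
    """Craft chunk metadata so that unlink() stores `value` at `target`."""
    mem = {}
    mem[chunk + FLINK] = target - BLINK  # so that flink + BLINK == target
    mem[chunk + BLINK] = value
    unlink(mem, chunk)
    return mem
```

The list-maintenance code is unchanged and "correct"; the metadata alone turns it into a write primitive, with the second store as the mirrored side effect.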
Invisible machines: signals
Sigreturn-oriented programming (Bosman & Bos, 2014): "portable shellcode" via sigreturn structs
Counterfeit object-oriented programming (COOP, 2015)
"Interrupt-oriented programming" (Tan et al., 2014): "bugdoor" via nesting MSP430 interrupts; fixed-entry, timed-exit "un-gadgets"
Symbol-related machines
Dynamic linker (cf. Nergal's RTLD gadget)
ld.so relocation (Shapiro et al., 2013; cf. LOCREATE): ELF relocation entries are Turing-complete "bytecode"
DWARF exception handler (helpfully a part of most processes) is Turing-complete (Oakley et al., 2012)
Difference between execve() & ld.so: "All you need is GOT" (Bangert et al., 29c3)
The weirdest machine (possibly)
x86 MMU is Turing-complete on GDT + IDT + TSS + page tables (Bangert et al., 2013)
Arbitrary computation can be compiled into a combination of these tables
No instruction is successfully dispatched
#PF & #DF alternate, acting as clock cycles
The "weird machine" upshot
Code re-use/code abuse is possible whenever (meta)data guides code into actions
Code re-use likely has an emergent programming model associated with it (a WM)
The data that drives it need not be ill-formed or corrupt memory
A verification problem
Ab Ovo
Proving correctness from axioms, by deductive construction
Cf. the construction of types ~ proofs ~ programs
P { Q } R
P: precondition, Q: code, R: result
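Read as a partial-correctness statement, the triple says that whenever P holds before Q runs and Q terminates, R holds afterwards. One way to write this out, with the state variables s and s' introduced here only for illustration:

```latex
\[
  P\,\{Q\}\,R
  \;\iff\;
  \forall s, s'.\;
  \bigl( P(s) \,\land\, s \xrightarrow{\;Q\;} s' \bigr) \implies R(s')
\]
```

The statement says nothing about what Q does from states where P fails.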
The root of weirdness?
Assume P { Q } R holds
If the actual precondition P' is not quite P, what will Q do under P'?
The root of weirdness?
What can we make a "correct" Q compute by varying P beyond what it was verified for?
[Diagram: a change ∆P in the precondition P and the resulting change ∆R in the result R]
What is "∆R" given "∆P" for a Q?
Proof-carrying code FTW?
"Weird machines in PCC", Vanegue @ 1st IEEE LangSec S&P Workshop, 2014
PCC doesn't capture additional instructions a machine may execute ("divergent machines")
Proof-carrying code can execute untrusted computations not captured by proofs
A hypothesis
We need "differential computability": how to easily reason about "∆R" given "∆P" for a Q
We program not with statements {Q} but, implicitly, with tuples P {Q}, yet we rarely capture P explicitly. Hence bugs & WMs.
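A small worked example may help fix the ∆P/∆R idea. This is a sketch under stated assumptions: the buffer size, the adjacent flag byte, and the function names are hypothetical. Q is an unchecked copy that is correct under the implicit precondition P: n <= BUF_SIZE. Nudging the input just outside P changes the result in a way Q was never verified for.

```python
BUF_SIZE = 8     # hypothetical buffer size
FLAG_OFFSET = 8  # hypothetical flag byte placed right after the buffer


def q_copy(mem: bytearray, data: bytes, n: int) -> None:
    """Q: copy n bytes into the buffer, with no check that n <= BUF_SIZE."""
    for i in range(n):
        mem[i] = data[i]


def run(data: bytes, n: int) -> int:
    """Run Q on fresh memory and return the flag byte afterwards (the part of R we watch)."""
    mem = bytearray(BUF_SIZE + 1)  # 8-byte buffer followed by a flag, initially 0
    q_copy(mem, data, n)
    return mem[FLAG_OFFSET]


flag_under_p = run(b"A" * 8, 8)                   # P holds: n <= BUF_SIZE
flag_under_p_prime = run(b"A" * 8 + b"\x01", 9)   # ∆P: n exceeds BUF_SIZE by one
```

Under P the flag stays 0; under P' the same Q sets it to 1. The code never stated P, so nothing in it marks the second call as out of bounds for the proof.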
Unforeseen preconditions
The "correct" P is rarely obvious
E.g., "well-formed" does not imply safe (ELF, MMU)
Parser differentials ("master key", X.509)
P is influenced by opinion & by one's idea/model of a system
P can't reflect not-yet-discovered threats or state
P may depend on composition effects!
Constraining Q
If Q is sufficiently "constrained", P doesn't have to be so large
E.g.: P is "input is a formal language of class X"
Question: how can we usefully characterize the power of Q, beyond the Chomsky hierarchy of recognizers?
[Figure: languages and their acceptors]
Coarse types for code & data intents
Control flow enforcement (not quite CFI :) )
ELFbac: sections are types (with very coarse semantics by data access & flow)
"Gostak semantics" ("The Gostak distims the doshes")
Dependent typing to enforce intended use of data
Range dependencies, intent by range
Beyond address ranges
A code section's intended accesses are its type
"You are what you work with/operate on"
[Figure: access matrix of code sections (SSL initialization, SSL, libpng, app logic) against data sections (SSL keys, input buffer, output buffer), with each permitted access marked R, W, or RW]
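The idea that a code section's intended accesses form its type can be sketched as a lookup table consulted on every access. This is an illustration only: the section names echo the slide, but the specific permissions below are made up, and `allowed` is a hypothetical helper, not ELFbac's interface.

```python
# Hypothetical policy: (code section, data section) -> permitted access modes.
# The permissions are illustrative assumptions, not the matrix from the slide.
POLICY = {
    ("ssl_init", "ssl_keys"): "rw",
    ("ssl", "ssl_keys"): "r",
    ("ssl", "input_buffer"): "r",
    ("ssl", "output_buffer"): "w",
    ("libpng", "input_buffer"): "r",
    ("libpng", "output_buffer"): "rw",
}


def allowed(code_section: str, data_section: str, mode: str) -> bool:
    """Return True if `code_section` may access `data_section` with mode 'r' or 'w'."""
    granted = POLICY.get((code_section, data_section), "")  # default: no access
    return mode in ("r", "w") and mode in granted
```

Under this made-up policy, `allowed("libpng", "ssl_keys", "r")` is False: an image parser has no declared reason to touch key material, so the access is denied whether or not the parser has a bug.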
LangSec approach to input
Since all input data are programs driving the code, construct input handling as verifiable recognizer automata
Requires regular or context-free languages to avoid undecidability (e.g., in verifying parser equivalence)
Verifying input handlers: big payoff, but underused?
Not all bugs are parser bugs, but the latest biggest ones sure were! (Heartbleed, GnuTLS Hello, BERserk, ...)
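A recognizer in this spirit accepts or rejects the whole input before any processing code sees it. The sketch below assumes a hypothetical length-prefixed message format (1-byte type, 2-byte big-endian length, payload); it is not the TLS heartbeat format, and `recognize` is an illustrative name.

```python
import struct
from typing import Optional, Tuple

# Hypothetical wire format: 1-byte type, 2-byte big-endian payload length, payload.
HEADER = struct.Struct(">BH")


def recognize(msg: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (type, payload) only if msg is exactly one well-formed message."""
    if len(msg) < HEADER.size:
        return None  # too short to contain a header
    msg_type, declared_len = HEADER.unpack_from(msg)
    payload = msg[HEADER.size:]
    if len(payload) != declared_len:
        return None  # declared length disagrees with the bytes actually present
    return msg_type, payload
```

Downstream code then works only with the returned payload and never with the declared length, so a length field that overstates the payload is rejected up front instead of being trusted later.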
More weird machines?
Code as a "contour/circuit" with a characteristic "frequency response"?
How does code react to periodically injected failure?
Systems: resource starvation WMs
Networks: packet loss and/or delay
What new behavior patterns can be produced?
Protocol implementations exposed to induced periodic packet loss/delay
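Periodic failure injection of the kind described here can be sketched as a filter over a packet sequence. This is only an illustration of the idea: `periodic_drop`, its parameters, and the list-of-packets model are assumptions, not the setup used for the experiments.

```python
def periodic_drop(packets, period: int, phase: int = 0):
    """Drop every packet whose index i satisfies i % period == phase; keep the rest."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [p for i, p in enumerate(packets) if i % period != phase % period]


# Example: with period 3 and phase 2, packets at indices 2, 5, 8 are dropped.
survivors = periodic_drop(list(range(10)), period=3, phase=2)
```

Sweeping `period` while observing the protocol implementation would give the "frequency" axis of the response the slide asks about.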
Periodic packet drop vs OpenVPN
[Figure: one panel per cipher: Blowfish-CBC, AES256-CBC, DES-CBC, RC2-CBC]
Thank you
IEEE Language-theoretic Security Workshop (LangSec SPW), co-located with the IEEE S&P Symposium (San Jose)
http://spw14.langsec.org
http://spw15.langsec.org