All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask)
(Yes, we were trying to overflow the title length field on the submission server)
Edward J. Schwartz, Thanassis Avgerinos, David Brumley
8/16/2010 Carnegie Mellon University
A Few Things You Need to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask)
Edward J. Schwartz, Thanassis Avgerinos, David Brumley
The Root of All Evil
Humans write programs.
This talk: computers analyzing programs dynamically at runtime.
Two Essential Runtime Analyses
Dynamic Taint Analysis: What values are derived from user input?
• Detect exploits [Costa2005, Crandall2005, Newsome2005, Suh2004]
• Detect packing in malware [Bayer2009, Yin2007]
Forward Symbolic Execution: What input will make execution reach this line of code?
• Automated test case generation [Cadar2008, Godefroid2005, Sen2005]
• Input filter generation [Costa2007, Brumley2008]
Our Contributions
Computers analyzing programs dynamically at runtime:
• Dynamic Taint Analysis: Is this value affected by user input?
• Forward Symbolic Execution: What input will make execution reach this line of code?
1: Turn English descriptions into an algorithm: operational semantics
2: The algorithm highlights caveats, issues, and unsolved problems that are deceptively hard
Our Contributions (cont'd)
3: Systematize recurring themes in a wealth of previous work
Dynamic Taint Analysis: What values are derived from user input?
1. How it works: example
2. Desired properties
3. Example issue (the paper has many more)
Taint Introduction
x = get_input( )
y = x + 42
…
goto y
Input is tainted.
Δ (variable values): x = 7
τ (variable taint): x = T (tainted)
Rule:
$$\frac{t = \mathrm{IsUntrusted}(src)}{\mathtt{get\_input}(src) \Downarrow t}$$
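A minimal Python sketch of the Taint Introduction rule, assuming a toy interpreter in which Δ and τ are plain dictionaries keyed by variable name. The source names in `UNTRUSTED_SOURCES` and the helpers `is_untrusted` and `get_input` are hypothetical stand-ins for IsUntrusted and get_input; they are not taken from the talk.

```python
# Sketch of taint introduction (all helper names are hypothetical).
# delta models Δ (variable -> value); tau models τ (variable -> tainted?).

UNTRUSTED_SOURCES = {"stdin", "network", "argv"}  # hypothetical policy


def is_untrusted(src: str) -> bool:
    """IsUntrusted(src): True if values read from src should be tainted."""
    return src in UNTRUSTED_SOURCES


def get_input(src: str, value: int) -> tuple:
    """Evaluate get_input(src), returning (value, t) with t = IsUntrusted(src).

    `value` stands in for whatever the environment actually supplies.
    """
    return value, is_untrusted(src)


delta = {}  # Δ
tau = {}    # τ

# x = get_input(stdin), where the user happens to supply 7
delta["x"], tau["x"] = get_input("stdin", 7)
print(delta, tau)  # {'x': 7} {'x': True}
```

Keeping the taint bit next to, but separate from, the value mirrors the slide's two tables (Δ and τ).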
Taint Propagation
x = get_input( )
y = x + 42
…
goto y
Data derived from user input is tainted.
Δ: x = 7, y = 49
τ: x = T, y = T
Rule:
$$\mathrm{BinOp}\quad \frac{t_1 = \tau[x_1] \qquad t_2 = \tau[x_2]}{x_1 + x_2 \Downarrow t_1 \lor t_2}$$
Here τ[x] = T and the constant 42 is untainted, so y = x + 42 is tainted.
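The BinOp rule can be sketched in Python as follows, assuming the same dictionary-based Δ and τ and assuming integer constants are untainted (consistent with y = x + 42 being tainted only because x is). The function names and the small operator table are hypothetical.

```python
import operator

# Binary operators supported by this toy interpreter (hypothetical subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_operand(e, delta, tau):
    """A variable name reads Δ and τ; an integer constant is untainted."""
    if isinstance(e, str):
        return delta[e], tau[e]
    return e, False


def eval_binop(op, e1, e2, delta, tau):
    """BinOp: the value is e1 op e2, the taint is t1 or t2."""
    v1, t1 = eval_operand(e1, delta, tau)
    v2, t2 = eval_operand(e2, delta, tau)
    return OPS[op](v1, v2), t1 or t2


delta = {"x": 7}
tau = {"x": True}

# y = x + 42
delta["y"], tau["y"] = eval_binop("+", "x", 42, delta, tau)
print(delta["y"], tau["y"])  # 49 True
```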
Taint Checking
x = get_input( )
y = x + 42
…
goto y   (policy violation detected)
Δ: x = 7, y = 49
τ: x = T, y = T
Policy (must be true to execute the goto):
$$P_{\mathrm{goto}}(t_a) = \lnot t_a$$
Here t_a = τ[y] = T, so P_goto is false and the jump is flagged.
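A small sketch of the check, assuming the interpreter consults P_goto before every indirect jump and raises an exception when it fails. `TaintPolicyViolation`, `p_goto`, and `exec_goto` are hypothetical names.

```python
class TaintPolicyViolation(Exception):
    """Raised when a taint policy check fails (hypothetical name)."""


def p_goto(t_a: bool) -> bool:
    """P_goto(t_a) = not t_a: the jump target must be untainted."""
    return not t_a


def exec_goto(target, delta, tau):
    """Check P_goto before jumping; return the concrete jump target."""
    if not p_goto(tau[target]):
        raise TaintPolicyViolation(f"goto {target}: target {delta[target]} is tainted")
    return delta[target]


delta = {"x": 7, "y": 49}
tau = {"x": True, "y": True}

try:
    exec_goto("y", delta, tau)
except TaintPolicyViolation as err:
    print("policy violation detected:", err)
```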
Real Use: Exploit Detection
Toy program:
x = get_input( )
y = …
…
goto y
Real program:
…
strcpy(buffer, argv[1]);
…
return;   (jumping to an overwritten return address)
In both cases user input controls where the program jumps: the target (y, or the saved return address) is derived from input.
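A toy model of the real-use case, assuming a stack frame made of a small buffer followed by one cell holding the saved return address, with one whole value per cell. The frame layout, `BUF_SIZE`, `SAVED_RET`, and `run_frame` are hypothetical simplifications (the NUL terminator and writes beyond this tiny frame are ignored); the point is only that `return` is checked like `goto`.

```python
class TaintPolicyViolation(Exception):
    """Raised when a jump target is tainted (hypothetical name)."""


BUF_SIZE = 4          # hypothetical tiny stack buffer
SAVED_RET = 0x401000  # hypothetical saved return address


def run_frame(arg: bytes) -> int:
    """Model strcpy(buffer, argv[1]); return; with per-cell taint tracking.

    Assumed frame layout: BUF_SIZE buffer cells, then one cell holding the
    saved return address. Each cell holds a whole value.
    """
    mem = [0] * BUF_SIZE + [SAVED_RET]
    taint = [False] * (BUF_SIZE + 1)

    # strcpy: copies without a bounds check; argv bytes are tainted.
    # Simplification: the NUL terminator and writes past this frame are ignored.
    for i, byte in enumerate(arg[: len(mem)]):
        mem[i] = byte
        taint[i] = True

    # return: jump to the saved return address, applying P_goto first.
    ret_slot = BUF_SIZE
    if taint[ret_slot]:
        raise TaintPolicyViolation(f"return to tainted address {mem[ret_slot]:#x}")
    return mem[ret_slot]


print(hex(run_frame(b"abc")))  # 0x401000
```

A short argument leaves the return address untainted; an argument longer than the buffer taints it, and the check fires at `return`.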
Memory Load
Variables: Δ (values) and τ (taint)
  x: value 7, tainted (T)
Memory: μ (values) and τ_μ (taint)
  address 7: value 42, untainted (F)
Problem: Memory Addresses
x = get_input( )
y = load( x )
…
goto y
Δ: x = 7
μ: address 7 holds 42
τ_μ: address 7 = F
All values derived from user input are tainted?? The address x is tainted, but the memory cell it points to is not: should y be tainted?
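The state from the slides can be written down directly, assuming dictionaries for Δ, τ, μ, and τ_μ. The sketch shows that a load has two taint bits available (one for the address, one for the stored value) and that here they disagree, so a policy must decide how to combine them.

```python
# Toy machine state, using the values shown on the slides.
delta = {"x": 7}      # Δ: variable -> value
tau = {"x": True}     # τ: x came from get_input(), so it is tainted
mu = {7: 42}          # μ: address -> value
tau_mu = {7: False}   # τ_μ: address -> tainted?

# y = load(x)
addr = delta["x"]
value = mu[addr]
address_taint = tau["x"]   # was the address derived from input?
cell_taint = tau_mu[addr]  # was the stored value derived from input?

print(value, address_taint, cell_taint)  # 42 True False
```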
Policy 1: Taint depends only on the memory cell
x = get_input( )
y = load( x )
…
goto y
Δ: x = 7
μ: address 7 holds 42
τ_μ: address 7 = F
The jump target could be any untainted memory cell value.
Undertainting: failing to identify tainted values, e.g., missing exploits.
Taint propagation rule:
$$\mathrm{Load}\quad \frac{v = \Delta[x] \qquad t = \tau_\mu[v]}{\mathtt{load}(x) \Downarrow t}$$
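A sketch of the Policy 1 Load rule on the slide's state, assuming the same dictionary representation. `load_policy1` and `exec_goto` are hypothetical names; the example shows the undertainting: the user chose the address, yet the goto check passes.

```python
class TaintPolicyViolation(Exception):
    """Raised when a jump target is tainted (hypothetical name)."""


def load_policy1(x, delta, mu, tau_mu):
    """Policy 1 Load rule: v = Δ[x], t = τ_μ[v]. The taint of x is ignored."""
    v = delta[x]
    return mu[v], tau_mu[v]


def exec_goto(target, tainted):
    """P_goto: refuse to jump to a tainted target."""
    if tainted:
        raise TaintPolicyViolation(f"goto {target}: tainted target")
    return target


delta = {"x": 7}     # Δ
tau = {"x": True}    # τ: x = get_input() is tainted, but Policy 1 never reads it
mu = {7: 42}         # μ
tau_mu = {7: False}  # τ_μ

y, t_y = load_policy1("x", delta, mu, tau_mu)  # y = load(x)
print(y, t_y)             # 42 False
print(exec_goto(y, t_y))  # 42: the check passes although the user chose x
```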
Policy 2: If either the address or the memory cell is tainted, then the value is tainted
x = get_input( )
y = load( jmp_table + x % 2 )
…
goto y   (policy violation?)
jmp_table: printa, printb
The memory address expression is tainted, so y is tainted.
Overtainting: unaffected values are tainted, e.g., exploits reported on safe inputs (y can only ever be printa or printb).
Taint propagation rule:
$$\mathrm{Load}\quad \frac{v = \Delta[x] \qquad t = \tau_\mu[v] \qquad t_a = \tau[x]}{\mathtt{load}(x) \Downarrow t \lor t_a}$$
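The jump-table example can be sketched to compare both policies, assuming an untainted two-entry table at a hypothetical base address with hypothetical code addresses for printa and printb. For every input the loaded target is one of the two safe entries, yet Policy 2 marks it tainted (overtainting), while Policy 1 does not.

```python
JMP_TABLE = 0x100                # hypothetical base address of jmp_table
PRINTA, PRINTB = 0x4000, 0x4010  # hypothetical addresses of printa / printb

mu = {JMP_TABLE: PRINTA, JMP_TABLE + 1: PRINTB}    # μ
tau_mu = {JMP_TABLE: False, JMP_TABLE + 1: False}  # τ_μ: the table is untainted


def load(addr, t_a, policy):
    """Return (value, taint) of load(addr); t_a is the address expression's taint."""
    v, t = mu[addr], tau_mu[addr]
    if policy == 2:
        t = t or t_a  # Policy 2: t or t_a
    return v, t       # Policy 1: t only


for user_x in range(4):            # any user input
    addr = JMP_TABLE + user_x % 2  # address expression derived from x
    t_a = True                     # ... so the address is tainted
    y1, t1 = load(addr, t_a, policy=1)
    y2, t2 = load(addr, t_a, policy=2)
    assert y1 == y2 and y2 in (PRINTA, PRINTB)  # only safe targets are reachable
    print(user_x, hex(y2), "policy 1 tainted:", t1, "policy 2 tainted:", t2)
```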
Research Challenge
The state of the art is not perfect for all programs.
• Overtainting: the policy may wrongly detect taint.
• Undertainting: the policy may miss taint.
Forward Symbolic Execution: What input will make execution reach this line of code?
• How it works: example
• Inherent problems of symbolic execution
• Proposed solutions
The Challenge
#include <string.h>

int packet_len(int header, char *packet) {
    char buf[2048] = "…";
    if (header < 0)
        return 0;
    if (header == 0x12345678)
        strcpy(buf, packet);  /* no bounds check: may overflow buf */
    return strlen(buf);
}
header has 2^32 possible values, but only header == 0x12345678 reaches strcpy.
Forward Symbolic Execution: What input will make execution reach this line of code?
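A back-of-the-envelope check of why blind random testing struggles here, assuming each trial draws header uniformly at random from all 2^32 values (real fuzzers are not uniform, so this is only an illustration). `p_hit` is a hypothetical helper computing 1 - (1 - 2^-32)^trials.

```python
import math

SPACE = 2 ** 32      # possible values of the 32-bit header
TARGET = 0x12345678  # the only value that reaches strcpy


def p_hit(trials: int) -> float:
    """P(at least one of `trials` uniform random headers equals TARGET)."""
    # 1 - (1 - 1/SPACE)**trials, computed with log1p/expm1 to avoid rounding to 0
    return -math.expm1(trials * math.log1p(-1 / SPACE))


print(f"target header = {TARGET:#x}, one of {SPACE} values")
for trials in (1, 10**6, 10**9, SPACE):
    print(f"{trials:>10} trials: {p_hit(trials):.3g}")
```

Even 2^32 random trials only reach the strcpy with probability about 1 - 1/e, whereas the symbolic approach below derives the required value directly from the branch condition.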
A Simple Example
header is symbolic: it can have any value.
packet_len(int header, …)
if (header < 0): the interpreter forks.
  true:  return 0;   path predicate: header < 0
  false: header ≥ 0; then if (header == 0x12345678): the interpreter forks again.
    true:  strcpy(buf, packet);   path predicate: header ≥ 0 ∧ header == 0x12345678
    false: return strlen(buf);    path predicate: header ≥ 0 ∧ header != 0x12345678
What input will make execution reach this line of code? Any header satisfying the strcpy path predicate, i.e., header = 0x12345678.
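A minimal sketch of forward symbolic execution on packet_len, assuming a hand-written tree encoding of its two branches (not a parser) and a toy solver that only handles conjunctions of comparisons on a single 32-bit integer. Real tools hand the path predicate to an SMT solver instead; `explore`, `solve`, and `PACKET_LEN` are hypothetical names.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # header is a 32-bit int

NEGATE = {"<": ">=", ">=": "<", "==": "!=", "!=": "=="}

# packet_len's control flow as a tiny tree (hand-written encoding).
# ("if", (op, const), then, else) compares the symbolic header with const.
PACKET_LEN = (
    "if", ("<", 0),
    ("stmt", "return 0"),
    ("if", ("==", 0x12345678),
        ("stmt", "strcpy(buf, packet)"),
        ("stmt", "return strlen(buf)")),
)


def explore(node, pc=()):
    """Fork at every branch; yield (statement, path predicate) per path."""
    if node[0] == "stmt":
        yield node[1], list(pc)
        return
    _, (op, c), then_branch, else_branch = node
    yield from explore(then_branch, pc + ((op, c),))
    yield from explore(else_branch, pc + ((NEGATE[op], c),))


def solve(pc):
    """Toy solver for a conjunction of comparisons on one int; None if unsat."""
    lo, hi, excluded = INT_MIN, INT_MAX, set()
    for op, c in pc:
        if op == "<":
            hi = min(hi, c - 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "==":
            lo, hi = max(lo, c), min(hi, c)
        else:  # "!="
            excluded.add(c)
    v = lo
    while v <= hi and v in excluded:
        v += 1
    return v if v <= hi else None


for stmt, pc in explore(PACKET_LEN):
    print(f"{stmt:22} {pc} -> header = {solve(pc)}")
```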
One Problem: Exponential Blowup Due to Branches
Each branch (Branch 1, Branch 2, Branch 3, …) can fork every interpreter in two, so the number of interpreters/formulas is exponential in the number of branches: up to 2^n after n branches.
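The blowup can be reproduced with a few lines of Python, assuming every branch depends on symbolic input so that both sides are always feasible. `fork_all` is a hypothetical helper that forks every live interpreter at each of n sequential branches.

```python
def fork_all(n_branches):
    """Return one path predicate per interpreter after n two-way branches.

    Assumes both sides of every branch are feasible, so every live
    interpreter forks in two.
    """
    interpreters = [()]  # one interpreter with an empty path predicate
    for i in range(n_branches):
        interpreters = [pc + ((f"branch {i + 1}", taken),)
                        for pc in interpreters
                        for taken in (True, False)]
    return interpreters


for n in (1, 2, 3, 10):
    print(n, len(fork_all(n)))  # 2, 4, 8, 1024 interpreters respectively
```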
Path Selection Heuristics
Which node of the symbolic execution tree should be explored next?
• Depth-first search (bounded), random search [Cadar2008], …
• Concolic testing [Sen2005, Godefroid2008]
However, these are heuristics. In the worst case, all of them create a number of formulas exponential in the tree height.
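A sketch of a worklist-based explorer with a pluggable selection strategy, assuming a complete binary symbolic execution tree in which every path is feasible. The "random" strategy here simply picks any pending state uniformly; it is a simplification and not KLEE's exact random-path strategy, and concolic testing is not modeled. `explore_tree` and its parameters are hypothetical.

```python
import random


def explore_tree(height, strategy="dfs", max_depth=None, budget=None, seed=0):
    """Explore a complete binary symbolic execution tree of the given height.

    A node is the tuple of branch decisions taken so far. Returns the
    complete paths (leaves) reached, i.e. the formulas a solver would get.
    """
    rng = random.Random(seed)
    worklist = [()]
    leaves = []
    steps = 0
    while worklist and (budget is None or steps < budget):
        if strategy == "dfs":
            node = worklist.pop()  # newest pending state first
        elif strategy == "random":
            node = worklist.pop(rng.randrange(len(worklist)))  # any pending state
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        steps += 1
        if len(node) == height:
            leaves.append(node)
        elif max_depth is not None and len(node) >= max_depth:
            continue  # depth bound hit: abandon this path
        else:
            worklist.extend([node + (False,), node + (True,)])
    return leaves


print(len(explore_tree(10, "dfs")), len(explore_tree(10, "random")))  # 1024 1024
```

Without a budget or bound, both strategies end up visiting all 2^10 leaves; the heuristics only change the order, which is the slide's point about the worst case.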
Symbolic Execution is not Easy
• Branching: exponential number of interpreters/formulas
• Substitution: exponentially sized formulas, e.g., s + s + s + s + s + s + s + s == 42
• Solving a formula is NP-complete!
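To see how substitution doubles formula size, consider a hypothetical program x0 = s; x1 = x0 + x0; …; xk = x(k-1) + x(k-1) with path predicate xk == 42. The sketch below builds the predicate term by textual substitution (2^k copies of s) and, for contrast, a version that names each intermediate result, in the spirit of SMT-LIB `let` bindings, which stays linear in k. Both helper names are hypothetical.

```python
def substituted(k):
    """Term for x_k after k doublings, built by substituting definitions."""
    term = "s"
    for _ in range(k):
        term = f"({term} + {term})"
    return term


def shared(k):
    """Same computation with named intermediate results (let-style sharing)."""
    return ["x0 = s"] + [f"x{i} = x{i - 1} + x{i - 1}" for i in range(1, k + 1)]


print(substituted(3) + " == 42")
print(shared(3))
```

With k = 3 the substituted predicate contains eight copies of s, while the shared form needs only four definitions.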
Other Important Issues
• Formalization
• More complex policies
Π = (s + s + s + s + s + s + s + s) == 42
Conclusion
• Dynamic taint analysis and forward symbolic execution are used extensively in the literature
  – The formal algorithm, and what is done at each possible step of execution, is often not emphasized
• We provided a formal definition and summarized
  – Critical issues
  – State-of-the-art solutions
  – Common tradeoffs
Thank You!
thanassis@cmu.edu
Questions?