The Low-Level Bounded Model Checker LLBMC
A Precise Memory Model for LLBMC
Carsten Sinz, Stephan Falke, Florian Merz | October 7, 2010
Verification Meets Algorithm Engineering
www.kit.edu
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Motivation

Buffer overflows are still the number one issue as reported in OS vendor advisories. (...) Integer overflows, barely in the top ten overall in the past few years, are number two for OS vendor advisories (in 2006), behind buffer overflows

Use-after-free vulnerability in Microsoft Internet Explorer (...) allows remote attackers to execute arbitrary code by accessing a pointer associated with a deleted object (...)

Introduction | Software Bounded Model Checking | Logical Encoding | Demonstration | Future Work
October 7, 2010 | 2/19 | Carsten Sinz, Stephan Falke, Florian Merz – LLBMC

What is LLBMC?

LLBMC = Low-Level (Software) Bounded Model Checking
- Low-Level: operates not on source code but on “abstract assembler”
- Software: programs written in C/C++/Objective-C and compiled into “abstract assembler”
- Bounded: restricted number of nested function calls and loop iterations
- Model Checking: bit-precise static analysis

Properties checked:
- Built-in properties: invalid memory accesses, use-after-free, double free, range overflow, division by zero, ...
- User-supplied properties: assert statements

Focus on memory properties

Software Bounded Model Checking

- Programs typically deal with unbounded data structures such as linked lists, trees, etc.
- Property checking is undecidable for these programs
- Bugs manifest themselves in (typically short) finite runs of the program

Software bounded model checking: analyze only bounded program runs
- Restrict the number of nested function calls and inline functions
- Restrict the number of loop iterations and unroll loops
- Data structures are then bounded as well
- Property checking becomes decidable by a logical encoding into SAT or SMT

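The effect of restricting and unrolling loops can be sketched on a small C function. This is only an illustration of the idea, not LLBMC's actual transformation: the function names, the bound of 3, and the `bound_exceeded` flag are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original fragment: the number of iterations depends on the input n. */
int sum_loop(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* The same fragment unrolled with a (hypothetical) bound of 3.
 * *bound_exceeded is set when a fourth iteration would be needed,
 * i.e. when this run is not covered by the bounded analysis. */
int sum_unrolled_3(const int *a, size_t n, bool *bound_exceeded) {
    assert(bound_exceeded != NULL);
    int s = 0;
    size_t i = 0;
    *bound_exceeded = false;
    if (i < n) {
        s += a[i]; i++;
        if (i < n) {
            s += a[i]; i++;
            if (i < n) {
                s += a[i]; i++;
                if (i < n) {
                    *bound_exceeded = true;   /* bound too small for this run */
                }
            }
        }
    }
    return s;
}
```

The unrolled version contains no loop, so it touches at most three array elements. For inputs with n up to 3 it agrees with the original, and for larger n the flag reports that the bound was insufficient.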
Specifying and Verifying Properties

Properties are formalized using assume and assert statements
- assume states a pre-condition that is assumed to hold at its location
- assert states a post-condition that is to be checked at its location

The program Prog is correct if
    $\mathit{Prog} \wedge \bigwedge \mathit{assume} \Rightarrow \bigwedge \mathit{assert}$
is valid

In software bounded model checking, this can be decided using a logical encoding and a SAT or SMT solver

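The validity condition can be made concrete on a tiny example. The sketch below uses exhaustive enumeration over all 8-bit inputs as a stand-in for the SAT or SMT solver. The program fragment, the chosen assume and assert conditions, and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Prog: the fragment under analysis, y = x + 1 on 8-bit values. */
static uint8_t prog(uint8_t x) {
    return (uint8_t)(x + 1u);   /* wraps around to 0 for x == 255 */
}

/* Checks (Prog AND assume) => assert for every 8-bit input.
 * With use_assume == false the pre-condition is dropped.
 * Returns true if the implication is valid; otherwise stores a
 * counterexample input in *cex and returns false. */
bool check_valid(bool use_assume, uint8_t *cex) {
    assert(cex != NULL);
    for (unsigned v = 0; v <= UINT8_MAX; v++) {
        uint8_t x = (uint8_t)v;
        bool assume_holds = !use_assume || x < UINT8_MAX;  /* assume(x < 255) */
        uint8_t y = prog(x);
        bool assert_holds = y > x;                         /* assert(y > x)   */
        if (assume_holds && !assert_holds) {
            *cex = x;
            return false;
        }
    }
    return true;
}
```

With the pre-condition in place the implication holds for all inputs. Without it, the enumeration finds the input 255, where the bit-precise addition wraps around and the post-condition fails.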
Low-Level Bounded Model Checking

- Fully supporting real-life programming languages is cumbersome
- Particularly true for C/C++/Objective-C due to their complex (sometimes ambiguous) semantics

Key idea: do not operate on the source code directly; use a compiler intermediate language (“abstract assembler”) instead
- Well-defined, simple semantics make the logical encoding easier
- Closer to the code that is actually run
- Compiler optimizations etc. come “for free”

LLBMC uses the LLVM intermediate language and compiler infrastructure
After the logical encoding, LLBMC uses the SMT solver Boolector (theory of bitvectors and arrays)

Overview of the LLBMC Approach

[Figure: processing pipeline. Program Source Code → LLVM Compiler Frontend → Abstract Assembler Representation → Loop Unrolling / Function Inlining → Transformed Assembler Representation → Logical Encoding → Bit-Vector Logic with Arrays → SMT Solver → Verification Result / Error Trace. A Memory Model component is shown alongside the pipeline.]

Memory model captures the semantics of memory accesses

Example

C source:

    #include <stdlib.h>  /* malloc, free */

    struct S {
        int x;
        struct S *n;
    };

    int main(int argc, char *argv[]) {
        struct S *p, *q;

        p = malloc(sizeof(struct S));
        p->x = 5;
        p->n = NULL;

        if (argc > 1) {
            q = malloc(sizeof(struct S));
            q->x = 5;
            q->n = p;
        } else {
            q = p;
        }

        __llbmc_assert(p->x + q->x == 10);

        free(q);
        free(p);

        return 0;
    }

LLVM intermediate representation:

    %struct.S = type { i32, %struct.S* }

    define i32 @main(i32 %argc, i8** %argv) {
    entry:
      %0 = call i8* @malloc(i32 8)
      %p = bitcast i8* %0 to %struct.S*
      %p.x = getelementptr %struct.S* %p, i32 0, i32 0
      store i32 5, i32* %p.x
      %p.n = getelementptr %struct.S* %p, i32 0, i32 1
      store %struct.S* null, %struct.S** %p.n
      %c.1 = icmp sgt i32 %argc, 1
      br i1 %c.1, label %if.then, label %if.end

    if.then:
      %1 = call i8* @malloc(i32 8)
      %q = bitcast i8* %1 to %struct.S*
      %q.x = getelementptr %struct.S* %q, i32 0, i32 0
      store i32 5, i32* %q.x
      %q.n = getelementptr %struct.S* %q, i32 0, i32 1
      store %struct.S* %p, %struct.S** %q.n
      br label %if.end

    if.end:
      %q.0 = phi %struct.S* [ %q, %if.then ], [ %p, %entry ]
      %q.0.x = getelementptr %struct.S* %q.0, i32 0, i32 0
      %2 = load i32* %p.x
      %3 = load i32* %q.0.x
      %4 = add i32 %2, %3
      %c.2 = icmp eq i32 %4, 10
      %5 = zext i1 %c.2 to i32
      call void @__llbmc_assert(i32 %5)
      %6 = bitcast %struct.S* %q.0 to i8*
      call void @free(i8* %6)
      %7 = bitcast %struct.S* %p to i8*
      call void @free(i8* %7)
      ret i32 0
    }

Encoding of phi-Instructions

The abstract assembler contains phi-instructions of the form
    $i' = \mathrm{phi}\ [i_1, bb_1], \ldots, [i_n, bb_n]$
where $bb_1, \ldots, bb_n$ are basic blocks

For the logical encoding, $bb_j$ is replaced by
    $c_{exec}(bb_j) \wedge t(bb_j, b)$
where
- $c_{exec}(bb_j)$ is $bb_j$'s execution condition
- $b$ is the basic block containing the phi-instruction
- $t(bb_j, b)$ is the condition under which control passes from $bb_j$ to $b$

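One way to picture this replacement is to evaluate a phi-instruction as a guarded selection, where each incoming value carries the condition that took the place of its basic block. The sketch below is an executable illustration only: the struct, the function names, and the use of plain integers in place of pointer values are assumptions, and a real encoding would produce a formula for the solver instead of computing a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One incoming edge of a phi-instruction after the replacement:
 * the block bb_j has become the condition c_exec(bb_j) AND t(bb_j, b). */
struct phi_edge {
    int  value;    /* i_j */
    bool c_exec;   /* execution condition of bb_j */
    bool t;        /* condition for passing control from bb_j to b */
};

/* Returns the value of the first edge whose guard holds. At least one
 * guard is expected to hold whenever block b is executed. */
int phi_select(const struct phi_edge *edges, size_t n) {
    for (size_t j = 0; j < n; j++) {
        if (edges[j].c_exec && edges[j].t) {
            return edges[j].value;
        }
    }
    assert(!"no incoming edge taken");
    return 0;
}

/* Guards for a phi with incoming blocks if.then and entry, as in
 *   %q.0 = phi [ %q, %if.then ], [ %p, %entry ]
 * where c1 stands for the branch condition (argc > 1). */
int example_q0(bool c1, int q, int p) {
    struct phi_edge edges[2] = {
        { q, c1,   true },  /* if.then runs iff c1; its jump is unconditional */
        { p, true, !c1  },  /* entry always runs; it jumps to if.end iff !c1  */
    };
    return phi_select(edges, 2);
}
```

For example, `example_q0(true, 11, 22)` selects the value coming from if.then, and `example_q0(false, 11, 22)` selects the value coming from entry.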
Elimination of Branches

The memory can be modelled as an array of bytes
The memory is put into SSA form by introducing an abstract type memstate:
- Memory is accessed using read-instructions
- Memory is changed using write-, malloc-, and free-instructions
- phi-instructions for memory states are introduced

With the encoding of phi-instructions and the conversion of the memory to SSA form, branches can be eliminated

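The idea of an SSA form for memory can be sketched by treating each memory state as an immutable value. The example below is a simplified illustration under several assumptions: the 16-byte address space, the type layout, and all function names are hypothetical, and malloc and free are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 16   /* tiny hypothetical address space */

/* A memory state is a value: an array of bytes that is never updated in place. */
typedef struct { uint8_t bytes[MEM_SIZE]; } memstate;

/* read: accesses memory without producing a new state. */
uint8_t mem_read(memstate m, size_t addr) {
    assert(addr < MEM_SIZE);
    return m.bytes[addr];
}

/* write: yields a new state; the caller's old state stays unchanged. */
memstate mem_write(memstate m, size_t addr, uint8_t v) {
    assert(addr < MEM_SIZE);
    m.bytes[addr] = v;   /* m is a by-value copy of the caller's state */
    return m;
}

/* phi for memory states: selects a state by the branch condition. */
memstate mem_phi(bool cond, memstate then_state, memstate else_state) {
    return cond ? then_state : else_state;
}

/* Branch-free form of:  if (c) mem[0] = 1; else mem[1] = 2;  return mem[0]; */
uint8_t branch_free(bool c) {
    memstate m0 = { {0} };                /* initial state, all bytes zero */
    memstate m1 = mem_write(m0, 0, 1);    /* effect of the then-branch */
    memstate m2 = mem_write(m0, 1, 2);    /* effect of the else-branch */
    memstate m3 = mem_phi(c, m1, m2);     /* merge at the join point */
    return mem_read(m3, 0);
}
```

Both branch effects are computed unconditionally from the same predecessor state, and only the phi on memory states depends on the condition. This is what allows the control-flow branch itself to disappear.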