Simplified data-flow analysis with Datalog. by François Gauthier
It all began with a qualifying exam…
Among the heap of papers I had to read, there was this one: "Cloning-based context-sensitive pointer alias analysis using binary decision diagrams" by J. Whaley and M. S. Lam, presented at the Programming Language Design and Implementation (PLDI) conference.
… and a points-to analysis
The authors claimed that the following 4 lines compute a basic points-to analysis:
  vP(v, h) :− vP0(v, h).
  vP(v1, h) :− assign(v1, v2), vP(v2, h).
  hP(h1, f, h2) :− store(v1, f, v2), vP(v1, h1), vP(v2, h2).
  vP(v2, h2) :− load(v1, f, v2), vP(v1, h1), hP(h1, f, h2).
Really?
Yes!
Datalog – Basics
• Datalog is a logic programming language that is a subset of Prolog.
• Datalog operates on facts and rules.
• A fact is declared like this:
  – parent("Bill", "Mary").
• It can be read as:
  – Bill is the parent of Mary, or
  – Mary is the parent of Bill.
• The implementer chooses the meaning. The examples that follow read parent(?X,?Y) as "Y is a parent of X".
Datalog – Basics (cont.)
• A Datalog program consists of a set of rules that derive new facts from existing ones.
• A rule consists of two parts, a head and a body:
  – ancestor(?X,?Y) :- parent(?X,?Y).
  – ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).
• The :- symbol separates the head from the body.
• Commas in the body stand for AND.
• ? indicates a variable.
Datalog – Understanding rules
• The following rule:
  – ancestor(?X,?Y) :- parent(?X,?Y).
  reads as: Y is an ancestor of X if Y is a parent of X.
• Similarly, the following rule:
  – ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).
  reads as: Y is an ancestor of X if Z is a parent of X and Y is an ancestor of Z.
Ancestors – Initial facts
  parent("C", "D").  parent("Y", "D").
  parent("C", "Z").  parent("Y", "Z").
  parent("A", "B").  parent("A", "C").
  parent("W", "Y").  parent("W", "X").
[Figure: family tree with D and Z at the top, X, B, C and Y in the middle, and A and W at the bottom.]
Ancestors – Rules and queries
• Recall the rules of our ancestors program:
  – ancestor(?X,?Y) :- parent(?X,?Y).
  – ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).
• These rules are evaluated iteratively until no new ancestor facts can be derived (a fixpoint).
• A query in Datalog is expressed like this:
  – ?-ancestor("W", ?Ancestor).
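To make the fixpoint concrete, here is a minimal Python sketch of a naive bottom-up evaluation of the two ancestor rules over the initial facts. The names PARENT, ancestors and query are illustrative and not part of any Datalog engine; a real engine would use a more efficient strategy than re-joining everything on each round.

```python
# Facts from the "Initial facts" slide, read as parent(child, parent).
PARENT = {
    ("C", "D"), ("Y", "D"), ("C", "Z"), ("Y", "Z"),
    ("A", "B"), ("A", "C"), ("W", "Y"), ("W", "X"),
}


def ancestors(parent):
    """Naive bottom-up evaluation of the two ancestor rules."""
    # ancestor(?X,?Y) :- parent(?X,?Y).
    ancestor = set(parent)
    while True:
        # ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).
        derived = {
            (x, y)
            for (x, z) in parent
            for (z2, y) in ancestor
            if z == z2  # the shared variable ?Z joins the two body atoms
        }
        if derived <= ancestor:  # nothing new was derived: fixpoint
            return ancestor
        ancestor |= derived


def query(relation, x):
    """Rough equivalent of ?-ancestor(x, ?Ancestor)."""
    return {y for (x2, y) in relation if x2 == x}


print(sorted(query(ancestors(PARENT), "W")))  # ['D', 'X', 'Y', 'Z']
```

The query for "W" returns its parents X and Y plus Y's parents D and Z.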
What about data-flow analysis?
Java code:
  public String name(String type) {
  1:  String a = "Anonymous";
  2:  if (type.equals("cat"))
  3:    a = "Garfield";
  4:  else if (type.equals("dog"))
  5:    a = "Snoopy";
  6:  else
  7:    a = "Blob";
  8:  return a;
  }
[Figure: control-flow graph over lines 1, 2, 3, 4, 5, 7 and 8.]
Reaching definitions – Initial facts
Java code                               Initial facts
  public String name(String type) {
  1:  String a = "Anonymous";           assign(1, "a").
  2:  if (type.equals("cat"))
  3:    a = "Garfield";                 assign(3, "a").
  4:  else if (type.equals("dog"))
  5:    a = "Snoopy";                   assign(5, "a").
  6:  else
  7:    a = "Blob";                     assign(7, "a").
  8:  return a;
  }
Reaching definitions – Initial facts (cont.)
Initial facts, one per edge of the control-flow graph:
  follows(1,2).
  follows(2,3).  follows(2,4).
  follows(4,5).  follows(4,7).
  follows(3,8).  follows(5,8).  follows(7,8).
[Figure: control-flow graph over lines 1, 2, 3, 4, 5, 7 and 8.]
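Reaching definitions – Rules
  reach(?i,?x,?j) :- assign(?i,?x), follows(?i,?j).
  reach(?d,?x,?j) :- reach(?d,?x,?i), follows(?i,?j), !assign(?i,?x).
Here reach(?d,?x,?j) means that the definition of ?x at line ?d reaches the entry of line ?j. The negated literal stops a definition from propagating through a line ?i that redefines ?x.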
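The reaching-definitions rules can be tried out on the example method with a small Python sketch. It assumes that reach(d, x, j) means "the definition of x at line d reaches the entry of line j", so the kill test is applied to the line being left (i). The names ASSIGN, FOLLOWS and reaching_definitions are illustrative only.

```python
# Initial facts taken from the slides.
ASSIGN = {(1, "a"), (3, "a"), (5, "a"), (7, "a")}
FOLLOWS = {(1, 2), (2, 3), (2, 4), (4, 5), (4, 7), (3, 8), (5, 8), (7, 8)}


def reaching_definitions(assign, follows):
    """Naive fixpoint evaluation of the two reach rules."""
    # reach(?i,?x,?j) :- assign(?i,?x), follows(?i,?j).
    reach = {
        (i, x, j)
        for (i, x) in assign
        for (i2, j) in follows
        if i == i2
    }
    while True:
        # reach(?d,?x,?j) :- reach(?d,?x,?i), follows(?i,?j), !assign(?i,?x).
        derived = {
            (d, x, j)
            for (d, x, i) in reach
            for (i2, j) in follows
            if i == i2 and (i, x) not in assign  # line i must not redefine x
        }
        if derived <= reach:  # fixpoint
            return reach
        reach |= derived


reach = reaching_definitions(ASSIGN, FOLLOWS)
# Definitions of "a" that reach the return statement on line 8.
print(sorted(d for (d, x, j) in reach if x == "a" and j == 8))  # [3, 5, 7]
```

The initial definition on line 1 never reaches line 8 because every branch overwrites a before the return.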
Back to points-to…
Java code:
  o1: Dog snoopy = new Dog();
  o2: Dog odie = new Dog();
  o3: Food f1 = new Food();
      snoopy.food = f1;
      Food f2 = snoopy.food;
      odie.food = f2;
      Dog myDog = odie;
[Figure: points-to graph with variables snoopy, myDog, odie, f1 and f2, heap objects o1, o2 and o3, and two edges labeled food.]
Initial facts – vPointsTo0
Java code                          Facts
  o1: Dog snoopy = new Dog();      vPointsTo0("snoopy","o1").
  o2: Dog odie = new Dog();        vPointsTo0("odie","o2").
  o3: Food f1 = new Food();        vPointsTo0("f1","o3").
Initial facts – store
Java code                          Facts
  snoopy.food = f1;                store("snoopy","food","f1").
  odie.food = f2;                  store("odie","food","f2").
Initial facts – load
Java code                          Facts
  Food f2 = snoopy.food;           load("snoopy","food","f2").
Initial facts – assign
Java code                          Facts
  Dog myDog = odie;                assign("myDog","odie").
Initial facts – putting it all together
Java code                          Facts
  o1: Dog snoopy = new Dog();      vPointsTo0("snoopy","o1").
  o2: Dog odie = new Dog();        vPointsTo0("odie","o2").
  o3: Food f1 = new Food();        vPointsTo0("f1","o3").
      snoopy.food = f1;            store("snoopy","food","f1").
      Food f2 = snoopy.food;       load("snoopy","food","f2").
      odie.food = f2;              store("odie","food","f2").
      Dog myDog = odie;            assign("myDog","odie").
Points-to – Rules
We are interested in finding:
  1. Which heap objects a variable can point to.
  2. Which heap objects a field can point to.
Outputs will be stored in two relations:
  1. vPointsTo(?v, ?o) – Variable v points to object o.
  2. hPointsTo(?o1, ?f, ?o2) – The field f of object o1 points to object o2.
Points-to – Rules (cont.)
Initialization:
  vPointsTo(?v, ?o) :- vPointsTo0(?v, ?o).
Assignments (v1 = v2):
  vPointsTo(?v1, ?o) :- assign(?v1, ?v2), vPointsTo(?v2, ?o).
Points-to – Rules (cont.)
Stores (v1.f = v2):
  hPointsTo(?o1, ?f, ?o2) :- store(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), vPointsTo(?v2, ?o2).
Points-to – Rules (cont.)
Loads (v2 = v1.f):
  vPointsTo(?v2, ?o2) :- load(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), hPointsTo(?o1, ?f, ?o2).
Points-to – Putting all rules together
  vPointsTo(?v, ?o) :- vPointsTo0(?v, ?o).
  vPointsTo(?v1, ?o) :- assign(?v1, ?v2), vPointsTo(?v2, ?o).
  hPointsTo(?o1, ?f, ?o2) :- store(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), vPointsTo(?v2, ?o2).
  vPointsTo(?v2, ?o2) :- load(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), hPointsTo(?o1, ?f, ?o2).
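As a rough illustration, the four rules can be evaluated by hand-written fixpoint iteration in Python over the initial facts of the Dog/Food example. This is only a sketch: the names VPOINTSTO0, STORE, LOAD, ASSIGN and points_to are hypothetical, and the naive loop stands in for whatever strategy a real Datalog engine would use.

```python
# Initial facts for the Dog/Food example from the slides.
VPOINTSTO0 = {("snoopy", "o1"), ("odie", "o2"), ("f1", "o3")}
STORE = {("snoopy", "food", "f1"), ("odie", "food", "f2")}
LOAD = {("snoopy", "food", "f2")}
ASSIGN = {("myDog", "odie")}


def points_to(vpt0, assign, store, load):
    """Naive fixpoint evaluation of the four points-to rules."""
    # vPointsTo(?v, ?o) :- vPointsTo0(?v, ?o).
    vpt = set(vpt0)
    hpt = set()
    while True:
        # vPointsTo(?v1, ?o) :- assign(?v1, ?v2), vPointsTo(?v2, ?o).
        new_vpt = {
            (v1, o)
            for (v1, v2) in assign
            for (v, o) in vpt
            if v == v2
        }
        # hPointsTo(?o1, ?f, ?o2) :- store(?v1, ?f, ?v2),
        #     vPointsTo(?v1, ?o1), vPointsTo(?v2, ?o2).
        new_hpt = {
            (o1, f, o2)
            for (v1, f, v2) in store
            for (va, o1) in vpt if va == v1
            for (vb, o2) in vpt if vb == v2
        }
        # vPointsTo(?v2, ?o2) :- load(?v1, ?f, ?v2),
        #     vPointsTo(?v1, ?o1), hPointsTo(?o1, ?f, ?o2).
        new_vpt |= {
            (v2, o2)
            for (v1, f, v2) in load
            for (va, o1) in vpt if va == v1
            for (ob, fb, o2) in hpt if ob == o1 and fb == f
        }
        if new_vpt <= vpt and new_hpt <= hpt:  # fixpoint
            return vpt, hpt
        vpt |= new_vpt
        hpt |= new_hpt


vpt, hpt = points_to(VPOINTSTO0, ASSIGN, STORE, LOAD)
print(sorted(vpt))
print(sorted(hpt))
```

On these facts the loop needs a few rounds: f2 only gets its target once the store into snoopy.food has been derived, and the store into odie.food only fires after that.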
Application to security
PHP code (numbers in comments follow the annotations on the slide):
  function read($file, $privilege) {     // 3
    if ($privilege)                       // 4
      $handle = fopen($file, "r");        // 5: protected by the 'read' privilege
    else
      error('You cannot read that file');
  }
  ...
  $file = 'prescriptions.txt';
  ...
  $canRd = user_can('read');              // 1
  ...
  read($file, $canRd);                    // 2
Results on Moodle 1.9.5
• Syntactic analysis: 992 security checks detected.
• Intra-procedural, flow-insensitive: 1062 security checks detected.
• Intra-procedural, flow-sensitive: 1063 security checks detected (removed an ambiguity).
• Inter-procedural, flow-insensitive: 1072 security checks detected.
Conclusion
You can find the Datalog programs I developed (both intra- and inter-procedural) in: "Alias-aware propagation of simple pattern-based properties in PHP applications", SCAM 2012.
That’s all folks!