Decision Procedures for String Constraints

  1. Decision Procedures for String Constraints. Ph.D. Proposal. Pieter Hooimeijer, University of Virginia

  2. Motivation (chart with callouts #1 and #2; Mitre Corp. data reported on http://www.attrition.org/)

  4. Motivation “String values have lost their innocence and are being used in many unforeseen contexts.” [Thiemann05]

  7. Motivation: Now what?

  8. Goal: Make string analysis available to a wider class of program analysis tools.

  9. Outline
     • String Constraint Solving
     • Preliminary Results
     • Proposed Research

  10. Outline
     • String Constraint Solving
       – example code
       – definitions
     • Preliminary Results
     • Proposed Research

  12. Example
       // v1 and v2 are user inputs
       if (!ereg('o(pp)+', v1)){exit;}
       if (!ereg('p*q', v2)){exit;}
       v3 = v1 . v2; // concat
       if (v3 != 'oppppq'){exit;}
       magic();

  13. Query: Will this code ever execute magic()?

  14. Example (with the three constraints numbered)
       // v1 and v2 are user inputs
       if (!ereg('o(pp)+', v1)){exit;} // constraint 1
       if (!ereg('p*q', v2)){exit;} // constraint 2
       v3 = v1 . v2; // concat
       if (v3 != 'oppppq'){exit;} // constraint 3
       magic();
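For a target string this short, the query can be answered by brute force. The following is a minimal Python sketch, not the method proposed in the talk. It assumes each pattern is meant to match the whole input, which is the reading the solutions later in the deck suggest; the function name `magic_inputs` is hypothetical.

```python
import re

def magic_inputs(target="oppppq"):
    """Enumerate every (v1, v2) with v1 . v2 == target that passes both checks."""
    solutions = []
    for cut in range(len(target) + 1):
        v1, v2 = target[:cut], target[cut:]
        # Assumption: each pattern must match the whole input string.
        if re.fullmatch(r"o(pp)+", v1) and re.fullmatch(r"p*q", v2):
            solutions.append((v1, v2))
    return solutions

print(magic_inputs())  # [('opp', 'ppq'), ('opppp', 'q')]
```

Any non-empty result means `magic()` is reachable. Enumerating split points only works because the third constraint pins `v3` to a single constant.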

  17. Definitions: String Constraint
       C ::= E ∈ R
           | E ∉ R
       E ::= V
           | E ◦ V
       R : regex
       V : variable

  18. Definitions: Constraint System. S = { C1, ..., Cn } where each Ci ∈ S is a well-formed string constraint.

  19. Definitions: Decision Procedure. D : constraint system → { Satisfiable, Unsatisfiable }

  20. Definitions
     • Soundness: [ D(S) = Sat. ] → S is sat.
     • Completeness: S is sat. → [ D(S) = Sat. ]
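To make the definitions concrete, here is a deliberately naive Python sketch of a procedure D for this constraint language. It is an illustration only, not one of the tools discussed in the talk. The encoding of a constraint as a `(variables, regex, positive)` triple, the name `bounded_decide`, and the length bound are all assumptions made for the example.

```python
import re
from itertools import product

def bounded_decide(variables, constraints, alphabet="opq", max_len=5):
    """Try every assignment of strings up to max_len; return a witness or None.

    Each constraint is a triple (vars, regex, positive), read as: the
    concatenation of vars is in the regex (positive) or is not in it.
    """
    words = [""]
    for n in range(1, max_len + 1):
        words += ["".join(t) for t in product(alphabet, repeat=n)]
    for values in product(words, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(
            bool(re.fullmatch(rx, "".join(env[v] for v in vs))) == positive
            for vs, rx, positive in constraints
        ):
            return env   # "Satisfiable", backed by a checked assignment
    return None          # no witness within the bound

example = [
    (["v1"], r"o(pp)+", True),
    (["v2"], r"p*q", True),
    (["v1", "v2"], r"oppppq", True),
]
print(bounded_decide(["v1", "v2"], example))
```

Every "Satisfiable" answer comes with an assignment that was actually checked, so the sketch is sound in the sense above. It is not complete: a system whose only solutions are longer than `max_len` is reported as having no witness.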

  26. Existing Tools
     • DPRLE [PLDI09]: Automata
     • Hampi [ISSTA09]: Encode to STP
     • Rex [ICST10]: Encode to Z3
     • Kaluza [Oakland10]: Encode to Hampi & STP
     • Our Prototype: Lazy Automata

  27. Questions: Make string analysis available to a wider class of program analysis tools.

  28. Questions
     • What is acceptable performance?
     • What type of constraints should we allow?

  29. Outline
     • String Constraint Solving
     • Preliminary Results
       – scalability
       – expressive utility
     • Proposed Research

  30. Scalability: Subjects
     • Decision Procedure for Regular Language Equations (DPRLE) [PLDI09]
     • Hampi [ISSTA09]
     • Lazy Prototype

  31. Scalability: Task. Find a string that is in both [a-c]*a[a-c]{n+1} and [a-c]*b[a-c]{n}
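The benchmark always has a solution, which a short Python sketch can confirm for any given n. The helper names `common_witness` and `in_both` are hypothetical. A plausible reason the task stresses solvers is that a deterministic automaton for either language has to remember which of the last n+2 symbols were the marked letter, so its size grows exponentially in n.

```python
import re

def common_witness(n):
    """Return one string that lies in both benchmark languages, for n >= 0."""
    # The 'a' is followed by n+1 symbols; the 'b' one position later by n symbols.
    return "ab" + "a" * n

def in_both(s, n):
    """Check membership in both regular languages with whole-string matching."""
    first = re.fullmatch(r"[a-c]*a[a-c]{%d}" % (n + 1), s)
    second = re.fullmatch(r"[a-c]*b[a-c]{%d}" % n, s)
    return bool(first and second)

print(all(in_both(common_witness(n), n) for n in range(50)))  # True
```

Knowing a witness by construction is what makes this a convenient benchmark: the interesting measurement is how long a solver takes to find one on its own.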

  32. Scalability: Time to Generate First String (plot)

  34. Scalability
     • Existing approaches are less scalable than they could be on the tested benchmarks
     • Interaction with an underlying solver introduces performance artifacts

  36. Expressive Utility
     • Sampled 88 PHP projects from SourceForge (9.6 million LOC in total)
     • Tally: 111 distinct string functions

  38. Expressive Utility
     • Index: 63,003 (substr, strlen, strpos, ...)
     • Regex: 29,141 (preg_match, preg_replace, ...)

  39. Expressive Utility
     • Existing approaches typically support 'Regex' but not 'Index' operations
     • 'Index' operations were roughly 2x as common in the sample under study (63,003 vs. 29,141)

  41. Outline
     • String Constraint Solving
     • Preliminary Results
     • Proposed Research
       – subset constraints
       – scalability through laziness
       – integer index operations
       – proof strategies

  43. Thesis Statement: It is possible to construct a practical algorithm that decides the satisfiability of constraints that cover both string and integer index operations, scales up to real-world program analysis problems, and admits a machine-checkable proof of correctness.

  45. Subset Constraints [PLDI'09] (figure: a subset constraint with its concatenation, constants, and variables labeled)

  47. Approach (diagram): Input machines 1, 2, 3; Cross Product (c1 ◦ c2) ∩ c3; result ✔ Sat. or ✘ Unsat.

  48. Example
       // v1 and v2 are user inputs
       if (!ereg('o(pp)+', v1)){exit;}
       if (!ereg('p*q', v2)){exit;}
       v3 = v1 . v2; // concat
       if (v3 != 'oppppq'){exit;}
       magic();

  49. (figure: the automata for the three constraints, with states a1 to a4, b1 to b2, and d1 to d7, and their cross product; product states shown include a1d1, a2d2, a3d3, a4d4, a2d4, b1d4, a3d5, b1d5, a4d6, b1d6, b2d7)

  51. Solutions read off the cross product
     • Solution I: v1 = { opp }, v2 = { ppq }
     • Solution II: v1 = { opppp }, v2 = { q }
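The cross-product step can be sketched in Python as follows. This is an illustrative reconstruction under assumptions, not the DPRLE implementation: the three machines are hand-built from one reading of the figure (a1 to a4 for o(pp)+, b1 to b2 for p*q, d1 to d7 for the constant 'oppppq'), the third machine is assumed to have no epsilon edges, and the function names are hypothetical. The idea is to join the first two machines with an epsilon "bridge", intersect with the third, and report where the bridge is crossed on some accepting path.

```python
from collections import deque

EPS = None  # label used for epsilon edges

def chain(word, prefix):
    """NFA accepting exactly `word`, with states prefix1 .. prefix(len+1)."""
    edges = [(f"{prefix}{i + 1}", ch, f"{prefix}{i + 2}") for i, ch in enumerate(word)]
    return f"{prefix}1", {f"{prefix}{len(word) + 1}"}, edges

# Each machine is (start, accepting states, edges); an assumed reading of the figure.
M1 = ("a1", {"a4"}, [("a1", "o", "a2"), ("a2", "p", "a3"), ("a3", "p", "a4"), ("a4", EPS, "a2")])
M2 = ("b1", {"b2"}, [("b1", "p", "b1"), ("b1", "q", "b2")])
M3 = chain("oppppq", "d")

def concat_intersect(m1, m2, m3):
    (s1, f1, e1), (s2, f2, e2), (s3, f3, e3) = m1, m2, m3
    e12 = e1 + e2 + [(f, EPS, s2) for f in f1]  # bridge: end of m1 to start of m2

    def successors(state):
        x, d = state
        for src, label, dst in e12:
            if src != x:
                continue
            if label is EPS:
                yield (dst, d)  # only the concatenated machine moves
            else:
                for src3, label3, dst3 in e3:
                    if src3 == d and label3 == label:
                        yield (dst, dst3)  # both machines read the same symbol

    # Forward pass: product states reachable from the start pair.
    start = (s1, s3)
    seen, preds, queue = {start}, {}, deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in successors(cur):
            preds.setdefault(nxt, set()).add(cur)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Backward pass: keep only states that can still reach an accepting pair.
    live = {(x, d) for (x, d) in seen if x in f2 and d in f3}
    queue = deque(live)
    while queue:
        cur = queue.popleft()
        for prev in preds.get(cur, ()):
            if prev not in live:
                live.add(prev)
                queue.append(prev)

    # Positions in m3 at which the bridge is crossed on an accepting path.
    return sorted(d for (x, d) in seen if x in f1 and (s2, d) in live)

print(concat_intersect(M1, M2, M3))  # ['d4', 'd6']
```

For these machines the bridge is crossed at d4 and d6, which correspond to the splits opp | ppq and opppp | q, matching Solutions I and II. An empty result would mean the constraint system is unsatisfiable.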

  52. Algorithms and a Proof
     • Concat-Intersect (CI) algorithm:
       – two variables, three constants; fixed form
       – mechanically verified proof in Coq 8.1pl3
       – proof size is ~1300 lines
     • Regular Matching Assignments (RMA):
       – implemented in a tool, DPRLE
       – applies the CI procedure inductively

  53. Evaluation
     • Find SQL injection vulnerabilities [Wassermann and Su; PLDI07]
     • For each vulnerability:
       – generate SQL + program path
       – check path consistency (Simplify)
       – solve string constraints (DPRLE)

  55. Scalability through Laziness. Idea: Cast constraint solving as a search problem. Traverse as little of the search space as possible.
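One way to picture the idea is an on-the-fly search over pairs of automaton states that stops at the first accepting pair, instead of building and determinizing the full product up front. The Python sketch below applies this to the two benchmark languages [a-c]*a[a-c]{n+1} and [a-c]*b[a-c]{n}. It is an assumption-laden illustration, not the proposed prototype, and the names `nth_from_end_nfa` and `first_common_string` are hypothetical.

```python
from collections import deque

def nth_from_end_nfa(marker, tail_len):
    """NFA for [a-c]*<marker>[a-c]{tail_len}; states are ints, accept is tail_len + 1."""
    def step(state, ch):
        nxt = []
        if state == 0:
            nxt.append(0)          # stay in the [a-c]* loop
            if ch == marker:
                nxt.append(1)      # guess that this symbol is the marker
        elif state <= tail_len:
            nxt.append(state + 1)  # consume one of the trailing symbols
        return nxt
    return step, tail_len + 1

def first_common_string(n, alphabet="abc"):
    """Breadth-first search over state pairs; returns (string, pairs explored)."""
    step1, acc1 = nth_from_end_nfa("a", n + 1)
    step2, acc2 = nth_from_end_nfa("b", n)
    start = (0, 0)
    parent = {start: None}  # maps a pair to (previous pair, symbol read)
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == (acc1, acc2):
            chars = []
            while parent[cur] is not None:
                cur, ch = parent[cur]
                chars.append(ch)
            return "".join(reversed(chars)), len(parent)
        for ch in alphabet:
            for p in step1(cur[0], ch):
                for q in step2(cur[1], ch):
                    if (p, q) not in parent:
                        parent[(p, q)] = (cur, ch)
                        queue.append((p, q))
    return None, len(parent)

word, explored = first_common_string(10)
print(word, explored)
```

The search never touches more than (n + 3) * (n + 2) state pairs, because that is all the pairs there are, whereas an eager approach that determinizes either automaton first pays an exponential cost in n.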

  56. Proposed Approach
       datatype searchstate = {
         next   : variable;
         states : variable → pos → status
       }
       datatype status =
         | Unknown of status
         | StartsAt of nfastate → status
         | Path of nfapath → status

  57. Proposed Evaluation
     • Within-domain performance comparison:
       – CFG Analyzer
       – DPRLE
       – Rex
       – Hampi
     • Use previously-published benchmarks:
       – long strings task [Veanes et al.]
       – set difference task [Veanes et al.]
       – grammar intersection task [Kiezun et al.]
