symbolic finite automata

Symbolic Finite Automata Margus Veanes April 5, 2014 VSSE'14, - PowerPoint PPT Presentation



  1. Applications of Symbolic Finite Automata. Margus Veanes. April 5, 2014, VSSE'14, Grenoble, France

  2. Overview
     • Are SFAs applicable to the analysis of software evolution?
     • Symbolic Finite Automaton (SFA): automata modulo theories
     • Main properties: Boolean closed, succinct for large alphabets
     • Symbolic finite transducers: SFAs with symbolic outputs
     • Current applications
       – Testing (unit, fuzz)
       – Regex processing
       – Web security
       – SMT theory plugin
       – Backend for MSO
     • Extensions
       – Look-ahead
       – Trees
       – Registers

  3. Automata-based analysis of software evolution?
     • Possible extension of graph-based approaches
       – SFAs are directed graphs
       – In addition to structural properties such as cyclicity and rank, one can talk about regularity and language
     • Possible extension of FSA-based approaches
       – Not bound to a small finite alphabet
       – The alphabet can be rich, possibly infinite
     • Brings in an aspect of model-based analysis
       – An SFA can act as a model or oracle

  4. Mile-high view
     [Diagram: Software v1 is monitored to produce traces, from which SFA1 is learned; the software evolves into v2, whose monitored traces are used to learn SFA2.]
     • Is L(SFA1) = L(SFA2)?
     • Is a given trace ∈ L(SFA1)?

  5. Possible scenario
     • Prog.v1 = loop { t = now; critical_code; save(now - t) }
     • SFA1: transitions labeled 0-255 (regex: [\0-\xFF]+)
     • Prog.v2 = loop { t = now; critical_code_upd; save(now - t) }
     • Trace: [56, 150, 500] ∉ L(SFA1), since 500 lies outside 0-255

  6. Symbolic Finite Automaton (SFA)
     • The alphabet is an effective Boolean algebra A
     • Labels are predicates over A
     • One symbolic transition p --λx. 'a' ≤ x ≤ 'd'--> q denotes many concrete transitions p --x--> q, one for each x ∈ 〚'a' ≤ x ≤ 'd'〛: 'a', 'b', 'c', 'd'

  7. SFA execution example
     • Transitions: p --odd(x)--> p, p --even(x)--> q, q --even(x)--> q, q --odd(x)--> p
     • Input: 1 2 5 3
     • Run: p, p, q, p, p
     • p is final ⇒ accept the input
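
The run on this slide can be replayed with a few lines of Python. This is a minimal sketch, not the author's implementation: guards are plain Python functions, the transition list follows the reading of the diagram given above (the `q --even(x)--> q` loop is an assumption, since the example run never exercises it), and the name `sfa_accepts` is hypothetical.

```python
def sfa_accepts(transitions, initial, finals, word):
    """Run a (possibly nondeterministic) SFA over `word`.

    transitions: list of (source, guard, target), where guard is a predicate
    on a single input element.
    """
    current = {initial}
    for x in word:
        # Follow every transition whose source is active and whose guard holds.
        current = {dst for src, guard, dst in transitions
                   if src in current and guard(x)}
    return bool(current & finals)


def odd(x):
    return x % 2 == 1


def even(x):
    return x % 2 == 0


# Transitions as read off the slide; the q/even loop is assumed.
TRANSITIONS = [
    ("p", odd, "p"),
    ("p", even, "q"),
    ("q", even, "q"),
    ("q", odd, "p"),
]

if __name__ == "__main__":
    print(sfa_accepts(TRANSITIONS, "p", {"p"}, [1, 2, 5, 3]))
```

Tracking a set of active states keeps the same function usable for nondeterministic SFAs, at the cost of scanning every transition per input element.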

  8. Alphabet: effective Boolean algebra
     [Diagram: predicates denote subsets of the domain D, i.e. elements of 2^D.]

  9. Alphabet SMT^int
     • D = integers
     • Predicates: integer linear arithmetic formulas (with one fixed free variable)
     • 〚φ ∧ ψ〛 = 〚φ〛 ∩ 〚ψ〛
     • 〚⊥〛 = ∅, 〚¬φ〛 = D \ 〚φ〛
     • Satisfiability: 〚φ〛 ≠ ∅

  10. Alphabet 2^{a,b}
      • D = {a,b}; predicates: {∅, {a}, {b}, {a,b}}
      • Boolean operations are ∪, ∩ and set complement; ⊥ = ∅, ⊤ = {a,b}; the denotation is the identity
      • SFA over 2^{a,b} for the regex a*b(a|b)*: p --{a}--> p, p --{b}--> q, q --{a,b}--> q

  11. Alphabet 2^{bv_k}
      • D = {n | 0 ≤ n < 2^k}
      • Predicates: BDDs of depth k
      • Boolean operations are BDD operations
      • Example: 〚β_i〛 = {n ∈ D | the i'th bit of n is 1}; the BDD β_i has a fixed size independent of i

  12. Boolean operations over SFAs
      • Intersection (product of transitions)
        – A1 has a transition p1 --φ1--> q1 and A2 has a transition p2 --φ2--> q2
        – The product of A1 and A2 has the transition (p1,p2) --φ1 ∧ φ2--> (q1,q2)
        – Delete the product transition when φ1 ∧ φ2 is unsatisfiable

  13. Boolean operations over SFAs
      • Complementation: first determinize, then swap final and nonfinal states
      • [Diagram: an SFA with states p, q, r is determinized into subset states such as {p}, {q}, {r} and {q,r}; transitions with unsatisfiable guards are deleted.]
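
The determinize-then-swap recipe can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm from the talk: predicates are represented as `frozenset`s over a small finite domain (so satisfiability is just non-emptiness), subset construction splits guards into minterms, and all function names and the example automaton are hypothetical.

```python
def minterms(guards, domain):
    """Split `domain` into the non-empty Boolean combinations of `guards`."""
    parts = [domain]
    for g in guards:
        parts = [piece for part in parts
                 for piece in (part & g, part - g) if piece]
    return parts


def determinize(initial, finals, transitions, domain):
    """Subset construction; transitions are (source, guard_set, target)."""
    start = frozenset([initial])
    states, worklist, dtrans = {start}, [start], []
    while worklist:
        subset = worklist.pop()
        outgoing = [(g, dst) for src, g, dst in transitions if src in subset]
        # Each minterm is either inside or disjoint from every guard, so the
        # target subset is well defined; empty (unsat) minterms are dropped.
        for m in minterms([g for g, _ in outgoing], domain):
            target = frozenset(dst for g, dst in outgoing if m <= g)
            dtrans.append((subset, m, target))
            if target not in states:
                states.add(target)
                worklist.append(target)
    dfinals = {s for s in states if s & finals}
    return start, states, dfinals, dtrans


def complement(initial, finals, transitions, domain):
    """Determinize, then swap final and nonfinal states."""
    start, states, dfinals, dtrans = determinize(initial, finals, transitions, domain)
    return start, states - dfinals, dtrans


def dfa_accepts(dfa, word):
    start, finals, dtrans = dfa
    state = start
    for x in word:
        state = next(dst for src, m, dst in dtrans if src == state and x in m)
    return state in finals


# Hypothetical example over 7-bit character codes:
# the SFA accepts strings whose last character is a digit.
DOMAIN = frozenset(range(128))
DIGIT = frozenset(range(ord("0"), ord("9") + 1))
NFA = ("p", {"q"}, [("p", DOMAIN, "p"), ("p", DIGIT, "q")])
NOT_ENDING_IN_DIGIT = complement(*NFA, DOMAIN)


def codes(s):
    return [ord(c) for c in s]
```

Here `dfa_accepts(NOT_ENDING_IN_DIGIT, codes("1a"))` holds while the same call on `codes("a1")` does not. A real implementation would keep guards symbolic and ask a solver for satisfiability instead of enumerating a finite domain.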

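  14. Intersection example
      • Let φ_k(x) ≡ ((x mod k) = 0)
      • [Diagram: automaton A with states a1, a2 and automaton B with states b1, b2, with guards built from φ_2, φ_3 and φ_6 and their negations; in the product of A and B, guards are conjunctions of the component guards, and a product transition whose guard is unsatisfiable is deleted (marked X).]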
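
A product construction over divisibility predicates can be sketched as below. The automata `A` and `B` are hypothetical (the slide's diagram did not survive extraction), and the satisfiability check is an assumption chosen for this sketch: a Boolean combination of "x mod k = 0" tests is periodic, so it is satisfiable exactly when it holds somewhere within one period.

```python
from itertools import product
from math import gcd


class Pred:
    """Predicate on integers whose truth depends only on x mod `period`."""

    def __init__(self, test, period):
        self.test, self.period = test, period

    def __call__(self, x):
        return self.test(x)

    def __and__(self, other):
        lcm = self.period * other.period // gcd(self.period, other.period)
        return Pred(lambda x: self.test(x) and other.test(x), lcm)

    def __invert__(self):
        return Pred(lambda x: not self.test(x), self.period)

    def is_sat(self):
        # One full period covers every residue class.
        return any(self.test(x) for x in range(self.period))


def divisible_by(k):
    return Pred(lambda x: x % k == 0, k)


def intersect(a, b):
    """Product of two SFAs given as (initial, finals, transitions)."""
    (ia, fa, ta), (ib, fb, tb) = a, b
    trans = []
    for (p1, g1, q1), (p2, g2, q2) in product(ta, tb):
        guard = g1 & g2
        if guard.is_sat():  # delete transitions with unsatisfiable guards
            trans.append(((p1, p2), guard, (q1, q2)))
    finals = {(x, y) for x in fa for y in fb}
    return (ia, ib), finals, trans


def accepts(sfa, word):
    init, finals, trans = sfa
    current = {init}
    for x in word:
        current = {dst for src, g, dst in trans if src in current and g(x)}
    return bool(current & finals)


d3, d6 = divisible_by(3), divisible_by(6)
# Hypothetical A: last element is a multiple of 6; B: last element is a multiple of 3.
A = ("a1", {"a2"}, [("a1", d6, "a2"), ("a1", ~d6, "a1"),
                    ("a2", d6, "a2"), ("a2", ~d6, "a1")])
B = ("b1", {"b2"}, [("b1", d3, "b2"), ("b1", ~d3, "b1"),
                    ("b2", d3, "b2"), ("b2", ~d3, "b1")])
AB = intersect(A, B)
```

Of the 16 candidate product transitions, the four guarded by "multiple of 6 and not a multiple of 3" are unsatisfiable and are dropped, leaving 12.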

  15. Are SFAs a useful extension of classical automata?
      • Can classical automata theory and algorithms be extended to work modulo large (infinite) alphabets Σ?
      • The answer is nontrivial. For example:
        – NFA determinization is O(|Σ| 2^n)
        – DFA minimization is O(|Σ| n log n)
      • What happens when Σ is infinite?

  16. Why care about symbolic representation at all?
      • Scalability
        – Explicit expansion is expensive even in the finite case (e.g., ASCII, where |Σ| = 2^7)
      • String analysis
        – Typically Σ is UTF-16, |Σ| = 2^16
        – Characters are often lifted to integers and manipulated with arithmetic operations
      • List processing
        – Elements are integers or have composite types, such as tuples or lists

  17. Perhaps SFAs reduce to NFAs?
      • Given an SFA, create an NFA whose characters are the minterms of the predicates occurring in the SFA
      • Minterms(φ, ψ) = {φ ∧ ψ, φ ∧ ¬ψ, ¬φ ∧ ψ, ¬φ ∧ ¬ψ} (keep satisfiable combinations only)
      • The number of minterms may blow up exponentially; e.g., the SFA shown on the slide (figure not preserved) has 2^k minterms (alphabet 2^{bv_k})
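
The minterm computation and its blow-up can be illustrated with a short Python sketch. It assumes predicates are given extensionally as `frozenset`s over a finite domain, which is a simplification for illustration (the talk's bit-vector algebra uses BDDs); the function name `minterms` is hypothetical.

```python
def minterms(predicates, domain):
    """Return the satisfiable minterms of `predicates` as a partition of `domain`."""
    parts = [frozenset(domain)]
    for pred in predicates:
        # Split every existing part by pred and by its complement,
        # keeping only non-empty (satisfiable) pieces.
        parts = [piece for part in parts
                 for piece in (part & pred, part - pred) if piece]
    return parts


# Two overlapping ranges over 0..7: the combination "neither holds" is empty.
overlapping = minterms([frozenset(range(0, 5)), frozenset(range(3, 8))], range(8))

# k bit-test predicates over k-bit numbers: every combination is satisfiable.
K = 3
DOMAIN = frozenset(range(2 ** K))
BIT = [frozenset(n for n in DOMAIN if (n >> i) & 1) for i in range(K)]
bit_minterms = minterms(BIT, DOMAIN)
```

With the two overlapping ranges only three of the four combinations survive, whereas the `K` bit-test predicates yield `2 ** K` singleton minterms, which is the exponential case the slide warns about.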

  18. We also want output
      • ... transducers

  19. Symbolic Finite Transducer (SFT)
      • Labels are guarded transformation functions
      • Symbolic transition p → q: guard λx. 0x80 ≤ x ≤ 0x7FF, output [0xC0 | x⟨10,6⟩, 0x80 | x⟨5,0⟩] (bitvector operations)
      • It denotes 1920 concrete transitions p → q, from '\x80' / "\xC2\x80" up to '\x7FF' / "\xDF\xBF"
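
The guarded output function on this slide can be written out directly. The sketch below is an illustrative Python rendering, with the hypothetical name `encode_two_octets`; it reads x⟨10,6⟩ as bits 10 down to 6 of x and x⟨5,0⟩ as the low six bits.

```python
def encode_two_octets(x):
    """Apply the symbolic transition: guard 0x80 <= x <= 0x7FF, two output bytes."""
    if not 0x80 <= x <= 0x7FF:
        raise ValueError("guard not satisfied")
    high = 0xC0 | (x >> 6)    # 0xC0 | x<10,6>
    low = 0x80 | (x & 0x3F)   # 0x80 | x<5,0>
    return bytes([high, low])


if __name__ == "__main__":
    print(encode_two_octets(0x80), encode_two_octets(0x7FF))
    # Number of concrete transitions covered by the single symbolic one.
    print(0x7FF - 0x80 + 1)
```

For every x in the guarded range the result agrees with Python's own `chr(x).encode("utf-8")`, and the range contains 1920 values, matching the count on the slide.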

  20. SFT execution example
      • Transitions: p --odd(x)/[x-1]--> p, p --even(x)/[x, x]--> q, q --even(x)/[]--> q, q --odd(x)/[x-1]--> p
      • Input tape: 1 2 5 3
      • Run: p, p, q, p, p
      • Output tape: 0 2 2 4 2
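
The same run can be reproduced with a small deterministic SFT interpreter. This is a sketch under assumptions: each transition carries a guard and an output function returning a list, the function name `run_sft` is hypothetical, and p is taken to be the final state (the slide does not say which states are final).

```python
def run_sft(transitions, initial, finals, tape):
    """Run a deterministic SFT; return the output list, or None if rejected.

    transitions: list of (source, guard, output_function, target).
    """
    state, output = initial, []
    for x in tape:
        for src, guard, out, dst in transitions:
            if src == state and guard(x):
                output.extend(out(x))
                state = dst
                break
        else:
            return None  # no transition applies to this input element
    return output if state in finals else None


def odd(x):
    return x % 2 == 1


def even(x):
    return x % 2 == 0


SFT = [
    ("p", odd, lambda x: [x - 1], "p"),
    ("p", even, lambda x: [x, x], "q"),
    ("q", even, lambda x: [], "q"),
    ("q", odd, lambda x: [x - 1], "p"),
]

if __name__ == "__main__":
    print(run_sft(SFT, "p", {"p"}, [1, 2, 5, 3]))
```

On the input tape 1 2 5 3 this yields the output tape 0 2 2 4 2 from the slide.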

  21. Some applications of SFAs/SFTs
      • SFAs:
        – Regex support in parameterized unit testing
        – Password generation
      • SFTs:
        – Analysis of string encoders/decoders
        – Security analysis of sanitizers

  22. Application 1: Regexes in parameterized unit testing
      • Rex component in Pex
      • Generate values for s that reach the return branches
        – s is a string of Unicode characters (16-bit bit-vectors)

        bool IsValidEmail(string s) {
          string r1 = @"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";
          string r2 = @"^\d.*$";
          if (System.Text.RegularExpressions.Regex.IsMatch(s, r1))
            if (System.Text.RegularExpressions.Regex.IsMatch(s, r2))
              return false; // branch 1: solve s ∈ L(r1) ∩ L(r2), e.g. s = "3@a.b"
            else
              return true;  // branch 2: solve s ∈ L(r1) \ L(r2), e.g. s = "a@b.c"
          else
            return false;   // branch 3: solve s ∉ L(r1), e.g. s = "a@..c"
        }
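
The three witness strings can be checked against the branch structure with Python's `re` module. This is only a sanity check of the examples, not a reimplementation of Rex (which solves for such strings symbolically); the function name `branch` is hypothetical, and Python's regex dialect is assumed to agree with .NET's on these two patterns.

```python
import re

R1 = re.compile(r"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$")
R2 = re.compile(r"^\d.*$")


def branch(s):
    """Return the number of the return branch that IsValidEmail(s) would reach."""
    if R1.search(s):
        if R2.search(s):
            return 1  # s in L(r1) and in L(r2)
        return 2      # s in L(r1) but not in L(r2)
    return 3          # s not in L(r1)


if __name__ == "__main__":
    for witness in ["3@a.b", "a@b.c", "a@..c"]:
        print(witness, branch(witness))
```

Each witness from the slide lands in the branch it is listed under.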

  23. Application 2: Password generation
      • Given constraints:
        – Length is k: "^[\x21-\x7E]{k}$"
        – Contains 2 capital letters: "[A-Z].*[A-Z]"
        – Contains a digit: "\d"
        – Contains a non-word character: "\W"
      • Generate random instances, uniformly distributed, that match all of the above conditions
      • k = 4: http://www.rise4fun.com/Rex/4nE
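
For small k, uniformly distributed matches can also be obtained by plain rejection sampling, sketched below. This is a swapped-in technique for illustration, not how Rex generates members: candidates are drawn uniformly from all length-k strings over \x21-\x7E and discarded unless every constraint matches, which leaves the accepted strings uniformly distributed. The names `constraints` and `generate` are hypothetical.

```python
import random
import re

ALPHABET = [chr(c) for c in range(0x21, 0x7F)]  # the 94 characters \x21-\x7E


def constraints(k):
    """Compile the four regex constraints from the slide for length k."""
    return [
        re.compile(r"^[\x21-\x7E]{%d}$" % k),  # length is k
        re.compile(r"[A-Z].*[A-Z]"),           # two capital letters
        re.compile(r"\d"),                     # a digit
        re.compile(r"\W"),                     # a non-word character
    ]


def generate(k, rng=random):
    """Draw uniform candidates until one satisfies all constraints."""
    checks = constraints(k)
    while True:
        candidate = "".join(rng.choice(ALPHABET) for _ in range(k))
        if all(check.search(candidate) for check in checks):
            return candidate


if __name__ == "__main__":
    print(generate(4, random.Random(0)))
```

Rejection sampling becomes wasteful as the constraints get tighter, which is one reason to sample from the intersection automaton instead. For real passwords the `random` module should be replaced by a cryptographically secure source.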

  24. Application 3: String analysis (motivating scenario)
      • req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/
        1) Security check: req must not contain "../"
        2) dir = utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/"
      • Result: access granted to "../../private/"
      • Analysis question: does utf8decode reject overlong UTF-8 encodings such as "%C0%AE" for '.'?
      • Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php
      • Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

  25. Application 3 (cont.): SFA example
      • UTF-8 validator (for encodings of up to 2 octets)
        – Rejects invalid UTF-8 encoded strings
      • Regex R_utf8: ^([\x00-\x7F]|[\xC2-\xDF][\x80-\xBF])*$
      • SFA: p --λx. 0 ≤ x ≤ 0x7F--> p, p --λx. 0xC2 ≤ x ≤ 0xDF--> q, q --λx. 0x80 ≤ x ≤ 0xBF--> p
      • Accepts "../../"
      • Rejects "..%C0%AF../"
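
The two-state validator can be tried out on the slide's two examples with the sketch below. It assumes p is both the initial and the only final state (which is what the regex implies), works on raw bytes, and uses the hypothetical name `is_valid_utf8_2octet`; percent-decoding is done with the standard `urllib.parse.unquote_to_bytes`.

```python
from urllib.parse import unquote_to_bytes

# (source, guard on one byte, target), following the three range guards on the slide.
TRANSITIONS = [
    ("p", lambda x: 0x00 <= x <= 0x7F, "p"),  # single-octet character
    ("p", lambda x: 0xC2 <= x <= 0xDF, "q"),  # lead byte of a 2-octet encoding
    ("q", lambda x: 0x80 <= x <= 0xBF, "p"),  # continuation byte
]


def is_valid_utf8_2octet(data):
    """Accept byte strings matching R_utf8 (encodings of up to two octets)."""
    state = "p"
    for byte in data:
        for src, guard, dst in TRANSITIONS:
            if src == state and guard(byte):
                state = dst
                break
        else:
            return False  # no guard matched: invalid byte in this state
    return state == "p"


if __name__ == "__main__":
    print(is_valid_utf8_2octet(unquote_to_bytes("../../")))
    print(is_valid_utf8_2octet(unquote_to_bytes("..%C0%AF../")))
```

The lead-byte guard starts at 0xC2 rather than 0xC0, which is exactly what excludes overlong two-octet encodings such as %C0%AF.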

  26. Application 3 (cont.): Complete R_utf8

  27. Application 3 (cont.): Analysis scenario
      • Valid inputs: A = SFA(R_utf8)
      • Invalid inputs (attack vectors): A^c = Complement(A)
      • Inputs accepted by Utf8Decode: D = Domain(Utf8Decode)
      • Does Utf8Decode accept an invalid input? Is A^c ∩ D ≠ ∅? (e.g., is "%c0%ae%c0%ae" ∈ D?)

  28. We also want to handle outputs
      • We want to analyze questions such as: does Utf8Encode produce a bad output?
        ∃x (Utf8Encode(x) ∈ Complement(SFA(R_utf8)))?
      • SFA + outputs = SFT

  29. SFT example
      • UTF-8 encoder
        – Input: valid UTF-16 encoded string
        – Output: equivalent UTF-8 encoded string
      • For example, utf8encode("\uFF28\uFF29") = "\xEF\xBC\xA8\xEF\xBC\xA9"
      • The SFT has 5 states and 11 transitions; the equivalent classical transducer has 2^16 transitions

  30. Bek (a frontend language for SFTs)

      program smileycipher(w) {
        return iter (c in w) {
          case (true):
            yield (0xD83D, (c - 'a') + 0xDE00);
        };
      }

      http://www.rise4fun.com/Bek/ZH0
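
What the Bek program computes can be mimicked in Python. This is a hedged sketch of the program's behavior, not Bek semantics in general: each input character c yields the two UTF-16 code units 0xD83D and (c - 'a') + 0xDE00, which form a surrogate pair when c is a lowercase letter. The helper names are hypothetical.

```python
def smileycipher_units(w):
    """Map each character to the UTF-16 code units yielded by the Bek program."""
    units = []
    for c in w:
        units.extend([0xD83D, (ord(c) - ord("a")) + 0xDE00])
    return units


def units_to_str(units):
    """Decode a list of UTF-16 code units into a Python string."""
    raw = b"".join(u.to_bytes(2, "big") for u in units)
    return raw.decode("utf-16-be")


if __name__ == "__main__":
    print(units_to_str(smileycipher_units("ab")))
```

The pair (0xD83D, 0xDE00) is the surrogate encoding of U+1F600, so 'a' maps to that code point, 'b' to U+1F601, and so on.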
