viruses 3 / anti-virus 1

Changelog. Corrections made in this version not in first posting: 8 Feb 2017, slide 31: visible space after negative foo example; 8 Feb 2017, slide 35: [a-zA-Z]*ing instead of [a-zA-Z]ing; 8 Feb 2017, slide 56: correct


  1. regular expressions (slide 29)
     - one method of representing patterns like this: regular expressions (regexes)
     - restricted language allows very fast implementations
     - especially when there’s a long list of patterns to look for
     - homework assignment next week

  2. regular expressions: implementations (slide 30)
     - multiple implementations of regular expressions
     - we will target: flex, a lexical analyzer (scanner) generator

  3. simple patterns (slide 31)
     - alphanumeric characters match themselves
     - foo: matches exactly foo only
       - does not match Foo
       - does not match foo␣ (foo followed by a space)
       - does not match foobar
     - backslash might be needed for others
     - C\+\+ matches exactly C++ only

  4. metachars (1) (slide 32)
     - special ways to match characters
     - \n, \t, \x3C, …: work like in C
     - [b-fi]: b or c or d or e or f or i
     - [^b-fi]: any character but b or c or …
     - . (dot): any character except newline
     - (.|\n): any character

  5. metachars (2) (slide 33)
     - a*: zero or more a's: (empty string), a, aa, aaa, …
     - a{3,5}: three to five a's: aaa, aaaa, aaaaa
     - (abc){3,5}: three to five abcs (“grouping”): abcabcabc, abcabcabcabc, abcabcabcabcabc
     - ab|cd: ab, cd
     - (ab|cd){2}: two ab-or-cds: abab, abcd, cdab, cdcd
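
The repetition, grouping, and alternation operators above can be tried out with POSIX extended regular expressions, which use the same notation for these operators. This is only a sketch, assuming a POSIX system that provides `<regex.h>`; the helper name `matches_whole` is made up for illustration, and it anchors the pattern so that the whole string has to match.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole string `text` matches the
 * POSIX extended regular expression `pattern`, 0 if it does not, and -1 if
 * the pattern is too long for the buffer or fails to compile. */
int matches_whole(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern as ^(pattern)$ so a partial match does not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `matches_whole("(ab|cd){2}", "cdab")` returns 1, while `matches_whole("a{3,5}", "aa")` returns 0.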

  6. metachars (3) (slide 34)
     - \xAB: the byte 0xAB
     - \x00: the byte 0x00
     - flex is designed for text, handles binary fine
     - \n: newline (and other C string escapes)

  7. example regular expressions (slide 35)
     - match words ending with ing: [a-zA-Z]*ing
     - match C /* ... */ comments: /\*([^*]|\*[^/])*\*/
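
Both example patterns can be searched for inside a larger text. The sketch below again assumes POSIX `<regex.h>`; `find_match`, `ING_WORD`, and `C_COMMENT` are hypothetical names. One thing it makes visible, if traced correctly: the comment pattern as written does not match a comment such as `/* a **/`, because the group `\*[^/]` consumes stars in pairs and can leave no star for the closing `\*/`.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: searches `text` for the leftmost match of the POSIX
 * extended regular expression `pattern`. On success returns 1 and stores the
 * match's start offset and end offset (one past the last character).
 * Returns 0 if nothing matches and -1 if the pattern does not compile. */
int find_match(const char *pattern, const char *text, int *start, int *end) {
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int)m[0].rm_so;
    *end = (int)m[0].rm_eo;
    return 1;
}

/* The two patterns from the slide, written as C string literals
 * (each backslash is doubled). */
static const char ING_WORD[] = "[a-zA-Z]*ing";
static const char C_COMMENT[] = "/\\*([^*]|\\*[^/])*\\*/";
```

Searching `"keep scanning files"` for `ING_WORD` reports the word `scanning`, and searching `"x = 1; /* set x */ y = 2;"` for `C_COMMENT` reports the comment.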

  8. flex (slide 36)
     - flex is a regular expression matching tool
     - intended for writing parsers
     - generates C code
     - parser function called yylex

  9. flex example (slide 37)

             int num_bytes = 0, num_lines = 0;
             int num_foos = 0;
         %%
         foo     { num_bytes += 3; num_foos += 1; }
         .       { num_bytes += 1; }
         \n      { num_lines += 1; num_bytes += 1; }
         %%
         int main(void) {
             yylex();
             printf("%d bytes, %d lines, %d foos\n",
                    num_bytes, num_lines, num_foos);
         }

     - three sections
     - first: declarations for later
     - second: patterns, code to run on match (as parser: return “token” here)
     - third: extra code to include (C code in output file)

  14. flex: matched text (slide 38)

          %%
          [aA][a-z]*  { printf("found a-word '%s'\n", yytext); }
          (.|\n)      {} /* default rule: would output text */
          %%
          int main(void) {
              yylex();
          }

      - yytext: text of matched thing

  16. flex: definitions (slide 39)

          A       [aA]
          LOWERS  [a-z]
          ANY     (.|\n)
          %%
          {A}{LOWERS}*  { printf("found a-word '%s'\n", yytext); }
          {ANY}         {} /* default rule would output text */
          %%
          int main(void) {
              yylex();
          }

      - definitions of common patterns
      - included later

  18. flex: state machines (slide 40)
      [figure: state machine for the rules foo {...}, . {...}, \n {...}; states start, f, fo, foo; edge labels f, o, o, \n, other, (back 1), (back 2)]

  20. state machine matching (slide 41)
      [figure: the state machine (states start, f, fo, foo; labels f, o, o, \n, ., other, alt, (back 1), (back 2)) stepped through the input abfoofoabffoo]
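
Since the diagram did not survive as text, here is a sketch of one way such a machine can be stepped through an input. It assumes the three rules of the earlier example (foo, any other character, newline) and reads the "back 1" and "back 2" labels as "accept the single-character rule for the f and re-read the characters after it". The names `run_machine`, `struct counts`, and the state names are illustrative, not taken from flex's generated code.

```c
#include <stddef.h>

struct counts { int bytes, lines, foos; };

enum state { START, F, FO };   /* nothing seen, "f" seen, "fo" seen */

/* Hand-written state machine for the rules: foo, . (any other char), \n. */
struct counts run_machine(const char *input, size_t len) {
    struct counts c = {0, 0, 0};
    enum state st = START;
    size_t i = 0;

    while (i < len || st != START) {
        /* -1 stands for end of input */
        int ch = (i < len) ? (unsigned char)input[i] : -1;
        switch (st) {
        case START:
            if (ch == 'f') { st = F; i++; }
            else if (ch == '\n') { c.lines++; c.bytes++; i++; }
            else { c.bytes++; i++; }
            break;
        case F:
            if (ch == 'o') { st = FO; i++; }
            else {
                /* back 1: the lone f matched ".", ch is read again */
                c.bytes++; st = START;
            }
            break;
        case FO:
            if (ch == 'o') { c.foos++; c.bytes += 3; st = START; i++; }
            else {
                /* back 2: only the f is accepted; the o and ch are read again */
                c.bytes++; i -= 1; st = START;
            }
            break;
        }
    }
    return c;
}
```

On the slide's input `abfoofoabffoo` this sketch counts 13 bytes, 0 lines, and 2 foos: the `fo` before `ab` and the first `f` of `ffoo` fall back to single-character matches.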

  28. flex states (1) (slide 42)

          %x str
          %%
          \"                   { BEGIN(str); }
          <str>\"              { BEGIN(INITIAL); }
          <str>foo             { printf("foo in string\n"); }
          foo                  { printf("foo out of string\n"); }
          <INITIAL,str>(.|\n)  {}
          %%
          int main(void) {
              yylex();
          }

      - declare “state” to track
      - which state determines what patterns are active
      - “x”: exclusive
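
To see what the exclusive state buys, here is the same logic written by hand in C. This is a sketch of the behavior, not of what flex generates: a flag stands in for the INITIAL and str states, and the names `count_foos_by_state` and `struct foo_counts` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

struct foo_counts { int in_string; int out_of_string; };

/* Counts occurrences of foo inside and outside double-quoted strings,
 * mirroring the rules on the slide: a quote switches state, foo is
 * reported according to the current state, anything else is skipped. */
struct foo_counts count_foos_by_state(const char *text) {
    struct foo_counts c = {0, 0};
    int in_str = 0;                 /* 0 plays the role of INITIAL, 1 of str */
    size_t i = 0, len = strlen(text);

    while (i < len) {
        if (text[i] == '"') {
            in_str = !in_str;       /* BEGIN(str) or BEGIN(INITIAL) */
            i += 1;
        } else if (len - i >= 3 && memcmp(text + i, "foo", 3) == 0) {
            if (in_str)
                c.in_string++;
            else
                c.out_of_string++;
            i += 3;
        } else {
            i += 1;                 /* the catch-all (.|\n) rule */
        }
    }
    return c;
}
```

For the input `foo "a foo b" foofoo` this reports one foo in a string and three outside.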

  31. flex states (2) (slide 43)

          %s afterFoo
          %%
          <afterFoo>foo  { printf("later foo\n"); }
          foo            { printf("first foo\n"); BEGIN(afterFoo); }
          (.|\n)         {}
          %%
          int main(void) {
              yylex();
          }

      - declare non-exclusive state

  33. why this? (slide 44)
      - (basically) one pass matching
      - basically speed of file I/O
      - handles multiple patterns well
      - flexible for “special cases”
      - real anti-virus: probably custom pattern “engine”

  35. other flex features (slide 45)
      - escape hatch: I/O directly from code
      - including “unget” function (match normally instead)
      - allows extra ad-hoc logic

  36. future flex assignment (slide 46)
      - coming weeks: will have a flex assignment
      - give you idea what pattern matching can do
      - produce pattern for push $…; ret

  37. Vienna patterns (1) (slide 47)
      simple Vienna patterns:

          /* bytes of fixed part of Vienna sample */
          \xFC\x89\xD6\x83\xC6\x81\xc7\x00\x01\x83 (etc) {
              printf("found Vienna code\n");
          }

  38. Vienna patterns (2) (slide 48)
      simple Vienna patterns:

          /* Vienna sample with wildcards for changing bytes: */
          /* push %cx; mov ???, %dx; cld; ... */
          \x51\xBA(.|\n)(.|\n)\xFC\x89 (etc) {
              printf("found Vienna code w/placeholder\n");
          }
          /* mov $0x100, %di; push %di; xor %di, %di; ret */
          \xBF\x00\x01\x57\x31\xFF\xC3 {
              printf("found Vienna return code\n");
          }
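
The wildcard idea does not depend on flex. As a rough sketch, a byte pattern with "don't care" positions can be matched with a plain loop and a mask. The function name `find_masked` and the arrays below are hypothetical; the pattern holds only the six bytes shown on the slide before the "(etc)", so it is not a complete signature.

```c
#include <stddef.h>

/* Returns the offset of the first place in `buf` where `pat` matches,
 * or -1 if there is none. A nonzero wild[j] means pattern byte j
 * matches any byte, like (.|\n) in the flex pattern. */
long find_masked(const unsigned char *buf, size_t len,
                 const unsigned char *pat, const unsigned char *wild,
                 size_t plen) {
    size_t i, j;

    if (plen == 0 || plen > len)
        return -1;
    for (i = 0; i + plen <= len; i++) {
        for (j = 0; j < plen; j++)
            if (!wild[j] && buf[i + j] != pat[j])
                break;
        if (j == plen)
            return (long)i;
    }
    return -1;
}

/* The visible start of the slide's pattern:
 * \x51 \xBA (any) (any) \xFC \x89 */
static const unsigned char vienna_pat[]  = {0x51, 0xBA, 0x00, 0x00, 0xFC, 0x89};
static const unsigned char vienna_wild[] = {0,    0,    1,    1,    0,    0};
```

The two wildcard positions cover the operand of the mov, which is the part the slide marks as changing between samples.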

  40. avoiding sensitivity: virus patterns (slide 49)
      - recall: things viruses can’t easily change!
      - example: inserted jumps to virus code
      - code in weird parts of executable file
      - code that modifies executables
      - …

  41. generic generalizing (slide 50)
      - take static parts of virus
      - look for distance to match
      - e.g. foobarbaz is 2 from fooxaxbaz
      - slower than regular-expression-like scanners
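
The slide does not say which distance is meant. The sketch below assumes the simplest choice, the number of mismatching positions between equal-length strings (Hamming distance), which gives 2 for the slide's example. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of positions at which two n-byte strings differ. */
size_t mismatch_count(const char *a, const char *b, size_t n) {
    size_t i, d = 0;
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            d++;
    return d;
}

/* Returns the offset of the first window of `text` that differs from
 * `pattern` in at most `max_dist` positions, or -1 if there is none.
 * Every window is compared byte by byte, which is part of why this is
 * slower than a state-machine scanner. */
long find_within_distance(const char *text, const char *pattern,
                          size_t max_dist) {
    size_t tlen = strlen(text), plen = strlen(pattern), i;

    if (plen == 0 || plen > tlen)
        return -1;
    for (i = 0; i + plen <= tlen; i++)
        if (mismatch_count(text + i, pattern, plen) <= max_dist)
            return (long)i;
    return -1;
}
```

With a tolerance of 2, the pattern foobarbaz is found inside `__fooxaxbaz__` at offset 2; with a tolerance of 1 it is not found.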

  42. pattern cost (slide 51)
      - constructed by hand?
      - question: how could we automate?
      - false positives?
      - push + ret really unused?
      - jmp at beginning?
      - what about data bytes?
      - …

  43. after scanning: disinfection (slide 52)
      - antivirus software wants to repair
      - requires specialized scanning
      - no room for errors
      - need to identify all
      - need to find relocated bits of code

  44. making scanners efficient (slides 53-54)
      - lots of viruses!
      - huge number of states, tables
      - copies of every piece of malware pretty large
      - reading files is slow!

  46. handling volume (slide 55)
      - storing signature strings is non-trivial
      - tens of thousands of states???
      - observation: fixed strings dominate

  47. scanning for fixed strings (slide 56)
      - malware bytes: 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 03 45 67 …
      - a 16-byte “anchor” from the malware goes through a hash function, giving a 4-byte hash
      - table of 4-byte hashes: …, FC923131, 34598873, 994254A3, …
      - table entries lead to the full patterns:
        - Virus A: 204D616C6963696F7573205468696E6720
        - Virus B: 34567890ABCDEF023456789ABCDEFG0345 (full pattern for Virus B)
        - Virus C: 6120766972757320737472696E679090F2
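
The scheme above can be sketched as follows. Everything here is an assumption made for illustration: the hash is FNV-1a standing in for whatever hash a real scanner uses, the anchor is taken to be the first 16 bytes of each pattern, and the table is a small chained hash table. The names `sigdb_build` and `sigdb_scan` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ANCHOR_LEN 16
#define NBUCKETS 64

struct signature {
    const char *name;
    const unsigned char *pattern;  /* full pattern; anchor = first 16 bytes */
    size_t len;                    /* must be at least ANCHOR_LEN */
    uint32_t hash;                 /* filled in by sigdb_build */
    int next;                      /* next signature in the bucket, or -1 */
};

struct sigdb {
    struct signature *sigs;
    int buckets[NBUCKETS];
};

/* 32-bit FNV-1a over one anchor-sized window. */
static uint32_t hash_anchor(const unsigned char *p) {
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < ANCHOR_LEN; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hashes every signature's anchor and links it into its bucket. */
void sigdb_build(struct sigdb *db, struct signature *sigs, size_t count) {
    size_t i;
    db->sigs = sigs;
    for (i = 0; i < NBUCKETS; i++)
        db->buckets[i] = -1;
    for (i = 0; i < count; i++) {
        uint32_t h = hash_anchor(sigs[i].pattern);
        sigs[i].hash = h;
        sigs[i].next = db->buckets[h % NBUCKETS];
        db->buckets[h % NBUCKETS] = (int)i;
    }
}

/* Returns the name of the first signature found in buf, or NULL.
 * The full pattern is compared only when the 4-byte hash matches. */
const char *sigdb_scan(const struct sigdb *db,
                       const unsigned char *buf, size_t len) {
    size_t off;
    for (off = 0; off + ANCHOR_LEN <= len; off++) {
        uint32_t h = hash_anchor(buf + off);
        int i;
        for (i = db->buckets[h % NBUCKETS]; i >= 0; i = db->sigs[i].next) {
            const struct signature *s = &db->sigs[i];
            if (s->hash == h && s->len <= len - off &&
                memcmp(buf + off, s->pattern, s->len) == 0)
                return s->name;
        }
    }
    return NULL;
}
```

This version rehashes all 16 bytes at every offset for simplicity; a rolling hash that updates incrementally would be one way to cut that cost.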

  52. real signatures: ClamAV (slide 57)
      - ClamAV: open source email scanning software
      - signature types:
        - hash of file
        - hash of contents of segment of executable (built-in executable, archive file parser)
        - fixed string
        - basic regular expressions (wildcards, character classes, alternatives)
        - more complete regular expressions (including features that need more than state machines)
        - meta-signatures: match if other signatures match
        - icon image fuzzy-matching

  53. the I/O problem (slide 58)
      - scanning still requires reading the whole file
      - can we do better?

  54. selective scanning (slide 59)
      - check entry point and end only
      - a lot less I/O, maybe
      - check known offsets from entry point
      - heuristic: is entry point close to end of file?
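
A small sketch of what "entry point and end only" could mean in terms of file ranges. It assumes the entry point has already been translated to a file offset, and the window and threshold sizes are arbitrary parameters chosen by the caller. The names `entry_near_end` and `selective_ranges` are hypothetical.

```c
#include <stddef.h>

struct range { size_t start; size_t len; };

/* Heuristic from the slide: is the entry point within `threshold`
 * bytes of the end of the file? Offsets past the end count as "no". */
int entry_near_end(size_t file_size, size_t entry_offset, size_t threshold) {
    if (entry_offset >= file_size)
        return 0;
    return file_size - entry_offset <= threshold;
}

/* Computes the two ranges to read instead of the whole file:
 * out[0] is up to `window` bytes starting at the entry point,
 * out[1] is the last `window` bytes of the file (or all of it). */
void selective_ranges(size_t file_size, size_t entry_offset, size_t window,
                      struct range out[2]) {
    if (entry_offset >= file_size) {
        out[0].start = file_size;
        out[0].len = 0;
    } else {
        size_t avail = file_size - entry_offset;
        out[0].start = entry_offset;
        out[0].len = avail < window ? avail : window;
    }
    out[1].len = file_size < window ? file_size : window;
    out[1].start = file_size - out[1].len;
}
```

For a 100000-byte file with the entry point at offset 512 and a 4096-byte window, this reads bytes 512 to 4607 and the final 4096 bytes, rather than all 100000.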

  55. virus choices? (slide 60)
      - why don’t viruses always append/replace?
      - why don’t viruses always change start location?
      - why did I bother talking about all these strategies?
      - head/tail scanning?
      - check for suspicious starting location?

  56. playing mouse (slide 61)
      - techniques so far:
        - scan for pattern of constant part of virus
        - scan for strings, approx. 16 bytes long
        - scan top and bottom
      - virus-writer hat: how can you defeat these?
        - change some trivial part of virus, e.g. add nops somewhere
        - insert nops everywhere; split any big strings
        - insert jump in middle
        - keep code out of end of file

  60. playing mouse: preview (slide 62)
      - later: metamorphic/polymorphic viruses
      - signature resistant
      - change every time
      - anti-analysis techniques
      - make reverse engineering harder

  61. playing cat (slide 63)
      - harder to fool ways of detecting malware?
      - goal: small changes to malware preserve detection
      - ideal: detect new malware

  62. detecting new malware (slide 64)
      - look for anomalies
      - patterns of code that real executables “won’t” have
      - identify bad behavior
