A Semantics-based Approach to Malware Detection

Mila Dalla Preda (University of Verona, Italy); Mihai Christodorescu, Somesh Jha (University of Wisconsin, USA); Saumya Debray (University of Arizona, USA). POPL ’07, Nice, 17-19 Jan.

  1. A Semantics-based Approach to Malware Detection
     Mila Dalla Preda, University of Verona, Italy
     Mihai Christodorescu, Somesh Jha, University of Wisconsin, USA
     Saumya Debray, University of Arizona, USA
     17-19 Jan, POPL ’07, Nice

  3. A Few Basic Definitions
     Malware is malicious software.
     A malware detector is a program $D$ that determines whether another program $P$ is infected with a malware $M$:
     $D(P, M) = \begin{cases} \text{True} & \text{if } D \text{ determines that } P \text{ is infected with } M \\ \text{False} & \text{otherwise} \end{cases}$
     An ideal malware detector detects all and only the programs infected with $M$, i.e., it is sound and complete.
     Sound = no false positives (no false alarms)
     Complete = no false negatives (no missed alarms)
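
The two properties can be checked on a labeled sample, which gives evidence rather than proof. The sketch below is only an illustration: the string "programs", the `toy_detector`, and the `SAMPLES` list are hypothetical and do not come from the slides.

```python
from typing import Callable, List, Tuple

Program = str                       # hypothetical stand-in for a binary
Sample = Tuple[Program, bool]       # (program, is it really infected?)
Detector = Callable[[Program], bool]


def false_positives(detect: Detector, samples: List[Sample]) -> List[Program]:
    """Clean programs that the detector flags (false alarms)."""
    return [p for p, infected in samples if detect(p) and not infected]


def false_negatives(detect: Detector, samples: List[Sample]) -> List[Program]:
    """Infected programs that the detector misses (missed alarms)."""
    return [p for p, infected in samples if infected and not detect(p)]


def sound_on(detect: Detector, samples: List[Sample]) -> bool:
    """Sound on this sample: no false positives."""
    return not false_positives(detect, samples)


def complete_on(detect: Detector, samples: List[Sample]) -> bool:
    """Complete on this sample: no false negatives."""
    return not false_negatives(detect, samples)


def toy_detector(p: Program) -> bool:
    """Toy D(P, M) for a fixed M: flag P if it contains the marker "EVIL"."""
    return "EVIL" in p


SAMPLES: List[Sample] = [("xxEVILxx", True), ("E-V-I-L", True), ("hello", False)]

print(sound_on(toy_detector, SAMPLES))        # no false alarms on this sample
print(complete_on(toy_detector, SAMPLES))     # the disguised variant is missed
print(false_negatives(toy_detector, SAMPLES))
```

On this sample the toy detector raises no false alarms but misses the disguised variant, so it is sound and not complete here.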

  5. Malware Trends
     There is more malware every year.
     [Chart, 2002-2005: "New Malware" (data labels 10992 and 445) and "New Malware Families" (data labels 141 and 101)]
     But the number of malware families has almost no variation.
     The Beagle family has 197 variants (as of Nov. 30).
     The Warezov family has 218 variants (as of Nov. 27).

  7. The Malware Threat
     Current detectors are signature-based:
     $P$ matches byte-signature $sig$ $\Rightarrow$ $P$ is infected
     Signature-based detectors, when sound, are not complete.
     Malware writers use obfuscation to evade current detectors.
     Virus-antivirus “coevolution”:
       1. Malware writers create new, undetected malware.
       2. Antimalware tools are updated to catch the new malware.
       3. Repeat...
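
A minimal sketch of why byte-signature matching misses obfuscated variants. The instruction bytes, the signature, and the host programs below are illustrative assumptions, not data from the talk. `0x90` is the x86 `nop` opcode.

```python
NOP = b"\x90"  # x86 no-op

# Illustrative instruction encodings (assumed for this example only).
INSTRUCTIONS = [b"\x8b\x42\x0c", b"\x53", b"\xff\x30"]
SIGNATURE = b"".join(INSTRUCTIONS)


def matches_signature(program: bytes, signature: bytes) -> bool:
    """Signature-based detection: flag P if the byte string occurs in it."""
    return signature in program


def nop_insertion(instructions: list) -> bytes:
    """Obfuscation: put one nop between consecutive instructions."""
    return NOP.join(instructions)


# Hypothetical host programs: a prefix byte, the payload, a suffix byte.
plain_variant = b"\x55" + SIGNATURE + b"\xc3"
nop_variant = b"\x55" + nop_insertion(INSTRUCTIONS) + b"\xc3"

print(matches_signature(plain_variant, SIGNATURE))  # the signature occurs verbatim
print(matches_signature(nop_variant, SIGNATURE))    # the nops break the byte match
```

The second variant behaves the same as the first but no longer contains the signature bytes contiguously, which is the incompleteness the slide refers to.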

  8. Common Obfuscations
     Nop insertion
     Register renaming
     Junk insertion
     Code reordering
     Encryption
     Reordering of independent statements
     Reversing of branch conditions
     Equivalent instruction substitution
     Opaque predicate insertion
     ... and many others...

  11. Obfuscation Example
      (Pseudo-)Code:
          mov eax, [edx+0Ch]
          push ebx
          push [eax]
          call ReleaseLock
      Obfuscated code (junk):
          mov eax, [edx+0Ch]
          inc eax
          push ebx
          dec eax
          push [eax]
          call ReleaseLock

  12. Obfuscation Example
      (Pseudo-)Code:
          mov eax, [edx+0Ch]
          push ebx
          push [eax]
          call ReleaseLock
      Obfuscated code (junk + reordering):
          mov eax, [edx+0Ch]
          jmp +3
          push ebx
          dec eax
          jmp +4
          inc eax
          jmp -3
          call ReleaseLock
          jmp +2
          push [eax]
          jmp -2
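
The three versions can be compared by listing the non-jump instructions in the order they execute. The small interpreter below is a sketch under one assumption that the slide does not state: `jmp +n` skips the next n instructions and `jmp -n` skips back over the previous n. Under that reading, the reordered code runs the same instructions in the same order as the junk version.

```python
ORIGINAL = ["mov eax, [edx+0Ch]", "push ebx", "push [eax]", "call ReleaseLock"]

JUNK = ["mov eax, [edx+0Ch]", "inc eax", "push ebx", "dec eax",
        "push [eax]", "call ReleaseLock"]

REORDERED = ["mov eax, [edx+0Ch]", "jmp +3", "push ebx", "dec eax", "jmp +4",
             "inc eax", "jmp -3", "call ReleaseLock", "jmp +2", "push [eax]",
             "jmp -2"]


def executed(program, max_steps=1000):
    """Return the non-jump instructions in execution order.

    Assumed jump convention: "jmp +n" skips n instructions forward,
    "jmp -n" skips n instructions backward.
    """
    pc, order = 0, []
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return order              # fell off the end: program finished
        instr = program[pc]
        if instr.startswith("jmp "):
            n = int(instr.split()[1])
            pc += n + 1 if n > 0 else n - 1
        else:
            order.append(instr)
            pc += 1
    raise RuntimeError("step limit reached (possible loop)")


def drop_inc_dec(trace):
    """Naive junk removal, valid here only because inc/dec cancel out."""
    return [i for i in trace if i not in ("inc eax", "dec eax")]


print(executed(REORDERED) == JUNK)
print(drop_inc_dec(executed(REORDERED)) == ORIGINAL)
```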

  14. Solutions?
      Recent developments based on deep static analysis:
        Detecting Malicious Code by Model Checking [Kinder et al. 2005]
        Semantics-Aware Malware Detection [Christodorescu et al. 2005]
        Behavior-based Spyware Detection [Kirda et al. 2006]
      Problem: there is no formal framework for assessing these techniques.

  16. Our Contributions
      Challenges:
        Many different obfuscations
        Obfuscations are usually combined
        Detection schemes usually rely on static/dynamic analyses
      A framework for assessing the resilience to obfuscation of malware detectors:
        Obfuscation as transformation of trace semantics
        Malware detection as abstract interpretation of trace semantics
        Composing obfuscations vs. composing detectors

  18. Two Worlds of Malware Detectors
      [Figure: two detectors side by side. Left: malware detector on finite semantic structure, with boxes Disassembler, CFG construction, Other analyses. Right: malware detector on trace semantics.]

  20. Abstract Interpretation
      Design approximate semantics of programs [Cousot & Cousot ’77, ’79].
      [Figure: concrete lattice $C$ and abstract lattice $A$, each with $\top$ and $\bot$; $\alpha$ maps $c$ to $\alpha(c)$ and $\gamma$ maps $\alpha(c)$ to $\gamma(\alpha(c))$]
      Galois connection: $\langle C, \alpha, \gamma, A \rangle$, where $A$ and $C$ are complete lattices.
      $\langle \mathit{Abs}(C), \sqsubseteq \rangle$: set of all possible abstract domains; $A_1 \sqsubseteq A_2$ if $A_1$ is more concrete than $A_2$.
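
As a concrete instance of a Galois connection, the sketch below uses the classic sign abstraction. This example is not from the slides, and the integers are replaced by a small finite `UNIVERSE` (an assumption) so that the defining property, α(c) ⊑ a if and only if c ⊆ γ(a), can be checked exhaustively.

```python
from itertools import combinations

UNIVERSE = frozenset(range(-3, 4))   # finite stand-in for the integers (assumption)

# Concretization gamma: abstract sign -> set of concrete values it represents.
GAMMA = {
    "bot": frozenset(),
    "neg": frozenset(x for x in UNIVERSE if x < 0),
    "zero": frozenset({0}),
    "pos": frozenset(x for x in UNIVERSE if x > 0),
    "top": UNIVERSE,
}
SIGNS = ["bot", "neg", "zero", "pos", "top"]   # bottom first, top last


def leq(a: str, b: str) -> bool:
    """Abstract order: bot is below everything, top is above everything."""
    return a == b or a == "bot" or b == "top"


def alpha(c: frozenset) -> str:
    """Abstraction: the most precise sign whose concretization contains c."""
    return next(a for a in SIGNS if c <= GAMMA[a])


def subsets(s):
    """All subsets of s, i.e. the elements of the concrete lattice."""
    items = sorted(s)
    return (frozenset(k) for r in range(len(items) + 1)
            for k in combinations(items, r))


def is_galois_connection() -> bool:
    """Check alpha(c) <= a  iff  c is a subset of gamma(a), for all c and a."""
    return all(leq(alpha(c), a) == (c <= GAMMA[a])
               for c in subsets(UNIVERSE) for a in SIGNS)


c = frozenset({1, 2})
print(alpha(c))                 # abstract value for c
print(sorted(GAMMA[alpha(c)]))  # gamma(alpha(c)) over-approximates c
print(is_galois_connection())
```

The second print shows the loss of precision pictured on the slide: γ(α(c)) contains c but may be strictly larger.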

  21. Outline
      Semantic Malware Detector
      Soundness and Completeness
      Classifying Obfuscations
      Composing Obfuscations
      Proving Soundness and Completeness

  25. Semantic Malware Detector
      A program $P$ is infected by malware $M$, denoted $M \hookrightarrow P$, if (a part of) $P$'s execution is similar to that of $M$:
      $\exists \text{ restriction } r : S[\![M]\!] \subseteq \alpha_r(S[\![P]\!])$
      [Figure: program trace and malware trace, related by $\alpha_r$]
      Vanilla malware, i.e., not obfuscated malware
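
A brute-force sketch of this test on a toy model. The model is an assumption made for illustration: a trace is a tuple of (label, instruction) pairs, a restriction r is a set of labels, and the restriction keeps only the steps labeled in r and then drops the labels. The traces themselves are invented, and the search over all label subsets is exponential, so this only suits tiny examples.

```python
from itertools import combinations


def restrict(traces, r):
    """Toy alpha_r: keep the steps whose label is in r, then drop the labels."""
    return {tuple(ins for lab, ins in t if lab in r) for t in traces}


def find_restriction(malware_traces, program_traces):
    """Return a label set r with S[[M]] contained in alpha_r(S[[P]]), or None."""
    labels = sorted({lab for t in program_traces for lab, _ in t})
    for k in range(len(labels) + 1):
        for r in combinations(labels, k):
            if malware_traces <= restrict(program_traces, set(r)):
                return set(r)
    return None


# Malware semantics: one label-free trace (hypothetical).
MALWARE = {("mov eax, [edx+0Ch]", "push ebx", "push [eax]", "call ReleaseLock")}

# Infected host: the malware steps sit at labels 2..5 (hypothetical).
HOST = {((1, "push ebp"), (2, "mov eax, [edx+0Ch]"), (3, "push ebx"),
         (4, "push [eax]"), (5, "call ReleaseLock"), (6, "ret"))}

# Clean program (hypothetical).
CLEAN = {((1, "push ebp"), (2, "mov ebp, esp"), (3, "ret"))}

print(find_restriction(MALWARE, HOST))    # the labels of the malicious part
print(find_restriction(MALWARE, CLEAN))   # no restriction works
```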

  27. Obfuscated Malware
      $\mathcal{O} : \mathbb{P} \to \mathbb{P}$: obfuscating transformation
      $\alpha : \mathit{Sem} \to A$: abstraction that discards the details changed by the obfuscation while preserving maliciousness
      $\exists \text{ restriction } r : \alpha(S[\![M]\!]) \subseteq \alpha(\alpha_r(S[\![P]\!]))$
      [Figure: program trace, obfuscated malware trace, and malware trace, related by $\alpha_r$ and $\alpha$]
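
To illustrate an abstraction α that discards what an obfuscation changes, the sketch below takes register renaming as the obfuscation and lets α rename registers canonically in order of first use. The trace model (tuples of (label, instruction) pairs, restriction by label set) and all traces are assumptions for the example, not the construction from the talk.

```python
import re
from itertools import combinations

REGISTER = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp)\b")


def identity(trace):
    """No abstraction: compare traces as they are."""
    return tuple(trace)


def alpha_registers(trace):
    """Toy alpha: replace registers by R0, R1, ... in order of first use."""
    names = {}

    def canon(match):
        return names.setdefault(match.group(1), f"R{len(names)}")

    return tuple(REGISTER.sub(canon, ins) for ins in trace)


def restrict(traces, r):
    """Toy alpha_r: keep the steps whose label is in r, then drop the labels."""
    return {tuple(ins for lab, ins in t if lab in r) for t in traces}


def detects(alpha, malware, program):
    """Is alpha(S[[M]]) contained in alpha(alpha_r(S[[P]])) for some r?"""
    labels = sorted({lab for t in program for lab, _ in t})
    target = {alpha(t) for t in malware}
    for k in range(len(labels) + 1):
        for r in combinations(labels, k):
            if target <= {alpha(t) for t in restrict(program, set(r))}:
                return True
    return False


MALWARE = {("mov eax, [edx+0Ch]", "push ebx", "push [eax]", "call ReleaseLock")}

# Host carrying a register-renamed copy (eax->ecx, edx->edi, ebx->esi).
RENAMED_HOST = {((1, "push ebp"), (2, "mov ecx, [edi+0Ch]"), (3, "push esi"),
                 (4, "push [ecx]"), (5, "call ReleaseLock"), (6, "ret"))}

print(detects(identity, MALWARE, RENAMED_HOST))         # vanilla test misses it
print(detects(alpha_registers, MALWARE, RENAMED_HOST))  # abstraction recovers it
```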

  30. Sound vs. Complete
      Precision of the Semantic Malware Detector (SMD) depends on $\alpha$.
      An SMD on $\alpha$ is complete w.r.t. a set $\mathbb{O}$ of transformations if, for all $\mathcal{O} \in \mathbb{O}$:
      $\mathcal{O}(M) \hookrightarrow P \;\Rightarrow\; \exists \text{ restriction } r : \alpha(S[\![M]\!]) \subseteq \alpha(\alpha_r(S[\![P]\!]))$
      That is, it always detects programs that are infected (no false negatives).
      If $\alpha$ is preserved by $\mathbb{O}$, then the SMD on $\alpha$ is complete w.r.t. $\mathbb{O}$.
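
The preservation condition can be tested on samples. The sketch below reads "α is preserved by an obfuscation" as: the abstraction of a program's semantics equals the abstraction of the obfuscated program's semantics. That reading, the straight-line programs (so that a program is its own single trace), and the sample list are all assumptions of this example.

```python
def nop_insertion(program):
    """Obfuscation: insert a nop after every instruction."""
    out = []
    for ins in program:
        out += [ins, "nop"]
    return tuple(out)


def alpha_drop_nops(trace):
    """Abstraction that forgets nops."""
    return tuple(i for i in trace if i != "nop")


def alpha_identity(trace):
    """No abstraction at all."""
    return tuple(trace)


def preserved(alpha, obfuscation, programs):
    """Sample-based check: alpha(P) == alpha(O(P)) for every sample P."""
    return all(alpha(p) == alpha(obfuscation(p)) for p in programs)


# Hypothetical straight-line sample programs.
PROGRAMS = [
    ("mov eax, [edx+0Ch]", "push ebx", "push [eax]", "call ReleaseLock"),
    ("push ebp", "nop", "ret"),
]

print(preserved(alpha_drop_nops, nop_insertion, PROGRAMS))  # nops are invisible to alpha
print(preserved(alpha_identity, nop_insertion, PROGRAMS))   # nops change the trace
```

When the check holds, the abstract view of an obfuscated malware coincides with that of the original, so the abstract containment test cannot miss a program that carries the obfuscated copy.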

  31. Sound vs. Complete
      Precision of the Semantic Malware Detector (SMD) depends on $\alpha$.
      An SMD on $\alpha$ is sound w.r.t. a set $\mathbb{O}$ of transformations if:
      $\exists \text{ restriction } r : \alpha(S[\![M]\!]) \subseteq \alpha(\alpha_r(S[\![P]\!])) \;\Rightarrow\; \exists \mathcal{O} \in \mathbb{O} : \mathcal{O}(M) \hookrightarrow P$
      That is, it never erroneously claims a program is infected (no false positives).
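
An abstraction that forgets too much makes the detector unsound. The sketch below searches a sample for false alarms, using a toy single-trace model in which "some restriction of P yields M" becomes a subsequence test. The model, the two abstractions, and the sample programs are assumptions for illustration, and the subsequence shortcut is only valid for abstractions that filter or rewrite instructions one at a time.

```python
def is_subsequence(small, big) -> bool:
    """True if small can be obtained from big by deleting elements."""
    it = iter(big)
    return all(x in it for x in small)


def identity(program):
    return tuple(program)


def nop_insertion(program):
    """Obfuscation: insert a nop after every instruction."""
    out = []
    for ins in program:
        out += [ins, "nop"]
    return tuple(out)


def alpha_drop_nops(trace):
    return tuple(i for i in trace if i != "nop")


def alpha_calls_only(trace):
    """Overly coarse abstraction: keep only call instructions."""
    return tuple(i for i in trace if i.startswith("call "))


def infects(malware, program) -> bool:
    """Toy infection relation: some restriction of P's trace equals M's trace."""
    return is_subsequence(malware, program)


def smd(alpha, malware, program) -> bool:
    """Toy semantic detector on alpha (restriction folded into subsequence)."""
    return is_subsequence(alpha(malware), alpha(program))


def false_alarms(alpha, malware, obfuscations, programs):
    """Programs flagged by the SMD although no O(M) infects them."""
    return [p for p in programs
            if smd(alpha, malware, p)
            and not any(infects(o(malware), p) for o in obfuscations)]


MALWARE = ("mov eax, [edx+0Ch]", "push ebx", "push [eax]", "call ReleaseLock")
OBFUSCATIONS = [identity, nop_insertion]
INFECTED = ("push ebp",) + nop_insertion(MALWARE) + ("ret",)
BENIGN = ("push ebx", "call ReleaseLock", "ret")
PROGRAMS = [INFECTED, BENIGN]

print(false_alarms(alpha_drop_nops, MALWARE, OBFUSCATIONS, PROGRAMS))
print(false_alarms(alpha_calls_only, MALWARE, OBFUSCATIONS, PROGRAMS))
```

With the calls-only abstraction the benign program is flagged merely because it calls the same function, which is exactly a false positive in the sense of the slide.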
