PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime. M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, Muddassar Farooq. RAID, 2009


  1. PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime
     M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, Muddassar Farooq. RAID, 2009
     Next Generation Intelligent Networks Research Center (nexGIN RC), Pakistan


  2. Agenda Outline
     • Introduction to Domain
     • Problem Definition
     • Proposed Solution
     • Evaluation
     • Literature Survey
     • Results and Discussion
     • Conclusion

  3. Domain Introduction

  4. Introduction
     • Computer malware is a widespread problem…
     • Backdoor, Virus, Worm, Trojan, etc.
     • A number of commercial anti-virus software products exist

  5. Financial losses…
     [Figure: number of new threats (total threats, in thousands) per half-year, Jan-Jun 2002 to Jan-Jun 2007]
     [Figure: estimated damage (in billions of US dollars), 1999 to 2003]

  6. Need of non-signature based AV?
     • Problems with signature matching…
         • Size of signature database cannot scale
         • Evaded by simple code obfuscation techniques
     [Table: Norton AV, Command AV, and McAfee AV versus Chernobyl-1.4, F0sf0r0, Hare, and Z0mbie-6.b; cell values missing]

  7. How good are non-signature based solutions?
     • Usually leverage
         • Statistical analysis of machine level byte content
         • Disassembled code
         • Run-time API calls
     • Problems with existing non-signature solutions…
         • High false alarm rate
         • Large scanning overheads

  8. Problem Definition

  9. Problem Definition
     • Non-signature based detector
     • Keep run-time complexity low
     • “Content Independent” features
     • Low false alarm rate

  10. Proposed Solution

  11. Proposed Solution: “PE-Miner”
     Leverage the structural information of an executable
     • Extract structural features from all portions of an executable
     • Standard pre-processing to remove redundancy
     • Use supervised classification algorithms for detection
     • Training models provide comprehensible insights for forensic experts

  12. PE-Miner Framework
     • Uses novel structural features to efficiently detect malicious PE files
     • Strict requirements of the system:
         • Must be a pure non-signature based framework with the ability to detect zero-day malicious PE files
         • Must be realtime deployable, i.e. more than 99% TP rate and less than 1% FP rate
         • Design must be modular, allowing a plug-n-play design philosophy
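The realtime-deployability requirement above is stated in terms of TP and FP rates. As a minimal sketch of how those two rates follow from confusion-matrix counts (the function names and the sample counts in the usage note are illustrative assumptions, not taken from the paper):

```python
def tp_fp_rates(tp, fn, fp, tn):
    """Return (TP rate, FP rate) from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)  # fraction of malicious files that were flagged
    fp_rate = fp / (fp + tn)  # fraction of benign files wrongly flagged
    return tp_rate, fp_rate


def meets_requirement(tp_rate, fp_rate):
    """Check the slide's thresholds: more than 99% TP rate, less than 1% FP rate."""
    return tp_rate > 0.99 and fp_rate < 0.01
```

For example, a hypothetical run that flags 995 of 1000 malicious files and 5 of 1000 benign files gives a TP rate of 0.995 and an FP rate of 0.005, which would satisfy both thresholds.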

  13. PE-Miner Framework
     A threefold research methodology in our static analysis:
       1. Identify a set of structural features for PE files which is computable in realtime,
       2. use an efficient preprocessor for removing redundancy in the feature set, and
       3. select an efficient data mining algorithm for final classification

  14. Proposed Architecture


  15. Which PE format features to select?
     • Structural features from the Windows PE file format
     • 189 features selected
     • For example, malicious executables usually have
         • bigger import tables,
         • smaller resource tables,
         • no exception tables
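The three example features above correspond to entries in the PE optional header's data directory. The following is a rough sketch of reading their sizes with Python's standard `struct` module. The function name `directory_sizes` is hypothetical, the offsets follow the published PE/COFF layout, and this is not the authors' extractor, which covers 189 features.

```python
import struct

# Data-directory indices defined by the PE/COFF format.
DIRECTORIES = {"import": 1, "resource": 2, "exception": 3}


def directory_sizes(data: bytes) -> dict:
    """Return the import, resource and exception table sizes of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional = pe_offset + 4 + 20  # skip signature and 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x10B:    # PE32
        dirs = optional + 96
    elif magic == 0x20B:  # PE32+
        dirs = optional + 112
    else:
        raise ValueError("unknown optional header magic")
    (count,) = struct.unpack_from("<I", data, dirs - 4)  # NumberOfRvaAndSizes
    sizes = {}
    for name, index in DIRECTORIES.items():
        if index < count:
            _rva, size = struct.unpack_from("<II", data, dirs + 8 * index)
        else:
            size = 0  # directory entry not present in this file
        sizes[name] = size
    return sizes
```

Because only a few fixed-offset header fields are read, the cost does not depend on the size of the file's code or data sections, which is in keeping with the low run-time complexity the slides aim for.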

  16. Which PE format features to select?
     Name of feature     Benign  Backdoor+Sniffer  Cons+Virtool  DoS+Nuker  Flooder  Exploit+Hacktool  Worm  Trojan  Virus  Malfease
     Import Table Size      5.8              19.2           6.1        7.9     20.8               7.1  23.4    10.3    6.2       4.7
     Rsrc Table Size       32.6               5.5           1.5        1.4      6.2               1.0   2.6     2.2    0.5       5.9
     Exception Table       12.0               0.0           0.0        0.0      0.0               0.0   0.0     0.0    0.0       3.5
     Full table in the paper

  17. Which pre-processor should be used?
     • Why preprocessing?
         • Out of 189 features, some might not convey useful information
         • Either remove or combine such features
         • To reduce the dimensionality of the input feature space
         • Reduces training and testing times of classifiers
     • Three pre-processing algorithms used

  18. Feature Pre-processing Algorithms
     • Redundant Feature Removal (RFR): repeated values
     • Principal Component Analysis (PCA): data variance
     • Haar Wavelet Transform (HWT): approximation of function
     RFR is selected because it yields high detection accuracy and is realtime deployable
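The slide describes RFR only as targeting "repeated values". As a minimal sketch under that reading, the function below treats a feature as redundant when it takes the same value in every training sample and drops it. The function name and the exact redundancy criterion are assumptions; the paper's definition may be broader.

```python
def remove_redundant_features(rows):
    """Drop feature columns whose value is identical in every row.

    rows: list of equal-length feature vectors.
    Returns (kept column indices, reduced rows).
    """
    if not rows:
        return [], []
    n_features = len(rows[0])
    # Keep a column only if at least one row differs from the first row.
    kept = [
        j for j in range(n_features)
        if any(row[j] != rows[0][j] for row in rows)
    ]
    reduced = [[row[j] for j in kept] for row in rows]
    return kept, reduced
```

The kept indices are returned alongside the reduced rows so that the same columns can be selected from files scanned later, without recomputing the filter.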

  19. Which classification algorithm?
     • IBk: nearest neighbor algorithm
     • J48: decision tree
     • NB: Bayesian classifier
     • SMO: optimized support vector machine
     • RIPPER: inductive rule learning algorithm
     J48 is selected because it has the highest detection accuracy and low computational complexity; the timing analysis also shows it to be realtime deployable
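J48 is Weka's implementation of the C4.5 decision tree learner. To illustrate the core step of such a learner, here is a simplified sketch that picks the threshold on one numeric feature with the highest information gain. It is not J48 itself: C4.5 uses gain ratio, recurses to build a full tree, and prunes. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def best_threshold(values, labels):
    """Return (threshold, information gain) of the best split `value <= t`."""
    base = entropy(labels)
    best = (None, 0.0)
    # Splitting at the maximum value would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[1]:
            best = (t, gain)
    return best
```

With made-up exception-table values such as `[0.0, 0.0, 12.0, 10.0]` labeled malicious, malicious, benign, benign, the best split is `value <= 0.0`, which separates the two classes completely. A single root-to-leaf path of such tests is also what makes a trained tree readable by a forensic expert.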

  20. Evaluation

  21. Evaluation
     • The proposed framework is evaluated on two well-known malware collections
     • Evaluation datasets
         • VX Heavens virus collection: 10 thousand labeled malware samples
         • Malfease malware collection: 5 thousand malware samples

  22. Literature Survey

  23. Learning to Detect and Classify Malicious Executables in the Wild
     J. Zico Kolter, Marcus A. Maloof @ Stanford University, Georgetown University, USA
     Journal of Machine Learning Research, MIT Press, 2006. (ISI Impact Factor: 2.682)

  24. Learning to Detect and Classify Malicious Executables in the Wild
     [Figure: Overview of KM. Labeled stages: executable file, n-gram analysis, n-gram feature extraction, classification algorithm, result (benign or malicious)]
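The n-gram analysis stage in the KM overview amounts to counting overlapping byte sequences in the executable. A minimal sketch follows; the function name is hypothetical and the default `n = 4` is an assumption, since the slide does not state the n-gram length.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte sequence in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))
```

For instance, `byte_ngrams(b"ABABA", 2)` counts `b"AB"` twice and `b"BA"` twice. Every byte of the file is visited and the number of distinct n-grams can be very large, which is consistent with the computational cost noted in the critique of this approach.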

  25. Critiques
     + First “real” application of n-gram analysis for malware detection
     + Forensic insights from trained models
     + High accuracy
     + Classification of malicious executables as a function of their payload function (i.e., backdoor, worm, virus, etc.)
     - Huge computational complexity in training (several days)
     - Not robust to malware packing
     - False alarms for packed benign files

  26. McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
     R. Perdisci, A. Lanzi, W. Lee @ Georgia Institute of Technology, Damballa Inc., USA
     Annual Computer Security Applications Conference (ACSAC), USA, 2008. (acceptance rate 24.3%)

  27. McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
     [Figure: Overview of McBoost. Labeled components: executable file, heuristic packer detector (packed / non-packed), hidden dynamic code unpacker, malcode classifier, C1, C2, A1, A2, A3, Σ, result]

  28. Critiques
     + First ever technique that leverages packer identification
     + Uses unpacker to extract hidden malicious code
     + Separate n-gram training models for packed and unpacked executable files
     - High run-time computational overhead; not feasible for realtime deployment
     - Inherits problems with the use of dynamic unpacker: halt, crash, evasion

  29. Results and Discussion

  30. Results

  31. Discussion
     • Highly accurate
     • Low scanning overheads
     • Structural features are robust to evasion attempts?
