PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime
M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, Muddassar Farooq
RAID, 2009
Next Generation Intelligent Networks Research Center (nexGIN RC), Pakistan
Outline
• Introduction to Domain
• Problem Definition
• Proposed Solution
• Evaluation
• Literature Survey
• Results and Discussion
• Conclusion
Domain Introduction
Introduction
• Computer malware is a widespread problem…
  • Backdoor, Virus, Worm, Trojan, etc.
• A number of commercial anti-virus software products are available
Financial losses…
[Figures: number of new threats (total threats, in thousands) per half-year, Jan-Jun 2002 to Jan-Jun 2007; estimated damage in billions of US dollars, 1999 to 2003]
Need for non-signature based AV?
• Problems with signature matching…
  • Size of signature database cannot scale
  • Evaded by simple code obfuscation techniques
[Table: Norton AV, Command AV, and McAfee AV against Chernobyl-1.4, F0sf0r0, Hare, and Z0mbie-6.b]
How good are non-signature based solutions?
• Usually leverage
  • Statistical analysis of machine level byte content
  • Disassembled code
  • Run-time API calls
• Problems with existing non-signature solutions…
  • High false alarm rate
  • Large scanning overheads
Problem Definition
Problem Definition
• Non-signature based detector
• Keep run-time complexity low
• “Content Independent” features
• Low false alarm rate
Proposed Solution
Proposed Solution: “PE-Miner”
• Leverage the structural information of an executable
• Extract structural features from all portions of an executable
• Standard pre-processing to remove redundancy
• Use supervised classification algorithms for detection
• Training models provide comprehensible insights for forensic experts
PE-Miner Framework
• Uses novel structural features to efficiently detect malicious PE files
• Strict requirements of the system:
  • Must be a pure non-signature based framework able to detect zero-day malicious PE files
  • Must be realtime deployable, i.e. more than 99% true positive (tp) rate and less than 1% false positive (fp) rate
  • Design must be modular, allowing a plug-n-play design philosophy
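The tp and fp rate targets above can be checked with a short calculation. The following is a minimal Python sketch, assuming labels and predictions are encoded as 1 (malicious) and 0 (benign); the function names are hypothetical and do not come from the paper.

```python
def tp_fp_rates(labels, predictions):
    """Return (tp_rate, fp_rate) for binary labels: 1 = malicious, 0 = benign."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(labels, predictions):
        if actual == 1 and predicted == 1:
            tp += 1  # malicious file correctly flagged
        elif actual == 1:
            fn += 1  # malicious file missed
        elif predicted == 1:
            fp += 1  # benign file wrongly flagged (false alarm)
        else:
            tn += 1  # benign file correctly passed
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


def meets_realtime_targets(tp_rate, fp_rate):
    """Check the slide's targets: tp rate above 99% and fp rate below 1%."""
    return tp_rate > 0.99 and fp_rate < 0.01
```

The tp rate is computed over malicious files only and the fp rate over benign files only, so the two targets constrain different parts of the test set.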
PE-Miner Framework
A threefold research methodology in our static analysis:
1. Identify a set of structural features for PE files which is computable in realtime,
2. use an efficient preprocessor for removing redundancy in the features’ set, and
3. select an efficient data mining algorithm for final classification
Proposed Architecture
Which PE format features to select?
• Structural features from the Windows PE file format
• 189 features selected
• For example, malicious executables usually have
  • bigger import tables,
  • smaller resource tables,
  • no exception tables
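As an illustration of how cheaply such structural features can be read, the sketch below pulls the import, resource, and exception table sizes out of the PE optional header's data directory using only Python's struct module. It assumes these directory sizes correspond to the table-size features named on the slide, which the slides do not confirm; the function name is hypothetical, and this is not the authors' extractor.

```python
import struct

# Data directory indices as defined by the PE/COFF format.
DIRECTORY_INDEX = {"import": 1, "resource": 2, "exception": 3}


def data_directory_sizes(pe_bytes):
    """Return the declared sizes (in bytes) of three PE data directories."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", pe_bytes, optional)
    if magic == 0x10B:    # PE32
        dirs_offset = 96
    elif magic == 0x20B:  # PE32+
        dirs_offset = 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes sits immediately before the directory array.
    (count,) = struct.unpack_from("<I", pe_bytes, optional + dirs_offset - 4)
    sizes = {}
    for name, index in DIRECTORY_INDEX.items():
        if index < count:
            # Each entry is a (VirtualAddress, Size) pair of 32-bit values.
            _rva, size = struct.unpack_from(
                "<II", pe_bytes, optional + dirs_offset + 8 * index)
        else:
            size = 0
        sizes[name] = size
    return sizes
```

Only a few header fields are read and no code or data sections are scanned, which is the kind of content-independent, low-overhead extraction the framework aims for.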
Which PE format features to select?

Name of feature     Benign  Backdoor+Sniffer  Cons+Virtool  DoS+Nuker  Flooder  Exploit+Hacktool  Worm  Trojan  Virus  Malfease
Import Table Size   5.8     19.2              6.1           7.9        20.8     7.1               23.4  10.3    6.2    4.7
Rsrc Table Size     32.6    5.5               1.5           1.4        6.2      1.0               2.6   2.2     0.5    5.9
Exception Table     12.0    0.0               0.0           0.0        0.0      0.0               0.0   0.0     0.0    3.5

Full table in the paper
Which pre-processor to use?
• Why preprocessing?
  • Out of 189 features, some might not convey useful information
  • Either remove or combine such features
  • Reduces the dimensionality of the input feature space
  • Reduces training / testing times of classifiers
• Three pre-processing algorithms used
Feature Pre-processing Algorithms
• Redundant Feature Removal (RFR) -- repeated values
• Principal Component Analysis (PCA) -- data variance
• Haar Wavelet Transform (HWT) -- approximation of function
• RFR is selected for the high detection accuracy obtained after applying it and because it is realtime deployable
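A rough idea of what removing "repeated values" could look like is sketched below in Python. It assumes the simplest criterion, dropping any feature whose value is identical across all training files; the exact rule used by RFR in the paper may differ, and the function name is hypothetical.

```python
def remove_redundant_features(rows):
    """Drop every feature column whose value is the same in all rows.

    rows is a list of equal-length feature vectors, one per file.
    Returns (reduced_rows, kept_column_indices).
    """
    if not rows:
        return [], []
    n_features = len(rows[0])
    # A column carries no information if it has only one distinct value.
    kept = [j for j in range(n_features)
            if len({row[j] for row in rows}) > 1]
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, kept
```

The kept indices would be stored at training time so that the same columns can be selected from each new file at scan time.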
Which classification algorithm?
• IBk – nearest neighbor algorithm
• J48 – decision tree
• NB – Bayesian classifier
• SMO – optimized support vector machine
• RIPPER – inductive rule learning algorithm
• J48 is selected for its highest detection accuracy and low computational complexity; the timing analysis shows it is also realtime deployable
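To make the classification step concrete, here is a minimal Python sketch of a 1-nearest-neighbor classifier over structural feature vectors. It is a simplified stand-in for the idea behind IBk, not the J48 decision tree that was finally selected and not the implementation used in the evaluation; the function name and the toy data in the usage note are assumptions for illustration only.

```python
import math


def nearest_neighbor_predict(train_vectors, train_labels, query):
    """Return the label of the training vector closest to query (Euclidean)."""
    best = min(range(len(train_vectors)),
               key=lambda i: math.dist(train_vectors[i], query))
    return train_labels[best]
```

For example, with two toy training vectors of (import table size, resource table size, exception table size), `nearest_neighbor_predict([[5.8, 32.6, 12.0], [19.2, 5.5, 0.0]], ["benign", "malicious"], [20.0, 6.0, 0.0])` returns `"malicious"`, because the query lies much closer to the second vector.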
Evaluation
Evaluation
• The proposed framework is evaluated on 2 well-known malware collections
• Evaluation datasets
  • VX Heavens virus collection: 10 thousand labeled malware samples
  • Malfease malware collection: 5 thousand malware samples
Literature Survey
Learning to Detect and Classify Malicious Executables in the Wild
J. Zico Kolter, Marcus A. Maloof @ Stanford University, Georgetown University, USA
Journal of Machine Learning Research, MIT Press, 2006. (ISI Impact Factor: 2.682)
Learning to Detect and Classify Malicious Executables in the Wild
[Figure: Overview of KM. Executable file, n-gram analysis, n-gram feature extraction, classification algorithm, result (benign or malicious?)]
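The n-gram feature extraction step in this overview can be sketched in a few lines of Python. The sketch assumes overlapping byte n-grams and binary presence features against a fixed vocabulary; the default n = 4, the vocabulary argument, and the function names are illustrative assumptions, and the feature selection step that would choose the vocabulary is not shown.

```python
def byte_ngrams(data, n=4):
    """Return the set of distinct overlapping byte n-grams in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def presence_vector(data, vocabulary, n=4):
    """Return a 0/1 feature per vocabulary n-gram: 1 if it occurs in data."""
    grams = byte_ngrams(data, n)
    return [1 if gram in grams else 0 for gram in vocabulary]
```

Unlike header-based structural features, this representation requires a pass over the entire file content, so its cost grows with file size.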
Critiques
+ First “real” application of n-gram analysis for malware detection
+ Forensic insights from trained models
+ High accuracy
+ Classification of malicious executables as a function of their payload function (i.e., backdoor, worm, virus, etc.)
- Huge computational complexity in training (several days)
- Not robust to malware packing
- False alarms for packed benign files
McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
R. Perdisci, A. Lanzi, W. Lee @ Georgia Institute of Technology, Damballa Inc., USA
Annual Computer Security Applications Conference (ACSAC), USA, 2008. (acceptance rate 24.3%)
McBoost: Boosting Scalability in Malware Analysis Using Statistical Classification of Executables
[Figure: Overview of McBoost. Executable file, heuristic packer detector (packed / non-packed), dynamic unpacker (hidden code), malcode classifier (C1, C2), result; stages labeled A1, A2, A3]
Critiques
+ First ever technique that leverages packer identification
+ Uses unpacker to extract hidden malicious code
+ Separate n-gram training models for packed and unpacked executable files
- High run-time computational overhead; not feasible for realtime deployment
- Inherits problems with the use of dynamic unpacker: halt, crash, evasion
Results and Discussion
Results
Discussion
• Highly accurate
• Low scanning overheads
• Structural features are robust to evasion attempts?