RESource: A Framework for Online Matching of Assembly with Open Source Code Ashkan Rahimian*, Philippe Charland**, Stere Preda*, and Mourad Debbabi* *Computer Security Laboratory, CIISE Concordia University, Montreal, Quebec, Canada **Mission Critical Cyber Security Section, Defence R&D Canada - Valcartier, Quebec, Canada ETS – Montréal Oct. 26th, 2012
Outline • Background • Motivation • Methodology • Case study • Conclusion
Background • Software Reverse Engineering • Problem : Binary (Assembly) to Source Matching • Domain : Malware Analysis • Facts : Code Reuse • Code Search Engines • Shared Library Imports and Utilization • E.g., cryptographic libraries • Free and Open Source Software (FOSS) • Assumptions: No obfuscation, De-obfuscated code
Background Malware might be built on top of standard components. – e.g. VCL, MFC, … Malware developers use specific development environment. – MS Visual Studio, Borland (Embarcadero), Eclipse, … Some code may contain fingerprints of the programmer. – Executable File Malware authors may utilize free and open-source software. – Encryption algorithms Malware often call low-level kernel APIs. – User level vs. Kernel level, Bypass common signature templates
Motivation • 26 million new malware samples identified in 2011 [1] • Software reverse engineering is a manually intensive and time-consuming process • Malware authors share source code • Code sharing websites, Forums, etc. • E.g. Flame and Stuxnet are linked • Open source libraries are widespread • Koders, Ohloh, Antepidia, Krugle, Google Code, etc. • Software reverse engineers need Automated Tools • Mapping ASM to Source Code • First attempt: RE-Google [1] Panda Security, “PandaLabs Annual Report: 2011 Summary,” Jan. 2011; http://press.pandasecurity.com/wp-content/uploads/2012/01/Annual-Report-PandaLabs-2011.pdf.
Methodology (1/4) • Static Code Analysis • Input: ASM file obtained with IDA Pro • RESource [Figure: architecture diagram. Labels: ASM, Feature Extraction, Data Extraction, Query Generator, Processing Engine, S.E. Driver, Request, Code Search Engines, Response, Parser Engine, Offline Analysis]
Methodology (2/4) • Feature Extraction • Something exploitable at both ASM and Source Code levels • E.g., function names • Types of Features • Immediate Values (Constants) • Strings • Function Imports • Exports (By name, Ordinal) • Function Prototypes (Signatures) • Stack Frame Information (Offline Analysis) • Var., Ret. Values, Parameters, Arguments • Size, Number, Sequence • Register utilization • Example (source): int sum (int a, int b){ return a + b; } • Example (ASM): sum: push %ebp; mov %esp,%ebp; mov 0xc(%ebp),%eax; add 0x8(%ebp),%eax; pop %ebp; ret
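The feature types listed on this slide can be illustrated with a small sketch. It is not the RESource implementation: the listing, the regular expressions, and the function name `extract_features` are assumptions made for illustration, and a real plug-in would query IDA Pro rather than parse text.

```python
import re

# Hypothetical AT&T-style listing. The first function mirrors the `sum`
# example on the slide; the rest is invented for illustration.
ASM_LISTING = """\
sum:
    push %ebp
    mov %esp,%ebp
    mov 0xc(%ebp),%eax
    add 0x8(%ebp),%eax
    pop %ebp
    ret
init:
    push $0x67452301
    push $msg
    call CreateFileA
    ret
msg: .string "config.dat"
"""

LABEL_RE = re.compile(r"^(\w+):", re.MULTILINE)      # names defined in the listing
IMMEDIATE_RE = re.compile(r"\$(0x[0-9a-fA-F]+)")     # hexadecimal immediate values
STRING_RE = re.compile(r'\.string\s+"([^"]*)"')      # string literals
CALL_RE = re.compile(r"\bcall\s+(\w+)")              # call targets


def extract_features(asm_text):
    """Collect labels, constants, strings, and calls to names not defined locally."""
    labels = LABEL_RE.findall(asm_text)
    return {
        "labels": labels,
        "constants": [int(v, 16) for v in IMMEDIATE_RE.findall(asm_text)],
        "strings": STRING_RE.findall(asm_text),
        # A call target with no local label is treated here as an import.
        "calls": [c for c in CALL_RE.findall(asm_text) if c not in labels],
    }


features = extract_features(ASM_LISTING)
```

The constant 0x67452301 is used in the listing because it is the first initial state word of MD5 and SHA-1, the kind of immediate value that survives compilation and also appears verbatim in source code.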
Methodology (3/4) • Processing Engine • Query Building for Code Search Engines • Encoding HTTP Requests • Query Filtering (Removing Special Chars) • Parsing and Information Extraction • Filenames and URLs • Pre-defined Regex Template • Online Analysis • Search Code Repositories for a close match • Specify programming languages as part of Request
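The processing steps on this slide (query filtering, request encoding, regex-based parsing) might look roughly like the following sketch. The endpoint `codesearch.example`, the `q` and `lang` parameter names, and the result markup are all hypothetical; each real search engine has its own URL scheme and page layout.

```python
import re
from urllib.parse import urlencode

# Hypothetical endpoint, not the URL of any real code search engine.
SEARCH_URL = "https://codesearch.example/search"


def filter_query(term):
    """Replace every character other than letters, digits, and '_' with a space."""
    return " ".join(re.sub(r"[^A-Za-z0-9_]+", " ", term).split())


def build_request_url(features, language="c"):
    """Join the filtered features into one query and encode it as an HTTP GET URL."""
    query = " ".join(filter_query(f) for f in features)
    return SEARCH_URL + "?" + urlencode({"q": query, "lang": language})


# Hypothetical pre-defined template for one engine's result markup.
RESULT_RE = re.compile(
    r'<a class="result" href="(?P<url>[^"]+)">(?P<filename>[^<]+)</a>'
)


def parse_results(html):
    """Extract (filename, URL) pairs from a response page."""
    return [(m.group("filename"), m.group("url")) for m in RESULT_RE.finditer(html)]


url = build_request_url(["MD5_Init", "0x67452301"])
hits = parse_results('<a class="result" href="http://host.example/md5.c">md5.c</a>')
```

Keeping one regex template per engine, as the slide suggests, isolates the engine-specific part of the parser from the rest of the pipeline.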
Methodology (4/4) • Offline Analysis • Information about function prototypes: • Complements Online Analysis Results • Lower level analysis for each function • Function stack frame analysis • Dictionary of low-level system calls (Windows API) • A statement describing the overall functionality • Return values, Number and size of arguments • Number and size of parameters and type information • Rank the results based on type information • Output: ASM file with Comments, Analysis Report
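A minimal sketch of the two offline ideas on this slide: describing a function from the Windows API calls it makes, and ranking candidate prototypes by argument count and size. The dictionary contents, the category labels, and the scoring rule are assumptions chosen for illustration; the slides do not specify them.

```python
from collections import Counter

# Hypothetical hand-written dictionary. The API names are real Windows
# functions; the category labels are assumed for this sketch.
API_DESCRIPTIONS = {
    "CreateFileA": "file I/O",
    "ReadFile": "file I/O",
    "WriteFile": "file I/O",
    "BitBlt": "screen capture",
    "LoadLibraryA": "library loading",
    "OpenSCManagerA": "service control",
    "WSAStartup": "network connectivity",
}


def describe_function(called_apis):
    """Summarize a function by the categories of the known APIs it calls."""
    counts = Counter(
        API_DESCRIPTIONS[api] for api in called_apis if api in API_DESCRIPTIONS
    )
    if not counts:
        return "no known API calls"
    return ", ".join(category for category, _ in counts.most_common())


def rank_candidates(observed_arg_sizes, candidates):
    """Order (name, arg_sizes) candidates by agreement with the observed stack frame.

    Assumed score: number of positions with equal argument size, minus the
    difference in argument count.
    """
    def score(arg_sizes):
        matches = sum(1 for a, b in zip(observed_arg_sizes, arg_sizes) if a == b)
        return matches - abs(len(observed_arg_sizes) - len(arg_sizes))

    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


summary = describe_function(["CreateFileA", "WriteFile", "BitBlt"])
ranked = rank_candidates([4, 4], [("f", [4]), ("sum", [4, 4]), ("g", [4, 4, 8])])
```

Here a function observed with two 4-byte arguments ranks a two-argument candidate such as `sum` ahead of candidates with a different argument count.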
Implementation (1/2) • Plug-in for IDA Pro • Execution Flow • Python 2.7.3, IDAPython 1.5.2, IDA Pro 6.1+
Implementation (2/2) • Example for query building: • Multiple search engine support • Interleaving algorithm (Optimizing Time) • The results are added as comments in the ASM file • for both Online and Offline analysis
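The slides do not describe the interleaving algorithm. One plausible reading is that queries are distributed across the supported engines in turn, so that consecutive requests go to different engines. The sketch below shows that round-robin interpretation only; the function name `interleave_queries` is hypothetical.

```python
from itertools import cycle


def interleave_queries(queries, engines):
    """Pair each query with an engine, cycling through the engines in order."""
    return list(zip(queries, cycle(engines)))


schedule = interleave_queries(["q1", "q2", "q3"], ["Koders", "Krugle"])
```

With three queries and two engines, the first and third query go to the first engine and the second query goes to the second engine.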
Case Study (1/2) • PreciseCalc Project • Open source project • Hosted on SourceForge • Using the Koders search engine • Several full matches found • Matches for mathematical functions
Case Study (2/2) • Malware Analysis • Low level APIs matching • Offline Analysis proves more useful • Gives insight into the potential code output • Screenshots • Example 1: File I/O • Example 2: Screen Capture • Example 3: Network Connectivity • Example 4: Loading Libraries • Example 5: Services • Example 6: Low-level Network
Example 1. File I/O
Example 2. Screen Capture
Example 3. Network Connectivity
Example 4. Loading Libraries
Example 5. Services
Example 6. Low-level Network Con.
Conclusion • Improved the idea of RE-Google • Offline Analysis, Multiple Search Engines • Better results handling • Automated tool for reverse engineers • Malware Analysis • Limitations • Quality of output depends on the repositories • Currently optimized for C/C++ • Some features may not always be available • For validation, we need all source files (CFG)
Q&A • Thank you. • Q&A?