Toward Automated Forensic Analysis of Obfuscated Malware Ryan J. - PowerPoint PPT Presentation

Toward Automated Forensic Analysis of Obfuscated Malware
Ryan J. Farley
George Mason University, Department of Computer Science
Committee: Xinyuan Wang, Hakan Aydin, Songqing Chen, Brian Mark
April 24, 2015

Overview
• The Need for Forensics
• Forensics Problems and Our Contribution
• Background
• Problem Model
• Challenges and Solutions
• Empirical Evaluation
• Conclusion

The Need for Forensics

Why?
• Malware is a serious threat
  – Internet of [Insecure] Things
  – Stuxnet, Regin
  – Christmas holiday tradition
  – Compromise is an eventuality
• Forensics seeks to understand the how
  – Embrace the ownage
  – Collect evidence, Analyze, Extrapolate
• Enables us to build better defenses
[Figure: IRC botnet. 1 Attacker (botmaster) infects a vulnerable host. 2 Host becomes a bot and joins the botnet. 3 Bots log in to the IRC server. 4 Botmaster sends commands to bots. 5 Bots send collected data to botmaster.]

Scenario: Vulnerable Web Server
[Figure labels: Attack, Defense v1.0, Vulnerable Process, Memory]

Scenario: Exploit, What Now?
[Figure labels: Attack, Defense v1.0, Detection Mechanism, Vulnerable Process, Memory, Memory Dump, Forensic Evidence]

Scenario: What Now?
• Upon first non-self system call
  – Attack code fragments remain in memory
    • Packing, self-modification, armoring
  – Staged C2
• Can the fragments reveal clues?
  – Robust system needed to generically model execution

Scenario: Build Better Defense
[Figure labels: Memory Dump, Forensic Analysis, Forensic Evidence, Defense v2.0]

Forensics Problems and Our Contribution

Problem
• Need to automate forensic response upon detection in memory
  – Avoid substantial manual effort
• Automatically recover malcode
• Extract/unpack/recover attack code
  – Memory dump, transient artifacts
[Figure: Input, Analysis Engine, Output. Labels as extracted: Static, Memory Dump, Attack String, Dynamic, Process Context, Malicious Code, Hybrid, Registers, Vulnerability, Execution Trace, Arbitration, Log Files, Obfuscation Removal, Obfuscated Code, Normalized Code]

Problem
• Human oversight is costly
• Trade-off between
  – Generic binary
  – Malware specific
• Need
  – Automated generic malware tool that approaches detail from generic binary tools
[Figure: Human Oversight vs. Scope of Results for attack code. Labels: Heavyweight Binary Generic, Lightweight Malware Specific, Heavyweight Malware Specific]

Motivation, Existing Tools
• Only work within known boundaries
  – Typically exclude support for code fragments
    • e.g., shellcode
  – Things get messy without given boundaries
    • e.g., arbitrary byte streams
• Do not generically handle:
  – Malformed, Misaligned
  – Obfuscated, Armored
  – Too specific or too abstracted

Solution: CodeXt
• Discovers executable code within memory dump
  – Upon real-time detection
[Figure: DASOS forensic dump. Upon detection, vital runtime information is captured and the dump is written to disk (HDD).]

Solution: CodeXt
• Extracts packed or obfuscated malcode
  – First to generically handle Incremental and Shikata-Ga-Nai
[Figure: multi-layer decoding across four memory states (Original memory, First snapshot, Second snapshot, Third snapshot). Decoder3 w/ K3, Decoder2 w/ K2, and Decoder1 w/ K1 run in turn; each snapshot shows one more layer decoded (Layer 3, then Layer 2, then Layer 1), with transient code and encoded code/data in between.]
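To make the layered-decoder picture concrete, here is a minimal sketch of peeling nested encoding layers while keeping a snapshot after each one. It is not CodeXt's implementation and it does not reproduce Shikata-Ga-Nai or the Incremental encoder: the toy XOR-then-add layer, the key values, and the function names are all assumptions chosen for illustration.

```python
# Toy model of multi-layer decoding with snapshots (illustrative only).
# The layer transform and keys are hypothetical, not taken from any real packer.

def encode_layer(data: bytes, key: int) -> bytes:
    """Toy encoder: XOR each byte with the key, then add the key (mod 256)."""
    return bytes(((b ^ key) + key) & 0xFF for b in data)

def decode_layer(data: bytes, key: int) -> bytes:
    """Inverse of encode_layer: subtract the key (mod 256), then XOR."""
    return bytes(((b - key) & 0xFF) ^ key for b in data)

def pack(payload: bytes, keys: list[int]) -> bytes:
    """Apply layers in order, so the last key is the outermost layer."""
    blob = payload
    for key in keys:
        blob = encode_layer(blob, key)
    return blob

def unpack_with_snapshots(blob: bytes, keys: list[int]) -> list[bytes]:
    """Peel layers outermost first, recording memory after each decode."""
    snapshots = [blob]  # index 0 is the original (fully encoded) memory
    for key in reversed(keys):
        blob = decode_layer(blob, key)
        snapshots.append(blob)
    return snapshots

if __name__ == "__main__":
    keys = [0x11, 0x22, 0x33]          # stand-ins for K1, K2, K3
    payload = bytes([0x90, 0x90, 0xCC])
    snaps = unpack_with_snapshots(pack(payload, keys), keys)
    for i, snap in enumerate(snaps):
        print(f"snapshot {i}: {snap.hex()}")
```

With three keys the function returns four memory states, matching the original memory plus three snapshots in the figure. In a real analysis the keys are unknown, which is why the slides rely on executing the decoders rather than inverting them by hand.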

Solution: CodeXt
• Uses data-flow analysis (taint tracking)
  – Finds attack string within network traffic
• Models both shellcode and full executables
• Framework built upon S2E
  – Selective means QEMU vs. KLEE (LLVM)
[Figure: CodeXt pipeline. Labels: Run-time memory dump, Run-time info, Dynamic Binary Analysis, Symbolic Execution, Offline Analysis, Intermediate results, Report of the attack, Recovered code, Obfuscation info]

Background

Background
• S2E, Selective Symbolic Execution
  – KLEE for symbolic
  – QEMU for concrete
    • We extended QEMU to detect system calls
• KLEE
  – Expressive IR allows low level operations
    • Down to the bit
  – States = Shadow Memory + Constraints
  – Memory = Expressions
    • Even concrete values are expressions
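The idea that a state is shadow memory plus constraints, and that every memory cell holds an expression, can be sketched in a few lines. This is a toy model, not KLEE's API: the class names `Const`, `Sym`, `Xor`, and `State` are hypothetical and only XOR over bytes is modeled.

```python
# Toy "memory = expressions" model (illustrative; not KLEE's data structures).
from dataclasses import dataclass, field
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int  # a concrete byte is still stored as an expression

@dataclass(frozen=True)
class Sym:
    name: str   # a symbolic (unknown) byte

@dataclass(frozen=True)
class Xor:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Sym, Xor]

def simplify(expr: Expr) -> Expr:
    """Fold XOR of two constants; leave anything symbolic untouched."""
    if isinstance(expr, Xor):
        left, right = simplify(expr.left), simplify(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const((left.value ^ right.value) & 0xFF)
        return Xor(left, right)
    return expr

@dataclass
class State:
    memory: dict = field(default_factory=dict)       # address -> Expr
    constraints: list = field(default_factory=list)  # path constraints

    def write(self, addr: int, expr: Expr) -> None:
        self.memory[addr] = simplify(expr)

    def read(self, addr: int) -> Expr:
        return self.memory.get(addr, Const(0))

if __name__ == "__main__":
    state = State()
    state.write(0x1000, Xor(Const(0x41), Const(0x01)))  # folds to a constant
    state.write(0x1001, Xor(Sym("a"), Const(0x01)))     # stays symbolic
    print(state.read(0x1000), state.read(0x1001))
```

Storing concrete bytes as `Const` expressions means one code path handles both concrete and symbolic data, which is the property the last bullet points at.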

Attack Code vs. Attack String
• Attack string:
  – Crafted input to the process
  – May include non-code
• Attack code:
  – Executed within process
  – May include immediate values (data)
• Removing layers of obfuscation
  – How many, and by what function?
  – What about self-destructive code?

Framing the Problem
• Assumptions
  – All malicious code exists within dump
  – Malicious code has not overwritten itself destructively
• Requirements
  – No code semantics known
  – Coding conventions irrelevant
  – Capable of accuracy with self-modifying code
  – Capable of modeling network-based server applications

CodeXt Output
• Instruction Trace of executed instructions
  – Grouping of fragments into chunks
  – Reveals original and unpacked malcode
  – Assisted by a translation trace
• Data Trace of memory writes
  – Intelligent memory update clustering
  – Multi-layer snapshots
• Call Trace of system calls
  – With CPU context
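The slides do not say how memory updates are clustered. As one plausible reading, the sketch below merges writes that touch or overlap into contiguous ranges; the function name `cluster_writes` and the `max_gap` parameter are assumptions, not part of CodeXt.

```python
# Merge memory writes into contiguous clusters (one possible clustering rule).

def cluster_writes(writes, max_gap=0):
    """Group (address, length) writes into merged [start, end) ranges.

    Two writes land in the same cluster when they overlap or when the gap
    between them is at most max_gap bytes.
    """
    ranges = sorted((addr, addr + length) for addr, length in writes)
    clusters = []
    for start, end in ranges:
        if clusters and start <= clusters[-1][1] + max_gap:
            # Extend the current cluster if this write reaches further.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(c) for c in clusters]

if __name__ == "__main__":
    trace = [(0x100, 4), (0x104, 4), (0x200, 1), (0x102, 1)]
    for start, end in cluster_writes(trace):
        print(f"{start:#x}-{end:#x}")
```

A cluster that later appears in the instruction trace is a candidate decoded layer, which is where the multi-layer snapshots come in.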

Data-flow Analysis Output
• For each labeled byte
  – Follow propagation
  – Generate trace
  – Generate memory map
• Add events that qualify as success
  – EIP contains tainted values
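A minimal sketch of the per-byte data flow described above: labels follow loads and stores, every step is recorded in a trace, a memory map shows where labels ended up, and a success event fires when EIP receives a tainted value. The `TaintTracker` class and its method names are hypothetical and model only register loads and stores.

```python
# Toy byte-level taint tracker (illustrative; not the CodeXt plugin).

class TaintTracker:
    def __init__(self):
        self.mem = {}     # address -> set of labels
        self.regs = {}    # register name -> set of labels
        self.trace = []   # (operation, destination, sorted labels)
        self.events = []  # success events, e.g. tainted EIP

    def label(self, addr, name):
        """Attach a label to one input byte."""
        self.mem[addr] = {name}

    def load(self, reg, addr):
        """reg <- mem[addr]; the register inherits the byte's labels."""
        labels = set(self.mem.get(addr, set()))
        self.regs[reg] = labels
        self._record("load", reg, labels)
        if reg == "eip" and labels:
            self.events.append(("tainted_eip", sorted(labels)))

    def store(self, addr, reg):
        """mem[addr] <- reg; the byte inherits the register's labels."""
        labels = set(self.regs.get(reg, set()))
        self.mem[addr] = labels
        self._record("store", addr, labels)

    def _record(self, op, dest, labels):
        if labels:
            self.trace.append((op, dest, sorted(labels)))

    def memory_map(self):
        """Addresses that currently hold labeled data."""
        return {a: sorted(l) for a, l in sorted(self.mem.items()) if l}

if __name__ == "__main__":
    t = TaintTracker()
    t.label(0x10, "a")     # byte from the network input
    t.load("eax", 0x10)
    t.store(0x20, "eax")   # copy propagates the label
    t.load("eip", 0x20)    # control flow reaches tainted data
    print(t.trace, t.memory_map(), t.events)
```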

Problems + Challenges + Solutions

Handling Byte Streams
• S2E expects well structured binaries
  – We wrap the binary for execution
• S2E uses basic block granularity
  – Our modified QEMU translation returns more info
  – We leverage translation and execution hooks to verify
[Figure labels: Info, Host to Guest File Transfer, Wrapper, Buffer, Guest OS, CodeXt S2E Plugin, Output, S2E (Modified QEMU)]

Code Fragments
• Fragmentation
  – Clustering into Chunks, adjacency, execution trace
• Density
  – Usage: Executed/Range
  – Overlay: Unique executed/Range over snapshots
• Enclosure
  – Continuous executable bytes adjacent to end
[Figure: the buffer is run through S2E at offset 1 through offset n and the resulting fragments are matched.]
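The two density ratios can be read as simple set arithmetic over executed byte addresses. The sketch below is one interpretation, not the thesis definition: it assumes Usage is computed per snapshot as executed bytes inside the chunk divided by the chunk's size, and Overlay is the same ratio over the union of all snapshots. The function names are hypothetical.

```python
# Density metrics for a chunk covering addresses [start, end) (one interpretation).

def usage(executed, start, end):
    """Fraction of the chunk's bytes executed in a single snapshot."""
    inside = {a for a in executed if start <= a < end}
    return len(inside) / (end - start)

def overlay(snapshots, start, end):
    """Fraction of the chunk's bytes executed in at least one snapshot."""
    unique = set().union(*snapshots)
    return usage(unique, start, end)

if __name__ == "__main__":
    snaps = [{0, 1}, {1, 2, 3}, {6}]  # executed byte addresses per snapshot
    print([usage(s, 0, 8) for s in snaps], overlay(snaps, 0, 8))
```

Under this reading, a low Usage in every snapshot combined with a high Overlay suggests code that is revealed and executed piecewise across decoding layers.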

Defeating Obfuscation
• FPU instructions, fnstenv
  – Added small change to QEMU to comply
• Intra-basic block self-modification
  – We know address range of each translated block
  – During execution we track writes
  – If any write is to same block we retranslate block
• Emulator detection
  – Tested for a set of obscure instructions used as canaries
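The intra-basic-block check reduces to an interval overlap test between each memory write and the address range of the block being executed. The sketch below models only that decision. `TranslatedBlock` and `on_memory_write` are hypothetical names, not QEMU or S2E hooks.

```python
# Detect writes into the currently executing translated block (toy model).

class TranslatedBlock:
    def __init__(self, start, end):
        self.start = start  # first byte of the block
        self.end = end      # one past the last byte
        self.stale = False  # True once the block must be retranslated

def on_memory_write(block, addr, size=1):
    """Flag the block as stale if [addr, addr + size) overlaps it."""
    overlaps = addr < block.end and addr + size > block.start
    if overlaps:
        block.stale = True
    return overlaps

if __name__ == "__main__":
    block = TranslatedBlock(0x1000, 0x1010)
    print(on_memory_write(block, 0x2000))  # write elsewhere: block still valid
    print(on_memory_write(block, 0x100F))  # write inside: retranslate
    print(block.stale)
```

The size argument matters because a multi-byte write that starts just before the block can still modify its first instruction.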

Multipath, Arbitrary Bytes
• Multipath Execution
  – Existing trace tool manages path merging
  – KLEE manages state forking and resources
• Mark Arbitrary Bytes as Symbolic
[Figure: network traffic from the attacker carries the attack string into the vulnerable process memory; the labeled network input bytes (a b c d e) are marked inside the vulnerable process.]

Executing Symbolic Code
• Taint labels can be searched upon events
  – KLEE prefers constraints over solving
• Constraint cleanup
  – Silent concretization
[Figure: exploited process memory with labeled bytes (a b c d e) and, after analysis, the executed segments after decoding with their labels.]

Executing Symbolic Code, cont'd
• Data-flow validity, intermingled code
• Symbolic EIP
• Periodic or triggered custom simplifier
• Inheritance enforcer
• Bit-wise and mov

Executable Modeling
• OS introspection
  – Snag CR3 as PID
• Load and link overhead
  – 95,000 instructions to ignore
  – Canary
• Real-time attacks
  – Buffer overflow
  – Sockets
  – SSL

Empirical Evaluation

Experiments, Part 1
• Hidden code search
  – 1KB to 100KB buffers, 40B to 80B shellcodes
  – Filled with either null, live-capture, or random bytes
  – Varied assistance data: EIP, EAX, both, neither
• Accuracy
  – De-obfuscation, Anti-emulation detection
  – Various packers mentioned in previous research
  – In-shop: Junk code insertion, Ranged xor, Incremental
• Symbolic Branching

Multi-Layered Encoders
[Figure: byte ranges along an axis from 0 to 40. Labels: xor_key1, xor_key2, junk inserted bytes]
