Malware Analysis – Connecting Variants and Versions Arun Lakhotia University of Louisiana at Lafayette 1 ISSISP 2014 (C) Lakhotia 7/19/2017
Demo

MAGIC Connect – Summary
FOLLOW THIS LINK: http://www.virustotal.com/en/arunlakhotia

MAGIC Connect: Full report
FOLLOW THIS LINK: http://beta.magic.cythereal.com/report/1f1f560c29db6a61b05212eea0e3c68de0b9d61e

MAGIC Report via API
https://api.magic.cythereal.com/magic/1cf646f9fa78a5c253647dd9220d0502/ff9790d7902fea4c910b182f6e0b00221a40d616/

Find Matching Procedures (via API)
https://api.magic.cythereal.com/search/procs/1cf646f9fa78a5c253647dd9220d0502/ff9790d7902fea4c910b182f6e0b00221a40d616/0x1000

MAGIC Features, via API
https://api.magic.cythereal.com/show/proc/1cf646f9fa78a5c253647dd9220d0502/ff9790d7902fea4c910b182f6e0b00221a40d616/0x1000
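
The three API calls above share one URL layout: endpoint, API key, file hash, and (for procedure queries) a procedure address. The following is a minimal Python sketch that only assembles such URLs, assuming that layout holds in general. The function names are hypothetical, and no request is sent, because the slides do not show the response format.

```python
from urllib.parse import quote

API_ROOT = "https://api.magic.cythereal.com"

def report_url(api_key, sha1):
    """URL of the MAGIC report for one binary (layout copied from the slide)."""
    return f"{API_ROOT}/magic/{quote(api_key)}/{quote(sha1)}/"

def search_procs_url(api_key, sha1, proc_addr):
    """URL that asks for procedures matching the one at proc_addr."""
    return f"{API_ROOT}/search/procs/{quote(api_key)}/{quote(sha1)}/{quote(proc_addr)}"

def show_proc_url(api_key, sha1, proc_addr):
    """URL that shows the features of the procedure at proc_addr."""
    return f"{API_ROOT}/show/proc/{quote(api_key)}/{quote(sha1)}/{quote(proc_addr)}"

# Values taken from the slides.
KEY = "1cf646f9fa78a5c253647dd9220d0502"
SHA1 = "ff9790d7902fea4c910b182f6e0b00221a40d616"

print(report_url(KEY, SHA1))
print(search_procs_url(KEY, SHA1, "0x1000"))
print(show_proc_url(KEY, SHA1, "0x1000"))
```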

API Documentation
https://api.magic.cythereal.com/docs
http://docs.cythereal.com
Other links:
http://www.virustotal.com/en/arunlakhotia
http://beta.magic.cythereal.com/

Cythereal MAGIC API Key
Temporary API key for ISSISP: 1cf646f9fa78a5c253647dd9220d0502
To get your own key:
1. Visit https://api.magic.cythereal.com/docs/
2. Look for “Register”
3. Click on “Try It Out”
4. Fill in the form and click “Execute”

Problem Definition

Malware (software) Generative Process
[Figure labels: Source Sharing Compile Source Binary Morph Edit Morph Bugfix Pack Translate Generate]

Problem
Given a collection of malware, consisting of VERSIONS and VARIANTS:
• find malware similar to a given file
• find functions (disassembled) similar to a given function

Challenge: “Undo” Metamorphism
push ecx mov ecx, [ebp + 10] push ecx mov ecx, ebp mov ecx, ebp push eax push eax add eax, 2342 push ecx mov eax, 33 mov eax, 33 mov ecx,ebp add ecx, eax add ecx, eax push ecx add ecx,33 pop eax pop eax mov ecx,ebp push esi mov eax, esi mov [ebp - 3], eax add ecx,33 mov esi,ecx push esi push eax mov [ecx-36],eax sub esi,34 mov esi, ecx mov esi, ecx pop ecx mov [esi-2],eax push edx push edx pop esi xor edx, 778f pop ecx mov edx, 34 mov edx, 34 sub esi, edx sub esi, edx pop edx pop edx mov [esi - 2], eax mov [esi-2], eax pop esi pop esi pop ecx pop ecx

Challenge: Similar Binaries
Symantec              McAfee
W32.NetSky.A          W32/NetSky.A
W32.NetSky.B          W32/NetSky.B
W32.NetSky.D          W32/Bugbear.17916intd
W32.Beagle.A@mm       W32/Bagle.a@mm
W32.Beagle.J@mm       W32/Bagle.j@mm            ??
W32.Beagle.AO@mm      W32/Bagle.aq@mm
W32.Beagle.U@mm       W32/Bagle.u@mm            ??
W32.Klez.E@mm.enc     W32/Klez.e@MM
W32.Klez.F@mm         W32/Klez.f@MM
W32.Klez.I@mm         W32/Klez.i@MM

Information Retrieval

Info Retrieval: Use Case - 1
Nearest Match (Unsupervised)
[Diagram labels: Document Collection, New Document, IRS, Matching Document; scores 0.90, 0.82, 0.76, 0.30]

Info Retrieval: Use Case - 2
Partition Collection (Unsupervised)
[Diagram labels: Document Collection, IRS, Document Families]

Info Retrieval: Use Case - 3
Match Label (Supervised)
[Diagram labels: Document Families, New Document, IRS, Assign Label; score 0.90]

Step 1: Model ‘Documents’
Bag of features model
1. Define a method to identify “features”
   Example: k-consecutive words
2. Make a bag of features
Text: Have you wondered When is a rose a rose?
Bag:
   Have you wondered
   You wondered when
   Wondered when rose
   When rose rose
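
The bag on this slide can be reproduced with a few lines of Python. This is a sketch under one assumption that the slide implies but does not state: the words "is" and "a" are dropped as stop words before the 3-word features are formed. The function name and the stop-word set are hypothetical.

```python
def bag_of_features(text, k=3, stop_words=frozenset({"is", "a"})):
    """Return the k-consecutive-word features of text, in order of appearance."""
    # Lower-case each word and strip surrounding punctuation.
    words = [w.strip("?.,!;:").lower() for w in text.split()]
    # Assumed preprocessing: remove stop words before building features.
    words = [w for w in words if w and w not in stop_words]
    # Slide a window of k words over the remaining sequence.
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

bag = bag_of_features("Have you wondered When is a rose a rose?")
for feature in bag:
    print(feature)
```

A text with fewer than k remaining words yields an empty bag.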

Step 2: Define Similarity Function
A = {Coat, Forest, Girl, Grandma, House, Red, Wolf}
B = {Blow, House, Pigs, Red, Three, Wolf}
Similarity(A, B) = |A ∩ B| / |A ∪ B| = 3 / 10 = 0.3
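
The similarity on this slide is the ratio of shared features to all features (the Jaccard index). A minimal Python sketch follows; the sets A and B are read off the Boolean vectors on the Vector Space Model slide, and the function name is hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two feature bags."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty bags count as identical
    return len(a & b) / len(union)

A = {"Coat", "Forest", "Girl", "Grandma", "House", "Red", "Wolf"}
B = {"Blow", "House", "Pigs", "Red", "Three", "Wolf"}

print(jaccard(A, B))  # 3 shared words out of 10 distinct words
```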

Alternate: Vector Space Model
Vector Space: Ordered list of ALL of the words in ALL of the documents:
   Blow x Coat x Forest x Girl x Grandma x House x Pigs x Red x Three x Wolf
Vector: A Boolean vector representing presence/absence of a word
   A = [0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
   B = [1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Distance: Euclidean distance between two points
Benefits: Can use vector processors (Nvidia, Google TensorFlow)
Cons: Very, very large vectors
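
A short Python sketch of the vector space model on this slide: it builds the ordered vocabulary, encodes each document as a Boolean vector, and measures Euclidean distance. The helper names are hypothetical; the two word sets are the ones encoded by vectors A and B.

```python
import math

def to_vector(doc, vocabulary):
    """1 where the vocabulary word occurs in the document, else 0."""
    return [1 if word in doc else 0 for word in vocabulary]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

doc_a = {"Coat", "Forest", "Girl", "Grandma", "House", "Red", "Wolf"}
doc_b = {"Blow", "House", "Pigs", "Red", "Three", "Wolf"}

# The vector space is the sorted list of every word in every document.
vocabulary = sorted(doc_a | doc_b)
vec_a = to_vector(doc_a, vocabulary)
vec_b = to_vector(doc_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_b)
print(euclidean(vec_a, vec_b))  # square root of the number of differing positions
```

For Boolean vectors the squared distance is just the count of words present in exactly one of the two documents, seven in this example.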

Step 3: Choose/create algorithm
Supervised Learning
• Neural Networks
• Bayesian Statistics
• Inductive Learning
• Support Vector Machines
• Regression
Semi-supervised
• Use some labels to seed clusters
Unsupervised Learning
• K-Means Clustering
• Hierarchical Clustering
• K-Nearest Neighbor

Modeling Malware as Documents

Modeling Malware as Documents
Create a bag of features of binaries such that ‘similar’ programs have ‘similar’ bags
Similar programs:
• Related through code evolution
   New capability, bug fixes
• Code reuse, shared libraries, shared strategies
• Stealth – deliberate attempt to hide similarity

Malware Document: Byte N-gram
Word = N-Bytes
(380091df) (0091df96) (91df96f6) (df96f633)
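
The four words on this slide are the 4-byte windows over the byte sequence 38 00 91 df 96 f6 33, each window shifted by one byte. A minimal Python sketch of that sliding window (the function name is hypothetical):

```python
def byte_ngrams(data, n=4):
    """Return every run of n consecutive bytes in data, as hex strings."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]

sample = bytes.fromhex("380091df96f633")
for word in byte_ngrams(sample, 4):
    print(word)
```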

Malware Document: Abstracted Bytes
Disassemble
Zap Address bytes
Word = N-Bytes of Abstracted Bytecode

Malware Document: Mnemonics
Disassemble
Word = N-mnemonic
(je push) (push mov) (mov pop) (pop xor)
Variation: N-perm
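
A Python sketch of mnemonic N-grams, using the mnemonic sequence implied by the four words on this slide. The N-perm variation is sketched under the assumption that it means an N-gram whose internal order is ignored, so reordered instructions still yield the same word. The function names are hypothetical, and the mnemonic list is assumed to come from a disassembler that is not shown here.

```python
def mnemonic_ngrams(mnemonics, n=2):
    """Every run of n consecutive mnemonics, order preserved."""
    return [tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)]

def mnemonic_nperms(mnemonics, n=2):
    """Like N-grams, but each word is sorted so instruction order inside it is ignored."""
    return [tuple(sorted(gram)) for gram in mnemonic_ngrams(mnemonics, n)]

listing = ["je", "push", "mov", "pop", "xor"]
print(mnemonic_ngrams(listing))
print(mnemonic_nperms(listing))
```

With N-perms, the sequences push, mov and mov, push map to the same word, which makes the feature tolerant of simple instruction reordering.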

Malware Document: using semantics
[Diagram labels: Binary, Disassembly, CFG, Abstracted Bytecode, Abstracted Disassembly, Juice]
Word = Block Semantics

Code to Semantics
Code
• Sequential
• Focus on operations
   push ebp
   mov ebp,esp
   sub esp,4
   mov eax, DWORD ebp+4
   mov DWORD ebp+8,eax
   mov eax, DWORD ebp
   mov DWORD ebp-4,eax
Semantics
• Parallel
• Captures effect
   eax = def(ebp)
   ebp = -4+def(esp)
   esp = -8+def(esp)
   memdw(-8+def(esp)) = def(ebp)
   memdw(-4+def(esp)) = def(ebp)
   memdw(4+def(esp)) = def(memdw(def(esp)))

Concrete Semantics
Interpret: (State, Instruction) -> State
Instruction: add ax, bx
State before        State after
ax = 10             ax = 30
bx = 20             bx = 20
cx = 30             cx = 30
…                   …
M[4000] = 50045     M[4000] = 50045
M[4004] = 20        M[4004] = 20
M[4008] = 30        M[4008] = 30
…                   …

Symbolic Semantics
Sym Interpret: (SymState, Instruction) -> SymState
Instruction: add ax, bx
SymState before         SymState after
ax = def(ax)            ax = def(ax)+20
bx = 20                 bx = 20
cx = def(cx)            cx = def(cx)
…                       …
M[4000] = def(cx)       M[4000] = def(cx)
M[4004] = 5005          M[4004] = 5005
M[4008] = def(4008)     M[4008] = def(4008)
…                       …
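
The step from concrete to symbolic interpretation can be sketched in Python. This is an illustration under simplifying assumptions, not the interpreter used by the tool: it handles only register operands and the instructions mov and add, represents def(x) as the tuple ("def", "x"), and uses hypothetical function names. A register with no entry in the state is treated as holding its unknown previous value def(register).

```python
def value_of(state, operand):
    """Integers are immediates; names are looked up, defaulting to def(name)."""
    if isinstance(operand, int):
        return operand
    return state.get(operand, ("def", operand))

def sym_add(x, y):
    """Add two values; fold when both are numbers, otherwise build an expression."""
    if isinstance(x, int) and isinstance(y, int):
        return x + y
    return ("+", x, y)

def sym_interpret(instructions, state):
    """Apply (mnemonic, dst, src) instructions to a copy of state and return it."""
    state = dict(state)
    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            state[dst] = value_of(state, src)
        elif mnemonic == "add":
            state[dst] = sym_add(value_of(state, dst), value_of(state, src))
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return state

# Symbolic run: ax is unknown, bx is 20.
symbolic = sym_interpret([("add", "ax", "bx")], {"bx": 20})
print(symbolic["ax"])

# Concrete run: every value is a number, so the result is a number too.
concrete = sym_interpret([("add", "ax", "bx")], {"ax": 10, "bx": 20})
print(concrete["ax"])
```

When every value in the state is a number, the same interpreter reproduces the concrete example (ax becomes 30).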

Symbolic Semantics: Formal Sketch
Interpret : seq(Instruction) -> State -> State
where:
   State  = LValue -> RValue
   LValue = Register + Mem
   RValue = Number
          + def(RValue)          (previous state)
          + RValue op RValue     (unsimplified)
          + op RValue            (unsimplified)

Algebraic Simplification
Evaluate:    Num op Num => Num
             op Num => Num
Commute:     Expr + Num => Num + Expr
             Expr * Num => Num * Expr
Distribute:  Exp1 * (Exp2 + Exp3) => Exp1 * Exp2 + Exp1 * Exp3
Equivalent:  Exp1 shift-left Num => Exp1 * 2 ^ Num
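
A small Python sketch of these rewrite rules over expression trees. It is a simplified illustration, not the tool's simplifier: expressions are integers, symbol strings, or tuples (op, left, right); only the binary operators "+", "*", and "<<" (left shift) are handled; and the function name is hypothetical. A left shift by a constant is rewritten as multiplication by a power of two.

```python
def simplify(expr):
    """Rewrite an expression tree using evaluate, commute, distribute, and shift rules."""
    if not isinstance(expr, tuple):
        return expr  # a number or a symbol is already simple
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    # Equivalent: a left shift by a constant becomes a multiplication.
    if op == "<<" and isinstance(right, int):
        return simplify(("*", left, 2 ** right))
    # Evaluate: fold an operator applied to two numbers.
    if op in ("+", "*") and isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    # Commute: move a number on the right to the left.
    if op in ("+", "*") and isinstance(right, int):
        left, right = right, left
    # Distribute: multiply through a sum on the right.
    if op == "*" and isinstance(right, tuple) and right[0] == "+":
        return simplify(("+", ("*", left, right[1]), ("*", left, right[2])))
    return (op, left, right)

print(simplify(("+", "def(eax)", 20)))
print(simplify(("*", 2, ("+", "x", 3))))
print(simplify(("<<", "x", 3)))
```

The point of the rules is that differently written but equivalent expressions converge on one form, so `def(eax) + 20` and `20 + def(eax)` become the same tree.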

Semantic matches
Block 1:
   mov(ecx,ebp)
   sub(ecx,63)
   mov(dptr(ecx+59),eax)
   pop(ecx)
   lea(eax,wptr(ebp-28))
   push(edi)
   mov(edi,1148415812)
Block 2:
   push(esi)
   mov(esi,-1545600507)
   or(ecx,esi)
   pop(esi)
   push(edi)
   mov(edi,ebp)
   mov(ecx,edi)
   pop(edi)
   push(eax)
   mov(eax,63)
   sub(ecx,eax)
   pop(eax)
   mov(dptr(ecx+59),eax)
   pop(ecx)
   lea(eax,wptr(ebp-28))
   push(edi)
   mov(edi,880280128)
   push(esi)
   mov(esi,268135684)
   add(edi,esi)
   pop(esi)

Semantic matches
cmp(bptr(esi),al)
   push(edx)  mov(dl,al)  cmp(bptr(esi),dl)  pop(edx)
mov(ebx,251658400)
   mov(ebx,1684957510)  xor(ebx,1802398182)
mov(bptr(edi),al)
   push(ecx)  mov(cl,al)  mov(bptr(edi),cl)  pop(ecx)
mov(cl,0)
   mov(ecx,1342369920)  mov(cl,69)  sub(cl,69)
cmp(al,0)
   push(ebx)  mov(bh,0)  cmp(al,bh)  pop(ebx)

Semantics to Word
Block 1 semantics:
   memdw(-4+def(esp)) = def(ebp)
   ebp = -4+def(esp)
   memdw(-8+def(esp)) = def(ebp)
   eax = def(ebp)
   memdw(4+def(esp)) = def(eax) + 20
   esp = -8+def(esp)
Block 2 semantics:
   esp = -8+def(esp)
   eax = def(ebp)
   memdw(-4+def(esp)) = def(ebp)
   memdw(4+def(esp)) = 20 + def(eax)
   memdw(-8+def(esp)) = def(ebp)
   ebp = -4+def(esp)
SORT (same result for both blocks):
   eax = def(ebp)
   ebp = -4+def(esp)
   esp = -8+def(esp)
   memdw(-8+def(esp)) = def(ebp)
   memdw(-4+def(esp)) = def(ebp)
   memdw(4+def(esp)) = def(eax) + 20
HASH (same result for both blocks): 0da5678afdgfh732
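
The sort-then-hash step can be sketched in Python. Assumptions: each block's semantics is given as a list of assignment strings whose right-hand sides are already algebraically simplified; SHA-1 stands in for whatever hash the tool actually uses (the slide does not name one); and the function name is hypothetical.

```python
import hashlib

def block_word(assignments):
    """Canonicalize a block's semantics by sorting, then hash it into one word."""
    # Remove whitespace so that spacing differences do not change the word.
    normalized = sorted("".join(line.split()) for line in assignments)
    return hashlib.sha1("\n".join(normalized).encode("utf-8")).hexdigest()

block_1 = [
    "memdw(-4+def(esp)) = def(ebp)",
    "ebp = -4+def(esp)",
    "memdw(-8+def(esp)) = def(ebp)",
    "eax = def(ebp)",
    "memdw(4+def(esp)) = def(eax) + 20",
    "esp = -8+def(esp)",
]
# Same assignments in a different order, as a reordered variant would produce.
block_2 = [
    "esp = -8+def(esp)",
    "eax = def(ebp)",
    "memdw(-4+def(esp))= def(ebp)",
    "memdw(4+def(esp)) = def(eax) + 20",
    "memdw(-8+def(esp))= def(ebp)",
    "ebp = -4+def(esp)",
]

print(block_word(block_1))
print(block_word(block_2))
```

Because the assignments are sorted before hashing, two blocks with the same semantics map to the same word regardless of the order in which the effects were computed.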