
Mining Malware Secrets: Paul Black and Arun Lakhotia (PowerPoint presentation)



  1. Mining Malware Secrets. Paul Black (Federation University); Arun Lakhotia (University of Louisiana at Lafayette)

  2. Introductions
     • Paul Black
       – Malware Analyst: 5 years
       – PhD Candidate, ICSL, Federation Uni
       – Masters thesis (2013): Decryption of Zeus Configuration File
     • Arun Lakhotia
       – Professor of Computer Science
       – CEO, Cythereal, Inc.
       – Malware Analysis Research: 12+ years
       – Participant in DARPA Cyber Genome
       – Other AFRL/ARO projects

  3. Secrets Hidden in Malware • Decryption Keys • C2 Server(s) • DGA keys • Malware Version

  4. Example: Citadel - decrypting secret
       0x11a23: push esi                    ; copy configuration buffer
                mov edx,56CH
                push edx
                push 403AA8H
                push eax
                call 1443AH                 ; Mem Copy
       0x11a35: mov ecx,dptr(4342FCH)       ; access decryption key
                add ecx,dptr(4346DCH)
                mov esi,edx
                sub ecx,eax
       0x11a45: mov dl,bptr(esp+eax*1)      ; decryption loop
                xor bptr(eax),dl
                inc eax
                dec esi
                jnz 11A45H
       0x11a4e: pop esi
                retn
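The decryption loop at 0x11a45 XORs the configuration buffer in place, one byte per iteration, with esi counting down from the buffer size. The following is a minimal Python sketch of the same transformation. It rests on two assumptions: each buffer byte is paired with the key byte at the same offset (the slide's `bptr(esp+eax*1)` operand is read as that key access), and the key is at least as long as the buffer. The name `xor_decrypt` and the sample data are hypothetical.

```python
def xor_decrypt(buf: bytes, key: bytes) -> bytes:
    """XOR each buffer byte with the key byte at the same offset.

    Sketch of the loop at 0x11a45, assuming one key byte per buffer byte
    and no key wrap-around (the loop counter equals the buffer size).
    """
    if len(key) < len(buf):
        raise ValueError("key must be at least as long as the buffer")
    return bytes(b ^ k for b, k in zip(buf, key))


# XOR is its own inverse, so the same routine encrypts and decrypts.
config = b"example config"      # stand-in plaintext, not real Citadel data
key = bytes(range(1, 64))       # stand-in key material
blob = xor_decrypt(config, key)
print(xor_decrypt(blob, key))   # b'example config'
```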

  5. Example: Citadel - parameters
     41C6EA  56                  push esi
     41C6EB  BA 6C 05 00 00      mov edx, 56Ch            ; config buffer size
     41C6F0  52                  push edx
     41C6F1  68 A8 3A 40 00      push 403AA8H             ; config buffer start
     41C6F6  50                  push eax
     41C6F7  E8 C5 C9 00 00      call 143AAH
     41C6FC  8B 0D FC 42 43 00   mov ecx, dptr(4342FCH)   ; pointer to XOR key
     41C702  03 0D DC 46 43 00   add ecx, dptr(4346DCH)
     41C708  8B F2               mov esi, edx
     41C70A  2B C8               sub ecx, eax
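Once the fragment has been located, the parameters sit at fixed byte offsets inside the instruction encodings. The sketch below reads them as little-endian 32-bit operands. The offsets are valid only for this exact instruction layout. The dictionary keys and function names are illustrative, not taken from any tool.

```python
import struct

# The Citadel fragment starting at 41C6EA, copied byte-for-byte from the listing.
CODE = bytes.fromhex(
    "56 BA 6C 05 00 00 52 68 A8 3A 40 00 50 E8 C5 C9 00 00"
    " 8B 0D FC 42 43 00 03 0D DC 46 43 00 8B F2 2B C8"
)


def read_dword(code: bytes, offset: int) -> int:
    """Little-endian 32-bit operand at a byte offset."""
    return struct.unpack_from("<I", code, offset)[0]


def extract_params(code: bytes, start: int = 0) -> dict:
    """Read the secrets from the fixed instruction layout (offsets assumed)."""
    return {
        "config_size": read_dword(code, start + 2),    # BA imm32: mov edx, imm
        "config_start": read_dword(code, start + 8),   # 68 imm32: push imm
        "key_ptr_a": read_dword(code, start + 20),     # 8B 0D disp32: mov ecx, [disp]
        "key_ptr_b": read_dword(code, start + 26),     # 03 0D disp32: add ecx, [disp]
    }


for name, value in extract_params(CODE).items():
    print(f"{name} = {value:#x}")
```

Any change in instruction choice or order moves these offsets. Fixed offsets work only as long as the compiler keeps the same layout.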

  6. Example: Citadel – YARA Rule
     41C6EA  56
     41C6EB  BA 6C 05 00 00       ; config buffer size
     41C6F0  52
     41C6F1  68 A8 3A 40 00       ; config buffer start
     41C6F6  50
     41C6F7  E8 C5 C9 00 00
     41C6FC  8B 0D FC 42 43 00    ; pointer to XOR key
     41C702  03 0D DC 46 43 00
     41C708  8B F2
     41C70A  2B C8
     YARA hex string (variable operands replaced by jumps):
       56 BA [4] 52 68 [4] 50 E8 [4] 8B [5] 03 [5] 8B F2 2B C8
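A YARA hex string like this one can be approximated with an ordinary bytes regular expression, where each `[n]` jump becomes "any n bytes". The sketch below is not YARA itself. It handles only plain hex bytes and fixed jumps, and it locates the Citadel fragment inside a synthetic memory image. The helper name `hex_string_to_regex` is hypothetical.

```python
import re


def hex_string_to_regex(hex_string: str):
    """Translate a YARA-style hex string into a compiled bytes regex.

    Only two-digit hex bytes and fixed jumps written [n] are handled;
    other YARA syntax (??, [n-m] ranges, alternatives) is not.
    """
    parts = []
    for token in hex_string.split():
        jump = re.fullmatch(r"\[(\d+)\]", token)
        if jump:
            parts.append(b".{%d}" % int(jump.group(1)))   # skip n arbitrary bytes
        else:
            parts.append(re.escape(bytes.fromhex(token)))  # literal byte
    # DOTALL so that '.' also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


RULE = "56 BA [4] 52 68 [4] 50 E8 [4] 8B [5] 03 [5] 8B F2 2B C8"
CODE = bytes.fromhex(
    "56 BA 6C 05 00 00 52 68 A8 3A 40 00 50 E8 C5 C9 00 00"
    " 8B 0D FC 42 43 00 03 0D DC 46 43 00 8B F2 2B C8"
)
dump = b"\x90" * 16 + CODE + b"\xcc" * 8   # fragment embedded in a fake memory image

match = hex_string_to_regex(RULE).search(dump)
print(match.start() if match else "no match")   # 16
```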

  7. Mining Malware for Fun & Profit
     • Tools
       – Debugger
       – Volatility
       – IDA
       – YARA
     • Automation Steps
       – Run malware
       – Dump memory
       – Run Volatility plugin
       – * Locate code segment
       – * Extract secrets

  8. YARA: Yet Another Recursive Analyzer • Developer: Victor Alvarez, VirusTotal • Like grep over binaries • Regular expressions over binaries • Does not use program or file structure • File type agnostic • Easy to use and effective

  9. YARA - Disadvantages
     • Rules require exact match of bytes
       – Strings, Data, Instructions
     • Easily broken by small changes
       – Malware is frequently updated
     • Rules fail silently
       – Requires manual verification
     • Silent failures
       – Because sample is old
       – Might be unknown variant
       – Updated version of known malware
       – May not be the expected family
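The silent-failure problem can be reproduced with the slide 6 signature, written here directly as a bytes regex. The version 1 bytes come from the listing. The version 2 bytes are hand-assembled for illustration from the second function on slide 17. They use the absolute addresses from the 8a2f row of the slide 26 table and a zeroed call displacement, so they are an assumption, not bytes from a real sample.

```python
import re

# Slide 6 signature; each [n] jump is written as .{n}. DOTALL lets '.' match 0x0A.
SIGNATURE = re.compile(
    rb"\x56\xBA.{4}\x52\x68.{4}\x50\xE8.{4}\x8B.{5}\x03.{5}\x8B\xF2\x2B\xC8",
    re.DOTALL,
)

# Version 1: bytes as listed on the slides (address 41C6EA onward).
VERSION_1 = bytes.fromhex(
    "56 BA 6C 05 00 00 52 68 A8 3A 40 00 50 E8 C5 C9 00 00"
    " 8B 0D FC 42 43 00 03 0D DC 46 43 00 8B F2 2B C8"
)

# Version 2: hand-assembled (hypothetical) from the slide 17 listing:
# mov esi,[420980h]; mov ecx,[4204D4h]; add ecx,esi; sub ecx,eax; mov esi,edx
VERSION_2 = bytes.fromhex(
    "56 BA 30 03 00 00 52 68 80 2D 40 00 50 E8 00 00 00 00"
    " 8B 35 80 09 42 00 8B 0D D4 04 42 00 03 CE 2B C8 8B F2"
)

for name, code in (("version 1", VERSION_1), ("version 2", VERSION_2)):
    hit = SIGNATURE.search(code)
    # A miss carries no reason: "not Citadel" and "reordered Citadel" look the same.
    print(name, "match" if hit else "no match")
```

Version 2 computes the same key pointer, but with reordered instructions. The fixed byte pattern misses it and gives no indication why.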

  10. Alternative: Use Semantics

  11. Recap: Requirement for Automation • Support the process – Step 1: Find relevant code in binary – Step 2: Extract parameters about secrets from code • Should be resistant to code changes – Ideally, one set of rules for versions AND variants of the same malware family

  12. Code vs Semantics
     Code (instruction dependent, order dependent):
       push ebp
       mov ebp,esp
       sub esp,4
       mov eax, DWORD [ebp+4]
       mov DWORD [ebp+8],eax
       mov eax, DWORD [ebp]
       mov DWORD [ebp-4],eax
     Semantics (instruction independent, order independent):
       eax = def(ebp)
       ebp = -4+def(esp)
       esp = -8+def(esp)
       memdw(-8+def(esp)) = def(ebp)
       memdw(-4+def(esp)) = def(ebp)
       memdw(4+def(esp)) = def(memdw(def(esp)))
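The semantics column can be derived mechanically by symbolic execution. The toy evaluator below makes three assumptions: the code is straight-line, stack slots are 32 bits, and addresses with different symbolic bases never alias. The tuple instruction format and the function names are hypothetical.

```python
def fmt(value):
    """Render a symbolic value (base, offset) in the slides' notation."""
    base, offset = value
    if base is None:
        return hex(offset)
    return base if offset == 0 else f"{offset}+{base}"


def evaluate(block):
    """Symbolically execute straight-line code; return the final state as strings.

    A value is (base, offset): base is a symbol such as 'def(esp)' (the value
    on entry) and offset an integer. Only the ops used below are supported.
    """
    regs, mem = {}, {}

    def reg(name):
        return regs.get(name, (f"def({name})", 0))

    def load(addr):
        return mem.get(addr, (f"def(memdw({fmt(addr)}))", 0))

    for op, *args in block:
        if op == "push":                      # push r
            value, (base, off) = reg(args[0]), reg("esp")
            regs["esp"] = (base, off - 4)
            mem[regs["esp"]] = value
        elif op == "mov":                     # mov r1, r2
            regs[args[0]] = reg(args[1])
        elif op == "sub":                     # sub r, imm
            base, off = reg(args[0])
            regs[args[0]] = (base, off - args[1])
        elif op == "load":                    # mov r, DWORD [r2+disp]
            dst, src, disp = args
            base, off = reg(src)
            regs[dst] = load((base, off + disp))
        elif op == "store":                   # mov DWORD [r+disp], r2
            dst, disp, src = args
            base, off = reg(dst)
            mem[(base, off + disp)] = reg(src)
        else:
            raise ValueError(f"unsupported op: {op}")

    state = {r: fmt(v) for r, v in regs.items()}
    state.update({f"memdw({fmt(a)})": fmt(v) for a, v in mem.items()})
    return state


BLOCK = [
    ("push", "ebp"),
    ("mov", "ebp", "esp"),
    ("sub", "esp", 4),
    ("load", "eax", "ebp", 4),
    ("store", "ebp", 8, "eax"),
    ("load", "eax", "ebp", 0),
    ("store", "ebp", -4, "eax"),
]

for location, value in sorted(evaluate(BLOCK).items()):
    print(f"{location} = {value}")
```

Swapping two independent instructions, such as `sub esp,4` and the first load, leaves the returned state unchanged. This is the order independence the slide refers to.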

  13. Semantics Neutralizes Polymorphism
     Original:
       mov(ecx,ebp)
       sub(ecx,63)
       mov(dptr(ecx+59),eax)
       pop(ecx)
       lea(eax,wptr(ebp-28))
       push(edi)
       mov(edi,1148415812)
     Polymorphic variant:
       push(esi) mov(esi,-1545600507) or(ecx,esi) pop(esi)
       push(edi) mov(edi,ebp) mov(ecx,edi) pop(edi)
       push(eax) mov(eax,63) sub(ecx,eax) pop(eax)
       mov(dptr(ecx+59),eax)
       pop(ecx)
       lea(eax,wptr(ebp-28))
       push(edi)
       mov(edi,880280128) push(esi) mov(esi,268135684) add(edi,esi) pop(esi)
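Equivalence of the two fragments can be checked cheaply by running both on the same random register values. This is randomized differential testing, a stand-in chosen for illustration rather than the semantic method of the slides, and it gives evidence, not proof. It assumes 32-bit registers, ignores flags and stack memory below esp, and covers only the register-only parts of the fragments. All names are hypothetical.

```python
import random

MASK = 0xFFFFFFFF
REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp")


def run(code, regs):
    """Concretely execute a tiny x86 subset (32-bit registers, no flags)."""
    regs = dict(regs)
    stack = {}

    def val(x):
        return regs[x] if isinstance(x, str) else x & MASK

    for op, *args in code:
        if op == "push":
            value = val(args[0])
            regs["esp"] = (regs["esp"] - 4) & MASK
            stack[regs["esp"]] = value
        elif op == "pop":
            regs[args[0]] = stack[regs["esp"]]
            regs["esp"] = (regs["esp"] + 4) & MASK
        elif op == "mov":
            regs[args[0]] = val(args[1])
        elif op == "add":
            regs[args[0]] = (regs[args[0]] + val(args[1])) & MASK
        elif op == "sub":
            regs[args[0]] = (regs[args[0]] - val(args[1])) & MASK
        elif op == "or":
            regs[args[0]] = regs[args[0]] | val(args[1])
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs


def same_effect(a, b, trials=1000, seed=0):
    """True if a and b leave identical registers on every random start state."""
    rng = random.Random(seed)
    for _ in range(trials):
        start = {r: rng.getrandbits(32) for r in REGS}
        if run(a, start) != run(b, start):
            return False
    return True


ORIGINAL_1 = [("mov", "ecx", "ebp"), ("sub", "ecx", 63)]
VARIANT_1 = [
    ("push", "esi"), ("mov", "esi", -1545600507), ("or", "ecx", "esi"), ("pop", "esi"),
    ("push", "edi"), ("mov", "edi", "ebp"), ("mov", "ecx", "edi"), ("pop", "edi"),
    ("push", "eax"), ("mov", "eax", 63), ("sub", "ecx", "eax"), ("pop", "eax"),
]
ORIGINAL_2 = [("push", "edi"), ("mov", "edi", 1148415812)]
VARIANT_2 = [
    ("push", "edi"), ("mov", "edi", 880280128), ("push", "esi"),
    ("mov", "esi", 268135684), ("add", "edi", "esi"), ("pop", "esi"),
]

print(same_effect(ORIGINAL_1, VARIANT_1), same_effect(ORIGINAL_2, VARIANT_2))
```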

  14. Sensitive to behavior addition
     Original:
       cmp(bptr(esi),al)
       mov(ebx,251658400)
       mov(bptr(edi),al)
       mov(cl,0)
       cmp(al,0)
     Obfuscated variant:
       push(edx) mov(dl,al) cmp(bptr(esi),dl) pop(edx)
       mov(ebx,1684957510) xor(ebx,1802398182)
       push(ecx) mov(cl,al) mov(bptr(edi),cl) pop(ecx)
       mov(ecx,1342369920) mov(cl,69) sub(cl,69)
       push(ebx) mov(bh,0) cmp(al,bh) pop(ebx)
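Two of these rewrites can be checked with plain integer arithmetic. The XOR pair folds back to the original constant. The `mov(cl,...)` rewrite differs from `mov(cl,0)` only because `mov(ecx,1342369920)` also overwrites the upper bytes of ecx. Reading that added instruction as the "behavior addition" the slide refers to is an interpretation. The helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def write_low_byte(reg32, value):
    """Model writing an 8-bit register such as cl: only the low byte of ecx changes."""
    return (reg32 & 0xFFFFFF00) | (value & 0xFF)


# mov(ebx,1684957510) xor(ebx,1802398182) folds to the original constant.
print(hex((1684957510 ^ 1802398182) & MASK), hex(251658400))  # 0xf0000a0 0xf0000a0


def original(ecx):
    return write_low_byte(ecx, 0)                   # mov(cl,0)


def variant(ecx):
    ecx = 1342369920                                # mov(ecx,1342369920): added behavior
    ecx = write_low_byte(ecx, 69)                   # mov(cl,69)
    return write_low_byte(ecx, (ecx & 0xFF) - 69)   # sub(cl,69)


for start in (0x12345678, 0x5002F0AB):
    print(hex(original(start)), hex(variant(start)))
```

The two versions of ecx agree only when its upper 24 bits already happen to be 0x5002F0. A semantic comparison therefore flags the added behavior.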

  15. Revisit Example: Citadel
     41C6EA  56                  push esi
     41C6EB  BA 6C 05 00 00      mov edx, 56Ch            ; config buffer size
     41C6F0  52                  push edx
     41C6F1  68 A8 3A 40 00      push 403AA8H             ; config buffer start
     41C6F6  50                  push eax
     41C6F7  E8 C5 C9 00 00      call Mem::copy
     41C6FC  8B 0D FC 42 43 00   mov ecx, dptr(4342FCH)   ; pointer to XOR key
     41C702  03 0D DC 46 43 00   add ecx, dptr(4346DCH)
     41C708  8B F2               mov esi, edx
     41C70A  2B C8               sub ecx, eax

  16. Example: Citadel – Semantics
     41C6EA  56                  push esi
     41C6EB  BA 6C 05 00 00      mov edx, 56Ch
     41C6F0  52                  push edx
     41C6F1  68 A8 3A 40 00      push 403AA8H
     41C6F6  50                  push eax
     41C6F7  E8 C5 C9 00 00      call Mem::copy
     41C6FC  8B 0D FC 42 43 00   mov ecx, dptr(4342FCH)
     41C702  03 0D DC 46 43 00   add ecx, dptr(4346DCH)
     41C708  8B F2               mov esi, edx
     41C70A  2B C8               sub ecx, eax
     Semantics:
       edx=0x56c
       esp=-16+def(esp)
       memdw(-16+def(esp))=def(eax)
       memdw(-12+def(esp))=0x403aa8
       memdw(-8+def(esp))=0x56c
       memdw(-4+def(esp))=def(esi)
       ecx=dptr(0x4342fc,def(ds))+dptr(0x4346dc,def(ds))-def(eax)
       esi=def(edx)

  17. Two versions of the same function
     Version 1:
       0x11a23: push esi
                mov edx,56CH
                push edx
                push 403AA8H
                push eax
                call 1443AH
       0x11a35: mov ecx,dptr(4342FCH)
                add ecx,dptr(4346DCH)
                mov esi,edx
                sub ecx,eax
       0x11a45: mov dl,bptr(esp+eax*1)
                xor bptr(eax),dl
                inc eax
                dec esi
                jnz 11A45H
       0x11a4e: pop esi
                retn
     Version 2:
       0x54fa:  push esi
                mov edx,330H
                push edx
                push 2D80H
                push eax
                call CB8CH
       0x550c:  mov esi,dptr(20980H)
                mov ecx,dptr(204d4H)
                add ecx,esi
                sub ecx,eax
                mov esi,edx
       0x551e:  mov dl,bptr(esp+eax*1)
                xor bptr(eax),dl
                inc eax
                dec esi
                jnz 551EH
       0x5527:  pop esi
                retn

  18. Semantics ‘similar’, not ‘same’
     Version 1                               Version 2
     edx=0x56c                               edx=0x330
     esp=-16+def(esp)                        esp=-16+def(esp)
     memdw(-16+def(esp))=def(eax)            memdw(-16+def(esp))=def(eax)
     memdw(-12+def(esp))=0x403aa8            memdw(-12+def(esp))=0x2d80
     memdw(-8+def(esp))=0x56c                memdw(-8+def(esp))=0x330
     memdw(-4+def(esp))=def(esi)             memdw(-4+def(esp))=def(esi)
     ecx=dptr(0x4342fc,def(ds))              ecx=dptr(0x20980,def(ds))
        +dptr(0x4346dc,def(ds))                 +dptr(0x204d4,def(ds))
        -def(eax)                               -def(eax)
     esi=def(edx)                            esi=def(edx)

  19. Matching code on ‘similar’ semantics
     • BinHunt
       – Strong Equivalence
       – Theorem Proving
       – Prove equivalence under register renaming
       – Accurate, glacially slow
       – Thrown off by different constants/addresses
     • BinJuice
       – Semantic ‘Similarity’
       – Generalize semantics: abstract registers and constants
       – Normalized string form
       – Match String
       – Fuzzy, but fast

  20. ‘Abstract’ semantics = Juice
     Both versions abstract to the same juice:
       edx=A
       esp=-16+def(esp)
       memdw(-16+def(esp))=def(eax)
       memdw(-12+def(esp))=B
       memdw(-8+def(esp))=C
       memdw(-4+def(esp))=def(esi)
       ecx=dptr(D,def(ds))+dptr(E,def(ds))-def(eax)
       esi=def(edx)
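The abstraction step can be sketched as a text rewrite over the semantics lines: every hex constant, in order of appearance, becomes a fresh placeholder. The sketch assumes the semantics lines arrive in a fixed canonical order and that a block has at most 26 constants. Keeping the bindings makes the concrete values recoverable later. The function name `juice` is hypothetical.

```python
import re
import string

HEX_CONST = re.compile(r"0x[0-9a-fA-F]+")


def juice(semantics):
    """Replace each hex constant, in order, with A, B, C, ...

    Returns (juice_lines, bindings), where bindings maps placeholder -> value.
    Assumes at most 26 constants per block.
    """
    names = iter(string.ascii_uppercase)
    bindings = {}

    def abstract(match):
        name = next(names)
        bindings[name] = int(match.group(0), 16)
        return name

    return [HEX_CONST.sub(abstract, line) for line in semantics], bindings


VERSION_1 = [
    "edx=0x56c", "esp=-16+def(esp)", "memdw(-16+def(esp))=def(eax)",
    "memdw(-12+def(esp))=0x403aa8", "memdw(-8+def(esp))=0x56c",
    "memdw(-4+def(esp))=def(esi)",
    "ecx=dptr(0x4342fc,def(ds))+dptr(0x4346dc,def(ds))-def(eax)", "esi=def(edx)",
]
VERSION_2 = [
    "edx=0x330", "esp=-16+def(esp)", "memdw(-16+def(esp))=def(eax)",
    "memdw(-12+def(esp))=0x2d80", "memdw(-8+def(esp))=0x330",
    "memdw(-4+def(esp))=def(esi)",
    "ecx=dptr(0x20980,def(ds))+dptr(0x204d4,def(ds))-def(eax)", "esi=def(edx)",
]

juice_1, bindings_1 = juice(VERSION_1)
juice_2, bindings_2 = juice(VERSION_2)
print(juice_1 == juice_2)   # True
print(bindings_1)
```

With this input order, the placeholders A to E line up with the slide.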

  21. Advantage of Juice • Determine ‘similarity’ using ‘string match’. • Change the nature of the problem • Data mining, instead of pairwise comparison
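Because equal juice is an equal string, similarity can be found by hashing rather than by comparing every pair of functions. The sketch below groups functions in one pass over the data. Using SHA-256 over newline-joined juice lines is an assumption of this sketch, and the function names and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def juice_key(juice_lines):
    """Fixed-size key for a function's juice; equal juice gives equal keys."""
    return hashlib.sha256("\n".join(juice_lines).encode("utf-8")).hexdigest()


def group_by_juice(functions):
    """Bucket functions with identical juice in one pass (no pairwise comparison)."""
    buckets = defaultdict(list)
    for name, juice_lines in functions.items():
        buckets[juice_key(juice_lines)].append(name)
    return sorted(sorted(names) for names in buckets.values())


CITADEL_JUICE = [
    "edx=A", "esp=-16+def(esp)", "memdw(-16+def(esp))=def(eax)",
    "memdw(-12+def(esp))=B", "memdw(-8+def(esp))=C", "memdw(-4+def(esp))=def(esi)",
    "ecx=dptr(D,def(ds))+dptr(E,def(ds))-def(eax)", "esi=def(edx)",
]
functions = {
    "citadel_v1": CITADEL_JUICE,
    "citadel_v2": list(CITADEL_JUICE),        # same juice, different constants
    "unrelated": ["eax=A", "ebx=def(eax)"],   # hypothetical other function
}
print(group_by_juice(functions))   # [['citadel_v1', 'citadel_v2'], ['unrelated']]
```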

  22. Mining Intelligence Using Semantics

  23. Mining Malware in the Large
     ❶ Unpack
     ❷ Use juice for features
     ❸ Create indexes
     ❹ Search
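Steps ❸ and ❹ can be pictured with a toy inverted index. Each function is indexed under the juice strings of its blocks, and a query function is scored against every function it shares a block with. The slides do not say how MAGIC indexes or scores juice. The class name and the use of Jaccard similarity are assumptions of this sketch, and the short feature strings stand in for real block juice.

```python
from collections import defaultdict


class JuiceIndex:
    """Toy inverted index from block-level juice strings to function ids."""

    def __init__(self):
        self.postings = defaultdict(set)   # juice string -> function ids
        self.sizes = {}                    # function id -> number of distinct blocks

    def add(self, func_id, block_juices):
        features = set(block_juices)
        self.sizes[func_id] = len(features)
        for feature in features:
            self.postings[feature].add(func_id)

    def search(self, block_juices, min_score=0.5):
        """Rank indexed functions by Jaccard similarity of their block sets."""
        query = set(block_juices)
        shared = defaultdict(int)
        for feature in query:
            for func_id in self.postings.get(feature, ()):
                shared[func_id] += 1
        hits = []
        for func_id, n in shared.items():
            score = n / (len(query) + self.sizes[func_id] - n)
            if score >= min_score:
                hits.append((func_id, score))
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


index = JuiceIndex()
index.add("f1", ["a", "b", "c"])   # "a", "b", ... stand in for block juice strings
index.add("f2", ["a", "b", "d"])
index.add("f3", ["x"])
print(index.search(["a", "b", "c"]))   # [('f1', 1.0), ('f2', 0.5)]
```

Only functions that share at least one block with the query are ever scored. This is what turns the search into a data-mining lookup rather than a pairwise scan.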

  24. Cythereal MAGIC: Malware Genomic Correlation
     [Diagram: VMs on a hypervisor in Google Cloud running the stages Unpack, Extract Juice, Index, Cluster, Classify, Search]

  25. Mining Malware Repository
     Step 1: “Search” for functions semantically similar to an example
     Step 2: Extract parameters from abstract state. For every function found:
       - get semantics of its blocks
       - select blocks of interest
       - fetch values of memory/register from abstract state
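Step 2 can be sketched as matching a block's concrete semantics against a template made from the juice, with each placeholder renamed to the parameter it holds. The template below is hypothetical. It borrows the `static_config` and `config_size` names from the parameter table on the next slide, while `xor_a`/`xor_b` are illustrative names for the two key-pointer operands. A missing line raises an error rather than failing silently.

```python
import re

# Hypothetical extraction template: juice lines with named placeholders.
TEMPLATE = [
    "edx={config_size}",
    "memdw(-12+def(esp))={static_config}",
    "ecx=dptr({xor_a},def(ds))+dptr({xor_b},def(ds))-def(eax)",
]


def compile_line(line):
    """Turn one template line into a regex; placeholders capture hex constants."""
    pieces = re.split(r"\{(\w+)\}", line)   # literal, name, literal, name, ...
    return re.compile("".join(
        f"(?P<{piece}>0x[0-9a-fA-F]+)" if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    ))


PATTERNS = [compile_line(line) for line in TEMPLATE]


def extract_parameters(semantics):
    """Read parameter values from a block's semantics; raise if a line is missing."""
    params = {}
    for template, pattern in zip(TEMPLATE, PATTERNS):
        match = next(filter(None, map(pattern.fullmatch, semantics)), None)
        if match is None:
            raise LookupError(f"no semantics line matches {template!r}")
        params.update({k: int(v, 16) for k, v in match.groupdict().items()})
    return params


VERSION_1 = [
    "edx=0x56c", "memdw(-12+def(esp))=0x403aa8", "memdw(-8+def(esp))=0x56c",
    "ecx=dptr(0x4342fc,def(ds))+dptr(0x4346dc,def(ds))-def(eax)", "esi=def(edx)",
]
VERSION_2 = [
    "edx=0x330", "memdw(-12+def(esp))=0x2d80", "memdw(-8+def(esp))=0x330",
    "ecx=dptr(0x20980,def(ds))+dptr(0x204d4,def(ds))-def(eax)", "esi=def(edx)",
]

for semantics in (VERSION_1, VERSION_2):
    print({k: hex(v) for k, v in extract_parameters(semantics).items()})
```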

  26. Example: Citadel parameters
     sha1[4]  static_config  config_size  xor_offset           version    count
     0c4d     0x401668       0x328        0x422a3c+0x422ee8    0x1020500  11
     56f9     0x402638       0x388        0x4237f4+0x423ca0    0x1020600  20
     ac52     0x4018c8       0x3b8        0x423adc+0x423f88    0x1020700  1
     836a     0x401578       0x360        0x41a2e4+0x41a790    0x2000700  5
     8a2f     0x402d80       0x330        0x4204d4+0x420980    0x2000700  5
     8a7f     0x402b98       0x34c        0x422adc+0x422f88    0x2000807  1
     70d1     0x401690       0x31c        0x422a64+0x422f10    0x2000809  1
     7084     0x4018b8       0x2e8        0x422d7c+0x423228    0x2010001  1

  27. Pros and Cons: Semantics
     Pros
     • Very accurate
     • Use for both steps
       – Find and Extract
     • No silent failure
     • Resilient
     Cons
     • Shift brittleness
       – Disassembly and CFG

  28. Takeaway
     • Malware analysis is not all about signatures
       – Embedded secrets in malware can unlock defenses
     • Bytecode-based tools are simple, but brittle
     • Theoretically rigorous tools are realistic
       – Scalable; a new way of looking at the problem

  29. Contacts
     • Arun Lakhotia, University of Louisiana at Lafayette: arun@Louisiana.edu, arun@cythereal.com
     • Paul Black, Federation University: p.black@federation.edu.au

  30. Extra Slides

  31. Other YARA-like methods
     • Bytecode based:
       – IDA FLIRT
     • Disassemble, and use instructions
       – Improvement
       – Still brittle

  32. Using Program Structure
     • Create graphs
       – Control flow graph, call graph
       – BinDiff, Malwise
     • Find relevant code [no known implementation]
       – Instructions
       – Graph structure
     • Extract parameters
       – Peek in code (use YARA)
       – Byte/instruction order
     • Better, but still brittle
