SAFE: Self Attentive Function Embedding for Binary Similarity (Luca Massarelli): PowerPoint Presentation

  1. SAFE: Self Attentive Function Embedding for Binary Similarity. Luca Massarelli

  2. Who am I? PhD Student @ Sapienza University of Rome. Exploring how to leverage Artificial Intelligence to improve security!

  3. Reverse engineering is painful … (Image credit: G. A. Di Luna)

  4. Binary Similarity Problem

  5. Applications • Vulnerability Detection • Library Function Identification • Malware Hunting

  6. Existing Commercial Solutions: IDA F.L.I.R.T., Diaphora

  7. Main Limitations • Not scalable (BinDiff, Diaphora) • Require an exact copy of the function (IDA F.L.I.R.T., YARA) • Analysts have to write rules (YARA)

  8. A few words about recompilation: easy to do! Effective!
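One way to read this slide: a signature that requires an exact copy of a function's bytes stops matching as soon as the same source is compiled differently. The sketch below assumes two plausible x86-64 encodings of a one-line C function; the byte strings are illustrative (not taken from the SAFE dataset), and SHA-256 simply stands in for whatever exact-match signature a tool might use.

```python
import hashlib

# Hypothetical x86-64 encodings of the same C function,
#   int add(int a, int b) { return a + b; }
# as two different builds might emit it.
build_a = bytes.fromhex("8d0437c3")    # lea eax, [rdi+rsi] ; ret
build_b = bytes.fromhex("89f801f0c3")  # mov eax, edi ; add eax, esi ; ret

def exact_signature(code: bytes) -> str:
    """Exact-match signature: a hash over the raw function bytes."""
    return hashlib.sha256(code).hexdigest()

# Same source, different bytes, so the exact signatures differ.
same = exact_signature(build_a) == exact_signature(build_b)
print("exact signatures match:", same)  # prints "exact signatures match: False"
```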

  9. How to create new efficient and effective solutions?

  10. EMBEDDINGS!! Representation of words, sentences or documents using vectors! • $\text{binary} = w_1 = [0.17, 0.19, \ldots, 0.21]$ • $\text{binaries} = w_2 = [0.16, 0.23, \ldots, 0.20]$ • $\mathrm{SIM}(\text{binary}, \text{binaries}) = \langle w_1, w_2 \rangle = 0.9$ • Idea borrowed from Natural Language Processing
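The slide scores similarity as an inner product of embeddings. That inner product equals cosine similarity when the vectors have unit length, which is the assumption made in this small sketch. The three-component vectors are truncated stand-ins for the slide's elided vectors, so the result will not reproduce the slide's 0.9.

```python
import math

def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Scale u to unit L2 norm."""
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def sim(u, v):
    """Similarity as the inner product of L2-normalized vectors (cosine similarity)."""
    return dot(normalize(u), normalize(v))

# Hypothetical 3-d stand-ins for the slide's (elided) embeddings.
w1 = [0.17, 0.19, 0.21]  # "binary"
w2 = [0.16, 0.23, 0.20]  # "binaries"
print(round(sim(w1, w2), 3))
```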

  11. Word2Vec Model • The embedding of each word is computed with Word2Vec, an unsupervised algorithm that takes into account the context in which the word appears.
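Concretely, the skip-gram variant of Word2Vec learns from (center word, context word) pairs taken from a sliding window. The sketch below only builds those training pairs; the window size is a hypothetical parameter, and the actual training step (e.g. negative sampling) is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: every token within `window`
    positions of the center token counts as part of its context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the king rules the land".split()
for pair in skipgram_pairs(sentence, window=1):
    print(pair)
```

Words that keep appearing in similar contexts end up with similar embeddings, which is what makes the vector comparison on the previous slide meaningful.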

  12. Word2Vec Model • Relationships between words can be recovered from the embeddings: man : woman = king : ??? • $w2v(\text{king}) - w2v(\text{man}) + w2v(\text{woman}) \approx w2v(\text{queen})$
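The analogy is solved with vector arithmetic followed by a nearest-neighbour lookup, which plays the role of mapping a vector back to a word. In this sketch the 2-d embeddings are hand-made assumptions (one axis loosely "royalty", the other "gender"), not vectors from a trained model.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical 2-d embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
w2v = {
    "man":   [0.1, 1.0],
    "woman": [0.1, -1.0],
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.2],
}

def analogy(a, b, c):
    """Solve a : b = c : ? as w2v(c) - w2v(a) + w2v(b), then return the
    vocabulary word (excluding the inputs) closest to it by cosine."""
    target = [vc - va + vb for va, vb, vc in zip(w2v[a], w2v[b], w2v[c])]
    candidates = [w for w in w2v if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(w2v[w], target))

print(analogy("man", "woman", "king"))  # prints "queen"
```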

  13. Word2Vec Model for ASM • We can do the same with assembly code! push rbp : pop rbp = push rax : ??? → pop rax
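To apply Word2Vec to assembly, each whole instruction (mnemonic plus operands) can be treated as one "word", so that push rbp and pop rax become vocabulary entries. The normalization below (joining with underscores and replacing immediates larger than a hypothetical threshold of 5000 with IMM) is an illustrative assumption, not necessarily SAFE's exact preprocessing.

```python
def parse_int(op):
    """Return the integer value of an immediate operand, or None if it is not one."""
    try:
        if op.lower().lstrip("-").startswith("0x"):
            return int(op, 16)
        return int(op, 10)
    except ValueError:
        return None

def instruction_token(mnemonic, operands, imm_threshold=5000):
    """Turn one disassembled instruction into a single vocabulary token.
    Hypothetical rule: large immediates collapse to IMM so that every
    distinct constant does not become its own word."""
    parts = [mnemonic]
    for op in operands:
        value = parse_int(op)
        parts.append("IMM" if value is not None and abs(value) > imm_threshold else op)
    return "_".join(parts)

# Hypothetical disassembled function, one (mnemonic, operands) tuple per instruction.
function = [
    ("push", ["rbp"]),
    ("mov", ["rbp", "rsp"]),
    ("mov", ["eax", "0x67452301"]),
    ("pop", ["rbp"]),
    ("ret", []),
]
tokens = [instruction_token(m, ops) for m, ops in function]
print(tokens)  # ['push_rbp', 'mov_rbp_rsp', 'mov_eax_IMM', 'pop_rbp', 'ret']
```

The resulting token sequences can then be fed to Word2Vec exactly like sentences of words.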

  14. How do we aggregate instruction embeddings into function embeddings?

  15. Structured Self Attentive Model
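The slide gives no formulas, so this is a minimal sketch assuming the structured self-attention of Lin et al., "A Structured Self-attentive Sentence Embedding" (2017), which the slide title appears to reference. Given an n × u matrix H with one row per instruction, the attention matrix is A = softmax(Ws2 · tanh(Ws1 · Hᵀ)), normalized over the n instructions, and M = A · H is flattened into a fixed-size function embedding whatever the function's length. How H is produced (for instance by a recurrent network over the instruction embeddings) and all the numbers below are assumptions; in a real model Ws1 and Ws2 are learned.

```python
import math

def matmul(X, Y):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(H, Ws1, Ws2):
    """H: n x u, Ws1: d_a x u, Ws2: r x d_a. Returns (A, M) with A: r x n, M: r x u."""
    hidden = [[math.tanh(v) for v in row] for row in matmul(Ws1, transpose(H))]  # d_a x n
    scores = matmul(Ws2, hidden)            # r x n
    A = [softmax(row) for row in scores]    # each row is a distribution over instructions
    M = matmul(A, H)                        # r x u weighted sums of instruction states
    return A, M

# Hypothetical sizes: n=3 instructions, u=2, d_a=2, r=2 attention hops.
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Ws1 = [[1.0, -1.0], [0.5, 0.5]]
Ws2 = [[1.0, 0.0], [0.0, 1.0]]
A, M = self_attention(H, Ws1, Ws2)
function_embedding = [v for row in M for v in row]  # flatten r x u into r*u values
print([round(sum(row), 6) for row in A], len(function_embedding))
```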

  16. The Full Pipeline

  17. Creating the dataset • This is easy!!! • We compile 11 different projects with different compilers and optimization levels! • … and we disassemble everything!
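The slide does not spell out how training pairs are labelled. One natural scheme, assumed here, is that two binary functions are "similar" when they come from the same source function compiled with different compilers or optimization levels, and "dissimilar" when they come from different source functions. The compiler names, optimization levels and function names below are hypothetical.

```python
from itertools import combinations, product

# Hypothetical build matrix; the slide does not list the exact compilers or flags.
compilers = ["gcc", "clang"]
opt_levels = ["O0", "O1", "O2", "O3"]
configs = [f"{c}-{o}" for c, o in product(compilers, opt_levels)]

def similar_pairs(function_names, configs):
    """Same source function, two different build configurations."""
    return [((name, a), (name, b))
            for name in function_names
            for a, b in combinations(configs, 2)]

def dissimilar_pairs(function_names, config):
    """Different source functions built with the same configuration."""
    return [((a, config), (b, config)) for a, b in combinations(function_names, 2)]

names = ["md5_update", "qsort"]
pairs = similar_pairs(names, configs)
print(len(configs), len(pairs), len(dissimilar_pairs(names, configs[0])))
```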

  18. It works!! • AUC: SAFE 0.99, I2v_attention 0.96, Gemini (MFE) 0.95 • We tested SAFE on different tasks!
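Assuming AUC here is the area under the ROC curve of the similarity score used to separate similar from dissimilar function pairs, it can be computed through its rank interpretation: the probability that a random similar pair scores higher than a random dissimilar pair. A minimal quadratic-time sketch on made-up scores:

```python
def auc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative); ties count 1/2.
    labels: 1 for a similar pair, 0 for a dissimilar pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical similarity scores for four function pairs.
scores = [0.95, 0.80, 0.40, 0.85]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # prints 0.75
```

An AUC of 0.99 therefore means a similar pair outranks a dissimilar one in roughly 99% of comparisons.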

  19. Function Search Engine! • We tested SAFE as a function search engine! • Given a query function, we try to retrieve the most similar functions from a knowledge base!
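With one embedding per function, the search engine reduces to ranking the knowledge base by similarity to the query embedding. A minimal brute-force sketch, assuming cosine similarity and made-up 3-d embeddings with hypothetical function names (a large knowledge base would normally use an approximate nearest-neighbour index instead):

```python
import heapq
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def search(query, knowledge_base, k=2):
    """Return the k entries most similar to `query` as (similarity, name) pairs,
    sorted from most to least similar."""
    scored = ((cosine(query, emb), name) for name, emb in knowledge_base.items())
    return heapq.nlargest(k, scored)

# Hypothetical 3-d function embeddings.
kb = {
    "sha1_transform": [0.9, 0.1, 0.0],
    "md5_transform":  [0.7, 0.5, 0.1],
    "quicksort":      [0.0, 0.2, 0.9],
}
for sim, name in search([0.85, 0.2, 0.05], kb):
    print(f"{name}: {sim:.3f}")
```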

  20. Semantic Classification • We try to classify functions into 4 different semantic classes using embeddings! • Math • String • Encryption • Sorting
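The slide does not say which classifier sits on top of the embeddings. Purely for illustration, this sketch swaps in a nearest-centroid classifier over hand-made 2-d embeddings; the training vectors are hypothetical, and a real setup would use SAFE embeddings of labelled functions.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labelled):
    """labelled: dict class name -> list of embeddings. Returns one centroid per class."""
    return {cls: centroid(vecs) for cls, vecs in labelled.items()}

def classify(centroids, embedding):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda cls: math.dist(centroids[cls], embedding))

# Hypothetical 2-d embeddings for the four semantic classes.
training = {
    "Math":       [[1.0, 0.0], [0.9, 0.1]],
    "String":     [[0.0, 1.0], [0.1, 0.9]],
    "Encryption": [[-1.0, 0.0], [-0.9, -0.1]],
    "Sorting":    [[0.0, -1.0], [-0.1, -0.9]],
}
model = train(training)
print(classify(model, [-0.8, 0.05]))  # prints "Encryption"
```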

  21. Semantic Classification Visualization [Figure: function embeddings by semantic class: (S) Sorting, (E) Encryption, (SM) String Manipulation, (M) Math] • Embeddings are clustered in the space according to their semantics!

  22. Applications • Identification of an encryption function inside malware! • Identification of vulnerable functions inside firmware! • YARASAFE: using SAFE inside YARA

  23. TeslaCrypt Ransomware • We disassembled the sample with IDA and used our semantic classifier to analyze every function! • The classifier found seven functions with encryption semantics! • 6 of them were actually performing encryption!! • Sample: 3372c1edab46837f1e973164fa2d726c5c5e17bcb888828ccd7c4dfcc234a370 • Detected functions: 0x41e900, 0x420ec0, 0x4210a0, 0x4212c0, 0x421665, 0x421900, 0x4219c0

  24. Function detected at 0x41E900: SHA1 constant
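The slide does not say which SHA-1 constant was found. A common manual check for such a detection is to look for the standard SHA-1 initial hash values (FIPS 180-4) in the function body. A minimal byte-scan sketch; the code snippet is hypothetical and not taken from the TeslaCrypt sample.

```python
import struct

# SHA-1 initial hash values from FIPS 180-4. The first four are shared
# with MD5; H4 (0xC3D2E1F0) is specific to SHA-1.
SHA1_INIT = {
    0x67452301: "H0",
    0xEFCDAB89: "H1",
    0x98BADCFE: "H2",
    0x10325476: "H3",
    0xC3D2E1F0: "H4",
}

def find_sha1_constants(code: bytes):
    """Scan raw bytes for 32-bit little-endian SHA-1 initialization constants.
    Returns a list of (byte offset, constant name) pairs."""
    hits = []
    for off in range(len(code) - 3):
        (value,) = struct.unpack_from("<I", code, off)
        if value in SHA1_INIT:
            hits.append((off, SHA1_INIT[value]))
    return hits

# Hypothetical 32-bit x86 snippet: mov dword ptr [eax], 0x67452301 ; ret
snippet = bytes.fromhex("c700") + struct.pack("<I", 0x67452301) + bytes.fromhex("c3")
print(find_sha1_constants(snippet))  # prints [(2, 'H0')]
```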

  25. Possible improvement: detecting suspicious functionality inside firmware

  26. Spotting Vulnerabilities in COTS Software • We developed a tool, YARASAFE, to simplify this process!

  27. YARA-SAFE

  28. YARA-SAFE Rule

          import "safe"

          rule Heartbleed
          {
              condition:
                  safe.similarity("[0.094, …. , 0.0597]") > 0.97
          }
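The slide shows the rule but not how the safe module evaluates it. This sketch emulates the threshold check under loudly stated assumptions: the rule's string is parsed into an embedding, every function in the scanned file already has a SAFE embedding, similarity is cosine, and the condition holds if any function exceeds the threshold. `parse_embedding` and `rule_matches` are hypothetical helpers, and the 3-d vectors are toy values because the real embedding is elided on the slide.

```python
import math

def parse_embedding(text):
    """Parse an embedding literal such as "[0.1, 0.2, 0.3]" into a list of floats."""
    return [float(x) for x in text.strip().strip("[]").split(",")]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rule_matches(rule_embedding, function_embeddings, threshold=0.97):
    """Hypothetical semantics: true if any function in the scanned file is
    more similar than `threshold` to the embedding stored in the rule."""
    target = parse_embedding(rule_embedding)
    return any(cosine(target, f) > threshold for f in function_embeddings)

rule_vec = "[0.6, 0.8, 0.0]"                           # toy rule embedding
functions = [[0.0, 0.0, 1.0], [0.61, 0.79, 0.01]]      # toy per-function embeddings
print(rule_matches(rule_vec, functions))  # prints True
```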

  29. Rule Creation
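Presumably a rule is created by computing the SAFE embedding of a reference function (for example, a known vulnerable one) and pasting it into a rule like the Heartbleed example on slide 28. The helper below, `make_rule`, is hypothetical: it only formats a given embedding into that rule shape, and the four-decimal formatting and the default threshold of 0.97 are assumptions.

```python
def make_rule(name, embedding, threshold=0.97):
    """Render a YARA-SAFE rule in the shape of the Heartbleed example.
    `embedding` is assumed to come from SAFE applied to a reference function."""
    vector = "[" + ", ".join(f"{x:.4f}" for x in embedding) + "]"
    return (
        'import "safe"\n\n'
        f"rule {name}\n"
        "{\n"
        "    condition:\n"
        f'        safe.similarity("{vector}") > {threshold}\n'
        "}\n"
    )

# Toy embedding; a real one has many more components.
print(make_rule("Heartbleed_like", [0.1, 0.2, 0.3]))
```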

  30. DEMO!!

  31. Paper • GitHub
