
SAFE: Self Attentive Function Embedding for binary similarity



  1. Outline: Introduction, State of the art, Solution overview, Evaluation. SAFE: Self Attentive Function Embedding for binary similarity. 16th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2019). Luca Massarelli (1), Giuseppe Antonio Di Luna (2), Fabio Petroni (3), Roberto Baldoni (1), Leonardo Querzoni (1). (1) University of Rome "La Sapienza"; (2) CINI, National Laboratory of Cyber Security; (3) Facebook AI Research. Gothenburg, June 20, 2019. Massarelli, Di Luna, Petroni, Querzoni, Baldoni, SAFE: Self Attentive Function Embedding, 1 / 25

  2. A world of interconnected devices. Intelligent devices enable new "smart" production processes, and more and more organizations rely on them every day. Organizations do not develop their own devices; they mostly rely on commercial ones.

  3. The dark side. COTS (Commercial Off-the-Shelf) devices are provided as black boxes, with no access to their firmware's source code. While these devices improve production processes, organizations have to trust the device manufacturers to ensure the absence of vulnerabilities or backdoors.

  4. "Trust is good, control is even better." Even for a COTS device it is still possible to analyze its binary firmware, but this process is time-consuming and requires skilled personnel. There is a strong need for new tools that enable more efficient analysis of binary code. Natural Language Processing (NLP) techniques have proved to be powerful when applied to binary code.

  5. Embeddings. A common approach in NLP is to associate an entity (e.g. a word, a sentence, a whole text) with an embedding vector, i.e. a fixed-size vector of real numbers that carries information about the entity it represents. Once a relation between entities is defined, we can build a model that represents entities with embeddings that preserve the chosen relation. For example, two related words receive nearby vectors: $\text{``binary''} \rightarrow (0.846, 0.332, 0.954, \dots)$ and $\text{``binaries''} \rightarrow (0.844, 0.334, 0.984, \dots)$.
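To make "nearby vectors" concrete, here is a minimal Python sketch that compares the two example embeddings with cosine similarity. It uses only the three components visible on the slide (the real vectors are longer, so the exact value is illustrative only), and the `unrelated` vector is a hypothetical contrast case, not taken from any trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length real vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Only the first three components shown on the slide; the real embeddings are longer.
binary = [0.846, 0.332, 0.954]
binaries = [0.844, 0.334, 0.984]
# Hypothetical vector for an unrelated word, used only for contrast.
unrelated = [-0.700, 0.650, -0.120]

sim_close = cosine_similarity(binary, binaries)
sim_far = cosine_similarity(binary, unrelated)
print(f"binary vs binaries:  {sim_close:.4f}")
print(f"binary vs unrelated: {sim_far:.4f}")
```

Cosine similarity ignores vector length and only measures direction, so a value close to 1 means the two entities are represented almost identically.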


  7. Similarity definition. Definition (similar functions): two binary functions are considered similar if they have been compiled from the same source code, possibly using different compilers, different optimizations and/or for different platforms. How can we compute similarity-preserving embeddings for a binary function?
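The definition translates directly into a ground-truth rule for building labeled pairs of functions. The sketch below assumes a hypothetical record type `BinaryFunction`, whose `source_id` names the source-level function each binary function was compiled from. The function names and the +1/-1 labels are illustrative choices, not part of the original slides.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class BinaryFunction:
    source_id: str  # hypothetical: the source function this binary was compiled from
    compiler: str
    opt_level: str
    arch: str

def is_similar(f, g):
    # Per the definition: similar iff compiled from the same source code,
    # regardless of compiler, optimization level, or platform.
    return f.source_id == g.source_id

def labeled_pairs(functions):
    """Yield (f, g, label) with label +1 for similar pairs and -1 otherwise."""
    for f, g in combinations(functions, 2):
        yield f, g, (1 if is_similar(f, g) else -1)

# Hypothetical corpus: one source function compiled twice, plus a different one.
corpus = [
    BinaryFunction("parse_header", "gcc", "O0", "x86_64"),
    BinaryFunction("parse_header", "clang", "O3", "arm"),
    BinaryFunction("compute_crc", "gcc", "O2", "x86_64"),
]
for f, g, label in labeled_pairs(corpus):
    print(f"{label:+d} {f.source_id}/{f.compiler}/{f.arch} vs {g.source_id}/{g.compiler}/{g.arch}")
```

The first two records count as similar even though the compiler, optimization level and architecture all differ, which is exactly what makes the problem hard at the binary level.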

  8. Related work. [Chart: approaches arranged by No-Embeddings vs. Embeddings and by Cross platform; shown: Bindiff [STICC-05], Genius [CCS-16], Strand [PLDI-16].]

  9. Genius. GENIUS, by Feng et al. [1], showed that a binary function can be represented with a similarity-preserving vector: given two similar functions, their embedding vectors should be similar in terms of cosine similarity. Computing the cosine similarity of two vectors is much faster than comparing two graphs. The binary similarity problem has thus been reduced to the computation of similarity-preserving function embeddings. [1] Q. Feng, et al. Scalable graph-based bug search for firmware images. In CCS, 2016.
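The speed argument is easiest to see in code. Once every function has an embedding, finding candidates similar to a query function reduces to ranking by cosine similarity, and scoring one candidate costs a single dot product. The sketch below is a brute-force linear scan over a hypothetical `EmbeddingIndex` class with made-up 3-dimensional vectors and function names. It is not GENIUS's own search procedure.

```python
import heapq
import math

def normalize(v):
    """Scale v to unit length so that cosine similarity becomes a dot product."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

class EmbeddingIndex:
    """Hypothetical brute-force cosine-similarity index over function embeddings."""

    def __init__(self):
        self._names = []
        self._vectors = []  # stored unit-length

    def add(self, name, embedding):
        self._names.append(name)
        self._vectors.append(normalize(embedding))

    def query(self, embedding, k=3):
        """Return the k (score, name) pairs with the highest cosine similarity."""
        q = normalize(embedding)
        scored = ((sum(a * b for a, b in zip(q, v)), name)
                  for name, v in zip(self._names, self._vectors))
        return heapq.nlargest(k, scored)

index = EmbeddingIndex()
index.add("fw_a:func_1", [0.9, 0.1, 0.3])
index.add("fw_a:func_2", [-0.2, 0.8, 0.1])
index.add("fw_b:func_7", [0.1, -0.5, 0.9])
for score, name in index.query([0.85, 0.15, 0.35], k=2):
    print(f"{score:.3f} {name}")
```

Normalizing at insertion time means each query only pays for one dot product per stored function, instead of a graph comparison per pair.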

  10. Related work. [Chart: approaches arranged by No-Embeddings vs. Embeddings and by Cross platform; shown: Bindiff [STICC-05], Genius [CCS-16], Gemini [CCS-17], Strand [PLDI-16].]

  11. Gemini. GEMINI, by Xu et al. [2], uses a graph embedding deep neural network (Structure2vec [3]) to produce an embedding vector of the annotated control flow graph (ACFG) of a function. [Figure: CFG → features extraction (basic-block annotation) → ACFG → Structure2vec embedding model with learned parameters → function embedding f = (3.12, ..., 5.31). In the example, the code Addr_1: mov eax,10; Addr_2: dec eax; Addr_3: mov [base+eax],0; Addr_4: jnz Addr_2; Addr_5: mov eax,ebx is split into basic blocks annotated with the feature vectors x1 = (1.3, ..., 3.1), x2 = (3.3, ..., 1.1), x3 = (5.1, ..., 1.2).] [2] Xiaojun Xu, et al. Neural network-based graph embedding for cross-platform binary code similarity detection. In CCS, 2017. [3] Hanjun Dai, et al. Discriminative embeddings of latent variable models for structured data. In ICML, 2016.
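The sketch below illustrates the general shape of a structure2vec-style embedding of an ACFG. Each basic block's state is repeatedly updated from its own features plus the summed states of its neighbors, and the final block states are summed and projected into a single function embedding. It is a simplified variant written for illustration, not GEMINI's actual model. The single-layer ReLU network, the undirected neighborhood, the 4-dimensional states, the 2-d block features and all weights (random and untrained) are assumptions. In GEMINI the parameters are learned (the figure's "Learned Parameters"); training is omitted here.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def neighbors(n_nodes, edges):
    """Undirected neighbor sets; self-loops are ignored (an assumption)."""
    nbrs = [set() for _ in range(n_nodes)]
    for a, b in edges:
        if a != b:
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs

def structure2vec_embedding(features, edges, dim=4, iterations=3, seed=0):
    rng = random.Random(seed)  # random, untrained weights, reproducible per seed
    n, feat_dim = len(features), len(features[0])
    W1 = rand_matrix(rng, dim, feat_dim)  # projects each block's own features
    P = rand_matrix(rng, dim, dim)        # one-layer network on the neighbor messages
    W2 = rand_matrix(rng, dim, dim)       # final projection of the summed block states
    nbrs = neighbors(n, edges)
    mu = [[0.0] * dim for _ in range(n)]  # block states start at zero
    for _ in range(iterations):
        new_mu = []
        for v in range(n):
            agg = [0.0] * dim
            for u in nbrs[v]:
                agg = vadd(agg, mu[u])  # sum of neighbors' states from the previous round
            pre = vadd(matvec(W1, features[v]), relu(matvec(P, agg)))
            new_mu.append([math.tanh(x) for x in pre])
        mu = new_mu
    total = [sum(col) for col in zip(*mu)]  # sum of all block states
    return matvec(W2, total)

# The slide's example: block 0 = Addr_1, block 1 = Addr_2..Addr_4, block 2 = Addr_5.
# 0->1 fall-through, 1->1 jnz back to Addr_2, 1->2 fall-through when the loop exits.
edges = [(0, 1), (1, 1), (1, 2)]
# Hypothetical block features: (instruction count, arithmetic instruction count).
features = [[1.0, 0.0], [3.0, 1.0], [1.0, 0.0]]

f = structure2vec_embedding(features, edges)
print([round(x, 3) for x in f])
```

Because the block states are combined by summation, relabeling the blocks does not change the resulting vector. This property is what makes the output usable as an embedding of the whole function.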

  12. Related work. [Chart: approaches arranged by No-Embeddings vs. Embeddings and by Cross platform vs. Single platform, with the additional labels Stripped Binaries and Solves a subproblem; shown: Bindiff [STICC-05], Genius [CCS-16], Gemini [CCS-17], Unsupervised feature learning [BAR-19], Strand [PLDI-16], InnerEye [NDSS-18], Asm2Vec [SP-19].]
