SAFE: Self Attentive Function Embedding for Binary Similarity

Luca Massarelli

Who am I? PhD Student @ Sapienza University of Rome. Exploring how to leverage Artificial Intelligence to improve security!

Reverse engineering is painful… (Image credit: G. A. Di Luna)

Binary Similarity Problem

Applications
• Vulnerability Detection
• Library Function Identification
• Malware Hunting

Existing Commercial Solutions: IDA F.L.I.R.T., Diaphora

Main Limitations
• Not scalable (BinDiff, Diaphora)
• Require an exact copy of the function (IDA F.L.I.R.T., YARA)
• Analysts have to write rules (YARA)

A few words about recompilation
• Easy to do!
• Effective!

How to create new efficient and effective solutions?

Idea borrowed from Natural Language Processing: EMBEDDINGS!! Representation of words, sentences or documents using vectors!

$\text{BINARY} = w_1 = [0.17, 0.19, \ldots, 0.21]$
$\text{BINARIES} = w_2 = [0.16, 0.23, \ldots, 0.20]$
$\mathrm{SIM}(\text{BINARY}, \text{BINARIES}) = \langle w_1, w_2 \rangle = 0.9$
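To make the similarity on this slide concrete, here is a minimal Python sketch of comparing two embeddings. The three-dimensional vectors are made-up toy values (the slide's vectors are elided), and `dot` and `cosine` are our own helper names. Cosine similarity is the inner product after normalizing both vectors to unit length, so for unit vectors the two measures coincide.

```python
import math


def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Inner product of u and v after scaling both to unit length."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical toy embeddings (not the values on the slide).
w1 = [0.6, 0.8, 0.0]  # "BINARY"
w2 = [0.8, 0.6, 0.0]  # "BINARIES"
w3 = [0.0, 0.0, 1.0]  # an unrelated word

print(round(cosine(w1, w2), 2))  # 0.96: related words point in similar directions
print(round(cosine(w1, w3), 2))  # 0.0: orthogonal, unrelated
```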

• The embedding of each word is computed with Word2Vec, an unsupervised algorithm that considers the context of the word.

• Word relationships can be retrieved from the embeddings: man : woman = king : ???
$\mathrm{w2v}(\text{king}) - \mathrm{w2v}(\text{man}) + \mathrm{w2v}(\text{woman}) \approx \mathrm{w2v}(\text{queen})$
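The analogy can be answered with plain vector arithmetic followed by a nearest-neighbor lookup, since trained vectors rarely land exactly on the answer. A sketch with a hypothetical five-word vocabulary whose vectors we chose by hand; a real Word2Vec model would learn them from text.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(vocab, a, b, c):
    """Solve a : b = c : ??? as the word closest to w2v(c) - w2v(a) + w2v(b)."""
    target = [vc - va + vb for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    # Exclude the query words themselves, as is customary for analogy tests.
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vocab[w], target))


# Hand-made toy vectors (hypothetical): dims ~ (person, female, royal).
vocab = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}
print(analogy(vocab, "man", "woman", "king"))  # queen
```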

Word2Vec Model for ASM
We can do the same with assembly code!
push rbp : pop rbp = push rax : ??? (answer: pop rax)
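To treat assembly as language, each instruction has to become a "word" and each function a "sentence". The sketch below assumes a simple tokenization (mnemonic and operands joined with underscores) and shows how skip-gram Word2Vec derives its (center, context) training pairs from a function body. SAFE's actual preprocessing may differ, for example in how it normalizes immediates and memory addresses; `tokenize` and `skipgram_pairs` are hypothetical names.

```python
def tokenize(instruction):
    """Turn 'mov rbp, rsp' into one token 'mov_rbp_rsp' (assumed scheme)."""
    return "_".join(instruction.replace(",", " ").split())


def skipgram_pairs(tokens, window=2):
    """Generate the (center, context) pairs a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # every neighbor within the window is a context word
                pairs.append((center, tokens[j]))
    return pairs


function_body = ["push rbp", "mov rbp, rsp", "pop rbp", "ret"]
tokens = [tokenize(ins) for ins in function_body]
print(tokens)  # ['push_rbp', 'mov_rbp_rsp', 'pop_rbp', 'ret']
print(skipgram_pairs(tokens, window=1)[:3])
```

Because `push rbp` and `pop rbp` keep appearing in similar contexts (function prologues and epilogues), a model trained on such pairs learns related vectors for them.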

How do we aggregate instruction embeddings into function embeddings?

Structured Self Attentive Model
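One way to read the self-attentive model: a recurrent network turns the instruction embeddings into hidden states $H$, and the structured self-attention of Lin et al. (2017) pools them as $A = \mathrm{softmax}(W_{s2} \tanh(W_{s1} H^\top))$ and $M = AH$, with the softmax taken over instructions. The pure-Python sketch below implements only that pooling step; the sizes and random weights are placeholders, and the RNN and any layers that follow are left out.

```python
import math
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pool(H, W_s1, W_s2):
    """Structured self-attention pooling.

    H:    n x h hidden states, one row per instruction
    W_s1: d_a x h weight matrix
    W_s2: r x d_a weight matrix (r = number of attention rows, or "hops")
    Returns A (r x n, each row sums to 1) and M = A H (r x h).
    """
    scores = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]  # d_a x n
    A = [softmax(row) for row in matmul(W_s2, scores)]  # r x n attention weights
    M = matmul(A, H)  # r x h: r weighted averages of the instruction states
    return A, M


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, h, d_a, r = 5, 4, 3, 2  # toy sizes: 5 instructions, 4-dim hidden states
H = random_matrix(n, h)
A, M = self_attentive_pool(H, random_matrix(d_a, h), random_matrix(r, d_a))
embedding = [v for row in M for v in row]  # flatten r x h into one function vector
print(len(embedding))  # 8
```

The output size depends only on $r$ and $h$, not on the number of instructions, which is what lets functions of any length map to fixed-size embeddings.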

The Full Pipeline

Creating the dataset
• This is easy!!!
• We compile 11 different projects with different compilers and optimization levels!
• … and we disassemble everything!
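The dataset is made of pairs of functions. A common labeling, which we assume here, is: two binaries compiled from the same source function with different compilers or optimization levels form a similar pair (+1), while functions from different sources form a dissimilar pair (-1). The record format, names, and `make_pairs` helper below are hypothetical.

```python
import itertools
import random


def make_pairs(records, negatives_per_positive=1, seed=0):
    """Build labeled function pairs.

    records: list of (source_id, build_id) tuples, where source_id names the
    source-level function and build_id the compiler/optimization used.
    Same source_id -> similar pair (+1); different source_id -> dissimilar (-1).
    """
    rng = random.Random(seed)
    by_source = {}
    for rec in records:
        by_source.setdefault(rec[0], []).append(rec)

    positives = [
        (a, b, +1)
        for variants in by_source.values()
        for a, b in itertools.combinations(variants, 2)
    ]

    negatives = []
    if len(by_source) >= 2:  # otherwise no dissimilar pair exists
        while len(negatives) < negatives_per_positive * len(positives):
            a, b = rng.sample(records, 2)
            if a[0] != b[0]:
                negatives.append((a, b, -1))
    return positives + negatives


records = [
    ("zlib:inflate", "gcc-O0"), ("zlib:inflate", "clang-O3"),
    ("openssl:sha1_update", "gcc-O2"), ("openssl:sha1_update", "clang-O0"),
]
for pair in make_pairs(records):
    print(pair)
```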

It works!!
• AUC:
  • SAFE: 0.99
  • I2v_attention: 0.96
  • Gemini (MFE): 0.95
• We tested SAFE on different tasks!
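The AUC figures summarize how well similarity scores separate similar pairs from dissimilar ones. Assuming the usual ROC AUC, it equals the probability that a randomly chosen similar pair scores higher than a randomly chosen dissimilar one (ties count one half). The quadratic sketch below computes it directly from that definition on made-up scores.

```python
def roc_auc(scores, labels):
    """ROC AUC via its rank interpretation; O(P*N), fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for 3 similar (+1) and 3 dissimilar (-1) pairs.
scores = [0.95, 0.80, 0.40, 0.85, 0.10, 0.30]
labels = [1, 1, 1, -1, -1, -1]
print(roc_auc(scores, labels))
```

Here 7 of the 9 positive/negative comparisons are won, so the AUC is 7/9 ≈ 0.78; an AUC of 0.99 means almost every similar pair outscores almost every dissimilar one.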

Function Search Engine!
• We tested SAFE as a function search engine!
• Given a query function, we try to retrieve the most similar functions from a knowledge base!
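As a search engine, the query function's embedding is compared against every embedding in the knowledge base and the top-k most similar functions are returned. A brute-force sketch with a hypothetical knowledge base of 3-D toy embeddings; at scale one would typically swap in an approximate nearest-neighbor index.

```python
import heapq
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def search(query, knowledge_base, k=3):
    """Return the k (name, score) entries most similar to query, best first."""
    scored = ((name, cosine(query, emb)) for name, emb in knowledge_base.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical knowledge base: function name -> embedding.
kb = {
    "sha1_update": [0.9, 0.1, 0.0],
    "md5_update": [0.8, 0.3, 0.1],
    "strcpy": [0.0, 0.9, 0.4],
    "qsort": [0.1, 0.2, 0.9],
}
for name, score in search([1.0, 0.2, 0.0], kb, k=2):
    print(name, round(score, 3))
```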

Semantic Classification
• We try to classify functions into 4 different semantic classes using their embeddings!
  • Math
  • String
  • Encryption
  • Sorting
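The slides do not say which classifier is trained on top of the embeddings, so purely for illustration we swap in a nearest-centroid classifier: average the embeddings of each class and assign a new function to the closest average. The 2-D training vectors are invented; only the four class names come from the slide.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fit(training):
    """training: {class_name: [embedding, ...]} -> {class_name: centroid}."""
    return {label: centroid(vecs) for label, vecs in training.items()}


def predict(centroids, embedding):
    """Return the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], embedding))


# Invented 2-D embeddings; only the class names come from the slide.
training = {
    "Math": [[1.0, 0.0], [0.9, 0.1]],
    "String": [[0.0, 1.0], [0.1, 0.9]],
    "Encryption": [[-1.0, 0.0], [-0.9, -0.1]],
    "Sorting": [[0.0, -1.0], [0.1, -0.9]],
}
model = fit(training)
print(predict(model, [-0.8, 0.05]))  # Encryption
```

A classifier this simple only works if, as the next slide shows, embeddings of the same class cluster together in the space.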

Semantic Classification Visualization
[Figure: function embeddings by class: (S) Sorting, (E) Encryption, (SM) String Manipulation, (M) Math]
Embeddings are clustered in the space according to their semantics!

Applications
• Identification of an encryption function inside malware!
• Identification of vulnerable functions inside firmware!
• YARASAFE: using SAFE inside YARA

TeslaCrypt Ransomware
• We disassembled the sample with IDA and used our semantic classifier to analyze every function!
• The classifier found seven functions with encryption semantics!
• 6 of them were actually performing encryption!!
Sample: 3372c1edab46837f1e973164fa2d726c5c5e17bcb888828ccd7c4dfcc234a370
Detected functions: 0x41e900, 0x420ec0, 0x4210a0, 0x4212c0, 0x421665, 0x421900, 0x4219c0

Function detected at 0x41E900: it contains a SHA1 constant
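A quick, independent way to corroborate a hit like this is to search the function's bytes for well-known SHA-1 constants. The sketch below is our own check, not part of SAFE: it looks for the SHA-1 initial hash values and round constants (FIPS 180-4) encoded as little-endian 32-bit integers, as they would appear as x86 immediates. Note that H0 to H3 are shared with MD5, so H4 or the round constants are the more telling matches. The sample bytes are hypothetical.

```python
import struct

# SHA-1 initial hash values and round constants (FIPS 180-4).
SHA1_CONSTANTS = {
    "H0": 0x67452301, "H1": 0xEFCDAB89, "H2": 0x98BADCFE,
    "H3": 0x10325476, "H4": 0xC3D2E1F0,
    "K0": 0x5A827999, "K1": 0x6ED9EBA1, "K2": 0x8F1BBCDC, "K3": 0xCA62C1D6,
}


def find_sha1_constants(code):
    """Return {name: [offsets]} for each constant found as a little-endian uint32."""
    hits = {}
    for name, value in SHA1_CONSTANTS.items():
        needle = struct.pack("<I", value)
        offsets = []
        pos = code.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = code.find(needle, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits


# Hypothetical x86-64 bytes:
#   mov dword [rdi], 0x67452301        -> c7 07 01 23 45 67
#   mov dword [rdi+0x10], 0xC3D2E1F0   -> c7 47 10 f0 e1 d2 c3
sample = bytes.fromhex("c70701234567" "c74710f0e1d2c3")
print(find_sha1_constants(sample))  # {'H0': [2], 'H4': [9]}
```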

Possible improvement: detecting suspicious functionality inside firmware

Spotting Vulnerabilities in COTS Software
• We developed a tool, YARASAFE, to simplify this process!

YARA-SAFE

YARA-SAFE Rule

    import "safe"

    rule Heartbleed
    {
        condition:
            safe.similarity("[0.094, …, 0.0597]") > 0.97
    }
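The module's internals are not shown here, so the following is only a sketch of what the condition plausibly evaluates, under two assumptions: `safe.similarity` computes the cosine similarity between the rule's embedding and a function embedded from the scanned file, and the rule matches if any function exceeds the threshold. `rule_matches` is a hypothetical name, and the 2-D vectors are toy values.

```python
import json
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rule_matches(rule_embedding, function_embeddings, threshold=0.97):
    """Assumed semantics of `safe.similarity("[...]") > threshold`:
    true if any function in the scanned file is similar enough to the rule's vector."""
    target = json.loads(rule_embedding)  # the rule stores the vector as a string
    return any(cosine(target, emb) > threshold for emb in function_embeddings)


# Toy embeddings of the functions found in a scanned file (made-up values).
file_functions = [[0.0, 1.0], [0.59, 0.81]]
print(rule_matches("[0.6, 0.8]", file_functions))  # True
print(rule_matches("[0.6, 0.8]", [[0.0, 1.0]]))    # False
```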

Rule Creation
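Creating a rule then amounts to embedding the target function (for example, a known vulnerable one) and writing its vector into the rule format shown on the previous slide. Below is a hypothetical helper that only renders the text; the embedding itself would come from the SAFE model and is not computed here, and the two-value vector is a placeholder for a full-length one.

```python
import re

# YARA identifiers: letters, digits and underscores, not starting with a digit.
YARA_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def make_rule(name, embedding, threshold=0.97):
    """Render a YARA-SAFE rule in the layout shown on the slide (hypothetical helper)."""
    if not YARA_IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid YARA rule name: {name!r}")
    vector = "[" + ", ".join(f"{x:.4g}" for x in embedding) + "]"
    return (
        'import "safe"\n'
        "\n"
        f"rule {name}\n"
        "{\n"
        "    condition:\n"
        f'        safe.similarity("{vector}") > {threshold}\n'
        "}\n"
    )


print(make_rule("Heartbleed", [0.094, 0.0597]))
```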

DEMO!!

Paper | GitHub

SAFE: Self Attentive Function Embedding for binary similarity 16th Conference on Detection of
