Rendezvous: A search engine for binary code Wei Ming Khoo, Alan - PowerPoint PPT Presentation

Rendezvous: A search engine for binary code
Wei Ming Khoo, Alan Mycroft, Ross Anderson
University of Cambridge
MSR 2013, 19 May 2013
Demo: http://www.rendezvousalpha.com

To audit or not to audit
“You can’t trust code that you did not totally create yourself” (Ken Thompson, 1984)
• Engineering: software quality
  - ‘CVE top 20’, Bugtraq, App Store “Bouncers”
  - Diebold voting machines’ crypto (e.g. [Yasinsac’07])
• Legal: software compliance
  - EU data protection directive 95/46/EC (2012)
• Legal: third-party software licensing
  - GPL non-compliance cases include Apple (GNU Go in the App Store, 2010) and Microsoft (Windows 7 USB/DVD download tool, 2009)

Software reverse engineering
Software RE is sometimes necessary for audit
• Source code is not always available
  - Third-party sub-contractors, sub-sub-contractors, app store publishers
• “What you see [in the source] is not what you execute” [Balakrishnan, Reps 2005]
• Decompilers
  - Boomerang, REC Studio 4, Anatomizer, Andromeda, exetoc, desquirr
  - Current state of the art: Hex-Rays, US$1,160 per license per year, plus expertise
  - 415 man-hours to decompile 1,500 LoC comprising 8% of a code base [VanEmmerik’04]

But code reuse is prevalent
And increasingly so, due to advances in software mining and search-based software engineering (SBSE)
• Catalysts include market competitiveness, application complexity and the quality of reusable components [Schmidt’99, ’00, ’06]
• In six open source projects, on average 74% of the code base was external [Haefliger’08]
• Sometimes illegally: more than 250 products found GPL non-compliant, most famously the Linksys WRT54G

Proposed solution
Search-based reverse engineering (SBRE)
“Google” it: replace “How do we decompile?” with “Given a candidate decompilation, how good a match is it?”
The same shift occurred in statistical machine translation

Take-away slide
• Software RE is tedious but sometimes necessary for audit
• Code reuse is common in software
• We propose reframing software RE as a search problem, relying on existing software to obtain source code
• Q: How can we do this in a way that is compiler-agnostic?

How we achieve this
• Design trade-offs
• Feature extraction
• Indexing & querying
• Experimental results

Design space
• We want features that can uniquely identify functions
• We want both speed and accuracy; we chose speed first
• Prioritising speed meant choosing static over dynamic analysis (assumption: no obfuscation)
• We studied heuristic features from the existing literature that can be extracted directly from a disassembly:
  - Instruction mnemonics
  - Control-flow sub-graphs
  - Data constants

Feature extraction
[Figure: Executable → Disassemble → Disassembly → Tokenise → mnemonic n-grams, control-flow sub-graphs, data constants → token-specific processing → alphabetic strings (query terms)]
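The feature-extraction pipeline ends with every token turned into an alphabetic string that acts as a query term. The slides do not say how this is done. The sketch below is one possibility, assuming CRC-32 (from Python's `zlib`) as the hash and a base-52 encoding over a–z and A–Z, so that a text search engine sees each feature as a single word. The name `to_query_term` is hypothetical, and its output will not reproduce the example strings shown on the slides.

```python
import string
import zlib

# Assumed alphabet: 52 letters, so every term is one "word" to a text indexer.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase  # 'a'..'z' then 'A'..'Z'


def to_query_term(token: str) -> str:
    """Map a feature token (n-gram, canonical graph string, constant) to letters.

    Assumption: CRC-32 stands in for whatever hash Rendezvous actually uses.
    """
    value = zlib.crc32(token.encode("utf-8"))  # unsigned 32-bit int in Python 3
    if value == 0:
        return ALPHABET[0]
    letters = []
    while value:
        value, digit = divmod(value, len(ALPHABET))  # base-52 digits, least significant first
        letters.append(ALPHABET[digit])
    return "".join(reversed(letters))


term = to_query_term("push,mov,push")
print(term)
```

Because 52^6 exceeds 2^32, any 32-bit hash fits in at most six letters.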

Instruction mnemonics
• An instruction mnemonic (textual) differs from an opcode (hex): e.g. 0x8b (load) and 0x89 (store) both map to ‘mov’
• Assume a Markov property: the n-th token depends only on the previous n − 1 tokens
• Considered n = 1, 2, 3, 4
[Figure: example 3-gram push, mov, push → 0x73f973 → XvxFGF]
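A minimal sketch of the mnemonic n-gram feature, assuming a disassembly listing in a simple "address: mnemonic operands" text form (our assumption; real disassemblers differ). Dropping the operands is what collapses distinct opcodes such as 0x8b and 0x89 into one `mov` token. The function names and the listing are hypothetical.

```python
from typing import List, Tuple


def mnemonic_from_line(line: str) -> str:
    """Return the mnemonic from an 'address: mnemonic operands' line.

    Operands are discarded, so different opcodes such as 0x8b (load)
    and 0x89 (store) both yield the token 'mov'.
    """
    instruction = line.split(":", 1)[-1]  # drop the address prefix, if any
    return instruction.split()[0].lower()


def mnemonic_ngrams(mnemonics: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of mnemonics, in program order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)]


listing = [
    "401000: push ebp",
    "401001: mov ebp, esp",
    "401003: push ebx",
    "401004: mov eax, [ebp+8]",
]
mnems = [mnemonic_from_line(line) for line in listing]
features = {n: mnemonic_ngrams(mnems, n) for n in (1, 2, 3, 4)}
print(features[3])  # [('push', 'mov', 'push'), ('mov', 'push', 'mov')]
```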

Control-flow k-graphs
• A k-graph is a connected sub-graph comprising k nodes; we compute all of them (k = 3, 4, 5, 6, 7)
• Convert each k-graph to a k-by-k adjacency matrix, compute its canonical form (Nauty graph library) and represent it as a string
[Figure: example query terms XvxFGF, baNUAL]
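A brute-force sketch of k-graph extraction, assuming the CFG is a dict mapping each basic block to its set of successors, and that "connected" means weakly connected on the node-induced sub-graph (both our assumptions). It replaces nauty's canonical labelling with a search over all k! node orderings that keeps the lexicographically smallest adjacency string; this gives the same string exactly for isomorphic k-graphs but is only practical for small k such as 3 to 7. All function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Hashable, List, Optional, Set

# Assumed CFG representation: basic block -> set of successor blocks.
Graph = Dict[Hashable, Set[Hashable]]


def undirected_neighbours(cfg: Graph) -> Dict[Hashable, Set[Hashable]]:
    """Neighbours ignoring edge direction, used only to test connectivity."""
    nbrs: Dict[Hashable, Set[Hashable]] = {v: set() for v in cfg}
    for u, succs in cfg.items():
        for v in succs:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs


def connected_node_sets(cfg: Graph, k: int) -> Set[FrozenSet[Hashable]]:
    """All k-node sets whose induced sub-graph is weakly connected.

    Grows sets one neighbour at a time; every connected set can be built
    this way, and the set comprehension removes duplicates.
    """
    nbrs = undirected_neighbours(cfg)
    level = {frozenset([v]) for v in nbrs}
    for _ in range(k - 1):
        level = {s | {u} for s in level for v in s for u in nbrs[v] - s}
    return level


def canonical_form(cfg: Graph, nodes: FrozenSet[Hashable]) -> str:
    """Smallest row-major adjacency string over all node orderings."""
    best: Optional[str] = None
    for order in permutations(nodes):
        bits = "".join(
            "1" if dst in cfg.get(src, set()) else "0"
            for src in order  # row = source block
            for dst in order  # column = destination block
        )
        if best is None or bits < best:
            best = bits
    assert best is not None
    return best


def k_graph_terms(cfg: Graph, k: int) -> List[str]:
    """Canonical strings of all connected k-graphs (duplicates kept for counting)."""
    return sorted(canonical_form(cfg, s) for s in connected_node_sets(cfg, k))


# If-then-else diamond: A -> B, A -> C, B -> D, C -> D.
diamond = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
terms = k_graph_terms(diamond, 3)
print(len(terms), len(set(terms)))  # 4 3
```

The two paths A→B→D and A→C→D are isomorphic, so they share one canonical string; that repetition is what makes the string useful as a countable search term.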

Constants
• Empirical observation: data constants do not change with the compiler or compiler options
• Considered 32-bit integers and strings
• Sources: immediate operands and pointer offsets (excluding stack- and frame-pointer offsets)
• An integer may be an address, in which case we do a lookup
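A rough sketch of constant extraction, assuming Intel-syntax operand text and a precomputed map from addresses to strings (for example, read from the binary's read-only data); both inputs and all names here are hypothetical. It keeps immediates and memory displacements, skips memory operands addressed through a stack or frame pointer register, ignores scale factors such as `*4`, and reads every value as a 32-bit two's-complement integer. Excluding only bracketed stack/frame operands is our interpretation of the slide.

```python
import re
from typing import Dict, List

MEMORY_OPERAND = re.compile(r"\[([^\]]*)\]")
STACK_OR_FRAME_REG = re.compile(r"\b[er]?[sb]p\b", re.IGNORECASE)  # sp/bp/esp/ebp/rsp/rbp
# A number not glued to a register name (r8) or a scale factor (*4), with optional sign.
NUMBER = re.compile(r"(-)?(?<![\w*])(0x[0-9a-fA-F]+|\d+)\b")


def _numbers(text: str) -> List[int]:
    values = []
    for match in NUMBER.finditer(text):
        sign, digits = match.groups()
        value = int(digits, 16) if digits.lower().startswith("0x") else int(digits)
        values.append(-value if sign else value)
    return values


def constants_from_operands(operands: str) -> List[int]:
    """32-bit constants from immediates and non-stack memory displacements."""
    found: List[int] = []
    for address_expr in MEMORY_OPERAND.findall(operands):
        if not STACK_OR_FRAME_REG.search(address_expr):  # skip [esp+..], [ebp-..]
            found.extend(_numbers(address_expr))
    immediates = MEMORY_OPERAND.sub(" ", operands)  # text outside [...] brackets
    found.extend(_numbers(immediates))
    return [value & 0xFFFFFFFF for value in found]  # two's-complement 32-bit view


def constant_terms(operands: str, strings_at: Dict[int, str]) -> List[str]:
    """Use the string a constant points to when it is a known address."""
    return [strings_at.get(v, hex(v)) for v in constants_from_operands(operands)]


strings_at = {0x404000: "usage: %s FILE"}  # hypothetical read-only data
print(constant_terms("dword [ebx+0x1c], 0x73f973", strings_at))  # ['0x1c', '0x73f973']
print(constant_terms("eax, [ebp-0x8]", strings_at))              # []
print(constant_terms("0x404000", strings_at))                    # ['usage: %s FILE']
```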

Indexing & querying
[Figure: the corpus and the query are each converted to alphabetic strings]
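Once corpus functions and the query are both reduced to alphabetic strings, searching becomes ordinary text retrieval. One plausible way to do it, assuming a plain inverted index and a simple tf-idf dot-product score (our choice of scoring, not necessarily what Rendezvous uses): each corpus function is a document and its alphabetic strings are its words. The class name, function names and extra terms are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class TermIndex:
    """Toy inverted index: query term -> {function name: term frequency}."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)
        self.num_functions = 0

    def add(self, function_name: str, terms: List[str]) -> None:
        """Index one corpus function by its alphabetic feature strings."""
        self.num_functions += 1
        for term, count in Counter(terms).items():
            self.postings[term][function_name] = count

    def search(self, terms: List[str], top: int = 5) -> List[Tuple[str, float]]:
        """Rank corpus functions by a simple tf-idf dot product with the query."""
        scores: Dict[str, float] = defaultdict(float)
        for term, query_count in Counter(terms).items():
            matches = self.postings.get(term)  # .get does not insert into the defaultdict
            if not matches:
                continue  # term never seen in the corpus
            idf = math.log(1 + self.num_functions / len(matches))
            for name, tf in matches.items():
                scores[name] += (query_count * idf) * (tf * idf)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


index = TermIndex()
index.add("func_a", ["XvxFGF", "baNUAL", "XvxFGF"])  # hypothetical corpus functions
index.add("func_b", ["XvxFGF", "QmPrsT"])
print([name for name, _ in index.search(["XvxFGF", "baNUAL"])])  # ['func_a', 'func_b']
```

Rare terms get a larger idf, so a shared rare feature (here `baNUAL`) pushes a candidate up the ranking more than a common one.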

Results at a glance
Combining features increases F2, suggesting that the features are largely independent
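F2 is the F-measure with β = 2, F_β = (1 + β²)·P·R / (β²·P + R), which gives F2 = 5PR / (4P + R) and weights recall more heavily than precision. The helper below (its name is ours) simply evaluates that formula, which makes it easy to see how a given precision/recall pair translates into an F2 score.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Return the F-beta score; beta > 1 favours recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# High recall with modest precision still scores well under F2, but not vice versa.
print(round(f_beta(0.5, 1.0), 4))  # 0.8333
print(round(f_beta(1.0, 0.5), 4))  # 0.5556
```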

Conclusion
• Software RE is tedious (but sometimes necessary) for audit
• Code reuse is common in software
• We propose reframing software RE as a search problem
• Able to achieve F2 scores of 0.867 and 0.830 by combining mnemonics, k-graphs and constants
http://www.rendezvousalpha.com
