RetDec: An Open-Source Machine-Code Decompiler



slide-1
SLIDE 1

RetDec: An Open-Source Machine-Code Decompiler

Jakub Křoustek, Peter Matula, Petr Zemek
Threat Labs

Botconf 2017 1 / 51

slide-3
SLIDE 3

> whoarewe

♂ Jakub Křoustek

  • founder of RetDec
  • Threat Labs lead @Avast (previously @AVG)
  • reverse engineer, malware hunter, security researcher
  • @JakubKroustek
  • jakub.kroustek@avast.com

♂ Peter Matula

  • main developer of the RetDec decompiler
  • senior developer @Avast (previously @AVG)
  • ♥ rock climbing and
  • peter.matula@avast.com

Botconf 2017 2 / 51

slide-4
SLIDE 4

Quiz Time

Botconf 2017 3 / 51

slide-8
SLIDE 8

Disassembling vs. Decompilation

Botconf 2017 7 / 51

slide-9
SLIDE 9

Decompilation? What is it?

Botconf 2017 8 / 51

slide-11
SLIDE 11

Decompilation? What good is it?

Binary analysis

  • reverse engineering
  • malware analysis
  • vulnerability detection
  • verification
  • binary comparison
  • . . .

Binary recompilation (yeah, like that’s ever gonna work)

  • porting
  • bug fixing
  • adding new features
  • original sources got lost
  • optimizations

Botconf 2017 9 / 51

slide-17
SLIDE 17

Ok, why aren’t we already using it?

  • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc.
  • It is damn hard
      • compilation is not lossless
          • high-level constructions
          • data types
          • names
          • comments, macros, . . .
      • compilers are optimizing
      • computer science goodies
          • undecidable problems
          • complex algorithms
          • exponential complexities
      • obfuscation, packing, anti-debugging

Botconf 2017 10 / 51

slide-24
SLIDE 24

Generic decompilation? Even harder

  • Many architectures
      • x86, ARM, MIPS, PowerPC, . . .
      • CISC vs. RISC
      • bit length, endianness, floating points
      • versions & extensions
  • Many ABIs
  • Many OFFs (object-file formats)
      • ELF, PE, Mach-O, . . .
  • Many programming languages
  • Many compilers & optimizations
  • Statically linked code
  • . . .

Botconf 2017 11 / 51

slide-28
SLIDE 28

Retargetable Decompiler (RetDec)

◎ Goal

  • generic decompilation of binary code

History

  • 2011–2013 (AVG + BUT FIT via TAČR TA01010667 grant)
  • 2013–2016 (AVG + BUT FIT students via diploma theses)
  • 2016–* (Avast + BUT FIT students)

People

  • 3-4 core developers
  • ≈ 20 BSc/MSc/PhD students

Lines of code

  • 419,451 code + 205,222 comments, etc. = 624,673 total

Botconf 2017 12 / 51

slide-31
SLIDE 31

RetDec? What does it do

Supports

  • architectures (32-bit): x86, ARM, PowerPC, MIPS
  • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw

  • compilers (we test with): gcc, Clang, MSVC

Does

  • compiler/packer detection
  • statically linked code detection
  • OS loader simulation
  • recursive traversal disassembling
  • high-level constructions/types reconstruction
  • pattern detection
  • . . .

Runs on (hopefully)

Botconf 2017 13 / 51

slide-33
SLIDE 33

Good news everyone!

 RetDec goes open-source under the MIT license

  • December 2017, shortly after the conference

Repositories

  • 11 core
  • 6 support
  • 8 third party

Contacts

  • https://retdec.com/
  • https://github.com/avast-tl
  • https://twitter.com/retdec
  • https://retdec.com/rss/
  • info@retdec.com

Botconf 2017 14 / 51

slide-34
SLIDE 34

How to get from EXE to C . . .

Botconf 2017 15 / 51

slide-35
SLIDE 35

. . . by using cool technologies

Botconf 2017 16 / 51

slide-36
SLIDE 36

We need to go deeper!

Botconf 2017 17 / 51

slide-37
SLIDE 37

Preprocessing

Botconf 2017 18 / 51

slide-52
SLIDE 52

Preprocessing repos

Fileformat

  • fileformat, loader, cpdetect, fileinfo, unpacker
  • ar-extractor, macho-extractor, . . .

PeLib

  • strengthened
  • new modules (rich header, delayed imports, security dir, . . . )

ELFIO

  • strengthened

PDBparser

  • will hopefully be replaced by LLVM parsers

Yaracpp

  • YARA C++ wrapper

Botconf 2017 21 / 51

slide-53
SLIDE 53

Core

Botconf 2017 22 / 51

slide-58
SLIDE 58

Core: LLVM

  • dozens of analysis & transform & utility passes
      • dead global elimination, constant propagation, inlining, reassociation, loop optimization, memory promotion, dead store elimination, . . .
  • clang -o hello hello.c -O3
      • 217 passes
      • -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -argpromotion -domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -domtree -basicaa -aa -lazy-value-info -jump-threading . . .

Botconf 2017 23 / 51

slide-61
SLIDE 61

Core: LLVM IR

  • LLVM IR = LLVM Intermediate Representation
  • kind of assembly language / three address code

    @global = global i32 0

    define i32 @fnc(i32 %arg) {
      %x = load i32, i32* @global
      %y = add i32 %x, %arg
      store i32 %y, i32* @global
      ret i32 %y
    }

  • SSA = Static Single Assignment
      • %y = add i32 %x, %arg
  • Load/Store architecture
      • %x = load i32, i32* @global
  • Functions, arguments, returns, data types
  • (Un)conditional branches, switches
  • Universal IR for efficient compiler transformations and analyses
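The SSA property mentioned above (every value is assigned exactly once) can be illustrated with a small renaming pass. This is only a sketch for straight-line code: the tuple format and the `to_ssa` helper are made up for this example and have nothing to do with LLVM's or RetDec's actual data structures, and branches (which would need phi nodes) are left out.

```python
def to_ssa(statements):
    """Rename variables in straight-line code so each name is assigned once.

    Each statement is a hypothetical tuple (target, op, left, right);
    operands are variable names or integer constants.
    """
    version = {}  # variable name -> latest version number

    def use(operand):
        # Constants pass through; variables are read at their current version.
        if isinstance(operand, int):
            return str(operand)
        return f"%{operand}{version.setdefault(operand, 0)}"

    ssa = []
    for target, op, left, right in statements:
        lhs, rhs = use(left), use(right)  # read operands before redefining
        version[target] = version.get(target, -1) + 1
        ssa.append(f"%{target}{version[target]} = {op} i32 {lhs}, {rhs}")
    return ssa


program = [("x", "add", "x", 1), ("x", "mul", "x", 2), ("y", "add", "x", "x")]
for line in to_ssa(program):
    print(line)
```

Each reassignment of `x` gets a fresh name, so later passes can tell exactly which definition a use refers to.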

Botconf 2017 24 / 51

slide-62
SLIDE 62

Core: decoder

Botconf 2017 25 / 51

slide-72
SLIDE 72

Core: Capstone2LlvmIR

  • Capstone insn → sequence of LLVM IR
  • Handcoded sequences
      • 32/64-bit x86 – 1 person ≈ 2-3 weeks
  • Architectures (core instruction sets):
      • ARM + Thumb extension – 32-bit
      • MIPS – 32/64-bit
      • PowerPC – 32/64-bit
      • x86 – 32/64-bit
      • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x
  • Decompilation & advanced insns

Botconf 2017 26 / 51

slide-73
SLIDE 73

Would you rather . . .

  • PMULHUW
      • Multiply Packed Unsigned Integers and Store High Result

    if (OperandSize == 64) {
        // PMULHUW instruction with 64-bit operands:
        Tmp0[0..31] = Dst[0..15] * Src[0..15];
        Tmp1[0..31] = Dst[16..31] * Src[16..31];
        Tmp2[0..31] = Dst[32..47] * Src[32..47];
        Tmp3[0..31] = Dst[48..63] * Src[48..63];
        Dst[0..15] = Tmp0[16..31];
        Dst[16..31] = Tmp1[16..31];
        Dst[32..47] = Tmp2[16..31];
        Dst[48..63] = Tmp3[16..31];
    } else {
        // PMULHUW instruction with 128-bit operands:
        // Even longer ...
    }

__asm_PMULHUW(mm1, mm2);
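To make the 64-bit case of the pseudocode concrete, here is a small emulation of it. The function name `pmulhuw64` is invented for this sketch; it only mirrors the lane-by-lane description on the slide and is not taken from RetDec.

```python
def pmulhuw64(dst, src):
    """Emulate PMULHUW on 64-bit operands: four unsigned 16-bit lanes."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        a = (dst >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        high = (a * b) >> 16  # upper 16 bits of the 32-bit product
        result |= high << shift
    return result


print(hex(pmulhuw64(0xFFFF_0002_8000_0001, 0xFFFF_8000_0002_FFFF)))
```

Even this compact form shows why a single opaque `__asm_PMULHUW(mm1, mm2)` call can be easier to read in decompiled output than the fully expanded semantics.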

Botconf 2017 25 / 51

slide-77
SLIDE 77

Core: Capstone2LlvmIR

  • Capstone insn → sequence of LLVM IR
  • Handcoded sequences
      • 32/64-bit x86 – 1 person ≈ 2-3 weeks
  • Architectures (core instruction sets):
      • ARM + Thumb extension – 32-bit
      • MIPS – 32/64-bit
      • PowerPC – 32/64-bit
      • x86 – 32/64-bit
      • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x
  • Decompilation & advanced insns
      • full semantics only for simple instructions
  • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
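The idea of handcoded instruction-to-IR sequences can be sketched roughly as follows. Everything here is an assumption made for illustration: the `translate` function, the modelling of registers as globals, and the omission of flags are not the Capstone2LlvmIR API, and the input is a plain (mnemonic, dst, src) triple rather than a real Capstone instruction object.

```python
import itertools


def translate(mnemonic, dst, src, counter):
    """Return LLVM-IR-like lines for one decoded two-operand instruction.

    Registers are modelled as globals that are loaded and stored;
    status flags are ignored in this sketch.
    """
    binary_ops = {"add", "sub", "xor", "and", "or"}
    if mnemonic == "mov":
        t = f"%t{next(counter)}"
        return [f"{t} = load i32, i32* @{src}", f"store i32 {t}, i32* @{dst}"]
    if mnemonic in binary_ops:
        a, b, r = (f"%t{next(counter)}" for _ in range(3))
        return [
            f"{a} = load i32, i32* @{dst}",
            f"{b} = load i32, i32* @{src}",
            f"{r} = {mnemonic} i32 {a}, {b}",
            f"store i32 {r}, i32* @{dst}",
        ]
    # No full semantics: fall back to an opaque pseudo-call,
    # in the spirit of __asm_PMULHUW(mm1, mm2).
    return [f"call void @__asm_{mnemonic}()"]


counter = itertools.count()
for line in translate("add", "eax", "ebx", counter):
    print(line)
```

Simple instructions expand into a short load/compute/store sequence, while anything without a handcoded entry stays an opaque call.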

Botconf 2017 26 / 51

slide-78
SLIDE 78

Core: low-level passes

Botconf 2017 27 / 51

slide-80
SLIDE 80

Core: assembly generation

Botconf 2017 28 / 51

slide-81
SLIDE 81

Core: high-level passes

Botconf 2017 29 / 51

slide-88
SLIDE 88

Core repos

RetDec

  • bin2llvmir library
  • bin2llvmirtool

Capstone2LlvmIR

  • Capstone instruction to LLVM IR translation

Capstone-dumper

  • what does Capstone know about any instruction

Fnc-patterns

  • statically linked function pattern creation and detection

Yaramod

  • YARA to AST parsing & C++ API to build new YARA rulesets

Ctypes

  • extraction and presentation of C function data types

Demangler

  • gcc/Clang, Microsoft Visual C++, and Borland C++

Botconf 2017 30 / 51

slide-89
SLIDE 89

Backend

Botconf 2017 31 / 51

slide-92
SLIDE 92

Backend: BIR is an AST

  • BIR = Backend IR
  • AST = Abstract syntax tree
  • while (x < 20){ x = x + (y * 2); }
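As a rough picture of what "BIR is an AST" means, the loop above can be built as a tree of nodes and printed back as C. The node classes (`Var`, `Const`, `BinOp`, `Assign`, `While`) are hypothetical names chosen for this sketch, not RetDec's actual BIR classes.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: Var
    value: object


@dataclass
class While:
    cond: object
    body: list


def emit_expr(node, nested=False):
    """Print an expression; nested binary operations get parentheses."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return str(node.value)
    text = f"{emit_expr(node.left, True)} {node.op} {emit_expr(node.right, True)}"
    return f"({text})" if nested else text


def emit_stmt(node):
    """Print a statement node as C-like text."""
    if isinstance(node, Assign):
        return f"{node.target.name} = {emit_expr(node.value)};"
    body = " ".join(emit_stmt(s) for s in node.body)
    return f"while ({emit_expr(node.cond)}) {{ {body} }}"


x, y = Var("x"), Var("y")
loop = While(
    BinOp("<", x, Const(20)),
    [Assign(x, BinOp("+", x, BinOp("*", y, Const(2))))],
)
print(emit_stmt(loop))  # while (x < 20) { x = x + (y * 2); }
```

Because the tree keeps loops and expressions as nested nodes, later backend steps can rewrite subtrees without re-parsing text.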

Botconf 2017 32 / 51

slide-93
SLIDE 93

Backend: code structuring

  • LLVM IR: only (un)conditional branches & switches
  • identify high-level control-flow patterns
  • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51

slide-106
SLIDE 106

Backend: optimizations

  • copy propagation
      • reducing the number of variables
  • arithmetic expression simplification
      • a + -1 - -4 ⇒ a + 3
  • negation optimization
      • if (!(a == b)) ⇒ if (a != b)
  • pointer arithmetic
      • *(a + 4) ⇒ a[4]
  • conversion of while (true){ ... if (cond) break; ... }
      • for (cond){ ... }
      • while (cond){ ... }
  • conversion of if/else-if/else chains to switch
  • . . .
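Two of the rewrites listed above (arithmetic simplification and negation optimization) can be sketched on a toy expression tree. The tuple encoding and the `simplify` function are assumptions made for this example; they cover integer expressions only and are not how RetDec's backend is implemented.

```python
NEGATED = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def simplify(expr):
    """Simplify a toy expression: an int, a variable name, or (op, operands...)."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    # Negation optimization: !(a == b)  =>  a != b (integer comparisons only).
    if op == "!" and isinstance(args[0], tuple) and args[0][0] in NEGATED:
        inner_op, left, right = args[0]
        return (NEGATED[inner_op], left, right)
    # a - c  =>  a + (-c), so that constants can be folded by the "+" rule.
    if op == "-" and len(args) == 2 and isinstance(args[1], int):
        return simplify(("+", args[0], -args[1]))
    if op == "+":
        left, right = args
        if isinstance(left, int) and isinstance(right, int):
            return left + right
        # (a + c1) + c2  =>  a + (c1 + c2)
        if (isinstance(right, int) and isinstance(left, tuple)
                and left[0] == "+" and isinstance(left[2], int)):
            return ("+", left[1], left[2] + right)
    return (op, *args)


print(simplify(("-", ("+", "a", -1), -4)))  # ('+', 'a', 3)
print(simplify(("!", ("==", "a", "b"))))    # ('!=', 'a', 'b')
```

The first call reproduces the slide's `a + -1 - -4 ⇒ a + 3`, the second its `!(a == b) ⇒ a != b`.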

Botconf 2017 34 / 51

slide-111
SLIDE 111

Backend: code generation

  • variable name assignment
      • induction variables: for (i = 0; i < 10; ++i)
      • function arguments: a1, a2, a3, . . .
      • general context names: return result;
      • stdlib context names: int len = strlen();
  • stdlib context literals
      • var_ffff7dc6 = socket(2, 3, 255) ⇒ sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
      • flock(sock_id, 7) ⇒ flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB)
  • output generation
      • C
      • CFG = Control-Flow Graph
      • Call Graph
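The "stdlib context literals" step can be pictured as a table lookup from known library functions to symbolic constant names. The tables and helper names below are hypothetical and deliberately tiny; the numeric values are the usual Linux ones for these constants, and a real tool would need per-platform tables.

```python
# Hypothetical per-argument tables for one known function (Linux values).
ENUM_ARGS = {
    "socket": [
        {1: "PF_UNIX", 2: "PF_INET"},
        {3: "SOCK_RAW"},
        {6: "IPPROTO_TCP", 17: "IPPROTO_UDP", 255: "IPPROTO_RAW"},
    ],
}
FLOCK_FLAGS = {"LOCK_SH": 1, "LOCK_EX": 2, "LOCK_NB": 4, "LOCK_UN": 8}


def render_call(func, args):
    """Replace integer arguments by symbolic names where a table knows them."""
    tables = ENUM_ARGS.get(func, [])
    rendered = []
    for i, value in enumerate(args):
        table = tables[i] if i < len(tables) else {}
        rendered.append(table.get(value, str(value)))
    return f"{func}({', '.join(rendered)})"


def flags_to_symbols(value, names):
    """Decompose a bit mask into an OR of named flags."""
    parts = [name for name, bit in names.items() if value & bit]
    leftover = value & ~sum(names.values())
    if leftover:
        parts.append(hex(leftover))  # keep unknown bits visible
    return " | ".join(parts) or str(value)


print(render_call("socket", [2, 3, 255]))
print(flags_to_symbols(7, FLOCK_FLAGS))
```

Enumerated arguments are replaced one-to-one, while bit masks such as the `flock` operation are split into their individual flags.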

Botconf 2017 35 / 51

slide-112
SLIDE 112

Backend repos

RetDec

  • llvmir2hll library
  • llvmir2hlltool

Botconf 2017 36 / 51

slide-117
SLIDE 117

How to use RetDec

Online decompilation service

https://retdec.com/decompilation/

REST API

https://retdec.com/api/

Build it yourself

CMake, gcc/Clang, Visual Studio 2015 Update 2
Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot
Recursively clone the main RetDec repository

    mkdir build && cd build
    cmake ..
    make && make install

Run it yourself

decompile.sh binary.exe

Get RetDec IDA plugin

Botconf 2017 37 / 51

slide-118
SLIDE 118

What is RetDec IDA plugin

Botconf 2017 38 / 51

slide-119
SLIDE 119

How does RetDec IDA plugin work

◎ Goals

  • look & feel native
  • same object names as IDA
  • interactive

Botconf 2017 39 / 51

slide-127
SLIDE 127

RetDec IDA plugin is interactive

Botconf 2017 40 / 51

slide-130
SLIDE 130

How was RetDec used so far

retdec.com launched on 2015-02-05
12,000 registered users
423,000 decompilations
    350,000 Web
    73,000 API
410 decompilations daily
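The daily figure is consistent with the totals, as a quick check shows. The end date used here (2017-12-06, around the time of Botconf 2017) is an assumption; the slides do not say when the numbers were collected.

```python
from datetime import date

launch = date(2015, 2, 5)          # from the slide
snapshot = date(2017, 12, 6)       # assumed date of the statistics
web, api = 350_000, 73_000         # from the slide

total = web + api
days = (snapshot - launch).days
daily = total / days

print(total, days, round(daily))
```

With these assumptions the average comes out at roughly 410 decompilations per day, matching the slide.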

Botconf 2017 41 / 51

slide-131
SLIDE 131

Real example #1: Vawtrak (x86)

Botconf 2017 42 / 51

slide-132
SLIDE 132

Real example #2: Vawtrak (x86)

Botconf 2017 43 / 51

slide-133
SLIDE 133

Real example #3: CryptoWall (x86)

Botconf 2017 44 / 51

slide-134
SLIDE 134

Real example #4: Psyb0t (mips) RetDec

Botconf 2017 45 / 51

slide-135
SLIDE 135

Real example #5: Psyb0t (mips) RetDec

Call graph nodes: system, sleep, flock, fork, fread, gettimeofday, fopen, srand, fclose, strcmp, exit, fileno, function_404810, xDec, function_404b1c, main, backup, Daemonize, RSeed, getip, ip2c, fetch, snprintf, strncmp, strncpy, strlen, parse

Botconf 2017 46 / 51

slide-138
SLIDE 138

Should you throw away your Hex-Rays?

NO!

  • IDA and Hex-Rays are great
      • output quality
      • interactive
      • seamlessly integrated
      • mature
      • many plugins
      • official support
  • IDA and Hex-Rays have flaws
      • not free
      • proprietary
      • big monolithic GUI app

Botconf 2017 47 / 51

slide-142
SLIDE 142

RetDec is handy because . . .

  • Obvious reasons
      • it is free
      • + MIPS architecture
      • MIT license
      • you can play with the sources
  • Not so obvious reasons
      • LLVM is awesome
      • different basic designs: interactive GUI vs. pipeline
      • LLVM is OP (don’t worry, it won’t be nerfed)

Botconf 2017 48 / 51

slide-144
SLIDE 144

RetDec is not only a decompiler

  • RetDec – the decompiler
  • RetDec IDA plugin – Hex-Rays impersonation
  • Fileformat – generic OFF parsing and analysis
  • Capstone2LlvmIR – binary to LLVM translation
  • Fnc-patterns – statically linked code detection in YARA (IDA F.L.I.R.T.)
  • Yaramod – hack YARA rules in C++
  • Yaracpp – YARA C++ wrapper
  • Ctypes – info on function types

Botconf 2017 49 / 51

slide-151
SLIDE 151

What’s next

  • Release the sources on GitHub shortly after the conference
  • Throw a release party
  • Solve some inevitable “hey guys, I’m unable to build your repo”
  • Write more technical documentation on how it all works
  • Present it somewhere (maybe LLVM dev meeting)
  • Continue improving the implementation
      • make it more portable: Bash ⇒ Python, . . .
      • 64-bit architecture support
      • replace some libraries & modules
  • We will see. . .

Botconf 2017 50 / 51

slide-152
SLIDE 152

That’s all folks

Thanks!

Contacts

  • https://retdec.com/
  • https://github.com/avast-tl
  • https://twitter.com/retdec
  • https://retdec.com/rss/
  • info@retdec.com

Botconf 2017 51 / 51