Overview Trace-based approach Experiments
Semantic Trace-based Malware Variants Detection
Khalid Alzarooni
CREST - DCS - UCL
April 6, 2011
Outline
• Overview
• Trace-based approach
• Experiments
Overview
Malware Variants
• The speed of malware evolution is partly driven by the automatic generation of program variants
• Malware uses semantic equivalence tables, e.g. polymorphic and metamorphic malware
• These alter the “local behaviour” of programs, but their larger-scale behaviour is unchanged
Malware Problem
Anoirel S. Issa, Symantec, UK (EICAR 2009):
“Poly or metamorphic engines have some essential components that help them build highly obfuscated code. A single engine is able to produce unique variants that can reach millions.”
Malware evolution: $M_0 \to M_1 \to M_2 \to M_3 \to \dots$
Syntactic view: $\mathit{code}_0 \not\approx \mathit{code}_1 \not\approx \mathit{code}_2 \not\approx \mathit{code}_3 \not\approx \dots$
Some Code Obfuscation Schemes [Beaucamps, 2007, Ször, 2005]
Label   Category             Obfuscation
gi      Garbage insertion    $\{\} \to \{C\}$
op      Opaque predicate     $\{\} \to \{P^{T/F}\}$
ec      Equivalent command   $\{op\} \to \{\overline{op}\}$
rr      Register renaming    $\{R_x\} \to \{R_y\}$
cs      Command split        $\{C\} \to \{C_x, C_y\}$
cm      Command merging      $\{C_x, C_y\} \to \{C_{xy}\}$
cr      Command reorder      $\{(C_x, C_y)\} \to \{(C_y, C_x)\}$
...     ...                  ...
Example: a program P and its semantically equivalent variant P′
P:
a   R0:=n
b   R1:=m
c   R2:=R1
d   R3:=R2+R0
e   R4:=R1+k
f   R5:=1
P′ (correspondence with P):
a → a′ R0:=n (cr1: reordered, reached via JMP)
b → R11:=m (rr1: R1 renamed to R11)
c, d → R3:=R11+R0 (cm: c merged into d)
e → e′1 R4:=k; e′2 R4:=R4+R11 (cs: e split in two)
f → R15:=1 (rr2: R5 renamed to R15)
inserted: R22:=R22+1 (gi1, gi2), $P^T$ JMP (op1), JMP (cr2)
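To make the equivalence concrete, here is a minimal sketch that runs P and P′ as straight-line code on one test input and compares the observed registers modulo the renaming. It assumes (the slide does not say this) that all JMPs, including the opaque predicate, are already followed, so P′ is listed in one plausible execution order; that n, m, k are concrete integers; and that R22 starts at 0. `run`, `agrees_on` and `RENAMING` are hypothetical helpers.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]
# A command "R := E" as (destination register, function computing E from current values)
Command = Tuple[str, Callable[[Env], int]]


def run(program: List[Command], inputs: Env) -> Env:
    """Execute assignments in order; inputs n, m, k are visible alongside the registers."""
    env = dict(inputs)
    for dest, expr in program:
        env[dest] = expr(env)
    return env


P: List[Command] = [
    ("R0", lambda e: e["n"]),             # a
    ("R1", lambda e: e["m"]),             # b
    ("R2", lambda e: e["R1"]),            # c
    ("R3", lambda e: e["R2"] + e["R0"]),  # d
    ("R4", lambda e: e["R1"] + e["k"]),   # e
    ("R5", lambda e: 1),                  # f
]

# Assumed execution order of P' once the jumps (op1, cr1, cr2) have been followed.
P_VARIANT: List[Command] = [
    ("R22", lambda e: e.get("R22", 0) + 1),  # gi1: garbage
    ("R0", lambda e: e["n"]),                # a'
    ("R11", lambda e: e["m"]),               # b with rr1 (R1 -> R11)
    ("R22", lambda e: e.get("R22", 0) + 1),  # gi2: garbage
    ("R3", lambda e: e["R11"] + e["R0"]),    # c and d merged (cm)
    ("R4", lambda e: e["k"]),                # e'1
    ("R4", lambda e: e["R4"] + e["R11"]),    # e'2 (e split)
    ("R15", lambda e: 1),                    # f with rr2 (R5 -> R15)
]

RENAMING = {"R1": "R11", "R5": "R15"}


def agrees_on(p_env: Env, q_env: Env, registers: List[str], renaming: Dict[str, str]) -> bool:
    """True if each observed register of P equals its (possibly renamed) counterpart in P'."""
    return all(p_env[r] == q_env[renaming.get(r, r)] for r in registers)


inputs = {"n": 3, "m": 4, "k": 5}
env_p, env_q = run(P, inputs), run(P_VARIANT, inputs)
print(agrees_on(env_p, env_q, ["R0", "R1", "R3", "R4", "R5"], RENAMING))  # True
```

R2 is left out of the comparison because command merging removes it from P′. Agreement on a single input says nothing about other inputs, which is why such a check is an approximation of semantic equivalence rather than a proof of it.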
Malware Problem
• Goal: detect variants of a known malware
• Given two arbitrary programs, is it possible to tell whether they are semantically equivalent, $P' \overset{?}{\approx} P$?
• In general this is undecidable: no algorithm can always produce a correct “yes” or “no” detection answer [Cohen, 1987]
Semantic trace-based
Program
↓
Program approximation
↓
Trace collection
↓
Semantic analysis
↓
Detection of semantic signatures
Trace-based approach
Semantic trace-based
• Design a detector that can tell when two programs are approximately equivalent, which might often be good enough
• Approximate semantic equivalence is decidable
• Approximate a program’s semantics $[\![P]\!]$ using abstract traces (program paths) of the CFG and test inputs
• Collect concrete and semantic traces
Malware evolution: $M_0 \to M_1 \to M_2 \to M_3 \to \dots$
Syntactic view: $\mathit{code}_0 \not\approx \mathit{code}_1 \not\approx \mathit{code}_2 \not\approx \mathit{code}_3 \not\approx \dots$
Semantic view: $[\![M_0]\!] \approx [\![M_1]\!] \approx [\![M_2]\!] \approx [\![M_3]\!] \approx \dots$
Semantic trace-based
• $M_1$ is a variant of $M_0$ if every trace of $[\![M_0]\!]$ is a subsequence of some trace of $[\![M_1]\!]$:
$\forall t \in [\![M_0]\!],\ \exists t' \in [\![M_1]\!] : t \prec t'$
[Figure: the states 1, 2, 3, 4 of a malware trace $t$ appear within a longer variant trace $t'$]
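The variant relation above can be checked directly when both trace sets are finite. This is a minimal sketch assuming traces are plain Python sequences of comparable states (in the detector they would be environment-memory states); `is_subsequence` and `is_variant` are hypothetical names for $t \prec t'$ and the $\forall\exists$ condition.

```python
from typing import Hashable, Iterable, Sequence


def is_subsequence(t: Sequence[Hashable], t_prime: Iterable[Hashable]) -> bool:
    """t ≺ t': every state of t occurs in t', in the same order, possibly with gaps."""
    remaining = iter(t_prime)
    # `any` consumes `remaining` up to and including each match, which enforces the order.
    return all(any(state == other for other in remaining) for state in t)


def is_variant(m0_traces, m1_traces) -> bool:
    """M1 is a variant of M0 if for every t in [[M0]] some t' in [[M1]] has t ≺ t'."""
    return all(any(is_subsequence(t, tp) for tp in m1_traces) for t in m0_traces)


malware = [[1, 2, 3, 4]]
variant = [[1, "g", 2, 3, "g", 4]]   # extra (e.g. garbage) states interleaved
reordered = [[4, 1, 2, 3]]
print(is_variant(malware, variant))    # True
print(is_variant(malware, reordered))  # False
```

Sharing one iterator across the inner `any` calls makes the check a single left-to-right scan of $t'$, so each pair of traces costs $O(|t'|)$.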
Semantic trace-based
Two phases:
1. Signature generation
2. Detection
Signature generation phase
executable M
↓ (disassembler & translator)
abstract code (AAPL)
↓ (test data generator)
an abstract trace and a test input x
↓ (semantic simulator)
a concrete trace
↓ (trace slicer)
trace slices
↓ (abstracter)
semantic traces $\tau_m$
semantic signature = $(\tau_m, x)$
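The dataflow of the signature generation phase can be sketched as a chain of stages. Every stage here is a hypothetical caller-supplied function (the real disassembler, test data generator, simulator, slicer and abstracter are not specified on the slide); the sketch only shows which outputs feed which stage and that the test input $x$ is kept inside the signature.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class SemanticSignature:
    tau_m: Tuple[Any, ...]  # semantic trace(s) of the malware M
    x: Any                  # the test input that produced them, reused in detection


def generate_signature(
    executable: Any,
    disassemble: Callable[[Any], Any],
    generate_test_data: Callable[[Any], Tuple[Any, Any]],
    simulate: Callable[[Any, Any, Any], Any],
    slice_trace: Callable[[Any], Any],
    abstract: Callable[[Any], Any],
) -> SemanticSignature:
    """Chain the hypothetical stages of the signature generation phase."""
    aapl = disassemble(executable)                # disassembler & translator -> AAPL code
    abstract_trace, x = generate_test_data(aapl)  # a program path plus an input driving it
    concrete = simulate(aapl, abstract_trace, x)  # semantic simulator (no live execution)
    slices = slice_trace(concrete)                # trace slicer
    tau_m = tuple(abstract(slices))               # abstracter -> semantic trace
    return SemanticSignature(tau_m, x)


# Toy stages, purely illustrative
sig = generate_signature(
    "M.exe",
    disassemble=lambda exe: ["R0:=n", "R3:=R0+1"],
    generate_test_data=lambda code: (code, {"n": 2}),
    simulate=lambda code, path, x: [{"R0": 2}, {"R0": 2, "R3": 3}],
    slice_trace=lambda trace: trace[-1:],
    abstract=lambda slices: [tuple(sorted(s.items())) for s in slices],
)
print(sig)
```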
Detection phase
executable P
↓ (disassembler & translator)
abstract code (AAPL)
↓ (semantic simulator, using $sig_m = (\tau_m, x)$)
a concrete trace
↓ (abstracter)
$(\tau_p, \tau_m)$
↓ (matcher)
yes/no
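The slides do not say how the matcher scores $(\tau_p, \tau_m)$. One plausible choice, assumed here and not taken from the source, is the length of the longest common subsequence of the two semantic traces divided by $|\tau_m|$, answering "yes" when it reaches a threshold $k$; the default 0.6 mirrors the "k ≥ 60%" figure in the results. `lcs_length`, `similarity` and `matches` are hypothetical names.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(tau_m, tau_p) -> float:
    """Fraction of the signature trace tau_m found, in order, in tau_p."""
    return lcs_length(tau_m, tau_p) / len(tau_m) if tau_m else 0.0


def matches(tau_m, tau_p, k: float = 0.6) -> bool:
    """Matcher: 'yes' when the similarity reaches the threshold k."""
    return similarity(tau_m, tau_p) >= k


tau_m = ["s1", "s2", "s3", "s4", "s5"]
print(matches(tau_m, ["s1", "s2", "x", "s3", "s4"]))  # similarity 0.8 -> True
print(matches(tau_m, ["y", "z", "s1"]))               # similarity 0.2 -> False
```

Unlike the strict $t \prec t'$ test, a ratio tolerates signature states that are missing from the suspicious trace, which is one way a detector could report partial similarity for benign programs.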
Experiments
Detector prototype
Malicious program M → Signature generation phase → Semantic signatures
Suspicious program P + Semantic signatures → Detection phase → Yes/No
Test scenarios
We tested:
• Robustness against real in-the-wild variants
• Effectiveness of trace slicing in the signatures
• Signature generation & detection phases: sig-wo-slice vs. sig-w-slice
• False positives
• Classification of malware samples
Test scenarios
Results:
• Tested samples: Bho, Binom, Mobler, Telf, . . .
• Most malware successfully matched, with k ≥ 60%
• sig-w-slice: accuracy 30% and speed 26% in the detection phase
• sig-wo-slice: 5:7 faster in the signature generation phase
• No false positives: similarity ≤ 20% (10 benign executables)
• 100% correct classification of malware variants
Prototype limitations
Technical shortcomings:
• Limited to viruses and worms
• Does not handle dynamically packed code or code using anti-disassembly techniques
• Relies on external tools to manually unpack (decrypt) and disassemble files
Thank you very much!
Image: Salvatore Vuono / FreeDigitalPhotos.net
References
Alzarouni, K., Clark, D., and Tratt, L. (2010). Semantic malware detection. Technical Report TR-10-03, Department of Computer Science, King’s College London.
Beaucamps, P. (2007). Advanced metamorphic techniques in computer viruses. In Proceedings of the International Conference on Computer, Electrical, and Systems Science, and Engineering (CESSE’07).
Cohen, F. (1987). Computer viruses: theory and experiments. Comput. Secur., 6(1):22–35.
Ször, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley, Reading, Mass.
Detector components
Trace Semantics
• The trace semantics of a program is the set of all traces $T$ that the program can produce
• A trace $t \in T$ is a sequence of program states, i.e. pairs of program syntax $C$ and an execution context $X$
• Execution context: the values of the environment (variables) and of memory (locations), $X = E \times M$
• Program syntax: source code (commands)
$\rho \in E = R \to \mathbb{Z}_\perp$ (environments)
$m \in M = \mathbb{Z} \to \mathbb{Z}_\perp \cup C$ (memory)
$\xi \in X = E \times M$ (execution contexts)
$S = C \times X$ (program states)
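The domains above map naturally onto Python types. This is a minimal sketch under assumptions of its own: commands are kept as text, `None` stands for $\bot$, and contexts are frozen (sorted tuples) so traces can be compared. `Context` and `env_mem_trace` are hypothetical names; the projection drops the syntax component, which is what makes two syntactically different traces comparable.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Register = str
Command = str                            # program syntax C, kept as text here
Value = Optional[int]                    # None stands for the undefined value ⊥
Environment = Dict[Register, Value]      # rho in E = R -> Z⊥
Memory = Dict[int, Union[int, Command]]  # m in M = Z -> Z⊥ ∪ C (code lives in memory too)


@dataclass(frozen=True)
class Context:
    """An execution context xi = (rho, m), frozen so it can be compared and hashed."""
    env: Tuple[Tuple[Register, Value], ...]
    mem: Tuple[Tuple[int, Union[int, Command]], ...]

    @staticmethod
    def of(env: Environment, mem: Memory) -> "Context":
        return Context(tuple(sorted(env.items())), tuple(sorted(mem.items())))

    def lookup(self, address: int) -> Union[int, Command, None]:
        """m(address); unmapped addresses read as ⊥ (None)."""
        return dict(self.mem).get(address)


State = Tuple[Command, Context]          # S = C × X
Trace = List[State]                      # t in T


def env_mem_trace(trace: Trace) -> List[Context]:
    """Keep only the environment-memory part of each state."""
    return [ctx for _, ctx in trace]


t1 = [("R0:=n", Context.of({"R0": 3}, {})), ("R1:=m", Context.of({"R0": 3, "R1": 4}, {}))]
t2 = [("R0:=n", Context.of({"R0": 3}, {})), ("R1:=m+0", Context.of({"R0": 3, "R1": 4}, {}))]
print(env_mem_trace(t1) == env_mem_trace(t2))  # True
```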
Trace Semantics
• Signatures refer to exact program states
• Semantic signatures refer to values at particular memory locations and in registers that are observed to be constant across variants of the same malware family
• Detection: check whether the environment-memory traces of $M$ are contained in (are subtraces of) the environment-memory traces of $M'$
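One simple way to find values "observed to be constant across variants" is to intersect the (location, value) pairs seen in each variant's environment-memory trace. This is a hypothetical sketch, not the detector's procedure: it treats each trace as a list of (environment dict, memory dict) pairs and ignores state order. A register that is renamed between variants drops out, leaving values that genuinely survive the obfuscation.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union

Location = Union[str, int]                 # register name or memory address
Observation = Tuple[Location, int]
EnvMemState = Tuple[Dict[str, int], Dict[int, int]]


def observations(trace: Iterable[EnvMemState]) -> Set[Observation]:
    """Every (location, value) pair seen in an environment-memory trace."""
    seen: Set[Observation] = set()
    for env, mem in trace:
        seen.update(env.items())
        seen.update(mem.items())
    return seen


def invariant_observations(variant_traces: List[List[EnvMemState]]) -> Set[Observation]:
    """Pairs observed in every variant: candidate ingredients of a semantic signature."""
    if not variant_traces:
        return set()
    return set.intersection(*(observations(t) for t in variant_traces))


v1 = [({"R0": 3}, {100: 7}), ({"R0": 3, "R3": 7}, {100: 7, 104: 1})]
v2 = [({"R0": 3, "R22": 1}, {100: 7}), ({"R0": 3, "R3": 7, "R22": 2}, {100: 7})]
print(invariant_observations([v1, v2]))
```

Here the garbage register R22 and the memory cell 104 are discarded, while R0, R3 and address 100 are kept.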
Semantic Simulator
Not “live” testing: evaluates abstract traces and collects concrete traces
Semantics of actions, $\hat{\mathcal{A}} : \mathbb{A} \times X \to X$, where $\xi = (\rho, m)$:
$$
\begin{aligned}
\hat{\mathcal{A}}[\![R := E]\!]\xi &= (\rho', m), & \rho' &= \rho(R \mapsto \hat{\mathcal{E}}[\![E]\!]\xi)\\
\hat{\mathcal{A}}[\![{*}R := E]\!]\xi &= (\rho, m'), & m' &= m(\rho(R) \mapsto \hat{\mathcal{E}}[\![E]\!]\xi)\\
\hat{\mathcal{A}}[\![\mathtt{JMP}\ E]\!]\xi &= (\rho', m), & \rho' &= \rho(PC \mapsto \hat{\mathcal{E}}[\![E]\!]\xi)\\
\hat{\mathcal{A}}[\![\mathtt{RTN}]\!]\xi &= (\rho', m), & \rho' &= \rho(PC \mapsto m(\rho(SP)),\ SP \mapsto \rho(SP) + 1)\\
\hat{\mathcal{A}}[\![\mathtt{PUSH}\ E]\!]\xi &= (\rho', m'), & \rho' &= \rho(SP \mapsto \rho(SP) - 1),\quad m' = m(\rho(SP) - 1 \mapsto \hat{\mathcal{E}}[\![E]\!]\xi)
\end{aligned}
$$
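The five action rules translate almost line by line into a function on $(\rho, m)$. In this sketch, actions are tuples such as `("ASSIGN", reg, expr)` and expressions are already compiled to functions of the context, standing in for $\hat{\mathcal{E}}[\![E]\!]$; both encodings are assumptions, not the simulator's actual representation. As in the rules, PUSH evaluates $E$ in the original context before storing at $\rho(SP) - 1$.

```python
from typing import Callable, Dict, Tuple

Env = Dict[str, int]
Mem = Dict[int, int]
Ctx = Tuple[Env, Mem]
Expr = Callable[[Ctx], int]  # stand-in for E^[[E]]: an expression compiled to a function


def eval_action(action: tuple, ctx: Ctx) -> Ctx:
    """A^ : A x X -> X. Returns a fresh (rho', m'); the input context is not mutated."""
    rho, m = ctx
    kind = action[0]
    if kind == "ASSIGN":                  # R := E
        _, reg, expr = action
        return {**rho, reg: expr(ctx)}, m
    if kind == "STORE":                   # *R := E: write to the address held in R
        _, reg, expr = action
        return rho, {**m, rho[reg]: expr(ctx)}
    if kind == "JMP":                     # JMP E
        _, expr = action
        return {**rho, "PC": expr(ctx)}, m
    if kind == "RTN":                     # PC := m(rho(SP)), SP := rho(SP) + 1
        return {**rho, "PC": m[rho["SP"]], "SP": rho["SP"] + 1}, m
    if kind == "PUSH":                    # SP := rho(SP) - 1, m(rho(SP) - 1) := E
        _, expr = action
        new_sp = rho["SP"] - 1
        return {**rho, "SP": new_sp}, {**m, new_sp: expr(ctx)}
    raise ValueError(f"unknown action: {kind!r}")


ctx = ({"PC": 10, "SP": 1000, "R0": 5}, {})
ctx = eval_action(("PUSH", lambda c: c[0]["R0"] + 1), ctx)  # push 6
ctx = eval_action(("RTN",), ctx)                            # pop it into PC
print(ctx[0]["PC"], ctx[0]["SP"])  # 6 1000
```

Returning new dictionaries instead of mutating keeps every intermediate context intact, which is convenient when each one is recorded in a concrete trace.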
Semantic Simulator
Semantics of commands, $\hat{\mathcal{C}} : S \to \Sigma(S)$ (determines the transition relation between states), where $\xi = (\rho, m)$:
$$
\hat{\mathcal{C}}[\![C_A]\!]\xi = (\xi', C'), \quad \xi' = (\rho', m') = \hat{\mathcal{A}}[\![A]\!]\xi, \quad
C' = \begin{cases} m'(\rho'(PC)) & \text{if } A \in \{\mathtt{JMP}, \mathtt{CALL}, \mathtt{RTN}\}\\ m'(\rho(PC) + 1) & \text{otherwise} \end{cases}
$$
$$
\hat{\mathcal{C}}[\![C_B]\!]\xi = (\xi', C') = \begin{cases} \big((\rho(PC \mapsto \hat{\mathcal{E}}[\![E]\!]\xi),\ m),\ m(\hat{\mathcal{E}}[\![E]\!]\xi)\big) & \text{if } \hat{\mathcal{B}}[\![B]\!]\xi = \mathit{true}\\ \big(\xi,\ m(\rho(PC) + 1)\big) & \text{otherwise} \end{cases}
$$
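A step function following $\hat{\mathcal{C}}$, together with a loop that records the visited states, gives a small concrete-trace collector. Assumptions not taken from the slides: commands are tuples `("ACT", action)` or `("BR", cond, target)` stored in memory; the action semantics is passed in as `apply_action`; and on fall-through the sketch also sets PC to $\rho(PC) + 1$ so that steps can be chained (the rules above leave that update implicit). `step` and `collect_trace` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Env = Dict[str, int]
Mem = Dict[int, object]                     # memory holds data and commands
Ctx = Tuple[Env, Mem]
ApplyAction = Callable[[tuple, Ctx], Ctx]   # stand-in for A^

CONTROL_ACTIONS = {"JMP", "CALL", "RTN"}


def step(cmd: tuple, ctx: Ctx, apply_action: ApplyAction) -> Tuple[Ctx, Optional[tuple]]:
    """C^: one transition (cmd, ctx) -> (ctx', cmd'); cmd' is None if nothing is stored there."""
    rho, m = ctx
    if cmd[0] == "ACT":
        action = cmd[1]
        rho2, m2 = apply_action(action, ctx)
        if action[0] in CONTROL_ACTIONS:
            return (rho2, m2), m2.get(rho2["PC"])     # fetch at the new PC
        nxt_pc = rho["PC"] + 1                         # fall through (PC advanced: assumption)
        return ({**rho2, "PC": nxt_pc}, m2), m2.get(nxt_pc)
    if cmd[0] == "BR":
        _, cond, target = cmd                          # stand-ins for B^[[B]] and E^[[E]]
        if cond(ctx):
            dest = target(ctx)
            return ({**rho, "PC": dest}, m), m.get(dest)
        nxt_pc = rho["PC"] + 1
        return ({**rho, "PC": nxt_pc}, m), m.get(nxt_pc)
    raise ValueError(f"unknown command: {cmd[0]!r}")


def collect_trace(ctx: Ctx, apply_action: ApplyAction, max_steps: int = 100):
    """Run from the command at rho(PC); return the (command, context) states and the final context."""
    trace: List[Tuple[tuple, Ctx]] = []
    cmd = ctx[1].get(ctx[0]["PC"])
    while cmd is not None and len(trace) < max_steps:
        trace.append((cmd, ctx))
        ctx, cmd = step(cmd, ctx, apply_action)
    return trace, ctx


def assign_only(action: tuple, ctx: Ctx) -> Ctx:
    """Toy action semantics: only R := E is needed for the example program."""
    _, reg, expr = action
    rho, m = ctx
    return {**rho, reg: expr(ctx)}, m


program = {
    0: ("ACT", ("ASSIGN", "R0", lambda c: 1)),
    1: ("BR", lambda c: c[0]["R0"] == 1, lambda c: 3),  # taken: skips address 2
    2: ("ACT", ("ASSIGN", "R0", lambda c: 99)),
    3: ("ACT", ("ASSIGN", "R1", lambda c: 7)),
}
trace, final = collect_trace(({"PC": 0}, program), assign_only)
print(len(trace), final[0]["R0"], final[0]["R1"])  # 3 1 7
```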
TSAlgo – Trace slicing
• $P \xrightarrow{\text{slice}} P'$ (a semantically invariant subprogram w.r.t. a criterion)
• $t \xrightarrow{\text{slice}} t'$ (a semantically invariant subtrace w.r.t. $tsc$)
• Trace slicing criterion $tsc$: the most recent definition points of the variables in $t$
• A conjecture: slicing is useful in the detection step, giving more accurate and efficient results
• The effect is to shorten the trace and thus the signature
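TSAlgo itself is not given on the slide, so this is a standard dynamic backward slice rather than the author's algorithm. It assumes each trace step records the one variable it defines and the variables it uses, and takes as criterion a set of variables whose final values matter. Walking the trace backwards, a step is kept when it is the most recent definition of a needed variable, and its inputs then become needed. `Event` and `slice_trace` are hypothetical names.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Event(NamedTuple):
    index: int             # position of the step in the concrete trace
    defines: str           # register or location written by the step
    uses: FrozenSet[str]   # registers or locations read by the step


def slice_trace(trace: List[Event], criterion: Iterable[str]) -> List[Event]:
    """Backward slice: keep the steps whose definitions reach the final values of `criterion`."""
    needed = set(criterion)
    kept = []
    for ev in reversed(trace):
        if ev.defines in needed:      # most recent definition of a needed variable
            kept.append(ev)
            needed.discard(ev.defines)
            needed |= ev.uses         # its inputs are now needed
    kept.reverse()
    return kept


# A trace of the variant P' in one assumed execution order
t = [
    Event(0, "R22", frozenset({"R22"})),
    Event(1, "R0", frozenset()),
    Event(2, "R11", frozenset()),
    Event(3, "R22", frozenset({"R22"})),
    Event(4, "R3", frozenset({"R11", "R0"})),
    Event(5, "R4", frozenset()),
    Event(6, "R4", frozenset({"R4", "R11"})),
    Event(7, "R15", frozenset()),
]
print([ev.index for ev in slice_trace(t, {"R3", "R4"})])  # [1, 2, 4, 5, 6]
```

With R3 and R4 as the criterion, the garbage increments of R22 and the unrelated R15 assignment are dropped, which is exactly the shortening of the trace, and hence of the signature, described above.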