Statistical Deobfuscation for Android Applications
Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev
Department of Computer Science
Why De-obfuscate?
Android binaries (APKs) are distributed on Google Play with no code available.
[Chart: number of APKs on Google Play, '10 to '16; labelled value: 2.4M APKs]
Layout Obfuscation in Android

Original code:

    package com.example.dbhelper
    class DBHelper extends SQLiteHelper {
      SQLiteDatabase db;
      public DBHelper(Context ctx) {
        db = getWritableDatabase();
      }
      Cursor execSQL(String str) {
        return db.rawQuery(str);
      }
    }

After obfuscation (non-descriptive names):

    package a.b.c
    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDatabase();
      }
      Cursor c(String str) {
        return b.rawQuery(str);
      }
    }

Some names remain (e.g. SQLiteHelper, SQLiteDatabase, getWritableDatabase, rawQuery).
Names provide key semantic information.
Layout Obfuscation in Android: Security Challenges
  Code inspection
  Third-party library detection
  … many others
Can we reverse layout obfuscation?
Yes, with roughly 80% accuracy! (www.apk-deguard.com)
www.apk-deguard.com
Released last week. So far:
  > 5K users
  > 5 GB of APKs
  Reddit posts/comments
  Tweets
How Does DeGuard Work?
DeGuard: System Overview

Prediction phase: obfuscated code → static analysis → MAP inference → transform → de-obfuscated code

Learning phase: open-source, unobfuscated applications → static analysis → semantic representation → training → probabilistic model $P(O \mid K)$

The probabilistic model produced by the learning phase is the input to MAP inference in the prediction phase.

Obfuscated code (input):

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

De-obfuscated code (output):

    class DBHelper extends SQLiteHelper {
      SQLiteDatabase db;
      public DBHelper(Context ctx) {
        db = getWritableDB();
      }
    }
Probabilistic Graphical Models
Probabilistic Graphical Models

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

Dependency graph:

    a --extends--> SQLiteHelper
    b --field-in--> a
    b --gets--> getWritableDB

Known variables $K$: SQLiteHelper, getWritableDB
Unknown variables $O$: a, b

Feature functions $f_1, f_2, \ldots$ and their weights:

    edge      feature  name1          name2     weight
    extends   f_1      SQLiteHelper   DBUtils   0.3
    extends   f_2      SQLiteHelper   DBHelper  0.2
    gets      f_3      getWritableDB  db        0.7
    gets      f_4      getWritableDB  instance  0.4
    field-in  f_5      DBUtils        instance  0.5
    field-in  f_6      DBHelper       db        0.4
    …         f_7      …              …         …

Graph + features define a probabilistic graphical model:

$$P(O \mid K) = P(a, b \mid \mathtt{SQLiteHelper}, \mathtt{getWritableDB}) = \frac{1}{Z} \exp\bigl(0.3 \cdot f_1(\mathtt{SQLiteHelper}, a) + 0.2 \cdot f_2(\mathtt{SQLiteHelper}, a) + \cdots\bigr)$$
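As a worked instance of the formula on this slide, the following evaluates the non-normalized probability of one candidate naming, a = DBHelper and b = db. It assumes that each feature function is an indicator that equals 1 exactly when the two connected nodes carry the names in its table row (that form is an assumption, the slide only lists the rows and weights). The normalization constant $Z$ is left symbolic because it depends on the full candidate set.

```latex
% Assumption: every feature function is a 0/1 indicator on a (name1, name2) pair.
\begin{align*}
P(a = \mathtt{DBHelper},\ b = \mathtt{db} \mid \mathtt{SQLiteHelper}, \mathtt{getWritableDB})
  &= \frac{1}{Z} \exp\bigl(
       \underbrace{0.2}_{\text{extends: } (\mathtt{SQLiteHelper},\, \mathtt{DBHelper})}
     + \underbrace{0.4}_{\text{field-in: } (\mathtt{DBHelper},\, \mathtt{db})}
     + \underbrace{0.7}_{\text{gets: } (\mathtt{getWritableDB},\, \mathtt{db})}
     \bigr) \\
  &= \frac{1}{Z} \exp(1.3)
\end{align*}
```

All other features contribute 0 for this naming, since their name pairs do not match.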
Next: How are the weights and features learned?
Learning
Learning

Pipeline: unobfuscated APKs → static analysis → dependency graphs → feature templates → features → train model → weights

Dependency graphs: actual graphs have >1,000 nodes, >2,000 …

Feature templates: 28 templates, yielding >100,000 features (with candidate names).

Features extracted by the templates, with the weights assigned by training:

    feature  name1          name2     weight
    f_1      SQLiteHelper   DBUtils   0.3
    f_2      SQLiteHelper   DBHelper  0.2
    f_3      getWritableDB  db        0.7
    f_4      getWritableDB  instance  0.4
    f_5      DBUtils        instance  0.5
    f_6      DBHelper       db        0.4
    f_7      …              …         …

Training: compute weights that maximize $P(O = o_i \mid K = k_i)$ for all training samples $(o_i, k_i)$.
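The training objective on this slide can be illustrated with a very small sketch. The code below is not DeGuard's training procedure: it assumes indicator features on the three edges of the running example, two made-up candidate lists, three made-up training samples, and plain gradient ascent with exhaustive enumeration of candidates, which is only feasible at this toy size. The names `features`, `distribution`, and `train` are hypothetical helpers.

```python
import math
from itertools import product

# Hypothetical toy setup: one unknown class name and one unknown field name,
# connected as in the slides' example graph.
CLASS_NAMES = ["DBUtils", "DBHelper"]
FIELD_NAMES = ["instance", "db"]


def features(assignment, known):
    """Return the indicator features that fire for one (class, field) naming."""
    cls, field = assignment
    superclass, getter = known
    return {
        ("extends", superclass, cls),
        ("field-in", cls, field),
        ("gets", getter, field),
    }


def distribution(weights, known):
    """P(O = o | K = known) for every candidate o, by brute-force enumeration."""
    candidates = list(product(CLASS_NAMES, FIELD_NAMES))
    scores = [sum(weights.get(f, 0.0) for f in features(o, known)) for o in candidates]
    z = sum(math.exp(s) for s in scores)
    return {o: math.exp(s) / z for o, s in zip(candidates, scores)}


def train(samples, steps=500, lr=0.5):
    """Gradient ascent on the average of log P(O = o_i | K = k_i)."""
    weights = {}
    for _ in range(steps):
        grad = {}
        for observed, known in samples:
            # Log-likelihood gradient: observed minus expected feature counts.
            for f in features(observed, known):
                grad[f] = grad.get(f, 0.0) + 1.0
            for o, p in distribution(weights, known).items():
                for f in features(o, known):
                    grad[f] = grad.get(f, 0.0) - p
        # Update all weights only after the full gradient has been accumulated.
        for f, g in grad.items():
            weights[f] = weights.get(f, 0.0) + lr * g / len(samples)
    return weights


# Made-up training samples (o_i, k_i); real training uses unobfuscated APKs.
SAMPLES = [
    (("DBHelper", "db"), ("SQLiteHelper", "getWritableDB")),
    (("DBHelper", "db"), ("SQLiteHelper", "getWritableDB")),
    (("DBUtils", "instance"), ("SQLiteHelper", "getWritableDB")),
]

weights = train(SAMPLES)
probs = distribution(weights, ("SQLiteHelper", "getWritableDB"))
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))
```

With graphs of more than 1,000 nodes and more than 100,000 features, enumerating every candidate assignment as done here would not be practical, so this sketch only conveys the shape of the objective.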
DeGuard: System Overview (prediction phase)
Obfuscated code → static analysis → MAP inference → transform → de-obfuscated code, using the probabilistic model $P(O \mid K)$.
Prediction Phase: Static Analysis

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

Static analysis produces the dependency graph:

    a --extends--> SQLiteHelper
    b --field-in--> a
    b --gets--> getWritableDB

Feature weights for each edge:

    edge      name1          name2     weight
    extends   SQLiteHelper   DBUtils   0.3
    extends   SQLiteHelper   DBHelper  0.2
    gets      getWritableDB  db        0.7
    gets      getWritableDB  instance  0.4
    field-in  DBUtils        instance  0.5
    field-in  DBHelper       db        0.4
    field-in  DBUtils        db        0.2
    field-in  DBHelper       instance  0.2
Prediction Phase: MAP Inference

MAP inference selects the most likely assignment of names to the unknown nodes:

$$\vec{o} = \operatorname*{argmax}_{\vec{o}' \in \Omega} P(O = \vec{o}' \mid K = k)$$

Each score below is the sum of the weights of the three features that fire for that assignment, e.g. for a = DBHelper, b = db: 0.2 + 0.7 + 0.4 = 1.3.

    Candidate assignment o           P(o | k)*
    a = DBUtils,  b = instance       1.2
    a = DBHelper, b = db             1.3
    a = DBUtils,  b = db             1.2
    a = DBHelper, b = instance       0.8

*Non-normalized
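The argmax on this slide can be reproduced by exhaustive enumeration over the four candidate assignments. The sketch below uses the eight weights and three edges from the slide; the dictionary layout, the candidate lists, and the helper names `score` and `map_inference` are assumptions made for illustration, and brute force is used only because the example has two unknowns.

```python
from itertools import product

# Pairwise feature weights from the slide, keyed by (edge label, name1, name2).
WEIGHTS = {
    ("extends", "SQLiteHelper", "DBUtils"): 0.3,
    ("extends", "SQLiteHelper", "DBHelper"): 0.2,
    ("gets", "getWritableDB", "db"): 0.7,
    ("gets", "getWritableDB", "instance"): 0.4,
    ("field-in", "DBUtils", "instance"): 0.5,
    ("field-in", "DBHelper", "db"): 0.4,
    ("field-in", "DBUtils", "db"): 0.2,
    ("field-in", "DBHelper", "instance"): 0.2,
}

# Dependency graph of the obfuscated class: (edge label, node1, node2).
EDGES = [
    ("extends", "SQLiteHelper", "a"),
    ("gets", "getWritableDB", "b"),
    ("field-in", "a", "b"),
]

# Candidate names for each unknown node (assumed); known nodes keep their name.
CANDIDATES = {"a": ["DBUtils", "DBHelper"], "b": ["instance", "db"]}


def score(assignment):
    """Sum the weights of all features that fire under this assignment."""
    total = 0.0
    for label, n1, n2 in EDGES:
        key = (label, assignment.get(n1, n1), assignment.get(n2, n2))
        total += WEIGHTS.get(key, 0.0)
    return total


def map_inference():
    """Return the highest-scoring assignment by exhaustive enumeration."""
    unknowns = list(CANDIDATES)
    best = None
    for names in product(*(CANDIDATES[u] for u in unknowns)):
        assignment = dict(zip(unknowns, names))
        if best is None or score(assignment) > score(best):
            best = assignment
    return best


best = map_inference()
print(best, round(score(best), 2))
```

On the slide's weights this selects a = DBHelper, b = db with a non-normalized score of 1.3. The number of assignments grows exponentially with the number of unknown nodes, so graphs with more than 1,000 nodes would need a different search strategy than this enumeration.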
Prediction Phase: Transform

Dependency graph after inference (a = DBHelper, b = db):

    DBHelper --extends--> SQLiteHelper
    db --field-in--> DBHelper
    db --gets--> getWritableDB

The transform step applies the predicted names to the obfuscated code.

Obfuscated code:

    class a extends SQLiteHelper {
      SQLiteDatabase b;
      public a(Context ctx) {
        b = getWritableDB();
      }
    }

Deobfuscated code:

    class DBHelper extends SQLiteHelper {
      SQLiteDatabase db;
      public DBHelper(Context ctx) {
        db = getWritableDB();
      }
    }
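The transform step can be pictured as applying a renaming map to the program. The sketch below does this on source text with a whole-word regular expression, purely for illustration: the helper `rename_identifiers` is hypothetical, and a real tool working on APKs would rename program elements rather than text, since textual replacement ignores scoping.

```python
import re


def rename_identifiers(code, renaming):
    """Replace whole-word identifiers in `code` according to `renaming`."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renaming)) + r")\b")
    return pattern.sub(lambda m: renaming[m.group(1)], code)


OBFUSCATED = """class a extends SQLiteHelper {
  SQLiteDatabase b;
  public a(Context ctx) {
    b = getWritableDB();
  }
}"""

# Renaming taken from the inference result on the slide: a = DBHelper, b = db.
deobfuscated = rename_identifiers(OBFUSCATED, {"a": "DBHelper", "b": "db"})
print(deobfuscated)
```

The word-boundary anchors keep the letters inside longer identifiers such as `SQLiteDatabase` or `getWritableDB` untouched.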