2020 International World Wide Web Conference (WWW 2020)
MadDroid: Characterizing and Detecting Devious Ad Contents for Android Apps
MOTIVATION
The perspective of mobile advertisers, who provide ad content and pay ad networks, has rarely been studied.
[Figure: the ADVERTISER pays the AD NETWORK; the AD NETWORK distributes ads to the USER, who views them.]
AD CONTENT TYPES
1. Redirection link to a landing page
2. Deep link that switches the current page to the Google Play Store
3. Automatic download of a file
[Figure: example landing page, 952634.cn]
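The three ad content types can be told apart from what a click leads to. The sketch below is a minimal, hypothetical classifier. It assumes we already have the target URL and, when available, the response headers. It treats `market://` URIs and `play.google.com/store` links as Play Store deep links, and treats APK or attachment responses as downloads. The MIME list is an illustrative assumption, not MadDroid's rule set.

```python
from urllib.parse import urlparse

# Assumed (non-exhaustive) MIME types that indicate a file download.
DOWNLOAD_MIMES = {"application/vnd.android.package-archive", "application/octet-stream"}


def classify_ad_action(url, content_type="", content_disposition=""):
    """Return 'deep_link', 'download', or 'redirect' for a hypothetical ad-click record."""
    parsed = urlparse(url)
    # market:// URIs and Play Store web links switch the user to Google Play.
    if parsed.scheme == "market" or (
        parsed.netloc == "play.google.com" and parsed.path.startswith("/store")
    ):
        return "deep_link"
    # Ignore parameters such as "; charset=..." when comparing the MIME type.
    mime = content_type.split(";")[0].strip().lower()
    if (
        mime in DOWNLOAD_MIMES
        or "attachment" in content_disposition.lower()
        or parsed.path.lower().endswith(".apk")
    ):
        return "download"
    # Anything else is treated as a redirection to a landing page.
    return "redirect"
```

For example, `classify_ad_action("market://details?id=com.example.app")` is classified as a deep link, while an HTML response from a landing-page host falls through to `"redirect"`.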
CONTENT TYPE
[Figure: a click-deceptive image]
5 CATEGORIES OF DEVIOUS AD CONTENT
[Figure: a click-deceptive image]
TCM: TRAFFIC COLLECTION MODULE
• Challenge: Not all traffic is ad traffic.
• Solution: Click on the main UI and the exit UI [61], and click on WebView, ImageView, and ViewFlipper widgets.
• Implementation: BFS over the view tree, based on widget attributes.
Figure 5: An example of a view tree
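The BFS-plus-attributes step can be sketched as a breadth-first walk over a view tree. The walk keeps clickable nodes whose class is one of the widgets named on the slide. `ViewNode`, its fields, and the resource-id hints `ad_`/`banner` are hypothetical simplifications. A real tool would build the tree from a UI hierarchy dump.

```python
from collections import deque
from dataclasses import dataclass, field

# Widget classes named on the slide as likely ad containers.
AD_WIDGET_CLASSES = {
    "android.webkit.WebView",
    "android.widget.ImageView",
    "android.widget.ViewFlipper",
}
# Hypothetical resource-id fragments used as an extra attribute-based hint.
AD_ID_HINTS = ("ad_", "banner")


@dataclass
class ViewNode:
    """Simplified, hypothetical view-tree node (e.g. parsed from a UI hierarchy dump)."""
    cls: str
    resource_id: str = ""
    clickable: bool = False
    children: list = field(default_factory=list)


def looks_like_ad(node):
    """Attribute check: clickable and either an ad-capable widget class or an ad-like id."""
    if not node.clickable:
        return False
    rid = node.resource_id.lower()
    return node.cls in AD_WIDGET_CLASSES or any(h in rid for h in AD_ID_HINTS)


def find_ad_candidates(root):
    """Visit the view tree breadth-first and collect the nodes worth clicking."""
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if looks_like_ad(node):
            found.append(node)
        queue.extend(node.children)
    return found
```

BFS returns shallower widgets first. These are often the top-level banners or interstitials, so they are a natural place to start clicking.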
CEM: CONTENT EXTRACTION MODULE
• Purpose: Extract images and executable scripts.
• Method: Fiddler + HTTP function hooking.
• Challenges: 1. How do we determine whether a domain is an ad domain? 2. Even for a known ad library, its domains may change.
• Input: hosts and ad libraries. Output: host-library mapping. Solution: iteratively find libraries.
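One way to read "iteratively find libs" is as a fixpoint expansion over (library, host) pairs recorded by the HTTP hook. Known ad libraries reveal new ad hosts, and known ad hosts reveal new ad libraries, until nothing changes. This is an illustrative interpretation, not the paper's algorithm. The observation format and the `ignore_libs` parameter (to keep generic networking libraries from pulling in every host) are assumptions.

```python
def expand_host_lib_mapping(observations, seed_hosts=(), seed_libs=(), ignore_libs=()):
    """Grow ad hosts and ad libraries to a fixpoint, then return a host -> libraries map.

    observations: iterable of (library_package, host) pairs, e.g. logged by a hypothetical
    HTTP-function hook that records which library issued each request.
    """
    pairs = {(lib, host) for lib, host in observations if lib not in ignore_libs}
    hosts, libs = set(seed_hosts), set(seed_libs)
    changed = True
    while changed:
        changed = False
        for lib, host in pairs:
            if lib in libs and host not in hosts:
                # A known ad library contacted this host, so mark the host as an ad host.
                hosts.add(host)
                changed = True
            elif host in hosts and lib not in libs:
                # This library contacted a known ad host, so mark it as an ad library.
                libs.add(lib)
                changed = True
    mapping = {}
    for lib, host in pairs:
        if lib in libs and host in hosts:
            mapping.setdefault(host, set()).add(lib)
    return hosts, libs, mapping
```

Running it on both hosts and libraries as seeds corresponds loosely to the "Host & lib" input setting evaluated in RQ2.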
DEVIOUSNESS DETECTION MODULE
• 5 categories -> 5 dedicated detectors
• Click-deceptive image: object recognition
• Censored image: Google API
• Gambling: OCR
• Malicious app, script, redirection link: online antivirus platform
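For the last group (apps, scripts, redirection links), an online antivirus platform usually returns one verdict per engine, so a detector has to aggregate them. The sketch below flags an item when at least `threshold` engines say "malicious". The report format and the default threshold of 2 are assumptions, not values taken from MadDroid.

```python
def aggregate_scan(engine_verdicts, threshold=2):
    """Return (flagged, positives) for one app, script, or URL.

    engine_verdicts maps an engine name to its verdict string; this stands in for a
    multi-engine online scan report (hypothetical format). `threshold` is an assumed cut-off.
    """
    positives = sum(
        1 for verdict in engine_verdicts.values() if verdict.lower() == "malicious"
    )
    return positives >= threshold, positives
```

A threshold above 1 trades some recall for robustness against single-engine false positives.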
EVALUATION: RESEARCH QUESTIONS
RQ1: Can MadDroid detect devious mobile ad contents?
RQ2: How effective is the HTTP hooking approach (in CEM) at locating ad traffic within general network traffic?
• Host only: input host names only
• Lib only: input ad libraries only
• Host & lib: input both host names and ad libraries
RQ3: ACCURACY
• Click-deceptive images: 97.51% recall, 97.99% accuracy
• Censored images: 100%
• Remaining categories: not specified
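For reference, the reported figures presumably follow the standard confusion-matrix definitions. Here TP, TN, FP, and FN are the true/false positives/negatives for one category. The slide does not give the underlying counts.

```latex
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```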
THANK YOU
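COMPARISON: TCM: TRAFFIC COLLECTION MODULE
• Purpose: Generate ad traffic only
• Method: BFS, based on widget attributes
• My framework: Collect traffic for 20 s after launch
• Open questions: Is there a sufficient theoretical basis for this? Does collection need to cover other directions as well? Generalization.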
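The "20 s after launch" strategy amounts to a time-window filter over captured flows. The `Flow` record and its fields are hypothetical. Timestamps are assumed to be seconds on the same clock as the launch time.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical captured request: timestamp in seconds, host, and URL."""
    timestamp: float
    host: str
    url: str


def flows_in_launch_window(flows, launch_time, window=20.0):
    """Keep flows observed from launch_time up to launch_time + window (inclusive)."""
    end = launch_time + window
    return [f for f in flows if launch_time <= f.timestamp <= end]
```

A fixed window is simple and needs no UI automation. However, it misses ads that appear only after interaction (e.g. on exit UIs), which is exactly the generalization question raised above.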
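COMPARISON: CONTENT EXTRACTION MODULE
• Purpose: Extract images and executable scripts
• Method: Fiddler + HTTP function hooking
• My framework: text/HTML: gambling text
• What else can be done? Obtain the buyer's domain names and the function names used.
• The mapping could support research on the campaign ecosystem: producer - libraries - domain (IP).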
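The producer - libraries - domain (IP) mapping suggested above could be kept as a nested index. The record keys `producer`, `library`, `domain`, and `ip` are hypothetical. They stand for whatever the extraction step actually records.

```python
from collections import defaultdict


def build_campaign_graph(records):
    """Nest records into producer -> library -> set of (domain, ip) pairs.

    records: iterable of dicts with hypothetical keys 'producer', 'library', 'domain', 'ip'.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for r in records:
        graph[r["producer"]][r["library"]].add((r["domain"], r["ip"]))
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {producer: dict(libs) for producer, libs in graph.items()}
```

Grouping by producer makes it easy to see when one campaign reaches users through several libraries or rotates across domains that resolve to the same IP.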
COMPARISON: DEVIOUSNESS DETECTION MODULE
• 5 categories -> 5 dedicated detectors
• Click-deceptive image: object recognition
• Censored image: Google API
• Gambling: OCR
• Malicious app, script, redirection link: online antivirus platform
• My framework: Detect gambling text with keywords; gambling images can be handled with OCR. NLP may not be needed. Discover new keywords (see Tsinghua Duan).
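The keyword approach can be sketched as substring matching against a seed list, plus a crude way to surface new keywords. The second step keeps character bigrams that recur in flagged texts but never occur in benign ones. Bigrams are used because Chinese ad text has no word boundaries. The seed keywords are hypothetical. OCR text from gambling images is assumed to be available already.

```python
from collections import Counter

# Hypothetical seed keywords; a real list would be curated from observed gambling ads.
SEED_KEYWORDS = ["博彩", "赌场", "casino", "jackpot"]


def keyword_hits(text, keywords=SEED_KEYWORDS):
    """Return the keywords that occur in `text` (case-insensitive substring match)."""
    lowered = text.lower()
    return [k for k in keywords if k.lower() in lowered]


def char_bigrams(text):
    """All distinct two-character substrings of `text`."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def candidate_keywords(flagged_texts, benign_texts, min_count=2):
    """Rank bigrams that appear in >= min_count flagged texts and in no benign text."""
    benign = set()
    for t in benign_texts:
        benign |= char_bigrams(t)
    counts = Counter()
    for t in flagged_texts:
        counts.update(char_bigrams(t) - benign)
    return [g for g, c in counts.most_common() if c >= min_count]
```

Candidates still need manual review before they join the seed list, since frequent bigrams can be generic ad vocabulary.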
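EXPERIMENT
• JSBundle: 2; HTML: WebKit; Pop-up: 18; CFNetwork; Embedded: 17; Normal: 3
Finding: Almost all normal apps receive a URL from the server and then visit that URL, and every devious app does the same. We therefore cannot base a judgment on this behavior.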
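The behavior in the finding ("the server returns a URL, and the app later visits it") can be detected in a time-ordered capture. The check records URLs seen in response bodies and reports later requests to them. The `(request_url, response_body)` format is hypothetical. JSON bodies that escape slashes (`https:\/\/`) would need unescaping first. As the finding notes, this pattern is common to normal and devious apps, so it is a feature rather than a verdict.

```python
import re

# Rough URL matcher for http(s) links embedded in HTML or JSON bodies.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def returned_then_visited(exchanges):
    """Return URLs that appear in an earlier response body and are requested later.

    exchanges: time-ordered list of (request_url, response_body) pairs (hypothetical format).
    """
    seen_in_responses = set()
    hits = []
    for request_url, body in exchanges:
        if request_url in seen_in_responses and request_url not in hits:
            hits.append(request_url)
        seen_in_responses.update(URL_RE.findall(body))
    return hits
```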
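Third-party libraries
• TO-DO: Use dynamic debugging to observe what the functions called by apps have in common.
• New challenge on iOS. Contradiction: a large number of apps load content through WebView, so we cannot determine whether an app belongs to the gray market.
• Parallel: WebView, similar proportion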
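Finding "what the called functions have in common" across apps reduces to counting how many apps' traces contain each function. The trace format (app id mapped to a set of function names collected during dynamic debugging) and the `min_fraction` parameter are assumptions for illustration.

```python
from collections import Counter


def common_calls(traces, min_fraction=1.0):
    """Return functions called by at least `min_fraction` of the traced apps.

    traces: mapping app id -> iterable of function/method names observed at runtime
    (hypothetical format). min_fraction=1.0 gives the strict intersection.
    """
    if not traces:
        return set()
    counts = Counter()
    for calls in traces.values():
        counts.update(set(calls))  # count each function at most once per app
    needed = min_fraction * len(traces)
    return {fn for fn, c in counts.items() if c >= needed}
```

Lowering `min_fraction` tolerates apps whose traces are incomplete. That may matter on iOS, where much of the loading goes through WebView and is harder to attribute.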