ReCon: Revealing and Controlling PII Leaks in Mobile Network Systems
Jingjing Ren, Martina Lindorfer, Ashwin Rao, Arnaud Legout, David Choffnes (MobiSys '16)
Presented by: Umar Farooq, CS 563, Fall 2018

Mobile Phones today..
• Offer ubiquitous connectivity
• Equipped with a wide array of sensors
• Examples: GPS, camera, microphone, etc.

Problems
• Personally identifiable information (PII) leakage
  - Device identifiers (IMEI, MAC address, etc.)
  - User information (name, gender, contact info, etc.)
  - Location (GPS, zip code)
  - Credentials (?)
• Device fingerprinting
• Cross-platform tracking

[Figure: bar chart comparing App Store, Google Play, and WP Store (y-axis 0 to 0.6) across PII categories: Device Identifier (IMEI, Advertiser ID, MAC, etc.), User Identifier (email, name, gender, etc.), Contact Info, Location, Credential (username, password).]

Goals for this work
• Identify PII leakage without a priori information
• Provide users a platform to view potential PII leaks (i.e., increase user visibility and transparency)

Approach..
• Opportunity: Almost all devices support VPNs
• Have a trusted third-party system audit network flows
  - Tunnel traffic to a controlled (trusted) server
  - Measure, modify, shape, or block traffic with user opt-in

Why should this work?

So, what does PII look like?
GET /index.html?id=12340;foo=bar;name=CS563@Illini;pass=jf3jNF#5h
How can we identify a PII leak?
Naïve approach: Pattern matching.
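The naïve approach can be sketched as a verbatim search for PII values that are already known. This is only an illustration, not ReCon's method: the `KNOWN_PII` table and the `find_known_pii` helper are hypothetical names, and the values are taken from the example request on the slide.

```python
from urllib.parse import unquote

# Hypothetical PII values known in advance for one test device (assumption).
KNOWN_PII = {
    "username": "CS563@Illini",
    "password": "jf3jNF#5h",
}

def find_known_pii(request_line, known_pii):
    """Return the names of known PII values that appear verbatim in the request."""
    decoded = unquote(request_line)  # undo percent-encoding before matching
    return sorted(name for name, value in known_pii.items() if value in decoded)

request = "GET /index.html?id=12340;foo=bar;name=CS563@Illini;pass=jf3jNF#5h"
leaks = find_known_pii(request, KNOWN_PII)
print(leaks)
```

A matcher like this only finds values it was told about and misses anything hashed or re-encoded, which is why it cannot identify PII leakage without a priori information.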

ReCon: A system that uses supervised ML to accurately identify and control PII leaks in network traffic, with crowdsourced reinforcement.

Automatically Identifying PII leaks
• Hypothesis: PII leaks have distinguishing characteristics
  - Is it just simple key/value pairs (e.g., "user_id=563")?
    - Nope, leads to high FPR (5.1%) and high FNR (18.8%).
• Need to learn the structure of PII leaks.
• Approach: Build ML classifiers to reliably detect leaks.
  - Doesn't require knowing PII in advance
  - Resilient to changes in PII formats over time.

[Figure: ReCon architecture. Initial training: Flows, Features, Training, Model. Continuous training with user feedback: Flows, Model, Prediction, User Interface, User Feedback, Rewriter.]
Initial training flows
• Manual test: top 100 apps from each official store
• Automatic test: top 850 Android apps from a third-party store

Features
• Feature extraction: bag of words
• Use thresholds to remove infrequent or too frequent words
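A minimal sketch of bag-of-words extraction with frequency thresholds, assuming flows are plain-text request lines split on non-alphanumeric characters. The function names, the sample flows, and the threshold values (0.5 and 0.75) are illustrative assumptions, not the ones used by ReCon.

```python
import re
from collections import Counter

def tokenize(flow):
    """Split a flow's text on non-alphanumeric delimiters and drop empty tokens."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", flow) if tok]

def build_vocabulary(flows, min_fraction, max_fraction):
    """Keep words that occur in a fraction of flows within [min_fraction, max_fraction]."""
    doc_freq = Counter()
    for flow in flows:
        doc_freq.update(set(tokenize(flow)))  # count each word once per flow
    n = len(flows)
    return sorted(w for w, c in doc_freq.items()
                  if min_fraction <= c / n <= max_fraction)

def to_features(flow, vocabulary):
    """Binary vector: 1 if the vocabulary word appears in the flow, else 0."""
    present = set(tokenize(flow))
    return [1 if w in present else 0 for w in vocabulary]

# Made-up flows for illustration only.
flows = [
    "GET /ads?deviceid=abc123&os=android",
    "GET /ads?deviceid=def456&os=android",
    "GET /img/logo.png",
    "GET /track?email=user1&os=ios",
]
vocab = build_vocabulary(flows, min_fraction=0.5, max_fraction=0.75)
print(vocab)
print(to_features(flows[0], vocab))
```

Here "GET" is dropped as too frequent (it appears in every flow), while one-off values such as "abc123" are dropped as too infrequent.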

Training
• Ground truth from the controlled experiments
• C4.5 decision tree
• Per-domain and per-OS classifier
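C4.5 chooses the attribute to split on by gain ratio (information gain divided by split information). The sketch below computes that criterion for binary bag-of-words features; it is not a full C4.5 implementation, and the feature names and labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical training data: 1 = flow leaks PII, 0 = flow does not.
labels = [1, 1, 0, 0]
features = {
    "deviceid": [1, 1, 0, 0],  # word present exactly in the leaking flows
    "png":      [1, 0, 1, 0],  # word unrelated to the label
}
scores = {name: gain_ratio(col, labels) for name, col in features.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The word that best separates leaking from non-leaking flows gets the highest gain ratio and would become the root split of the tree.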

Evaluation – Accuracy (CCR)
• DT outperforms Naïve Bayes
• Time: DT-based ensembles take more time than a simple DT
• More than 95% accuracy for per-domain-and-per-OS classifiers
  - Greater than the general classifier
• 60% of DTs have zero error

Evaluation – Accuracy (AUC)
• Area under the curve (AUC), in [0, 1]
  - Demonstrates the predictive power of the classifier
• Most (67%) DT-based classifiers have AUC = 1
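One way to read AUC is as the probability that a randomly chosen leaking flow receives a higher classifier score than a randomly chosen non-leaking flow, with ties counted as half. A minimal sketch under that definition, with made-up scores and labels:

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
perfect = auc([0.9, 0.8, 0.3, 0.1], labels)    # every positive outranks every negative
imperfect = auc([0.9, 0.4, 0.6, 0.1], labels)  # one positive is ranked below a negative
print(perfect, imperfect)
```

An AUC of 1 means the classifier's scores separate the two classes perfectly; 0.5 is what random scoring would give.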

Evaluation – Accuracy (FNR and FPR)
• Most DT-based classifiers have zero FPs (71.4%) and zero FNs (76.2%)
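The accuracy metrics on these slides can be computed from a confusion matrix. The sketch below uses the standard definitions, which the slides do not spell out: CCR (correctly classified rate) as (TP + TN) / total, FPR as FP / (FP + TN), and FNR as FN / (FN + TP). The predictions and labels are made up.

```python
def rates(predicted, actual):
    """Return CCR, FPR, and FNR for binary predictions (1 = PII leak)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    total = tp + tn + fp + fn
    return {
        "ccr": (tp + tn) / total,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # one missed leak, one false alarm
result = rates(predicted, actual)
print(result)
```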

Evaluation – Comparison with IFA
• Information flow analysis (IFA)
  - Resilient to encrypted / obfuscated flows
    - Dynamic IFA: Andrubis
    - Static IFA: FlowDroid
    - Hybrid IFA: AppAudit
• Susceptible to false positives, but not false negatives

[Figure: ReCon vs. static and dynamic analysis. Bar chart (y-axis 0% to 120%) comparing FlowDroid (static IFA), Andrubis (dynamic IFA), AppAudit (hybrid IFA), and ReCon across Device Identifier, User Identifier, Contact Info, and Location.]

ReCon:
• The retraining phase is important
  - FP decreased by 92%
  - FN increased by 0.5%

ReCon in the wild
• 239 users in March 2016 (IRB approved)
• 137 iOS, 108 Android devices
• 14,101 PII leaks found, 6,747 confirmed by users
• 21 apps exposing passwords in plaintext
  - Used by millions (Match, Epocrates)
  - Responsibly disclosed

Discussion
• Challenges
  - Encrypted traffic (totally reliant on plaintext traffic)
  - 10-fold cross validation, does it help?
    - 2.2% FP and 3.5% FN, but what about overfitting?
    - Network flows too diverse, is the model generalizable?
  - Can miss PII leaks (FN) if the model is not trained for that class of PII.
  - Standard program analysis is susceptible to false positives, but not false negatives
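For reference, 10-fold cross validation splits the labeled flows into ten folds and tests on each fold once while training on the other nine. A minimal sketch of the index bookkeeping follows; the `k_fold_indices` helper and the sample count of 25 are illustrative assumptions, and real use would typically shuffle the samples first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(25, 10))
for train, test in folds:
    print(len(train), len(test))
```

Because every fold comes from the same collected flows, a low cross-validation error does not by itself show that the model generalizes to apps or domains absent from the dataset, which is the concern raised above.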

Discussion - continued
• Can we use this approach for IoT devices?
  - Device identification?
  - PII leakage?
  - Monitor if IoT devices "talk" to themselves?

Questions?