
Inferring the Purposes of Network Traffic in Mobile Apps



  1. Inferring the Purposes of Network Traffic in Mobile Apps. Who (which app) sends the data? Where is the data being sent? What data is being collected? Why is the data being collected? Haojian Jin, Minyi Liu, Yuanchun Li, Gaurav Srivastava, Matthew Fredrikson, Yuvraj Agarwal, Jason Hong

  2. Who requests what data, and why? Example 1: who: Camera app; what: location; why: to tag photos. Example 2: who: Uber; what: location; why: to locate the pickup location.

  3. These descriptions are only shown at the user interface layer and can be arbitrary text. There is no way to verify them, and they are not yet widely adopted.

  4. APIs: user interface layer, arbitrary text, no way to verify. Network perspective: when an app calls an API and posts data to remote servers over the network.

  5. Can we index the privacy attributes (who, where, what, why) of each network request, much as the permission dialog does for APIs?

  6. Example request: https://maps.google.com. Who (which app) sends the data? Uber. Where is the data being sent? Google. What data is being collected? Location. Why is the data being collected? Map/navigation.

  7. Towards a public, large-scale privacy database to improve the transparency of mobile data collection.

  8. Related work

  9. State of the art [1, 2]. Example request: https://maps.google.com. Who (which app) sends the data? Uber. Where is the data being sent? Google. What data is being collected? Location. Why is the data being collected? Map/navigation. [1] Who Knows What About Me? A Survey of Behind the Scenes Personal Data Sharing to Third Parties by Mobile Apps. [2] ReCon: Revealing and Controlling PII Leaks in Mobile Network Traffic.

  10. Related work: Who Knows What About Me? (Zang et al.) https://techscience.org/a/2015103001/

  11. Related work: Who Knows What About Me? https://techscience.org/a/2015103001/ Method: recruit ~10 participants; install a VPN on their phones; test 10~100 apps; report raw data types (e.g., email address).

  12. State of the art. Example request: https://maps.google.com. Who (which app) sends the data? Uber. Where is the data being sent? Google. What data is being collected? Location. Why is the data being collected? Map/navigation (less explored).

  13. Related work: Expectation and Purpose [Ubicomp'12]; Exposing the Data Sharing Practices of Smartphone Apps [CHI'17].

  14. Related work: Expectation and Purpose [Ubicomp'12]; Exposing the Data Sharing Practices of Smartphone Apps [CHI'17]. Purposes are manually annotated by researchers.

  15. MobiPurpose is a scalable in-lab solution that can index fine-grained privacy attributes (who, where, what, why) of outgoing network requests.

  16. Three modules: (1) scalable network tracing; (2) data types & purposes taxonomy; (3) automated inference.

  17. Module 1: Network tracing. Large-scale network requests at a low cost.

  18. Hardware & software setup (1): downloaded 185,173 apps.

  19. Hardware & software setup (2): installed 30,075 apps (fewer than downloaded, due to OS compatibility, etc.).

  20. Hardware & software setup (3): a man-in-the-middle VPN proxy app; 3 minutes of UI automation for each app; running for 50 days. We open-source the tools at: http://bit.ly/mobipurpose

  21. Raw traffic data. Traffic request snapshot: source app: com.inkcreature.predatorfree; connect to host: inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, ....

  22. Raw traffic data. Traffic request snapshot: source app (who?): com.inkcreature.predatorfree; connect to host (where?): inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, ....
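
One way to hold such a snapshot in code is a small record type. The sketch below is only an illustration: the class name `TrafficRequest` and its field names are assumptions made here, not part of the released tools; the values are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class TrafficRequest:
    """One outgoing request, with the fields shown in the snapshot (hypothetical type)."""
    source_app: str   # who: package name of the app that sent the request
    host: str         # where: host the request connects to
    path: str         # server path
    body: dict = field(default_factory=dict)  # key-value pairs in the request body


# The example request from the slide; values are kept as strings, as captured.
snapshot = TrafficRequest(
    source_app="com.inkcreature.predatorfree",
    host="inkcreature.com",
    path="/_predatorServer/",
    body={"myLat": "40.4435877", "myLon": "-79.9452883"},
)

print(snapshot.source_app, "->", snapshot.host + snapshot.path)
```

Keeping the body as a plain dictionary leaves the key names (`myLat`, `myLon`) available for later analysis of what data the request carries.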

  23. Traffic data stats: 2,008,912 unique traffic requests from 14,910 apps, contacting 12,046 unique domains and 302,893 unique URLs. Traffic request snapshot: source app: com.inkcreature.predatorfree; connect to host: inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, .... We publish the dataset at: http://bit.ly/purposedata

  24. Module 2: Taxonomy. Define and categorize purposes.

  25. "Usage strings" in iOS/Android: arbitrary texts are hard to aggregate, analyze, and verify.

  26. Observations: • Many apps collect users' data for similar purposes. • The purposes are enumerable: 10-50, depending on the granularity.

  27. Two options: generate text describing the purpose, or build a taxonomy and classify the purpose.

  28. Design the taxonomy: (1) Comprehensive and extendable: covers the majority of use cases. (2) Meaningful granularity: neither too narrow nor too broad. (3) Understandable: minimal explanation for developers and users.

  29. 10 CS graduate students categorizing 1000+ network requests and 300+ permission usages; 3 independent sessions; affinity diagram.

  30. Purpose granularity. Purpose at app level: why a user downloads the app (e.g., app categories - Game). Purpose at network level: why an app sends the request (e.g., library categories - Ad).

  31. Purpose granularity. Purpose at app level: why a user downloads the app (e.g., app categories - Game). Purpose at network level: why an app sends the request (e.g., library categories - Ad). Purpose at data level: why a developer collects the data (e.g., nearby search).

  32. Purpose granularity. Purpose at app level: why a user downloads the app (e.g., app categories - Game). Purpose at network level: why an app sends the request (e.g., library categories - Ad). Purpose at data level: why a developer collects the data (e.g., usage descriptions); this level contains the most privacy details and is consistent with usage strings.

  33. Taxonomy typology. Data types: location.

  34. Data types: location. Data purposes (examples): nearby search.

  35. Data types: location. Data purposes (examples): nearby search, location-based customization.

  36. Data types: location. Data purposes (examples): nearby search, location-based customization, ad, analytics, ……

  37. Data purposes for location data. See the complete taxonomy at: http://bit.ly/mobitaxonomy

  38. Data types

  39. Extensibility: Bluetooth

  40. Module 3: Automated inference

  41. Input: traffic request snapshot: source app: com.inkcreature.predatorfree; connect to host: inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, .... Output: What data is being collected? Why is the data being collected?

  42. Intuitions: (1) Self-explainable patterns. userAdvertisingId: 901e3310-3a26-487e-83c7-2fa26ac2786c (key: advertising, Id; value: machine-generated UUID). http://reports.crashlytics.com (report, crash, analytics).
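
The self-explainable patterns above can be approximated with a few lines of string handling. This is a minimal sketch, not the authors' implementation: the helper names `split_identifier`, `looks_like_uuid`, and `host_labels` are invented here for illustration, and the sketch only splits on camelCase, underscores, and dots. Recovering "crash" and "analytics" from "crashlytics" would need a word-segmentation step that is not shown.

```python
import re
import uuid
from urllib.parse import urlparse


def split_identifier(name: str) -> list:
    """Split a camelCase or snake_case key name into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def looks_like_uuid(value: str) -> bool:
    """Return True if the value parses as a UUID (e.g., a machine-generated ID)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def host_labels(url: str) -> list:
    """Return the dot-separated labels of the URL's host name."""
    host = urlparse(url).hostname or ""
    return host.split(".") if host else []


print(split_identifier("userAdvertisingId"))
print(looks_like_uuid("901e3310-3a26-487e-83c7-2fa26ac2786c"))
print(host_labels("http://reports.crashlytics.com"))
```

The words from the key name and the shape of the value are two separate signals: the key suggests an advertising identifier, and the UUID check suggests the value is machine generated rather than typed by a user.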

  43. Intuitions: (1) Self-explainable patterns. (2) External knowledge (app type, server domain): a game app sends location data to http://admob.com, a mobile ad company.

  44. Data type inference: a bootstrapping method to predict the data type.

  45. Taxonomy lookup: look up the taxonomy to get the purpose candidates. Purpose candidates: search nearby; location-based customization; transportation information; recording; map/navigation; geosocial networking; geotagging; location spoofing; alert and remind; location-based game; reverse geocoding; advertising; analytics.
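
The lookup step can be pictured as a dictionary from data type to purpose candidates. In the sketch below the candidate list for location is copied from the slide; the dictionary layout and the function name `purpose_candidates` are assumptions for illustration only, and other data types from the full taxonomy are not filled in.

```python
# Purpose candidates for location data, as listed on the slide.
# The structure (a dict keyed by data type) is an illustrative assumption.
TAXONOMY = {
    "location": [
        "search nearby",
        "location-based customization",
        "transportation information",
        "recording",
        "map/navigation",
        "geosocial networking",
        "geotagging",
        "location spoofing",
        "alert and remind",
        "location-based game",
        "reverse geocoding",
        "advertising",
        "analytics",
    ],
}


def purpose_candidates(data_type: str) -> list:
    """Return the purpose candidates for a data type, or an empty list if unknown."""
    return list(TAXONOMY.get(data_type, []))


for purpose in purpose_candidates("location"):
    print(purpose)
```

Restricting the classifier to the candidates of the inferred data type keeps the prediction problem small: a location request is only ever matched against location purposes.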

  46. Purpose features. Traffic request snapshot: source app: com.inkcreature.predatorfree; connect to host: inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, .... Source app feature: predator is an offender registry search app.

  47. Purpose features. Traffic request snapshot: source app: com.inkcreature.predatorfree; connect to host: inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, .... Source app feature: predator is an offender registry search app. Textual feature: the app sends data to its own server.

  48. Purpose features. Traffic request snapshot: source app: com.inkcreature.predatorfree; connect to host: inkcreature.com; server path: /_predatorServer/; key-value pairs in request body: myLat: 40.4435877, myLon: -79.9452883, .... Source app feature: predator is an offender registry search app. Textual feature: the app sends data to its own server. Domain feature: company business type (Crunchbase); decompile app files to mine the domain references.

  49. Supervised machine learning. Features: source app feature: predator is an offender registry search app; textual feature: the app sends data to its own server; domain feature: company business type from Crunchbase, and decompiling large-scale app files to mine the domain references. Output (probability per purpose candidate): 0.72 search nearby; 0.2 location-based customization; 0.03 transportation information; 0.02 recording; 0.02 map/navigation; 0.01 geosocial networking; 0 geotagging; 0 location spoofing; 0 alert and remind; 0 location-based game; 0 reverse geocoding; 0 advertising; 0 analytics.
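
Reading the classifier output amounts to ranking the candidates by probability. The sketch below uses the probabilities shown on the slide; the function name `rank_purposes` and the choice to report the top three are assumptions for illustration, and the classifier that produces the scores is not shown.

```python
# Probabilities per purpose candidate, copied from the slide.
scores = {
    "search nearby": 0.72,
    "location-based customization": 0.2,
    "transportation information": 0.03,
    "recording": 0.02,
    "map/navigation": 0.02,
    "geosocial networking": 0.01,
    "geotagging": 0.0,
    "location spoofing": 0.0,
    "alert and remind": 0.0,
    "location-based game": 0.0,
    "reverse geocoding": 0.0,
    "advertising": 0.0,
    "analytics": 0.0,
}


def rank_purposes(probabilities: dict, top_k: int = 3) -> list:
    """Return the top_k (purpose, probability) pairs, highest probability first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


for purpose, probability in rank_purposes(scores):
    print(f"{probability:.2f}  {purpose}")
```

For the example request, the highest-ranked purpose is "search nearby", which matches the app's description as a registry search app that sends coordinates to its own server.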

  50. Evaluation: accuracy & recall

  51. Data set labeling: labeling "what" & "why" in each traffic request. Each request has been labeled by three people.
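
With three labels per request, one common way to derive a single label is a majority vote. The slide does not say how disagreements were resolved, so the rule below is an assumption, and the function name `majority_label` is invented for this sketch.

```python
from collections import Counter


def majority_label(labels: list):
    """Return the label chosen by more than half of the annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Two of three annotators agree, so their label wins.
print(majority_label(["search nearby", "search nearby", "advertising"]))

# No label has a majority; such a request would need further review.
print(majority_label(["search nearby", "advertising", "analytics"]))
```

Returning `None` when no label has a majority keeps ambiguous requests visible instead of silently picking one annotator's answer.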
