
Graphing Crumbling Cookies, AdKDD 2019: Matt Malloy, Jon Koller and Aaron Cahn



  1. Graphing Crumbling Cookies AdKDD 2019 Matt Malloy, Jon Koller and Aaron Cahn

  2. What is a device graph?
     • a dataset that organizes the digital identifiers we create as we use the internet
     • identifiers (IDs): browser cookies or advertising IDs
     • a graph is a set of vertices and edges
     • a list of pairs of identifiers that are in some way related:

       id_1    id_2    score
       3D0F8F  54D3A8  3.936
       7F3E10  6FFE0A  1.400
       8764CF  10AFC8  3.440
       501EE5  62A1F3  3.045
       1F39D3  4B2686  4.763
       638581  85B16   1.917

     • related: same person, same household
     • example: two digital IDs that log in with the same email (figure: two devices logged in as bobfano@gmail.com)
     • Why? Targeting, content customization and accurate measurement
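
The pair list on this slide can be read as a weighted, undirected edge list. The sketch below loads the six example rows into an adjacency map. The helper names `build_adjacency` and `neighbors_above` and the score cutoff are illustrative assumptions, not part of the presentation.

```python
from collections import defaultdict

# Rows copied from the slide's example table: (id_1, id_2, score).
EDGES = [
    ("3D0F8F", "54D3A8", 3.936),
    ("7F3E10", "6FFE0A", 1.400),
    ("8764CF", "10AFC8", 3.440),
    ("501EE5", "62A1F3", 3.045),
    ("1F39D3", "4B2686", 4.763),
    ("638581", "85B16", 1.917),
]


def build_adjacency(edges):
    """Turn a scored pair list into an undirected adjacency map."""
    adjacency = defaultdict(dict)
    for id_1, id_2, score in edges:
        adjacency[id_1][id_2] = score
        adjacency[id_2][id_1] = score
    return adjacency


def neighbors_above(adjacency, node, min_score):
    """IDs related to `node` with score >= min_score (hypothetical cutoff)."""
    return sorted(n for n, s in adjacency.get(node, {}).items() if s >= min_score)
```

For instance, `neighbors_above(build_adjacency(EDGES), "1F39D3", 2.0)` keeps only the pairs whose score clears the assumed cutoff.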

  3. Building a graph using IP-colocation
     • IP space is intimate
     • Your devices share an IP when connected to the same WiFi router
     • You share an IP with family, friends and co-workers
     • ideal world: static residential IPs (figure: households on IP 1, IP 2, ..., IP n)
     • problem: IPs are dynamic; mobile operator/corporate IPs; coffee shops
     • observation: even when the IP changes, devices travel through IP-space together over the course of weeks (figure: IP 1, IP 2, …)
     • basic idea: associate devices with each other, not with an IP

  4. Building a graph
     • day 1: iPhone is home with PC
     • day 2: iPhone is home alone
     • day 3: iPhone is at work with 8 devices
     • day 4: iPhone is at home with PC
     • score proportional to number of days two devices spend alone on an IP
     (figure: the iPhone moving across IP 1, IP 2 and IP 3, with labels ½, 1 and ⅛)

     Malloy, M., Barford, P., Alp, E. C., Koller, J., & Jewell, A. (2017, August). Internet Device Graphs. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1913-1921). ACM.
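
The scoring rule on this slide (score proportional to the number of days two devices spend alone on an IP) can be sketched as a simple count. This is a minimal illustration rather than the authors' production scoring: the observation format `(day, ip, device)`, the sample data, and the function name `colocation_scores` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical observations loosely following the slide: (day, ip, device).
OBSERVATIONS = [
    (1, "IP1", "iPhone"), (1, "IP1", "PC"),      # day 1: home with PC
    (2, "IP1", "iPhone"),                        # day 2: home alone
    (3, "IP2", "iPhone"),                        # day 3: at work ...
    *[(3, "IP2", f"work-{k}") for k in range(7)],  # ... with other devices
    (4, "IP1", "iPhone"), (4, "IP1", "PC"),      # day 4: home with PC
]


def colocation_scores(observations):
    """Count, per device pair, the days the pair was alone on an IP."""
    devices_on = defaultdict(set)
    for day, ip, device in observations:
        devices_on[(day, ip)].add(device)

    scores = defaultdict(int)
    for devices in devices_on.values():
        if len(devices) == 2:  # exactly two devices shared this IP that day
            scores[tuple(sorted(devices))] += 1
    return dict(scores)
```

With the sample data, only the iPhone and the PC are ever alone together (days 1 and 4), so the crowded work IP contributes nothing to any pair.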

  5. Comscore’s Device Graph
     Comparison: Benchmark Graphs*

       Graph                  Nodes   Edges
       LiveJournal            4.8M    69M
       Twitter                42M     1.5B
       UK web graph 2007      109M    3.7B
       Yahoo Web              1.4B    6.6B
       Facebook Graph 2016    1.39B   400B

     Comscore’s Device Graph (April 2019)
     • 12 countries
     • 3.4 Billion nodes (cookies/advertising IDs)
     • 17.1 Billion edges (relationships)

     *Adapted from: Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., & Muthukrishnan, S. (2015). One trillion edges: Graph processing at Facebook-scale. Proceedings of the VLDB Endowment, 8(12), 1804-1815.

  6. Community Detection
     (figure: finding community structure, with households HH 1 to HH 5)
     • goal: group identifiers into cohorts (person and household level groupings)
     • community detection in graphs is a well-studied problem
     • Literature/code for finding community structure (but not billions of nodes/edges)
     • Louvain Modularity*

     *Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
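
The Louvain method cited on this slide greedily searches for a partition that maximizes modularity. As a small illustration of that objective only (not of the Louvain search itself, and not of Comscore's pipeline), the sketch below scores a given partition of a weighted, undirected edge list. The function name `modularity` and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of a weighted, undirected graph.

    edges: iterable of (u, v, weight); community_of: node -> community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    edges = list(edges)
    m = sum(w for _, _, w in edges)
    if m == 0:
        return 0.0

    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w

    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles ("households") joined by a single edge.
TOY_EDGES = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
TWO_HOUSEHOLDS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ONE_BLOB = {node: 0 for node in "abcdef"}
```

Splitting the toy graph into its two triangles scores higher than lumping all six nodes together, which is the kind of preference a modularity-maximizing method exploits when grouping identifiers into cohorts.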

  7. Challenge: non-persistent IDs
     • 3.4 Billion persistent IDs (in 12 countries)
     • 5-10x more non-persistent IDs
     • excluded from graphing process
     • incognito/private browsing (session cookies)
     • ITP (Intelligent Tracking Prevention)
     • 20+ Billion IDs worldwide not amenable to graphing or community detection

  8. Backfilling
     Key Ideas:
     • Once cohorts of persistent IDs are defined, find the IP addresses that are associated with the cohort over time:

       $C_1 \rightarrow \{ (\mathrm{IP}_1, \mathrm{day}_1), (\mathrm{IP}_2, \mathrm{day}_2), \ldots \}$

     • Ruleset: if the persistent IDs defined by the IP address are synonymous with the group defined by the cohort, then assign non-persistent IDs to the cohort:

       if $\{ i : i \in (\mathrm{IP}_1, \mathrm{day}_1) \} \cap V_p \approx C_1$ then $C_1^{+} = \{ i : i \in (\mathrm{IP}_1, \mathrm{day}_1) \} \cup C_1$

     • Precision and recall are used to define approximate equality ($\approx$)
     • Results: assign additional 2+ Billion IDs to cohorts in the US
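
The backfilling ruleset can be sketched for a single (IP, day) observation as follows. This is a minimal illustration under stated assumptions: `V_p` is taken to be the set of persistent IDs, precision and recall are computed between the persistent IDs seen on the IP and the cohort, and the 0.9 thresholds and the function name `backfill` are invented for the example. The slide says only that precision and recall define the approximate equality.

```python
def backfill(cohort, ids_on_ip_day, persistent_ids,
             min_precision=0.9, min_recall=0.9):
    """Return the cohort, extended with the IDs seen on one (IP, day)
    when the persistent IDs seen there approximately equal the cohort.

    The thresholds are hypothetical; they stand in for the slide's
    unspecified definition of approximate equality.
    """
    observed_persistent = ids_on_ip_day & persistent_ids
    if not observed_persistent or not cohort:
        return set(cohort)

    overlap = observed_persistent & cohort
    precision = len(overlap) / len(observed_persistent)  # observed IDs that belong to the cohort
    recall = len(overlap) / len(cohort)                  # cohort IDs that were observed

    if precision >= min_precision and recall >= min_recall:
        return cohort | ids_on_ip_day  # C+ = IDs on (IP, day) united with C
    return set(cohort)
```

For example, with `cohort = {"p1", "p2"}` and persistent IDs `{"p1", "p2", "p3"}`, an observation `{"p1", "p2", "n1"}` matches the cohort exactly, so the non-persistent ID `n1` is added; an observation `{"p1", "p3", "n2"}` fails both thresholds and leaves the cohort unchanged.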

  9. Privacy
     • Internet is great. It’s funded by ads. (figure: respecting user privacy vs. efficiency: more relevant ads)
     • Current/future landscape
     • Increases in non-persistent identifiers and rejection of 3rd-party cookies
     • Safari, Firefox, likely more to come
     • Legislation - GDPR (Europe) and CCPA (California)
     • Favor large entities with login information (Google, Facebook, Apple)

  10. How to opt out
     • Reject 3rd-party cookies.
     • Turn off your advertising ID.

  11. Questions?
     Device Graph Publications
     • Graphing Crumbling Cookies, AdKDD (Malloy, Koller, Cahn)
     • Device Graphing by Example, KDD 2018 (Funkhouser, Malloy, Alp, Poon, Barford)
     • Internet Device Graphs, KDD 2017 (Malloy, Barford, Alp, Koller, Jewell)
