



Probabilistic Visitor Stitching on Cross-Device Web Logs

Sungchul Kim, Adobe Research, San Jose, CA 95110, sukim@adobe.com
Nikhil Kini, UC Santa Cruz, Santa Cruz, CA 95064, nkini@ucsc.edu
Jay Pujara, UC Santa Cruz, Santa Cruz, CA 95064, jay@cs.umd.edu
Eunyee Koh, Adobe Research, San Jose, CA 95110, eunyee@adobe.com
Lise Getoor, UC Santa Cruz, Santa Cruz, CA 95064, getoor@soe.ucsc.edu

ABSTRACT

Personalization – the customization of experiences, interfaces, and content to individual users – has catalyzed user growth and engagement for many web services. A critical prerequisite to personalization is establishing user identity. However the variety of devices, including mobile phones, appliances, and smart watches, from which users access web services from both anonymous and logged-in sessions poses a significant obstacle to user identification. The resulting entity resolution task of establishing user identity across devices and sessions is commonly referred to as “visitor stitching.” We introduce a general, probabilistic approach to visitor stitching using features and attributes commonly contained in web logs. Using web logs from two real-world corporate websites, we motivate the need for probabilistic models by quantifying the difficulties posed by noise, ambiguity, and missing information in deployment. Next, we introduce our approach using probabilistic soft logic (PSL), a statistical relational learning framework capable of capturing similarities across many sessions and enforcing transitivity. We present a detailed description of model features and design choices relevant to the visitor stitching problem. Finally, we evaluate our PSL model on binary classification performance for two real-world visitor stitching datasets. Our model demonstrates significantly better performance than several state-of-the-art classifiers, and we show how this advantage results from collective reasoning across sessions.

Keywords

Visitor stitching; Cross-device users; Personalization

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
WWW 2017, April 3–7, 2017, Perth, Australia.
ACM 978-1-4503-4913-0/17/04.
http://dx.doi.org/10.1145/3038912.3052711

1. INTRODUCTION

Ubiquitous computing has transformed the landscape of how society interacts with web services. A single user will often access web services from a wide range of devices, including desktop and laptop computers at both home and work, tablets, mobile devices, vehicles, and entertainment systems. Across these varied devices, users expect services to remember their preferences and provide a seamless user experience and interface. However, users frequently access these services from a mixture of authenticated and anonymous sessions, making it difficult to identify the user and provide a tailored experience. The problem of consolidating multiple visits across different devices and sessions into a single user identity is known as visitor stitching.

Traditionally, web services have relied on cookies to identify users. However, in two real-world datasets we examine, over half of the users have multiple cookie identifiers. This problem has been documented in a number of research studies. Dasgupta et al. [8] demonstrate that users often possess more than one cookie identifier and Coey et al. [6] showed that in an online experiment with treatment and control groups, cookie-level assignment resulted in imperfect design, and has the potential to under-estimate the true treatment effects. In fact, users may not only possess multiple cookie identifiers, but they may also have identifiers across multiple devices, browsers, or even share them between different users. For IT companies providing large-scale web services, stitching together web logs belonging to unique users across several sources is a crucial barrier to accurately estimating behaviors and statistics at the user level.

Typical approaches to solving the visitor stitching task rely on proprietary information specific to a particular domain, such as search behavior, purchase history, or topical and content information [5, 9, 8]. A related problem, identifying the same user across social networks [15, 30], has also been solved using proprietary information, features specific to social networks, and domain-specific problem formulations, such as bipartite matching. The success of these approaches demonstrates the promise of visitor stitching. However, the reliance on proprietary features and problem settings makes it difficult to generalize these contributions across a broader set of applications. In this paper, we enumerate features universally available in web logs, and perform an analysis of the discriminative power of these features using real-world data from two different companies.

One conclusion of our analysis is that web log features inherently vary widely in discriminative power. We propose a probabilistic approach that is capable of learning the reliability of web log features and combining these features to improve discriminative power. Our solution utilizes probabilistic soft logic (PSL) [1], a popular statistical relational learning framework, to construct a general-purpose model .
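As a rough illustration of what "enforcing transitivity" can mean in a PSL-style model, the sketch below scores a single soft rule of the form SameUser(A, B) AND SameUser(B, C) -> SameUser(A, C) using the Lukasiewicz relaxation that PSL is built on. It is not the authors' model: the predicate name SameUser, the function names, the default weight, and the truth values are all assumptions made for illustration.

```python
# Minimal sketch (assumed names, not the paper's implementation) of how a
# soft transitivity rule is penalised under the Lukasiewicz relaxation.
# Truth values are continuous in [0, 1] rather than Boolean.


def lukasiewicz_and(a: float, b: float) -> float:
    """Soft conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """Hinge penalty for the rule body -> head: max(0, body - head)."""
    return max(0.0, body - head)


def transitivity_penalty(same_ab: float, same_bc: float, same_ac: float,
                         weight: float = 1.0) -> float:
    """Weighted penalty for SameUser(A,B) AND SameUser(B,C) -> SameUser(A,C).

    `weight` stands in for a rule weight that would be learned from data;
    the default of 1.0 is a placeholder.
    """
    body = lukasiewicz_and(same_ab, same_bc)
    return weight * distance_to_satisfaction(body, same_ac)


if __name__ == "__main__":
    # Sessions A-B and B-C look alike, but A-C is scored as unlikely:
    # the rule is violated and incurs a penalty of roughly 0.5.
    print(transitivity_penalty(0.9, 0.8, 0.2))
    # Raising the A-C score to 0.9 satisfies the rule: no penalty.
    print(transitivity_penalty(0.9, 0.8, 0.9))
```

Inference in such a model would look for truth values that keep the total weighted penalty small across all rule groundings. That gives one way to see how evidence about one pair of sessions can influence the score of another pair.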
