Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments
Olivier Jeunen¹, Koen Verstrepen², Bart Goethals¹,²
September 18th, 2019
¹ Adrem Data Lab, University of Antwerp  ² Froomle
olivier.jeunen@uantwerp.be
Introduction & Motivation
Setting the scene
We deal with implicit feedback: a set of (user, item, timestamp)-triplets, representing clicks, views, sales, … Suppose we have a set of pageviews of this form.
[Figure: example stream of pageview triplets (u, i, t)]
Problem statement
In neighbourhood-based collaborative filtering¹, we need to compute similarity between pairs of items. Items are represented as sparse, high-dimensional columns in the user-item matrix P. Still a very competitive baseline, but often deemed unscalable.
[Figure: sparse binary user-item matrix P]
A need for speed
Typically, the model is periodically recomputed. For ever-growing datasets, these iterative updates can become very time-consuming, and model recency is often sacrificed.
[Figure: "Iterative model updates over time" — runtime of each recomputation grows as more data accumulates with every Δt]
Previous work
Existing approaches tend to speed up computations through:
• Approximation.
• Parallelisation.
• Incremental computation.
But existing exact solutions do not exploit the sparsity that is inherent to implicit-feedback data streams.
Contribution & Methodology
Incremental Similarity Computation
In the binary setting, cosine similarity simplifies to the number of users that have seen both items, divided by the square roots of their individual user counts:

cos(i, j) = |U_i ∩ U_j| / (√|U_i| · √|U_j|) = M_{i,j} / √(N_i · N_j),

with N ∈ ℕⁿ : N_i = |U_i| and M ∈ ℕ^{n×n} : M_{i,j} = |U_i ∩ U_j|.

As such, we can compute these building blocks incrementally instead of recomputing the entire similarity with every update.
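As a minimal sketch of how a similarity is read off these building blocks (assuming M and N are kept as plain Python dictionaries; the names are illustrative, not the paper's code):

```python
import math

# N[i]    : number of users that have seen item i          (N_i = |U_i|)
# M[i][j] : number of users that have seen both i and j    (M_{i,j} = |U_i ∩ U_j|)
def cosine(i, j, N, M):
    """Binary cosine similarity reconstructed from the counters N and M."""
    co = M.get(i, {}).get(j, 0)
    return co / math.sqrt(N[i] * N[j]) if co else 0.0
```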
Dynamic Index
Existing approaches tend to build inverted indices in a preprocessing step… we do this on-the-fly!
Initialise a simple inverted index for every user, to hold their histories. For every pageview (u, i):
1. Increment the item co-occurrence counts for i and the other items seen by u.
2. Update the item's count.
3. Add the item to the user's inverted index.
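A minimal Python sketch of this per-pageview update, assuming the model is a triple of dictionaries (co-occurrences M, item counts N, per-user inverted index L); skipping repeat views is my own assumption to keep all counts per distinct user:

```python
from collections import defaultdict

def make_model():
    return {
        "M": defaultdict(lambda: defaultdict(int)),  # co-occurrence counts M[i][j]
        "N": defaultdict(int),                       # item counts N[i] = |U_i|
        "L": defaultdict(set),                       # inverted index: user -> items seen
    }

def process_pageview(model, u, i):
    """Apply the three update steps for a single pageview (u, i)."""
    M, N, L = model["M"], model["N"], model["L"]
    if i in L[u]:
        return  # repeat view: counts are per distinct user, nothing changes (assumption)
    # 1. Increment item co-occurrence for i and the other items seen by u.
    for j in L[u]:
        M[i][j] += 1
        M[j][i] += 1
    # 2. Update the item's count.
    N[i] += 1
    # 3. Add the item to the user's inverted index.
    L[u].add(i)
```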
Online Learning
As Dynamic Index consists of a single for-loop over the pageviews, it can naturally handle streaming data.
[Figure: "Impact of Online Learning" — runtime (s) as a function of |P|; each update at time t_i only processes the pageviews ΔP that arrived during Δt]
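In a streaming setting the same loop simply runs over whatever new pageviews ΔP arrived since the last update, so the cost of an update grows with |ΔP| rather than with the full |P|. A toy usage of the sketch above (the batch contents are made up):

```python
# Two batches of (user, item) pageviews, one per interval Δt.
batches = [
    [("u1", "i1"), ("u1", "i2"), ("u2", "i2")],
    [("u2", "i1"), ("u3", "i2")],
]

model = make_model()                   # from the previous sketch
for batch in batches:                  # each batch is one ΔP
    for u, i in batch:
        process_pageview(model, u, i)  # work proportional to |ΔP|, not |P|
```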
Parallelisation Procedure
We adopt a MapReduce-like parallelisation framework:
• Mapping is the Dynamic Index algorithm.
• Reducing two models M = {M, N, L} and M′ = {M′, N′, L′} is:
  1. Summing up M, M′ and N, N′.
  2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L′[u].
Step 2 is unnecessary if M and M′ are computed on disjoint sets of users!
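A possible rendering of the reduce step under the same dictionary layout (my reading of the slide, not the authors' implementation; it assumes each distinct (user, item) pair is counted in at most one partial model, so summed counts are never double-counted):

```python
def reduce_models(a, b):
    """Merge partial model b into partial model a; both are {M, N, L} dictionaries."""
    M, N, L = a["M"], a["N"], a["L"]
    # 1. Sum up the co-occurrence matrices and the item counts.
    for i, row in b["M"].items():
        for j, c in row.items():
            M[i][j] += c
    for i, c in b["N"].items():
        N[i] += c
    # 2. Cross-reference (u, i)-pairs from a's index with (u, j)-pairs from b's index:
    #    such pairs co-occur for user u but were never seen together by either worker.
    #    This step is unnecessary when the partitions hold disjoint sets of users.
    for u, items_b in b["L"].items():
        for i in L[u]:
            for j in items_b:
                M[i][j] += 1
                M[j][i] += 1
        L[u] |= items_b
    return a
```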
Recommendability
Often, the set of items that should be considered as recommendations is constrained by recency, stock, licenses, seasonality, …
We denote R_t as the set of recommendable items at time t, and argue that it is often much smaller than the full item collection: |R_t| ≪ |I|.
As such, we only need an up-to-date similarity sim(i, j) if either i or j is recommendable: i ∈ R_t ∨ j ∈ R_t.
To keep up-to-date with recommendability updates: add a second inverted index for every user.
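One way to realise this bookkeeping (my own sketch under the dictionary layout used above; handling items that only become recommendable later is left out): keep a second per-user inverted index "L_rec" restricted to recommendable items, and skip co-occurrence updates for pairs in which neither item is in R_t.

```python
def process_pageview_constrained(model, u, i, recommendable):
    """Per-pageview update that only maintains sim(i, j) where i ∈ R_t or j ∈ R_t.

    `recommendable` is the current set R_t; `model` holds M, N, L as before,
    plus "L_rec", a second per-user inverted index of recommendable items.
    """
    M, N, L, L_rec = model["M"], model["N"], model["L"], model["L_rec"]
    if i in L[u]:
        return
    if i in recommendable:
        # i ∈ R_t: every pair (i, j) with j in u's history may be needed.
        for j in L[u]:
            M[i][j] += 1
            M[j][i] += 1
        L_rec[u].add(i)
    else:
        # i ∉ R_t: only pairs with a recommendable j are ever needed.
        for j in L_rec[u]:
            M[i][j] += 1
            M[j][i] += 1
    N[i] += 1
    L[u].add(i)
```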
Experimental Results
Datasets

Table 1: Experimental dataset characteristics.

                              Movielens*   Netflix*      News    Outbrain
  # "events"                       20e6      100e6       96e6       200e6
  # users                         138e3      480e3        5e6       113e6
  # items                          27e3       18e3      297e3         1e6
  mean items per user            144.41     209.25      18.29        1.76
  mean users per item            747.84    5654.50     242.51      184.50
  sparsity user-item matrix      99.46%     98.82%     99.99%      99.99%
  sparsity item-item matrix      59.90%      0.22%     99.83%      99.98%
RQ1: Are we more efficient than the baselines?
[Figure: runtime (s) as a function of |P| on Movielens, Netflix, News and Outbrain — Sparse Baseline vs. Dynamic Index]
RQ1: Are we more efficient than the baselines?
Observations
• More efficient if M is sparse.
• More efficient if users have shorter histories.
• Average number of processed interactions per second ranges from 14 500 to 834 000.
RQ2: How effective is parallelisation?
[Figure: runtime (s) as a function of |P| on Movielens, Netflix, News and Outbrain, for n = 1, 2, 4 and 8 cores]
RQ2: How effective is parallelisation?
Observations
• Incremental updates complicate the reduce procedure:
  • For sufficiently large batches, performance gains are tangible.
  • For small batches, single-core updates are preferred.
• Speedup factor of > 4 for the Netflix and News datasets with 8 cores.
RQ3: What is the effect of constrained recommendability?
[Figure: News dataset (n = 8) — runtime (s) and |R_t| over time (h), for recommendability windows δ = 6h, 12h, 18h, 24h, 48h, 96h, 168h and ∞]
RQ3: What is the effect of constrained recommendability?
Observations
• Clear efficiency gains for lower values of δ:
  • 48h only needs < 10% of the runtime needed without restrictions.
  • 24h: < 5%.
  • 6h: 1.6%.
• The slope of increasing runtime with more data is flattened, improving scalability.
Conclusion & Future Work
Conclusion
We introduce Dynamic Index, which:
• is faster than the state of the art in exact similarity computation for sparse and high-dimensional data.
• computes incrementally by design.
• is easily parallelisable.
• naturally handles and exploits recommendability of items.
Questions?
Source code is available.
Academics hire too! PhD students + post-docs.
Future Work
• More advanced similarity measures:
  • Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation, … are all dependent on the co-occurrence matrix M.
• Beyond item-item collaborative filtering:
  • With relatively straightforward extensions (e.g. including a value in the inverted indices to allow for non-binary data), we can tackle more general Information Retrieval use-cases.
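To illustrate why such measures come for free, a sketch of the Jaccard index and PMI computed from the same counters (standard textbook definitions, not taken from the paper; `total_users` is an extra counter I assume is tracked alongside N and M):

```python
import math

def jaccard(i, j, N, M):
    """|U_i ∩ U_j| / |U_i ∪ U_j|, from the incremental counters only."""
    co = M.get(i, {}).get(j, 0)
    union = N[i] + N[j] - co
    return co / union if union else 0.0

def pmi(i, j, N, M, total_users):
    """Pointwise mutual information: log( P(i, j) / (P(i) P(j)) )."""
    co = M.get(i, {}).get(j, 0)
    if co == 0:
        return float("-inf")
    return math.log((co * total_users) / (N[i] * N[j]))
```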