Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments
Olivier Jeunen¹, Koen Verstrepen², Bart Goethals¹,²
September 18th, 2019
¹ Adrem Data Lab, University of Antwerp  ² Froomle
olivier.jeunen@uantwerp.be
Introduction & Motivation
Setting the scene
We deal with implicit feedback: a set of (user, item, timestamp)-triplets, representing clicks, views, sales, … Suppose we have a set of pageviews of this form.
[Figure: example stream of pageview triplets (u, i, t)]
Problem statement
In neighbourhood-based collaborative filtering¹, we need to compute similarity between pairs of items. Items are represented as sparse, high-dimensional columns in the user-item matrix P. Still a very competitive baseline, but often deemed unscalable.
[Figure: sparse binary user-item matrix P]
A need for speed
Typically, the model is periodically recomputed. For ever-growing datasets, these iterative updates can become very time-consuming, and model recency is often sacrificed.
[Figure: "Iterative model updates over time" — runtime of each recomputation grows as more data accumulates with every Δt]
Previous work
Existing approaches tend to speed up computations through:
• Approximation.
• Parallelisation.
• Incremental computation.
But existing exact solutions do not exploit the sparsity that is inherent to implicit-feedback data streams.
Contribution & Methodology
Incremental Similarity Computation
In the binary setting, cosine similarity simplifies to the number of users that have seen both items, divided by the square roots of their individual user counts:

cos(i, j) = |U_i ∩ U_j| / (√|U_i| · √|U_j|) = M_{i,j} / √(N_i · N_j),

with N ∈ ℕⁿ : N_i = |U_i| and M ∈ ℕ^{n×n} : M_{i,j} = |U_i ∩ U_j|.

As such, we can compute these building blocks incrementally instead of recomputing the entire similarity with every update.
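As a minimal sketch of how a similarity is read off these building blocks (assuming M and N are kept as plain Python dictionaries; the names are illustrative, not the paper's code):

```python
import math

# N[i]    : number of users that have seen item i          (N_i = |U_i|)
# M[i][j] : number of users that have seen both i and j    (M_{i,j} = |U_i ∩ U_j|)
def cosine(i, j, N, M):
    """Binary cosine similarity reconstructed from the counters N and M."""
    co = M.get(i, {}).get(j, 0)
    return co / math.sqrt(N[i] * N[j]) if co else 0.0
```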
Dynamic Index
Existing approaches tend to build inverted indices in a preprocessing step… we do this on-the-fly!
Initialise a simple inverted index for every user, to hold their histories. For every pageview (u, i):
1. Increment the item co-occurrence counts for i and the other items seen by u.
2. Update the item's count.
3. Add the item to the user's inverted index.
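A minimal Python sketch of this per-pageview update, assuming the model is a triple of dictionaries (co-occurrences M, item counts N, per-user inverted index L); skipping repeat views is my own assumption to keep all counts per distinct user:

```python
from collections import defaultdict

def make_model():
    return {
        "M": defaultdict(lambda: defaultdict(int)),  # co-occurrence counts M[i][j]
        "N": defaultdict(int),                       # item counts N[i] = |U_i|
        "L": defaultdict(set),                       # inverted index: user -> items seen
    }

def process_pageview(model, u, i):
    """Apply the three update steps for a single pageview (u, i)."""
    M, N, L = model["M"], model["N"], model["L"]
    if i in L[u]:
        return  # repeat view: counts are per distinct user, nothing changes (assumption)
    # 1. Increment item co-occurrence for i and the other items seen by u.
    for j in L[u]:
        M[i][j] += 1
        M[j][i] += 1
    # 2. Update the item's count.
    N[i] += 1
    # 3. Add the item to the user's inverted index.
    L[u].add(i)
```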
Online Learning
As Dynamic Index consists of a single for-loop over the pageviews, it can naturally handle streaming data.
[Figure: "Impact of Online Learning" — runtime (s) as a function of |P|; each update at time t_i only processes the pageviews ΔP that arrived during Δt]
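In a streaming setting the same loop simply runs over whatever new pageviews ΔP arrived since the last update, so the cost of an update grows with |ΔP| rather than with the full |P|. A toy usage of the sketch above (the batch contents are made up):

```python
# Two batches of (user, item) pageviews, one per interval Δt.
batches = [
    [("u1", "i1"), ("u1", "i2"), ("u2", "i2")],
    [("u2", "i1"), ("u3", "i2")],
]

model = make_model()                   # from the previous sketch
for batch in batches:                  # each batch is one ΔP
    for u, i in batch:
        process_pageview(model, u, i)  # work proportional to |ΔP|, not |P|
```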
Parallelisation Procedure
We adopt a MapReduce-like parallelisation framework:
• Mapping is the Dynamic Index algorithm.
• Reducing two models M = {M, N, L} and M′ = {M′, N′, L′} is:
  1. Summing up M, M′ and N, N′.
  2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L′[u].
Step 2 is unnecessary if M and M′ are computed on disjoint sets of users!
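A possible rendering of the reduce step under the same dictionary layout (my reading of the slide, not the authors' implementation; it assumes each distinct (user, item) pair is counted in at most one partial model, so summed counts are never double-counted):

```python
def reduce_models(a, b):
    """Merge partial model b into partial model a; both are {M, N, L} dictionaries."""
    M, N, L = a["M"], a["N"], a["L"]
    # 1. Sum up the co-occurrence matrices and the item counts.
    for i, row in b["M"].items():
        for j, c in row.items():
            M[i][j] += c
    for i, c in b["N"].items():
        N[i] += c
    # 2. Cross-reference (u, i)-pairs from a's index with (u, j)-pairs from b's index:
    #    such pairs co-occur for user u but were never seen together by either worker.
    #    This step is unnecessary when the partitions hold disjoint sets of users.
    for u, items_b in b["L"].items():
        for i in L[u]:
            for j in items_b:
                M[i][j] += 1
                M[j][i] += 1
        L[u] |= items_b
    return a
```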
Recommendability
Often, the set of items that should be considered as recommendations is constrained by recency, stock, licenses, seasonality, …
We denote R_t as the set of recommendable items at time t, and argue that it is often much smaller than the full item collection: |R_t| ≪ |I|.
As such, we only need an up-to-date similarity sim(i, j) if either i or j is recommendable: i ∈ R_t ∨ j ∈ R_t.
To keep up-to-date with recommendability updates: add a second inverted index for every user.
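One way to realise this bookkeeping (my own sketch under the dictionary layout used above; handling items that only become recommendable later is left out): keep a second per-user inverted index "L_rec" restricted to recommendable items, and skip co-occurrence updates for pairs in which neither item is in R_t.

```python
def process_pageview_constrained(model, u, i, recommendable):
    """Per-pageview update that only maintains sim(i, j) where i ∈ R_t or j ∈ R_t.

    `recommendable` is the current set R_t; `model` holds M, N, L as before,
    plus "L_rec", a second per-user inverted index of recommendable items.
    """
    M, N, L, L_rec = model["M"], model["N"], model["L"], model["L_rec"]
    if i in L[u]:
        return
    if i in recommendable:
        # i ∈ R_t: every pair (i, j) with j in u's history may be needed.
        for j in L[u]:
            M[i][j] += 1
            M[j][i] += 1
        L_rec[u].add(i)
    else:
        # i ∉ R_t: only pairs with a recommendable j are ever needed.
        for j in L_rec[u]:
            M[i][j] += 1
            M[j][i] += 1
    N[i] += 1
    L[u].add(i)
```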
Experimental Results
Datasets

Table 1: Experimental dataset characteristics.

                              Movielens*   Netflix*      News    Outbrain
  # "events"                       20e6      100e6       96e6       200e6
  # users                         138e3      480e3        5e6       113e6
  # items                          27e3       18e3      297e3         1e6
  mean items per user            144.41     209.25      18.29        1.76
  mean users per item            747.84    5654.50     242.51      184.50
  sparsity user-item matrix      99.46%     98.82%     99.99%      99.99%
  sparsity item-item matrix      59.90%      0.22%     99.83%      99.98%
RQ1: Are we more efficient than the baselines?
[Figure: runtime (s) as a function of |P| on Movielens, Netflix, News and Outbrain — Sparse Baseline vs. Dynamic Index]
RQ1: Are we more efficient than the baselines?
Observations
• More efficient if M is sparse.
• More efficient if users have shorter histories.
• Average number of processed interactions per second ranges from 14 500 to 834 000.
RQ2: How effective is parallelisation?
[Figure: runtime (s) as a function of |P| on Movielens, Netflix, News and Outbrain, for n = 1, 2, 4 and 8 cores]
RQ2: How effective is parallelisation?
Observations
• Incremental updates complicate the reduce procedure:
  • For sufficiently large batches, performance gains are tangible.
  • For small batches, single-core updates are preferred.
• Speedup factor of > 4 for the Netflix and News datasets with 8 cores.
RQ3: What is the effect of constrained recommendability?
[Figure: News dataset (n = 8) — runtime (s) and |R_t| over time (h), for recommendability windows δ = 6h, 12h, 18h, 24h, 48h, 96h, 168h and ∞]
RQ3: What is the effect of constrained recommendability?
Observations
• Clear efficiency gains for lower values of δ:
  • 48h only needs < 10% of the runtime needed without restrictions.
  • 24h: < 5%.
  • 6h: 1.6%.
• The slope of increasing runtime with more data is flattened, improving scalability.
Conclusion & Future Work
Conclusion
We introduce Dynamic Index, which:
• is faster than the state of the art in exact similarity computation for sparse and high-dimensional data.
• computes incrementally by design.
• is easily parallelisable.
• naturally handles and exploits recommendability of items.
Questions?
Source code is available.
Academics hire too! PhD students + post-docs.
Future Work
• More advanced similarity measures:
  • Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation, … are all dependent on the co-occurrence matrix M.
• Beyond item-item collaborative filtering:
  • With relatively straightforward extensions (e.g. including a value in the inverted indices to allow for non-binary data), we can tackle more general Information Retrieval use-cases.
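To illustrate why such measures come for free, a sketch of the Jaccard index and PMI computed from the same counters (standard textbook definitions, not taken from the paper; `total_users` is an extra counter I assume is tracked alongside N and M):

```python
import math

def jaccard(i, j, N, M):
    """|U_i ∩ U_j| / |U_i ∪ U_j|, from the incremental counters only."""
    co = M.get(i, {}).get(j, 0)
    union = N[i] + N[j] - co
    return co / union if union else 0.0

def pmi(i, j, N, M, total_users):
    """Pointwise mutual information: log( P(i, j) / (P(i) P(j)) )."""
    co = M.get(i, {}).get(j, 0)
    if co == 0:
        return float("-inf")
    return math.log((co * total_users) / (N[i] * N[j]))
```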