k-Reciprocal Nearest Neighbors Algorithm for One-Class Collaborative Filtering

Wei Cai^{a,b,c}, Weike Pan^{a,b,c,*}, Jixiong Liu^{a,b,c}, Zixiang Chen^{a,b,c}, Zhong Ming^{a,b,c,*}
{caiwei2016, liujixiong, chenzixiang2016}@email.szu.edu.cn, {panweike, mingz}@szu.edu.cn

^a National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
^b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
^c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Introduction: Problem Definition

One-Class Collaborative Filtering (OCCF)
- Input: $n$ users, $m$ items, and their associated one-class positive feedback in the form of (user, item) pairs.
- Goal: Learn users' preferences and generate a top-$N$ personalized ranked list of items from $\mathcal{I} \setminus \mathcal{I}_u$ for each user $u$.
Introduction: Motivation

1. As a commonly used neighborhood structure, the $k$-nearest neighborhood is usually asymmetric (i.e., a certain user may belong to the neighborhood of another user while the inverse is not necessarily true), so some neighbors of a certain user may contribute little to the final recommendation.
2. With some commonly used similarity measurements such as the Jaccard index and cosine similarity, active users or popular items tend to be included in the neighborhood easily. This may not conform to the real situation and thus degrades the recommendation performance.
Introduction: Our Contributions

1. We identify the asymmetry issue of the neighborhood constructed by a typical traditional neighborhood-based method.
2. We exploit the reciprocal neighborhood and construct a better neighborhood structure.
3. We design a novel recommendation algorithm called the $k$-reciprocal nearest neighbors algorithm ($k$-RNN).
4. We conduct extensive empirical studies on two large public datasets to show the effectiveness of our $k$-RNN.
Introduction: Notations (1/2)

Table: Some notations and their explanations.

Notation                                                        Explanation
$n$                                                             the number of users
$m$                                                             the number of items
$u, u', w \in \{1, 2, \ldots, n\}$                              user ID
$j \in \{1, 2, \ldots, m\}$                                     item ID
$\hat{r}_{uj}$                                                  predicted preference of user $u$ to item $j$
$\mathcal{U}$                                                   the whole set of users
$\mathcal{U}^{te}$                                              a set of users in test data
$\mathcal{I}$                                                   the whole set of items
$\mathcal{R} = \{(u, i)\}$                                      a set of one-class feedback in training data
$\mathcal{R}^{va} = \{(u, i)\}$                                 a set of one-class feedback in validation data
$\mathcal{R}^{te} = \{(u, i)\}$                                 a set of one-class feedback in test data
$\mathcal{I}_u = \{i \mid (u, i) \in \mathcal{R}\}$             a set of items preferred by user $u$
$\mathcal{I}_u^{te} = \{i \mid (u, i) \in \mathcal{R}^{te}\}$   a set of items preferred by user $u$ in test data
Introduction: Notations (2/2)

Table: Some notations and their explanations (cont.).

Notation                                  Explanation
$s_{uw}$                                  the original similarity between user $u$ and user $w$
$\tilde{s}_{uw}$                          the adjusted similarity between user $u$ and user $w$
$\gamma$                                  a parameter used in the adjusted similarity
$\mathcal{N}_u^k$                         the $k$-nearest neighborhood of user $u$
$\mathcal{N}_u^{k\text{-}r}$              the $k$-reciprocal nearest neighborhood of user $u$
$\tilde{\mathcal{N}}_u^\ell$              the expanded $\mathcal{N}_u^{k\text{-}r}$
$\ell = |\tilde{\mathcal{N}}_u^\ell|$     the size of the expanded neighborhood $\tilde{\mathcal{N}}_u^\ell$
$p(w|u)$                                  the position of user $w$ in $\mathcal{N}_u^k$
$\tilde{p}(w|u)$                          the position of user $w$ in $\tilde{\mathcal{N}}_u^\ell$
$q = |\mathcal{R}|$                       the number of one-class feedback records in training data
Related Work: One-Class Collaborative Filtering

There are mainly two branches of recommendation methods for the studied one-class collaborative filtering problem:
- Neighborhood-based methods such as k-NN [Deshpande and Karypis, 2004].
- Factorization-based methods such as FISM [Kabbur et al., 2013], CDAE [Wu et al., 2016], RBM [Jahrer and Töscher, 2012], LogMF [Johnson, 2014], BPR [Rendle et al., 2009], and PrMC [Wang et al., 2018].

In this paper, we focus on developing novel neighborhood-based methods, which are usually appreciated for their simplicity and effectiveness in terms of development, deployment, interpretability and maintenance.
Related Work: k-Reciprocal Nearest Neighborhood

- The concepts of reciprocal neighborhood and $k$-reciprocal neighborhood were first described in [Benzécri, 1982] and [Lelu, 2004].
- The $k$-nearest neighborhood was extended to the $k$-reciprocal nearest neighborhood for object retrieval [Qin et al., 2011].
- New similarity measurements, based on the original primary metric with $k$-reciprocal nearest neighbors, were proposed for an image search task [Delvinioti et al., 2014].
- An algorithm that re-ranks the initially ranked list with $k$-reciprocal features, computed by encoding $k$-reciprocal nearest neighbors, was developed for person re-identification [Zhong et al., 2017].

In this paper, we significantly extend the previous works on the $k$-reciprocal nearest neighborhood and apply it to an important recommendation problem.
Background: k-Nearest Neighborhood (1/2)

Similarity Measurement

The Jaccard index between a user $u$ and a user $w$ can be written as follows,
$$s_{uw} = |\mathcal{I}_u \cap \mathcal{I}_w| \,/\, |\mathcal{I}_u \cup \mathcal{I}_w|, \quad (1)$$
where $\mathcal{I}_u$ and $\mathcal{I}_w$ denote the sets of items preferred by the user $u$ and the user $w$, respectively.

Another popular similarity measurement is the well-known cosine similarity, which can be interpreted as a normalized version of the Jaccard index, i.e., $|\mathcal{I}_u \cap \mathcal{I}_w| \,/\, \big(\sqrt{|\mathcal{I}_u|}\sqrt{|\mathcal{I}_w|}\big)$.
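To make the two measurements concrete, here is a minimal Python sketch over sets of preferred items; the helper names `jaccard` and `cosine` are ours, not from the paper.

```python
import math

def jaccard(items_u: set, items_w: set) -> float:
    """Jaccard index of Eq.(1) between two users' preferred-item sets."""
    union = items_u | items_w
    if not union:  # both users have no feedback
        return 0.0
    return len(items_u & items_w) / len(union)

def cosine(items_u: set, items_w: set) -> float:
    """Cosine similarity over binary feedback vectors: the overlap
    normalized by the geometric mean of the two set sizes."""
    if not items_u or not items_w:
        return 0.0
    return len(items_u & items_w) / math.sqrt(len(items_u) * len(items_w))
```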
Background: k-Nearest Neighborhood (2/2)

Neighborhood Construction

We first denote the position of user $w$ in the neighborhood of user $u$ as follows,
$$p(w|u) = \sum_{u' \in \mathcal{U} \setminus \{u\}} \delta(s_{uu'} > s_{uw}) + 1, \quad (2)$$
where $\delta(x) = 1$ if $x$ is true and $\delta(x) = 0$ otherwise. We can then construct the $k$-nearest neighborhood of user $u$ as follows,
$$\mathcal{N}_u^k = \{ w \mid p(w|u) \le k,\; w \in \mathcal{U} \setminus \{u\} \}, \quad (3)$$
which contains the $k$ nearest neighbors of user $u$.
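A direct sketch of Eqs.(2)-(3) in Python, assuming `sim` maps ordered user pairs to their similarities; note that with ties in similarity the resulting set can exceed $k$, exactly as Eq.(3) is written.

```python
def position(sim: dict, u: int, w: int, users: range) -> int:
    """Position p(w|u) of Eq.(2): one plus the number of other users
    strictly more similar to u than w is."""
    return sum(1 for v in users
               if v != u and sim[(u, v)] > sim[(u, w)]) + 1

def knn(sim: dict, u: int, users: range, k: int) -> set:
    """k-nearest neighborhood N_u^k of Eq.(3)."""
    return {w for w in users
            if w != u and position(sim, u, w, users) <= k}
```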
Method: k-Reciprocal Nearest Neighborhood

In order to construct the neighborhood more accurately, we propose to make use of the $k$-reciprocal nearest neighborhood, which can be defined as follows,
$$\mathcal{N}_u^{k\text{-}r} = \{ w \mid u \in \mathcal{N}_w^k,\; w \in \mathcal{N}_u^k \}, \quad (4)$$
where the size of the $k$-reciprocal nearest neighborhood of user $u$ may be smaller than $k$, i.e., $|\mathcal{N}_u^{k\text{-}r}| \le k$. We can see that the $k$-reciprocal nearest neighborhood $\mathcal{N}_u^{k\text{-}r}$ in Eq.(4) is defined via the $k$-nearest neighborhood $\mathcal{N}_u^k$ shown in Eq.(3).
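Eq.(4) reduces to a mutual-membership filter; a one-line sketch, assuming `knn_of` maps each user to the set $\mathcal{N}_u^k$ computed above:

```python
def k_reciprocal(knn_of: dict, u: int) -> set:
    """k-reciprocal nearest neighborhood N_u^{k-r} of Eq.(4): keep only
    the neighbors w of u for which u is also a neighbor of w, so the
    result can be smaller than k."""
    return {w for w in knn_of[u] if u in knn_of[w]}
```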
Method: Similarity Adjustment

We adjust the original similarity as follows,
$$\tilde{s}_{uw} = \begin{cases} (1 + \gamma)\, s_{uw}, & u \in \mathcal{N}_w^{k\text{-}r} \\ s_{uw}, & u \notin \mathcal{N}_w^{k\text{-}r} \end{cases} \quad (5)$$
where $\tilde{s}_{uw}$ is the adjusted similarity between user $u$ and user $w$. Notice that $\gamma \ge 0$ is a parameter which determines the magnitude of the adjustment. In particular, when $\gamma > 0$ and $u \in \mathcal{N}_w^{k\text{-}r}$, the similarity is amplified in order to emphasize its importance; and when $\gamma = 0$, the adjusted similarity reduces to the original one.
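A direct translation of Eq.(5), reusing `knn_of` from the previous sketch; reciprocal membership is symmetric, so testing $u \in \mathcal{N}_w^{k\text{-}r}$ is the same as testing $w \in \mathcal{N}_u^{k\text{-}r}$.

```python
def adjusted_sim(sim: dict, knn_of: dict, u: int, w: int,
                 gamma: float) -> float:
    """Adjusted similarity of Eq.(5): amplify s_uw by (1 + gamma) when
    u and w are reciprocal neighbors, otherwise leave it unchanged."""
    if u in knn_of[w] and w in knn_of[u]:  # i.e., u is in N_w^{k-r}
        return (1.0 + gamma) * sim[(u, w)]
    return sim[(u, w)]
```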
Method: Neighborhood Expansion

We have the position of user $w$ in the context of user $u$ with the new similarity as follows,
$$\tilde{p}(w|u) = \sum_{u' \in \mathcal{U} \setminus \{u\}} \delta(\tilde{s}_{uu'} > \tilde{s}_{uw}) + 1. \quad (6)$$
With the new position $\tilde{p}(w|u)$, we can construct an expanded neighborhood,
$$\tilde{\mathcal{N}}_u^\ell = \{ w \mid \tilde{p}(w|u) \le \ell,\; w \in \mathcal{U} \setminus \{u\} \}, \quad (7)$$
where $|\tilde{\mathcal{N}}_u^\ell| = \ell$. Notice that when $s_{uw}$ or $\ell$ is large enough, we may have $w \in \tilde{\mathcal{N}}_u^\ell$ even though $w \notin \mathcal{N}_u^{k\text{-}r}$. This is actually very important, because it gives screened-out but high-value users an additional opportunity to re-enter the neighborhood.
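Eqs.(6)-(7) mirror Eqs.(2)-(3) with the adjusted similarity in place of the original one; a sketch, assuming `sim_adj` holds the adjusted similarities from the previous step:

```python
def expanded_neighborhood(sim_adj: dict, u: int, users: range,
                          ell: int) -> set:
    """Expanded neighborhood ~N_u^ell of Eqs.(6)-(7): the ell users
    closest to u under the adjusted similarity. A user outside
    N_u^{k-r} with a high original similarity can still enter here."""
    pos = {w: sum(1 for v in users
                  if v != u and sim_adj[(u, v)] > sim_adj[(u, w)]) + 1
           for w in users if w != u}
    return {w for w, p in pos.items() if p <= ell}
```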
Method: Prediction Rule

We hope that the users in the $k$-reciprocal nearest neighborhood $\mathcal{N}_u^{k\text{-}r}$ will have higher impact on the prediction process. Hence, we directly use the adjusted similarity $\tilde{s}_{uw}$, instead of the original similarity $s_{uw}$, to predict the preferences of user $u$ for the un-interacted items. Mathematically, we have the new prediction rule as follows,
$$\hat{r}_{uj} = \sum_{w \in \tilde{\mathcal{N}}_u^\ell \cap \mathcal{U}_j} \tilde{s}_{uw}, \quad (8)$$
where $\mathcal{U}_j$ denotes the set of users who preferred item $j$.
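Eq.(8) scores an item by summing the adjusted similarities of the expanded neighbors who preferred it; a sketch, with `users_of_item` standing in for $\mathcal{U}_j$:

```python
def predict(sim_adj: dict, neighborhood: set, users_of_item: set,
            u: int) -> float:
    """Prediction rule of Eq.(8): sum the adjusted similarities over
    the expanded neighbors of u who preferred item j."""
    return sum(sim_adj[(u, w)] for w in neighborhood & users_of_item)
```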
Method: The Algorithm (1/3)

Step 1: Construct the k-nearest neighborhood.

1: Input: $\mathcal{R} = \{(u, i)\}$
2: Output: A personalized ranked list of items for each user
3: for u = 1 to n do
4:   for w = 1 to n do
5:     $s_{uw} = |\mathcal{I}_u \cap \mathcal{I}_w| / |\mathcal{I}_u \cup \mathcal{I}_w|$
6:   end for
7:   Take the k users with the largest $s_{uw}$ as $\mathcal{N}_u^k$ for user u
8: end for
Method: The Algorithm (2/3)

Step 2: Construct the expanded k-reciprocal nearest neighborhood.

1: for u = 1 to n do
2:   for w = 1 to n do
3:     $s_{uw} = |\mathcal{I}_u \cap \mathcal{I}_w| / |\mathcal{I}_u \cup \mathcal{I}_w|$
4:     if $u \in \mathcal{N}_w^k$ and $w \in \mathcal{N}_u^k$ then   // $w \in \mathcal{N}_u^{k\text{-}r}$
5:       $\tilde{s}_{uw} = (1 + \gamma) s_{uw}$
6:     else   // $w \notin \mathcal{N}_u^{k\text{-}r}$
7:       $\tilde{s}_{uw} = s_{uw}$
8:     end if
9:   end for
10:   Take the ℓ users with the largest $\tilde{s}_{uw}$ as $\tilde{\mathcal{N}}_u^\ell$ for user u
11: end for
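Putting Steps 1-2 and Eq.(8) together, a compact end-to-end sketch that reuses the helpers from the earlier sketches (`jaccard`, `knn`, `adjusted_sim`, `expanded_neighborhood`, `predict`); it mirrors the O(n^2) pseudocode rather than an optimized implementation, and the driver name `k_rnn` is ours:

```python
from collections import defaultdict

def k_rnn(R, n: int, k: int, ell: int, gamma: float, top_n: int) -> dict:
    """Full k-RNN pipeline: Step 1 (k-NN), Step 2 (adjusted similarity
    and expanded neighborhood), then top-N ranking via Eq.(8)."""
    items_of = defaultdict(set)  # I_u: items preferred by user u
    users_of = defaultdict(set)  # U_j: users who preferred item j
    for u, i in R:
        items_of[u].add(i)
        users_of[i].add(u)
    users = range(n)
    sim = {(u, w): jaccard(items_of[u], items_of[w])
           for u in users for w in users if u != w}
    knn_of = {u: knn(sim, u, users, k) for u in users}            # Step 1
    sim_adj = {(u, w): adjusted_sim(sim, knn_of, u, w, gamma)
               for u in users for w in users if u != w}           # Eq.(5)
    nbrs = {u: expanded_neighborhood(sim_adj, u, users, ell)
            for u in users}                                       # Step 2
    ranked = {}
    for u in users:
        candidates = set(users_of) - items_of[u]                  # I \ I_u
        scores = {j: predict(sim_adj, nbrs[u], users_of[j], u)
                  for j in candidates}
        ranked[u] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return ranked
```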