Nearest-Biclusters Collaborative Filtering
Philadelphia, 20 August 2006
Speaker: Panagiotis Symeonidis, PhD Candidate, Scholar of the State Scholarships Foundation, Aristotle University of Thessaloniki, Greece
symeon@delab.csd.auth.gr | http://delab.csd.auth.gr/~symeon
Authors: Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, Yannis Manolopoulos.
What is Collaborative Filtering (CF)?
CF is a successful recommendation technique that has been used over the last decade to confront "information overload" on the internet.
CF helps a customer find what he or she is interested in.
Related work on CF
In 1994, GroupLens implemented a CF algorithm based on similarities between users; it is well known as the user-based (UB) algorithm.
In 2001, the item-based (IB) algorithm was proposed (Sarwar et al.); it is based on similarities between items.
Several model-based approaches (mainly k-means clustering) develop a model of the user ratings.
Basic Challenges for CF algorithms
Accuracy in recommendations: users must be satisfied with the suggested items.
Scalability: algorithms face performance problems as the volume of data increases.
Motivation of our work (1)
Nearest-neighbor algorithms (UB, IB) cannot handle scalability to large volumes of data.
Motivation of our work (2)
UB and IB are both one-sided approaches (they ignore the duality between users and items). e.g.:

Item-Item similarity matrix:
        I1    I2    I3
  I1    0     0.1   0.2
  I2    0.1   0     0.7
  I3    0.2   0.7   0

User-User similarity matrix:
        U1    U2    U3
  U1    0     0.5   0.2
  U2    0.5   0     0.1
  U3    0.2   0.1   0
Motivation of our work (3)
UB and IB cannot detect partial matching (they just find the least dissimilar users/items). e.g. (1-5 rating scale):

        I1   I2   I3   I4   I5
  U1    5    5    1    1    1
  U2    5    5    5    5    5

The above users would have negative similarity in UB and IB, so we miss their partial matching.
Motivation of our work (4)
Traditional model-based algorithms (k-means, hierarchical clustering) place each item/user in exactly one cluster. e.g. (a bookstore whose items belong to two categories, Sports and Computers):

        I1   I2   I3   I4   I5
  U1    -    5    5    5    5

The above user can have many different preferences, and an item can belong to many different item categories.
Motivation of our work (5)
K-means and hierarchical clustering algorithms again ignore the duality of the data (one-sided approach): they create clusters only of users or only of items.
[Figure: example clusters containing only users (U5, U6, U8, U9) or only items (I2, I3, I7).]
What we propose
Biclustering, to disclose the duality between users and items by grouping them in both dimensions simultaneously.
A novel nearest-biclusters CF algorithm, which uses a new similarity measure to achieve partial matching of users' preferences.
Related work in Biclustering
The Cheng and Church algorithm uses the mean squared residue score to construct biclusters.
The xMotif algorithm extracts motifs.
Bimax finds inclusion-maximal bicliques in binary matrices.
Related work in CF
No related work has applied an exact biclustering algorithm.
Hofmann and Puzicha proposed just a latent class model, where clustering is performed separately for users and for items.
Our Contribution
Apply an exact biclustering algorithm in CF.
Propose a novel nearest-biclusters CF algorithm.
Use a new similarity measure for partial matching.
Provide extensive experimental results.
Our Methodology
a. The data preprocessing step (optional).
b. The biclustering process.
c. The nearest-biclusters algorithm.
Running Example
[Figure: the example rating matrices, a Training Set and a Test Set. Rating scale: 1-5.]
a. The data preprocessing step (optional)
Binary discretization of the Training Set with positive rating threshold P_t = 2: ratings greater than P_t become 1, all others become 0.
P_t: Positive Rating Threshold.
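As an illustration, a minimal Python sketch of this binarization step (the dense matrix layout, with 0 denoting a missing rating, is an assumption made for the example):

```python
import numpy as np

def binarize_ratings(R, p_t=2):
    """Binary discretization: ratings above the positive rating
    threshold p_t map to 1; all others (including missing entries,
    stored here as 0) map to 0."""
    return (R > p_t).astype(int)

# Hypothetical 3-user x 4-item training set, 1-5 rating scale, 0 = missing.
R = np.array([[5, 3, 0, 1],
              [4, 0, 5, 2],
              [1, 5, 4, 0]])
print(binarize_ratings(R))
# [[1 1 0 0]
#  [1 0 1 0]
#  [0 1 1 0]]
```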
b. The biclustering process (1)
We use the Bimax algorithm (binary inclusion-maximal algorithm).
A bicluster b = (U_b, I_b) corresponds to a subset of users U_b that jointly show positive rating behavior across a subset of items I_b.
In other words, for Bimax the pair (U_b, I_b) defines a submatrix in which all elements are equal to 1 and which is not entirely contained in any other bicluster.
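This definition can be made concrete with a small Python check (a sketch of the two defining properties only, not the Bimax search itself, which enumerates such pairs by divide and conquer):

```python
import numpy as np

def is_all_ones(B, users, items):
    """True if the submatrix defined by (users, items) contains only 1s."""
    return bool(B[np.ix_(users, items)].all())

def is_inclusion_maximal(B, users, items):
    """True if (users, items) is an all-ones submatrix that cannot be
    extended by any additional user row or item column -- Bimax's
    'not entirely contained in any other bicluster' condition.
    users and items are lists of row/column indices into B."""
    if not is_all_ones(B, users, items):
        return False
    extra_users = [u for u in range(B.shape[0]) if u not in users]
    extra_items = [i for i in range(B.shape[1]) if i not in items]
    return (all(not is_all_ones(B, users + [u], items) for u in extra_users)
            and all(not is_all_ones(B, users, items + [i]) for i in extra_items))
```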
b. The biclustering process (2)
Applying Bimax to the Training Set.
Input parameters: 1. minimum number of users in a bicluster; 2. minimum number of items in a bicluster (here 2 for both).
Four biclusters are found; there is overlapping between biclusters, and the amount of overlapping requires careful tuning.
c. The nearest-biclusters algorithm (1)
It consists of two basic operations:
- the formation of the test user's neighborhood, i.e., finding the k-nearest biclusters;
- the generation of the top-N recommendation list.
c. The nearest-biclusters algorithm (2)
To find the k-nearest biclusters of a test user, we divide the number of items they have in common by the sum of the items they have in common and the number of items in which they differ:

  sim(u, b) = |common items| / (|common items| + |items in which they differ|)

Similarity values range in [0, 1].
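Taking "the items in which they differ" as the symmetric difference, this is exactly the Jaccard coefficient between the test user's positively rated items and the bicluster's items. A small Python sketch under that reading (the set encoding is an assumption):

```python
def user_bicluster_similarity(user_items, bicluster_items):
    """Similarity between a test user and a bicluster, read from the slide:
    |common| / (|common| + |different|). With 'different' taken as the
    symmetric difference this is the Jaccard coefficient, hence in [0, 1]."""
    common = user_items & bicluster_items
    different = user_items ^ bicluster_items   # symmetric difference
    denom = len(common) + len(different)
    return len(common) / denom if denom else 0.0

# Example: the test user positively rated items {1, 2, 3};
# a bicluster covers items {2, 3, 4}.
print(user_bicluster_similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```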
c. The nearest-biclusters algorithm (3)
To generate the top-N recommendation list:
The Weighted Frequency (WF) of an item in a bicluster is the product between the size of the bicluster (its number of users) and the similarity measure.
We thus weight the contribution of each bicluster by its size, in addition to its similarity with the test user.
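Putting the two operations together, a hedged sketch of the recommendation step. The exact form WF(i, b) = |U_b| * sim(u, b) for each item i in I_b is inferred from the slide's wording about weighting by bicluster size, so treat it as an assumption; the function reuses user_bicluster_similarity from the previous sketch.

```python
from collections import defaultdict

def top_n_recommendations(user_items, biclusters, n=20, k=5):
    """Recommend top-n items from the k-nearest biclusters.

    user_items: set of items the test user rated positively.
    biclusters: list of (users, items) pairs of sets, e.g. ({0, 3}, {1, 2, 5}).
    """
    # Neighborhood formation: the k biclusters most similar to the user.
    nearest = sorted(biclusters,
                     key=lambda b: user_bicluster_similarity(user_items, b[1]),
                     reverse=True)[:k]
    # Top-N generation: accumulate the Weighted Frequency of each item.
    scores = defaultdict(float)
    for users_b, items_b in nearest:
        sim = user_bicluster_similarity(user_items, items_b)
        for item in items_b - user_items:        # skip already-rated items
            scores[item] += len(users_b) * sim   # WF(i, b) = |U_b| * sim(u, b)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```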
Evaluating the CF process
Evaluation is done through the Precision, Recall, and F1 metrics.
Note that MAE is not indicative of the quality of the top-N list, but only of the quality of the similarity measure.
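For completeness, the standard per-user definitions of these three metrics (a reference sketch, not tied to any particular evaluation library):

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one top-N list.

    recommended: the top-N recommended items for a test user.
    relevant: the items that user actually rated positively in the test set.
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```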
Experimental Configuration
We compare the nearest-biclusters, UB, and IB algorithms on three real datasets (MovieLens 100K, MovieLens 1M, EachMovie).
We present results for MovieLens 100K.
Top-N list: 20 items. k-nearest neighbors: 1-100.
Tuning of users' initial parameter (1)
[Figure: F1 vs. the minimum number of users n in a bicluster (n = 2 to 10); points annotated with the average number of users per bicluster, from 4.6 to 10.63.]
Tuning of the minimum number of users parameter in a bicluster: the best F1 is obtained for n = 4 users in a bicluster.
Tuning of items' initial parameter (2)
[Figure: F1 vs. the minimum number of items m in a bicluster (m = 6 to 14); points annotated with the average number of items per bicluster, from 8.64 to 16.19.]
Tuning of the minimum number of items parameter in a bicluster: the best F1 is obtained for m = 10 items in a bicluster.
Tuning of overlapping factor (3)
[Figure: F1 vs. the overlapping factor (0% to 100%); points annotated with the resulting number of biclusters, from 11 to 85,723.]
Tuning of the number of overlapping biclusters: the best F1 is obtained for 35% overlapping.
Comparative Results for accuracy (1)
[Figure: precision vs. k (10 to 100) for UB, IB, and Nearest-Biclusters.]
Nearest-Biclusters achieves about 30% more precision.
Comparative Results for accuracy (2)
[Figure: recall vs. k (10 to 100) for UB, IB, and Nearest-Biclusters.]
Nearest-Biclusters achieves about 10% more recall.
Comparative Results for execution time
[Figure: execution time in milliseconds vs. k (10 to 100) for UB, IB, and Nearest-Biclusters.]
Nearest-Biclusters is faster even than the IB algorithm.
Examination of additional factors (1)
[Figure: precision vs. recommendation list size N (10 to 50) for UB, IB, and Nearest-Biclusters.]
[Figure: recall vs. recommendation list size N (10 to 50) for UB, IB, and Nearest-Biclusters.]
Examination of additional factors (2)
[Figure: F1 metric vs. training set size (15% to 90%) for UB, IB, and Nearest-Biclusters.]
Note that 15% of the training set for the nearest-biclusters algorithm gives a better F1 than 75% of the training set gives for UB and IB.
Conclusions
Our approach shows a more than 30% improvement in precision over UB and IB.
Our approach also improves efficiency (it beats even the IB algorithm).
We introduced a novel similarity measure for forming the user's neighborhood and the Weighted Frequency for generating the top-N list.
Future Work
Examine other classes of biclustering algorithms as well (e.g., algorithms that find coherent biclusters).
Test different similarity measures between a user and a bicluster.
THANK YOU.
symeon@delab.csd.auth.gr | http://delab.csd.auth.gr/~symeon