Accurate Recommendations of Online Movie Ratings: Large Data Sets with Low Dimensions and Span of Multiple Years

Sudip Bhattacharjee*, Mikhail Bragin, Dmitry Zhdanov
Dept. of OPIM, School of Business, University of Connecticut
2100 Hillside Road, U-1041 IM, Storrs, CT 06269
{sbhattacharjee, mbragin, dzhdanov}@business.uconn.edu
*Contact author

Submitted to "The 2010 Winter Conference on Business Intelligence," November 2009
1. Research Problem

Recommender systems are widely used in business, and the growth of e-commerce has produced an explosion of such systems across online communities and business environments. Significant research exists in this area (Adomavicius and Tuzhilin 2005)1, and applications can be found in several product categories, including movies, music, books, electronic products, and news. It has been argued that while recommender systems positively impact the popularity of niche products (Tucker and Zhang 2009), they can also reduce the overall diversity of popular products sold (Fleder and Hosanagar 2009).

We focus here on a variation of a problem that has attracted significant attention among academics and practitioners of recommender systems. Netflix ran a contest, the Netflix Prize, to improve the accuracy of user movie rating predictions based on users' previous movie preferences and ratings (Netflix) by 10% over its then-current prediction accuracy. While the stated goal may be arbitrary for a problem with no known optimal value, it is nevertheless instructive that achieving it was not an easy feat. The contest ended in the summer of 2009.

The dominant approach used by the leading teams, including the winners of the Netflix Prize, is Singular Value Decomposition (SVD). The SVD approach is similar to factor analysis in that it tries to identify a hidden, endogenous structure in the problem. A larger number of factors leads to better predictions; however, these factors are difficult (if not impossible) to interpret as meaningful objects that can be used in making business decisions. In contrast, our approach is designed to explain the rationale of the solution methodology.
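As context for the contrast drawn above, the latent-factor idea behind SVD can be illustrated with a small sketch; the toy ratings matrix and the choice of two factors are hypothetical and not part of the paper:

```python
import numpy as np

# Illustrative sketch of the latent-factor (SVD) idea: approximate the
# user-by-movie ratings matrix with a small number of hidden factors.
# The ratings below are made up, not Netflix data.
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 2.0],
              [1.0, 2.0, 5.0],
              [2.0, 1.0, 4.0]])  # rows: users, columns: movies

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2  # number of latent factors; more factors fit the data more closely
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

print(np.round(R_hat, 2))  # predicted ratings, including unobserved cells
```

The columns of U and rows of Vt are the hidden factors: they minimize reconstruction error but carry no obvious business meaning, which is the interpretability drawback noted above.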
Such transparency would enable decision makers to understand the system and its inner workings rather than treating the recommender system as a black box, and would allow managers to make informed decisions based on a deeper knowledge of the underlying model and problem characteristics.

2. Modeling Approach

The training dataset consists of the ratings that 480,189 Netflix users gave to 17,770 movies, totaling over 100 million ratings over a period of six years. Each rating carries an anonymous user ID, a movie ID, and a rating date, and no other information. For the contest, a test set is provided to validate the model, and a prediction set (with the associated ratings withheld) is used to judge the contestants. The goal is to create an algorithm that, given a (user ID, movie ID, rating date) tuple, generates rating predictions on the prediction set that are 10% better than the current model. Given the absence of product content knowledge, and a large data set spanning six years (over which user tastes change), collaborative filtering is a promising approach. Our model explores a combination of non-machine-learning algorithms so that the phenomenon can be explained to managers.

1 All references available upon request
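As a point of reference for the prediction task just described, the contest's error metric (root mean squared error) and the naive baseline of predicting a single mean rating can be sketched as follows; the ratings here are invented for illustration:

```python
import numpy as np

# Naive baseline: predict every held-out rating with the training mean,
# and score with RMSE, the metric used in the Netflix contest.
# All ratings below are hypothetical.
train = np.array([4, 5, 3, 4, 4, 5, 2, 4], dtype=float)  # training ratings
test = np.array([5, 3, 4, 4, 2], dtype=float)            # held-out ratings

mean_pred = train.mean()  # one constant prediction for every (user, movie)
rmse = np.sqrt(np.mean((test - mean_pred) ** 2))
```

Any proposed model has to beat this constant-prediction RMSE to be useful.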
Parameter optimization

We predict the rating r̂(i, j, k) = r̂(user i, movie j, date k) for a given user i, movie j, and date k. We begin by considering a model that makes predictions based on two factors: the previous behavior of user i (the term weighted by parameter b) and the current behavior of other users with respect to movie j on date k (the term weighted by parameter a):

    r̂(i, j, k) = a · r̄(j, t2) + b · r̄(i, t1),    t1, t2 ∈ { t | t ≤ Tk }

Here t1 and t2 are intervals leading up to the rating date (date k) of user i for movie j: r̄(j, t2) is the average rating movie j received from other users over t2, and r̄(i, t1) is the average rating user i gave over t1. The two intervals can be of different durations, which are determined empirically. The optimization problem is given by:

    min over (a, b) of Σ [ r(i, j, k) − r̂(i, j, k) ]²,  summed over the training ratings

The result provides us with the capability to build prediction models with the estimated parameters a and b. This is followed by fine-grained segment generation, and then by segment-based variations of the above parameter optimization model.

3. Data and Results – Main Findings

A histogram of the data (Figure 1) shows that the ratings are slightly skewed to the right. Hence a naïve approach of predicting the largest class, or the mean, would not improve prediction accuracy given the nature of the data. Figures 2 and 3 plot the relationship between the average rating and the standard deviation of ratings for users and movies, respectively.

Figure 1: Histogram of ratings
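A minimal sketch of the two-parameter model above, assuming the prediction is a weighted combination of the movie's recent average rating (weight a) and the user's recent average rating (weight b), with the weights fitted by least squares; all numbers are hypothetical:

```python
import numpy as np

# Fit the two weights a and b of the model
#   r_hat = a * (movie's average rating over interval t2, from other users)
#         + b * (user's average rating over interval t1)
# by least squares on (hypothetical) training ratings.
movie_avg = np.array([4.1, 3.8, 2.5, 4.4, 3.0])  # r_bar(j, t2) per rating
user_avg = np.array([4.5, 3.5, 3.0, 4.0, 2.5])   # r_bar(i, t1) per rating
r_true = np.array([5.0, 4.0, 3.0, 4.0, 3.0])     # observed ratings

X = np.column_stack([movie_avg, user_avg])
(a, b), *_ = np.linalg.lstsq(X, r_true, rcond=None)  # minimize sum of squares
r_hat = X @ np.array([a, b])
rmse = np.sqrt(np.mean((r_true - r_hat) ** 2))
```

The fitted pair (a, b) then characterizes a movie, which is what the segment flags a + b > 1 and a > b in Table 1 are computed from.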
Fig 2. Relationship between average rating and standard deviation of ratings for users. Fig 3. Relationship between average rating and standard deviation of ratings for movies.

We run the parameter optimization on the training set. Segments emerged during the optimization, highlighting the different parameter weights a and b. Table 1 shows three movies, one in each category or "segment". Table 2 presents results for a sub-sample of 45 movies predicted using the parameter optimization method.

Table 1: Sample optimization solution for three movies and segments

Movie id | N     | Average rating | Standard deviation | RMSE   | a       | b      | a+b>1 | a>b
14961    | 73335 | 4.7232         | 0.6097             | 0.5782 | 0.672   | 0.4026 | T     | T
11222    | 12699 | 4.0002         | 0.9695             | 0.8269 | 0.0735  | 0.9943 | T     | F
14467    | 17897 | 2.4419         | 1.0633             | 1.0151 | -0.3332 | 0.9693 | F     | F

Table 2: Segments and illustrative improvement in prediction

Cluster | Count | Initial standard deviation | % improvement
TT      | 1     | 0.6655                     | 1.70%
TF      | 9     | 1.0422                     | 13.16%
FF      | 35    | 1.1750                     | 13.87%
FT      | 0     | N/A                        | N/A

We then estimate the ratings for the movies that need to be predicted in the test dataset. Figure 4 presents the results. We ran four variations of a base method; the horizontal axis represents the method index. The solid line represents the RMSE obtained on the prediction set via the Netflix submission system; the dashed line corresponds to the test-data RMSE. These lines follow the same pattern, meaning that some of our adjustments are uniformly better than others, and that our method gives robust and stable results overall. It should be noted
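The segment labels in Tables 1 and 2 follow directly from the fitted parameters; a small sketch using the three (a, b) pairs reported in Table 1:

```python
# Label each movie's segment from its fitted weights, as in Table 1:
# the first letter tests a + b > 1, the second tests a > b.
# The (a, b) values are the three pairs reported in Table 1.
movies = {14961: (0.672, 0.4026),
          11222: (0.0735, 0.9943),
          14467: (-0.3332, 0.9693)}

def segment(a: float, b: float) -> str:
    return ("T" if a + b > 1 else "F") + ("T" if a > b else "F")

labels = {movie_id: segment(a, b) for movie_id, (a, b) in movies.items()}
# labels -> {14961: 'TT', 11222: 'TF', 14467: 'FF'}, matching Table 1
```

Grouping movies by these labels yields the four clusters (TT, TF, FF, FT) whose counts and improvements appear in Table 2.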
that the Netflix prediction RMSE (solid line) is calculated by Netflix on a closely guarded subset of the prediction set, not on the whole set. So while our overall predictions may have satisfied the Netflix guideline (a 10% reduction) based on results on the whole test set, our parsimonious and easily understood model does not show similar results on the prediction-set submission to Netflix. Our conjecture is that our model does not work uniformly well on the subset tested by Netflix, but achieves better results overall.

Figure 4. Test RMSE vs. Quiz RMSE

4. Current Status

Our model incorporates a parameter optimization method, followed by a segment generation method, to improve the prediction (see Figure 5). Our current research involves creating more fine-grained segments and segment-specific variations of the prediction methods. Based on these, we are further improving prediction accuracy, and will be able to present the results and receive valuable feedback during the conference.

Figure 5: Overall methodology for prediction analysis