The BellKor solution to the Netflix Prize
Abstract
Our final solution (RMSE=0.8712) consists of blending 107 individual results. Since many of these results are close variants, we first describe the main approaches behind them and then describe each individual result. The core components of the solution are published in our ICDM'2007 paper [1] (or KDD-Cup'2007 paper [2]), and also in the earlier KDD'2007 paper [3]. We assume that the reader is familiar with these works and with the terminology used there.

A movie-oriented k-NN approach was thoroughly described in our KDD-Cup'2007 paper [kNN]. We apply it as a post-processor for most other models. Interestingly, it was most effective when applied to residuals of RBMs [5], driving the Quiz RMSE from 0.9093 to 0.8888.

An earlier k-NN approach was described in the KDD'2007 paper ([3], Sec. 3) [Slow-kNN]. This earlier approach appears to achieve slightly more accurate results than the newer one, at the expense of a significant increase in running time. Consequently, we dropped the older approach, though some results involving it survive within the final blend.

We also tried more naïve k-NN models, where interpolation weights are based on pairwise similarities between movies (see [2], Sec. 2.2). Specifically, we based weights on $\mathrm{corr}^2 / (1 - \mathrm{corr}^2)$ [Corr-kNN], or on $\mathrm{mse}^{-10}$ [MSE-kNN]. Here, corr is the Pearson correlation coefficient between the two respective movies, and mse is the mean squared distance between the two movies (see the definition of $s_{ij}$ in Sec. 4.1 of [2]). We also tried taking the interpolation weights as the "support-based similarities", which will be defined shortly [Supp-kNN].

Other variants that we tried for computing the interpolation coefficients are: (1) using our KDD-Cup'2007 [2] method on a binary user-movie matrix, which replaces every rating with "1" and sets non-rated user-movie pairs to "0" [Bin-kNN]; (2) taking the results of a factorization and regressing the factors associated with the target movie on the factors associated with its neighbors, then using the resulting regression coefficients as interpolation weights [Fctr-kNN].

As explained in our papers, we also tried user-oriented k-NN approaches, either in a more thorough way (see [1], Sec. 4.3; [3], Sec. 5) [User-kNN], or by simply taking weights as pairwise similarities among users [User-MSE-kNN], which is the user-oriented parallel of the aforementioned [MSE-kNN].

Prior to computing interpolation weights, one has to choose the set of neighbors. We find the most similar neighbors based on an appropriate similarity measure. In …
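The similarity-based weights mentioned above for [Corr-kNN] and [MSE-kNN] can be illustrated with a small sketch. It is not the authors' implementation. It assumes that corr and mse are computed over the users who rated both movies, that mse is a plain mean squared difference (the definition of s_ij in [2] may differ, for example by shrinkage), and that a prediction is a weight-normalized average of the user's ratings on the top-k neighbors. The function names, the `eps` guard, the value of `k`, and the toy ratings are all hypothetical.

```python
import math

def common_ratings(ratings_i, ratings_j):
    """Ratings of the two movies restricted to users who rated both."""
    users = sorted(set(ratings_i) & set(ratings_j))
    return [ratings_i[u] for u in users], [ratings_j[u] for u in users]

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def corr_weight(ratings_i, ratings_j, eps=1e-9):
    """Weight corr^2 / (1 - corr^2); eps is an assumed guard for |corr| = 1."""
    x, y = common_ratings(ratings_i, ratings_j)
    if len(x) < 2:
        return 0.0
    c2 = pearson(x, y) ** 2
    return c2 / (1.0 - c2 + eps)

def mse_weight(ratings_i, ratings_j, eps=1e-9):
    """Weight mse^(-10); eps is an assumed guard for mse = 0."""
    x, y = common_ratings(ratings_i, ratings_j)
    if not x:
        return 0.0
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return (mse + eps) ** -10

def predict(user, target, movies, weight_fn, k=2):
    """Weight-normalized average of the user's ratings on the k best neighbors."""
    scored = [
        (weight_fn(movies[target], movies[j]), movies[j][user])
        for j in movies
        if j != target and user in movies[j]
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    total = sum(w for w, _ in top)
    if total == 0:
        return None  # no usable neighbor
    return sum(w * r for w, r in top) / total

# Toy data (hypothetical): movie -> {user: rating}
movies = {
    "A": {1: 5, 2: 3, 3: 4, 4: 1},
    "B": {1: 4, 2: 3, 3: 5, 4: 2, 5: 4},
    "C": {1: 1, 2: 4, 3: 2, 4: 5, 5: 2},
}

print(predict(5, "A", movies, mse_weight))
print(predict(5, "A", movies, corr_weight))
```

The tenth power in the mse-based weight concentrates almost all of the weight on the closest neighbors. The corr-based weight depends only on the squared correlation, so in this sketch strongly anti-correlated movies also receive large weights.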
Similar resources
The BellKor 2008 Solution to the Netflix Prize
Our RMSE=0.8643 solution is a linear blend of over 100 results. Some of them are new this year, whereas many others belong to the set that was reported a year ago in our 2007 Progress Prize report [3]. This report is structured accordingly. In Section 2 we detail methods new to this year. In general, our view is that those newer methods deliver a superior performance compared to the method...
The BellKor Solution to the Netflix Grand Prize
This article describes part of our contribution to the “BellKor’s Pragmatic Chaos” final solution, which won the Netflix Grand Prize. The other portion of the contribution was created while working at AT&T with Robert Bell and Chris Volinsky, as reported in our 2008 Progress Prize report [3]. The final solution includes all the predictors described there. In this article we describe only the ne...
The BigChaos Solution to the Netflix Prize 2008
The team “BellKor in BigChaos” is a combined team of team BellKor and BigChaos. The solution with an RMSE of 0.8616 is created by a linear blend of the results from both teams. In the following paper we describe the results of BigChaos. 1 Preface During the last 2 years of research we tried a variety of different collaborative filtering algorithms. In the following we describe all methods which ...
The Netflix Prize High Performance Computing Neural Networks Final Report
A solution for the Netflix Prize was developed based on back-propagation neural networks. The solution differs from most other collaborative filtering techniques in that, rather than performing a global dimensionality reduction, this method focuses on each desired prediction by creating an entirely new neural network for each prediction. The implementation was parallelized using MPI, achieving...
On the Gravity Recommendation System
The Netflix Prize is a collaborative filtering problem. This subfield of machine learning has been popular since the late 1990s, with the spread of online services that use recommendation systems, such as Amazon, Yahoo! Music, and of course Netflix. The aim of such a system is to predict what items a user might like based on his or her own and other users' previous ratings. The dataset of Netflix ...
Matrix factorization for the Netflix Prize
I compare two common techniques for computing matrix factorizations for recommender systems, specifically using the Netflix Prize data set. Accuracy, run-time, and scalability are discussed for stochastic gradient descent and non-linear conjugate gradient.