Evaluating Recommender Systems
The long tail theory and its impact on recommender systems are introduced, and a comprehensive review is provided of the datasets used to evaluate collaborative filtering recommender system techniques and algorithms.
Abstract
Recommender systems are considered an answer to information overload in a Web environment. Such systems recommend items (movies, music, books, news, web pages, etc.) that the user is likely to be interested in. Collaborative filtering recommender systems have had great success in commercial applications, where sales follow a power law distribution. However, despite the growing number of recommendation techniques and algorithms in the literature, there is no indication that the datasets used to evaluate them follow a real-world distribution. This paper introduces the long tail theory and its impact on recommender systems. It also provides a comprehensive review of the datasets used to evaluate collaborative filtering recommender system techniques and algorithms (EachMovie, MovieLens, Jester, BookCrossing, and Netflix). Finally, it investigates which of these datasets exhibit a power law distribution and which distribution would be the most relevant.
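To make the power law question concrete, the following is a minimal sketch of one rough way to check whether per-item rating counts look long-tailed. It is not the method used in the paper, which the abstract does not specify. The function names `loglog_slope` and `head_share` and the synthetic counts are assumptions made for illustration, and a least-squares fit on a log-log rank-frequency plot is only an informal diagnostic, not a rigorous statistical test.

```python
import math


def loglog_slope(counts):
    """Fit log(count) against log(rank) by ordinary least squares.

    `counts` holds the number of ratings per item (hypothetical input).
    Returns (slope, r_squared). A slope near a negative constant with a
    high r_squared suggests, but does not prove, power-law behaviour.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, 1.0 - ss_res / ss_tot


def head_share(counts, head_fraction=0.2):
    """Share of all ratings that fall on the most popular items."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Synthetic example: counts proportional to 1 / rank (an idealised long tail).
synthetic_counts = [1000.0 / rank for rank in range(1, 101)]
slope, r_squared = loglog_slope(synthetic_counts)
share = head_share(synthetic_counts)
```

On a real dataset, `counts` would be the number of ratings each item received. A poor linear fit on the log-log scale, or a head share close to the head fraction itself, would indicate that popularity is spread more evenly than a long-tailed distribution would predict.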