A comparative study of popular recommendation algorithms, with a hybrid model built from them and a look at recent techniques that incorporate time when recommending products.
Recommendation systems are widely used by online companies such as Amazon and Netflix to help users discover items that might interest them. Because of this wide applicability, they have become an area of active research. We present a comparative study of popular recommendation algorithms and build a hybrid model from them.

Introduction

Companies that sell products and services online, such as e-commerce sites, movie rental sites, and housing and restaurant recommendation sites, use a variety of algorithms to recommend choices to users. These algorithms fall broadly into two types: content-based techniques and collaborative-filtering techniques. The former compute the similarity of items from the metadata provided about them; for movies, this includes the genre, release date, and IMDb rating. For example, a user who has watched many comic thrillers is recommended another comic thriller. There are also demographic content-based techniques, which identify similar users from shared demographic attributes and recommend on that basis. Collaborative-filtering techniques instead compute similarity from user ratings across multiple items: if two users have liked similar movies in the past, they are likely to have similar preferences. Within collaborative filtering, we also look at recent techniques that take time into account when recommending products. This is essential today, when users have a great many choices and their preferences change frequently.

Dataset Used

We use the MovieLens 100k dataset for training and testing.
We chose this dataset because it comes from a reputable source (the University of Minnesota) and contains the movie information and user demographic data that our content-based analysis requires. In addition, every user-item rating carries a timestamp, since the ratings were collected over a period of seven months. The dataset contains 100,000 ratings from 943 MovieLens users on 1,682 movies, collected from September 19, 1997 through April 22, 1998. We use 80,000 ratings for training and 20,000 for validation wherever such a partition is required.
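
To make the setup concrete, the following is a minimal Python sketch of an 80/20 partition of the ratings and of a rating-based similarity between two users, as in the collaborative-filtering description. It rests on several assumptions that the text does not specify: ratings are read as tab-separated (user id, item id, rating, timestamp) records, the split is a seeded random shuffle, and cosine similarity is used as one common choice of similarity measure. The sample rows and all function names are hypothetical and serve only as illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical sample rows, assumed layout: user id, item id, rating, timestamp.
SAMPLE_LINES = [
    "1\t10\t5\t874965758",
    "1\t20\t3\t876893171",
    "1\t30\t4\t878542960",
    "2\t10\t5\t888550871",
    "2\t20\t3\t888551853",
    "2\t40\t2\t888552084",
    "3\t30\t1\t889237269",
    "3\t40\t5\t889237455",
    "4\t50\t4\t876346109",
    "4\t10\t2\t876346197",
]


def parse_ratings(lines):
    """Turn tab-separated lines into (user, item, rating, timestamp) tuples."""
    rows = []
    for line in lines:
        user, item, rating, timestamp = line.strip().split("\t")
        rows.append((int(user), int(item), float(rating), int(timestamp)))
    return rows


def split_ratings(rows, train_fraction=0.8, seed=0):
    """Randomly partition ratings (assumption: the text does not say how the split is made)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def user_vectors(rows):
    """Map each user to a sparse {item: rating} dictionary."""
    vectors = defaultdict(dict)
    for user, item, rating, _ in rows:
        vectors[user][item] = rating
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors; 0.0 if they share no items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


rows = parse_ratings(SAMPLE_LINES)
train, validation = split_ratings(rows)
vectors = user_vectors(train)
users = sorted(vectors)
if len(users) >= 2:
    # Similarity is computed from training ratings only.
    print(users[0], users[1], cosine_similarity(vectors[users[0]], vectors[users[1]]))
```

Because each rating carries a timestamp, a chronological split (sorting by timestamp before cutting) would be a natural alternative when evaluating time-aware techniques. The random split here is only the simplest option.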