This article discusses a hybrid approach that combines content-based methods with collaborative filtering, avoiding both the so-called new item (item cold start) problem and the inability to recommend to new users whose profiles are not known.
XRDS • Spring 2020 • Vol. 26 • No. 3
Recommendation systems are a sine qua non for ecommerce, social networks, and media streaming service providers [1]. Product search, whether by browsing through the product taxonomy graph or by entering a search query to retrieve relevant products, is often the first hook for the customer to locate a product in an e-retailer’s inventory. The recommendation system then takes over and acts as a friend to guide the customer throughout the shopping journey by recommending the most relevant products the customer may buy at every step, without the customer having to search for them explicitly. Given that we are submerged in an ever-growing amount of information, a recommendation system that sifts through information and presents the most relevant pieces helps overcome information overload, and short-circuits the shopping journey by helping customers find what they are looking for faster.
The past two decades have seen a significant amount of research on recommendation systems. Large state-of-the-art recommendation systems have been built in various domains, for example, Walmart’s grocery offerings, Netflix’s movie choices, Amazon’s product suggestions, and YouTube’s video recommendations. It is no exaggeration to say that recommendation systems are pervasive in our daily lives through the gadgets we use. The news we read on our smartphones and the jobs suggested by LinkedIn all come from recommendation systems.
In this article, we discuss the underlying principles and technologies behind a recommendation system. We also describe the key principles behind building a recommendation system with example code snippets. We will use an e-retailer, such as Walmart, as an example in our discussion. There are three parties to a recommendation system: customers (or users), products, and the recommendation system itself.
Based on how a recommendation system views and interprets the customer–product interactions, two broad categories of recommendation systems have emerged. The first is the content-based recommendation system, which recommends items similar to those the user has liked in the past. The similarity of the items is determined by their properties, also known as features. Here are the steps involved: (1) Represent each item as an item feature vector, called the “item profile.” (2) Find similar items based on feature vector matching. (3) Represent each user as a feature vector, called the “user/customer profile,” that captures the user’s preferences. (4) Recommend the items that correlate the most with the user profile through another similarity measure. The main advantage of a content-based approach is that it does not need other users’ data to find similar items, thereby avoiding the so-called new item, or item cold start, problem. On the flip side, it cannot recommend to new users whose profiles are not known.
The second class of recommendation systems is based on collaborative filtering. The collaborative filtering algorithm recommends to a user those items that are preferred by similar users. The process is therefore a collaborative one, in which a group of similar users help each other get good recommendations in an anonymous but collaborative manner. While collaborative filtering is widely used and works on any item set, it nevertheless suffers from cold start (new users), data sparsity (users express an opinion, such as a like or a purchase, on only a limited subset of items), non-recommendation of unrated or unbought items, and bias toward bestselling or popular items.
Here we discuss a hybrid approach that uses content-based methods with collaborative filtering. It is called “item-to-item collaborative filtering” [2, 3]. The idea is that, for every item i, we find the item j that was bought most frequently by the users who also bought i.
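
As a minimal sketch of the co-purchase idea, the Python snippet below counts, for every pair of items, how many users bought both, and then picks the item j most often bought alongside a given item i. The input format (a dictionary from user to purchased items), the function names, and the toy data are assumptions made for illustration, and raw co-purchase counts stand in for the similarity measure; a normalized measure such as cosine similarity is a common alternative.

```python
from collections import defaultdict
from itertools import combinations


def build_co_purchase_counts(purchases):
    """Count, for each pair of items, the users who bought both.

    purchases: dict mapping user id -> set of item ids (assumed format).
    Returns a nested dict: counts[i][j] = number of users who bought i and j.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in purchases.values():
        # Every unordered pair in one user's history is one co-purchase.
        for a, b in combinations(items, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def most_co_purchased(counts, item):
    """Return the item bought most often by users who also bought `item`."""
    neighbours = counts.get(item)
    if not neighbours:
        return None  # no co-purchase data for this item
    # Highest count first; ties are broken alphabetically for determinism.
    return min(neighbours, key=lambda j: (-neighbours[j], j))


# Toy purchase histories (hypothetical data).
purchases = {
    "u1": {"milk", "bread", "eggs"},
    "u2": {"milk", "bread"},
    "u3": {"milk", "eggs"},
    "u4": {"bread", "butter"},
}
co_counts = build_co_purchase_counts(purchases)
print(most_co_purchased(co_counts, "eggs"))  # milk
```

Here "eggs" was bought together with "milk" by two users and with "bread" by one, so "milk" is returned.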
Once the item-to-item similarity matrix is built (an expensive computation often done offline), generating a recommendation list becomes a real-time lookup in that matrix.
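
The real-time step can be sketched as a lookup over a precomputed table. In the snippet below, `similar_items` is a hypothetical stand-in for the offline similarity matrix, stored as a nested dictionary, and `recommend` is an illustrative function name. Summing the scores of candidates across the items in the customer's basket is one simple way to combine them and is an assumption here, not necessarily the method used in [2, 3].

```python
import heapq
from collections import defaultdict


def recommend(basket, similar_items, k=3):
    """Compose a top-k recommendation list from a precomputed table.

    basket: set of item ids the customer has bought or is viewing.
    similar_items: dict item -> {other item: similarity score}, assumed
        to have been computed offline.
    """
    scores = defaultdict(float)
    for item in basket:
        for other, score in similar_items.get(item, {}).items():
            if other not in basket:  # do not recommend what is already there
                scores[other] += score
    # Highest combined score first; ties are broken alphabetically.
    return heapq.nsmallest(k, scores, key=lambda j: (-scores[j], j))


# Hypothetical precomputed similarity table.
similar_items = {
    "milk": {"bread": 2, "eggs": 2},
    "bread": {"milk": 2, "eggs": 1, "butter": 1},
}
print(recommend({"milk", "bread"}, similar_items))  # ['eggs', 'butter']
```

Because the expensive pairwise computation is already done, each request only touches the rows for the items in the basket.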