The roles of exploration in recommender systems are examined in three facets: 1) system exploration to reduce system uncertainty in regions with sparse feedback; 2) user exploration to introduce users to new interests/tastes; and 3) online exploration to take into account real-time user feedback.
In an era of ever-increasing choice, recommender systems are becoming indispensable in helping users navigate the millions or billions of pieces of content on recommendation platforms. As the focus of these systems shifts from attracting short-term user attention toward optimizing the long-term user experience, reinforcement learning (RL) and bandits have emerged as appealing techniques to power them [5, 9, 26, 27]. The exploration-exploitation tradeoff, which is foundational to bandit and RL research, has been studied extensively [1, 2, 4, 6, 8, 10, 11, 18, 20, 21, 22, 23]. An agent is incentivized to exploit, i.e., to repeat actions that produced high rewards in the past, in order to maximize its return. On the other hand, the agent needs to explore previously unseen actions in order to discover potentially better ones. Exploration has been shown to be extremely useful in solving tasks with long horizons or sparse rewards in many RL applications [2, 14, 15, 16, 19]. While effective exploration is believed to positively influence the user experience on the platform, the exact value of exploration in recommender systems has not been well established. In this talk, we examine the roles of exploration in recommender systems in three facets: 1) system exploration to reduce system uncertainty in regions with sparse feedback; 2) user exploration to introduce users to new interests/tastes; and 3) online exploration to take into account real-time user feedback. We showcase how each aspect of exploration contributes to the long-term user experience through offline and live experiments on industrial recommendation platforms. We hope this talk can inspire more follow-up work on understanding and improving exploration in recommender systems.