Are We Really Achieving Better Beyond-Accuracy Performance in Next Basket Recommendation?

Next basket recommendation (NBR) is a special type of sequential recommendation that is increasingly receiving attention. So far, most NBR studies have focused on optimizing the accuracy of the recommendation, whereas optimizing for beyond-accuracy metrics, e.g., item fairness and diversity remains largely unexplored. Recent studies into NBR have found a substantial performance difference between recommending repeat items and explore items. Repeat items contribute most of the users' perceived accuracy compared with explore items. Informed by these findings, we identify a potential "short-cut" to optimize for beyond-accuracy metrics while maintaining high accuracy. To leverage and verify the existence of such short-cuts, we propose a plug-and-play two-step repetition-exploration (TREx) framework that treats repeat items and explores items separately, where we design a simple yet highly effective repetition module to ensure high accuracy, while two exploration modules target optimizing only beyond-accuracy metrics. Experiments are performed on two widely-used datasets w.r.t. a range of beyond-accuracy metrics, viz. five fairness metrics and three diversity metrics. Our experimental results verify the effectiveness of TREx. Prima facie, this appears to be good news: we can achieve high accuracy and improved beyond-accuracy metrics at the same time. However, we argue that the real-world value of our algorithmic solution, TREx, is likely to be limited and reflect on the reasonableness of the evaluation setup. We end up challenging existing evaluation paradigms, particularly in the context of beyond-accuracy metrics, and provide insights for researchers to navigate potential pitfalls and determine reasonable metrics to consider when optimizing for accuracy and beyond-accuracy metrics.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here