1 code implementation • 16 Apr 2024 • Ruibo Yang, Jiazhou Wang, Andrew Mullhaupt
In this paper, we study the stochastic multi-armed bandit problem, where the reward is driven by an unknown random variable.
Recommendation Systems