1 Mar 2018 • Henry WJ Reeve, Joe Mellor, Gavin Brown
In addition, focusing on the case of bounded rewards, we give corresponding regret bounds for the k-Nearest Neighbour KL-UCB algorithm, which is an analogue of the KL-UCB algorithm adapted to the setting of multi-armed bandits with covariates.
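To make the idea of a nearest-neighbour KL-UCB index concrete, the following is a minimal sketch and not the authors' algorithm. It assumes rewards in [0, 1], a fixed number of neighbours `k`, Euclidean distance between covariates, and an exploration budget of `log(t)`. All of these choices, and every function name, are illustrative assumptions that the abstract does not confirm. The index for an arm at covariate `x` is the largest mean `q` that remains plausible, in the Bernoulli KL sense, given the rewards observed at the `k` nearest past covariates.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, n, budget, tol=1e-6):
    """Largest q in [mean, 1] with n * KL(mean, q) <= budget, found by bisection."""
    if n == 0:
        return 1.0  # no data: maximally optimistic
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still plausible, search higher
        else:
            hi = mid  # mid is too optimistic, search lower
    return lo


def knn_kl_ucb_index(x, history, k, t):
    """Hypothetical k-NN KL-UCB index for one arm.

    x       -- current covariate, a tuple of floats
    history -- list of (covariate, reward) pairs previously observed for this arm
    k       -- number of neighbours (fixed here; an assumption of this sketch)
    t       -- current round, used for the assumed log(t) exploration budget
    """
    if not history:
        return 1.0
    nearest = sorted(history, key=lambda h: math.dist(x, h[0]))[:k]
    mean = sum(reward for _, reward in nearest) / len(nearest)
    return kl_ucb_index(mean, len(nearest), math.log(max(t, 2)))
```

At each round a learner following this sketch would compute `knn_kl_ucb_index` for every arm at the observed covariate and pull the arm with the largest index. The sketch omits any adaptive choice of `k` or distance-dependent correction.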