no code implementations • 26 Sep 2013 • Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nello Cristianini
For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.