no code implementations • 12 Nov 2019 • Zhiyuan Liu, Huazheng Wang, Fan Shen, Kai Liu, Lijun Chen
We study incentivized exploration for the multi-armed bandit (MAB) problem, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on rewards.
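To make the setting concrete, the following is a minimal toy simulation, not the authors' method. Everything specific in it is an assumption made for illustration: the principal recommends arms with a UCB index, the compensation equals the gap in empirical means between the greedy arm and the recommended arm, and the feedback bias is modeled as `bias * compensation` added to the true reward. The names `simulate`, `bias`, and `true_means` are hypothetical.

```python
import math
import random


def simulate(true_means, horizon, bias=0.5, seed=0):
    """Toy incentivized-exploration loop under illustrative assumptions."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # sum of (possibly biased) feedback per arm
    total_compensation = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull each arm once, no compensation paid.
            recommended = t - 1
            compensation = 0.0
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            # The myopic player would pick the arm with the best empirical mean.
            greedy = max(range(k), key=lambda i: means[i])
            # Assumed principal policy: recommend the arm with the largest UCB index.
            ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i])
                   for i in range(k)]
            recommended = max(range(k), key=lambda i: ucb[i])
            # Assumed payment: the empirical gap the player gives up (zero if
            # the recommendation is already the greedy arm).
            compensation = means[greedy] - means[recommended]

        total_compensation += compensation
        reward = true_means[recommended] + rng.gauss(0, 0.1)
        # Assumed bias model: compensated players report inflated feedback.
        feedback = reward + bias * compensation
        counts[recommended] += 1
        sums[recommended] += feedback

    return counts, total_compensation


counts, paid = simulate([0.2, 0.8], horizon=500)
print("pulls per arm:", counts)
print("total compensation:", round(paid, 3))
```

Setting `bias=0` recovers unbiased feedback, so comparing the two runs gives a rough sense of how inflated reports distort the empirical means that both the greedy choice and the payments depend on.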