11 Aug 2022 • Jerome Taupin, Yassir Jedra, Alexandre Proutiere
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.