no code implementations • 13 Feb 2024 • Berk Bozkurt, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang
How well does an optimal policy $\hat{\pi}^{\star}$ of the approximate model perform when used in the original model $\mathcal{M}$?