1 Mar 2024 • Emile Anand, Guannan Qu
We show that the learned policy converges to the optimal policy at a rate on the order of $\tilde{O}(1/\sqrt{k}+\epsilon_{k,m})$ as the number of sub-sampled agents $k$ increases, where $\epsilon_{k,m}$ is the Bellman noise, by proving a novel generalization of the Dvoretzky-Kiefer-Wolfowitz inequality to the regime of sampling without replacement.
27 Nov 2022 • Emile Anand, Charles Steinhardt, Martin Hansen
Civilizations have tried to make drinking water safe to consume for thousands of years.