27 May 2024 • Xiaoou Cheng, Jonathan Weare
We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement learning, with an emphasis on estimation of quantities related to rare events.
8 Oct 2023 • Atsushi Shimizu, Xiaoou Cheng, Christopher Musco, Jonathan Weare
We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage.
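As a point of reference, the following sketch (not the paper's algorithm) shows the baseline ingredient: independent leverage score sampling for active linear regression, where labels are queried for rows drawn with probability proportional to their leverage and the subsampled least-squares problem is reweighted accordingly. The synthetic data, sample sizes, and noise level are illustrative assumptions; the paper's contribution is to replace the i.i.d. draw with non-independent strategies that promote spatial coverage.

```python
# Hedged sketch: independent leverage score sampling for active
# linear regression on synthetic data. Constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 4
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Leverage scores: diagonal of the hat matrix X (X^T X)^{-1} X^T,
# computed stably from a thin QR factorization; they sum to d.
Q, _ = np.linalg.qr(X)
lev = np.sum(Q**2, axis=1)
probs = lev / lev.sum()

# Query m labels, sampling rows i.i.d. proportional to leverage, then
# reweight by 1/sqrt(m * p_i) so the subsampled least-squares objective
# is an unbiased estimate of the full one.
m = 80
idx = rng.choice(n, size=m, p=probs)
w = 1.0 / np.sqrt(m * probs[idx])
beta_hat, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
```

With only `m = 80` of the `n = 500` labels queried, `beta_hat` recovers `beta_true` closely; coverage-promoting (non-independent) sampling aims to improve on this i.i.d. baseline, especially in the agnostic noise setting.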