no code implementations • 16 Jan 2023 • Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans
Instead, the analysis reveals that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.
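A minimal sketch of how a value baseline enters a policy-gradient update, using assumed toy quantities (a two-action softmax policy, a hypothetical `update` helper): subtracting a baseline from the sampled return shrinks the magnitude of the parameter change, consistent with the "less aggressive updates" reading above.

```python
import numpy as np

# Toy illustration (assumed quantities, not the paper's setup): the
# policy-gradient step is lr * (return - baseline) * grad log pi(a).
theta = np.zeros(2)  # logits for a two-action softmax policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update(theta, action, ret, baseline, lr=0.1):
    pi = softmax(theta)
    grad_logpi = -pi
    grad_logpi[action] += 1.0  # d log pi(action) / d theta for softmax
    return theta + lr * (ret - baseline) * grad_logpi

# Same sampled return, with and without a baseline: the baseline shrinks
# the size of the parameter change without biasing it in expectation.
no_base = update(theta, action=0, ret=1.0, baseline=0.0)
with_base = update(theta, action=0, ret=1.0, baseline=0.5)
print(np.linalg.norm(no_base), np.linalg.norm(with_base))
```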
no code implementations • 29 Sep 2021 • Maryam Hashemzadeh, Wesley Chung, Martha White
To enable better performance, we investigate the offline-online setting: The agent has access to a batch of data to train on but is also allowed to learn during the evaluation phase in an online manner.
no code implementations • 31 Aug 2020 • Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
no code implementations • 5 Jul 2019 • Brendan Bennett, Wesley Chung, Muhammad Zaheer, Vincent Liu
Temporal difference methods enable efficient, incremental estimation of value functions in reinforcement learning, and are of broader interest because they correspond to learning as observed in biological systems.
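As a concrete instance of the incremental value estimation mentioned above, here is a minimal TD(0) sketch on an assumed two-state toy chain (state 0 → state 1 with reward 0, state 1 → terminal with reward 1); the chain and step size are illustrative, not from the paper.

```python
import numpy as np

# TD(0): after each transition, nudge V[s] toward the bootstrapped
# target r + gamma * V[s'], using only that single sample.
V = np.zeros(2)
alpha, gamma = 0.1, 1.0

for _ in range(1000):
    # transition 0 -> 1, reward 0
    V[0] += alpha * (0.0 + gamma * V[1] - V[0])
    # transition 1 -> terminal, reward 1 (terminal value is 0)
    V[1] += alpha * (1.0 + 0.0 - V[1])

print(V)  # both values approach the true returns, 1.0
```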
2 code implementations • NeurIPS 2019 • Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White
Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
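A short sketch of the IS reweighting strategy described above, with made-up toy policies and returns: samples drawn under a behavior policy mu are weighted by the ratio pi/mu so their average estimates the expectation under the target policy pi.

```python
import numpy as np

# Toy off-policy prediction (assumed values, for illustration only).
rng = np.random.default_rng(0)

pi = np.array([0.7, 0.3])  # target policy over two actions
mu = np.array([0.5, 0.5])  # behavior policy that generated the data

actions = rng.choice(2, size=10_000, p=mu)   # actions sampled from mu
returns = np.where(actions == 0, 1.0, 0.0)   # toy return: 1 for action 0

rho = pi[actions] / mu[actions]              # importance ratios
is_estimate = np.mean(rho * returns)         # estimates E_pi[return]

print(is_estimate)  # close to pi[0] * 1.0 = 0.7
```

The key point is that the unweighted sample mean would estimate the behavior-policy value (about 0.5 here); the ratios correct it toward the target-policy value.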
no code implementations • ICLR 2019 • Wesley Chung, Somjit Nath, Ajin Joseph, Martha White
A key component for many reinforcement learning agents is to learn a value function, either for policy evaluation or control.
no code implementations • 27 Sep 2018 • Matthew Schlegel, Wesley Chung, Daniel Graves, Martha White
We propose Importance Resampling (IR) for off-policy learning, which resamples experience from the replay buffer and applies a standard on-policy update.
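The resampling idea can be sketched as follows, under assumed toy quantities: rather than scaling each update by an importance ratio, draw minibatch indices from the buffer with probability proportional to the stored ratios, then update those samples with weight 1.

```python
import numpy as np

# Sketch of importance resampling from a replay buffer (toy setup:
# the ratios here are random placeholders, not computed from policies).
rng = np.random.default_rng(1)

buffer_size = 5_000
rho = rng.uniform(0.1, 2.0, size=buffer_size)  # stored ratios pi/mu

probs = rho / rho.sum()                        # resampling distribution
batch = rng.choice(buffer_size, size=32, p=probs)

# Each transition in `batch` then receives a standard on-policy-style
# update (weight 1), instead of an update scaled by rho[batch].
print(batch.shape)
```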
no code implementations • 28 Aug 2018 • Touqir Sajed, Wesley Chung, Martha White
We provide experiments investigating the number of samples required by this offline algorithm in simple benchmark reinforcement learning domains, and highlight that there are still many open questions to be solved for this important problem.