7 May 2024 • Zhifa Ke, Zaiwen Wen, Junyu Zhang
Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks.
25 Feb 2023 • Zhifa Ke, Junyu Zhang, Zaiwen Wen
Under mild conditions, non-asymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations.