no code implementations • 13 Mar 2024 • Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky
We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent.
no code implementations • 8 Dec 2023 • Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky
Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation.