no code implementations • 16 May 2024 • Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton
We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average.
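As a rough illustration of the centering idea in the abstract above, the following minimal sketch subtracts a running empirical average from each incoming reward. It is not the authors' implementation: the class name `RewardCenterer`, the use of a plain incremental sample mean, and the choice to update the mean before subtracting are all assumptions made for illustration.

```python
class RewardCenterer:
    """Hypothetical helper: centers rewards around their running sample mean."""

    def __init__(self):
        self.count = 0    # number of rewards seen so far
        self.mean = 0.0   # running empirical average of the rewards

    def center(self, reward):
        # Incrementally update the sample mean with the new reward.
        self.count += 1
        self.mean += (reward - self.mean) / self.count
        # Return the reward with the current average subtracted out.
        return reward - self.mean


centerer = RewardCenterer()
centered = [centerer.center(r) for r in [1.0, 3.0, 5.0]]
```

In this sketch the centered values, rather than the raw rewards, would be passed to whatever discounted learning update is in use; how the average is best estimated in practice is left open here.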
no code implementations • NeurIPS 2021 • Yi Wan, Abhishek Naik, Richard S. Sutton
We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs.
no code implementations • 17 Apr 2021 • Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton
Our algorithms and experiments are the first to treat model-based reinforcement learning (MBRL) with expectation models in a general setting.
no code implementations • 2 Oct 2020 • Anirban Santara, Sohan Rudra, Sree Aditya Buridi, Meha Kaushik, Abhishek Naik, Bharat Kaul, Balaraman Ravindran
In this work, we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving.
1 code implementation • 29 Jun 2020 • Yi Wan, Abhishek Naik, Richard S. Sutton
We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset.
no code implementations • 4 Oct 2019 • Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks.
1 code implementation • 20 Jul 2017 • Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul
Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories.