no code implementations • 15 Jun 2018 • Ajin George Joseph, Shalabh Bhatnagar
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using a linear function approximation architecture, with memory and computation costs that scale quadratically in the size of the feature set.
no code implementations • 31 Jan 2018 • Ajin George Joseph, Shalabh Bhatnagar
In this paper, we consider a modified version of the control problem in a model-free Markov decision process (MDP) setting with large state and action spaces.
no code implementations • 31 Jan 2018 • Ajin George Joseph, Shalabh Bhatnagar
The cross entropy (CE) method is a model-based search method for solving optimization problems where the objective function has minimal structure.
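
For orientation, the basic CE iteration can be sketched as follows. This is a minimal illustration of the textbook method, not the algorithm proposed in the paper; it assumes a one-dimensional Gaussian sampling model, and the function name `cross_entropy_maximize`, the toy objective, and all parameter values are hypothetical choices made for this example.

```python
import random
import statistics


def cross_entropy_maximize(objective, mean=0.0, std=5.0, n_samples=100,
                           n_elite=10, n_iters=50, seed=0):
    """Maximize a 1-D objective with a Gaussian sampling model (illustrative)."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Draw candidate solutions from the current model.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the best-scoring candidates (the "elite" set).
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        # Refit the model to the elite set; the floor avoids a zero std.
        mean = statistics.fmean(elite)
        std = max(statistics.pstdev(elite), 1e-6)
    return mean


def toy_objective(x):
    # Hypothetical objective with its maximum at x = 2.
    return -(x - 2.0) ** 2


best = cross_entropy_maximize(toy_objective)
```

The method only needs to evaluate the objective at sampled points, which is why it suits objectives with little exploitable structure; the "model" is the sampling distribution that is refit at every iteration.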