no code implementations • 28 Sep 2021 • Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant
In practice, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used to improve the performance of approximate dynamic programming with function approximation.
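As a hedged illustration of the two techniques named in the abstract, the sketch below runs approximate policy iteration on a hypothetical 2-state, 2-action MDP (the MDP, rollout depth, and iteration counts are assumptions, not from the paper): m-step rollout partially evaluates the current policy by applying its Bellman operator m times, and one-step lookahead then improves the policy greedily with respect to the resulting value estimate.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
P = {  # P[a][s, s']: transition matrix for action a
    0: np.array([[0.9, 0.1], [0.2, 0.8]]),
    1: np.array([[0.5, 0.5], [0.6, 0.4]]),
}
r = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}  # r[a][s]
gamma = 0.9

def rollout(V, policy, m):
    """m-step rollout: apply the Bellman operator T_pi to V m times."""
    for _ in range(m):
        V = np.array([r[policy[s]][s] + gamma * P[policy[s]][s] @ V
                      for s in range(2)])
    return V

def one_step_lookahead(V):
    """Policy improvement: act greedily (H = 1 lookahead) w.r.t. V."""
    return [int(np.argmax([r[a][s] + gamma * P[a][s] @ V for a in (0, 1)]))
            for s in range(2)]

V = np.zeros(2)
policy = [0, 0]
for _ in range(20):                 # approximate policy iteration loop
    V = rollout(V, policy, m=5)     # partial evaluation via m-step rollout
    policy = one_step_lookahead(V)  # improvement via lookahead
```

On this toy instance the loop settles on the policy that earns reward 1 in every state, and the value estimate approaches 1/(1 - gamma) = 10; larger m trades extra evaluation work for a more accurate target at each improvement step.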
no code implementations • 29 Jan 2021 • Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant
We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.
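To make the graph property concrete, here is a hypothetical sketch (not the paper's code) that checks it for a single policy-induced transition graph: it finds strongly connected components with Kosaraju's algorithm, treats SCCs with no outgoing edges as recurrent classes, and declares the collapsed graph acyclic exactly when no cycle passes through a transient state, i.e. every transient SCC is a singleton without a self-loop.

```python
from collections import defaultdict

def sccs(adj, n):
    """Kosaraju's algorithm: returns (component id per node, #components)."""
    order, seen = [], [False] * n
    def dfs(u):
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                dfs(v)
        order.append(u)  # postorder
    for u in range(n):
        if not seen[u]:
            dfs(u)
    radj = defaultdict(list)  # reversed graph
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)
    comp, c = [-1] * n, 0
    for u in reversed(order):
        if comp[u] == -1:
            stack, comp[u] = [u], c
            while stack:
                x = stack.pop()
                for y in radj[x]:
                    if comp[y] == -1:
                        comp[y] = c
                        stack.append(y)
            c += 1
    return comp, c

def acyclic_after_collapsing_recurrent_classes(adj, n):
    """True iff replacing each recurrent class by a node yields a DAG."""
    comp, c = sccs(adj, n)
    # An SCC with an edge leaving it consists of transient states;
    # an SCC with no exit is a recurrent class.
    has_exit = [False] * c
    for u in range(n):
        for v in adj[u]:
            if comp[u] != comp[v]:
                has_exit[comp[u]] = True
    size = defaultdict(int)
    for u in range(n):
        size[comp[u]] += 1
    # Collapsing recurrent classes leaves transient states untouched, so
    # the result is acyclic iff no cycle runs through transient states.
    for u in range(n):
        if has_exit[comp[u]] and (size[comp[u]] > 1 or u in adj[u]):
            return False
    return True
```

For example, the chain 0 -> 1 -> {2, 3} (with 2 and 3 forming one recurrent class) satisfies the property, while a graph in which transient states 0 and 1 cycle between each other before reaching an absorbing state does not.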