no code implementations • 13 Jul 2019 • Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Ji Liu
In this paper, we present a probability one convergence proof, under suitable conditions, of a certain class of actor-critic algorithms for finding approximate solutions to entropy-regularized MDPs using the machinery of stochastic approximation.
1 code implementation • 15 Mar 2019 • Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu
This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy.