1 code implementation • 24 Sep 2021 • Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, Suleyman Serdar Kozat
We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias.