1 code implementation • 9 Feb 2024 • Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling
In this paper, we formalize a novel yet general RL framework, Monitored MDPs, in which the agent cannot always observe rewards.
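To make the setting concrete, here is a minimal toy sketch of an environment whose reward is always generated but only sometimes shown to the agent. It is an illustration under stated assumptions, not the paper's formalism: the class name `PartiallyObservedRewardEnv`, the `observe_prob` parameter, and the use of `None` for an unobserved reward are all hypothetical, and the actual Monitored MDP definition may model reward observability quite differently.

```python
import random
from typing import Optional, Tuple


class PartiallyObservedRewardEnv:
    """Toy two-state environment whose reward is sometimes hidden.

    Hypothetical illustration only: `observe_prob` and the `None`
    placeholder are assumptions, not definitions from the paper.
    """

    def __init__(self, observe_prob: float, seed: int = 0) -> None:
        self.observe_prob = observe_prob
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action: int) -> Tuple[int, float, Optional[float]]:
        # The environment always generates a true reward:
        # 1.0 when the action matches the current state, else 0.0.
        true_reward = 1.0 if action == self.state else 0.0
        # The state alternates between 0 and 1 on every step.
        self.state = 1 - self.state
        # The agent sees the reward only with probability `observe_prob`;
        # otherwise it receives None in place of the reward.
        visible = self.rng.random() < self.observe_prob
        observed = true_reward if visible else None
        # `true_reward` is returned for evaluation only; a learning agent
        # in this sketch would be given just the state and `observed`.
        return self.state, true_reward, observed
```

With `observe_prob=1.0` this reduces to an ordinary fully observed reward signal, and with `observe_prob=0.0` the agent never sees a reward even though one is still generated at every step.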