7 Feb 2024 • GuoJian Wang, Faguo Wu, Xiao Zhang, Jianxiang Liu
However, existing methods often require these experiences to be successful and may overly exploit them, which can cause the agent to adopt suboptimal behaviors.