no code implementations • 29 Oct 2023 • Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu
To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates.
no code implementations • 7 Oct 2023 • Jiayu Chen, Zelai Xu, Yunfei Li, Chao Yu, Jiaming Song, Huazhong Yang, Fei Fang, Yu Wang, Yi Wu
In this work, we present a novel subgame curriculum learning framework for zero-sum games.
no code implementations • 5 Oct 2023 • Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu
Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning a Nash equilibrium (NE), where the best responses w.r.t.
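The PSRO loop the snippet starts to describe can be sketched on a toy game. This is a minimal textbook-style sketch, not the method of the paper listed above. It assumes a symmetric zero-sum matrix game (rock-paper-scissors), uses fictitious play as a stand-in meta-solver for the restricted game, and computes exact best responses by enumeration where real PSRO would train an RL oracle. The names `solve_restricted_game` and `psro` are hypothetical.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (symmetric zero-sum game).
# Index 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])


def solve_restricted_game(sub: np.ndarray, iters: int = 5000) -> np.ndarray:
    """Approximate an NE of a symmetric zero-sum matrix game with fictitious
    play (an assumed stand-in for PSRO's meta-solver)."""
    counts = np.zeros(sub.shape[0])
    counts[0] = 1.0
    for _ in range(iters):
        mix = counts / counts.sum()
        # Best-respond to the empirical mixture of past plays.
        counts[np.argmax(sub @ mix)] += 1.0
    return counts / counts.sum()


def psro(payoff: np.ndarray, max_iters: int = 10):
    population = [0]  # the restricted game starts from one pure strategy (rock)
    for _ in range(max_iters):
        meta = solve_restricted_game(payoff[np.ix_(population, population)])
        # Spread the meta-strategy back onto the full action set.
        opponent = np.zeros(payoff.shape[0])
        opponent[population] = meta
        # Exact best response in the full game (a learned oracle in real PSRO).
        best_response = int(np.argmax(payoff @ opponent))
        if best_response in population:
            break  # nothing new to add: the meta-strategy is approximately an NE
        population.append(best_response)
    # Re-solve so the returned meta-strategy matches the final population.
    meta = solve_restricted_game(payoff[np.ix_(population, population)])
    return population, meta


population, meta = psro(PAYOFF)
print(population, np.round(meta, 3))
```

On rock-paper-scissors the population grows from rock to paper to scissors, after which no best response lies outside it and the meta-strategy approaches the uniform mixture.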
no code implementations • 15 Jun 2022 • Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu
Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.
Multi-agent Reinforcement Learning • reinforcement-learning
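A toy illustration of why value decomposition can struggle with a multi-modal reward landscape. It is not taken from the paper above. It assumes an additive (VDN-style) decomposition Q(a1, a2) ≈ q1(a1) + q2(a2), fit by least squares to the full reward table of a two-agent coordination game with two equally good optima. The function name `fit_additive` is hypothetical.

```python
import numpy as np

# Two agents, two actions each. Reward is 1 when both pick the same action
# and 0 otherwise, so there are two equally good joint optima (multi-modal).
REWARD = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])


def fit_additive(reward: np.ndarray):
    """Least-squares fit of reward[a1, a2] ~ q1[a1] + q2[a2]."""
    n1, n2 = reward.shape
    rows = []
    for a1 in range(n1):
        for a2 in range(n2):
            # One-hot features: agent 1's action, then agent 2's action.
            x = np.zeros(n1 + n2)
            x[a1] = 1.0
            x[n1 + a2] = 1.0
            rows.append(x)
    design = np.array(rows)
    # Row-major flatten matches the (a1 outer, a2 inner) loop order above.
    coef, *_ = np.linalg.lstsq(design, reward.reshape(-1), rcond=None)
    return coef[:n1], coef[n1:]


q1, q2 = fit_additive(REWARD)
joint = q1[:, None] + q2[None, :]  # decomposed value of every joint action
print(np.round(joint, 3))
```

Every entry of `joint` comes out as 0.5: the additive fit averages the two optima away, so greedy per-agent action selection cannot tell coordinated joint actions (reward 1) from mismatched ones (reward 0).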