no code implementations • NeurIPS 2021 • Brandon Cui, Hengyuan Hu, Luis Pineda, Jakob N. Foerster
The standard setting for cooperative multi-agent problems is self-play (SP), where the goal is to train a team of agents that works well together.
no code implementations • 13 Jul 2022 • Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob N. Foerster
Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world.
1 code implementation • 17 Sep 2021 • Chris Cummins, Bram Wasti, Jiadong Guo, Brandon Cui, Jason Ansel, Sahir Gomez, Somya Jain, Jia Liu, Olivier Teytaud, Benoit Steiner, Yuandong Tian, Hugh Leather
What is needed is an easy, reusable experimental infrastructure for real-world compiler optimization tasks that can serve as a common benchmark for comparing techniques, and as a platform to accelerate progress in the field.
2 code implementations • NeurIPS 2021 • Kevin Yang, Tianjun Zhang, Chris Cummins, Brandon Cui, Benoit Steiner, Linnan Wang, Joseph E. Gonzalez, Dan Klein, Yuandong Tian
Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function.
5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster
Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions, and thus fail when paired with humans or independently trained agents at test time.
no code implementations • ICLR 2021 • Brandon Cui, Yin-Lam Chow, Mohammad Ghavamzadeh
We first formulate an LCE model to learn representations that are suitable to be used by a policy iteration style algorithm in the latent space.
no code implementations • 9 Jun 2020 • Yin-Lam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.