no code implementations • ICML 2020 • Hengyuan Hu, Alexander Peysakhovich, Adam Lerer, Jakob Foerster
We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (RL)
no code implementations • 3 Nov 2023 • Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh
Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency.
no code implementations • 14 Jun 2023 • Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh
We additionally illustrate our approach with a robot on two carefully designed surfaces.
no code implementations • 25 Apr 2023 • Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown
Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.
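The mirror descent update underlying such search algorithms can be sketched concretely. Below is a minimal, hypothetical illustration (not the paper's implementation): with the negative-entropy mirror map over the probability simplex, one mirror descent step reduces to the multiplicative-weights rule, scaling each action's probability by the exponentiated learning-rate-weighted value and renormalizing. The function name and toy values are illustrative assumptions.

```python
import numpy as np

def mirror_descent_step(policy, q_values, lr):
    """One mirror descent step on the probability simplex.

    With the negative-entropy mirror map, the update is the
    multiplicative-weights rule: multiply each action probability
    by exp(lr * Q) and renormalize.
    """
    logits = np.log(policy) + lr * q_values
    new_policy = np.exp(logits - logits.max())  # subtract max for numerical stability
    return new_policy / new_policy.sum()

# Toy example: a uniform 3-action policy shifts toward the highest-value action.
policy = np.ones(3) / 3
q = np.array([1.0, 0.0, -1.0])
for _ in range(10):
    policy = mirror_descent_step(policy, q, lr=0.5)
```

After repeated steps the policy concentrates on the best action while remaining a valid distribution, which is the basic mechanism the search procedures build on.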
no code implementations • 13 Apr 2023 • Hengyuan Hu, Dorsa Sadigh
One of the fundamental quests of AI is to produce agents that coordinate well with humans.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning, +1
no code implementations • 11 Oct 2022 • Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown
First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams.
no code implementations • NeurIPS 2021 • Brandon Cui, Hengyuan Hu, Luis Pineda, Jakob N. Foerster
The standard problem setting in cooperative multi-agent learning is self-play (SP), where the goal is to train a team of agents that works well together.
no code implementations • 13 Jul 2022 • Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob N. Foerster
Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world.
no code implementations • 14 Dec 2021 • Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown
We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior.
1 code implementation • NeurIPS 2021 • Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker.
no code implementations • ICLR 2022 • Samuel Sokota, Hengyuan Hu, David J Wu, J Zico Kolter, Jakob Nicolaus Foerster, Noam Brown
Furthermore, because this specialization occurs after the action or policy has already been decided, BFT does not require the belief model to process it as input.
no code implementations • 16 Jun 2021 • Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster
Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games.
5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster
Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions; they thus fail when paired with humans or independently trained agents at test time.
no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster
In the era of ever-decreasing loss functions, SGD and its various offspring have become the go-to optimization tools in machine learning and are a key component of the success of deep neural networks (DNNs).
2 code implementations • 6 Mar 2020 • Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster
We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).
no code implementations • 27 Jan 2020 • Tristan Cazenave, Yen-Chi Chen, Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu, Julien Dehos, Maria Elsa, Qucheng Gong, Hengyuan Hu, Vasil Khalidov, Cheng-Ling Li, Hsin-I Lin, Yu-Jin Lin, Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve, Fabien Teytaud, Olivier Teytaud, Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen, Sergey Zagoruyko
Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games.
10 code implementations • 5 Dec 2019 • Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown
The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy.
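The idea described above can be sketched as a one-ply Monte Carlo search. This is a hedged, simplified illustration rather than the paper's actual algorithm: all partners are fixed to the agreed-upon blueprint policy, and the searching agent estimates each candidate action's value by averaging rollout returns, then picks the best one. The function names and the toy rollout are assumptions for illustration.

```python
import random

def single_agent_search(state, actions, blueprint, rollout, num_rollouts=100):
    """Evaluate each candidate action by Monte Carlo rollouts in which
    every agent plays the fixed blueprint policy after the first move,
    and return the action with the highest estimated return."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        value = sum(
            rollout(state, first_action=a, policy=blueprint)
            for _ in range(num_rollouts)
        ) / num_rollouts
        if value > best_value:
            best_action, best_value = a, value
    return best_action

# Toy check: a noisy rollout whose true value is highest for action 2.
def toy_rollout(state, first_action, policy):
    return first_action + random.gauss(0.0, 0.1)

random.seed(0)
best = single_agent_search(None, actions=[0, 1, 2], blueprint=None, rollout=toy_rollout)
```

Because only one agent deviates from the blueprint, the partially observable multi-agent problem collapses into a single-agent decision at each turn, which is what makes this form of search tractable.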
4 code implementations • ICLR 2020 • Hengyuan Hu, Jakob N. Foerster
Learning to be informative when observed by others is an interesting challenge for Reinforcement Learning (RL): Fundamentally, RL requires agents to explore in order to discover good policies.
1 code implementation • NeurIPS 2019 • Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis
We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making.
no code implementations • ICLR 2018 • Hengyuan Hu, Ruslan Salakhutdinov
There have been numerous recent advances in learning deep generative models with latent variables, thanks to the reparameterization trick, which allows deep directed models to be trained effectively.
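The reparameterization trick mentioned above can be shown in a few lines. The sketch below (a minimal NumPy illustration, not the paper's code) rewrites a sample from N(mu, sigma^2) as a deterministic function of the distribution's parameters plus exogenous noise, so gradients can flow through mu and log_var in an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I).

    The randomness lives entirely in eps, so z is differentiable
    with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# A draw from a standard normal in 4 dimensions (mu = 0, sigma = 1).
mu = np.zeros(4)
log_var = np.zeros(4)
z = reparameterize(mu, log_var)
```

Without this rewrite, sampling would be a non-differentiable node in the computation graph and gradient-based training of the encoder parameters would not be possible.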
no code implementations • 15 Nov 2016 • Hengyuan Hu, Lisheng Gao, Quanbin Ma
The most famous among them are the deep belief network, which stacks multiple layer-wise pretrained RBMs to form a hybrid model, and the deep Boltzmann machine, which allows connections between hidden units to form a multi-layer structure.
7 code implementations • 12 Jul 2016 • Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang
We alternate the pruning and retraining to further reduce zero activations in a network.
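The pruning criterion behind this procedure can be sketched briefly. The snippet below is a hypothetical illustration of an APoZ-style (average percentage of zeros) test: a unit whose activation is zero on most inputs is marked for removal. It shows only the selection step; the full procedure described above alternates this pruning with retraining to recover accuracy. The threshold value and toy data are assumptions.

```python
import numpy as np

def prune_by_apoz(activations, threshold=0.9):
    """Return a keep-mask over units: a unit is pruned when the fraction
    of zero activations across the dataset exceeds `threshold`."""
    apoz = (activations == 0).mean(axis=0)  # fraction of zeros per unit
    return apoz < threshold

# Toy data: 3 ReLU units over 1000 inputs; unit 1 is zero on >95% of them.
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(1000, 3)), 0.0)
acts[:950, 1] = 0.0
keep = prune_by_apoz(acts)
```

In practice the kept units define a smaller network, which is retrained before the next pruning round, so accuracy is recovered while zero activations shrink further.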