Search Results for author: Jiaxuan Gao

Found 6 papers, 4 papers with code

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

1 code implementation • 23 Dec 2023 • Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination.

Code Generation

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

1 code implementation • 3 Feb 2023 • Chao Yu, Jiaxuan Gao, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu

A crucial limitation of this framework is that every policy in the pool is optimized w.r.t.

Multi-agent Reinforcement Learning

Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration

2 code implementations • 9 Jan 2023 • Chao Yu, Xinyi Yang, Jiaxuan Gao, Jiayu Chen, Yunfei Li, Jijia Liu, Yunfei Xiang, Ruixin Huang, Huazhong Yang, Yi Wu, Yu Wang

Simply waiting for every robot to be ready for the next action can be particularly time-inefficient.

Multi-agent Reinforcement Learning, reinforcement-learning, +1 more

Learning Efficient Multi-Agent Cooperative Visual Exploration

no code implementations • 12 Oct 2021 • Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu

In this paper, we extend the state-of-the-art single-agent visual navigation method, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based planning module, Multi-agent Spatial Planner (MSP). MSP leverages a transformer-based architecture, Spatial-TeamFormer, which effectively captures spatial relations and intra-agent interactions via hierarchical spatial self-attentions.

Reinforcement Learning (RL), Visual Navigation

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

16 code implementations • 2 Mar 2021 • Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems.

Multi-agent Reinforcement Learning, reinforcement-learning, +3 more
