no code implementations • 29 May 2024 • Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han
To boost the learning loop, we propose SEER, an efficient preference-based reinforcement learning (PbRL) method that integrates label smoothing and policy regularization techniques.
no code implementations • 29 May 2024 • Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang
This paper explores an efficient method for aligning black-box large models using smaller models, introducing a model-agnostic and lightweight Bayesian Persuasion Alignment framework.
no code implementations • 20 Feb 2024 • Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang
The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety.
no code implementations • 30 Sep 2023 • Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang
We argue that truly understanding values in LLMs requires considering both "know what" and "know why".
no code implementations • 6 Jun 2023 • Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
In this paper, we propose a novel zero-shot preference-based reinforcement learning (RL) algorithm that leverages labeled preference data from source tasks to infer labels for target tasks, eliminating the requirement for human queries.