no code implementations • 15 Feb 2024 • Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang
Then, based on this framework, we introduce the induced Bayesian network (IBN) to analyze generalization in the reward modeling stage of RLHF.
1 code implementation • 19 Oct 2023 • Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.