Search Results for author: Xinbo Xu

Found 1 paper, 1 paper with code

Safe RLHF: Safe Reinforcement Learning from Human Feedback

1 code implementation • 19 Oct 2023 • Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

The inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.
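The paper addresses this tension by decoupling preference annotations for helpfulness and harmlessness into a separate reward model and cost model, then training the policy under a safety constraint with a Lagrangian method. Below is a minimal, hypothetical sketch of such a Lagrange-multiplier update; the names (lagrangian_advantage, update_lambda, cost_limit) are illustrative placeholders, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a PPO-Lagrangian-style update in the spirit of
# Safe RLHF: maximize expected reward (helpfulness) subject to expected
# cost (harmfulness) staying below a limit, alternating policy updates
# with gradient ascent on a Lagrange multiplier. Names are placeholders.

log_lambda = torch.zeros(1, requires_grad=True)   # lambda kept >= 0 via softplus
lambda_optimizer = torch.optim.SGD([log_lambda], lr=1e-2)

def lagrangian_advantage(reward: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Fold the cost signal into a single advantage for the policy step."""
    lam = F.softplus(log_lambda).detach()  # freeze lambda during the policy step
    return reward - lam * cost

def update_lambda(mean_batch_cost: torch.Tensor, cost_limit: float = 0.0) -> None:
    """Dual step: lambda grows while the cost constraint is violated."""
    lam = F.softplus(log_lambda)
    dual_loss = -lam * (mean_batch_cost.detach() - cost_limit)
    lambda_optimizer.zero_grad()
    dual_loss.backward()
    lambda_optimizer.step()
```

In a full training loop, lagrangian_advantage would feed a standard PPO step on the language-model policy, while update_lambda runs once per batch on the measured cost.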

reinforcement-learning • Safe Reinforcement Learning
