Search Results for author: Xinbo Xu

Found 1 paper, 1 paper with code

Safe RLHF: Safe Reinforcement Learning from Human Feedback

1 code implementation • 19 Oct 2023 • Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

The inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.
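The paper addresses this tension by decoupling preference annotations for helpfulness and harmlessness into a separate reward model and cost model, then training the policy under a safety constraint with a Lagrangian method. Below is a minimal, hypothetical sketch of such a Lagrange-multiplier update; the names (lagrangian_advantage, update_lambda, cost_limit) are illustrative placeholders, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a PPO-Lagrangian-style update in the spirit of
# Safe RLHF: maximize expected reward (helpfulness) subject to expected
# cost (harmfulness) staying below a limit, alternating policy updates
# with gradient ascent on a Lagrange multiplier. Names are placeholders.

log_lambda = torch.zeros(1, requires_grad=True)   # lambda kept >= 0 via softplus
lambda_optimizer = torch.optim.SGD([log_lambda], lr=1e-2)

def lagrangian_advantage(reward: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Fold the cost signal into a single advantage for the policy step."""
    lam = F.softplus(log_lambda).detach()  # freeze lambda during the policy step
    return reward - lam * cost

def update_lambda(mean_batch_cost: torch.Tensor, cost_limit: float = 0.0) -> None:
    """Dual step: lambda grows while the cost constraint is violated."""
    lam = F.softplus(log_lambda)
    dual_loss = -lam * (mean_batch_cost.detach() - cost_limit)
    lambda_optimizer.zero_grad()
    dual_loss.backward()
    lambda_optimizer.step()
```

In a full training loop, lagrangian_advantage would feed a standard PPO step on the language-model policy, while update_lambda runs once per batch on the measured cost.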

reinforcement-learning • Safe Reinforcement Learning
