no code implementations • 11 Jan 2024 • Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Ding Bo, Huaimin Wang
ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization.
no code implementations • 20 Aug 2022 • Xudong Gong, Qinlin Feng, Yuan Zhang, Jiangling Qin, Weijie Ding, Biao Li, Peng Jiang, Kun Gai
However, as users continue to watch videos and give feedback, the changing context makes the ranking of the server-side recommendation system inaccurate.
6 code implementations • RecSys 2020 • Hongyan Tang, Junning Liu, Ming Zhao, Xudong Gong
Moreover, through extensive experiments across state-of-the-art (SOTA) multi-task learning (MTL) models, we have observed an interesting seesaw phenomenon in which the performance of one task is often improved at the expense of the performance of other tasks.