22 Dec 2023 • Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function.
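The core idea can be sketched in code: rather than hand-engineering a reward, fit a reward model to pairwise human preference comparisons. The sketch below is a minimal illustration under the standard Bradley-Terry preference model; the function name, linear reward form, and toy data are assumptions for exposition, not part of the surveyed work.

```python
import math
import random

def fit_reward_model(prefs, dim, lr=0.5, epochs=200, seed=0):
    """Fit a linear reward r(x) = w . x from pairwise preferences.

    prefs: list of (preferred_features, rejected_features) pairs.
    Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for a, b in prefs:
            diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
            p = 1.0 / (1.0 + math.exp(-diff))  # model's P(a preferred)
            # Gradient ascent on the log-likelihood of the observed choice.
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (a[i] - b[i])
    return w

# Toy data: the human annotator consistently prefers the option with
# the larger first feature (an assumption for this illustration).
prefs = [([1.0, 0.2], [0.1, 0.9]),
         ([0.8, 0.5], [0.3, 0.4]),
         ([0.9, 0.1], [0.2, 0.8])]
w = fit_reward_model(prefs, dim=2)
# After fitting, the learned reward should rank each preferred item
# above its rejected counterpart.
scores = [(sum(wi * xi for wi, xi in zip(w, a)),
           sum(wi * xi for wi, xi in zip(w, b))) for a, b in prefs]
```

In a full RLHF pipeline the learned reward model would then replace the engineered reward function when optimizing a policy with standard RL.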