26 Feb 2024 • Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
Approaches to aligning large language models (LLMs) with human values have focused on correcting misalignment that emerges from pretraining.