26 Feb 2024 • Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
Approaches to aligning large language models (LLMs) with human values have focused on correcting misalignment that emerges from pretraining.