Search Results for author: Binyamin Rothberg

Found 1 paper, 1 paper with code

Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering

1 code implementation • 29 Jan 2024 • Yotam Wolf, Noam Wies, Dorin Shteyman, Binyamin Rothberg, Yoav Levine, Amnon Shashua

Second, we show that helpfulness is harmed quadratically with the norm of the representation engineering vector, while the alignment increases linearly with it, indicating a regime in which it is efficient to use representation engineering.
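
As a rough illustration of the scaling claim in the excerpt above, the sketch below assumes the alignment gain grows as a * r and the helpfulness loss as b * r^2, where r is the norm of the representation engineering vector. The coefficients a and b, the function names, and the idea of comparing the two quantities on a common scale are all assumptions made for illustration. None of them are taken from the paper.

```python
def alignment_gain(r, a=1.0):
    """Hypothetical linear alignment gain for a steering vector of norm r."""
    return a * r


def helpfulness_loss(r, b=0.5):
    """Hypothetical quadratic helpfulness loss for a steering vector of norm r."""
    return b * r ** 2


def efficient_norm_limit(a=1.0, b=0.5):
    """Norm below which the assumed linear gain exceeds the assumed quadratic loss.

    a * r > b * r**2 holds exactly when 0 < r < a / b.
    """
    return a / b


if __name__ == "__main__":
    # Illustrative norms only; these are not values reported by the authors.
    for r in (0.1, 0.5, 1.0, 2.0, 3.0):
        gain = alignment_gain(r)
        loss = helpfulness_loss(r)
        print(f"r={r:.1f}  gain={gain:.3f}  loss={loss:.3f}  net={gain - loss:.3f}")
```

Under these assumed coefficients, small norms give a gain that outweighs the loss, which is the kind of regime the excerpt calls efficient. Past r = a / b the quadratic term dominates.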

Language Modelling

