Search Results for author: Max Kaufmann

Found 5 papers, 3 papers with code

Visibility into AI Agents

no code implementations • 23 Jan 2024 • Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks.

Informativeness

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

2 code implementations • 21 Sep 2023 • Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A".

Data Augmentation, Sentence

Taken out of context: On measuring situational awareness in LLMs

1 code implementation • 1 Sep 2023 • Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans

At test time, we assess whether the model can pass the test.

Data Augmentation, In-Context Learning

Testing Robustness Against Unforeseen Adversaries

3 code implementations • 21 Aug 2019 • Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

To narrow in on this discrepancy between research and reality, we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks.

Adversarial Defense, Adversarial Robustness

JMaxAlign: A Maximum Entropy Parallel Sentence Alignment Tool

no code implementations • COLING 2012 • Max Kaufmann

Machine Translation, Sentence
