Search Results for author: Shiye Su

Found 1 paper, 1 paper with code

ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation • 20 Dec 2023 • Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.

Language Modelling
